locan.locan_io.files.Files
- class locan.locan_io.files.Files(df=None, directory=None, exists=True, column='file_path')
Bases: object
Wrapper for a pandas.DataFrame with selected methods to identify, match, and group file paths.
Note
Iteration and indexing are implemented so that integer indexing or iterating over the Files instance returns a single row (as Series or namedtuple). Slice indexing returns a new Files instance with the selected rows.
- Parameters:
df (pd.DataFrame | dict[str, str] | None) – file names
directory (str | os.PathLike[Any] | None) – base directory
exists (bool) – raise FileExistsError if file in df does not exist
column (str) – key/column in df from which to take a file list
- Variables:
df (pd.DataFrame) – dataframe carrying file paths
directory (Path) – base directory
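The indexing behaviour described in the note above can be pictured with a small stand-in class. This is only a sketch under stated assumptions: `PathTable` is a hypothetical name, it holds plain strings rather than a pandas.DataFrame, and it is not the locan implementation.

```python
from typing import Iterator, Sequence, Union


class PathTable:
    """Hypothetical stand-in for Files that holds a plain list of paths."""

    def __init__(self, paths: Sequence[str]) -> None:
        self.paths = list(paths)

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, index: Union[int, slice]) -> Union[str, "PathTable"]:
        if isinstance(index, slice):
            # Slice indexing returns a new container with the selected rows.
            return PathTable(self.paths[index])
        # Integer indexing returns a single row.
        return self.paths[index]

    def __iter__(self) -> Iterator[str]:
        # Iteration yields one row at a time.
        return iter(self.paths)


table = PathTable(["a.txt", "b.txt", "c.txt"])
single_row = table[0]          # one row
subset = table[0:2]            # a new PathTable holding two rows
rows = [row for row in table]  # iteration visits each row in order
```

In Files itself the single row would be a Series or namedtuple, as stated in the note, rather than a string.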
Methods
__init__([df, directory, exists, column])
add_glob([pattern, regex, column]) – Search for file paths using glob and/or regex pattern in base directory and provide files in new column.
concatenate([files, directory, exists]) – Concatenate the file lists from multiple Files instances and set the base directory without further action.
exclude([stoplist, column, column_stoplist]) – Exclude files in self.df.column according to stoplist.
from_glob([directory, pattern, regex, column]) – Instantiate Files from a search with glob and/or regex patterns.
from_path([files, directory, column]) – Instantiate Files from a collection of file paths.
Get categories defined in self.df.group.
grouped() – Get groupby instance based on group_identifiers.
match_file_upstream([column, pattern, ...]) – Find a matching file by applying locan.find_file_upstream() on each file in self.df[column].
match_files(files[, column, other_column]) – Add files in new column.
Print summary of Files.
set_group_identifier([name, pattern, glob, ...]) – Set group_identifier name for files in column as identified by string pattern and/or glob pattern and/or regex and keep them in column "group".
- add_glob(pattern='*.txt', regex=None, column='other_file_path')
Search for file paths using glob and/or regex pattern in base directory and provide files in new column.
A logging.warning is given if the number of found files and those in self.df are different.
- Parameters:
pattern (str | None) – glob pattern passed to Path.glob()
regex (str | None) – regex pattern passed to re.search() and applied in addition to glob pattern
column (str) – Name of column in Files.df carrying these files
- Return type:
Self
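How a glob pattern and a regex combine can be sketched with the standard library alone. The helper `find_files` below is hypothetical and not part of locan. It assumes the regex is tested against the file name only; whether locan tests the full path or the file name is not stated here.

```python
import re
import tempfile
from pathlib import Path
from typing import List, Optional


def find_files(directory: Path, pattern: str = "*.txt", regex: Optional[str] = None) -> List[Path]:
    """Collect files matching a glob pattern, then keep those that also match regex."""
    candidates = sorted(directory.glob(pattern))
    if regex is None:
        return candidates
    compiled = re.compile(regex)
    # Assumption: the regex is searched in the file name, not the full path.
    return [path for path in candidates if compiled.search(path.name)]


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    for name in ("cell_1.txt", "cell_2.txt", "notes.txt", "cell_3.csv"):
        (base / name).touch()

    # The glob alone selects every .txt file.
    all_txt = [p.name for p in find_files(base, "*.txt")]
    # The regex narrows the glob result further.
    cells_only = [p.name for p in find_files(base, "*.txt", regex=r"cell_\d+")]
```

The regex acts as a second filter on the glob result, which matches the "applied in addition to glob pattern" wording of the parameter description.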
- classmethod concatenate(files=None, directory=None, exists=True)
Concatenate the file lists from multiple Files instances and set the base directory without further action.
- exclude(stoplist=None, column='file_path', column_stoplist='file_path')
Exclude files in self.df.column according to stoplist.
- Parameters:
stoplist (Files | Iterable[bool | str | PathLike[Any]] | None) – Files to be excluded
column (str) – key/column in df from which to exclude files
column_stoplist (str) – key/column in stoplist from which to take files
- Return type:
Self
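The idea of filtering a file list against a stoplist can be illustrated without pandas. The function `exclude_paths` is a hypothetical helper, not the locan method, and it covers only stoplists made of paths; how boolean entries in a stoplist are treated is not described here and is left out.

```python
from pathlib import Path
from typing import Iterable, List


def exclude_paths(paths: Iterable[str], stoplist: Iterable[str]) -> List[Path]:
    """Return the paths that do not appear in stoplist, preserving order."""
    blocked = {Path(item) for item in stoplist}
    return [Path(item) for item in paths if Path(item) not in blocked]


kept = exclude_paths(
    ["data/a.txt", "data/b.txt", "data/c.txt"],
    stoplist=["data/b.txt"],
)
kept_names = [path.name for path in kept]
```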
- classmethod from_glob(directory=None, pattern='*.txt', regex=None, column='file_path')
Instantiate Files from a search with glob and/or regex patterns.
- Parameters:
pattern (str) – glob pattern passed to Path.glob()
regex (str | None) – regex pattern passed to re.search() and applied in addition to glob pattern
directory (str | PathLike[Any] | None) – new base directory in which to search
column (str) – Name of column in Files.df carrying these files
- Return type:
Files
- classmethod from_path(files=None, directory=None, column='file_path')
Instantiate Files from a collection of file paths.
- Parameters:
files (Sequence[str | PathLike[Any]] | str | PathLike[Any] | None) – sequence with File instances
directory (str | PathLike[Any] | None) – new base directory
column (str) – Name of column in Files.df carrying these files
- Return type:
Files
- grouped()
Get groupby instance based on group_identifiers.
- Return type:
pandas.api.typing.DataFrameGroupBy
- match_file_upstream(column='file_path', pattern='*.toml', regex=None, directory=None, other_column='metadata')
Find a matching file by applying locan.find_file_upstream() on each file in self.df[column].
- Parameters:
column (str) – Name of column in Files.df carrying files to match
pattern (str | None) – glob pattern passed to Path.glob()
regex (str | None) – regex pattern passed to re.search() and applied in addition to glob pattern
directory (str | PathLike[Any] | None) – top directory in which to search
other_column (str) – Name of new column carrying files
- Return type:
Self
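An upstream search of this kind could look roughly like the sketch below, which walks from a file's directory towards a top directory and returns the first glob match. The helper `find_upstream` is hypothetical and is not locan.find_file_upstream(); its search order and stopping rule are assumptions made for illustration.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_upstream(start: Path, pattern: str = "*.toml", top: Optional[Path] = None) -> Optional[Path]:
    """Return the first file matching pattern in start's directory or one of its parents."""
    for folder in start.resolve().parents:
        matches = sorted(folder.glob(pattern))
        if matches:
            return matches[0]
        if top is not None and folder == top.resolve():
            break  # do not search above the top directory
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    data_file = root / "experiment" / "run_1" / "locs.txt"
    data_file.parent.mkdir(parents=True)
    data_file.touch()
    (root / "experiment" / "metadata.toml").touch()

    # The metadata file sits one level above the data file.
    found = find_upstream(data_file, pattern="*.toml", top=root)
    found_name = found.name if found is not None else None
    # No .yaml file exists at or below the top directory.
    missing = find_upstream(data_file, pattern="*.yaml", top=root)
```

In this layout, one metadata file stored next to several run directories would be matched to each data file below it, which is the kind of pairing the `other_column` default name 'metadata' suggests.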
- match_files(files, column='file_path', other_column='other_file_path')
Add files in new column.
A logging.warning is given if the number of files and those in self.df are different.
- Parameters:
files – New file list
column – Name of column in Files.df carrying files to match
other_column – Name of new column carrying files
- Return type:
Self
- set_group_identifier(name=None, pattern=None, glob=None, regex=None, column='file_path')
Set group_identifier name for files in column as identified by string pattern and/or glob pattern and/or regex and keep them in column “group”.
- Parameters:
name (str | None) – new group_identifier
pattern (str | None) – string pattern
glob (str | None) – glob pattern passed to Path.match()
regex (str | None) – regex pattern
column (str) – Name of column in Files.df carrying files to match
- Return type:
Self
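The three matching criteria can be illustrated with a plain dictionary in place of the "group" column. The function `assign_group` is a hypothetical helper, not the locan method. It assumes that when several criteria are given, a path must satisfy all of them; the "and/or" wording above does not settle this.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, List, Optional


def assign_group(
    paths: List[str],
    groups: Dict[str, Optional[str]],
    name: str,
    pattern: Optional[str] = None,
    glob: Optional[str] = None,
    regex: Optional[str] = None,
) -> None:
    """Label every path that satisfies all given criteria with name."""
    for path in paths:
        if pattern is not None and pattern not in path:
            continue  # plain substring test
        if glob is not None and not PurePosixPath(path).match(glob):
            continue  # glob test via PurePath.match()
        if regex is not None and re.search(regex, path) is None:
            continue  # regex test via re.search()
        groups[path] = name


paths = ["ctrl/cell_1.txt", "ctrl/cell_2.txt", "drug/cell_1.txt"]
groups: Dict[str, Optional[str]] = {path: None for path in paths}
assign_group(paths, groups, name="control", pattern="ctrl")
assign_group(paths, groups, name="treated", glob="drug/*.txt")
```

Calling the helper once per group name mirrors how repeated set_group_identifier calls could build up the "group" column before grouped() is used.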