locan.locan_io.files.Files
- class locan.locan_io.files.Files(df=None, directory=None, exists=True, column='file_path')
Bases: object
Wrapper for a pandas.DataFrame with selected methods to identify, match, and group file paths.
Note
Iteration and indexing are implemented so that integer indexing or iterating over a Files instance returns a single row (as a Series or namedtuple), while slice indexing returns a new Files instance with the selected rows.
- Parameters:
df (pd.DataFrame | dict[str, str] | None) – file names
directory (str | os.PathLike[Any] | None) – base directory
exists (bool) – raise FileExistsError if file in df does not exist
column (str) – key/column in df from which to take a file list
- Variables:
df (pd.DataFrame) – dataframe carrying file paths
directory (Path) – base directory
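The indexing behaviour described in the note can be illustrated with a minimal, standard-library sketch. `PathTable` is a hypothetical stand-in written for this example; it is not the locan implementation and it stores rows in a plain list instead of a pandas.DataFrame.

```python
from collections import namedtuple


class PathTable:
    """Hypothetical stand-in for a row container; not locan's Files class."""

    Row = namedtuple("Row", ["file_path"])

    def __init__(self, file_paths):
        self.rows = [self.Row(str(p)) for p in file_paths]

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # A slice yields a new container holding the selected rows.
            return PathTable(row.file_path for row in self.rows[index])
        # An integer yields a single row.
        return self.rows[index]

    def __iter__(self):
        # Iteration yields one row at a time.
        return iter(self.rows)


table = PathTable(["a.txt", "b.txt", "c.txt"])
first = table[0]    # a single Row
subset = table[1:]  # a new PathTable with two rows
```

The point of the design is that a slice keeps the wrapper type, so methods of the wrapper remain available on the selection, whereas an integer index hands back the row itself.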
Methods
__init__([df, directory, exists, column])
add_glob([pattern, regex, column]) – Search for file paths using glob and/or regex pattern in base directory and provide files in new column.
concatenate([files, directory, exists]) – Concatenate the file lists from multiple Files instances and set the base directory without further action.
exclude([stoplist, column, column_stoplist]) – Exclude files in self.df.column according to stoplist.
from_glob([directory, pattern, regex, column]) – Instantiate Files from a search with glob and/or regex patterns.
from_path([files, directory, column]) – Instantiate Files from a collection of file paths.
Get categories defined in self.df.group.
grouped() – Get groupby instance based on group_identifiers.
match_file_upstream([column, pattern, ...]) – Find a matching file by applying locan.find_file_upstream() on each file in self.df[column].
match_files(files[, column, other_column]) – Add files in new column.
Print summary of Files.
set_group_identifier([name, pattern, glob, ...]) – Set group_identifier name for files in column as identified by string pattern and/or glob pattern and/or regex and keep them in column "group".
- add_glob(pattern='*.txt', regex=None, column='other_file_path')
Search for file paths using glob and/or regex pattern in base directory and provide files in new column.
A logging.warning is given if the number of found files and those in self.df are different.
- Parameters:
pattern (Optional[str]) – glob pattern passed to Path.glob()
regex (Optional[str]) – regex pattern passed to re.search() and applied in addition to glob pattern
column (str) – Name of column in Files.df carrying these files
- Return type:
Self
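The combination of a glob pattern with an additional regex filter can be sketched with the standard library alone. This is an illustration of the idea, not locan's code: `glob_with_regex` is a hypothetical helper, and applying the regex to the file name only (instead of the full path) is an assumption made to keep the example deterministic.

```python
import re
import tempfile
from pathlib import Path


def glob_with_regex(directory, pattern="*.txt", regex=None):
    """Return sorted paths in `directory` matching the glob and, if given, the regex."""
    paths = sorted(Path(directory).glob(pattern))
    if regex is not None:
        compiled = re.compile(regex)
        # Assumption: the regex is searched in the file name, not the full path.
        paths = [p for p in paths if compiled.search(p.name)]
    return paths


with tempfile.TemporaryDirectory() as tmp:
    for name in ("cell_1.txt", "cell_2.txt", "notes.txt", "cell_3.csv"):
        (Path(tmp) / name).touch()
    # The glob keeps the three .txt files; the regex then drops notes.txt.
    found = [p.name for p in glob_with_regex(tmp, pattern="*.txt", regex=r"cell_\d+")]
```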
- classmethod concatenate(files=None, directory=None, exists=True)
Concatenate the file lists from multiple Files instances and set the base directory without further action.
- exclude(stoplist=None, column='file_path', column_stoplist='file_path')
Exclude files in self.df.column according to stoplist.
- Parameters:
stoplist (UnionType[Files, Iterable[bool | str | PathLike[Any]], None]) – Files to be excluded
column (str) – key/column in df from which to exclude files
column_stoplist (str) – key/column in stoplist from which to take files
- Return type:
Self
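For a stoplist given as an iterable of paths, the exclusion amounts to filtering against a set. The sketch below is a standard-library illustration under that assumption; `exclude_paths` is a hypothetical helper, and it does not cover a stoplist passed as a Files instance or as boolean values.

```python
from pathlib import Path


def exclude_paths(file_paths, stoplist):
    """Return the entries of `file_paths` that do not appear in `stoplist`."""
    # Normalise to Path objects so that str and Path entries compare equal.
    blocked = {Path(p) for p in stoplist}
    return [p for p in file_paths if Path(p) not in blocked]


kept = exclude_paths(
    ["data/a.txt", "data/b.txt", "data/c.txt"],
    stoplist=[Path("data/b.txt")],
)
```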
- classmethod from_glob(directory=None, pattern='*.txt', regex=None, column='file_path')
Instantiate Files from a search with glob and/or regex patterns.
- Parameters:
pattern (str) – glob pattern passed to Path.glob()
regex (Optional[str]) – regex pattern passed to re.search() and applied in addition to glob pattern
directory (UnionType[str, PathLike[Any], None]) – new base directory in which to search
column (str) – Name of column in Files.df carrying these files
- Return type:
- classmethod from_path(files=None, directory=None, column='file_path')
Instantiate Files from a collection of file paths.
- Parameters:
files (UnionType[Sequence[str | PathLike[Any]], str, PathLike[Any], None]) – single file path or sequence of file paths
directory (UnionType[str, PathLike[Any], None]) – new base directory
column (str) – Name of column in Files.df carrying these files
- Return type:
- grouped()
Get groupby instance based on group_identifiers.
- Return type:
pandas.core.groupby.DataFrameGroupBy
- match_file_upstream(column='file_path', pattern='*.toml', regex=None, directory=None, other_column='metadata')
Find a matching file by applying locan.find_file_upstream() on each file in self.df[column].
- Parameters:
column (str) – Name of column in Files.df carrying files to match
pattern (Optional[str]) – glob pattern passed to Path.glob()
regex (Optional[str]) – regex pattern passed to re.search() and applied in addition to glob pattern
directory (UnionType[str, PathLike[Any], None]) – top directory in which to search
other_column (str) – Name of new column carrying files
- Return type:
Self
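An upstream search of this kind can be pictured as walking from a file's directory toward a top directory and stopping at the first glob match. The following is a standard-library sketch of that idea only; `find_upstream` is a hypothetical helper and is not claimed to reproduce the behaviour of locan.find_file_upstream(), for example in how ties or the regex argument are handled.

```python
import tempfile
from pathlib import Path


def find_upstream(start, pattern="*.toml", top=None):
    """Hypothetical helper: walk from `start` toward `top`, returning the first glob match."""
    start = Path(start).resolve()
    top = Path(top).resolve() if top is not None else None
    current = start if start.is_dir() else start.parent
    while True:
        matches = sorted(current.glob(pattern))
        if matches:
            return matches[0]
        # Stop at the requested top directory or at the filesystem root.
        if current == top or current.parent == current:
            return None
        current = current.parent


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    data_dir = root / "experiment" / "run_1"
    data_dir.mkdir(parents=True)
    data_file = data_dir / "localizations.txt"
    data_file.touch()
    (root / "experiment" / "metadata.toml").touch()

    # The metadata file sits one level above the data file's directory.
    match = find_upstream(data_file, pattern="*.toml", top=root)
    match_name = match.name if match is not None else None
    # No YAML file exists anywhere between the data file and `root`.
    missing = find_upstream(data_file, pattern="*.yaml", top=root)
```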
- match_files(files, column='file_path', other_column='other_file_path')
Add files in new column.
A logging.warning is given if the number of files and those in self.df are different.
- Parameters:
files – New file list
column – Name of column in Files.df carrying files to match
other_column – Name of new column carrying files
- Return type:
Self
- set_group_identifier(name=None, pattern=None, glob=None, regex=None, column='file_path')
Set group_identifier name for files in column as identified by string pattern and/or glob pattern and/or regex and keep them in column “group”.
- Parameters:
name (Optional[str]) – new group_identifier
pattern (Optional[str]) – string pattern
glob (Optional[str]) – glob pattern passed to Path.match()
regex (Optional[str]) – regex pattern
column (str) – Name of column in Files.df carrying files to match
- Return type:
Self
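Labelling files by a string pattern, a glob, or a regex can be sketched as follows. This is a standard-library illustration, not the locan implementation: `assign_group` is a hypothetical helper, labels are kept in a plain dict instead of a "group" column, and requiring all supplied criteria to hold at once is an assumption about how the criteria combine.

```python
import re
from pathlib import PurePath


def assign_group(file_paths, groups, name, pattern=None, glob=None, regex=None):
    """Label every path that satisfies all given criteria with `name`."""
    for path in file_paths:
        text = str(path)
        if pattern is not None and pattern not in text:
            continue  # plain substring test
        if glob is not None and not PurePath(text).match(glob):
            continue  # glob test via PurePath.match
        if regex is not None and re.search(regex, text) is None:
            continue  # regex test via re.search
        groups[text] = name
    return groups


paths = ["ctrl/cell_1.txt", "ctrl/cell_2.txt", "drug/cell_1.txt"]
groups = {}
assign_group(paths, groups, name="control", pattern="ctrl")
assign_group(paths, groups, name="treated", glob="drug/*.txt")
```

Calling the helper once per group mirrors the one-identifier-per-call shape of the documented signature.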