files
File collections and files.
There used to be an UnnamedFileCollection, which was removed in the move to the new datatype system. It was mostly used for input datatypes, which no longer exist. There may still be a use for it, though, so it may be added again in future.
class NamedFileCollection(*args, **kwargs)
Bases: pimlico.datatypes.base.PimlicoDatatype
Datatype that stores a fixed collection of files with fixed names (or at least names that can be determined from the class). Very many datatypes fall into this category. Subclassing this base class provides them with some common functionality, including the possibility of creating a union of multiple datatypes.
The datatype option filenames should specify a list of filenames contained by the datatype. For typechecking, the provided type must have at least all the filenames of the type requirement, though it may include more.
All files are contained in the datatype's data directory. If files are stored in subdirectories, this may be specified in the list of filenames using / as the separator. (Always use forward slashes, regardless of the operating system.)
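To make the forward-slash convention concrete, here is a minimal sketch of how an entry in the filenames list might be resolved to a path inside the data directory. The helper `resolve_collection_path` is hypothetical and not part of Pimlico's API; it only assumes that filenames are relative to the data directory and always use / as the separator.

```python
import os
import posixpath


def resolve_collection_path(data_dir: str, filename: str) -> str:
    """Hypothetical helper: map a collection filename (always '/'-separated)
    to a path inside data_dir that uses the OS's own separator."""
    # Normalise redundant separators and "." components in the portable form
    normalised = posixpath.normpath(filename)
    # Extra safety check in this sketch: refuse paths that escape data_dir
    if normalised == ".." or normalised.startswith("../") or posixpath.isabs(normalised):
        raise ValueError("filename must stay inside the data directory: %r" % filename)
    # Split on '/' and let os.path.join insert the platform separator
    return os.path.join(data_dir, *normalised.split("/"))


print(resolve_collection_path("/data/my_module/output", "sub/dir/file.txt"))
```

Splitting on / and rejoining with `os.path.join` is what lets the same filenames list work unchanged on Windows and POSIX systems.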
datatype_name = 'named_file_collection'
datatype_options = {'filenames': {'default': [], 'help': 'Filenames contained in the collection', 'type': <function comma_separated_list.<locals>._fn>}}
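The type entry of the filenames option is a function produced by comma_separated_list, which converts an option value (for example, a string from a pipeline config file) into a list. The stand-in below is hypothetical and written only to illustrate that behaviour; Pimlico's own implementation may differ, for instance in how it treats whitespace or empty items.

```python
from typing import Callable, List


def comma_separated_list(item_type: Callable[[str], object] = str) -> Callable[[str], List[object]]:
    """Hypothetical stand-in: build a converter that splits a comma-separated
    string into a list, converting each item with item_type."""
    def _fn(value: str) -> List[object]:
        # Strip whitespace around each item and drop empty items
        return [item_type(part.strip()) for part in value.split(",") if part.strip()]
    return _fn


parse_filenames = comma_separated_list()
print(parse_filenames("data.txt, sub/extra.txt"))  # ['data.txt', 'sub/extra.txt']
```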
datatype_supports_python2 = True
check_type(supplied_type)
Method used by the datatype type-checking algorithm to determine whether a supplied datatype (given as an instance of a subclass of PimlicoDatatype) is compatible with the present datatype, which is being treated as a type requirement.
Typically, the present class is a type requirement on a module input and supplied_type is the type provided by a previous module’s output.
The default implementation simply checks whether supplied_type is a subclass of the present class. Subclasses may wish to impose different or additional checks.
Parameters: supplied_type – type provided where the present class is required, or a datatype instance
Returns: True if the check is successful, False otherwise
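For NamedFileCollection, the class description says that a supplied type must contain at least all the filenames of the requirement. A minimal sketch of that rule combined with the default class check, using a simplified stand-in class: `FileCollectionType` is hypothetical and not Pimlico's API, and the real method also has to deal with other datatype machinery.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileCollectionType:
    """Hypothetical stand-in for a NamedFileCollection datatype instance."""
    filenames: List[str] = field(default_factory=list)

    def check_type(self, supplied_type: object) -> bool:
        # Default rule: the supplied type must be an instance of the required class
        if not isinstance(supplied_type, type(self)):
            return False
        # Extra rule: every required filename must be present; extra files are allowed
        return set(self.filenames).issubset(supplied_type.filenames)


required = FileCollectionType(["vocab.txt"])
print(required.check_type(FileCollectionType(["vocab.txt", "counts.txt"])))  # True
print(required.check_type(FileCollectionType(["counts.txt"])))  # False
```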
browse_file(reader, filename)
Return text for a particular file in the collection to show in the browser. By default, it just reads in the file’s data and returns it, but subclasses might want to override this (perhaps conditioned on the filename) to format the data readably.
Parameters:
- reader – reader for the collection being browsed
- filename – name of the file to show
Returns: file data to show
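As an illustration of the kind of filename-dependent formatting an override of browse_file() might do, the function below pretty-prints JSON files and shows everything else as decoded text. It is a standalone, hypothetical sketch rather than a Pimlico method; in a real subclass the same logic would sit inside an overridden browse_file(reader, filename), using the reader to load the data.

```python
import json


def format_for_browser(filename: str, data: bytes) -> str:
    """Hypothetical formatter: choose a display format based on the filename."""
    if filename.endswith(".json"):
        # Re-indent JSON so nested structures are readable in the browser
        return json.dumps(json.loads(data.decode("utf-8")), indent=2, sort_keys=True)
    # Fall back to the raw contents, replacing any undecodable bytes
    return data.decode("utf-8", errors="replace")


print(format_for_browser("meta.json", b'{"b": 1, "a": [1, 2]}'))
```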
run_browser(reader, opts)
All NamedFileCollections provide a browser that just lets you see a list of the files and view them, in the case of text files.
Subclasses may override the way individual files are shown by overriding browse_file().
class Reader(datatype, setup, pipeline, module=None)
Bases: pimlico.datatypes.base.Reader
Reader class for NamedFileCollection
class Setup(datatype, data_paths)
Bases: pimlico.datatypes.base.Setup
Setup class for NamedFileCollection.Reader
get_required_paths()
May be overridden by subclasses to provide a list of paths (absolute, or relative to the data dir) that must exist for the data to be considered ready.
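The readiness rule can be sketched as follows: every path returned by get_required_paths() must exist, with relative paths resolved against the data directory. The function `data_is_ready` is a hypothetical illustration, not Pimlico's implementation.

```python
import os
import tempfile
from typing import Iterable


def data_is_ready(data_dir: str, required_paths: Iterable[str]) -> bool:
    """Hypothetical check: True only if every required path exists."""
    for path in required_paths:
        # Absolute paths are used as given; relative ones are taken from data_dir
        full_path = path if os.path.isabs(path) else os.path.join(data_dir, path)
        if not os.path.exists(full_path):
            return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "data.txt"), "w").close()
    print(data_is_ready(tmp, ["data.txt"]))  # True
    print(data_is_ready(tmp, ["data.txt", "missing.txt"]))  # False
```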
reader_type
Alias of NamedFileCollection.Reader
process_setup()
Do any processing of the setup object (e.g. retrieving values and setting attributes on the reader) that should be done when the reader is instantiated.
absolute_paths
absolute_filenames
For backwards compatibility: use absolute_paths by preference.
read_file(filename=None, mode='r', text=False)
Read a file from the collection.
Parameters:
- filename – string filename, which should be one of the filenames specified for this collection; or an integer, in which case the i-th file in the collection is read. If not given, the first file is read.
- mode –
- text – if True, the file is treated as utf-8-encoded text and a unicode object is returned. Otherwise, a bytes object is returned.
Returns: the file's contents, as unicode if text=True, otherwise as bytes
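The selection and decoding rules of read_file() can be sketched as a standalone function. `read_collection_file` is hypothetical, takes the data directory and filename list explicitly instead of a reader, omits the mode parameter, and rejecting unknown names is a choice made for this sketch only.

```python
import os
import tempfile
from typing import List, Optional, Union


def read_collection_file(data_dir: str, filenames: List[str],
                         filename: Optional[Union[str, int]] = None,
                         text: bool = False) -> Union[str, bytes]:
    """Hypothetical sketch of read_file(): pick a file by name or index and
    return its bytes, or utf-8-decoded text if text=True."""
    if filename is None:
        # No filename given: read the first file in the collection
        filename = filenames[0]
    elif isinstance(filename, int):
        # Integer: read the i-th file in the collection
        filename = filenames[filename]
    elif filename not in filenames:
        # This sketch rejects names that are not part of the collection
        raise ValueError("%r is not one of the collection's filenames" % filename)
    # Filenames use '/' separators; convert to an OS path under data_dir
    with open(os.path.join(data_dir, *filename.split("/")), "rb") as f:
        data = f.read()
    return data.decode("utf-8") if text else data


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "data.txt"), "wb") as f:
        f.write("hello world".encode("utf-8"))
    print(read_collection_file(tmp, ["data.txt"], text=True))  # hello world
```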
class Writer(*args, **kwargs)
Bases: pimlico.datatypes.base.Writer
Writer class for NamedFileCollection
write_file(filename, data, text=False)
If text=True, the data is expected to be unicode and is encoded as utf-8. Otherwise, data should be a bytes object.
file_written(filename)
Mark the given file as having been written, if write_file() was not used to write it.
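The interplay between write_file() and file_written() can be pictured with a small hypothetical writer that records which of the expected files have been written. Files produced some other way (for example, by an external tool writing straight into the data directory) are registered with file_written(). `SketchCollectionWriter` and its `missing_files()` method are assumptions for illustration, not Pimlico's Writer.

```python
import os
import tempfile
from typing import List, Set, Union


class SketchCollectionWriter:
    """Hypothetical writer that tracks which expected files have been written."""

    def __init__(self, data_dir: str, filenames: List[str]) -> None:
        self.data_dir = data_dir
        self.filenames = filenames
        self.written: Set[str] = set()

    def write_file(self, filename: str, data: Union[str, bytes], text: bool = False) -> None:
        # With text=True, data must be a str and is encoded as utf-8;
        # otherwise it must already be bytes
        if text:
            data = data.encode("utf-8")
        path = os.path.join(self.data_dir, *filename.split("/"))
        directory = os.path.dirname(path)
        if directory:
            # Create any subdirectories named in the filename
            os.makedirs(directory, exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
        self.file_written(filename)

    def file_written(self, filename: str) -> None:
        # Record a file as written, e.g. one produced without write_file()
        self.written.add(filename)

    def missing_files(self) -> List[str]:
        return [name for name in self.filenames if name not in self.written]


with tempfile.TemporaryDirectory() as tmp:
    writer = SketchCollectionWriter(tmp, ["vocab.txt", "model/weights.bin"])
    writer.write_file("vocab.txt", "the\nof\n", text=True)
    print(writer.missing_files())  # ['model/weights.bin']
```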
absolute_paths
metadata_defaults = {}
writer_param_defaults = {}
class NamedFile(*args, **kwargs)
Bases: pimlico.datatypes.files.NamedFileCollection
Like NamedFileCollection, but always has exactly one file.
The filename is given as the filename datatype option, which can also be given as the first positional argument to the constructor: NamedFile("myfile.txt").
Since NamedFile is a subtype of NamedFileCollection, it also has a “filenames” option. It is ignored if the filename option is given, and otherwise must have exactly one item.
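The precedence between the two options can be sketched as a small function. `resolve_named_file` is hypothetical and only restates the rule above: the filename option wins, and otherwise filenames must hold exactly one item. Raising ValueError is an assumption of this sketch.

```python
from typing import List, Optional


def resolve_named_file(filename: Optional[str], filenames: List[str]) -> str:
    """Hypothetical sketch of how NamedFile's single filename is determined."""
    if filename is not None:
        # The filename option takes precedence; filenames is ignored
        return filename
    if len(filenames) != 1:
        raise ValueError("without a filename option, filenames must have exactly one item, got %d"
                         % len(filenames))
    return filenames[0]


print(resolve_named_file("myfile.txt", ["ignored.txt", "also_ignored.txt"]))  # myfile.txt
print(resolve_named_file(None, ["only.txt"]))  # only.txt
```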
datatype_name = 'named_file'
datatype_options = {'filename': {'help': "The file's name"}, 'filenames': {'default': [], 'help': 'Filenames contained in the collection', 'type': <function comma_separated_list.<locals>._fn>}}
datatype_supports_python2 = True
class Reader(datatype, setup, pipeline, module=None)
Bases: pimlico.datatypes.files.Reader
Reader class for NamedFile
process_setup()
Do any processing of the setup object (e.g. retrieving values and setting attributes on the reader) that should be done when the reader is instantiated.
absolute_path
class Setup(datatype, data_paths)
Bases: pimlico.datatypes.files.Setup
Setup class for NamedFile.Reader
get_required_paths()
May be overridden by subclasses to provide a list of paths (absolute, or relative to the data dir) that must exist for the data to be considered ready.
reader_type
Alias of NamedFile.Reader
class Writer(*args, **kwargs)
Bases: pimlico.datatypes.files.Writer
Writer class for NamedFile
write_file(data, text=False)
If text=True, the data is expected to be unicode and is encoded as utf-8. Otherwise, data should be a bytes object.
absolute_path
metadata_defaults = {}
writer_param_defaults = {}
class FilesInput(min_files=1)
Bases: pimlico.datatypes.base.DynamicInputDatatypeRequirement
datatype_doc_info = 'A file collection containing at least one file (or a given specific number). No constraint is put on the name of the file(s). Typically, the module will just use whatever the first file(s) in the collection is'
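FilesInput constrains only how many files a collection has, not what they are called. The sketch below is hypothetical: it treats min_files as a lower bound, which is how the signature FilesInput(min_files=1) and the phrase "at least one file" read, and `files_to_use` mimics a module that simply takes the first file(s). Whether Pimlico's own requirement check works exactly like this is an assumption.

```python
from typing import List


def satisfies_files_input(filenames: List[str], min_files: int = 1) -> bool:
    """Hypothetical check: any file collection with at least min_files files."""
    return len(filenames) >= min_files


def files_to_use(filenames: List[str], min_files: int = 1) -> List[str]:
    """Hypothetical helper: take the first min_files files, whatever their names."""
    if not satisfies_files_input(filenames, min_files):
        raise ValueError("need at least %d file(s), got %d" % (min_files, len(filenames)))
    return filenames[:min_files]


print(files_to_use(["a.txt", "b.txt"]))  # ['a.txt']
```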
FileInput
Alias of pimlico.datatypes.files.FilesInput
class TextFile(*args, **kwargs)
Bases: pimlico.datatypes.files.NamedFile
Simple dataset containing just a single utf-8 encoded text file.
datatype_name = 'text_document'
datatype_options = {'filename': {'default': 'data.txt', 'help': "The file's name. Typically left as the default. Default: data.txt"}, 'filenames': {'default': [], 'help': 'Filenames contained in the collection', 'type': <function comma_separated_list.<locals>._fn>}}
datatype_supports_python2 = True
class Reader(datatype, setup, pipeline, module=None)
Bases: pimlico.datatypes.files.Reader
Reader class for TextFile
read_file(filename=None, mode='r', text=False)
Read a file from the collection.
Parameters:
- filename – string filename, which should be one of the filenames specified for this collection; or an integer, in which case the i-th file in the collection is read. If not given, the first file is read.
- mode –
- text – if True, the file is treated as utf-8-encoded text and a unicode object is returned. Otherwise, a bytes object is returned.
Returns: the file's contents, as unicode if text=True, otherwise as bytes
class Setup(datatype, data_paths)
Bases: pimlico.datatypes.files.Setup
Setup class for TextFile.Reader
get_required_paths()
May be overridden by subclasses to provide a list of paths (absolute, or relative to the data dir) that must exist for the data to be considered ready.
reader_type
Alias of TextFile.Reader