files

File collections and files.

There used to be an UnnamedFileCollection, but it was removed in the move to the new datatype system. It was used mostly for input datatypes, which no longer exist. There may still be a use for it, though, so it may be added again in future.

class NamedFileCollection(*args, **kwargs)

Bases: pimlico.datatypes.base.PimlicoDatatype

Datatype that stores a fixed collection of files with fixed names (or at least names that can be determined from the class). Very many datatypes fall into this category. Subclassing this base class provides them with some common functionality, including the possibility of creating a union of multiple datatypes.

The datatype option filenames should specify a list of filenames contained by the datatype. For typechecking, the provided type must have at least all the filenames of the type requirement, though it may include more.
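
The filename rule for typechecking amounts to a subset test: every required filename must appear among the supplied ones, and extra supplied files are allowed. The sketch below shows that rule on plain lists. `filenames_compatible` is a hypothetical helper, not part of Pimlico.

```python
def filenames_compatible(required, supplied):
    """Hypothetical helper: True if `supplied` contains every filename in `required`.

    Extra filenames in `supplied` are allowed, mirroring the rule that the
    provided type may include more files than the type requirement.
    """
    return set(required).issubset(supplied)


# A supplied collection with an extra file still satisfies the requirement
print(filenames_compatible(["vocab.txt"], ["vocab.txt", "counts.txt"]))  # True
# A supplied collection missing a required file does not
print(filenames_compatible(["vocab.txt", "counts.txt"], ["vocab.txt"]))  # False
```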

All files are contained in the datatype's data directory. If files are stored in subdirectories, this may be specified in the list of filenames using / as the separator (e.g. sub/file.txt). Always use forward slashes, regardless of the operating system.
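
Because filenames always use forward slashes, turning one into a real path means splitting on / and re-joining with the platform's own separator. The following is a minimal sketch of that idea, assuming the data directory is known. `absolute_path` is a hypothetical helper; Pimlico's own `get_absolute_path()` may be implemented differently.

```python
import os


def absolute_path(data_dir, filename):
    """Hypothetical helper: resolve a collection filename to an absolute path.

    `filename` always uses forward slashes (e.g. "vocab/words.txt"), so it is
    split on "/" and re-joined with the platform's separator.
    """
    parts = filename.split("/")
    return os.path.abspath(os.path.join(data_dir, *parts))


print(absolute_path("/data/my_module/output", "vocab/words.txt"))
# On POSIX systems: /data/my_module/output/vocab/words.txt
```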

datatype_name = 'named_file_collection'
datatype_options = {'filenames': {'default': [], 'help': 'Filenames contained in the collection', 'type': <function comma_separated_list.<locals>._fn>}}
datatype_supports_python2 = True
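
The type of the filenames option is a function produced by Pimlico's comma_separated_list, so a string value given for the option is presumably split on commas. The sketch below illustrates that assumed behaviour with a hypothetical `parse_filenames` stand-in. It is not the real function, whose signature and handling of whitespace or empty items may differ.

```python
def parse_filenames(value):
    """Hypothetical stand-in for the option's type function: split a config
    string such as "a.txt,b.txt" into a list of filenames."""
    if isinstance(value, list):
        return value  # already a list, e.g. the default []
    # Assumption: surrounding whitespace is stripped and empty items dropped
    return [item.strip() for item in value.split(",") if item.strip()]


print(parse_filenames("vocab.txt, counts/data.bin"))
# ['vocab.txt', 'counts/data.bin']
```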
check_type(supplied_type)

Method used by the datatype type-checking algorithm to determine whether a supplied datatype (given as an instance of a subclass of PimlicoDatatype) is compatible with the present datatype, which is being treated as a type requirement.

Typically, the present class is a type requirement on a module input and supplied_type is the type provided by a previous module’s output.

The default implementation simply checks whether supplied_type is a subclass of the present class. Subclasses may wish to impose different or additional checks.

Parameters: supplied_type – type provided where the present class is required, or datatype instance
Returns: True if the check is successful, False otherwise
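
The default rule compares classes: a supplied datatype instance passes if its class is the required class or one of its subclasses. The sketch below illustrates this with toy classes. `PimlicoDatatypeStub` and the other stub names are hypothetical and stand in for real Pimlico datatypes.

```python
class PimlicoDatatypeStub:
    """Hypothetical stand-in for PimlicoDatatype, for illustration only."""

    def check_type(self, supplied_type):
        # Default rule: the supplied datatype must be an instance of the
        # required datatype's class or of one of its subclasses
        return isinstance(supplied_type, type(self))


class CollectionStub(PimlicoDatatypeStub):
    pass


class SpecialCollectionStub(CollectionStub):
    pass


requirement = CollectionStub()
print(requirement.check_type(SpecialCollectionStub()))  # True: subclass
print(requirement.check_type(PimlicoDatatypeStub()))    # False: superclass
```

A subclass such as NamedFileCollection can add further conditions, such as the filename check, on top of this rule.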
browse_file(reader, filename)

Return text for a particular file in the collection to show in the browser. By default, just reads in the file’s data and returns it, but subclasses might want to override this (perhaps conditioned on the filename) to format the data readably.

Parameters:
  • reader – reader for the collection
  • filename – name of the file to show
Returns: file data to show
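
One way to picture the default behaviour and a subclass override is with a toy in-memory reader. In the sketch below, `ToyReader`, `browse_file` and `browse_file_numbered` are hypothetical. The default here decodes the data as UTF-8 for display, which is an assumption, and the override numbers the lines of .txt files.

```python
class ToyReader:
    """Hypothetical in-memory reader; only read_file() is imitated."""

    def __init__(self, files):
        self.files = files  # maps filename -> bytes

    def read_file(self, filename, text=False):
        data = self.files[filename]
        return data.decode("utf-8") if text else data


def browse_file(reader, filename):
    # Default behaviour: show the file's data as it is (decoded as UTF-8 here)
    return reader.read_file(filename, text=True)


def browse_file_numbered(reader, filename):
    # Example override: number the lines of .txt files, otherwise use the default
    data = browse_file(reader, filename)
    if filename.endswith(".txt"):
        return "\n".join(
            "{:>3}  {}".format(i, line) for i, line in enumerate(data.splitlines(), 1)
        )
    return data


reader = ToyReader({"notes.txt": b"first\nsecond", "meta.json": b'{"n": 2}'})
print(browse_file_numbered(reader, "notes.txt"))
print(browse_file_numbered(reader, "meta.json"))  # {"n": 2}
```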

run_browser(reader, opts)

All NamedFileCollections provide a browser that lists the files in the collection and lets you view the contents of those that are text files.

Subclasses may override the way individual files are shown by overriding browse_file().

class Reader(datatype, setup, pipeline, module=None)

Bases: pimlico.datatypes.base.Reader

Reader class for NamedFileCollection

class Setup(datatype, data_paths)

Bases: pimlico.datatypes.base.Setup

Setup class for NamedFileCollection.Reader

get_required_paths()

May be overridden by subclasses to provide a list of paths (absolute, or relative to the data dir) that must exist for the data to be considered ready.
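
The readiness rule can be sketched as "every required path exists", with relative paths resolved against the data directory. In the sketch below, `data_ready` is a hypothetical helper and not Pimlico's own check. It relies on the fact that os.path.join returns an absolute second argument unchanged.

```python
import os
import tempfile


def data_ready(data_dir, required_paths):
    """Hypothetical readiness check: every required path must exist.

    Paths may be absolute or relative to `data_dir`; os.path.join returns an
    absolute path unchanged, so both cases go through the same call.
    """
    return all(os.path.exists(os.path.join(data_dir, p)) for p in required_paths)


with tempfile.TemporaryDirectory() as data_dir:
    open(os.path.join(data_dir, "vocab.txt"), "w").close()
    ready_one = data_ready(data_dir, ["vocab.txt"])                # True
    ready_two = data_ready(data_dir, ["vocab.txt", "counts.txt"])  # False
print(ready_one, ready_two)
```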

reader_type

alias of NamedFileCollection.Reader

process_setup()

Do any processing of the setup object (e.g. retrieving values and setting attributes on the reader) that should be done when the reader is instantiated.

get_absolute_path(filename)
absolute_paths
absolute_filenames

Kept for backwards compatibility only: use absolute_paths instead.

read_file(filename=None, mode='r', text=False)

Read a file from the collection.

Parameters:
  • filename – string filename, which should be one of the filenames specified for this collection; or an integer i, in which case the i-th file in the collection is read. If not given, the first file is read.
  • mode
  • text – if True, the file is treated as utf-8-encoded text and a unicode object is returned. Otherwise, a bytes object is returned.
Returns: the file's contents, as a unicode object if text is True, otherwise as a bytes object
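
The filename argument can take three forms: a name, an index, or nothing. The sketch below shows that resolution together with the optional UTF-8 decoding. `read_collection_file` is a hypothetical function that works directly on a directory. It leaves out the mode parameter, and Pimlico's reader may handle bad filenames differently.

```python
import os
import tempfile


def read_collection_file(data_dir, filenames, filename=None, text=False):
    """Hypothetical sketch of read_file's argument handling.

    `filename` may be a name from `filenames`, an integer index into it, or
    None (meaning the first file). With text=True the bytes are decoded as
    UTF-8 and a str is returned; otherwise the raw bytes are returned.
    """
    if filename is None:
        filename = filenames[0]
    elif isinstance(filename, int):
        filename = filenames[filename]
    elif filename not in filenames:
        raise ValueError("{} is not in the collection".format(filename))
    with open(os.path.join(data_dir, *filename.split("/")), "rb") as f:
        data = f.read()
    return data.decode("utf-8") if text else data


with tempfile.TemporaryDirectory() as d:
    for name, content in [("a.txt", "héllo"), ("b.txt", "world")]:
        with open(os.path.join(d, name), "wb") as f:
            f.write(content.encode("utf-8"))
    first = read_collection_file(d, ["a.txt", "b.txt"], text=True)  # default: first file
    second_raw = read_collection_file(d, ["a.txt", "b.txt"], 1)     # by index, as bytes
print(first)       # héllo
print(second_raw)  # b'world'
```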

read_files(mode='r', text=False)
open_file(filename=None, mode='r')
class Writer(*args, **kwargs)

Bases: pimlico.datatypes.base.Writer

Writer class for NamedFileCollection

write_file(filename, data, text=False)

If text=True, the data is expected to be unicode and is encoded as utf-8. Otherwise, data should be a bytes object.
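
The text/bytes contract can be sketched as follows. Text is encoded to UTF-8 before writing, and anything else must already be bytes. `write_collection_file` is hypothetical. Creating subdirectories from a forward-slash filename is an assumption that follows from the subdirectory convention above.

```python
import os
import tempfile


def write_collection_file(data_dir, filename, data, text=False):
    """Hypothetical sketch of write_file: encode str data as UTF-8 when
    text=True, otherwise require bytes, and write in binary mode."""
    if text:
        data = data.encode("utf-8")
    elif not isinstance(data, bytes):
        raise TypeError("data must be bytes when text=False")
    path = os.path.join(data_dir, *filename.split("/"))
    # Create any subdirectories named in the forward-slash filename
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path


with tempfile.TemporaryDirectory() as d:
    p = write_collection_file(d, "sub/notes.txt", "naïve", text=True)
    with open(p, "rb") as f:
        stored = f.read()
print(stored)  # b'na\xc3\xafve'
```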

file_written(filename)

Mark the given file as having been written, if write_file() was not used to write it.
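
A plausible reason for file_written() is that the writer keeps track of which expected files exist before the output counts as complete. This is an assumption about the method's purpose, not documented behaviour. The hypothetical `ToyWriter` below shows that kind of bookkeeping.

```python
class ToyWriter:
    """Hypothetical writer that tracks which expected files have been written."""

    def __init__(self, filenames):
        self.filenames = list(filenames)
        self.written = set()

    def file_written(self, filename):
        if filename not in self.filenames:
            raise ValueError("unknown file: {}".format(filename))
        self.written.add(filename)

    def missing_files(self):
        return [f for f in self.filenames if f not in self.written]


writer = ToyWriter(["vocab.txt", "model.bin"])
# Suppose both files were saved by some other library, so we mark them by hand
writer.file_written("vocab.txt")
after_first = writer.missing_files()
print(after_first)              # ['model.bin']
writer.file_written("model.bin")
print(writer.missing_files())   # []
```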

open_file(filename=None)
get_absolute_path(filename=None)
absolute_paths
metadata_defaults = {}
writer_param_defaults = {}
class NamedFile(*args, **kwargs)

Bases: pimlico.datatypes.files.NamedFileCollection

Like NamedFileCollection, but always has exactly one file.

The filename is given as the filename datatype option, which can also be given as the first init arg: NamedFile("myfile.txt").

Since NamedFile is a subtype of NamedFileCollection, it also has a “filenames” option. It is ignored if the filename option is given, and otherwise must have exactly one item.
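
The precedence between the two options can be sketched as a small function over an options dict. `resolve_single_filename` is hypothetical, and the error message is invented for illustration.

```python
def resolve_single_filename(options):
    """Hypothetical sketch of NamedFile's rule: use the "filename" option if
    given; otherwise "filenames" must contain exactly one item."""
    if options.get("filename") is not None:
        return options["filename"]
    filenames = options.get("filenames", [])
    if len(filenames) != 1:
        raise ValueError(
            "NamedFile needs a filename, or exactly one entry in filenames "
            "(got {})".format(len(filenames))
        )
    return filenames[0]


print(resolve_single_filename({"filename": "myfile.txt", "filenames": ["a", "b"]}))  # myfile.txt
print(resolve_single_filename({"filenames": ["only.txt"]}))                        # only.txt
```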

datatype_name = 'named_file'
datatype_options = {'filename': {'help': "The file's name"}, 'filenames': {'default': [], 'help': 'Filenames contained in the collection', 'type': <function comma_separated_list.<locals>._fn>}}
datatype_supports_python2 = True
class Reader(datatype, setup, pipeline, module=None)

Bases: pimlico.datatypes.files.Reader

Reader class for NamedFile

process_setup()

Do any processing of the setup object (e.g. retrieving values and setting attributes on the reader) that should be done when the reader is instantiated.

absolute_path
class Setup(datatype, data_paths)

Bases: pimlico.datatypes.files.Setup

Setup class for NamedFile.Reader

get_required_paths()

May be overridden by subclasses to provide a list of paths (absolute, or relative to the data dir) that must exist for the data to be considered ready.

reader_type

alias of NamedFile.Reader

class Writer(*args, **kwargs)

Bases: pimlico.datatypes.files.Writer

Writer class for NamedFile

write_file(data, text=False)

If text=True, the data is expected to be unicode and is encoded as utf-8. Otherwise, data should be a bytes object.

absolute_path
metadata_defaults = {}
writer_param_defaults = {}
class FilesInput(min_files=1)

Bases: pimlico.datatypes.base.DynamicInputDatatypeRequirement

datatype_doc_info = 'A file collection containing at least one file (or a given specific number). No constraint is put on the name of the file(s). Typically, the module will just use whatever the first file(s) in the collection is'
check_type(supplied_type)
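
The requirement described by datatype_doc_info can be reduced to a count check on the supplied collection's filenames, with no constraint on the names themselves. In the sketch below, `FilesInputSketch` and `check_filenames` are hypothetical. It assumes min_files is a lower bound, as its name suggests.

```python
class FilesInputSketch:
    """Hypothetical stand-in for FilesInput's requirement: any file collection
    with at least `min_files` files, whatever their names."""

    def __init__(self, min_files=1):
        self.min_files = min_files

    def check_filenames(self, supplied_filenames):
        return len(supplied_filenames) >= self.min_files


print(FilesInputSketch().check_filenames(["corpus.txt"]))             # True
print(FilesInputSketch(min_files=2).check_filenames(["corpus.txt"]))  # False
```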
FileInput

alias of pimlico.datatypes.files.FilesInput

class TextFile(*args, **kwargs)

Bases: pimlico.datatypes.files.NamedFile

Simple dataset containing just a single utf-8 encoded text file.
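
On disk, a TextFile can be pictured as one UTF-8 file, named data.txt unless the filename option says otherwise. The sketch below round-trips non-ASCII text through such a file using plain Python I/O. `write_text_file` and `read_text_file` are hypothetical helpers, not the TextFile Reader and Writer API.

```python
import os
import tempfile

DEFAULT_FILENAME = "data.txt"  # TextFile's default for the filename option


def write_text_file(data_dir, text, filename=DEFAULT_FILENAME):
    # Hypothetical helper: store the text as UTF-8
    with open(os.path.join(data_dir, filename), "w", encoding="utf-8") as f:
        f.write(text)


def read_text_file(data_dir, filename=DEFAULT_FILENAME):
    # Hypothetical helper: read the UTF-8 text back as a str
    with open(os.path.join(data_dir, filename), encoding="utf-8") as f:
        return f.read()


with tempfile.TemporaryDirectory() as d:
    write_text_file(d, "Grüße aus Pimlico\n")
    result = read_text_file(d)
print(result == "Grüße aus Pimlico\n")  # True
```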

datatype_name = 'text_document'
datatype_options = {'filename': {'default': 'data.txt', 'help': "The file's name. Typically left as the default. Default: data.txt"}, 'filenames': {'default': [], 'help': 'Filenames contained in the collection', 'type': <function comma_separated_list.<locals>._fn>}}
datatype_supports_python2 = True
class Reader(datatype, setup, pipeline, module=None)

Bases: pimlico.datatypes.files.Reader

Reader class for TextFile

read_file(filename=None, mode='r', text=False)

Read a file from the collection.

Parameters:
  • filename – string filename, which should be one of the filenames specified for this collection; or an integer i, in which case the i-th file in the collection is read. If not given, the first file is read.
  • mode
  • text – if True, the file is treated as utf-8-encoded text and a unicode object is returned. Otherwise, a bytes object is returned.
Returns: the file's contents, as a unicode object if text is True, otherwise as a bytes object

class Setup(datatype, data_paths)

Bases: pimlico.datatypes.files.Setup

Setup class for TextFile.Reader

get_required_paths()

May be overridden by subclasses to provide a list of paths (absolute, or relative to the data dir) that must exist for the data to be considered ready.

reader_type

alias of TextFile.Reader

class Writer(*args, **kwargs)

Bases: pimlico.datatypes.files.Writer

Writer class for TextFile

metadata_defaults = {}
writer_param_defaults = {}
write_file(data, text=False)

If text=True, the data is expected to be unicode and is encoded as utf-8. Otherwise, data should be a bytes object.