config¶
Reading of pipeline config from a file into the data structure used to run and manipulate the pipeline’s data.
-
class
PipelineConfig
(name, pipeline_config, local_config, filename=None, variant='main', available_variants=[], log=None, all_filenames=None, module_aliases={}, local_config_sources=None, section_headings=None)[source]¶ Bases:
object
Main configuration for a pipeline, read in from a config file.
For details on how to write config files that get read by this class, see Pipeline config.
-
modules
¶ List of module names, in the order they were specified in the config file.
-
module_dependencies
¶ Dictionary mapping a module name to a list of the names of modules that it depends on for its inputs.
-
module_dependents
¶ Opposite of module_dependencies. Returns a mapping from module names to a list of the modules that depend on the module.
-
get_dependent_modules
(module_name, recurse=False, exclude=[])[source]¶ Return a list of the names of modules that depend on the named module for their inputs.
If exclude is given, we don’t perform a recursive call on any of the modules in the list. For each item we recurse on, we extend the exclude list in the recursive call to include everything found so far (in other recursive calls). This avoids unnecessary recursion in complex pipelines.
If exclude=None, it is also passed through to recursive calls as None. Its default value of [] avoids excessive recursion from the top-level call, by allowing things to be added to the exclusion list for recursive calls.
Parameters: recurse – include all transitive dependents, not just those that immediately depend on the module.
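The relationship between module_dependencies, module_dependents and the recurse option can be illustrated with a small standalone sketch. It does not use Pimlico itself: the function names and the example module names are hypothetical, and a visited set stands in for the exclude list described above, so the real method may behave differently in detail.

```python
def invert_dependencies(module_dependencies):
    """Build a mapping from each module to the modules that depend on it."""
    dependents = {name: [] for name in module_dependencies}
    for name, deps in module_dependencies.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(name)
    return dependents


def dependent_modules(dependents, module_name, recurse=False):
    """List modules that depend on module_name, optionally transitively."""
    found = list(dependents.get(module_name, []))
    if recurse:
        # Anything already found is never expanded twice
        seen = set(found)
        queue = list(found)
        while queue:
            current = queue.pop(0)
            for dep in dependents.get(current, []):
                if dep not in seen:
                    seen.add(dep)
                    found.append(dep)
                    queue.append(dep)
    return found


# Hypothetical pipeline: tokenize -> parse -> stats, and tokenize -> count
deps = {"tokenize": [], "parse": ["tokenize"], "count": ["tokenize"], "stats": ["parse"]}
dependents = invert_dependencies(deps)
direct = dependent_modules(dependents, "tokenize")
transitive = dependent_modules(dependents, "tokenize", recurse=True)
```

Here direct contains only the immediate dependents of tokenize, while transitive also includes stats, which depends on tokenize through parse.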
-
append_module
(module_info)[source]¶ Add a moduleinfo to the end of the pipeline. This is mainly for use while loading a pipeline from a config file.
-
get_module_schedule
()[source]¶ Work out the order in which modules should be executed. This is an ordering that respects dependencies, so that modules are executed after their dependencies, but otherwise follows the order in which modules were specified in the config.
Returns: list of module names
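One way to produce such an ordering is a depth-first traversal that visits modules in config order and emits each module only after its dependencies. The following is a minimal sketch under that assumption, not Pimlico's actual implementation; module_schedule and the example module names are hypothetical.

```python
def module_schedule(modules, module_dependencies):
    """Order modules so each follows its dependencies, otherwise in config order."""
    schedule = []
    scheduled = set()

    def visit(name, stack):
        if name in scheduled:
            return
        if name in stack:
            raise ValueError("cyclic dependency involving %r" % name)
        # Schedule everything this module needs before the module itself
        for dep in module_dependencies.get(name, []):
            visit(dep, stack | {name})
        scheduled.add(name)
        schedule.append(name)

    for name in modules:
        visit(name, frozenset())
    return schedule


# Config order deliberately lists "stats" before the modules it depends on
modules = ["stats", "tokenize", "count", "parse"]
deps = {"stats": ["parse"], "parse": ["tokenize"], "count": ["tokenize"], "tokenize": []}
order = module_schedule(modules, deps)
```

In this example stats is pushed back behind tokenize and parse, and count keeps its place relative to the remaining modules.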
-
reset_all_modules
()[source]¶ Resets the execution states of all modules, restoring the output dirs as if nothing’s been run.
-
path_relative_to_config
(path)[source]¶ Get an absolute path to a file/directory that’s been specified relative to a config file (usually within the config file).
If the path is already an absolute path, doesn’t do anything.
Parameters: path – relative path
Returns: absolute path
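The behaviour described here can be sketched with the standard library alone. This standalone function takes the config filename as an explicit argument, which is an assumption made for illustration (the method itself presumably uses the pipeline's own filename), and the example paths are hypothetical POSIX paths.

```python
import os


def path_relative_to_config(config_filename, path):
    """Resolve path against the directory containing the config file."""
    if os.path.isabs(path):
        # Already absolute: leave it untouched
        return path
    config_dir = os.path.dirname(os.path.abspath(config_filename))
    return os.path.abspath(os.path.join(config_dir, path))


config = "/home/user/pipelines/demo.conf"
resolved = path_relative_to_config(config, "data/input.txt")
unchanged = path_relative_to_config(config, "/tmp/input.txt")
```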
-
short_term_store
¶ For backwards compatibility: returns output path
-
long_term_store
¶ For backwards compatibility: return storage location ‘long’ if it exists, else first storage location
-
named_storage_locations
¶
-
store_names
¶
-
output_path
¶
-
static
load
(filename, local_config=None, variant='main', override_local_config={}, only_override_config=False)[source]¶ Main function that loads a pipeline from a config file.
Parameters: - filename – file to read config from
- local_config – location of local config file, where we’ll read system-wide config. Usually not specified, in which case standard locations are searched. When loading programmatically, you might want to give this.
- variant – pipeline variant to load
- override_local_config – extra configuration values to override the system-wide config
- only_override_config – don’t load local config from files, just use that given in override_local_config. Used for loading test pipelines
Returns: the loaded PipelineConfig instance
-
static
load_local_config
(filename=None, override={}, only_override=False)[source]¶ Load local config parameters. These are usually specified in a .pimlico file, but may be overridden by other config locations, on the command line, or elsewhere programmatically.
If only_override=True, don’t load any files, just use the values given in override. The various locations for local config files will not be checked (which usually happens when filename=None). This is not useful for normal pipeline loading, but is used for loading test pipelines.
-
static
trace_load_local_config
(filename=None, override={}, only_override=False)[source]¶ Trace the process of loading local config file(s). Follows exactly the same logic as load_local_config(), but documents what it finds/doesn’t find.
-
static
empty
(local_config=None, override_local_config={}, override_pipeline_config={}, only_override_config=False)[source]¶ Used to programmatically create an empty pipeline. It will contain no modules, but provides a gateway to system info, etc and can be used in place of a real Pimlico pipeline.
Parameters: - local_config – filename to load local config from. If not given, the default locations are searched
- override_local_config – manually override certain local config parameters. Dict of parameter values
- only_override_config – don’t load any files, just use the values given in override_local_config. The various locations for local config files will not be checked (which usually happens when local_config=None). This is not useful for normal pipeline loading, but is used for loading test pipelines.
Returns: the PipelineConfig instance
-
find_data_path
(path, default=None)[source]¶ Given a path to a data dir/file relative to a data store, tries taking it relative to various store base dirs. If it exists in a store, that absolute path is returned. If it exists in no store, return None. If the path is already an absolute path, nothing is done to it.
Searches all the specified storage locations.
Parameters: - path – path to data, relative to store base
- default – usually, return None if no data is found. If default is given, return the path relative to the named storage location if no data is found. The special value “output” returns the path relative to the output location, whichever of the storage locations that is.
Returns: absolute path to data, or None if not found in any store
-
find_data_store
(path, default=None)[source]¶ Like find_data_path(), searches through storage locations to see if any of them include the data that lives at this relative path. This method returns the name of the store in which it was found.
Parameters: - path – path to data, relative to store base
- default – usually, return None if no data is found. If default is given, return the path relative to the named storage location if no data is found. The special value “output” returns the path relative to the output location, whichever of the storage locations that is.
Returns: name of store
-
find_data
(path, default=None)[source]¶ Given a path to a data dir/file relative to a data store, tries taking it relative to various store base dirs. If it exists in a store, that absolute path is returned. If it exists in no store, return None. If the path is already an absolute path, nothing is done to it.
Searches all the specified storage locations.
Parameters: - path – path to data, relative to store base
- default – usually, return None if no data is found. If default is given, return the path relative to the named storage location if no data is found. The special value “output” returns the path relative to the output location, whichever of the storage locations that is.
Returns: (store, path), where store is the name of the store used and path is absolute path to data, or None for both if not found in any store
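The search over storage locations described for find_data_path(), find_data_store() and find_data() can be sketched as follows. This is a simplified standalone illustration, not Pimlico's code: storage locations are passed in as a list of (name, base_dir) pairs, the special default value “output” is not modelled, and returning None as the store name for an absolute path is an assumption.

```python
import os
import tempfile


def find_data(storage_locations, path, default=None):
    """Return (store_name, absolute_path), or (None, None) if nothing is found."""
    if os.path.isabs(path):
        # Assumption: an absolute path is passed through without a store name
        return None, path
    for name, base_dir in storage_locations:
        candidate = os.path.join(base_dir, path)
        if os.path.exists(candidate):
            return name, candidate
    if default is not None:
        # Fall back to the named store even though the data does not exist there
        for name, base_dir in storage_locations:
            if name == default:
                return name, os.path.join(base_dir, path)
    return None, None


with tempfile.TemporaryDirectory() as root:
    short_dir = os.path.join(root, "short")
    long_dir = os.path.join(root, "long")
    os.makedirs(short_dir)
    os.makedirs(os.path.join(long_dir, "corpus"))
    stores = [("short", short_dir), ("long", long_dir)]

    found = find_data(stores, "corpus")
    missing = find_data(stores, "nothing_here")
    fallback = find_data(stores, "nothing_here", default="short")
    expected_found = ("long", os.path.join(long_dir, "corpus"))
    expected_fallback = ("short", os.path.join(short_dir, "nothing_here"))
```

The first element of the returned pair corresponds to what find_data_store() reports and the second to what find_data_path() reports.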
-
get_data_search_paths
(path)[source]¶ Like find_all_data_paths(), but returns a list of all absolute paths which this data path could correspond to, whether or not they exist.
Parameters: path – relative path within Pimlico directory structures
Returns: list of strings
-
step
¶
-
-
exception
PipelineConfigParseError
(*args, **kwargs)[source]¶ Bases:
Exception
General problems interpreting pipeline config
-
exception
PipelineStructureError
(*args, **kwargs)[source]¶ Bases:
Exception
Fundamental structural problems in a pipeline.
-
exception
PipelineCheckError
(cause, *args, **kwargs)[source]¶ Bases:
Exception
Error in the process of explicitly checking a pipeline for problems.
-
preprocess_config_file
(filename, variant='main', initial_vars={})[source]¶ Workhorse of the initial part of config file reading. Deals with all of our custom stuff for pipeline configs, such as preprocessing directives and includes.
Parameters: - filename – file from which to read main config
- variant – name of a variant to load. The default (main) loads the main variant, which always exists
- initial_vars – variable assignments to make available for substitution. This will be added to by any vars sections that are read.
Returns: tuple: raw config dict; list of variants that could be loaded; final vars dict; list of filenames that were read, including included files; dict of docstrings for each config section
-
check_for_cycles
(pipeline)[source]¶ Basic cyclical dependency check, always run on pipeline before use.
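A cyclical dependency check of this kind is commonly done with a depth-first search that tracks which modules are currently being visited. The sketch below works on a plain dependency dictionary rather than a PipelineConfig, and find_cycle is a hypothetical name, so it illustrates the general technique rather than this function's actual code.

```python
def find_cycle(module_dependencies):
    """Return a list of module names forming a cycle, or None if there is none."""
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {name: UNVISITED for name in module_dependencies}

    def visit(name, trail):
        state[name] = IN_PROGRESS
        for dep in module_dependencies.get(name, []):
            dep_state = state.get(dep, UNVISITED)
            if dep_state == IN_PROGRESS:
                # dep is already on the current trail, so we have looped back
                return trail[trail.index(dep):] + [dep]
            if dep_state == UNVISITED:
                cycle = visit(dep, trail + [dep])
                if cycle:
                    return cycle
        state[name] = DONE
        return None

    for name in module_dependencies:
        if state[name] == UNVISITED:
            cycle = visit(name, [name])
            if cycle:
                return cycle
    return None


acyclic = {"a": [], "b": ["a"], "c": ["b"]}
cyclic = {"a": ["c"], "b": ["a"], "c": ["b"]}
no_cycle = find_cycle(acyclic)
cycle = find_cycle(cyclic)
```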
-
check_release
(release_str)[source]¶ Check a release name against the current version of Pimlico to determine whether we meet the requirement.
-
check_pipeline
(pipeline)[source]¶ Checks a pipeline over for metadata errors, cycles, module typing errors and other problems. Called every time a pipeline is loaded, to check the whole pipeline’s metadata is in order.
Raises a PipelineCheckError if anything’s wrong.
-
get_dependencies
(pipeline, modules, recursive=False, sources=False)[source]¶ Get a list of software dependencies required by the subset of modules given.
If recursive=True, dependencies’ dependencies are added to the list too.
Parameters: - pipeline –
- modules – list of modules to check. If None, checks all modules