config

Reading of pipeline config from a file into the data structure used to run and manipulate the pipeline’s data.

class PipelineConfig(name, pipeline_config, local_config, filename=None, variant='main', available_variants=[], log=None, all_filenames=None, module_aliases={}, local_config_sources=None, section_headings=None)

Bases: object

Main configuration for a pipeline, read in from a config file.

For details on how to write config files that get read by this class, see Pipeline config.

modules

List of module names, in the order they were specified in the config file.

module_dependencies

Dictionary mapping a module name to a list of the names of modules that it depends on for its inputs.

module_dependents

Opposite of module_dependencies. Returns a mapping from each module name to a list of the modules that depend on it.
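The relationship between the two attributes can be illustrated by inverting a plain dict. The helper name invert_dependencies and the dict-based input are assumptions for this sketch, not part of Pimlico's API:

```python
def invert_dependencies(module_dependencies):
    """Build a module -> dependents mapping from a module -> dependencies mapping.

    Hypothetical helper: it only illustrates how module_dependents relates
    to module_dependencies.
    """
    # Every module gets an entry, even if nothing depends on it
    dependents = {name: [] for name in module_dependencies}
    for module, deps in module_dependencies.items():
        for dep in deps:
            # module depends on dep, so module is one of dep's dependents
            dependents.setdefault(dep, []).append(module)
    return dependents


deps = {"tokenize": [], "tag": ["tokenize"], "parse": ["tokenize", "tag"]}
print(invert_dependencies(deps))
```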

get_dependent_modules(module_name, recurse=False, exclude=[])

Return a list of the names of modules that depend on the named module for their inputs.

If exclude is given, we don’t perform a recursive call on any of the modules in the list. For each item we recurse on, we extend the exclude list in the recursive call to include everything found so far (in other recursive calls). This avoids unnecessary recursion in complex pipelines.

If exclude=None, it is also passed through to recursive calls as None. The default value of [] avoids excessive recursion from the top-level call by allowing modules to be added to the exclusion list for recursive calls.

Parameters: recurse – include all transitive dependents, not just those that immediately depend on the module.
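A minimal sketch of the recursion and exclusion logic described above, working on a plain dict shaped like module_dependents. The function name and signature are hypothetical, and Pimlico's own method may differ in detail; for instance, this sketch uses an immutable () default in place of [].

```python
def dependent_modules(module_dependents, module_name, recurse=False, exclude=()):
    """Hypothetical sketch of a (transitive) dependents lookup with an exclusion list."""
    # Modules that directly depend on module_name
    direct = list(module_dependents.get(module_name, []))
    if not recurse:
        return direct
    found = list(direct)
    for dep in direct:
        if exclude is not None and dep in exclude:
            # Don't perform a recursive call on excluded modules
            continue
        # Extend the exclusion list with everything found so far;
        # None is passed through unchanged, meaning no exclusion at all
        sub_exclude = None if exclude is None else list(exclude) + found
        for sub in dependent_modules(module_dependents, dep, True, sub_exclude):
            if sub not in found:
                found.append(sub)
    return found


# B and C depend on A, D depends on both B and C, E depends on D
dependents = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
print(dependent_modules(dependents, "A", recurse=True))
```

In this diamond-shaped example, the call for C skips recursing on D, because D was already found (and explored) via B.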
append_module(module_info)

Add a module info object to the end of the pipeline. This is mainly for use while loading a pipeline from a config file.

get_module_schedule()

Work out the order in which modules should be executed. This is an ordering that respects dependencies, so that modules are executed after their dependencies, but otherwise follows the order in which modules were specified in the config.

Returns: list of module names
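One way to compute such an ordering is to repeatedly schedule the earliest module, in config order, whose dependencies have all been scheduled already. This is an illustrative algorithm under that assumption, not necessarily the one get_module_schedule() uses; module_schedule and its dict input are hypothetical.

```python
def module_schedule(modules, module_dependencies):
    """Order modules so each comes after its dependencies, otherwise keeping config order.

    modules: module names in the order they appear in the config file.
    module_dependencies: dict mapping a module name to the names it depends on.
    """
    scheduled = []
    done = set()
    remaining = list(modules)
    while remaining:
        # Pick the first remaining module (in config order) that is ready to run
        for name in remaining:
            if all(dep in done for dep in module_dependencies.get(name, [])):
                break
        else:
            raise ValueError("cannot schedule, cyclic dependency among: " + ", ".join(remaining))
        remaining.remove(name)
        scheduled.append(name)
        done.add(name)
    return scheduled


print(module_schedule(["parse", "tokenize", "tag"],
                      {"parse": ["tag"], "tag": ["tokenize"], "tokenize": []}))
```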
reset_all_modules()

Resets the execution states of all modules, restoring the output dirs as if nothing’s been run.

path_relative_to_config(path)

Get an absolute path to a file/directory that’s been specified relative to a config file (usually within the config file).

If the path is already an absolute path, doesn’t do anything.

Parameters: path – relative path
Returns: absolute path
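A minimal sketch of this resolution rule, assuming a relative path is taken relative to the directory containing the config file. For illustration, config_filename is passed in explicitly here, whereas the real method takes only path.

```python
import os


def path_relative_to_config(config_filename, path):
    """Resolve path against the directory of config_filename (hypothetical sketch)."""
    if os.path.isabs(path):
        # Already absolute: leave it alone
        return path
    config_dir = os.path.dirname(os.path.abspath(config_filename))
    # abspath also normalizes components such as ".."
    return os.path.abspath(os.path.join(config_dir, path))


print(path_relative_to_config("/home/user/pipelines/main.conf", "data/input.txt"))
```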
short_term_store

For backwards compatibility: returns the output path.

long_term_store

For backwards compatibility: returns storage location ‘long’ if it exists, else the first storage location.

named_storage_locations
store_names
output_path
static load(filename, local_config=None, variant='main', override_local_config={}, only_override_config=False)

Main function that loads a pipeline from a config file.

Parameters:
  • filename – file to read config from
  • local_config – location of local config file, where we’ll read system-wide config. Usually not specified, in which case standard locations are searched. When loading programmatically, you might want to give this.
  • variant – pipeline variant to load
  • override_local_config – extra configuration values to override the system-wide config
  • only_override_config – don’t load local config from files, just use that given in override_local_config. Used for loading test pipelines
Returns:

the loaded PipelineConfig instance

static load_local_config(filename=None, override={}, only_override=False)

Load local config parameters. These are usually specified in a .pimlico file, but may be overridden by other config locations, on the command line, or elsewhere programmatically.

If only_override=True, don’t load any files, just use the values given in override. The various locations for local config files will not be checked (which usually happens when filename=None). This is not useful for normal pipeline loading, but is used for loading test pipelines.
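The order of precedence can be sketched as follows. Everything about the file handling here is an assumption for illustration: the candidate_files list, the simple key=value line format, and merging all existing files with later ones taking precedence. The point is only that override wins over anything read from files, and that only_override=True skips the files entirely. merge_local_config is a hypothetical name.

```python
import os


def merge_local_config(candidate_files, override=None, only_override=False):
    """Hypothetical sketch: merge key=value config files, then apply overrides."""
    config = {}
    if not only_override:
        for filename in candidate_files:
            if not os.path.isfile(filename):
                continue  # missing locations are simply skipped
            with open(filename) as f:
                for line in f:
                    line = line.strip()
                    # Skip blank lines, comments and lines without an '='
                    if not line or line.startswith("#") or "=" not in line:
                        continue
                    key, _, value = line.partition("=")
                    config[key.strip()] = value.strip()
    # Explicitly given values always take precedence over file contents
    config.update(override or {})
    return config


print(merge_local_config(["/nonexistent/.pimlico"], override={"processes": "4"}))
```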

static trace_load_local_config(filename=None, override={}, only_override=False)

Trace the process of loading local config file(s). Follows exactly the same logic as load_local_config(), but documents what it finds/doesn’t find.

static empty(local_config=None, override_local_config={}, override_pipeline_config={}, only_override_config=False)

Used to programmatically create an empty pipeline. It will contain no modules, but provides a gateway to system info, etc., and can be used in place of a real Pimlico pipeline.

Parameters:
  • local_config – filename to load local config from. If not given, the default locations are searched
  • override_local_config – manually override certain local config parameters. Dict of parameter values
  • only_override_config – don’t load any files, just use the values given in override_local_config. The various locations for local config files will not be checked (which usually happens when local_config=None). This is not useful for normal pipeline loading, but is used for loading test pipelines.
Returns:

the PipelineConfig instance

find_data_path(path, default=None)

Given a path to a data dir/file relative to a data store, tries taking it relative to the base dir of each store. If it exists in a store, that absolute path is returned. If it exists in no store, None is returned. If the path is already an absolute path, nothing is done to it.

Searches all the specified storage locations.

Parameters:
  • path – path to data, relative to store base
  • default – if no data is found, None is usually returned. If default is given, it names a storage location, and the path relative to that location is returned instead. The special value “output” returns the path relative to the output location, whichever of the storage locations that might be.
Returns:

absolute path to data, or None if not found in any store
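The search order and the fallback to default can be sketched like this. The (store_name, base_dir) list and the output_store argument are assumed representations for illustration, and find_in_stores is a hypothetical name.

```python
import os


def find_in_stores(path, storage_locations, output_store, default=None):
    """Hypothetical sketch of searching storage locations for a relative data path.

    storage_locations: list of (store_name, base_dir) pairs, searched in order.
    output_store: name of the store that the special value "output" refers to.
    """
    if os.path.isabs(path):
        # Absolute paths are returned untouched
        return path
    for _name, base_dir in storage_locations:
        candidate = os.path.join(base_dir, path)
        if os.path.exists(candidate):
            return candidate
    if default is None:
        return None
    if default == "output":
        default = output_store
    # Nothing found: return where the data would live in the default store
    return os.path.join(dict(storage_locations)[default], path)


stores = [("main", "/nonexistent/store/main"), ("long", "/nonexistent/store/long")]
print(find_in_stores("mod/output", stores, "main", default="output"))
```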

find_data_store(path, default=None)

Like find_data_path(), searches through storage locations to see if any of them include the data that lives at this relative path. This method returns the name of the store in which it was found.

Parameters:
  • path – path to data, relative to store base
  • default – if no data is found, None is usually returned. If default is given, it names a storage location, and the path relative to that location is returned instead. The special value “output” returns the path relative to the output location, whichever of the storage locations that might be.
Returns:

name of store

find_data(path, default=None)

Given a path to a data dir/file relative to a data store, tries taking it relative to the base dir of each store. If it exists in a store, that absolute path is returned. If it exists in no store, None is returned. If the path is already an absolute path, nothing is done to it.

Searches all the specified storage locations.

Parameters:
  • path – path to data, relative to store base
  • default – if no data is found, None is usually returned. If default is given, it names a storage location, and the path relative to that location is returned instead. The special value “output” returns the path relative to the output location, whichever of the storage locations that might be.
Returns:

(store, path), where store is the name of the store used and path is absolute path to data, or None for both if not found in any store

get_data_search_paths(path)

Like find_all_data_paths(), but returns a list of all absolute paths which this data path could correspond to, whether or not they exist.

Parameters: path – relative path within Pimlico directory structures
Returns: list of strings
step
enable_step()

Enable super-verbose, interactive step mode.

See also: module pimlico.cli.debug, which defines the behaviour of step mode.
exception PipelineConfigParseError(*args, **kwargs)

Bases: Exception

General problems interpreting pipeline config.

exception PipelineStructureError(*args, **kwargs)

Bases: Exception

Fundamental structural problems in a pipeline.

exception PipelineCheckError(cause, *args, **kwargs)

Bases: Exception

Error in the process of explicitly checking a pipeline for problems.

preprocess_config_file(filename, variant='main', initial_vars={})

Workhorse of the initial part of config file reading. Handles all of the custom features of pipeline configs, such as preprocessing directives and includes.

Parameters:
  • filename – file from which to read main config
  • variant – name of a variant to load. The default (main) loads the main variant, which always exists.
  • initial_vars – variable assignments to make available for substitution. This will be added to by any vars sections that are read.
Returns:

tuple: raw config dict; list of variants that could be loaded; final vars dict; list of filenames that were read, including included files; dict of docstrings for each config section

check_for_cycles(pipeline)

Basic cyclical dependency check, always run on pipeline before use.
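For illustration, a basic cycle check can be done with a depth-first search that marks the modules currently on the search path. This sketch, with the hypothetical name find_cycle, works on a plain dict shaped like module_dependencies and returns the offending cycle instead of raising an error; it is not necessarily how check_for_cycles() is implemented.

```python
def find_cycle(module_dependencies):
    """Return a list of module names forming a dependency cycle, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, fully explored
    colour = {name: WHITE for name in module_dependencies}
    stack = []

    def visit(name):
        colour[name] = GREY
        stack.append(name)
        for dep in module_dependencies.get(name, []):
            state = colour.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path, so we have come back round to it
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        colour[name] = BLACK
        return None

    for name in module_dependencies:
        if colour[name] == WHITE:
            cycle = visit(name)
            if cycle:
                return cycle
    return None


print(find_cycle({"a": ["b"], "b": ["c"], "c": ["a"]}))
```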

check_release(release_str)

Check a release name against the current version of Pimlico to determine whether we meet the requirement.
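Release names can be compared as tuples of integers, so that, for example, 0.10 counts as later than 0.9 (a plain string comparison would get this wrong). The rule that a requirement is met when the current release is at least the required one is an assumption for this sketch, as are the helper names; the sketch also only handles purely numeric dotted release names.

```python
def release_tuple(release_str):
    """Parse a dotted release name such as "0.9.2" into a tuple of ints."""
    return tuple(int(part) for part in release_str.strip().split("."))


def meets_release_requirement(required, current):
    """Assumed rule for this sketch: met if current >= required."""
    # Tuple comparison is element-wise, so (0, 10) > (0, 9)
    return release_tuple(current) >= release_tuple(required)


print(meets_release_requirement("0.9", "0.10"))
```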

check_pipeline(pipeline)

Checks a pipeline over for metadata errors, cycles, module typing errors and other problems. Called every time a pipeline is loaded, to check the whole pipeline’s metadata is in order.

Raises a PipelineCheckError if anything’s wrong.

get_dependencies(pipeline, modules, recursive=False, sources=False)

Get a list of software dependencies required by the subset of modules given.

If recursive=True, dependencies’ dependencies are added to the list too.

Parameters:
  • pipeline
  • modules – list of modules to check. If None, checks all modules
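The effect of recursive=True can be sketched as a breadth-first walk that also removes duplicates when several modules share a dependency. The inputs are assumptions for illustration: a list of direct dependency names and a dict mapping each dependency name to the names it requires in turn. collect_dependencies is a hypothetical helper, not part of Pimlico.

```python
def collect_dependencies(direct, dependency_graph, recursive=False):
    """Hypothetical sketch: gather dependency names, optionally transitively."""
    result = []
    seen = set()
    queue = list(direct)
    while queue:
        dep = queue.pop(0)
        if dep in seen:
            continue  # shared dependencies are only listed once
        seen.add(dep)
        result.append(dep)
        if recursive:
            # Add the dependency's own dependencies to the end of the queue
            queue.extend(dependency_graph.get(dep, []))
    return result


graph = {"torch": ["numpy"], "nltk": ["regex"], "numpy": [], "regex": []}
print(collect_dependencies(["torch", "nltk"], graph, recursive=True))
```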
print_missing_dependencies(pipeline, modules)

Check runtime dependencies for a subset of modules and output a table of missing dependencies.

Parameters:
  • pipeline
  • modules – list of modules to check. If None, checks all modules
Returns:

True if no missing dependencies, False otherwise

print_dependency_leaf_problems(dep, local_config)