pimlico.core.config module

Reading of pipeline config from a file into the data structure used to run and manipulate the pipeline’s data.

class PipelineConfig(name, pipeline_config, local_config, filename=None, variant='main', available_variants=[], log=None, all_filenames=None, module_aliases={}, local_config_sources=None)[source]

Bases: object

Main configuration for a pipeline, read in from a config file.

For details on how to write config files that get read by this class, see Pipeline config.

modules

List of module names, in the order they were specified in the config file.

module_dependencies

Dictionary mapping a module name to a list of the names of modules that it depends on for its inputs.

module_dependents

Opposite of module_dependencies. Returns a mapping from each module name to a list of the modules that depend on it.
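
The relationship between the two mappings can be illustrated with a small standalone sketch. It is not Pimlico's implementation: the helper name invert_dependencies and the example module names are made up for illustration, and plain dicts stand in for the pipeline's attributes.

```python
def invert_dependencies(module_dependencies):
    """Build a dependents mapping from a dependencies mapping.

    Hypothetical helper: takes {module: [modules it depends on]} and
    returns {module: [modules that depend on it]}.
    """
    dependents = {}
    for module, deps in module_dependencies.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(module)
    return dependents


# Example module names are invented for this sketch
module_dependencies = {
    "tokenize": ["corpus"],
    "parse": ["tokenize"],
    "count": ["tokenize"],
}
module_dependents = invert_dependencies(module_dependencies)
```

Modules that nothing depends on (here "parse" and "count") simply do not appear as keys in the inverted mapping.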

get_dependent_modules(module_name, recurse=False, exclude=[])[source]

Return a list of the names of modules that depend on the named module for their inputs.

If exclude is given, we don’t perform a recursive call on any of the modules in the list. For each item we recurse on, we extend the exclude list in the recursive call to include everything found so far (in other recursive calls). This avoids unnecessary recursion in complex pipelines.

If exclude=None, it is also passed through to recursive calls as None. Its default value of [] avoids excessive recursion from the top-level call, by allowing things to be added to the exclusion list for recursive calls.

Parameters:recurse – include all transitive dependents, not just those that immediately depend on the module.
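
The recursion with an exclusion list described above might look roughly like the following. This is a simplified standalone sketch, not the library's code: it takes the dependents mapping as an explicit argument (an assumption made so the example runs without Pimlico), and it treats exclude=None as an empty list rather than passing None through to recursive calls.

```python
def get_dependent_modules(dependents, module_name, recurse=False, exclude=None):
    """Return names of modules depending on module_name.

    dependents: dict mapping a module name to its direct dependents
    (assumed input for this sketch).
    """
    exclude = [] if exclude is None else exclude
    found = list(dependents.get(module_name, []))
    if recurse:
        # Iterate over a copy: only the direct dependents are recursed on here
        for dep in list(found):
            if dep in exclude:
                continue
            # Extend the exclusion list with everything found so far, so that
            # deeper calls do not recurse again on modules already handled
            sub = get_dependent_modules(
                dependents, dep, recurse=True, exclude=exclude + found
            )
            for name in sub:
                if name not in found:
                    found.append(name)
    return found


# Diamond-shaped example: b and c both feed d, which feeds e
example_dependents = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
direct = get_dependent_modules(example_dependents, "a")
transitive = get_dependent_modules(example_dependents, "a", recurse=True)
```

In the diamond example, the call for "c" does not recurse into "d" again, because "d" was already found via "b" and so is in the exclusion list.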
append_module(module_info)[source]

Add a moduleinfo to the end of the pipeline. This is mainly for use while loading a pipeline from a config file.

get_module_schedule()[source]

Work out the order in which modules should be executed. This is an ordering that respects dependencies, so that modules are executed after their dependencies, but otherwise follows the order in which modules were specified in the config.

Returns:list of module names
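
One way to obtain such an ordering is a depth-first walk over the modules in config order, scheduling each module's dependencies before the module itself. The sketch below assumes this approach and an acyclic pipeline; it is not taken from Pimlico's source, and it takes the module list and dependency mapping as plain arguments.

```python
def get_module_schedule(modules, module_dependencies):
    """Order modules so that each comes after its dependencies.

    modules: module names in config-file order.
    module_dependencies: dict of module name -> list of dependency names.
    Assumes there are no cyclic dependencies.
    """
    schedule = []

    def add(name):
        if name in schedule:
            return
        # Schedule dependencies first, then the module itself
        for dep in module_dependencies.get(name, []):
            add(dep)
        schedule.append(name)

    # Walking in config order keeps that order wherever dependencies allow
    for name in modules:
        add(name)
    return schedule


# Invented example: "report" is listed first but needs the other two
config_order = ["report", "corpus", "tokenize"]
deps = {"report": ["tokenize"], "tokenize": ["corpus"]}
schedule = get_module_schedule(config_order, deps)
```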
reset_all_modules()[source]

Resets the execution states of all modules, restoring the output dirs as if nothing’s been run.

path_relative_to_config(path)[source]

Get an absolute path to a file/directory that’s been specified relative to a config file (usually within the config file).

Parameters:path – relative path
Returns:absolute path
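
A minimal sketch of this kind of resolution, using only os.path. The standalone function below takes the config filename as an explicit argument, which is an assumption for illustration; the method itself takes only path.

```python
import os


def path_relative_to_config(config_filename, path):
    """Resolve path against the directory containing config_filename."""
    config_dir = os.path.dirname(os.path.abspath(config_filename))
    # abspath also normalizes components such as ".."
    return os.path.abspath(os.path.join(config_dir, path))


# Hypothetical locations, for illustration only
resolved = path_relative_to_config(
    "/home/user/project/pipeline.conf", "../data/input.txt"
)
```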
static load(filename, local_config=None, variant='main', override_local_config={})[source]

Main function that loads a pipeline from a config file.

Parameters:
  • filename – file to read config from
  • local_config – location of local config file, where we’ll read system-wide config. Usually not specified, in which case standard locations are searched. When loading programmatically, you might want to specify this.
  • variant – pipeline variant to load
  • override_local_config – extra configuration values to override the system-wide config
Returns:

static load_local_config(filename=None, override={})[source]

Load local config parameters. These are usually specified in a .pimlico file, but may be overridden by other config locations, on the command line, or elsewhere programmatically.

static empty(local_config=None, override_local_config={}, override_pipeline_config={})[source]

Used to programmatically create an empty pipeline. It will contain no modules, but provides a gateway to system info, etc., and can be used in place of a real Pimlico pipeline.

Parameters:
  • local_config – filename to load local config from. If not given, the default locations are searched
  • override_local_config – manually override certain local config parameters. Dict of parameter values
Returns:

the PipelineConfig instance

find_data_path(path, default=None)[source]

Given a path to a data dir/file relative to a data store, tries taking it relative to various store base dirs. If it exists in a store, that absolute path is returned. If it exists in no store, return None. If the path is already an absolute path, nothing is done to it.

The stores searched are the long-term store and the short-term store, though in the future more valid data storage locations may be added.

Parameters:
  • path – path to data, relative to store base
  • default – usually, None is returned if no data is found. If default="short", the path relative to the short-term store is returned in this case. If default="long", the path relative to the long-term store is returned.
Returns:

absolute path to data, or None if not found in any store
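
The lookup described above can be sketched as a standalone function. This is an illustration under assumptions, not the library's code: the stores are passed in as a dict with "long" and "short" keys, and the search order (long-term first) is a guess.

```python
import os
import tempfile


def find_data_path(path, store_dirs, default=None):
    """Look for path in each store; return the first existing location.

    store_dirs: dict with "long" and "short" keys giving store base dirs
    (assumed input for this sketch).
    """
    if os.path.isabs(path):
        # Absolute paths are returned untouched
        return path
    for store in ("long", "short"):  # search order is an assumption
        candidate = os.path.join(store_dirs[store], path)
        if os.path.exists(candidate):
            return candidate
    if default in store_dirs:
        # Not found anywhere: fall back to the requested store's location
        return os.path.join(store_dirs[default], path)
    return None


# Temporary directories stand in for the two stores
short_dir = tempfile.mkdtemp()
long_dir = tempfile.mkdtemp()
stores = {"short": short_dir, "long": long_dir}
os.makedirs(os.path.join(short_dir, "corpus"))

found = find_data_path("corpus", stores)
missing = find_data_path("no_such_data", stores)
fallback = find_data_path("no_such_data", stores, default="long")
```

Here found points into the short-term store, missing is None, and fallback is the long-term location even though nothing exists there yet.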

find_all_data_paths(path)[source]
get_data_search_paths(path)[source]

Like find_all_data_paths(), but returns a list of all absolute paths which this data path could correspond to, whether or not they exist.

Parameters:path – relative path within Pimlico directory structures
Returns:list of strings
get_storage_roots()[source]

Returns a list of all the (pipeline-specific) storage root locations known to the pipeline.

Currently, this is always [self.short_term_store, self.long_term_store], but in future we may have a more flexible system that allows an unbounded number of storage locations.

step
enable_step()[source]

Enable super-verbose, interactive step mode.

See also:

Module pimlico.cli.debug
   The debug module defines the behaviour of step mode.

exception PipelineConfigParseError(*args, **kwargs)[source]

Bases: exceptions.Exception

General problems interpreting pipeline config

exception PipelineStructureError[source]

Bases: exceptions.Exception

Fundamental structural problems in a pipeline.

exception PipelineCheckError(cause, *args, **kwargs)[source]

Bases: exceptions.Exception

Error in the process of explicitly checking a pipeline for problems.

preprocess_config_file(filename, variant='main', initial_vars={})[source]

Workhorse of the initial part of config file reading. Deals with all of our custom stuff for pipeline configs, such as preprocessing directives and includes.

Parameters:
  • filename – file from which to read main config
  • variant – name of a variant to load. The default (main) loads the main variant, which always exists
  • initial_vars – variable assignments to make available for substitution. This will be added to by any vars sections that are read.
Returns:

tuple: raw config dict; list of variants that could be loaded; final vars dict; list of filenames that were read, including included files; dict of docstrings for each config section

check_for_cycles(pipeline)[source]

Basic cyclical dependency check, always run on pipeline before use.
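
A basic cycle check of this kind is commonly written as a depth-first search that tracks which modules are on the current path. The following is a generic sketch of that technique, not Pimlico's implementation: it works on a plain dependency dict rather than a pipeline object, and CyclicDependencyError is a hypothetical stand-in for whatever exception the real check raises.

```python
class CyclicDependencyError(Exception):
    """Hypothetical exception type, used only in this sketch."""


def check_for_cycles(module_dependencies):
    """Raise CyclicDependencyError if the dependency graph has a cycle."""
    visiting, done = set(), set()

    def visit(name, trail):
        if name in done:
            return
        if name in visiting:
            # name is already on the current path, so we have looped back
            cycle = trail[trail.index(name):] + [name]
            raise CyclicDependencyError(" -> ".join(cycle))
        visiting.add(name)
        for dep in module_dependencies.get(name, []):
            visit(dep, trail + [name])
        visiting.discard(name)
        done.add(name)

    for name in module_dependencies:
        visit(name, [])


# A three-module loop, with invented names
try:
    check_for_cycles({"a": ["b"], "b": ["c"], "c": ["a"]})
    cycle_message = None
except CyclicDependencyError as err:
    cycle_message = str(err)
```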

check_release(release_str)[source]

Check a release name against the current version of Pimlico to determine whether we meet the requirement.

check_pipeline(pipeline)[source]

Checks a pipeline over for metadata errors, cycles, module typing errors and other problems. Called every time a pipeline is loaded, to check the whole pipeline’s metadata is in order.

Raises a PipelineCheckError if anything’s wrong.

get_dependencies(pipeline, modules, recursive=False, sources=False)[source]

Get a list of software dependencies required by the subset of modules given.

If recursive=True, dependencies’ dependencies are added to the list too.

Parameters:
  • pipeline
  • modules – list of modules to check. If None, checks all modules
print_missing_dependencies(pipeline, modules)[source]

Check runtime dependencies for a subset of modules and output a table of missing dependencies.

Parameters:
  • pipeline
  • modules – list of modules to check. If None, checks all modules
Returns:

True if no missing dependencies, False otherwise

print_dependency_leaf_problems(dep, local_config)[source]