pimlico.core.config module

Reading of various types of config files, in particular a pipeline config.

exception pimlico.core.config.PipelineCheckError(cause, *args, **kwargs)[source]

Bases: exceptions.Exception

exception pimlico.core.config.PipelineConfigParseError[source]

Bases: exceptions.Exception

exception pimlico.core.config.PipelineStructureError[source]

Bases: exceptions.Exception

class pimlico.core.config.PipelineConfig(name, pipeline_config, local_config, raw_module_configs, module_order, filename=None, variant='main', available_variants=[], log=None, all_filenames=None)[source]

Bases: object

Main configuration for a pipeline, read in from a config file.

Each section, except for vars and pipeline, defines a module instance in the pipeline. Some of these can be executed, others act as filters on the outputs of other modules, or input readers.

Special sections:

  • vars:

    May contain any variable definitions, to be used later on in the pipeline. Further down, expressions like %(varname)s will be expanded into the value assigned to varname in the vars section.

  • pipeline:

    Main pipeline-wide configuration. The following options are required for every pipeline:

    • name: a single-word name for the pipeline, used to determine where files are stored
    • release: the release of Pimlico for which the config file was written. It is considered compatible with later minor versions of the same major release, but not with later major releases. Typically, a user receiving the pipeline config will get hold of an appropriate version of the Pimlico codebase to run it with.

    Other optional settings:

    • python_path: a path or paths, relative to the directory containing the config file, in which Python modules/packages used by the pipeline can be found. Typically, a config file is distributed with a directory of Python code providing extra modules, datatypes, etc. Multiple paths are separated by colons (:).
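The compatibility rule described for the release option can be sketched as below. This is a minimal illustration, not Pimlico's own check_release: the function names parse_release and is_release_compatible are hypothetical, and the sketch assumes release strings are purely numeric and dot-separated (e.g. "0.5").

```python
def parse_release(release_str):
    """Split a release string such as "0.5" into a tuple of integers.

    Assumes purely numeric, dot-separated components (hypothetical format).
    """
    return tuple(int(part) for part in release_str.strip().split("."))


def is_release_compatible(required, running):
    """Return True if a config written for `required` can run on `running`.

    The major release must match, and the running minor version must be
    at least the one the config was written for.
    """
    req = parse_release(required)
    run = parse_release(running)
    if req[0] != run[0]:
        return False
    # Tuple comparison handles the minor (and any further) components
    return run[1:] >= req[1:]
```

For example, a config written for release 0.5 would be accepted by 0.6 under this rule, but not by 1.0.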

Special variable substitutions

Certain variable substitutions are always available, in addition to those defined in vars sections.

  • pimlico_root:

    Root directory of Pimlico, usually the directory pimlico/ within the project directory.

  • project_root:

    Root directory of the whole project. Currently assumed to always be the parent directory of pimlico_root.
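The expansion of %(varname)s expressions, including the two built-in variables, might look roughly like this. It is only a sketch: substitute_vars is a hypothetical stand-in for the module's own substitution routine, and the directory locations and the corpus variable are invented for illustration.

```python
import posixpath


def substitute_vars(option_val, variables):
    """Replace each %(name)s in option_val with variables[name].

    Unknown names are left untouched in this sketch.
    """
    for name, value in variables.items():
        option_val = option_val.replace("%({})s".format(name), value)
    return option_val


# Hypothetical location of the Pimlico codebase, for illustration only
pimlico_root = "/home/user/myproject/pimlico"
variables = {
    "pimlico_root": pimlico_root,
    # The project root is taken to be the parent of pimlico_root
    "project_root": posixpath.dirname(pimlico_root),
    # A user-defined variable, as it might appear in a vars section
    "corpus": "europarl",
}

expanded = substitute_vars("%(project_root)s/data/%(corpus)s", variables)
```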

Directives:

Certain special directives are processed when reading config files. They are lines that begin with %%, followed by the directive name and any arguments.

  • variant:

    Allows a line to be included only when loading a particular variant of a pipeline. The variant name is specified as part of the directive in the form: variant:variant_name. You may include the line in more than one variant by specifying multiple names, separated by commas (and no spaces). You can use the default variant “main”, so that the line will be left out of other variants. The rest of the line, after the directive and variant name(s), is the content that will be included in those variants.

  • novariant:

    A line to be included only when not loading a variant of the pipeline. Equivalent to variant:main.

  • include:

    Include the entire contents of another file. The filename, specified relative to the config file in which the directive is found, is given after a space.

  • copy:

    Copies all config settings from another module, whose name is given as the sole argument. May be used multiple times in the same module, in which case later copies override earlier ones. Settings given explicitly in the module’s config override any copied settings. The following settings are not copied: input(s), filter, outputs, type.
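To make the variant and novariant directives concrete, here is a minimal sketch of how lines could be filtered for a given variant. It is not the module's own preprocessor: filter_variant_lines is a hypothetical name, and the include and copy directives are simply passed through unchanged.

```python
def filter_variant_lines(lines, variant="main"):
    """Keep plain lines and the content of directives matching `variant`.

    Only %%variant and %%novariant are handled; all other lines,
    including other directives, are passed through unchanged.
    """
    result = []
    for line in lines:
        if line.startswith("%%novariant"):
            # Equivalent to %%variant:main
            names = ["main"]
            content = line[len("%%novariant"):].strip()
        elif line.startswith("%%variant:"):
            # Split "name1,name2 rest of line" into names and content
            spec, _, content = line[len("%%variant:"):].partition(" ")
            names = spec.split(",")
            content = content.strip()
        else:
            result.append(line)
            continue
        if variant in names:
            result.append(content)
    return result
```

Given the lines `%%novariant input=full.txt` and `%%variant:small,tiny input=small.txt`, loading the default variant keeps only `input=full.txt`, while loading the variant small keeps only `input=small.txt`.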

Multiple parameter values:

Sometimes you want to write a whole load of modules that are almost identical, varying in just one or two parameters. You can give a parameter multiple values by writing them separated by vertical bars (|). The module definition will be expanded to produce a separate module for each value, with all the other parameters being identical.

You can even do this with multiple parameters of the same module and the expanded modules will cover all combinations of the parameter assignments.

Each module will be given a distinct name, based on the varied parameters. If just one is varied, the names will be of the form module_name{param_value}. If multiple parameters are varied at once, the names will be module_name{param_name0=param_value0~param_name1=param_value1~...}.
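The expansion and naming scheme can be sketched as follows. This is an illustration under assumptions, not the module's own multiply_alternatives: expand_alternatives is a hypothetical function that takes a module name and a dict of raw option strings.

```python
from itertools import product


def expand_alternatives(module_name, params):
    """Expand |-separated parameter values into one module per combination.

    Returns a list of (expanded_name, params) pairs.
    """
    # Parameters with more than one |-separated value are the varied ones
    split = {name: value.split("|") for name, value in params.items()}
    varied = [name for name, values in split.items() if len(values) > 1]
    if not varied:
        return [(module_name, dict(params))]

    expanded = []
    for combo in product(*(split[name] for name in varied)):
        new_params = dict(params)
        new_params.update(zip(varied, combo))
        if len(varied) == 1:
            # Single varied parameter: name uses the value alone
            suffix = combo[0]
        else:
            # Several varied parameters: name=value pairs joined by ~
            suffix = "~".join(
                "{}={}".format(n, v) for n, v in zip(varied, combo)
            )
        expanded.append(("{}{{{}}}".format(module_name, suffix), new_params))
    return expanded
```

With a module train whose size option is written as 100|200, this yields two modules named train{100} and train{200}; varying a second option as well yields one module per combination.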

find_all_data_paths(path)[source]
find_data_path(path, default=None)[source]

Given a path to a data dir/file relative to a data store, tries taking it relative to various store base dirs. If it exists in a store, that absolute path is returned. If it exists in no store, return None.

The stores searched are the long-term store and the short-term store, though in the future more valid data storage locations may be added.

Parameters:
  • path – path to data, relative to store base
  • default – by default, None is returned if no data is found. If default="short", the path within the short-term store is returned in this case; if default="long", the path within the long-term store.
Returns:

absolute path to data, or None if not found in any store
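The lookup described above might be sketched as a standalone function. This is a simplification under assumptions: the real method belongs to PipelineConfig and takes no stores argument, and the search order used here (the insertion order of the hypothetical stores dict) is not specified by the documentation.

```python
import os
import tempfile


def find_data_path(path, stores, default=None):
    """Return the absolute path of `path` in the first store that has it.

    `stores` maps a store name ("long", "short") to its base directory,
    searched in insertion order. If nothing is found and `default` names
    a store, the path within that store is returned even though it does
    not exist; otherwise None is returned.
    """
    for base_dir in stores.values():
        candidate = os.path.abspath(os.path.join(base_dir, path))
        if os.path.exists(candidate):
            return candidate
    if default is not None:
        return os.path.abspath(os.path.join(stores[default], path))
    return None


# Two temporary directories stand in for the long- and short-term stores
long_store = tempfile.mkdtemp()
short_store = tempfile.mkdtemp()
stores = {"long": long_store, "short": short_store}
os.makedirs(os.path.join(short_store, "corpus", "tokenized"))

found = find_data_path(os.path.join("corpus", "tokenized"), stores)
missing = find_data_path("no_such_data", stores)
fallback = find_data_path("no_such_data", stores, default="long")
```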

get_module_schedule()[source]

Work out the order in which modules should be executed. This is an ordering that respects dependencies, so that modules are executed after their dependencies, but otherwise follows the order in which modules were specified in the config.

Returns:list of module names
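
One way to produce such an ordering is a depth-first traversal that visits modules in config order and schedules each module's dependencies before the module itself. The sketch below is an assumption about the approach, not the method's actual code; module_schedule and its dependencies argument are hypothetical.

```python
def module_schedule(module_order, dependencies):
    """Order modules so that each comes after its dependencies.

    `module_order` is the order of definition in the config and
    `dependencies` maps a module name to the names it depends on.
    Assumes the dependency graph contains no cycles.
    """
    schedule = []
    scheduled = set()

    def visit(name):
        if name in scheduled:
            return
        # Schedule dependencies first, then the module itself
        for dep in dependencies.get(name, []):
            visit(dep)
        scheduled.add(name)
        schedule.append(name)

    for name in module_order:
        visit(name)
    return schedule
```

Modules without dependencies keep their config order, while a module defined before its inputs is moved after them.
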
insert_module(module_info)[source]

Usually, all modules in the pipeline are loaded, based on config, by this class. However, occasionally, we may want to make modules available as part of the pipeline from elsewhere. In particular, this is necessary when building multi-stage modules – each stage is added (with special module name prefixes) into the main pipeline.

static load(filename, local_config=None, variant='main', override_local_config={})[source]
load_module_info(module_name)[source]

Load the module metadata for a named module in the pipeline. Loads only this module’s data and nothing more.

Parameters:module_name
Returns:
path_relative_to_config(path)[source]

Get an absolute path to a file/directory that’s been specified relative to a config file (usually within the config file).

Parameters:path – relative path
Returns:absolute path
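
The resolution amounts to joining the path onto the directory containing the config file. In this sketch the config filename is passed explicitly, which is an assumption made to keep the example standalone; the real method takes only path and uses the pipeline's own filename.

```python
import os


def path_relative_to_config(config_filename, path):
    """Resolve `path` against the directory containing the config file."""
    config_dir = os.path.dirname(os.path.abspath(config_filename))
    # abspath also normalizes components such as ".."
    return os.path.abspath(os.path.join(config_dir, path))
```
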
modules
pimlico.core.config.check_for_cycles(pipeline)[source]
pimlico.core.config.check_pipeline(pipeline)[source]

Checks a pipeline for metadata errors, cycles and other problems. Called every time a module is to be run, to check that the whole pipeline’s metadata is in order.
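
The cycle check can be illustrated with a depth-first search over module dependencies. This is a generic sketch, not the code of check_for_cycles: find_cycle and the dependencies mapping are hypothetical, and the real function operates on a pipeline object.

```python
def find_cycle(dependencies):
    """Return a list of module names forming a cycle, or None if acyclic.

    `dependencies` maps each module name to the names it depends on.
    """
    visiting = []  # modules on the current search path
    done = set()   # modules fully explored without finding a cycle

    def visit(name):
        if name in done:
            return None
        if name in visiting:
            # Reached a module already on the current path: cycle found
            return visiting[visiting.index(name):] + [name]
        visiting.append(name)
        for dep in dependencies.get(name, []):
            cycle = visit(dep)
            if cycle is not None:
                return cycle
        visiting.pop()
        done.add(name)
        return None

    for name in dependencies:
        cycle = visit(name)
        if cycle is not None:
            return cycle
    return None
```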

pimlico.core.config.check_release(release_str)[source]
pimlico.core.config.get_dependencies(pipeline, modules)[source]

Get a list of software dependencies required by the subset of modules given.

Parameters:
  • pipeline
  • modules – list of modules to check. If None, checks all modules
pimlico.core.config.multiply_alternatives(alternative_params)[source]
pimlico.core.config.preprocess_config_file(filename, variant='main', initial_vars={})[source]
pimlico.core.config.print_dependency_leaf_problems(dep)[source]
pimlico.core.config.print_missing_dependencies(pipeline, modules)[source]

Check runtime dependencies for a subset of modules and output a table of missing dependencies.

Parameters:
  • pipeline
  • modules – list of modules to check. If None, checks all modules
Returns:

True if no missing dependencies, False otherwise

pimlico.core.config.var_substitute(option_val, vars)[source]