multistage

class MultistageModuleInfo(module_name, pipeline, **kwargs)

Bases: pimlico.core.modules.base.BaseModuleInfo

Base class for multi-stage modules. You almost certainly don’t want to subclass this directly: use the multistage_module() factory function instead. The class exists mainly to provide a way of identifying multi-stage modules.

module_executable = True

stages = None

typecheck_inputs()

Overridden to check internal output-input connections as well as the main module’s inputs.

get_software_dependencies()

Check that all software required to execute this module is installed and locatable. This is separate to metadata config checks, so that you don’t need to satisfy the dependencies for all modules in order to be able to run one of them. You might, for example, want to run different modules on different machines. This is called when a module is about to be executed and each of the dependencies is checked.

Returns a list of instances of subclasses of pimlico.core.dependencies.base.SoftwareDependency, representing the libraries that this module depends on.

When providing dependency classes, take care not to put import statements at the top of the Python module that would make loading the dependency type itself depend on runtime dependencies. Instead, put any such import statements inside this method, so that the import checks only run when the method is called.

You should call the super method for checking superclass dependencies.
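The advice above about deferring imports and calling the super method can be sketched as follows. This is a minimal illustration, not Pimlico's implementation: SoftwareDependency, PythonPackageDependency and BaseModuleInfo below are simplified, hypothetical stand-ins defined locally so that the example runs without Pimlico installed, and the available() method is an assumed interface.

```python
import importlib.util


class SoftwareDependency:
    """Hypothetical stand-in for the real dependency base class."""

    def __init__(self, name):
        self.name = name

    def available(self):
        raise NotImplementedError


class PythonPackageDependency(SoftwareDependency):
    """Hypothetical dependency on an importable top-level Python package."""

    def __init__(self, package):
        super().__init__(package)
        self.package = package

    def available(self):
        # The lookup happens only when the check is run,
        # not when this file is first loaded
        return importlib.util.find_spec(self.package) is not None


class BaseModuleInfo:
    """Hypothetical stand-in for the real module info base class."""

    def get_software_dependencies(self):
        return []


class MyModuleInfo(BaseModuleInfo):
    def get_software_dependencies(self):
        # Extend, rather than replace, the superclass's dependencies
        return super().get_software_dependencies() + [
            PythonPackageDependency("json"),
        ]


deps = MyModuleInfo().get_software_dependencies()
missing = [dep.name for dep in deps if not dep.available()]
```

Nothing at the top of the file imports the package being checked, so loading the module info class never fails just because a runtime dependency is absent.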

get_input_software_dependencies()

Collects library dependencies from the input datatypes to this module, which will need to be satisfied for the module to be run.

Unlike get_software_dependencies(), it shouldn’t need to be overridden by subclasses, since it just collects the results of getting dependencies from the datatypes.

check_ready_to_run()

Called before a module is run, or if the ‘check’ command is called. This will only be called after all library dependencies have been confirmed ready (see get_software_dependencies()).

Essentially, this covers any module-specific checks that used to be in check_runtime_dependencies() other than library installation (e.g. checking models exist).

Always call the superclass’s method if you override this.

Returns a list of (name, description) pairs, where the name identifies the problem briefly and the description explains what’s missing and (ideally) how to fix it.
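As a rough sketch of the return format described above, an override might look like the following. BaseModuleInfo here is a hypothetical local stand-in, and TaggerModuleInfo and its model_path attribute are invented purely for illustration.

```python
import os


class BaseModuleInfo:
    """Hypothetical stand-in for the real module info base class."""

    def check_ready_to_run(self):
        return []


class TaggerModuleInfo(BaseModuleInfo):
    """Hypothetical module type that needs a trained model file."""

    def __init__(self, model_path):
        self.model_path = model_path

    def check_ready_to_run(self):
        # Start from the problems reported by the superclass
        problems = super().check_ready_to_run()
        if not os.path.exists(self.model_path):
            problems.append((
                "missing model",
                "Model file %s does not exist: train or download the "
                "model before running this module" % self.model_path,
            ))
        return problems


problems = TaggerModuleInfo("/nonexistent/path/model.bin").check_ready_to_run()
```

An empty list means that no problems were found.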

get_detailed_status()

Returns a list of strings, containing detailed information about the module’s status that is specific to the module type. This may include module-specific information about execution status, for example.

Subclasses may override this to supply useful (human-readable) information specific to the module type. They should call the super method.

reset_execution()

Remove all output data and metadata from this module to make a fresh start, as if it’s never been executed.

May be overridden if a module has some side effect other than creating/modifying things in its output directory(/ies), but overridden methods should always call the super method. Occasionally this is necessary, but most of the time the base implementation is enough.

classmethod get_key_info_table()

Add the stages into the key info table.

get_next_stage()

If there are more stages to be executed, returns a pair of the module info and stage definition. Otherwise, returns (None, None).
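The return convention of get_next_stage() can be illustrated with a small stand-alone function. This is only a sketch under assumptions: the real method takes no arguments and works on the module's own stages, whereas here the stages and their statuses are passed in explicitly, plain strings stand in for module infos and stage definitions, and the "COMPLETE" status value is an assumption.

```python
def get_next_stage(stages, statuses):
    """Return the first stage that has not completed as a
    (module_info, stage) pair, or (None, None) if all are done."""
    for module_info, stage_name in stages:
        if statuses.get(stage_name) != "COMPLETE":
            return module_info, stage_name
    return None, None


# Stand-in data: module infos and stage definitions are plain strings here
stages = [("tokenize_info", "tokenize"), ("tag_info", "tag")]

module_info, stage = get_next_stage(stages, {"tokenize": "COMPLETE"})
if module_info is None:
    print("All stages have been executed")
else:
    print("Next stage to run: %s" % stage)
```

Callers can unpack the pair directly and test the first element against None to find out whether anything is left to run.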

status

is_locked()

Returns: True if the module is currently locked from execution

multistage_module(multistage_module_type_name, module_stages, use_stage_option_names=False, module_readable_name=None)

Factory to build a multi-stage module type out of a series of stages, each of which specifies a module type for the stage. The stages should be a list of ModuleStage objects.

class ModuleStage(name, module_info_cls, connections=None, output_connections=None, option_connections=None, use_stage_option_names=False, extra_connections_from_options=None)

Bases: object

A single stage in a multi-stage module.

If no explicit input connections are given, the default input to this module is connected to the default output from the previous.

Connections can be given as a list of ModuleConnection objects.

Output connections specify that one of this module’s outputs should be used as an output from the multi-stage module. Optional outputs for the multi-stage module are not currently supported (though could in theory be added later). This should be a list of ModuleOutputConnection objects. If none are given for any of the stages, the module will have a single output, which is the default output from the last stage.

Option connections allow you to specify the names that are used for the multistage module’s options that get passed through to this stage’s module options. Simply specify a dict for option_connections where the keys are the names of module options for this stage and the values are the names that should be used for the multistage module’s options.

You may map multiple options from different stages to the same option name for the multistage module. This will result in the same option value being passed through to both stages. Note that help text, option type, option processing, etc. will be taken from the first stage’s option (in case the two options aren’t identical).

Options not explicitly mapped to a name will use the name <stage_name>_<option_name>. If use_stage_option_names=True, this prefix will not be added: the stage’s option names will be used directly as the option name of the multistage module. Note that there is a danger of clashing option names with this behaviour, so only do it if you know the stages have distinct option names (or should share their values where the names overlap).
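The naming rules above can be summarised in a short function. This is a sketch of the rules as described, not the library's code: multistage_option_names() is a hypothetical helper, and the stage and option names in the usage are invented.

```python
def multistage_option_names(stage_name, stage_options, option_connections=None,
                            use_stage_option_names=False):
    """Map each of a stage's option names to the name used for it
    on the multistage module."""
    option_connections = option_connections or {}
    names = {}
    for option in stage_options:
        if option in option_connections:
            # An explicit mapping always takes priority
            names[option] = option_connections[option]
        elif use_stage_option_names:
            # No prefix: the stage's own option name is used directly
            names[option] = option
        else:
            # Default: <stage_name>_<option_name>
            names[option] = "%s_%s" % (stage_name, option)
    return names


default_names = multistage_option_names(
    "tokenize", ["lang", "model"], option_connections={"lang": "language"}
)
unprefixed_names = multistage_option_names(
    "tokenize", ["lang", "model"], option_connections={"lang": "language"},
    use_stage_option_names=True,
)
```

With the default settings, the unmapped option model becomes tokenize_model. With use_stage_option_names=True it stays as model, which is where clashes between stages can arise.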

Further connections may be produced once processed options are available (when the main module’s module info is instantiated), by specifying a one-argument function as extra_connections_from_options. The argument is the processed option dictionary, which will contain the full set of options given to the main module.

class ModuleConnection

Bases: object

class InternalModuleConnection(input_name, output_name=None, previous_module=None)

Bases: pimlico.core.modules.multistage.ModuleConnection

Connection between the output of one module in the multi-stage module and the input to another.

May specify the name of the previous module that a connection should be made to. If this is not given, the previous module in the sequence will be assumed.

If output_name=None, connects to the default output of the previous module.
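The defaults described above amount to the following resolution logic. This is an illustrative sketch only: resolve_source() is a hypothetical helper, and it returns None for the output name to mean "the default output" rather than looking up a real output name.

```python
def resolve_source(stage_names, stage_index, output_name=None, previous_module=None):
    """Work out which (module, output) pair feeds an internal connection
    for the stage at position stage_index in the sequence."""
    if previous_module is None:
        if stage_index == 0:
            raise ValueError("the first stage has no previous module to connect to")
        # Fall back to the stage immediately before this one
        previous_module = stage_names[stage_index - 1]
    # output_name=None stands for the source module's default output
    return previous_module, output_name


stage_names = ["tokenize", "tag", "parse"]
implicit = resolve_source(stage_names, 2)
explicit = resolve_source(stage_names, 2, output_name="tokens",
                          previous_module="tokenize")
```

Here implicit connects the third stage to the default output of tag, while explicit reaches back to a named output of tokenize.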

class InternalModuleMultipleConnection(input_name, outputs)

Bases: pimlico.core.modules.multistage.ModuleConnection

Connection between the outputs of multiple modules and the input to another (which must be a multiple input).

outputs should be a list of (module_name, output_name) pairs, or just strings giving the output name, assumed to be from the previous module.
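The two accepted forms for items in outputs can be brought into a single form as sketched below. normalize_outputs() is a hypothetical helper written for this illustration, and the module and output names are invented.

```python
def normalize_outputs(outputs, previous_module):
    """Turn a mixed list of (module_name, output_name) pairs and bare
    output names into a list of pairs only."""
    normalized = []
    for output in outputs:
        if isinstance(output, str):
            # A bare string is an output name on the previous module
            normalized.append((previous_module, output))
        else:
            module_name, output_name = output
            normalized.append((module_name, output_name))
    return normalized


pairs = normalize_outputs([("tokenize", "tokens"), "tags"], previous_module="tag")
```

After normalization, every entry names its source module explicitly, which makes the bare-string shorthand easier to reason about.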

class ModuleInputConnection(stage_input_name=None, main_input_name=None)

Bases: pimlico.core.modules.multistage.ModuleConnection

Connection of a sub-module’s input to an input to the multi-stage module.

If main_input_name is not given, the name for the input to the multistage module will be identical to the stage input name. This might lead to unintended behaviour if multiple inputs end up with the same name, so you can specify a different name if necessary to avoid clashes.

If multiple inputs (e.g. from different stages) are connected to the same main input name, they will take input from the same previous module output. Nothing clever is done to unify the type requirements, however: the first stage’s type requirement is used for the main module’s input.

If stage_input_name is not given, the module’s default input will be connected.
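The naming and type-requirement behaviour described above can be sketched as follows. collect_main_inputs() is a hypothetical helper, type requirements are represented by plain strings, and the sketch assumes that each stage input name has already been resolved (that is, a default input has been replaced by its actual name).

```python
def collect_main_inputs(connections):
    """connections: list of (stage_input_name, main_input_name,
    type_requirement) triples, in stage order.

    Returns a dict mapping each main input name to the type requirement
    used for it."""
    main_inputs = {}
    for stage_input_name, main_input_name, type_requirement in connections:
        # Without an explicit main name, reuse the stage's input name
        name = main_input_name if main_input_name is not None else stage_input_name
        # Only the first connection to a given name sets the requirement
        main_inputs.setdefault(name, type_requirement)
    return main_inputs


main_inputs = collect_main_inputs([
    ("text", None, "RawText"),
    ("corpus", "text", "TokenizedText"),
    ("vocab", "vocabulary", "Dictionary"),
])
```

The second connection shares the main input text with the first, so the first stage's requirement is the one that is kept.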

class ModuleOutputConnection(stage_output_name=None, main_output_name=None)

Bases: object

Specifies the connection of a sub-module’s output to the multi-stage module’s output. Works in a similar way to ModuleInputConnection.

exception MultistageModulePreparationError

Bases: Exception