Module dependencies¶
In a Pimlico pipeline, you typically use lots of different external software packages. Some are Python packages, others system tools, Java libraries, whatever. Even the core modules that core with Pimlico between them depend on a huge amount of software.
Naturally, we don’t want to have to install all of this software before you can run even a simple Pimlico pipeline that doesn’t use all (or any) of it. So, we keep the core dependencies of Pimlico to an absolute minimum, and then check whether the necessary software dependencies are installed each time a pipeline module is going to be run.
Core dependencies¶
Certain dependencies are required for Pimlico to run at all, or needed so often that you wouldn’t get far without
installing them. These are defined in pimlico.core.dependencies.core
, and when you run the Pimlico command-line
interface, it checks they’re available and tries to install them if they’re not.
Module dependencies¶
Each module type defines its own set of software dependencies, if it has any. When you try to run the module, Pimlico runs some checks to try to make sure that all of these are available.
If some of them are not, it may be possible to install them automatically, straight from Pimlico. In particular,
many Python packages can be very easily installed using Pip. If this is
the case for one of the missing dependencies, Pimlico will tell you in the error output, and you can install
them using the install
command (with the module name/number as an argument).
Virtualenv¶
In order to simplify automatic installation, Pimlico is always run within a virtual environment, using Virtualenv. This means that any Python packages installed by Pip will live in a local directory within the Pimlico codebase that you’re running and won’t interfere with anything else on your system.
When you run Pimlico for the first time, it will create a new virtualenv for this purpose. Every time you run it after that, it will use this same environment, so anything you install will continue to be available.
Custom virtualenv¶
Most of the time, you don’t even need to be aware of the virtualenv that Python’s running in [1]. Under certain circumstances, you might need to use a custom virtualenv.
For example, say you’re running your pipeline over different servers, but have the pipeline and Pimlico codebase on a shared network drive. Then you can find that the software installed in the virtualenv on one machine is incompatible with the system-wide software on the other.
You can specify a name for a custom virtualenv using the environment variable PIMENV
. The first time
you run Pimlico with this set, it will automatically create the new virtualenv.
$ PIMENV=myenv ./pimlico.sh mypipeline.conf status
Replace myenv
with a name that better reflects its use (e.g. name of the server).
Every time you run Pimlico on that server, set the PIMENV
environment variable in the same way.
In case you want to get to the virtualenv itself, you can find it in pimlico/lib/virtualenv/myenv
.
Note
Pimlico previously used another environment variable VIRTUALENV
, which gave a path to the
virtualenv. You can still use this, but, unless you have a good reason to, it’s easier to use PIMENV
.
Defining module dependencies¶
Todo
Describe how module dependencies are defined for different types of deps