pimlico.core.modules.map.singleproc module

Sometimes the simple multiprocessing-based approach to map module parallelization just isn’t suitable. This module provides an equivalent set of implementations and convenience functions that don’t use multiprocessing, but conform to the pool-based execution pattern by creating a single-thread pool.
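To make the "single-thread pool" idea concrete, here is a minimal, self-contained sketch that does not use pimlico. The class SingleThreadPool and its submit/collect/shutdown methods are hypothetical names chosen for illustration; the point is only that a pool-style interface can be kept while all work runs in one worker thread, so documents are handled one at a time, in order.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-in for a document-processing pool: it keeps the same
# submit/collect pattern a multi-worker pool would use, but has exactly one
# worker thread, so jobs run strictly one after another.
class SingleThreadPool:
    def __init__(self, process_fn):
        self.process_fn = process_fn
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._futures = []

    def submit(self, archive, doc_name, *docs):
        # Queue one document for the single worker
        self._futures.append(
            self._executor.submit(self.process_fn, archive, doc_name, *docs))

    def collect(self):
        # Wait for all queued jobs and return results in submission order
        results = [f.result() for f in self._futures]
        self._futures = []
        return results

    def shutdown(self):
        self._executor.shutdown(wait=True)


pool = SingleThreadPool(lambda archive, name, text: (name, text.upper()))
pool.submit("archive0", "doc1", "hello")
pool.submit("archive0", "doc2", "world")
print(pool.collect())  # [('doc1', 'HELLO'), ('doc2', 'WORLD')]
pool.shutdown()
```

Because callers only see submit and collect, code written against a multi-worker pool can drive this one unchanged; only the degree of parallelism differs.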

class SingleThreadMapModuleExecutor(module_instance_info, **kwargs)

Bases: pimlico.core.modules.map.threaded.ThreadingMapModuleExecutor

create_pool(processes)

Should return the pool instance to be used for document processing. This should generally be an instance of a subclass of DocumentProcessorPool.

Always called after preprocess().
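The ordering guarantee (preprocess() first, then create_pool()) can be illustrated with a small standalone sketch. ExecutorSketch, SinglePoolSketch and SingleThreadExecutorSketch are hypothetical classes, not pimlico's; in particular, ignoring the processes argument is an assumption made here because a single-thread pool has no use for it.

```python
# Hypothetical base class mirroring the documented ordering: preprocess()
# always runs before create_pool(processes) is called.
class ExecutorSketch:
    def __init__(self):
        self.calls = []  # records the order of lifecycle calls

    def preprocess(self):
        self.calls.append("preprocess")

    def create_pool(self, processes):
        # Subclasses return the pool used for document processing
        raise NotImplementedError

    def execute(self, processes=4):
        self.preprocess()
        return self.create_pool(processes)


class SinglePoolSketch:
    """Hypothetical stand-in for a single-thread document pool."""
    def __init__(self):
        self.workers = 1


class SingleThreadExecutorSketch(ExecutorSketch):
    def create_pool(self, processes):
        self.calls.append("create_pool")
        # Assumption for this sketch: the requested process count is ignored,
        # since all work happens in one thread anyway
        return SinglePoolSketch()


executor = SingleThreadExecutorSketch()
pool = executor.execute(processes=8)
print(executor.calls, pool.workers)  # ['preprocess', 'create_pool'] 1
```

Since create_pool() runs after preprocess(), it can rely on any state that preprocess() has already set on the executor.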

single_process_executor_factory(process_document_fn, preprocess_fn=None, postprocess_fn=None, worker_set_up_fn=None, worker_tear_down_fn=None, batch_docs=None)

Factory function for creating an executor that uses the single-process implementations of document-map pools and workers. This is an easy way to implement a non-parallelized executor.
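The "pass in functions, get back an executor" pattern can be sketched without pimlico. toy_executor_factory below is a hypothetical, heavily simplified stand-in, not pimlico's implementation: it covers only process_document_fn and preprocess_fn and runs everything sequentially in the calling thread.

```python
# Hypothetical toy factory (not pimlico's implementation): it takes plain
# functions and returns an executor *class* whose execute() method runs them
# sequentially in the calling thread.
def toy_executor_factory(process_document_fn, preprocess_fn=None):
    class ToyExecutor:
        def execute(self, documents):
            # Optional one-off setup with access to the executor instance
            if preprocess_fn is not None:
                preprocess_fn(self)
            # Each item is (archive name, document name, doc_from_corpus_1, ...)
            return [process_document_fn(self, archive, doc_name, *docs)
                    for archive, doc_name, *docs in documents]
    return ToyExecutor


def count_words(executor, archive, doc_name, text):
    return doc_name, len(text.split())


Executor = toy_executor_factory(count_words)
print(Executor().execute([("archive0", "doc1", "a b c"), ("archive0", "doc2", "d e")]))
# [('doc1', 3), ('doc2', 2)]
```

Returning a class rather than an instance lets the framework instantiate the executor itself, while the module author supplies only the per-document logic.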

process_document_fn should be a function that takes the following arguments:

  • the executor instance (allowing access to things set during setup)
  • archive name
  • document name
  • the remaining arguments are the documents themselves, one from each of the input corpora
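The argument order above can be illustrated with a hypothetical process_document_fn for a module with two input corpora. For this standalone sketch, documents are assumed to be plain Python lists, and executor.separator is a hypothetical attribute assumed to have been set during setup; a SimpleNamespace stands in for the executor.

```python
from types import SimpleNamespace


# Hypothetical process_document_fn for a module with two input corpora.
# Argument order follows the list above: executor, archive name, document
# name, then one document per input corpus.
def process_document(executor, archive, doc_name, tokens_doc, tags_doc):
    # executor.separator is assumed to have been set during setup
    pairs = [f"{tok}{executor.separator}{tag}"
             for tok, tag in zip(tokens_doc, tags_doc)]
    return " ".join(pairs)


executor = SimpleNamespace(separator="/")
print(process_document(executor, "archive0", "doc1", ["the", "cat"], ["DT", "NN"]))
# the/DT cat/NN
```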

If preprocess_fn is given, it is called once before execution begins, with the executor as an argument.
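A minimal sketch of the preprocess hook, assuming (for illustration only) that the setup step loads a stopword set onto the executor. The preprocess and process_document functions and the stopwords attribute are hypothetical; a SimpleNamespace stands in for the executor instance.

```python
from types import SimpleNamespace


# Hypothetical preprocess_fn: called once, before any document is processed,
# with the executor as its only argument. Anything stored on the executor
# here is visible to process_document_fn later.
def preprocess(executor):
    executor.stopwords = {"the", "a", "an"}


def process_document(executor, archive, doc_name, doc):
    return [w for w in doc if w not in executor.stopwords]


executor = SimpleNamespace()
preprocess(executor)  # once, up front
print(process_document(executor, "archive0", "doc1", ["the", "cat", "sat"]))
# ['cat', 'sat']
```

Doing this kind of one-off setup in the preprocess hook avoids repeating it for every document.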

If postprocess_fn is given, it is called at the end of execution, including on the way out after an error, with the executor as an argument and a keyword argument error, which is True if execution failed.
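The "called on the way out, even after an error" behaviour maps naturally onto try/except/finally. In the sketch below, run is a hypothetical driver, not pimlico's code, and postprocess and fail_on_boom are hypothetical example functions. It shows postprocess receiving error=False after a clean run and error=True when processing raises, with the original exception still reaching the caller.

```python
from types import SimpleNamespace


# Hypothetical postprocess_fn: takes the executor plus an `error` keyword
# argument that is True when execution failed.
def postprocess(executor, error=False):
    executor.log.append("cleanup (failed)" if error else "cleanup (ok)")


# Hypothetical driver (not pimlico's code) showing when postprocess runs:
# after a normal run, and also on the way out after an exception.
def run(executor, docs, process_document_fn, postprocess_fn):
    failed = False
    try:
        for name, doc in docs:
            process_document_fn(executor, "archive0", name, doc)
    except Exception:
        failed = True
        raise  # the original error still propagates to the caller
    finally:
        postprocess_fn(executor, error=failed)


def fail_on_boom(executor, archive, doc_name, doc):
    if doc == "boom":
        raise ValueError(f"cannot process {doc_name}")


executor = SimpleNamespace(log=[])
run(executor, [("doc1", "fine")], fail_on_boom, postprocess)
try:
    run(executor, [("doc2", "boom")], fail_on_boom, postprocess)
except ValueError:
    pass  # error was re-raised after postprocess ran
print(executor.log)  # ['cleanup (ok)', 'cleanup (failed)']
```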