pimlico.core.modules.map.filter module

class pimlico.core.modules.map.filter.DocumentMapOutputTypeWrapper(*args, **kwargs)

Bases: object

archive_iter(subsample=None, start_after=None)

Provides an iterator just like that of TarredCorpus, but instead of iterating over data read from disk, it produces the data on the fly from the input datatype.
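The on-the-fly behaviour can be illustrated with a plain Python generator. This is a minimal sketch, not Pimlico's implementation: `lazy_archive_iter` and `process_doc` are hypothetical names, the `(archive, doc_name, doc)` tuple shape and the `(archive, doc_name)` form of `start_after` are assumptions, and `subsample` is left out.

```python
def lazy_archive_iter(input_iter, process_doc, start_after=None):
    """Yield (archive, doc_name, result) tuples, mapping each document on demand.

    Hypothetical stand-in: nothing is read from or written to disk; each
    document is processed only when the consumer asks for the next item.
    """
    skipping = start_after is not None
    for archive, doc_name, doc in input_iter:
        if skipping:
            # Assumed semantics: skip up to and including the named document
            if (archive, doc_name) == start_after:
                skipping = False
            continue
        yield archive, doc_name, process_doc(doc)


# Toy input corpus standing in for the input datatype's iterator
corpus = [
    ("arc0", "doc1", "hello"),
    ("arc0", "doc2", "world"),
    ("arc1", "doc3", "again"),
]
results = list(
    lazy_archive_iter(iter(corpus), str.upper, start_after=("arc0", "doc1"))
)
```

Because the generator pulls from `input_iter` one item at a time, a later consumer drives the processing and no intermediate corpus is stored.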

data_ready()

Ready to supply this data as soon as all the wrapped module’s inputs are ready to produce their data.
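The readiness rule amounts to a conjunction over the inputs. A minimal sketch, assuming each input exposes its own `data_ready()` method; `ToyInput` and `filter_output_ready` are hypothetical names used only for illustration:

```python
class ToyInput:
    """Hypothetical stand-in for an input datatype with a readiness flag."""

    def __init__(self, ready):
        self._ready = ready

    def data_ready(self):
        return self._ready


def filter_output_ready(inputs):
    # The filtered output needs no stored data of its own, so it is ready
    # exactly when every input can already produce its data.
    return all(inp.data_ready() for inp in inputs)
```

For example, `filter_output_ready([ToyInput(True), ToyInput(False)])` is false, because one input cannot yet supply its data.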

non_filter_datatype = None

output_name = None

wrapped_module_info = None

pimlico.core.modules.map.filter.wrap_module_info_as_filter(module_info_instance)

Create a filter module from a document map module so that it gets executed on the fly to provide its outputs as input to later modules. Can be applied to any document map module simply by adding filter=T to its config.

This function is called when filter=T is given.

Parameters: module_info_instance – basic module info to wrap the outputs of

Returns: a new non-executable ModuleInfo whose outputs are produced on the fly and will be identical to the outputs of the wrapped module.