pimlico.core.modules.map.filter module

class pimlico.core.modules.map.filter.DocumentMapOutputTypeWrapper(*args, **kwargs)
Bases: object

archive_iter(subsample=None, start_after=None)
Provides an iterator just like TarredCorpus, but instead of iterating over data read from disk, it gets the data on the fly from the input datatype.
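
To illustrate what iterating on the fly means here, the following is a minimal sketch and not Pimlico's actual implementation. It assumes the input yields (archive, document name, document) triples, that subsample is a probability of keeping each document, and that start_after is an (archive, document name) pair to resume after. The names lazy_archive_iter, map_doc and rng are hypothetical.

```python
import random


def lazy_archive_iter(input_iter, map_doc, subsample=None, start_after=None, rng=None):
    """Yield (archive, doc_name, mapped_doc) triples, mapping each document on demand.

    Hypothetical sketch under these assumptions:
    - input_iter yields (archive, doc_name, doc) triples
    - subsample is the probability of keeping each document
    - start_after is an (archive, doc_name) pair; everything up to and
      including that document is skipped
    """
    rng = rng or random.Random()
    skipping = start_after is not None
    for archive, doc_name, doc in input_iter:
        if skipping:
            # Skip documents until the start_after document has been passed
            if (archive, doc_name) == start_after:
                skipping = False
            continue
        if subsample is not None and rng.random() > subsample:
            # Randomly drop this document when subsampling
            continue
        # The mapping runs here, at iteration time; nothing is read from stored output
        yield archive, doc_name, map_doc(doc)


corpus = [
    ("archive0", "doc0.txt", "hello world"),
    ("archive0", "doc1.txt", "on the fly"),
    ("archive1", "doc2.txt", "filter module"),
]
all_docs = list(lazy_archive_iter(iter(corpus), str.upper))
resumed = list(
    lazy_archive_iter(iter(corpus), str.upper, start_after=("archive0", "doc0.txt"))
)
```

Because the function is a generator, no document is mapped until a consumer asks for it, which is the property that lets a filter skip writing its output to disk.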

data_ready()
This data is ready to be supplied as soon as all the wrapper module’s inputs are ready to produce their data.

non_filter_datatype = None

output_name = None

wrapped_module_info = None

pimlico.core.modules.map.filter.wrap_module_info_as_filter(module_info_instance)
Create a filter module from a document map module so that it gets executed on the fly to provide its outputs as input to later modules. This can be applied to any document map module simply by adding filter=T to its config.
This function is called when filter=T is given.
Parameters: module_info_instance – basic module info to wrap the outputs of
Returns: a new non-executable ModuleInfo whose outputs are produced on the fly and will be identical to the outputs of the wrapped module.
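
The wrapping idea can be sketched with toy classes. This is only an illustration under stated assumptions, not Pimlico's code: ToyInput, ToyMapModule, ToyFilter, wrap_as_filter and the executable attribute are all hypothetical names that stand in for the real datatypes and ModuleInfo classes.

```python
class ToyInput:
    """Hypothetical stand-in for an input datatype (not a Pimlico class)."""

    def __init__(self, docs, ready=True):
        self.docs = docs
        self.ready = ready

    def data_ready(self):
        return self.ready


class ToyMapModule:
    """Hypothetical document map module: applies map_doc to every input document."""

    executable = True

    def __init__(self, inputs, map_doc):
        self.inputs = inputs
        self.map_doc = map_doc


class ToyFilter:
    """Hypothetical filter wrapper: never executed, computes outputs when iterated."""

    executable = False

    def __init__(self, wrapped):
        self.wrapped = wrapped

    def data_ready(self):
        # Ready as soon as every input of the wrapped module is ready
        return all(inp.data_ready() for inp in self.wrapped.inputs)

    def __iter__(self):
        # Outputs are computed per document at iteration time
        for doc in self.wrapped.inputs[0].docs:
            yield self.wrapped.map_doc(doc)


def wrap_as_filter(module):
    """Toy analogue of wrapping a document map module as a filter."""
    return ToyFilter(module)


tokenizer = ToyMapModule([ToyInput(["a b", "c d e"])], str.split)
filtered = wrap_as_filter(tokenizer)
outputs = list(filtered)

# A wrapper around a module whose input is not ready reports not ready
pending = wrap_as_filter(ToyMapModule([ToyInput([], ready=False)], str.split))
```

In this sketch the wrapper has no execution step of its own; a later module that iterates over it triggers the mapping, and the results match what the wrapped module would have produced.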