Store a corpus

Path pimlico.modules.corpora.store
Executable yes

Takes documents from a corpus and writes them to disk using the standard writer for the corpus’ data point type. This is useful when documents are produced on the fly, for example by a filter module or an input reader, and you want to store the resulting corpus for later use instead of re-running the filters or readers every time the corpus’ documents are needed.

Inputs

Name    Type(s)
corpus  grouped_corpus

Outputs

Name    Type(s)
corpus  grouped corpus with input doc type

Example config

This is an example of how this module can be used in a pipeline config file.

[my_store_module]
type=pimlico.modules.corpora.store
input_corpus=module_a.some_output
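
The input_corpus line above appears to follow the pattern input_<input name>=<source module>.<source output>, so here the store module's corpus input is fed from the some_output output of a module called module_a. The sketch below illustrates that reading using Python's standard configparser. It is not Pimlico's own config loader, and the module_a section with its type value is a hypothetical placeholder added only so that the example is self-contained.

```python
import configparser

# Example config from this page, plus a hypothetical upstream section
# (module_a and its type are placeholders, not real Pimlico modules).
CONFIG = """
[module_a]
type=some.hypothetical.filter_module

[my_store_module]
type=pimlico.modules.corpora.store
input_corpus=module_a.some_output
"""


def input_connections(config_text):
    """Return (module, input name, source module, source output) tuples."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    connections = []
    for section in parser.sections():
        for key, value in parser[section].items():
            if key.startswith("input_"):
                # Assumed convention: "<source module>.<source output>"
                source_module, _, source_output = value.partition(".")
                input_name = key[len("input_"):]
                connections.append(
                    (section, input_name, source_module, source_output)
                )
    return connections


connections = input_connections(CONFIG)
for module, input_name, source_module, source_output in connections:
    print(f"{module}.{input_name} <- {source_module}.{source_output}")
```

Read this way, my_store_module writes to disk whatever documents module_a produces on its some_output output, so later modules can read the stored corpus rather than triggering module_a again.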

Example pipelines

This module is used by the following example pipelines. They are examples of how the module can be used together with other modules in a larger pipeline.

Test pipelines

This module is used by the following test pipelines. They are a further source of examples of the module’s usage.