Corpus concatenation

Path pimlico.modules.corpora.concat
Executable no

Concatenate two (or more) corpora to produce a bigger corpus.

They must have the same data point type, or one must be a subtype of the other.

This is a filter module. It is not executable, so won’t appear in a pipeline’s list of modules that can be run. It produces its output for the next module on the fly when the next module needs it.

Inputs

Name Type(s)
corpora list of iterable_corpus

Outputs

Name Type(s)
corpus corpus with data-point from input

Example config

This is an example of how this module can be used in a pipeline config file.

[my_concat_module]
type=pimlico.modules.corpora.concat
input_corpora=module_a.some_output

Test pipelines

This module is used by the following test pipelines. They are a further source of examples of the module’s usage.