Corpus document list filter

Path        pimlico.modules.corpora.list_filter
Executable  yes

Similar to split, but instead of dividing the dataset at random, this module divides it according to a given list of documents: documents that appear in the list go into one set, and all remaining documents go into the other.
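The behaviour described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the module's actual implementation: the function name `list_filter`, the representation of the corpus as `(name, document)` pairs, and the assumption that the list identifies documents by name are all hypothetical.

```python
def list_filter(corpus, doc_list):
    """Split (name, document) pairs into two lists.

    Hypothetical sketch: assumes the list identifies documents by name.
    Returns (documents named in the list, all other documents),
    preserving the order in which they appear in the corpus.
    """
    wanted = set(doc_list)  # set lookup keeps membership tests cheap
    in_list, rest = [], []
    for name, doc in corpus:
        target = in_list if name in wanted else rest
        target.append((name, doc))
    return in_list, rest


# Toy corpus, made up for illustration
corpus = [
    ("doc1", "first text"),
    ("doc2", "second text"),
    ("doc3", "third text"),
]
in_list, rest = list_filter(corpus, ["doc1", "doc3"])
```

Unlike a random split, the result here is fully determined by the list, so the same list always produces the same two sets.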

Inputs

Name    Type(s)
corpus  grouped_corpus
list    string_list

Example config

This is an example of how this module can be used in a pipeline config file.

[my_list_filter_module]
type=pimlico.modules.corpora.list_filter
input_corpus=module_a.some_output
input_list=module_a.some_output
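In the example above, each `input_<name>` key corresponds to one of the inputs listed in the Inputs table (`corpus` and `list`). As a rough illustration of that naming pattern, the section can be read with Python's standard `configparser`. This is not Pimlico's own config loader, and the helper `read_inputs` is a hypothetical name used only for this sketch.

```python
import configparser

# The example section from this page, as a string
CONFIG_TEXT = """
[my_list_filter_module]
type=pimlico.modules.corpora.list_filter
input_corpus=module_a.some_output
input_list=module_a.some_output
"""


def read_inputs(config_text, section):
    """Return {input name: source} for every key starting with 'input_'."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    prefix = "input_"
    return {
        key[len(prefix):]: value
        for key, value in parser[section].items()
        if key.startswith(prefix)
    }


inputs = read_inputs(CONFIG_TEXT, "my_list_filter_module")
```

In a real pipeline the two inputs would normally point at different outputs, one supplying the corpus and one supplying the list.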

Test pipelines

This module is used by the following test pipelines. They are a further source of examples of the module’s usage.