Corpus document list filter
Path | pimlico.modules.corpora.list_filter |
Executable | yes |
Similar to split, but instead of taking a random split of the dataset, this module splits it according to a given list of documents: those in the list go into one set and the rest into another.
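The splitting behaviour described above can be sketched in plain Python. This is only an illustration of the idea, not Pimlico's implementation: the function name `split_by_name_list`, the representation of documents as `(name, content)` pairs, and the assumption that the list identifies documents by name are all hypothetical.

```python
from typing import Iterable, List, Tuple

Doc = Tuple[str, str]  # (document name, document content); an assumed representation


def split_by_name_list(docs: Iterable[Doc], names: Iterable[str]) -> Tuple[List[Doc], List[Doc]]:
    """Partition docs into those whose name is in `names` and all the others.

    Hypothetical helper for illustration only; it does not use the Pimlico API.
    """
    wanted = set(names)  # set membership keeps each lookup O(1)
    in_list: List[Doc] = []
    rest: List[Doc] = []
    for name, content in docs:
        # Each document goes to exactly one of the two output sets
        if name in wanted:
            in_list.append((name, content))
        else:
            rest.append((name, content))
    return in_list, rest


corpus = [("doc_a", "first text"), ("doc_b", "second text"), ("doc_c", "third text")]
in_list, rest = split_by_name_list(corpus, ["doc_a", "doc_c"])
```

In this sketch every document lands in exactly one of the two sets, and the original document order is preserved within each set. Names in the list that match no document are simply ignored.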
Inputs
Name | Type(s) |
---|---|
corpus | grouped_corpus |
list | string_list |
Outputs
Name | Type(s) |
---|---|
set1 | grouped corpus with input doc type |
set2 | grouped corpus with input doc type |
Example config
This is an example of how this module can be used in a pipeline config file.
[my_list_filter_module]
type=pimlico.modules.corpora.list_filter
input_corpus=module_a.some_output
input_list=module_a.some_output
Test pipelines
This module is used by the following test pipelines. They are a further source of examples of the module’s usage.