Corpus document list filter
Path | pimlico.modules.corpora.list_filter |
Executable | yes |
Similar to pimlico.modules.corpora.split, but instead of splitting the dataset at random, this module splits it according to a given list of documents: the documents named in the list go into one set and all remaining documents go into the other.
Inputs
Name | Type(s) |
---|---|
corpus | TarredCorpus |
list | StringList |
Outputs
Name | Type(s) |
---|---|
set1 | same as input corpus |
set2 | same as input corpus |
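The splitting behaviour described above can be illustrated with a minimal, standalone Python sketch. It does not use Pimlico's own API: the function `split_by_list`, the `(name, text)` document representation and the sample data are all hypothetical, chosen only to show the partitioning logic. Which of the two resulting groups corresponds to set1 and which to set2 is not stated here, so the sketch simply calls them `listed` and `rest`.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical document representation: (document name, document text).
Document = Tuple[str, str]


def split_by_list(
    corpus: Iterable[Document], names: Iterable[str]
) -> Tuple[List[Document], List[Document]]:
    """Partition a corpus into documents named in `names` and all others."""
    wanted: Set[str] = set(names)  # set gives constant-time membership tests
    listed: List[Document] = []
    rest: List[Document] = []
    for doc_name, text in corpus:
        # Route each document by whether its name appears in the list;
        # the original corpus order is preserved within each group.
        if doc_name in wanted:
            listed.append((doc_name, text))
        else:
            rest.append((doc_name, text))
    return listed, rest


# Illustrative data only (not taken from any real corpus).
corpus = [("doc1", "first text"), ("doc2", "second text"), ("doc3", "third text")]
listed, rest = split_by_list(corpus, ["doc1", "doc3"])
print([name for name, _ in listed])
print([name for name, _ in rest])
```

Unlike a random split, the result here is fully determined by the contents of the list, so the same list always yields the same two sets.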