Corpus document list filter

==========  ===================================
Path        pimlico.modules.corpora.list_filter
Executable  yes
==========  ===================================

Similar to :mod:`pimlico.modules.corpora.split`, but instead of splitting the dataset at random, this module splits it according to a given list of documents: documents that appear in the list go into one set, and all remaining documents go into the other.
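
The splitting rule can be illustrated with a minimal sketch in plain Python. This is not the module's actual implementation and does not use the Pimlico API: the function ``split_by_list``, the representation of a corpus as ``(name, text)`` pairs, and the variable names are all hypothetical, chosen only to show the membership test. Which of the two outputs receives the listed documents is not stated here, so the sketch uses neutral names.

```python
from typing import Dict, Iterable, Tuple


def split_by_list(
    corpus: Iterable[Tuple[str, str]],
    doc_names: Iterable[str],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Partition (name, text) pairs by whether the name is in doc_names.

    Hypothetical helper for illustration only; not part of Pimlico.
    """
    listed = set(doc_names)  # a set makes each membership test O(1)
    listed_docs: Dict[str, str] = {}
    other_docs: Dict[str, str] = {}
    for name, text in corpus:
        if name in listed:
            listed_docs[name] = text
        else:
            other_docs[name] = text
    return listed_docs, other_docs


# Toy corpus: three documents, two of which are named in the list.
corpus = [
    ("doc_a", "first text"),
    ("doc_b", "second text"),
    ("doc_c", "third text"),
]
listed_docs, other_docs = split_by_list(corpus, ["doc_a", "doc_c"])
print(sorted(listed_docs))  # ['doc_a', 'doc_c']
print(sorted(other_docs))   # ['doc_b']
```

Unlike a random split, the result is fully determined by the list, so running the split again with the same list always yields the same two sets.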

Inputs

======  ============
Name    Type(s)
======  ============
corpus  TarredCorpus
list    StringList
======  ============

Outputs

====  ====================
Name  Type(s)
====  ====================
set1  same as input corpus
set2  same as input corpus
====  ====================