Tokenized corpus to ID mapper

Path pimlico.modules.corpora.vocab_mapper
Executable yes

Inputs

Name Type(s)
text TarredCorpus<TokenizedDocumentType>
vocab Dictionary

Outputs

Name Type(s)
ids IntegerListsDocumentCorpus

Options

Name Description Type
oov If given, special token to map all out-of-vocabulary (OOV) tokens to. Otherwise, use vocab_size+1 as the index. string
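
Example

The mapping this module performs can be illustrated with a minimal sketch. This is not Pimlico's implementation: the function name map_tokens_to_ids, the plain dict standing in for the Dictionary input, and the choice to raise an error when the oov token is missing from the vocabulary are all assumptions made for illustration. The fallback index follows the option description above literally (vocab_size+1); the real module's indexing may differ.

```python
def map_tokens_to_ids(tokens, token2id, oov=None):
    """Map a list of tokens to integer IDs using a token -> ID dict.

    Hypothetical helper for illustration only, not part of Pimlico.
    """
    if oov is not None:
        # Assumption: the special OOV token must itself be in the vocabulary
        if oov not in token2id:
            raise ValueError("OOV token %r is not in the vocabulary" % oov)
        oov_id = token2id[oov]
    else:
        # Taken literally from the option description: vocab_size+1
        oov_id = len(token2id) + 1
    # Unknown tokens fall back to the OOV index
    return [token2id.get(tok, oov_id) for tok in tokens]


# Toy vocabulary and document (made-up data)
vocab = {"the": 0, "cat": 1, "sat": 2, "<unk>": 3}
doc = ["the", "dog", "sat"]

print(map_tokens_to_ids(doc, vocab, oov="<unk>"))  # [0, 3, 2]
print(map_tokens_to_ids(doc, vocab))               # [0, 5, 2]
```

With oov set, the unknown token "dog" shares the ID of "<unk>"; without it, "dog" receives an index outside the range of known IDs.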