Tokenized corpus to ID mapper
Path | pimlico.modules.corpora.vocab_mapper |
Executable | yes |
Inputs
Name | Type(s) |
---|---|
text | TarredCorpus<TokenizedDocumentType> |
vocab | Dictionary |
Outputs
Name | Type(s) |
---|---|
ids | IntegerListsDocumentCorpus |
Options
Name | Description | Type |
---|---|---|
oov | If given, special token to map all out-of-vocabulary (OOV) tokens to. Otherwise, vocab_size+1 is used as the index for OOV tokens | string |
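
The mapping described by the `oov` option can be illustrated with a minimal sketch in plain Python. This is not the module's actual implementation: the function name `map_tokens_to_ids`, the `token2id` dict, and the sample data are all hypothetical, and the sketch assumes that a given `oov` token is itself an entry in the vocabulary.

```python
def map_tokens_to_ids(sentences, token2id, oov=None):
    """Map every token in a tokenized document to an integer ID.

    sentences: list of sentences, each a list of token strings
    token2id:  hypothetical dict standing in for the vocab input
    oov:       optional special token that absorbs all OOV tokens
    """
    if oov is not None:
        # Assumption: the special OOV token is present in the vocabulary
        oov_id = token2id[oov]
    else:
        # Fallback stated in the option description: vocab_size+1
        oov_id = len(token2id) + 1
    # Known tokens get their vocabulary ID, unknown ones get oov_id
    return [[token2id.get(tok, oov_id) for tok in sent] for sent in sentences]


vocab = {"the": 0, "cat": 1, "sat": 2, "OOV": 3}
doc = [["the", "cat", "sat"], ["the", "dog", "sat"]]

ids_with_oov = map_tokens_to_ids(doc, vocab, oov="OOV")
ids_default = map_tokens_to_ids(doc, vocab)
```

Here "dog" is not in the vocabulary, so it maps to the ID of "OOV" in the first call and to `len(vocab) + 1` in the second.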