Text to character level

Path	pimlico.modules.text.char_tokenize
Executable	yes

Filter that presents text data as character-level tokenized data. The output looks exactly like a tokenized document in which each token is a single character, so you can feed it into any module that expects tokenized text. This makes it simple to train character-level models.
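To illustrate the idea, here is a minimal standalone sketch of character-level tokenization in plain Python. It does not use Pimlico's own classes and is not the module's implementation: the function name `char_tokenize` is hypothetical, and treating each line as one sentence and keeping spaces as tokens are assumptions made for the example.

```python
def char_tokenize(text):
    """Split text into lines and each line into single-character tokens.

    Hypothetical helper for illustration only. Assumes one sentence per
    line and that whitespace characters count as tokens.
    """
    return [list(line) for line in text.splitlines()]


if __name__ == "__main__":
    document = "to be\nor not"
    for sentence in char_tokenize(document):
        # Each sentence is a list of one-character strings
        print(sentence)
```

A downstream consumer that expects a list of sentences, each a list of tokens, can process this structure without knowing that the tokens are characters rather than words.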

Inputs

Name	Type(s)
corpus	TarredCorpus<TextDocumentType>

Outputs

Name	Type(s)
corpus	CharacterTokenizedDocumentTypeTarredCorpus