NLTK NIST tokenizer

Path	pimlico.modules.nltk.nist_tokenize
Executable	yes

Sentence splitting and tokenization using the NLTK NIST tokenizer.

Inputs

Name	Type(s)
text	TarredCorpus<RawTextDocumentType>

Outputs

Name	Type(s)
documents	TokenizedCorpus

Options

Name	Description	Type
lowercase	Lowercase all output. Default: False	bool
non_european	Use the tokenizer’s international_tokenize() method instead of tokenize(). Default: False	bool
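
The two options can be read as a choice of method plus a keyword argument. The following is a minimal sketch of that dispatch, not the module's actual implementation: `tokenize_with_options` and `StandInTokenizer` are hypothetical names introduced here, and the stand-in only splits on whitespace so that the example runs without NLTK or its data files. With NLTK installed, an instance of `nltk.tokenize.nist.NISTTokenizer` could be passed instead, since both its `tokenize()` and `international_tokenize()` methods accept a `lowercase` keyword.

```python
def tokenize_with_options(tokenizer, text, lowercase=False, non_european=False):
    """Tokenize text, choosing the method according to the non_european option.

    Assumption: `tokenizer` exposes tokenize() and international_tokenize(),
    both accepting a `lowercase` keyword, as NLTK's NISTTokenizer does.
    """
    if non_european:
        method = tokenizer.international_tokenize
    else:
        method = tokenizer.tokenize
    return method(text, lowercase=lowercase)


class StandInTokenizer:
    """Hypothetical stand-in with the same two method names.

    It only splits on whitespace; it does not reproduce NIST tokenization.
    """

    def __init__(self):
        self.calls = []  # records which method was used, for illustration

    def tokenize(self, text, lowercase=False):
        self.calls.append("tokenize")
        return (text.lower() if lowercase else text).split()

    def international_tokenize(self, text, lowercase=False):
        self.calls.append("international_tokenize")
        return (text.lower() if lowercase else text).split()


if __name__ == "__main__":
    tok = StandInTokenizer()
    # Defaults: lowercase=False, non_european=False
    print(tokenize_with_options(tok, "Good muffins cost more"))
    # Both options switched on
    print(tokenize_with_options(tok, "Good muffins cost more",
                                lowercase=True, non_european=True))
    print(tok.calls)
```

Passing the tokenizer in as an argument keeps the sketch independent of NLTK's data downloads. The real tokenizer would also split punctuation and symbols, which the stand-in does not attempt.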