OpenNLP tokenizer

Path pimlico.modules.opennlp.tokenize
Executable yes

Todo

Document this module

Todo

Replace check_runtime_dependencies() with get_software_dependencies()

Inputs

Name Type(s)
text TarredCorpus

Outputs

Name Type(s)
documents TokenizedCorpus

Options

Name             Type     Description
token_model      string   Tokenization model. Specify a full path, or just a filename. If only a filename is given, the model is expected to be in the OpenNLP model directory (models/opennlp/).
sentence_model   string   Sentence segmentation model. Specify a full path, or just a filename. If only a filename is given, the model is expected to be in the OpenNLP model directory (models/opennlp/).
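
The lookup rule described for both model options can be sketched as follows. This is a minimal illustration, not the module's actual code: the helper name resolve_model_path is hypothetical, and treating "a value with no directory component" as "just a filename" is an assumption. The filenames in the usage note are only examples.

```python
import os

# Default model directory, taken from the option descriptions above.
OPENNLP_MODEL_DIR = os.path.join("models", "opennlp")


def resolve_model_path(value, model_dir=OPENNLP_MODEL_DIR):
    """Return the path to load for a token_model / sentence_model value.

    Hypothetical helper: a value with no directory component is treated
    as a bare filename and looked up in model_dir; any other value is
    assumed to be a full path and is returned unchanged.
    """
    if os.path.basename(value) == value:
        # Bare filename: look it up in the model directory.
        return os.path.join(model_dir, value)
    # Value already contains a directory component: use it as given.
    return value
```

Under these assumptions, resolve_model_path("en-token.bin") yields a path inside models/opennlp/, while a value such as "/data/models/en-sent.bin" is passed through untouched.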