OpenNLP tokenizer
Path | pimlico.modules.opennlp.tokenize |
Executable | yes |
Todo: Document this module
Todo: Replace check_runtime_dependencies() with get_software_dependencies()
Inputs
Name | Type(s) |
---|---|
text | TarredCorpus |
Outputs
Name | Type(s) |
---|---|
documents | TokenizedCorpus |
Options
Name | Description | Type |
---|---|---|
token_model | Tokenization model. Specify either a full path or just a filename. A bare filename is expected to be in the OpenNLP model directory (models/opennlp/) | string |
sentence_model | Sentence segmentation model. Specify either a full path or just a filename. A bare filename is expected to be in the OpenNLP model directory (models/opennlp/) | string |
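The path rule shared by both options can be sketched in Python as follows. This is only an illustration of the behaviour described in the table, not the module's actual code: the helper name `resolve_model_path`, the constant `OPENNLP_MODEL_DIR`, and the test for "just a filename" (no directory component) are all assumptions, and the model filenames are examples only.

```python
import os

# Assumed location of the OpenNLP model directory, taken from the option
# descriptions (models/opennlp/). How the real module locates this
# directory is not documented here.
OPENNLP_MODEL_DIR = os.path.join("models", "opennlp")


def resolve_model_path(value, model_dir=OPENNLP_MODEL_DIR):
    """Hypothetical helper: map an option value to a model file path.

    A value with a directory component is treated as a path and returned
    unchanged. A bare filename is looked up in the model directory.
    """
    if os.path.dirname(value):
        # The value already names a directory, so use it as given
        return value
    # Bare filename: expected to live in the OpenNLP model directory
    return os.path.join(model_dir, value)


if __name__ == "__main__":
    # Example values only; substitute the model files you actually use
    print(resolve_model_path("en-token.bin"))
    print(resolve_model_path("/data/models/en-sent.bin"))
```

Under these assumptions, a bare filename such as `en-token.bin` resolves to a file inside `models/opennlp/`, while a value that includes a directory is used exactly as written.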