Tokenizer
Path | pimlico.modules.spacy.tokenize |
Executable | yes |
Tokenization using spaCy.
Inputs
Name | Type(s) |
---|---|
text | grouped_corpus <TextDocumentType> |
Outputs
Name | Type(s) |
---|---|
documents | grouped_corpus <TokenizedDocumentType> |
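The documents output is a grouped corpus of tokenized documents. To illustrate that data shape, the sketch below assumes a tokenized document is a list of sentences, each a list of token strings, stored as plain text with one sentence per line and tokens separated by spaces. That layout is an assumption about Pimlico's TokenizedDocumentType, and `serialize_tokenized` and `parse_tokenized` are hypothetical helpers, not part of Pimlico's API.

```python
from typing import List

# Hypothetical helpers illustrating an ASSUMED storage layout for a
# tokenized document: one sentence per line, tokens separated by spaces.
Sentence = List[str]


def serialize_tokenized(sentences: List[Sentence]) -> str:
    """Join tokens with spaces and sentences with newlines."""
    return "\n".join(" ".join(tokens) for tokens in sentences)


def parse_tokenized(text: str) -> List[Sentence]:
    """Inverse of serialize_tokenized: split into lines, then on spaces."""
    if not text:
        return []
    return [line.split(" ") for line in text.split("\n")]


doc = [["Pimlico", "wraps", "spaCy", "."], ["It", "tokenizes", "text", "."]]
raw = serialize_tokenized(doc)
print(raw)
assert parse_tokenized(raw) == doc  # the round trip preserves the structure
```

In this assumed layout spaces and newlines act as delimiters, so a token that itself contained a space would not survive a round trip; the sketch does not handle that case.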
Options
Name | Description | Type |
---|---|---|
model | spaCy model to use. This may be the name of a standard spaCy model or, if on_disk=T, the path to a trained model on disk. If it is not a path, the spaCy download command is run before execution. | string |
on_disk | Load the specified model from a location on disk (the model parameter gives the path). | bool |
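The way model and on_disk combine can be sketched as a small planning function. This is an illustration under stated assumptions, not Pimlico's implementation: `plan_model_loading` and `ModelPlan` are hypothetical names, and the function only builds the steps rather than running them. It targets two real spaCy entry points, the `python -m spacy download <name>` command and `spacy.load(name_or_path)`.

```python
import sys
from typing import List, NamedTuple, Optional


class ModelPlan(NamedTuple):
    download_cmd: Optional[List[str]]  # command to run before loading, or None
    load_arg: str                      # value to pass to spacy.load()


def plan_model_loading(model: str, on_disk: bool) -> ModelPlan:
    """Hypothetical sketch of how the model/on_disk options interact."""
    if on_disk:
        # model is a filesystem path to a trained model: load it as-is
        return ModelPlan(download_cmd=None, load_arg=model)
    # model names a standard spaCy model: download it first, then load by name
    cmd = [sys.executable, "-m", "spacy", "download", model]
    return ModelPlan(download_cmd=cmd, load_arg=model)


plan = plan_model_loading("en_core_web_sm", on_disk=False)
print(plan.download_cmd[1:])  # ['-m', 'spacy', 'download', 'en_core_web_sm']

# "/path/to/model" is a placeholder path, not a real location
print(plan_model_loading("/path/to/model", on_disk=True))

# Executing a plan would need spaCy installed, e.g.:
#   import subprocess, spacy
#   if plan.download_cmd:
#       subprocess.run(plan.download_cmd, check=True)
#   nlp = spacy.load(plan.load_arg)
```

Separating the plan from its execution keeps the option logic testable without spaCy or network access.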
Example config
This is an example of how this module can be used in a pipeline config file.
[my_spacy_tokenizer_module]
type=pimlico.modules.spacy.tokenize
input_text=module_a.some_output
This example usage includes more options.
[my_spacy_tokenizer_module]
type=pimlico.modules.spacy.tokenize
input_text=module_a.some_output
model=en_core_web_sm
on_disk=T
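The extended example can be read with Python's standard configparser to see how its options break down. Pimlico has its own config loader, so configparser is used here only to illustrate the structure. The sketch assumes input_text=module_a.some_output follows a module.output reference syntax, and `parse_bool` is a hypothetical helper that assumes T/F-style boolean values like the one in the example.

```python
import configparser

# The extended example config above, verbatim.
CONFIG = """
[my_spacy_tokenizer_module]
type=pimlico.modules.spacy.tokenize
input_text=module_a.some_output
model=en_core_web_sm
on_disk=T
"""


def parse_bool(value: str) -> bool:
    """Hypothetical boolean parser accepting T/F-style values."""
    v = value.strip().lower()
    if v in {"t", "true", "yes", "y", "1"}:
        return True
    if v in {"f", "false", "no", "n", "0"}:
        return False
    raise ValueError(f"not a boolean: {value!r}")


parser = configparser.ConfigParser()
parser.read_string(CONFIG)
section = parser["my_spacy_tokenizer_module"]

# Assumed module.output syntax: split the reference into its two parts
source_module, source_output = section["input_text"].split(".", 1)
on_disk = parse_bool(section["on_disk"])

print(section["type"])               # pimlico.modules.spacy.tokenize
print(source_module, source_output)  # module_a some_output
print(section["model"], on_disk)     # en_core_web_sm True
```

Per the option description, on_disk=T means the model value is treated as a path on disk, so in this example en_core_web_sm would be looked up as a location on disk rather than downloaded.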
Example pipelines
This module is used by the following example pipelines. They are examples of how the module can be used together with other modules in a larger pipeline.
Test pipelines
This module is used by the following test pipelines. They are a further source of examples of the module’s usage.