Text parser
Path | pimlico.modules.spacy.parse_text |
Executable | yes |
Parsing using spaCy
Runs the entire parsing pipeline, starting from raw text, with a single spaCy model.
The word annotations in the output contain the information produced by the spaCy parser, and each document is split into sentences according to spaCy's sentence segmentation.
The annotation fields follow those produced by the Malt parser: pos, head and deprel.
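To make the field mapping concrete, here is a minimal sketch of turning spaCy-style parsed sentences into per-word pos, head and deprel annotations. It is not Pimlico's implementation: the extra `word` key, the use of the fine-grained tag (`tag_`) for pos, and the CoNLL-style head numbering (1-based within the sentence, 0 for the root) are assumptions. The function only reads token attributes that spaCy provides (`text`, `tag_`, `dep_`, `i`, `head`), so it can be tried with the small hypothetical `Tok` stand-in class instead of a downloaded model.

```python
from typing import Dict, Iterable, List, Sequence


def to_malt_fields(sentences: Iterable[Sequence]) -> List[List[Dict[str, object]]]:
    """Convert parsed sentences into Malt-style word annotations (sketch).

    Each sentence is a sequence of spaCy-like tokens exposing .text, .tag_,
    .dep_, .i (index within the whole document) and .head (the head token;
    the sentence root is its own head, as in spaCy).
    """
    annotated = []
    for sent in sentences:
        offset = sent[0].i  # document index of the sentence's first token
        words = []
        for tok in sent:
            if tok.head.i == tok.i:
                head = 0  # root token: assumed CoNLL/Malt convention
            else:
                head = tok.head.i - offset + 1  # 1-based, relative to the sentence
            words.append({"word": tok.text, "pos": tok.tag_,
                          "head": head, "deprel": tok.dep_})
        annotated.append(words)
    return annotated


class Tok:
    """Hypothetical stand-in for a spaCy Token, for trying the sketch without a model."""

    def __init__(self, i, text, tag, dep):
        self.i, self.text, self.tag_, self.dep_ = i, text, tag, dep
        self.head = self  # a token with no other head is treated as the root


# "Dogs bark ." annotated by hand: "bark" is the root, the other tokens attach to it.
dogs, bark, stop = Tok(0, "Dogs", "NNS", "nsubj"), Tok(1, "bark", "VBP", "ROOT"), Tok(2, ".", ".", "punct")
dogs.head = bark
stop.head = bark
result = to_malt_fields([[dogs, bark, stop]])
print([w["head"] for w in result[0]])  # [2, 0, 2]
```

With a real spaCy pipeline, passing `doc.sents` should work the same way, since sentence spans support indexing and iteration over tokens.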
Inputs
Name | Type(s) |
---|---|
text | grouped_corpus <RawTextDocumentType> |
Outputs
Name | Type(s) |
---|---|
parsed | grouped_corpus <WordAnnotationsDocumentType> |
Options
Name | Description | Type |
---|---|---|
model | spaCy model to use. This may be the name of a standard spaCy model or, if on_disk=T, the path to a trained model on disk. If it is a model name rather than a path, the spaCy download command is run before execution | string |
on_disk | Load the specified model from a location on disk (the model parameter gives the path) | bool |
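The interplay between `model` and `on_disk` can be sketched as follows. These are hypothetical helpers, not Pimlico's code: they assume the download step is equivalent to running `python -m spacy download <model>` with the current interpreter, and that `spacy.load` is then given either the model name or the on-disk path.

```python
import subprocess
import sys
from typing import List, Optional


def download_command(model: str, on_disk: bool) -> Optional[List[str]]:
    """Return the spaCy download command to run first, or None when loading from disk."""
    if on_disk:
        return None  # model is a filesystem path: nothing to download
    return [sys.executable, "-m", "spacy", "download", model]


def load_spacy_model(model: str, on_disk: bool):
    """Download the model if needed, then load it (requires spaCy to be installed)."""
    cmd = download_command(model, on_disk)
    if cmd is not None:
        subprocess.run(cmd, check=True)  # raises CalledProcessError if the download fails
    import spacy  # imported lazily so download_command works without spaCy installed

    # spacy.load accepts either an installed package name or a path to a model directory
    return spacy.load(model)
```

Keeping command construction separate from execution makes the branching easy to check without network access or a spaCy installation.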
Example config
This is an example of how this module can be used in a pipeline config file.
```ini
[my_spacy_text_parser_module]
type=pimlico.modules.spacy.parse_text
input_text=module_a.some_output
```
This example usage includes more options.
```ini
[my_spacy_text_parser_module]
type=pimlico.modules.spacy.parse_text
input_text=module_a.some_output
model=en_core_web_sm
on_disk=T
```
Test pipelines
This module is used by the following test pipelines. They are a further source of examples of the module’s usage.