Constituency parser
Path | pimlico.modules.opennlp.parse |
Executable | yes |
Constituency parsing using OpenNLP’s tools.
We run OpenNLP in the background using a Py4J wrapper, just as with the other OpenNLP wrappers.
The output format is not yet ideal: currently, each output document is a list of strings, one per sentence, each giving the OpenNLP tree output for that sentence. It would be better to use a standard constituency tree datatype that could be used generically as input to any module requiring tree input. For now, a module that takes input from the parser must itself process the strings from the OpenNLP parser output.
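As a rough illustration of the processing a downstream module would need to do, the sketch below reads one tree string into nested tuples. It assumes that each string is a single Penn Treebank-style bracketed tree and that literal brackets in the text are escaped (for example as -LRB- and -RRB-). The function names `parse_bracketed` and `leaves` and the sample sentence are hypothetical and not part of Pimlico or OpenNLP.

```python
def parse_bracketed(tree_string):
    """Parse one bracketed tree string into nested (label, children) tuples.

    Assumes Penn Treebank-style bracketing, e.g. "(TOP (S (NP ...)))".
    Children are either further (label, children) tuples or word strings.
    """
    # Pad brackets with spaces so that split() yields clean tokens
    tokens = tree_string.replace("(", " ( ").replace(")", " ) ").split()
    stack = []   # currently open nodes, innermost last
    root = None  # most recently closed top-level node
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            if not stack and root is not None:
                raise ValueError("more than one tree in string")
            # The token after an opening bracket is taken as the node label
            if i + 1 >= len(tokens) or tokens[i + 1] in ("(", ")"):
                raise ValueError("missing node label")
            node = (tokens[i + 1], [])
            if stack:
                stack[-1][1].append(node)
            stack.append(node)
            i += 2
        elif tok == ")":
            if not stack:
                raise ValueError("unbalanced closing bracket")
            root = stack.pop()
            i += 1
        else:
            if not stack:
                raise ValueError("word outside any node")
            stack[-1][1].append(tok)
            i += 1
    if stack or root is None:
        raise ValueError("unbalanced or empty tree string")
    return root


def leaves(node):
    """Return the words of a parsed tree, left to right."""
    words = []
    for child in node[1]:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


# Hypothetical sample string, not real parser output
tree = parse_bracketed("(TOP (S (NP (DT The) (NN cat)) (VP (VBD sat))))")
print(tree[0])
print(leaves(tree))
```

A module consuming the parser output could apply a function like this to each string in a document. Plain tuples keep the sketch free of dependencies; a dedicated tree class would be a natural replacement.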
Inputs
Name | Type(s) |
---|---|
documents | grouped_corpus <TokenizedDocumentType> |
Outputs
Name | Type(s) |
---|---|
trees | grouped_corpus <OpenNLPTreeStringsDocumentType> |
Options
Name | Description | Type |
---|---|---|
model | Parser model, given as a full path or a filename. If a filename is given, it is expected to be in the OpenNLP model directory (models/opennlp/) | string |
Example config
This is an example of how this module can be used in a pipeline config file.
[my_opennlp_parser_module]
type=pimlico.modules.opennlp.parse
input_documents=module_a.some_output
This example usage includes more options.
[my_opennlp_parser_module]
type=pimlico.modules.opennlp.parse
input_documents=module_a.some_output
model=en-parser-chunking.bin
Test pipelines
This module is used by the following test pipelines. They are a further source of examples of the module’s usage.