OpenNLP coreference resolution

Path pimlico.modules.opennlp.coreference
Executable yes

Todo

Document this module

Todo

Replace check_runtime_dependencies() with get_software_dependencies()

Use the local config setting opennlp_memory to limit the Java heap memory available to the OpenNLP processes. When parallelizing, this limit is shared between the processes: each OpenNLP worker gets a memory limit of opennlp_memory / processes. The setting accepts the suffixes g, G, m, M, k and K, as in the Java setting.
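The division described above can be illustrated with a small sketch. This is not the module's own code: the function names parse_memory and per_worker_heap are hypothetical, and two details are assumptions, namely that the suffixes are powers of 1024 and that the per-worker share is rounded down to whole megabytes.

```python
import re

# Multipliers for the suffixes the setting accepts (g, G, m, M, k, K).
# Assumption: binary units (powers of 1024).
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_memory(setting):
    """Convert a Java-style memory string such as '4G' into a byte count."""
    match = re.fullmatch(r"(\d+)([gGmMkK]?)", setting.strip())
    if match is None:
        raise ValueError("unrecognised memory setting: %r" % setting)
    number, suffix = match.groups()
    # No suffix means the number is already in bytes.
    return int(number) * _UNITS.get(suffix.lower(), 1)


def per_worker_heap(setting, processes):
    """Return one worker's share of the limit, in whole megabytes."""
    if processes < 1:
        raise ValueError("processes must be at least 1")
    share = parse_memory(setting) // processes
    # Assumption: round down to whole megabytes.
    return "%dm" % (share // _UNITS["m"])


print(per_worker_heap("4G", 4))    # 1024m
print(per_worker_heap("512M", 2))  # 256m
```

Under these assumptions, opennlp_memory=4G with 4 processes leaves each worker with 1024m, so a limit that is comfortable for a single process may need to be raised when the process count goes up.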

Inputs

Name Type(s)
parses TarredCorpus<TreeStringsDocumentType>

Outputs

Name Type(s)
coref CorefCorpus

Options

Name Description Type
gzip If True, each document's output (except annotations) is gzipped. This can reduce the storage used by, e.g., parser or coref output. Default: False bool
model Coreference resolution model, given as a full path or a directory name. If only a name is given, it is expected to be in the OpenNLP model directory (models/opennlp/). Default: '' (standard English OpenNLP model in models/opennlp/) string
readable If True, pretty-print the JSON output so that it is human-readable. Default: False bool
timeout Timeout in seconds for each individual coref resolution task. If it is exceeded, an InvalidDocument is returned for that document. int
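
The behaviour of the timeout, readable and gzip options can be sketched in plain Python. This is an illustration only, not the module's implementation: InvalidDocumentPlaceholder, resolve_with_timeout and encode_output are hypothetical names, and the thread-based timeout is an assumption (a thread cannot be killed, so a task that overruns keeps running in the background, which may differ from how the module handles its Java processes).

```python
import gzip
import json
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


class InvalidDocumentPlaceholder:
    """Hypothetical stand-in for the InvalidDocument marker named above."""

    def __init__(self, reason):
        self.reason = reason


def resolve_with_timeout(task, timeout):
    """Run task() in a worker thread and give up after `timeout` seconds."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(task)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        # The document is marked invalid instead of aborting the whole run.
        return InvalidDocumentPlaceholder("timed out after %s seconds" % timeout)
    finally:
        # Do not block on a task that may still be running.
        executor.shutdown(wait=False)


def encode_output(result, readable=False, use_gzip=False):
    """Serialise a result as JSON bytes, optionally indented and gzipped."""
    text = json.dumps(result, indent=4 if readable else None)
    data = text.encode("utf-8")
    return gzip.compress(data) if use_gzip else data


# Usage: a fast task returns its result, which is then encoded for storage.
result = resolve_with_timeout(lambda: {"entities": []}, timeout=1.0)
stored = encode_output(result, readable=True, use_gzip=True)
```

Returning a placeholder rather than raising lets the remaining documents in the corpus be processed even when one of them takes too long.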