pimlico.datatypes.coref.opennlp module

Datatypes for coreference resolution output. They are based on OpenNLP’s coref output, so they include all the information OpenNLP provides. This is a slightly different set of information from the one CoreNLP provides. Currently, there’s no way to convert between the two datatypes, but in the future it should be easy to provide an adapter that carries across the information common to both (which will be sufficient for most purposes).

class CorefDocumentType(options, metadata)

Bases: pimlico.datatypes.jsondoc.JsonDocumentType

process_document(doc)

class CorefCorpus(base_dir, pipeline, **kwargs)

Bases: pimlico.datatypes.jsondoc.JsonDocumentCorpus

datatype_name = 'opennlp_coref'
data_point_type

alias of CorefDocumentType

class CorefCorpusWriter(base_dir, readable=False, **kwargs)

Bases: pimlico.datatypes.jsondoc.JsonDocumentCorpusWriter

document_to_raw_data(data)
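The reader side (CorefDocumentType.process_document) and the writer side (CorefCorpusWriter.document_to_raw_data) convert between stored JSON and in-memory documents. The following is a minimal sketch of that round trip, not Pimlico’s implementation: it assumes a document is a list of entity dicts, the function names are hypothetical, and mapping readable=True to indented JSON is an assumption based only on the parameter name.

```python
import json
from typing import List


def document_to_raw_data_sketch(entities: List[dict], readable: bool = False) -> str:
    # Assumption: readable=True asks for indented, human-readable JSON.
    return json.dumps(entities, indent=4 if readable else None)


def process_document_sketch(raw: str) -> List[dict]:
    # Inverse of the writer: parse the stored JSON back into entity dicts.
    return json.loads(raw)


doc = [{"id": 0, "mentions": [{"sentence_num": 0, "start_index": 0, "end_index": 2, "text": "Mary Smith"}]}]
compact = document_to_raw_data_sketch(doc)
pretty = document_to_raw_data_sketch(doc, readable=True)
print(process_document_sketch(compact) == process_document_sketch(pretty) == doc)  # True
```

Either form decodes to the same data, so under this assumption the flag would only affect storage size and how easy the files are to inspect by hand.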

class Entity(id, mentions, category=None, gender=None, gender_prob=None, number=None, number_prob=None)

Bases: object

get_head_word(pronouns=['i', 'you', 'he', 'she', 'it', 'we', 'they', 'me', 'him', 'her', 'us', 'them', 'myself', 'yourself', 'himself', 'herself', 'itself', 'ourself', 'ourselves', 'themselves', 'my', 'your', 'his', 'its', "it's", 'our', 'their', 'mine', 'yours', 'ours', 'theirs', 'this', 'that', 'those', 'these'])

Retrieve a head word from the entity’s mentions if possible. Returns None if no suitable head word can be found: e.g., if all mentions are pronouns.

Pronouns are filtered out using pimlico.utils.linguistic.ENGLISH_PRONOUNS by default. You can override this with the pronouns kwarg. If pronouns=None, no filtering is done.
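The filtering described above can be illustrated with a small stand-alone sketch. It is not Pimlico’s code: SketchEntity and SketchMention are hypothetical stand-ins that keep only mention text, a mention is skipped when its whole text is a pronoun, and the head of any other mention is approximated as its last word (the real method may use the mention’s head_start_index/head_end_index instead). The pronoun list is the documented default from the signature.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence

# The documented default for the pronouns argument of get_head_word.
ENGLISH_PRONOUNS = (
    "i", "you", "he", "she", "it", "we", "they", "me", "him", "her", "us", "them",
    "myself", "yourself", "himself", "herself", "itself", "ourself", "ourselves",
    "themselves", "my", "your", "his", "its", "it's", "our", "their", "mine",
    "yours", "ours", "theirs", "this", "that", "those", "these",
)


@dataclass
class SketchMention:
    # Hypothetical: only the surface text is needed for this sketch.
    text: str


@dataclass
class SketchEntity:
    mentions: List[SketchMention] = field(default_factory=list)

    def get_head_word(self, pronouns: Optional[Sequence[str]] = ENGLISH_PRONOUNS) -> Optional[str]:
        # pronouns=None disables filtering, as documented.
        blocked = set() if pronouns is None else {p.lower() for p in pronouns}
        for mention in self.mentions:
            words = mention.text.split()
            if not words:
                continue
            # Skip mentions that consist solely of a pronoun.
            if mention.text.strip().lower() in blocked:
                continue
            # Assumption: approximate the head as the mention's last word.
            return words[-1]
        # Every mention was empty or a pronoun.
        return None


entity = SketchEntity([SketchMention("He"), SketchMention("the old farmer")])
print(entity.get_head_word())               # farmer
print(entity.get_head_word(pronouns=None))  # He
```

The default is a tuple rather than a list so that the shared default argument cannot be mutated between calls.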

to_json_dict()
static from_json(json)
static from_java_object(obj)
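Entity and Mention each offer to_json_dict() and a static from_json(), and an Entity nests its mentions. A minimal sketch of that round trip follows, assuming the JSON keys mirror the constructor argument names (the actual key names are not documented here). EntitySketch and MentionSketch are hypothetical dataclasses, and the from_json argument is renamed to data so that it does not shadow the json module.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class MentionSketch:
    sentence_num: int
    start_index: int
    end_index: int
    text: str
    gender: Optional[str] = None
    gender_prob: Optional[float] = None
    number: Optional[str] = None
    number_prob: Optional[float] = None
    head_start_index: Optional[int] = None
    head_end_index: Optional[int] = None
    name_type: Optional[str] = None

    def to_json_dict(self) -> dict:
        return asdict(self)

    @staticmethod
    def from_json(data: dict) -> "MentionSketch":
        return MentionSketch(**data)


@dataclass
class EntitySketch:
    id: int
    mentions: List[MentionSketch]
    category: Optional[str] = None
    gender: Optional[str] = None
    gender_prob: Optional[float] = None
    number: Optional[str] = None
    number_prob: Optional[float] = None

    def to_json_dict(self) -> dict:
        # asdict() recurses into the nested MentionSketch objects.
        return asdict(self)

    @staticmethod
    def from_json(data: dict) -> "EntitySketch":
        fields = dict(data)
        # Rebuild the nested mentions from their plain-dict form.
        fields["mentions"] = [MentionSketch.from_json(m) for m in fields["mentions"]]
        return EntitySketch(**fields)


entity = EntitySketch(
    id=0,
    mentions=[MentionSketch(0, 0, 2, "Mary Smith"), MentionSketch(1, 0, 1, "She")],
    gender="female",
)
restored = EntitySketch.from_json(json.loads(json.dumps(entity.to_json_dict())))
print(restored == entity)  # True
```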

class Mention(sentence_num, start_index, end_index, text, gender=None, gender_prob=None, number=None, number_prob=None, head_start_index=None, head_end_index=None, name_type=None)

Bases: object

static from_json(json)
to_json_dict()
static from_java_object(obj)
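A Mention locates its span with sentence_num, start_index and end_index, plus an optional head span. This page does not say whether end_index is inclusive or whether the indices count from the start of the sentence. The sketch below assumes sentence-relative, end-exclusive token indices (Python slice semantics), so the slices need adjusting if OpenNLP’s convention differs. MentionSpan, mention_tokens and head_tokens are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class MentionSpan(NamedTuple):
    # Subset of Mention's constructor arguments relevant to locating tokens.
    sentence_num: int
    start_index: int
    end_index: int
    head_start_index: Optional[int] = None
    head_end_index: Optional[int] = None


def mention_tokens(sentences: List[List[str]], m: MentionSpan) -> List[str]:
    # Assumption: indices are sentence-relative and end_index is exclusive.
    return sentences[m.sentence_num][m.start_index:m.end_index]


def head_tokens(sentences: List[List[str]], m: MentionSpan) -> Optional[List[str]]:
    # The head indices default to None in Mention, so there may be no head span.
    if m.head_start_index is None or m.head_end_index is None:
        return None
    return sentences[m.sentence_num][m.head_start_index:m.head_end_index]


sentences = [["Mary", "Smith", "arrived", "."], ["She", "smiled", "."]]
mary = MentionSpan(sentence_num=0, start_index=0, end_index=2, head_start_index=1, head_end_index=2)
print(mention_tokens(sentences, mary))  # ['Mary', 'Smith']
print(head_tokens(sentences, mary))     # ['Smith']
```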