pimlico.datatypes.parse.dependency module

class pimlico.datatypes.parse.dependency.StanfordDependencyParseCorpus(base_dir, pipeline, raw_data=False)

Bases: pimlico.datatypes.jsondoc.JsonDocumentCorpus

process_document(data)
datatype_name = 'stanford_dependency_parses'

class pimlico.datatypes.parse.dependency.StanfordDependencyParseCorpusWriter(base_dir, readable=False, **kwargs)

Bases: pimlico.datatypes.jsondoc.JsonDocumentCorpusWriter

document_to_raw_data(data)

class pimlico.datatypes.parse.dependency.CoNLLDependencyParseCorpus(base_dir, pipeline)

Bases: pimlico.datatypes.word_annotations.WordAnnotationCorpus

The 10-field CoNLL dependency parse format (conllx), i.e. the format produced after parsing.

Fields are:
id (int), word form, lemma, coarse POS, POS, features, head (int), dep relation, phead (int), pdeprel

The last two are usually not used.
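As an illustration of the field layout described above, here is a minimal sketch of reading one sentence in this format. It assumes tab-separated columns and "_" for unused fields, as in the standard CoNLL-X convention. The function name and dictionary keys are hypothetical and are not part of Pimlico's API.

```python
from typing import Dict, List

# Column names follow the field list above; the keys are illustrative only
CONLLX_FIELDS = [
    "id", "word", "lemma", "cpos", "pos",
    "feats", "head", "deprel", "phead", "pdeprel",
]
INT_FIELDS = {"id", "head", "phead"}


def parse_conllx_sentence(text: str) -> List[Dict[str, object]]:
    """Parse one sentence of tab-separated 10-field lines into token dicts."""
    tokens = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        values = line.split("\t")
        if len(values) != len(CONLLX_FIELDS):
            raise ValueError("expected 10 fields, got %d: %r" % (len(values), line))
        token = {}
        for name, value in zip(CONLLX_FIELDS, values):
            if name in INT_FIELDS:
                # "_" marks an unused field, which is common for phead
                token[name] = None if value == "_" else int(value)
            else:
                token[name] = value
        tokens.append(token)
    return tokens


example = (
    "1\tDogs\tdog\tN\tNNS\t_\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tV\tVBP\t_\t0\troot\t_\t_"
)
parsed = parse_conllx_sentence(example)
```

Here `parsed` holds one dictionary per token, with `head` as an integer and the unused `phead` as `None`.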

process_document(data)
datatype_name = 'conll_dependency_parses'

class pimlico.datatypes.parse.dependency.CoNLLDependencyParseCorpusWriter(base_dir, **kwargs)

Bases: pimlico.datatypes.word_annotations.WordAnnotationCorpusWriter

document_to_raw_data(data)

class pimlico.datatypes.parse.dependency.CoNLLDependencyParseInputCorpus(base_dir, pipeline)

Bases: pimlico.datatypes.word_annotations.WordAnnotationCorpus

The version of the CoNLL format (conllx) that has only the first 6 columns, i.e. with no dependency parse annotated yet.
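To make the relationship between the two formats concrete, the sketch below reduces 10-field data to the 6-column input form by dropping the head, dep relation, phead and pdeprel columns. It assumes tab-separated columns and blank lines between sentences. The function name is hypothetical and is not part of Pimlico's API.

```python
def to_parser_input(conllx_text: str) -> str:
    """Keep only the first six tab-separated columns of each token line."""
    out_lines = []
    for line in conllx_text.splitlines():
        if not line.strip():
            out_lines.append("")  # preserve blank lines between sentences
            continue
        columns = line.split("\t")
        if len(columns) < 6:
            raise ValueError("expected at least 6 fields: %r" % line)
        out_lines.append("\t".join(columns[:6]))
    return "\n".join(out_lines)


full = "1\tDogs\tdog\tN\tNNS\t_\t2\tnsubj\t_\t_\n2\tbark\tbark\tV\tVBP\t_\t0\troot\t_\t_"
reduced = to_parser_input(full)
```

Each line of `reduced` then carries only id, word form, lemma, coarse POS, POS and features.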

process_document(data)
datatype_name = 'conll_dependency_parse_inputs'

class pimlico.datatypes.parse.dependency.CoNLLDependencyParseInputCorpusWriter(base_dir, **kwargs)

Bases: pimlico.datatypes.word_annotations.WordAnnotationCorpusWriter

document_to_raw_data(data)