pimlico.datatypes.vrt module
-
class VRTWord(word, *attributes)
Bases: object
Word with all its annotations.
The Korp docs give the following example list of positional attributes (columns):
word form, the number of the token within the sentence, lemma, lemma with compound boundaries marked, part of speech, morphological analysis, dependency head number and dependency relation.
However, these attributes are not fixed: different files may have different numbers of attributes with different meanings. This information is not included in the data file.
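As a rough illustration of the positional attributes described above, the following sketch splits a single token line into the word form and its remaining columns. It assumes the usual VRT layout of one token per line with tab-separated columns; `parse_vrt_token_line` and the sample line are hypothetical and are not part of Pimlico.

```python
from typing import List, Tuple


def parse_vrt_token_line(line: str) -> Tuple[str, List[str]]:
    """Split one tab-separated token line into (word form, other attributes).

    Hypothetical helper for illustration only, not part of Pimlico.
    """
    fields = line.rstrip("\r\n").split("\t")
    # The first column is the word form; the meaning of the remaining
    # columns depends on the corpus and is not recorded in the file.
    return fields[0], fields[1:]


# Made-up token line following the example column order from the Korp docs
example_line = "dogs\t2\tdog\tdog\tN\tNUM_Pl\t3\tsubj\n"
word, attributes = parse_vrt_token_line(example_line)
print(word)
print(attributes)
```

The resulting pair has the same shape as the `(word, *attributes)` constructor signature shown above, so a caller would presumably unpack the attribute list when building a `VRTWord`.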
-
class VRTText(words, paragraph_ranges=[], sentence_ranges=[], opening_tag=None)
Bases: object
Contains a single VRT text (i.e. document).
Note that VRT’s structures are not hierarchical: they can be overlapping. See VRT docs.
We don’t currently process structural attributes. This can easily be added later if necessary.
-
paragraphs
-
sentences
-
word_strings
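The relationship between the flat word list and the range arguments can be sketched as follows. This is only a guess at the representation: it assumes each range is a half-open `(start, end)` pair of indices into the word list, which the documentation above does not confirm. `slice_by_ranges` is a hypothetical helper, not Pimlico's implementation.

```python
from typing import List, Sequence, Tuple


def slice_by_ranges(
    words: Sequence[str], ranges: Sequence[Tuple[int, int]]
) -> List[List[str]]:
    """Return one list of words per (start, end) range.

    Assumes half-open index ranges; hypothetical, for illustration only.
    """
    return [list(words[start:end]) for start, end in ranges]


words = ["The", "cat", "sat", ".", "It", "purred", "."]
sentence_ranges = [(0, 4), (4, 7)]   # assumed (start, end) index pairs
paragraph_ranges = [(0, 7)]

sentences = slice_by_ranges(words, sentence_ranges)
paragraphs = slice_by_ranges(words, paragraph_ranges)
print(sentences)
print(paragraphs)
```

Storing structures as independent ranges over one flat word list, rather than as nested containers, allows structures to overlap, which fits the note that VRT's structures are not hierarchical.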
-
-
class VRTDocumentType(options, metadata)
Bases: pimlico.datatypes.documents.DataPointType
Document type for annotated text documents read in from VRT files (VeRticalized Text, as used by Korp).
-
formatters = [('vrt', 'pimlico.datatypes.vrt.VRTFormatter')]
-
-
class VRTFormatter(corpus)
Bases: pimlico.cli.browser.formatter.DocumentBrowserFormatter
-
DATATYPE
alias of VRTDocumentType
-