pimlico.datatypes.caevo module

class CaevoCorpus(base_dir, pipeline, **kwargs)[source]

Bases: pimlico.datatypes.tar.TarredCorpus

Datatype for Caevo output. The output is stored exactly as it comes out from Caevo, in an XML format. This datatype reads in that XML and provides easy access to its components.

Since we simply store the XML that comes from Caevo, there’s no corresponding corpus writer. The data is output using a :class:pimlico.datatypes.tar.TarredCorpusWriter.

data_point_type

alias of CaevoDocumentType