pimlico.datatypes.caevo module
class CaevoCorpus(base_dir, pipeline, **kwargs)
Bases: pimlico.datatypes.tar.TarredCorpus
Datatype for Caevo output. The output is stored exactly as Caevo produces it, in an XML format. This datatype reads that XML and provides easy access to its components.
Since we simply store the XML that comes from Caevo, there’s no corresponding corpus writer. The data is written using a pimlico.datatypes.tar.TarredCorpusWriter.
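As a rough illustration of what reading stored XML and exposing its components can look like, here is a minimal sketch using only Python's standard library. It does not use Pimlico's API. The element and attribute names in SAMPLE_XML (root, file, entry, sentence, events, event, string, tense) are assumptions made for the example and may not match Caevo's actual schema, and extract_events is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML for illustration only: the tag and attribute names
# here are assumptions, not Caevo's confirmed output format.
SAMPLE_XML = """<root>
  <file name="doc1.txt">
    <entry sid="0">
      <sentence>The company said profits rose.</sentence>
      <events>
        <event id="e1" string="said" tense="PAST"/>
        <event id="e2" string="rose" tense="PAST"/>
      </events>
    </entry>
  </file>
</root>"""


def extract_events(xml_text):
    """Return one (sentence text, [event strings]) pair per <entry>."""
    root = ET.fromstring(xml_text)
    results = []
    for entry in root.iter("entry"):
        # findtext returns the text of the first matching child element
        sentence = entry.findtext("sentence", default="")
        # iter("event") matches only elements tagged exactly "event"
        events = [ev.get("string") for ev in entry.iter("event")]
        results.append((sentence, events))
    return results


if __name__ == "__main__":
    for sentence, events in extract_events(SAMPLE_XML):
        print(sentence, events)
```

Because the XML is kept verbatim, a reader of this kind can be adjusted to whatever structure the stored documents really have without touching the stored data.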
data_point_type
alias of CaevoDocumentType