pimlico.datatypes.jsondoc module
class pimlico.datatypes.jsondoc.JsonDocumentCorpus(base_dir, pipeline, raw_data=False)
Bases: pimlico.datatypes.tar.TarredCorpus
Very simple document corpus in which each document is a JSON object.
data_point_type
alias of JsonDocumentType
datatype_name = 'json'

class pimlico.datatypes.jsondoc.JsonDocumentCorpusWriter(base_dir, readable=False, **kwargs)
Bases: pimlico.datatypes.tar.TarredCorpusWriter
If readable=True, JSON text output will be nicely formatted so that it’s human-readable. Otherwise, it will be formatted to take up less space.
document_to_raw_data(data)
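
The effect of the readable option on JsonDocumentCorpusWriter can be illustrated with the standard-library json module. This is only a sketch of the idea, not Pimlico's implementation: the function name to_raw_json and the specific formatting arguments (indent=4, sort_keys=True, compact separators) are assumptions chosen for illustration, and the writer's real document_to_raw_data may format its output differently.

```python
import json


def to_raw_json(data, readable=False):
    """Hypothetical stand-in for a writer's serialization step.

    Assumption: readable output is indented, and non-readable output
    drops all optional whitespace. The real writer may differ.
    """
    if readable:
        # Indented, key-sorted output that is easy to inspect by eye
        return json.dumps(data, indent=4, sort_keys=True)
    # No spaces after separators, so the text takes up less space
    return json.dumps(data, separators=(",", ":"))


# A document is a single JSON object (a dict on the Python side)
doc = {"title": "Example", "tokens": ["a", "b"]}

compact = to_raw_json(doc)
pretty = to_raw_json(doc, readable=True)

# Both forms decode back to the same JSON object
restored = json.loads(pretty)
```

Both strings carry the same document; the choice only trades disk space against ease of manual inspection.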