json
class JsonDocumentType(*args, **kwargs)
Bases: pimlico.datatypes.corpora.data_points.RawDocumentType
Very simple document corpus in which each document is a JSON object.
formatters = [('json', 'pimlico.datatypes.corpora.formatters.json.JsonFormatter')]
data_point_type_supports_python2 = True
class Document(data_point_type, raw_data=None, internal_data=None, metadata=None)
Bases: pimlico.datatypes.corpora.data_points.Document
Document class for JsonDocumentType.
keys = ['data']
raw_to_internal(raw_data)
Take a bytes object containing the raw data for a document, read in from disk, and produce a dictionary containing all the processed data in the document’s internal format.
You will often want to call the super method and replace values or add to the dictionary. Whatever you do, make sure that all the internal data that the super type provides is also provided here, so that all of its properties and methods work.
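The override pattern described above can be sketched roughly as follows. This is a minimal illustration, not Pimlico's actual implementation: RawDocumentSketch and JsonDocumentSketch are hypothetical stand-ins for the real document classes, and the UTF-8 encoding and the "raw_data" key supplied by the parent are assumptions. The "data" key mirrors the keys attribute listed for this class.

```python
import json


class RawDocumentSketch:
    """Hypothetical stand-in for a parent document class (not Pimlico's own)."""

    def raw_to_internal(self, raw_data: bytes) -> dict:
        # Assumption: the parent simply keeps the raw bytes under "raw_data"
        return {"raw_data": raw_data}


class JsonDocumentSketch(RawDocumentSketch):
    """Hypothetical subclass that parses each document as a JSON object."""

    def raw_to_internal(self, raw_data: bytes) -> dict:
        # Call the super method first, so everything the parent provides is kept
        internal = super().raw_to_internal(raw_data)
        # Assumption: documents are stored on disk as UTF-8 encoded JSON text
        internal["data"] = json.loads(raw_data.decode("utf-8"))
        return internal


doc = JsonDocumentSketch()
internal = doc.raw_to_internal(b'{"title": "Example", "tokens": ["a", "b"]}')
print(sorted(internal.keys()))
print(internal["data"]["tokens"])
```

Adding to the dictionary returned by the super method, rather than building a new one, is what keeps the parent's properties and methods working on the subclass.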