json

class JsonDocumentType(*args, **kwargs)[source]

Bases: pimlico.datatypes.corpora.data_points.RawDocumentType

Very simple document corpus in which each document is a JSON object.

formatters = [('json', 'pimlico.datatypes.corpora.formatters.json.JsonFormatter')]
data_point_type_supports_python2 = True
class Document(data_point_type, raw_data=None, internal_data=None, metadata=None)[source]

Bases: pimlico.datatypes.corpora.data_points.Document

Document class for JsonDocumentType

keys = ['data']
raw_to_internal(raw_data)[source]

Take a bytes object containing the raw data for a document, read in from disk, and produce a dictionary containing all the processed data in the document’s internal format.

You will often want to call the super method and replace values or add to the dictionary. Whatever you do, make sure that all the internal data that the super type provides is also provided here, so that all of its properties and methods work.

internal_to_raw(internal_data)[source]

Take a dictionary containing all the document’s data in its internal format and produce a bytes object containing all that data, which can be written out to disk.