pimlico.datatypes.floats module

Similar to pimlico.datatypes.ints, but for lists of floats.

class FloatListsDocumentType(options, metadata)

Bases: pimlico.datatypes.documents.RawDocumentType

formatters = [('float_lists', 'pimlico.datatypes.floats.FloatListsFormatter')]
process_document(data)
read_rows(reader)

class FloatListsFormatter(corpus)

Bases: pimlico.cli.browser.formatter.DocumentBrowserFormatter

DATATYPE

alias of FloatListsDocumentType

format_document(doc)

Format a single document and return the result as a string (or unicode, but it will be converted to ASCII for display).

Must be overridden by subclasses.
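
As a rough illustration of what an overriding formatter might return, the sketch below renders a document's float lists as one line of text per list. The function name `format_float_lists` and the `precision` parameter are hypothetical and not part of Pimlico; the sketch also assumes a document is available as a plain list of lists of floats.

```python
def format_float_lists(rows, precision=4):
    """Render each list of floats on its own line, values separated by commas.

    Hypothetical helper for illustration only; not part of Pimlico.
    """
    spec = ".{}f".format(precision)
    return "\n".join(
        ", ".join(format(value, spec) for value in row) for row in rows
    )


text = format_float_lists([[1.0, 2.5], [3.14159]])
```

A real subclass would do something similar inside its own format_document(doc) and return the resulting string.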

class FloatListsDocumentCorpus(base_dir, pipeline, **kwargs)

Bases: pimlico.datatypes.tar.TarredCorpus

Corpus of float list data: each document contains lists of floats. Unlike IntegerTableDocumentCorpus, the lists are not constrained to all have the same length. The downside is that the storage format (and probably I/O speed) isn’t quite as efficient. It’s still better than storing the floats as strings or JSON objects.
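
One way to store lists of differing lengths in binary is to prefix each list with its length. The sketch below shows that idea with Python's struct module. The layout used here (an unsigned 16-bit length followed by little-endian doubles) is an assumption chosen for illustration and may not match the format Pimlico actually writes; the helper names `write_rows` and `parse_rows` are hypothetical.

```python
import io
import struct

# Assumed layout, for illustration only: each row is an unsigned 16-bit
# length followed by that many little-endian C doubles.
LENGTH = struct.Struct("<H")


def write_rows(rows):
    """Serialize a list of float lists into a single bytes object."""
    buf = io.BytesIO()
    for row in rows:
        buf.write(LENGTH.pack(len(row)))
        buf.write(struct.pack("<%dd" % len(row), *row))
    return buf.getvalue()


def parse_rows(data):
    """Inverse of write_rows: recover the list of float lists."""
    reader = io.BytesIO(data)
    rows = []
    while True:
        header = reader.read(LENGTH.size)
        if not header:
            break  # end of data
        (count,) = LENGTH.unpack(header)
        rows.append(list(struct.unpack("<%dd" % count, reader.read(8 * count))))
    return rows


rows = [[1.5, 2.25], [], [3.0, 4.0, 5.0]]
raw = write_rows(rows)
```

The per-row length prefix is the overhead a fixed-width table would not need, which is one plausible reason this kind of format is slightly less efficient.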

The floats are stored as C doubles, which use 8 bytes each. At the moment, we don’t provide any way to change this. An alternative would be to use C floats, losing precision but (almost) halving storage size.
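
The trade-off between C doubles and C floats can be seen directly with Python's struct module. This is a small sketch that is independent of Pimlico's own reading and writing code; the sample values are arbitrary.

```python
import struct

values = [0.1, 1.0 / 3.0, 2.5]

# 'd' is a C double (8 bytes), 'f' is a C float (4 bytes).
as_doubles = struct.pack("<%dd" % len(values), *values)
as_floats = struct.pack("<%df" % len(values), *values)

from_doubles = list(struct.unpack("<%dd" % len(values), as_doubles))
from_floats = list(struct.unpack("<%df" % len(values), as_floats))

# Doubles round-trip exactly. Floats only do so for values that are
# representable in 32 bits (such as 2.5); others pick up a small error.
errors = [abs(a - b) for a, b in zip(values, from_floats)]
```

The payload itself halves exactly (24 bytes versus 12 here). Any fixed per-list or per-document overhead would not shrink, which may be why the saving is described as "almost" half.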

datatype_name = 'float_lists_corpus'
data_point_type

alias of FloatListsDocumentType

class FloatListsDocumentCorpusWriter(base_dir, **kwargs)

Bases: pimlico.datatypes.tar.TarredCorpusWriter

document_to_raw_data(data)

class FloatListDocumentType(options, metadata)

Bases: pimlico.datatypes.documents.RawDocumentType

Like FloatListsDocumentType, but each document is treated as a single list of floats.

process_document(data)
read_floats(reader)

class FloatListDocumentCorpus(base_dir, pipeline, **kwargs)

Bases: pimlico.datatypes.tar.TarredCorpus

Corpus of float data: each doc contains a single sequence of floats.

The floats are stored as C doubles, using 8 bytes each.
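
When a document is a single sequence of fixed-size values, the number of floats can be recovered from the byte length alone. The sketch below assumes little-endian doubles with no header, which may differ from the actual on-disk layout; the helper names `floats_to_bytes` and `bytes_to_floats` are hypothetical.

```python
import struct

DOUBLE_SIZE = 8  # bytes per C double


def floats_to_bytes(values):
    """Pack a sequence of floats as consecutive little-endian doubles."""
    return struct.pack("<%dd" % len(values), *values)


def bytes_to_floats(data):
    """Unpack consecutive little-endian doubles into a list of floats."""
    count, remainder = divmod(len(data), DOUBLE_SIZE)
    if remainder:
        raise ValueError("data length is not a multiple of 8 bytes")
    return list(struct.unpack("<%dd" % count, data))


raw = floats_to_bytes([0.25, -1.0, 1e10])
values = bytes_to_floats(raw)
```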

datatype_name = 'float_list_corpus'
data_point_type

alias of FloatListDocumentType

class FloatListDocumentCorpusWriter(base_dir, **kwargs)

Bases: pimlico.datatypes.tar.TarredCorpusWriter

document_to_raw_data(data)