pimlico.datatypes.floats module
Similar to pimlico.datatypes.ints, but for lists of floats.
-
class FloatListsDocumentType(options, metadata)
Bases: pimlico.datatypes.documents.RawDocumentType
-
formatters = [('float_lists', 'pimlico.datatypes.floats.FloatListsFormatter')]
-
-
class FloatListsFormatter(corpus)
Bases: pimlico.cli.browser.formatter.DocumentBrowserFormatter
-
DATATYPE: alias of FloatListsDocumentType
-
-
class FloatListsDocumentCorpus(base_dir, pipeline, **kwargs)
Bases: pimlico.datatypes.tar.TarredCorpus
Corpus of float list data: each document contains lists of floats. Unlike IntegerTableDocumentCorpus, the lists are not constrained to have the same length. The downside is that the storage format (and probably I/O speed) is not quite as efficient. It is still better than storing the floats as strings or JSON objects.
The floats are stored as C doubles, which use 8 bytes each. At the moment, there is no way to change this. An alternative would be to use C floats, losing precision but (almost) halving storage size.
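The trade-off described above can be illustrated with a minimal sketch using Python's standard struct module. The byte layout here (a 4-byte length prefix before each list) and the function names pack_float_lists and unpack_float_lists are assumptions made for illustration; they are not taken from Pimlico's source and may differ from its actual on-disk format.

```python
import struct


def pack_float_lists(lists, code="d"):
    """Encode variable-length lists of floats as bytes.

    Hypothetical layout: each list is a little-endian unsigned 32-bit count
    followed by that many values. code="d" stores C doubles (8 bytes each),
    code="f" stores C floats (4 bytes each).
    """
    chunks = []
    for row in lists:
        chunks.append(struct.pack("<I", len(row)))
        chunks.append(struct.pack("<%d%s" % (len(row), code), *row))
    return b"".join(chunks)


def unpack_float_lists(raw, code="d"):
    """Decode bytes produced by pack_float_lists back into lists of floats."""
    item_size = struct.calcsize("<" + code)
    lists = []
    offset = 0
    while offset < len(raw):
        (count,) = struct.unpack_from("<I", raw, offset)
        offset += 4
        row = list(struct.unpack_from("<%d%s" % (count, code), raw, offset))
        offset += item_size * count
        lists.append(row)
    return lists


# Lists of different lengths, including an empty one
doc = [[0.5, 1.25, -3.0], [], [2.0]]
raw_doubles = pack_float_lists(doc, "d")
raw_floats = pack_float_lists(doc, "f")
```

Under this assumed layout the example document takes 44 bytes as doubles and 28 bytes as C floats. The length prefixes do not shrink, which is one reason the saving would be "almost" rather than exactly half.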
-
datatype_name = 'float_lists_corpus'
-
data_point_type: alias of FloatListsDocumentType
-
-
class FloatListsDocumentCorpusWriter(base_dir, **kwargs)
Bases: pimlico.datatypes.tar.TarredCorpusWriter
-
document_to_raw_data(data)
-
-
class FloatListDocumentType(options, metadata)
Bases: pimlico.datatypes.documents.RawDocumentType
Like FloatListsDocumentType, but each document is treated as a single list of floats.
-
class FloatListDocumentCorpus(base_dir, pipeline, **kwargs)
Bases: pimlico.datatypes.tar.TarredCorpus
Corpus of float data: each doc contains a single sequence of floats.
The floats are stored as C doubles, using 8 bytes each.
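When a document holds only a single sequence of doubles, no length prefix is strictly needed, since the number of values follows from the byte count. The sketch below assumes such a prefix-free, little-endian layout; the function names pack_float_list and unpack_float_list are hypothetical and the layout is not confirmed to match Pimlico's storage format.

```python
import struct

DOUBLE_SIZE = struct.calcsize("<d")  # 8 bytes per C double


def pack_float_list(values):
    """Encode one sequence of floats as consecutive little-endian doubles."""
    return struct.pack("<%dd" % len(values), *values)


def unpack_float_list(raw):
    """Decode the bytes back into a list, inferring the count from the size."""
    if len(raw) % DOUBLE_SIZE != 0:
        raise ValueError("raw data length is not a multiple of 8 bytes")
    count = len(raw) // DOUBLE_SIZE
    return list(struct.unpack("<%dd" % count, raw))


values = [0.1, 2.5, -7.0]
raw = pack_float_list(values)
restored = unpack_float_list(raw)
```

Because the values are stored as full doubles, a round trip returns exactly the Python floats that were written.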
-
datatype_name = 'float_list_corpus'
-
data_point_type: alias of FloatListDocumentType
-
-
class FloatListDocumentCorpusWriter(base_dir, **kwargs)
Bases: pimlico.datatypes.tar.TarredCorpusWriter
-
document_to_raw_data(data)
-