pimlico.datatypes.formatters.tokenized module

class TokenizedDocumentFormatter(corpus, raw_data=False)

Bases: pimlico.cli.browser.formatter.DocumentBrowserFormatter

DATATYPE

alias of pimlico.datatypes.tokenized.TokenizedDocumentType

format_document(doc)

Format a single document and return the result as a string (or unicode, but it will be converted to ASCII for display).

Must be overridden by subclasses.

class CharacterTokenizedDocumentFormatter(corpus, raw_data=False)

Bases: pimlico.cli.browser.formatter.DocumentBrowserFormatter

DATATYPE

alias of pimlico.datatypes.tokenized.CharacterTokenizedDocumentType

format_document(doc)

Format a single document and return the result as a string (or unicode, but it will be converted to ASCII for display).

Must be overridden by subclasses.

class SegmentedLinesFormatter(corpus)

Bases: pimlico.cli.browser.formatter.DocumentBrowserFormatter

DATATYPE

alias of pimlico.datatypes.tokenized.SegmentedLinesDocumentType

format_document(doc)

Format a single document and return the result as a string (or unicode, but it will be converted to ASCII for display).

Must be overridden by subclasses.
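
The override pattern that format_document describes can be illustrated with a small stand-alone sketch. It does not import Pimlico: SketchFormatter is a hypothetical stand-in for DocumentBrowserFormatter, and the document is assumed to be a plain list of sentences, each a list of token strings. The real document types may expose their data differently.

```python
class SketchFormatter(object):
    """Hypothetical stand-in for DocumentBrowserFormatter (not the Pimlico class)."""

    def __init__(self, corpus, raw_data=False):
        self.corpus = corpus
        self.raw_data = raw_data

    def format_document(self, doc):
        # Mirrors the documented contract: subclasses must override this
        raise NotImplementedError("Must be overridden by subclasses")


class SketchTokenizedFormatter(SketchFormatter):
    """Hypothetical formatter for tokenized documents."""

    def format_document(self, doc):
        # Assumption: doc is a list of sentences, each a list of token strings.
        # One sentence per line, tokens separated by single spaces.
        return "\n".join(" ".join(sentence) for sentence in doc)


formatter = SketchTokenizedFormatter(corpus=None)
formatted = formatter.format_document([["Hello", ",", "world"], ["Bye", "."]])
print(formatted)
```

Returning one string per document keeps the display logic in the browser and leaves each formatter responsible only for turning its datatype into text.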