formatter
The command-line iterable corpus browser displays one document at a time. It can display the raw data from the corpus files, which is sometimes human-readable enough to need no special formatting. It can also parse the data using its datatype and output text either from the datatype's standard Unicode representation or, if the document datatype provides one, a special browser formatting of the data.
When viewing output data, particularly when debugging modules, it can be useful to give the browser special formatting routines instead of using or overriding the datatype's standard formatting methods. For example, you might want to pull out specific attributes of each document to get an overview of what a module is producing.
The browser command accepts a command-line option that specifies a Python class to format the data. This class should be a subclass of DocumentBrowserFormatter (defined in pimlico.cli.browser.tools.formatter) that accepts a datatype compatible with the datatype being browsed and provides a method to format each document. You can define such formatters in your own code and refer to them by their fully qualified class name.
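As an illustration, here is a minimal sketch of a custom formatter that pulls a few attributes out of each document to give a one-line overview. It assumes the per-document hook is called `format_document(doc)` and that documents expose a `tokens` attribute; both names are assumptions, as are `TokenOverviewFormatter` and `FakeDoc`. A stand-in base class is defined so the sketch runs without Pimlico installed; in real code you would subclass DocumentBrowserFormatter from pimlico.cli.browser.tools.formatter instead.

```python
class DocumentBrowserFormatter(object):
    """Stand-in for Pimlico's DocumentBrowserFormatter (for illustration only)."""

    def __init__(self, corpus_datatype):
        # The datatype of the corpus being browsed
        self.corpus_datatype = corpus_datatype

    def format_document(self, doc):
        # Assumed name of the per-document formatting hook
        raise NotImplementedError


class TokenOverviewFormatter(DocumentBrowserFormatter):
    """Hypothetical formatter: a one-line overview of each document."""

    def format_document(self, doc):
        # Pull out specific attributes rather than printing the whole document
        tokens = getattr(doc, "tokens", [])
        preview = " ".join(tokens[:5])
        suffix = " ..." if len(tokens) > 5 else ""
        return "{} tokens: {}{}".format(len(tokens), preview, suffix)


class FakeDoc(object):
    """Hypothetical document with a `tokens` attribute, for the example."""

    def __init__(self, tokens):
        self.tokens = tokens


formatter = TokenOverviewFormatter(corpus_datatype=None)
print(formatter.format_document(FakeDoc(["the", "cat", "sat"])))  # prints "3 tokens: the cat sat"
```

To use a real formatter written this way, you would pass its fully qualified class name (for example the hypothetical `mypackage.formatters.TokenOverviewFormatter`) to the browser's formatter option.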
class DocumentBrowserFormatter(corpus_datatype)
Bases: object
Base class for formatters used to post-process documents for display in the iterable corpus browser.
DATATYPE = DataPointType()
class DefaultFormatter(corpus_datatype)
Bases: pimlico.cli.browser.tools.formatter.DocumentBrowserFormatter
Generic implementation of a browser formatter that’s used if no other formatter is given.
DATATYPE = DataPointType()
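The exact behaviour of DefaultFormatter is not spelled out here. The sketch below shows one plausible fallback strategy that mirrors the two text outputs described in the introduction: use a browser-specific rendering when the document offers one, otherwise its standard text representation. The class name `BrowserFormatterFallback` and the `browser_display` hook are hypothetical and are not Pimlico API.

```python
class BrowserFormatterFallback(object):
    """Hypothetical generic formatter used when no other formatter is given."""

    def __init__(self, corpus_datatype):
        self.corpus_datatype = corpus_datatype

    def format_document(self, doc):
        # Prefer a browser-specific rendering if the document provides one
        # ("browser_display" is a hypothetical hook name, not Pimlico API)
        browser_display = getattr(doc, "browser_display", None)
        if callable(browser_display):
            return browser_display()
        # Otherwise fall back to the standard text representation
        return str(doc)


class PrettyDoc(object):
    """Example document that offers its own browser rendering."""

    def browser_display(self):
        return "<pretty view>"


fallback = BrowserFormatterFallback(corpus_datatype=None)
print(fallback.format_document([1, 2]))       # prints "[1, 2]"
print(fallback.format_document(PrettyDoc()))  # prints "<pretty view>"
```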
class InvalidDocumentFormatter(corpus_datatype)
Bases: pimlico.cli.browser.tools.formatter.DocumentBrowserFormatter
Formatter that skips over all docs other than invalid results. Uses standard formatting for InvalidDocument information.
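The skip-everything-but-invalid-results behaviour can be sketched as a filter step followed by a format step. This is a hypothetical illustration, not Pimlico's implementation: the `InvalidDocument` stand-in and its `module_name`/`error_info` attributes, the `filter_document` hook, the convention that returning None skips a document, and the `browse` driver loop are all assumptions.

```python
class InvalidDocument(object):
    """Stand-in for Pimlico's InvalidDocument; attribute names are assumptions."""

    def __init__(self, module_name, error_info):
        self.module_name = module_name
        self.error_info = error_info


class InvalidOnlyFormatter(object):
    """Hypothetical formatter that shows only invalid documents."""

    def __init__(self, corpus_datatype):
        self.corpus_datatype = corpus_datatype

    def filter_document(self, doc):
        # In this sketch, returning None means "skip this document"
        return doc if isinstance(doc, InvalidDocument) else None

    def format_document(self, doc):
        return "Invalid document from {}: {}".format(doc.module_name, doc.error_info)


def browse(formatter, docs):
    """Yield formatted output for the documents that survive the filter."""
    for doc in docs:
        kept = formatter.filter_document(doc)
        if kept is not None:
            yield formatter.format_document(kept)


docs = ["valid text", InvalidDocument("tokenize", "ValueError: bad input"), "more text"]
for line in browse(InvalidOnlyFormatter(None), docs):
    print(line)  # prints "Invalid document from tokenize: ValueError: bad input"
```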
typecheck_formatter(formatted_doc_type, formatter_cls)
Check that a document type is compatible with a particular formatter.
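A sketch of the kind of check typecheck_formatter might perform, assuming that a formatter declares the document type it accepts through its DATATYPE class attribute (as DocumentBrowserFormatter and DefaultFormatter do with DataPointType()) and that a document type is compatible when it is that type or a subtype of it. The stand-in type classes, `typecheck_formatter_sketch` and the `FormatterTypeError` exception are hypothetical; the real function may signal incompatibility differently.

```python
class DataPointType(object):
    """Stand-in for the most general document type."""


class TokenizedDocumentType(DataPointType):
    """Hypothetical, more specific document type."""


class FormatterTypeError(Exception):
    """Hypothetical error raised for an incompatible formatter."""


class GenericFormatter(object):
    DATATYPE = DataPointType()  # accepts any document type


class TokenFormatter(object):
    DATATYPE = TokenizedDocumentType()  # needs tokenized documents


def typecheck_formatter_sketch(formatted_doc_type, formatter_cls):
    # Compatible if the doc type is the formatter's declared type or a subtype
    required = type(formatter_cls.DATATYPE)
    if not isinstance(formatted_doc_type, required):
        raise FormatterTypeError("{} requires {}, got {}".format(
            formatter_cls.__name__, required.__name__, type(formatted_doc_type).__name__))


# A specific type is fine for a general formatter...
typecheck_formatter_sketch(TokenizedDocumentType(), GenericFormatter)
# ...but a general type is rejected by a specific formatter
try:
    typecheck_formatter_sketch(DataPointType(), TokenFormatter)
except FormatterTypeError as err:
    print(err)  # prints "TokenFormatter requires TokenizedDocumentType, got DataPointType"
```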
load_formatter(datatype, formatter_name=None)
Load a formatter specified by its fully qualified Python class name. If formatter_name is None, load the default formatter. You may also specify a formatter by name, choosing from the standard formatters that the formatted datatype provides.
Parameters:
- datatype – datatype instance representing the datatype that will be formatted
- formatter_name – class name, or class
Returns: instantiated formatter
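The resolution order described for load_formatter (default when None, a class used directly, otherwise a name to import) can be sketched with importlib. This is a hypothetical re-implementation, not Pimlico's code: the stand-in `DefaultFormatter`, the `standard_formatters` mapping used to look up short names on the datatype, and the helpers `import_class` and `load_formatter_sketch` are all assumptions made for illustration.

```python
import importlib


class DefaultFormatter(object):
    """Stand-in for Pimlico's default formatter."""

    def __init__(self, corpus_datatype):
        self.corpus_datatype = corpus_datatype


def import_class(path):
    """Import a class from a fully qualified name such as 'pkg.module.Cls'."""
    module_name, _, class_name = path.rpartition(".")
    if not module_name:
        raise ValueError("not a fully qualified class name: {!r}".format(path))
    return getattr(importlib.import_module(module_name), class_name)


def load_formatter_sketch(datatype, formatter_name=None):
    if formatter_name is None:
        # No formatter requested: fall back to the default
        cls = DefaultFormatter
    elif isinstance(formatter_name, type):
        # A class object was passed directly
        cls = formatter_name
    else:
        # Hypothetical: the datatype maps short names to fully qualified paths;
        # anything not in the mapping is treated as a fully qualified path
        standard = getattr(datatype, "standard_formatters", {})
        cls = import_class(standard.get(formatter_name, formatter_name))
    # Formatters are constructed with the datatype they will format
    return cls(datatype)
```

With this sketch, `load_formatter_sketch(dt)` returns a DefaultFormatter wrapping `dt`, while passing a string such as the hypothetical `"mypackage.formatters.TokenOverviewFormatter"` imports that class and instantiates it with `dt`.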