reader

class PimarcReader(archive_filename)

Bases: object

Read-only reader for archives in the Pimlico Archive (Pimarc) format.

close()

read_file(filename)

Load a single file from the archive by name. Equivalent to reader[filename].

iter_filenames()

Iterate over just the filenames in the archive, without reading any further metadata or file data. This is fast for Pimarc, because the archive's index is fully loaded into memory.
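
To make the relationship between these methods concrete, here is a minimal sketch of an indexed reader. ToyIndexedReader, fake_load and the offsets used are hypothetical; the sketch assumes, by analogy with read_doc_from_pimarc, that a file is returned as a (metadata, data) pair, and models the in-memory index as a plain dict from filename to byte offset. It shows reader[filename] delegating to read_file, and iter_filenames being answered from the index alone without reading any entry.

```python
from typing import Callable, Dict, Iterator, Tuple

# Assumed return shape for a stored file: (metadata, raw data)
Entry = Tuple[dict, bytes]


class ToyIndexedReader:
    """Hypothetical stand-in for an indexed archive reader (not Pimlico's PimarcReader)."""

    def __init__(self, index: Dict[str, int], load_entry: Callable[[int], Entry]):
        # Filename -> byte offset of the entry, held entirely in memory
        self._index = dict(index)
        # Any callable that reads the entry stored at a given byte offset
        self._load_entry = load_entry

    def read_file(self, filename: str) -> Entry:
        # One in-memory lookup for the offset, then a single entry read
        return self._load_entry(self._index[filename])

    def __getitem__(self, filename: str) -> Entry:
        # reader[filename] is just an alias for reader.read_file(filename)
        return self.read_file(filename)

    def iter_filenames(self) -> Iterator[str]:
        # Answered from the index alone: load_entry is never called
        return iter(self._index)


# Usage: a fake loader that records which offsets were actually read
reads = []
fake_storage = {0: ({"name": "a.txt"}, b"alpha"), 42: ({"name": "b.txt"}, b"beta")}


def fake_load(offset: int) -> Entry:
    reads.append(offset)
    return fake_storage[offset]


reader = ToyIndexedReader({"a.txt": 0, "b.txt": 42}, fake_load)
print(list(reader.iter_filenames()))  # ['a.txt', 'b.txt']
print(reads)                          # []
print(reader["b.txt"])                # ({'name': 'b.txt'}, b'beta')
print(reads)                          # [42]
```

Listing filenames triggers no entry reads at all; only the subscript access does.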

iter_metadata()

Iterate over all files in the archive, yielding only each file's metadata and skipping over its data.
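
The following sketch shows one way metadata-only iteration can avoid loading file data: after parsing each entry's metadata, it seeks past the data instead of reading it. The byte layout (a 4-byte length, JSON metadata, a 4-byte length, then the data) is an invented toy format, not Pimarc's real on-disk layout, and write_entry and iter_metadata are hypothetical helpers.

```python
import io
import json
import struct
from typing import BinaryIO, Iterator

# Hypothetical toy layout (NOT the real Pimarc byte format). Each entry is:
#   [4-byte big-endian metadata length][JSON metadata][4-byte big-endian data length][data]
_LEN = struct.Struct(">I")


def write_entry(f: BinaryIO, metadata: dict, data: bytes) -> None:
    """Append one entry in the toy layout."""
    meta_bytes = json.dumps(metadata).encode("utf-8")
    f.write(_LEN.pack(len(meta_bytes)) + meta_bytes + _LEN.pack(len(data)) + data)


def iter_metadata(f: BinaryIO) -> Iterator[dict]:
    """Yield each entry's metadata, seeking past its data instead of reading it."""
    f.seek(0)
    while True:
        header = f.read(_LEN.size)
        if not header:
            return  # clean end of the archive
        (meta_len,) = _LEN.unpack(header)
        metadata = json.loads(f.read(meta_len).decode("utf-8"))
        (data_len,) = _LEN.unpack(f.read(_LEN.size))
        f.seek(data_len, io.SEEK_CUR)  # skip the data without loading it
        yield metadata


archive = io.BytesIO()
write_entry(archive, {"name": "doc1"}, b"short")
write_entry(archive, {"name": "doc2"}, b"x" * 10_000)
print([m["name"] for m in iter_metadata(archive)])  # ['doc1', 'doc2']
```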

iter_files(skip=None, start_after=None)

Iterate over the files in the archive, each together with its JSON metadata, which includes the file's name (under the key “name”).

Parameters:
  • skip – skips this many documents at the start of the archive. Ignored if start_after is given.
  • start_after – skips all files before the one with the given name, which is expected to be in the archive.

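
A minimal sketch of the skip and start_after rules, operating on a plain list of (metadata, data) pairs rather than a real archive. iter_files_sketch is a hypothetical helper, and the StartAfterFilenameNotFound class here is a local stand-in for the exception documented below. Two points are assumptions: that the file named by start_after is itself skipped (as the parameter name suggests), and that a missing name is reported by raising that exception.

```python
from typing import Iterable, Iterator, Optional, Tuple


class StartAfterFilenameNotFound(KeyError):
    """Local stand-in for the documented exception (also a KeyError subclass)."""


def iter_files_sketch(
    files: Iterable[Tuple[dict, bytes]],
    skip: Optional[int] = None,
    start_after: Optional[str] = None,
) -> Iterator[Tuple[dict, bytes]]:
    """Yield (metadata, data) pairs, applying the skip / start_after rules."""
    if start_after is not None:
        # start_after takes precedence: skip is ignored entirely
        found = False
        for metadata, data in files:
            if found:
                yield metadata, data
            elif metadata["name"] == start_after:
                found = True  # assumption: the named file itself is not yielded
        if not found:
            raise StartAfterFilenameNotFound(start_after)
    else:
        for i, (metadata, data) in enumerate(files):
            if skip is not None and i < skip:
                continue  # still within the first `skip` documents
            yield metadata, data


docs = [({"name": n}, n.encode()) for n in ["a", "b", "c", "d"]]
print([m["name"] for m, _ in iter_files_sketch(docs, skip=2)])                   # ['c', 'd']
print([m["name"] for m, _ in iter_files_sketch(docs, start_after="b")])          # ['c', 'd']
print([m["name"] for m, _ in iter_files_sketch(docs, skip=3, start_after="a")])  # ['b', 'c', 'd']

try:
    list(iter_files_sketch(docs, start_after="missing"))
except KeyError as exc:  # also catches StartAfterFilenameNotFound
    print("not found:", exc)
```

Since the exception subclasses KeyError, callers that already handle KeyError will catch it without any changes.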
read_doc_from_pimarc(archive_filename, metadata_start_byte)

Read a single file’s metadata and data, starting from a given byte offset in the archive. This is useful when you already know the start point and want to avoid reading in the archive’s whole index.

Parameters:
  • archive_filename – path to the archive file
  • metadata_start_byte – byte offset at which the file’s metadata starts
Returns:

tuple (metadata, raw file data)

read_doc_from_pimarc_file(archive_file, metadata_start_byte)

Same as read_doc_from_pimarc, but operates on an already-opened archive file.

Parameters:
  • archive_file – file-like object
  • metadata_start_byte – byte offset at which the file’s metadata starts
Returns:

tuple (metadata, raw file data)

metadata_decode_decorator(fn)

class PimarcFileMetadata(raw_data)

Bases: dict

A simple wrapper around the JSON-encoded metadata associated with a file in a Pimarc archive. When metadata is loaded, the raw bytes are wrapped in a PimarcFileMetadata instance, so that they can be decoded on demand, rather than decoding all metadata up front when much of it may never be needed.

You can use the object as if it were a dict: the JSON data is decoded the first time you access it. You can also call dict(obj) to get a plain dict instead.

decode()
keys(*args, **kwargs)
values(*args, **kwargs)
items(*args, **kwargs)
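
The (*args, **kwargs) signatures of keys, values and items suggest that they are dict methods wrapped by metadata_decode_decorator so that the JSON is decoded before they run. The sketch below shows that lazy-decoding pattern, assuming this is what the decorator does; decode_first and LazyJsonMetadata are hypothetical names, not Pimlico's code.

```python
import functools
import json


def decode_first(fn):
    """Hypothetical decorator: ensure the JSON has been decoded before calling fn."""
    @functools.wraps(fn)
    def wrapper(self, *args, **kwargs):
        self.decode()
        return fn(self, *args, **kwargs)
    return wrapper


class LazyJsonMetadata(dict):
    """A dict that stores raw JSON bytes and only parses them on first access."""

    def __init__(self, raw_data: bytes):
        super().__init__()
        self.raw_data = raw_data
        self._decoded = False

    def decode(self) -> None:
        # Parse once; later calls are no-ops
        if not self._decoded:
            self.update(json.loads(self.raw_data.decode("utf-8")))
            self._decoded = True

    # Each read-style dict method decodes lazily, then delegates to dict
    __getitem__ = decode_first(dict.__getitem__)
    __contains__ = decode_first(dict.__contains__)
    __iter__ = decode_first(dict.__iter__)
    __len__ = decode_first(dict.__len__)
    get = decode_first(dict.get)
    keys = decode_first(dict.keys)
    values = decode_first(dict.values)
    items = decode_first(dict.items)


meta = LazyJsonMetadata(b'{"name": "doc1", "lang": "en"}')
print(meta._decoded)        # False
print(meta["name"])         # doc1
print(meta._decoded)        # True
print(sorted(meta.keys()))  # ['lang', 'name']
```

Because the underlying dict stays empty until decode() runs, every accessor that reads the contents has to be wrapped; an unwrapped method (repr(), for instance) would still see an empty dict before the first access.
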

exception StartAfterFilenameNotFound

Bases: KeyError