FastText embedding reader (Gensim)

Path: pimlico.modules.input.embeddings.fasttext_gensim
Executable: yes

Reads embeddings in the FastText format and stores them in the format Pimlico uses internally for embeddings. This version uses Gensim’s implementation of the format reader, so it depends on Gensim.

Can be used, for example, to read the pre-trained embeddings offered by Facebook AI.

Reads only the binary format (.bin), not the text format (.vec).
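As a rough illustration of what reading the binary format through Gensim can look like, the sketch below wraps Gensim's `load_facebook_vectors` function. It is not taken from this module's source: the helper name `load_fasttext_bin` and the extension check are assumptions made for the example, and the module itself may call Gensim differently.

```python
import os


def load_fasttext_bin(path):
    """Load FastText vectors from a binary (.bin) file using Gensim.

    Hypothetical helper for illustration only; not part of Pimlico.
    """
    # Assumption: the file type is recognised by its extension alone.
    ext = os.path.splitext(path)[1].lower()
    if ext != ".bin":
        raise ValueError(
            "expected a FastText binary file (.bin), got %r; "
            "the text format (.vec) is not handled here" % path
        )
    # Imported lazily so the check above works even without Gensim installed.
    from gensim.models.fasttext import load_facebook_vectors

    return load_facebook_vectors(path)


# Example use (the file name is only an example):
#   vectors = load_fasttext_bin("cc.en.300.bin")
#   vector = vectors["hello"]
```

Rejecting a `.vec` path early gives a clearer error than letting the binary reader fail partway through a text file.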

Does not support Python 2, since Gensim has dropped Python 2 support.

See also

pimlico.modules.input.embeddings.fasttext:
An alternative reader that does not use Gensim. It reads only the text format (.vec).

Todo

Add test pipeline. This is slightly difficult, as it needs a small FastText binary file, and one is hard to produce: a large file cannot simply be truncated.

This module does not support Python 2, so it can only be used when Pimlico is run under Python 3.

Inputs

No inputs

Outputs

Name        Type(s)
embeddings  embeddings

Options

Name  Description                                            Type
path  (required) Path to the FastText embedding file (.bin)  string

Example config

This is an example of how this module can be used in a pipeline config file.

[my_fasttext_embedding_reader_gensim_module]
type=pimlico.modules.input.embeddings.fasttext_gensim
path=value
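
To sanity-check a section like the one above before running a pipeline, it can be parsed with Python's standard `configparser`. This is only a sketch: Pimlico has its own config loader, which is not used here, and the path value and the helper name `read_module_options` are made up for the example.

```python
import configparser

# The example section, with a hypothetical path in place of "value".
CONFIG_TEXT = """
[my_fasttext_embedding_reader_gensim_module]
type=pimlico.modules.input.embeddings.fasttext_gensim
path=/data/embeddings/cc.en.300.bin
"""


def read_module_options(text, section):
    """Return the options of one config section as a plain dict."""
    # Interpolation is disabled so '%' characters in paths are kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser[section])


options = read_module_options(
    CONFIG_TEXT, "my_fasttext_embedding_reader_gensim_module"
)
print(options["type"])
print(options["path"])
```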