FastText embedding reader (Gensim)
Path | pimlico.modules.input.embeddings.fasttext_gensim |
Executable | yes |
Reads in embeddings from the FastText format and stores them in the format Pimlico uses internally for embeddings. This version uses Gensim’s implementation of the format reader, so it depends on Gensim.
Can be used, for example, to read the pre-trained embeddings offered by Facebook AI.
Reads only the binary format (.bin), not the text format (.vec).
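As a rough illustration of the binary-only restriction, the sketch below classifies a file by its extension and then loads a .bin file with Gensim's `load_facebook_vectors`. The helper names `fasttext_file_kind` and `load_binary_vectors` are hypothetical, and this is not necessarily the call Pimlico makes internally; it only shows one way Gensim can read the format.

```python
from pathlib import Path


def fasttext_file_kind(path):
    """Classify a FastText file by extension: 'binary' for .bin, 'text' for .vec.

    Hypothetical helper for illustration; not part of Pimlico or Gensim.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".bin":
        return "binary"
    if suffix == ".vec":
        return "text"
    raise ValueError(f"unrecognised FastText file extension: {suffix!r}")


def load_binary_vectors(path):
    """Load word vectors from a FastText .bin file using Gensim.

    Hypothetical helper; assumes Gensim is installed (Python 3 only).
    """
    if fasttext_file_kind(path) != "binary":
        raise ValueError("only the binary (.bin) format can be read this way")
    # Imported lazily so the extension check above works without Gensim installed
    from gensim.models.fasttext import load_facebook_vectors

    return load_facebook_vectors(path)
```

With a downloaded binary file (the file name here is only an example), `load_binary_vectors("cc.en.300.bin")` would return Gensim keyed vectors that can be indexed by word. A .vec file is rejected before Gensim is imported.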
Does not support Python 2, since Gensim has dropped Python 2 support.
See also
pimlico.modules.input.embeddings.fasttext: An alternative reader that does not use Gensim. It reads only the text format (.vec).
Todo
Add test pipeline. This is slightly difficult, as we need a small FastText binary file, which is harder to produce, since you can’t easily just truncate a big file.
This module does not support Python 2, so it can only be used when Pimlico is run under Python 3.
Inputs
No inputs
Outputs
Name | Type(s) |
---|---|
embeddings | embeddings |
Options
Name | Description | Type |
---|---|---|
path | (required) Path to the FastText embedding file (.bin) | string |
Example config
This is an example of how this module can be used in a pipeline config file.
[my_fasttext_embedding_reader_gensim_module]
type=pimlico.modules.input.embeddings.fasttext_gensim
path=value
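To make the required `path` option concrete, here is a minimal sketch that checks a module section before a pipeline is run. It uses Python's standard `configparser` as a stand-in, not Pimlico's own config loader. The function name `check_module_section` and the path `/path/to/embeddings.bin` are hypothetical.

```python
import configparser

# Hypothetical config text; the path is a placeholder, not a real file
EXAMPLE_CONFIG = """
[my_fasttext_embedding_reader_gensim_module]
type=pimlico.modules.input.embeddings.fasttext_gensim
path=/path/to/embeddings.bin
"""


def check_module_section(config_text, section):
    """Return the section's `path` option, checking that it is present and ends in .bin.

    Stand-in validation using the standard library, not Pimlico's config loader.
    """
    # Interpolation is disabled so that '%' characters in paths are kept literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    options = parser[section]
    if "path" not in options:
        raise ValueError(f"section [{section}] is missing the required 'path' option")
    path = options["path"]
    if not path.lower().endswith(".bin"):
        raise ValueError("this reader expects a FastText binary file (.bin)")
    return path
```

Calling `check_module_section(EXAMPLE_CONFIG, "my_fasttext_embedding_reader_gensim_module")` returns the configured path, while a section without `path`, or one pointing at a .vec file, raises a `ValueError`.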