Word2vec embedding reader (Gensim)
Path | pimlico.modules.input.embeddings.word2vec |
Executable | yes |
Reads embeddings from a file in the word2vec format and stores them in the format Pimlico uses internally for embeddings. The file is read using Gensim's implementation of the format reader, so the module depends on Gensim.
It can be used, for example, to read the pre-trained embeddings offered by Google.
This module does not support Python 2, so it can only be used when Pimlico is run under Python 3.
Inputs
No inputs
Outputs
Name | Type(s) |
---|---|
embeddings | embeddings |
Options
Name | Description | Type |
---|---|---|
binary | Assume input is in word2vec binary format. Default: True | bool |
limit | Limit to the first N vectors in the file. Default: no limit | int |
path | (required) Path to the word2vec embedding file (.bin) | string |
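To make the `binary` and `limit` options more concrete, the following is a minimal, standard-library-only sketch of the word2vec file layout and of what reading "the first N vectors" means. It is not Pimlico's or Gensim's code (the module itself relies on Gensim's reader). The helper names `write_word2vec` and `read_word2vec` are hypothetical, and the sketch assumes little-endian float32 values in the binary layout, as is typical for files written on common hardware.

```python
import os
import struct
import tempfile


def write_word2vec(path, vectors, binary=True):
    """Write {word: [floats]} in the word2vec text or binary layout (hypothetical helper)."""
    dim = len(next(iter(vectors.values())))
    with open(path, "wb") as f:
        # Both layouts start with a text header: "<vocab size> <dimensions>\n"
        f.write(f"{len(vectors)} {dim}\n".encode("utf8"))
        for word, vec in vectors.items():
            if binary:
                # Word, a space, then `dim` float32 values (little-endian assumed)
                f.write(word.encode("utf8") + b" ")
                f.write(struct.pack(f"<{dim}f", *vec))
                f.write(b"\n")
            else:
                # Word followed by space-separated decimal values
                line = word + " " + " ".join(repr(v) for v in vec) + "\n"
                f.write(line.encode("utf8"))


def read_word2vec(path, binary=True, limit=None):
    """Read up to `limit` (word, vector) pairs from a word2vec file (hypothetical helper)."""
    entries = []
    with open(path, "rb") as f:
        vocab_size, dim = (int(x) for x in f.readline().split())
        # `limit` caps how many entries are read from the start of the file
        count = vocab_size if limit is None else min(limit, vocab_size)
        for _ in range(count):
            if binary:
                # Collect the word byte by byte up to the separating space
                chars = bytearray()
                while True:
                    ch = f.read(1)
                    if ch == b"":
                        raise EOFError("unexpected end of file")
                    if ch == b" ":
                        break
                    if ch != b"\n":  # skip the newline left by the previous entry
                        chars += ch
                word = chars.decode("utf8")
                vec = list(struct.unpack(f"<{dim}f", f.read(4 * dim)))
            else:
                parts = f.readline().decode("utf8").split()
                word, vec = parts[0], [float(x) for x in parts[1:]]
            entries.append((word, vec))
    return entries


# Toy data; the values are exactly representable as float32
demo = {"king": [0.5, -1.0], "queen": [0.25, 2.0], "apple": [1.5, 0.0]}
tmp_dir = tempfile.mkdtemp()
bin_path = os.path.join(tmp_dir, "demo.bin")
txt_path = os.path.join(tmp_dir, "demo.txt")
write_word2vec(bin_path, demo, binary=True)
write_word2vec(txt_path, demo, binary=False)

first_two = read_word2vec(bin_path, binary=True, limit=2)  # like binary=T, limit=2
all_text = read_word2vec(txt_path, binary=False)           # like binary=F, no limit
print(first_two)
print(len(all_text))
```

Because entries are read in file order, a limit keeps whichever words appear first in the file. Pre-trained files are often sorted by descending frequency, in which case a limit keeps the most frequent words, but that depends on how the file was produced.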
Example config
This is an example of how this module can be used in a pipeline config file.
[my_word2vec_embedding_reader_module]
type=pimlico.modules.input.embeddings.word2vec
path=value
This example usage includes more options.
[my_word2vec_embedding_reader_module]
type=pimlico.modules.input.embeddings.word2vec
binary=T
limit=0
path=value