Store in word2vec format

Path pimlico.modules.embeddings.store_word2vec
Executable yes

Takes embeddings stored in the default format used within Pimlico pipelines (see Embeddings) and stores them using the word2vec storage format.

This is intended for using the vectors outside your pipeline, for example to distribute them publicly. To pass embeddings between Pimlico modules, use the internal Embeddings datatype instead.

The output contains a bin file, containing the vectors in the binary format, and a vocab file, containing the vocabulary and word counts.

The module uses Gensim's implementation of the storage format, so it depends on Gensim.
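To make the two output files more concrete, here is a minimal sketch of reading them back using only the Python standard library. It assumes the layout commonly produced by the original word2vec tool and by Gensim: a text header line giving vocabulary size and vector size, followed by one entry per word (the word, a space, then little-endian float32 values), and a vocab file with one "word count" pair per line. The function names `read_word2vec_bin` and `read_vocab` are hypothetical helpers for illustration and are not part of Pimlico.

```python
import struct


def read_word2vec_bin(bin_path):
    """Read a word2vec binary file into a dict mapping word -> list of floats.

    Assumption: values are little-endian float32, as on most platforms.
    """
    vectors = {}
    with open(bin_path, "rb") as f:
        # Header line: "<vocab_size> <vector_size>\n"
        vocab_size, vector_size = (int(x) for x in f.readline().split())
        for _ in range(vocab_size):
            # Read the word byte by byte up to the separating space
            word_bytes = bytearray()
            while True:
                ch = f.read(1)
                if ch == b"":
                    raise EOFError("unexpected end of file while reading a word")
                if ch == b" ":
                    break
                if ch != b"\n":  # some writers put a newline after each vector
                    word_bytes.extend(ch)
            # The vector follows as vector_size float32 values (4 bytes each)
            raw = f.read(4 * vector_size)
            values = struct.unpack("<%df" % vector_size, raw)
            vectors[word_bytes.decode("utf-8")] = list(values)
    return vectors


def read_vocab(vocab_path):
    """Read a vocab file with one 'word count' pair per line."""
    counts = {}
    with open(vocab_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            word, count = line.rsplit(None, 1)
            counts[word] = int(count)
    return counts
```

In practice, if Gensim is installed, the files would more typically be loaded with `gensim.models.KeyedVectors.load_word2vec_format(bin_path, fvocab=vocab_path, binary=True)`; the reader above is only meant to show what the format contains.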

This module does not support Python 2, since it depends on Gensim, so it can only be used when Pimlico is being run under Python 3.

Inputs

Name Type(s)
embeddings embeddings

Outputs

Name Type(s)
embeddings word2vec_files

Example config

This is an example of how this module can be used in a pipeline config file.

[my_store_word2vec_module]
type=pimlico.modules.embeddings.store_word2vec
input_embeddings=module_a.some_output

Test pipelines

This module is used by the following test pipelines. They are a further source of examples of the module’s usage.