Embedding space plotter

Path: pimlico.modules.visualization.embeddings_plot
Executable: yes

Plot vectors from embeddings, trained by some other module, in a 2D space using an MDS or t-SNE reduction and Matplotlib.

They might, for example, come from pimlico.modules.embeddings.word2vec. The embeddings are read in using Pimlico’s generic word embedding storage type.

Uses scikit-learn to perform the MDS/t-SNE reduction.
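
The following is a minimal sketch of that reduction-and-plot step, done directly with scikit-learn and Matplotlib rather than through the module. The word list and random vectors are stand-ins for real trained embeddings; this illustrates the technique, not the module's actual implementation.

```python
# Sketch of the reduction-and-plot step using scikit-learn and Matplotlib.
# The word list and random vectors are placeholders for trained embeddings.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

words = ["the", "of", "and", "to", "in", "house", "dog", "cat"]
vectors = np.random.default_rng(0).normal(size=(len(words), 100))

# Compute pairwise distances with the chosen metric, then project to 2D.
distances = pairwise_distances(vectors, metric="cosine")
reduced = MDS(n_components=2, dissimilarity="precomputed",
              random_state=0).fit_transform(distances)
# For t-SNE instead, sklearn.manifold.TSNE(n_components=2, metric="cosine",
# perplexity=...) can be applied to the vectors directly (perplexity must be
# smaller than the number of words).

fig, ax = plt.subplots()
ax.scatter(reduced[:, 0], reduced[:, 1], s=10)
for (x, y), word in zip(reduced, words):
    ax.annotate(word, (x, y))
fig.savefig("embeddings_plot.pdf")
```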

Inputs

Name Type(s)
vectors list of Embeddings

Outputs

Name Type(s)
plot PlotOutput

Options

skip (int)
    Number of most frequent words to skip, taking the next most frequent after these. Default: 0

metric (‘cosine’, ‘euclidean’ or ‘manhattan’)
    Distance metric to use. Default: ‘cosine’

reduction (‘mds’ or ‘tsne’)
    Dimensionality reduction technique to use to project to 2D. Available: mds (Multi-dimensional Scaling), tsne (t-distributed Stochastic Neighbor Embedding). Default: mds

colors (comma-separated list of strings)
    List of colours to use for the different embedding sets. Should be a list of Matplotlib colour strings, one for each embedding set given in input_vectors.

cmap (JSON string)
    Mapping from word prefixes to Matplotlib plotting colours. Every word beginning with a given prefix has the prefix removed and is plotted in the corresponding colour. Specify as a JSON dictionary mapping prefix strings to colour strings (see the sketch after this list).

words (int)
    Number of most frequent words to plot. Default: 50
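
To make the word-selection and colouring options concrete, here is a hedged sketch of how skip, words and cmap combine. The helper function and word list are purely illustrative and not part of Pimlico's API; they only mirror the behaviour documented above.

```python
# Illustrative sketch of the skip/words/cmap options. `words_by_frequency`
# and `select_and_colour` are hypothetical names, not Pimlico API.
import json

def select_and_colour(words_by_frequency, skip=0, words=50,
                      cmap='{}', default_colour="black"):
    prefix_colours = json.loads(cmap)          # cmap is given as a JSON string
    # Skip the `skip` most frequent words, then take the next `words` of them.
    selected = words_by_frequency[skip:skip + words]
    labelled = []
    for word in selected:
        colour = default_colour
        for prefix, c in prefix_colours.items():
            if word.startswith(prefix):
                word = word[len(prefix):]      # prefix is stripped from the label
                colour = c
                break
        labelled.append((word, colour))
    return labelled

# e.g. two embedding sets distinguished by the prefixes "en_" and "fr_"
print(select_and_colour(["en_house", "fr_maison", "en_dog"],
                        words=3, cmap='{"en_": "red", "fr_": "blue"}'))
# -> [('house', 'red'), ('maison', 'blue'), ('dog', 'red')]
```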