LDA trainer

Path	pimlico.modules.gensim.lda
Executable	yes

Trains LDA using Gensim’s basic LDA implementation.

Inputs

Name	Type(s)
corpus	TarredCorpus<IntegerListsDocumentType>
vocab	`Dictionary`

Outputs

Name	Type(s)
model	`GensimLdaModel`

Options

Name	Description	Type
eval_every		int
passes	Number of passes over the corpus during training (Gensim’s passes parameter). Default: 1	int
num_topics	Number of topics for the trained model to have. Default: 100	int
decay	Gensim’s decay parameter (kappa in online LDA), which controls how quickly earlier updates are forgotten. Default: 0.5	float
minimum_phi_value		float
distributed	Turn on distributed computing. Default: False	bool
update_every	Model’s update_every parameter. Default: 1	int
tfidf	Transform word counts using TF-IDF when presenting documents to the model for training. Default: False	bool
ignore_terms	Ignore any of these terms in the bags of words when iterating over the corpus to train the model. Typically, you’ll want to include an OOV term here if your corpus has one, and any other special terms that are not part of a document’s content	comma-separated list of strings
eta	Eta prior of word distribution. May be one of special values ‘auto’ and ‘symmetric’, or a float. Default: symmetric	‘auto’, ‘symmetric’ or float
iterations	Max number of iterations in each update. Default: 50	int
offset	Gensim’s offset parameter (tau_0 in online LDA), which slows down the first few update steps. Default: 1.0	float
gamma_threshold		float
alpha	Alpha prior over topic distribution. May be one of special values ‘symmetric’, ‘asymmetric’ and ‘auto’, or a single float, or a list of floats. Default: symmetric	‘symmetric’, ‘asymmetric’, ‘auto’, float or list of floats
minimum_probability		float
chunksize	Model’s chunksize parameter. Chunk size to use for distributed computing. Default: 2000	int
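
The alpha, eta and ignore_terms options accept several kinds of value, so it may help to see how such option strings could be interpreted. The following is a minimal sketch only, not the module's actual code: the function names (parse_alpha, parse_eta, parse_ignore_terms, drop_ignored) are hypothetical, the comma separator for a list of floats is an assumption, and the bag of words is simplified to (term, count) pairs rather than the integer IDs the corpus really stores.

```python
# Hypothetical helpers illustrating the option value formats described above.
# None of these names come from Pimlico or Gensim.

ALPHA_SPECIAL = ("symmetric", "asymmetric", "auto")
ETA_SPECIAL = ("auto", "symmetric")


def parse_alpha(value):
    """Return a special value, a single float, or a list of floats."""
    text = value.strip()
    if text in ALPHA_SPECIAL:
        return text
    if "," in text:
        # Assumption: a list of floats is written comma-separated
        return [float(part) for part in text.split(",") if part.strip()]
    return float(text)


def parse_eta(value):
    """Return a special value or a single float."""
    text = value.strip()
    if text in ETA_SPECIAL:
        return text
    return float(text)


def parse_ignore_terms(value):
    """Split a comma-separated list of strings, dropping empty entries."""
    return [term.strip() for term in value.split(",") if term.strip()]


def drop_ignored(bag_of_words, ignore_terms):
    """Remove ignored terms from a bag of words given as (term, count) pairs."""
    ignored = set(ignore_terms)
    return [(term, count) for term, count in bag_of_words if term not in ignored]
```

Under these assumptions, an ignore_terms value of "OOV" would remove the OOV entry from every document's bag of words before it is presented to the model, while all other counts are left unchanged.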