LDA trainer

Path pimlico.modules.gensim.lda
Executable yes

Trains LDA using Gensim’s basic LDA implementation.

Inputs

Name Type(s)
corpus TarredCorpus<IntegerListsDocumentType>
vocab Dictionary

Outputs

Name Type(s)
model GensimLdaModel

Options

Name Description Type
eval_every   int
passes Number of passes over the corpus during training (Gensim’s passes parameter). Default: 1 int
num_topics Number of topics for the trained model to have. Default: 100 int
decay Decay parameter. Default: 0.5 float
minimum_phi_value   float
distributed Turn on distributed computing. Default: False bool
update_every Model’s update_every parameter. Default: 1 int
tfidf Transform word counts using TF-IDF when presenting documents to the model for training. Default: False bool
ignore_terms Ignore any of these terms in the bags of words when iterating over the corpus to train the model. Typically, you’ll want to include an OOV term here if your corpus has one, and any other special terms that are not part of a document’s content. comma-separated list of strings
eta Eta prior over the word distribution. May be one of the special values ‘auto’ and ‘symmetric’, or a float. Default: symmetric string or float
iterations Max number of iterations in each update. Default: 50 int
offset Offset parameter. Default: 1.0 float
gamma_threshold   float
alpha Alpha prior over the topic distribution. May be one of the special values ‘symmetric’, ‘asymmetric’ and ‘auto’, or a single float, or a list of floats. Default: symmetric string, float or list of floats
minimum_probability   float
chunksize Model’s chunksize parameter. Chunk size to use for distributed computing. Default: 2000 int
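
The following is a minimal sketch of how the alpha and ignore_terms options could be interpreted. It is not Pimlico's own code. The helper names parse_alpha and to_bow, the token2id mapping, and the assumption that a list of floats is written comma-separated are all hypothetical and used for illustration only. Documents are assumed to be lists of sentences, each a list of integer word IDs.

```python
from collections import Counter

# Special string values named in the alpha option's description
SPECIAL_ALPHA = {"symmetric", "asymmetric", "auto"}


def parse_alpha(value):
    """Hypothetical parser: special value, single float, or list of floats.

    Assumes a list of floats is given comma-separated (an assumption,
    not confirmed by the module documentation).
    """
    value = value.strip()
    if value in SPECIAL_ALPHA:
        return value
    parts = [p.strip() for p in value.split(",") if p.strip()]
    if not parts:
        raise ValueError("alpha must be a special value, a float or a list of floats")
    floats = [float(p) for p in parts]
    return floats[0] if len(floats) == 1 else floats


def to_bow(doc, ignore_ids):
    """Turn a document (list of sentences of word IDs) into sorted
    (word_id, count) pairs, skipping any ignored IDs."""
    counts = Counter(
        word_id
        for sentence in doc
        for word_id in sentence
        if word_id not in ignore_ids
    )
    return sorted(counts.items())


# Hypothetical vocabulary mapping terms to integer IDs
token2id = {"OOV": 0, "the": 1, "cat": 2, "sat": 3}

# ignore_terms as given in the config: a comma-separated list of strings
ignore_terms = "OOV"
ignore_ids = {token2id[t] for t in ignore_terms.split(",") if t in token2id}

doc = [[1, 2, 0], [2, 3]]
bow = to_bow(doc, ignore_ids)
```

Here bow holds (word ID, count) pairs with the OOV ID removed, which is the bag-of-words form that Gensim's LDA model accepts as a training document.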