LDA document topic analysis

Path	pimlico.modules.gensim.lda_doc_topics
Executable	yes

Takes a trained LDA model and produces the topic vector for every document in a corpus.

The corpus is given as integer list documents: each document is a list of sentences, and each sentence is a list of the integer IDs of its words. It is assumed that the corpus maps words to integer IDs using the same vocabulary as the LDA model’s training corpus, so no further mapping is needed.
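As a rough illustration of the idea, the sketch below turns one integer list document into a bag-of-words and asks a trained model for its topic weights. It is not the module's actual implementation. The helper names `to_bow` and `doc_topic_vector` are hypothetical, and the code assumes the model object behaves like Gensim's `LdaModel`, exposing `num_topics` and `get_document_topics(bow, minimum_probability=...)`.

```python
from collections import Counter


def to_bow(doc_sentences):
    """Flatten a document (a list of sentences, each a list of integer
    word IDs) into sorted (word_id, count) pairs."""
    counts = Counter(
        word_id for sentence in doc_sentences for word_id in sentence
    )
    return sorted(counts.items())


def doc_topic_vector(model, doc_sentences):
    """Return a dense list of topic weights for one document.

    Assumption: `model` offers `num_topics` and `get_document_topics`
    in the style of Gensim's LdaModel. Topics the model leaves out of
    its result keep a weight of 0.0.
    """
    bow = to_bow(doc_sentences)
    vector = [0.0] * model.num_topics
    for topic_id, weight in model.get_document_topics(
        bow, minimum_probability=0.0
    ):
        vector[topic_id] = float(weight)
    return vector
```

Because the word IDs are passed to the model unchanged, the result is only meaningful if the corpus and the model share a vocabulary, which is the assumption stated above.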

This module does not support Python 2, since Gensim has dropped Python 2 support. It can only be used when Pimlico is run under Python 3.

Todo

Add test pipeline and test

Inputs

Name	Type(s)
corpus	`grouped_corpus` <`IntegerListsDocumentType`>
model	`lda_model`

Outputs

Name	Type(s)
vectors	`grouped_corpus` <`VectorDocumentType`>

Example config

This is an example of how this module can be used in a pipeline config file.

[my_lda_doc_topics_module]
type=pimlico.modules.gensim.lda_doc_topics
input_corpus=module_a.some_output
input_model=module_a.some_output