LDA document topic analysis

Path pimlico.modules.gensim.lda_doc_topics
Executable yes

Takes a trained LDA model and produces the topic vector for every document in a corpus.

The corpus is given as integer-list documents: each document is a list of sentences, and each sentence is a list of integer word IDs. The corpus is assumed to use the same vocabulary, and therefore the same word-to-ID mapping, as the LDA model’s training corpus, so no further mapping is needed.
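
As a rough illustration of the input format described above, the sketch below flattens one integer-list document into bag-of-words counts, the sparse form that Gensim LDA models take for inference, and then expands a sparse topic distribution into a fixed-length vector. The helper names `doc_to_bow` and `to_dense` are hypothetical and are not taken from the module's source, and the topic weights are made up rather than inferred by a real model.

```python
from collections import Counter


def doc_to_bow(sentences):
    """Flatten a document (a list of sentences, each a list of word IDs)
    into sorted (word_id, count) pairs."""
    counts = Counter(word_id for sentence in sentences for word_id in sentence)
    return sorted(counts.items())


def to_dense(sparse_topics, num_topics):
    """Expand sparse (topic_id, weight) pairs into a vector of length
    num_topics, using 0.0 for any topic that is not listed."""
    vector = [0.0] * num_topics
    for topic_id, weight in sparse_topics:
        vector[topic_id] = float(weight)
    return vector


# One document with two sentences; the IDs are illustrative only.
doc = [[0, 3, 3], [1, 0]]
bow = doc_to_bow(doc)

# Made-up sparse topic weights standing in for the model's inference result.
sparse = [(1, 0.75), (3, 0.25)]
dense = to_dense(sparse, num_topics=4)
```

With a trained Gensim model, the sparse pairs would typically come from a call such as `lda.get_document_topics(bow)`, which may omit topics whose weight falls below a threshold, hence the zero-filling in `to_dense`.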

Does not support Python 2 since Gensim has dropped Python 2 support.

Todo

Add test pipeline and test

This module does not support Python 2, so it can only be used when Pimlico is run under Python 3.

Inputs

Name Type(s)
corpus grouped_corpus <IntegerListsDocumentType>
model lda_model

Outputs

Name Type(s)
vectors grouped_corpus <VectorDocumentType>

Example config

This is an example of how this module can be used in a pipeline config file.

[my_lda_doc_topics_module]
type=pimlico.modules.gensim.lda_doc_topics
input_corpus=module_a.some_output
input_model=module_a.some_output