LDA-seq (DTM) document topic analysis
Path | pimlico.modules.gensim.ldaseq_doc_topics |
Executable | yes |
Takes a trained DTM model and produces the topic vector for every document in a corpus.
The corpus is given as integer-list documents: each document holds the integer IDs of the words in each of its sentences. The corpus is assumed to use the same vocabulary-to-ID mapping as the DTM model’s training corpus, so no further mapping is performed.
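To illustrate the input format described above, here is a minimal sketch of one document as a list of sentences, each a list of integer word IDs, and of how it could be reduced to a bag of words before topic inference. The function name `to_bag_of_words` and the sample IDs are hypothetical, and whether the module flattens sentences in exactly this way is an assumption, not something stated here.

```python
from collections import Counter
from typing import List, Tuple


def to_bag_of_words(document: List[List[int]]) -> List[Tuple[int, int]]:
    """Flatten a document's sentences and count each word ID.

    Hypothetical helper: returns (word_id, count) pairs sorted by word ID,
    the sparse bag-of-words shape commonly passed to topic models.
    """
    counts = Counter(word_id for sentence in document for word_id in sentence)
    return sorted(counts.items())


# One document with two sentences; the IDs are made up for illustration
# and are assumed to come from the model's training vocabulary.
doc = [[3, 1, 3], [2, 1]]
bow = to_bag_of_words(doc)
```

Sentence boundaries are discarded in this sketch, since a bag-of-words topic model only uses word counts per document.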
We also require a corpus of labels to say what time slice each document is in. These should be from the same set of labels that the DTM model was trained on, so that each document label can be mapped to a trained slice.
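The label-to-slice mapping described above might look like the following sketch. It assumes the ordered list of training labels is available from the trained model; how the model actually stores them is not specified here, and the names `build_slice_index` and `slice_for_label` are hypothetical.

```python
from typing import Dict, List


def build_slice_index(training_labels: List[str]) -> Dict[str, int]:
    """Map each training label to the index of its time slice.

    Assumes training_labels is ordered by time slice (an assumption
    made for this sketch).
    """
    return {label: index for index, label in enumerate(training_labels)}


def slice_for_label(label: str, slice_index: Dict[str, int]) -> int:
    """Look up a document's time slice, rejecting labels unseen in training."""
    try:
        return slice_index[label]
    except KeyError:
        raise ValueError(
            f"label {label!r} is not one of the labels the model was trained on"
        ) from None


# Illustrative labels only: three time slices and three documents.
slice_index = build_slice_index(["1990", "2000", "2010"])
doc_slices = [slice_for_label(label, slice_index) for label in ["2000", "1990", "2000"]]
```

Failing loudly on an unknown label is a design choice in this sketch: a document whose label has no trained slice cannot be assigned a topic vector in a meaningful way.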
This module does not support Python 2, since Gensim has dropped Python 2 support, so it can only be used when Pimlico is run under Python 3.
Inputs
Name | Type(s) |
---|---|
corpus | grouped_corpus <IntegerListsDocumentType> |
labels | grouped_corpus <LabelDocumentType> |
model | ldaseq_model |
Outputs
Name | Type(s) |
---|---|
vectors | grouped_corpus <VectorDocumentType> |
Example config
This is an example of how this module can be used in a pipeline config file.
[my_ldaseq_doc_topics_module]
type=pimlico.modules.gensim.ldaseq_doc_topics
input_corpus=module_a.some_output
input_labels=module_a.some_output
input_model=module_a.some_output
Example pipelines
This module is used by the following example pipelines. They are examples of how the module can be used together with other modules in a larger pipeline.
Test pipelines
This module is used by the following test pipelines. They are a further source of examples of the module’s usage.