Term-feature corpus vocab builder

Path        pimlico.modules.features.vocab_builder
Executable  yes

Todo

Document this module

Inputs

Name           Type(s)
term_features  TermFeatureListCorpus

Outputs

Name           Type(s)
term_vocab     Dictionary
feature_vocab  Dictionary

Options

Name               Description                                                                          Type
feature_limit      Limit the feature vocab to this number of most common entries (after other filters)  int
feature_max_prop   Include only features that occur in at most this proportion of documents             float
term_max_prop      Include only terms that occur in at most this proportion of documents                float
term_threshold     Minimum number of occurrences required for a term to be included                     int
feature_threshold  Minimum number of occurrences required for a feature to be included                  int
term_limit         Limit the term vocab to this number of most common entries (after other filters)     int
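
The options above describe three kinds of filter: a minimum occurrence count, a maximum document proportion, and a size limit applied after the other filters. The following is a minimal Python sketch of how such filters could combine. It is not Pimlico's implementation: the function build_vocab, its arguments, and the input format (each document as a plain list of strings) are hypothetical, and it assumes that the threshold counts total occurrences across the corpus rather than the number of documents.

```python
from collections import Counter


def build_vocab(docs, threshold=None, max_prop=None, limit=None):
    """Illustrative vocab builder (hypothetical, not the Pimlico API).

    docs: list of documents, each a list of string items (terms or features).
    Returns a dict mapping each kept item to an integer ID.
    """
    occurrences = Counter()  # total occurrences of each item in the corpus
    doc_freq = Counter()     # number of documents that contain each item
    for doc in docs:
        occurrences.update(doc)
        doc_freq.update(set(doc))

    kept = list(occurrences)
    if threshold is not None:
        # Assumption: the threshold applies to total occurrence counts
        kept = [w for w in kept if occurrences[w] >= threshold]
    if max_prop is not None:
        # Drop items that appear in more than this proportion of documents
        max_docs = max_prop * len(docs)
        kept = [w for w in kept if doc_freq[w] <= max_docs]

    # Most common first; ties broken alphabetically so the result is deterministic
    kept.sort(key=lambda w: (-occurrences[w], w))
    if limit is not None:
        # The size limit is applied last, after the other filters
        kept = kept[:limit]
    return {w: i for i, w in enumerate(kept)}


docs = [["a", "b", "a"], ["a", "c"], ["a", "b", "d"]]
print(build_vocab(docs, threshold=2))
print(build_vocab(docs, max_prop=0.7, limit=2))
```

In this toy corpus, "a" occurs in all three documents, so max_prop=0.7 removes it before the limit is applied. Applying the limit last matches the "(after other filters)" wording in the option descriptions.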