Core Pimlico modules
Pimlico comes with a substantial collection of module types that provide wrappers around existing NLP and machine learning tools, as well as a number of general tools for processing datasets that are useful for many applications.
Some modules that used to be among the core modules have not yet been updated following a major change to the datatypes system. They can be found in pimlico.old_datatypes.modules, but they will not work until I get round to updating them.
- Corpus manipulation
- Corpus concatenation
- Corpus statistics
- Human-readable formatting
- Archive grouper (filter)
- Interleaved corpora
- Corpus document list filter
- Random shuffle
- Random shuffle (linear)
- Corpus split
- Store a corpus
- Random subsample
- Corpus subset
- Corpus vocab builder
- Token frequency counter
- Tokenized corpus to ID mapper
- ID to tokenized corpus mapper
- Embeddings
- Gensim topic modelling
- Input readers
- Malt dependency parser
- NLTK
- OpenNLP tools
- Output modules
- Scikit-learn tools
- spaCy
- Document-level text filters
- General utilities
- Visualization tools