Corpus manipulation
Core modules for generic manipulation of mainly iterable corpora.
Corpus concatenation
Corpus statistics
Human-readable formatting
Archive grouper (filter)
Interleaved corpora
Corpus document list filter
Random shuffle
Corpus split
Store a corpus
Corpus subset
Corpus vocab builder
Token frequency counter
Tokenized corpus to ID mapper