Corpus manipulation
Core modules for generic manipulation of mainly iterable corpora.
Corpus concatenation
Corpus statistics
Human-readable formatting
Corpus document list filter
Corpus split
Corpus subset
Tar archive grouper
Tar archive grouper (filter)
Corpus vocab builder
Token frequency counter
Tokenized corpus to ID mapper
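In Pimlico, the modules listed above are module types that you configure in a pipeline config file. The block below is not Pimlico's API. It is a minimal plain-Python sketch of what several of these operations amount to on an iterable corpus. It assumes a hypothetical corpus representation: an iterable of (document name, list of tokens) pairs. All function names are illustrative only. The split here is a simplified positional split with no shuffling.

```python
from collections import Counter
from itertools import chain, islice
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical corpus representation (not a Pimlico datatype):
# an iterable of (document name, list of tokens) pairs.
Doc = Tuple[str, List[str]]


def concatenate(*corpora: Iterable[Doc]) -> Iterator[Doc]:
    """Corpus concatenation: yield every document of each corpus in turn."""
    return chain(*corpora)


def subset(corpus: Iterable[Doc], size: int, offset: int = 0) -> Iterator[Doc]:
    """Corpus subset: take `size` documents after skipping the first `offset`."""
    return islice(corpus, offset, offset + size)


def split(corpus: Iterable[Doc], first_size: int) -> Tuple[List[Doc], List[Doc]]:
    """Corpus split (simplified): the first `first_size` documents vs. the rest."""
    docs = list(corpus)
    return docs[:first_size], docs[first_size:]


def token_frequencies(corpus: Iterable[Doc]) -> Counter:
    """Token frequency counter: count every token across all documents."""
    return Counter(tok for _, tokens in corpus for tok in tokens)


def build_vocab(corpus: Iterable[Doc], threshold: int = 1) -> Dict[str, int]:
    """Vocab builder: give an ID to each token seen at least `threshold` times,
    most frequent first (ties keep first-seen order)."""
    counts = token_frequencies(corpus)
    kept = [tok for tok, n in counts.most_common() if n >= threshold]
    return {tok: i for i, tok in enumerate(kept)}


def map_to_ids(corpus: Iterable[Doc], vocab: Dict[str, int],
               oov_id: int = -1) -> Iterator[Tuple[str, List[int]]]:
    """Tokenized corpus to ID mapper: replace each token by its vocab ID,
    using `oov_id` for tokens missing from the vocab."""
    for name, tokens in corpus:
        yield name, [vocab.get(tok, oov_id) for tok in tokens]


if __name__ == "__main__":
    a = [("d1", ["the", "cat", "sat"]), ("d2", ["the", "dog"])]
    b = [("d3", ["a", "cat"])]
    # Materialize once: generators can only be iterated a single time.
    corpus = list(concatenate(a, b))
    vocab = build_vocab(corpus)
    print(vocab)                         # {'the': 0, 'cat': 1, 'sat': 2, 'dog': 3, 'a': 4}
    print(list(map_to_ids(corpus, vocab)))
```

Because iterable corpora are often generators that can be read only once, the usage example materializes the concatenated corpus with `list` before counting and then mapping it. A streaming pipeline would instead re-open the corpus for each pass.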