
Corpus manipulation

Core modules for generic manipulation of mainly iterable corpora.

  • Corpus concatenation
  • Corpus statistics
  • Human-readable formatting
  • Archive grouper (filter)
  • Interleaved corpora
  • Corpus document list filter
  • Random shuffle
  • Corpus split
  • Store a corpus
  • Corpus subset
  • Corpus vocab builder
  • Token frequency counter
  • Tokenized corpus to ID mapper

© Copyright 2016, Mark Granroth-Wilding Revision 37b75e98.
