Pimlico Wishlist

Things I plan to add to Pimlico.

Todos

The following to-dos appear elsewhere in the docs. They are mostly parts of the documentation that I’ve not written yet but know are needed.

Todo

Continue updating this for the new datatype system. I’ve got partway, but the reader is still far from finished.

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/src/python/pimlico/core/modules/map/filter.py:docstring of pimlico.core.modules.map.filter, line 1.)

Todo

Under the new datatype system, this should be done differently. Don’t wrap datatypes; instead, use the actual output datatypes (taken from the wrapped module type’s output) and create custom readers that get instantiated when fetching the module’s output readers.

I’ve created the test pipeline filter_tokenize for testing this.

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/src/python/pimlico/core/modules/map/filter.py:docstring of pimlico.core.modules.map.filter.wrap_module_info_as_filter, line 7.)

Todo

Call get_writer_software_dependencies before instantiating writer

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/src/python/pimlico/datatypes/base.py:docstring of pimlico.datatypes.base.PimlicoDatatype.get_writer_software_dependencies, line 10.)

Todo

Call get_writer_software_dependencies before instantiating writer

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/src/python/pimlico/datatypes/embeddings.py:docstring of pimlico.datatypes.embeddings.Embeddings.get_writer_software_dependencies, line 10.)

Todo

Add unit test for ScoredRealFeatureSets

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/src/python/pimlico/datatypes/features.py:docstring of pimlico.datatypes.features.ScoredRealFeatureSets, line 9.)

Todo

I’ve not got these things working yet, but they’ll be useful in the long run.

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/src/python/pimlico/utils/urwid.py:docstring of pimlico.utils.urwid, line 8.)

Todo

Describe how module dependencies are defined for different types of deps

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/core/dependencies.rst, line 73.)

Todo

Include some examples from the core modules of how deps are defined and some special cases of software fetching

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/core/dependencies.rst, line 80.)

Todo

Write documentation for this

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/core/module_structure.rst, line 9.)

Todo

Filter module guide needs to be updated for new datatypes. This section is currently completely wrong – ignore it! This is quite a substantial change.

The difficulty of describing what you need to do here suggests we might want to provide some utilities to make this easier!

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/guides/filters.rst, line 31.)

Todo

Write a guide to building document map modules.

For now, the skeletons below are a useful starting point, but there should be a fuller explanation here of what document map modules are all about and how to use them.

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/guides/map_module.rst, line 5.)

Todo

Document map module guide needs to be updated for new datatypes.

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/guides/map_module.rst, line 12.)

Todo

Module writing guide needs to be updated for new datatypes.

In particular, the executor example and datatypes in the module definition need to be updated.

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/guides/module.rst, line 23.)

Todo

Setup guide has a lot that needs to be updated for the new datatypes system. I’ve updated it up to the “Getting input” section.

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/guides/setup.rst, line 5.)

Todo

Continue writing from here

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/guides/setup.rst, line 110.)

Todo

Update to new datatypes system and add test pipeline

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.candc.rst, line 25.)

Todo

Update to new datatypes system and add test pipelines

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.corenlp.rst, line 36.)

Todo

Document this module

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.embeddings.dependencies.rst, line 16.)

Todo

Update to new datatypes system and add test pipeline

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.embeddings.dependencies.rst, line 20.)

Todo

Update to new datatypes system and add test pipeline

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.embeddings.store_tsv.rst, line 22.)

Todo

Update to new datatypes system and add test pipeline

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.embeddings.store_word2vec.rst, line 23.)

Todo

Update to new datatypes system and add test pipeline

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.embeddings.word2vec.rst, line 23.)

Todo

Document this module

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.features.term_feature_compiler.rst, line 16.)

Todo

Update to new datatypes system and add test pipeline

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.features.term_feature_compiler.rst, line 20.)

Todo

Document this module

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.features.term_feature_matrix_builder.rst, line 16.)

Todo

Update to new datatypes system and add test pipeline

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.features.term_feature_matrix_builder.rst, line 20.)

Todo

Document this module

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.features.vocab_builder.rst, line 16.)

Todo

Update to new datatypes system and add test pipeline

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.features.vocab_builder.rst, line 20.)

Todo

Document this module

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.features.vocab_mapper.rst, line 16.)

Todo

Update to new datatypes system and add test pipeline

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.features.vocab_mapper.rst, line 20.)

Todo

Add test pipeline and test

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.gensim.lda.rst, line 15.)

Todo

Add test pipeline and test

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.gensim.lda_doc_topics.rst, line 19.)

Todo

Add test pipeline. This is slightly difficult, as we need a small FastText binary file, which is harder to produce, since you can’t easily just truncate a big file.

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.input.embeddings.fasttext_gensim.rst, line 27.)

Todo

Update to new datatypes system and add test pipeline

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.malt.conll_parser_input.rst, line 20.)

Todo

Document this module

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.malt.parse.rst, line 16.)

Todo

Update to new datatypes system and add test pipeline

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.malt.parse.rst, line 20.)

Todo

Update to new datatypes system and add test pipeline

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.nltk.nist_tokenize.rst, line 19.)

Todo

Document this module

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.opennlp.coreference.rst, line 16.)

Todo

Update to new datatypes system and add test pipeline

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.opennlp.coreference.rst, line 20.)

Todo

Update to new datatypes system and add test pipeline

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.opennlp.coreference_pipeline.rst, line 19.)

Todo

Update to new datatypes system and add test pipeline

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.opennlp.ner.rst, line 26.)

Todo

Document this module

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.opennlp.parse.rst, line 16.)

Todo

Update to new datatypes system and add test pipeline

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.opennlp.parse.rst, line 20.)

Todo

Update to new datatypes system and add test pipeline

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.opennlp.pos.rst, line 22.)

Todo

Update to new datatypes system and add test pipeline

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.r.script.rst, line 19.)

Todo

Document this module

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.regex.annotated_text.rst, line 16.)

Todo

Update to new datatypes system and add test pipeline

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.regex.annotated_text.rst, line 20.)

Todo

Update to new datatypes system and add test pipeline

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.sklearn.matrix_factorization.rst, line 25.)

Todo

Update to new datatypes system and add test pipeline

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.text.char_tokenize.rst, line 21.)

Todo

Update to new datatypes system and add test pipeline

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.text.text_normalize.rst, line 18.)

Todo

Update to new datatypes system and add test pipeline

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.text.untokenize.rst, line 26.)

Todo

Update to new datatypes system and add test pipeline

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.utility.alias.rst, line 51.)

Todo

Update to new datatypes system and add test pipeline

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.utility.collect_files.rst, line 24.)

Todo

Update to new datatypes system and add test pipeline

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.utility.copy_file.rst, line 21.)

Todo

Update to new datatypes system and add test pipeline

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.visualization.bar_chart.rst, line 18.)

Todo

Update to new datatypes system and add test pipeline

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/latest/docs/modules/pimlico.modules.visualization.embeddings_plot.rst, line 24.)