Pimlico Wishlist
Things I plan to add to Pimlico.
- Further modules:
  - CherryPicker for coreference resolution
  - Berkeley Parser for fast constituency parsing
  - Reconcile for coreference resolution. It seems to incorporate upstream NLP tasks, so we would want to interface with it in a way that reuses output from other modules and runs only the coreference step.
- Pipeline graph visualizations: output pipeline diagrams, and maybe provide an interactive GUI to help with viewing large pipelines
- See the issue list on GitHub for other specific plans
- A big redesign of the datatype implementation is documented as a GitHub project
Todos
The following to-dos appear elsewhere in the docs. They are generally bits of the documentation I’ve not written yet, but am aware are needed.
Todo
Not got these things working yet, but they’ll be useful in the long run
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/python3/src/python/pimlico/utils/urwid.py:docstring of pimlico.utils.urwid, line 8.)
Todo
Describe how module dependencies are defined for different types of deps
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/python3/docs/core/dependencies.rst, line 73.)
Todo
Include some examples from the core modules of how deps are defined and some special cases of software fetching
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/python3/docs/core/dependencies.rst, line 80.)
Todo
Write documentation for this
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/python3/docs/core/module_structure.rst, line 9.)
Todo
Filter module guide needs to be updated for new datatypes. This section is currently completely wrong – ignore it! This is quite a substantial change.
The difficulty of describing what you need to do here suggests we might want to provide some utilities to make this easier!
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/python3/docs/guides/filters.rst, line 31.)
Todo
Write a guide to building document map modules.
For now, the skeletons below are a useful starting point, but there should be a fuller explanation here of what document map modules are all about and how to use them.
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/python3/docs/guides/map_module.rst, line 5.)
Todo
Document map module guide needs to be updated for new datatypes.
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/python3/docs/guides/map_module.rst, line 12.)
Todo
Module writing guide needs to be updated for new datatypes.
In particular, the executor example and datatypes in the module definition need to be updated.
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/python3/docs/guides/module.rst, line 23.)
Todo
Setup guide has a lot that needs to be updated for the new datatypes system. I’ve updated it up to the “Getting input” section.
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/python3/docs/guides/setup.rst, line 5.)
Todo
Continue writing from here
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/python3/docs/guides/setup.rst, line 110.)
Todo
Add test pipeline and test
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/python3/docs/modules/pimlico.modules.gensim.lda.rst, line 15.)
Todo
Add test pipeline and test
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/python3/docs/modules/pimlico.modules.gensim.lda_doc_topics.rst, line 19.)
Todo
Add test pipeline. This is slightly difficult, as we need a small FastText binary file, which is harder to produce, since you can’t easily just truncate a big file.
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/python3/docs/modules/pimlico.modules.input.embeddings.fasttext_gensim.rst, line 27.)
Todo
Update to new datatypes system and add test pipeline
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/python3/docs/modules/pimlico.modules.input.text_annotations.vrt_text.rst, line 24.)
Todo
Currently skipped from module doc generator, until updated
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/python3/docs/modules/pimlico.modules.input.text_annotations.vrt_text.rst, line 28.)
Todo
Add test pipeline
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/pimlico/checkouts/python3/docs/modules/pimlico.modules.input.xml.rst, line 15.)