Regex annotated text matcher

Path pimlico.modules.regex.annotated_text
Executable yes

Todo

Document this module

Inputs

Name Type(s)
documents TarredCorpus<WordAnnotationsDocumentType>

Outputs

Name Type(s)
documents KeyValueListCorpus

Options

Name Description Type
expr (required) An expression that determines what to search for in sentences. It consists of a sequence of tokens, each matching one field in the annotations of the corresponding token in the data. Tokens are specified in the form field[x], where field is the name of a field supplied by the input data and x is the value required of that field. If x ends in a *, it matches prefixes: e.g. pos[NN*]. If no field name is given, the default ‘word’ is used. A token of the form ‘x=y’ matches the expression y as above and assigns the matching word to the extracted variable x (to be output). You may also extract a different annotation field by specifying x=f:y, where f is the name of the field to be extracted. E.g. ‘what a=lemma:pos[NN*] lemma[come] with b=pos[NN*]’ matches phrases like ‘what meals come with fries’, producing ‘a=meal’ and ‘b=fries’. Both pos and lemma need to be fields in the dataset. If you give multiple whole expressions separated by |s, matches are collected from all of them. string
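
The expr syntax described above can be illustrated with a small standalone sketch. This is not the module's actual implementation: the names TOKEN_RE, parse_expr, token_matches and find_matches are hypothetical, each sentence is assumed to be a list of dicts mapping field names to values, and the POS tags in the sample sentence are assumed for illustration. It only shows one way the documented rules (default ‘word’ field, trailing * for prefixes, x=y and x=f:y extraction, | alternatives) could fit together.

```python
import re

# Hypothetical token grammar: [var=[extract_field:]] then field[value] or a bare value.
TOKEN_RE = re.compile(
    r"^(?:(?P<var>\w+)=(?:(?P<extract>\w+):)?)?"
    r"(?:(?P<field>\w+)\[(?P<value>[^\]]*)\]|(?P<bare>\S+))$"
)


def parse_expr(expr):
    """Parse an expression into a list of alternatives, each a list of token specs."""
    alternatives = []
    # Assumption: '|' only ever separates whole expressions, never appears in a value.
    for alt in expr.split("|"):
        specs = []
        for raw in alt.split():
            m = TOKEN_RE.match(raw)
            if m is None:
                raise ValueError("cannot parse token: %r" % raw)
            # With no explicit field, the default 'word' field is matched.
            field = m.group("field") or "word"
            value = m.group("value") if m.group("field") else m.group("bare")
            # A trailing * turns the value into a prefix match.
            prefix = value.endswith("*")
            if prefix:
                value = value[:-1]
            specs.append({
                "field": field,
                "value": value,
                "prefix": prefix,
                "var": m.group("var"),
                # x=y extracts the word; x=f:y extracts field f instead.
                "extract": m.group("extract") or "word",
            })
        if specs:
            alternatives.append(specs)
    return alternatives


def token_matches(spec, token):
    """Check one token spec against one annotated token (a dict of fields)."""
    actual = token.get(spec["field"], "")
    if spec["prefix"]:
        return actual.startswith(spec["value"])
    return actual == spec["value"]


def find_matches(expr, sentence):
    """Return one list of (variable, value) pairs per match, over all alternatives."""
    results = []
    for specs in parse_expr(expr):
        # Slide a window of len(specs) tokens over the sentence.
        for start in range(len(sentence) - len(specs) + 1):
            window = sentence[start:start + len(specs)]
            if all(token_matches(s, t) for s, t in zip(specs, window)):
                results.append([
                    (s["var"], t.get(s["extract"], ""))
                    for s, t in zip(specs, window)
                    if s["var"]
                ])
    return results


# Sample sentence with assumed annotations for 'what meals come with fries'.
SENTENCE = [
    {"word": "what", "pos": "WP", "lemma": "what"},
    {"word": "meals", "pos": "NNS", "lemma": "meal"},
    {"word": "come", "pos": "VBP", "lemma": "come"},
    {"word": "with", "pos": "IN", "lemma": "with"},
    {"word": "fries", "pos": "NNS", "lemma": "fry"},
]

print(find_matches("what a=lemma:pos[NN*] lemma[come] with b=pos[NN*]", SENTENCE))
# prints [[('a', 'meal'), ('b', 'fries')]]
```

Each match is returned as a list of key-value pairs, which mirrors the shape suggested by the KeyValueListCorpus output type; how the real module serialises its matches is not specified here.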