Regex annotated text matcher

Path	pimlico.modules.regex.annotated_text
Executable	yes

Todo

Document this module

Inputs

Name	Type(s)
documents	TarredCorpus<WordAnnotationsDocumentType>

Outputs

Name	Type(s)
documents	KeyValueListCorpus

Options

Name	Description	Type
expr	(required) An expression specifying what to search for in sentences. It consists of a sequence of tokens, each matching one field in the annotations of the corresponding token in the data. Tokens take the form field[x], where field is the name of a field supplied by the input data and x is the value required of that field. If x ends in a *, it matches prefixes, e.g. pos[NN*]. If no field name is given, the default ‘word’ is used. A token of the form ‘x=y’ matches the expression y as above and assigns the matching word to the extracted variable x (to be output). You may also extract a different annotation field by specifying x=f:y, where f is the name of the field to extract. E.g. ‘what a=lemma:pos[NN*] lemma[come] with b=pos[NN*]’ matches phrases like ‘what meals come with fries’, producing ‘a=meal’ and ‘b=fries’. Both pos and lemma need to be fields in the dataset. If you give multiple whole expressions separated by |s, matches will be collected from all of them.	string
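
The expression syntax described for expr can be illustrated with a small standalone sketch. This is not the module's own implementation: the function names (parse_token, parse_expr, find_matches) are hypothetical, and the sketch assumes that a sentence is a list of dicts keyed by field name, that tokens must match contiguously, and that matching is case-sensitive. None of those details are stated above.

```python
import re


def parse_token(token):
    """Parse one expression token (hypothetical helper, not Pimlico code).

    Returns (variable, extract_field, match_field, value, is_prefix).
    """
    var = extract = None
    if "=" in token:
        # 'x=y': capture the matched token into variable x
        var, token = token.split("=", 1)
        extract = "word"  # by default the matching word is extracted
        head, sep, rest = token.partition(":")
        if sep and "[" not in head:
            # 'x=f:y': extract field f instead of the word
            extract, token = head, rest
    m = re.fullmatch(r"(\w+)\[(.*)\]", token)
    # With no field name given, the default field 'word' is used
    field, value = (m.group(1), m.group(2)) if m else ("word", token)
    is_prefix = value.endswith("*")
    if is_prefix:
        value = value[:-1]
    return var, extract, field, value, is_prefix


def parse_expr(expr):
    """Split on '|' into alternative expressions, then into tokens."""
    return [[parse_token(t) for t in alt.split()] for alt in expr.split("|")]


def token_matches(spec, annotations):
    """Check one parsed token against one annotated word."""
    _, _, field, value, is_prefix = spec
    actual = annotations.get(field, "")
    return actual.startswith(value) if is_prefix else actual == value


def find_matches(expr, sentence):
    """Collect the extracted variables of every match, across all alternatives."""
    results = []
    for pattern in parse_expr(expr):
        if not pattern:
            continue  # skip empty alternatives
        # Assumption: tokens must match a contiguous run of words
        for start in range(len(sentence) - len(pattern) + 1):
            window = sentence[start:start + len(pattern)]
            if all(token_matches(s, w) for s, w in zip(pattern, window)):
                results.append(
                    {s[0]: w[s[1]] for s, w in zip(pattern, window) if s[0] is not None}
                )
    return results


# Illustrative annotations for the example phrase; the tag values are assumed
sentence = [
    {"word": "what", "lemma": "what", "pos": "WP"},
    {"word": "meals", "lemma": "meal", "pos": "NNS"},
    {"word": "come", "lemma": "come", "pos": "VBP"},
    {"word": "with", "lemma": "with", "pos": "IN"},
    {"word": "fries", "lemma": "fry", "pos": "NNS"},
]

matches = find_matches("what a=lemma:pos[NN*] lemma[come] with b=pos[NN*]", sentence)
print(matches)  # [{'a': 'meal', 'b': 'fries'}]
```

Here a is filled from the lemma field because of the a=lemma: prefix, while b falls back to the matching word, which mirrors the ‘a=meal’ and ‘b=fries’ result in the option description.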