Regex extract¤
Description¤
The regexExtract plugin extracts one or all matches of a regular expression within the input.
This plugin is an extraction transformer plugin. It is configured with the parameters regex and extractAll. The
parameter regex is the pattern used for matching. The parameter extractAll determines whether the plugin extracts
all matches (extractAll = true) or only the first match (extractAll = false, which is the default).
Regular expressions may also contain capturing groups, as in (A)(B)(C) instead of just ABC. If a regular
expression contains capturing groups, only the first capturing group is considered. This means the first
capturing group as written in the regex, not the first group that happens to match.
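The behaviour described above can be sketched in a few lines. This is only an illustration under assumptions: regex_extract is a hypothetical helper, not the plugin's implementation, and Python's re module stands in for the Java regular expression engine (the patterns used here behave the same way in both).

```python
import re


def regex_extract(value, regex, extract_all=False):
    """Hypothetical stand-in for regexExtract, based only on the description above."""
    pattern = re.compile(regex)

    def pick(match):
        # With capturing groups, only the first group in the regex counts;
        # without any capturing group, the whole match is used.
        return match.group(1) if pattern.groups > 0 else match.group(0)

    if extract_all:
        return [pick(m) for m in pattern.finditer(value)]
    first = pattern.search(value)
    return [] if first is None else [pick(first)]


print(regex_extract("afe123_abcd123", r"[a-z]{2,4}123"))                    # ['afe123']
print(regex_extract("afe123_abcd123", r"[a-z]{2,4}123", extract_all=True))  # ['afe123', 'abcd123']
print(regex_extract("abcd123xyz", r"^([a-z]{2,4})123([a-z]+)"))             # ['abcd']
```

The third call shows the capturing-group rule: the regex matches the whole input, but only the content of the first group is returned.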
Notes on regular expressions¤
The most commonly used examples of regular expressions are "\\s*" for whitespace characters, [0-9]*
for digits, and [a-z]* for the lowercase letters between a and z. The star (*) represents an arbitrary
number of occurrences (zero included), whereas the plus sign (+) indicates a strictly positive number of occurrences
(zero excluded).
An uppercase version of the predefined character classes means negation, such as "\\S*" for non-whitespace
characters, or "\\D*" for non-digits.
Similarly, the hat sign ^ can be used for negating (arbitrary) character classes, such as [^xyz] for any character
except x, y or z.
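A short sketch of these constructs follows, using Python's re module as a stand-in for the Java engine (an assumption made for illustration; the input strings are made up). Python raw strings (r"...") are used so that the backslashes do not need to be doubled here.

```python
import re

# \S is the negation of \s: runs of non-whitespace characters.
non_space = re.findall(r"\S+", "foo  bar baz")
print(non_space)    # ['foo', 'bar', 'baz']

# [0-9] matches digits, \D (negation of \d) matches non-digits.
digits = re.findall(r"[0-9]+", "a1b22c333")
non_digits = re.findall(r"\D+", "a1b22c333")
print(digits)       # ['1', '22', '333']
print(non_digits)   # ['a', 'b', 'c']

# [^xyz] matches any character except x, y or z.
not_xyz = re.findall(r"[^xyz]+", "axbyczd")
print(not_xyz)      # ['a', 'b', 'c', 'd']

# The star accepts zero occurrences, the plus sign does not.
star_empty = re.fullmatch(r"[a-z]*", "") is not None
plus_empty = re.fullmatch(r"[a-z]+", "") is not None
print(star_empty, plus_empty)   # True False
```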
Attention: Backslashes in regular expressions have to be escaped, e.g. instead of \s we need to write \\s.
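The effect of escaping can be illustrated with ordinary string literals. The sketch below uses Python string literals as a stand-in (an assumption; whether doubling is required depends on where the regex is written). It shows that "\\s" denotes the two characters \ and s, and that a forgotten escape can silently change the pattern.

```python
import re

# "\\s+" in a normal string literal is the three-character regex \s+ .
escaped = "\\s+"
print(len(escaped))                      # 3
parts = re.split(escaped, "a  b\tc")
print(parts)                             # ['a', 'b', 'c']

# Without escaping, "\b" is a single backspace character, not a word boundary.
print(len("\b"), len("\\b"))             # 1 2
with_escape = re.findall("\\bcat\\b", "cat concat cat")
without_escape = re.findall("\bcat\b", "cat concat cat")
print(with_escape)                       # ['cat', 'cat']
print(without_escape)                    # []
```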
Note for advanced users¤
A compilation of the available constructs for building regular expressions can be found in the
API documentation of the Java Pattern class.
Relation to other plugins¤
In addition to the regexExtract plugin, there are related plugins such as validateRegex, ifMatchesRegex and
regexReplace.
The distinctive feature of each of these plugins lies in what happens whenever the regular expression
matches the input value(s): the regexExtract plugin is used for extracting matches from the input, validateRegex
is useful for validating the input, ifMatchesRegex conditionally distinguishes which input to take, and
regexReplace replaces all matches.
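The four roles can be contrasted with a rough analogy. The functions below are hypothetical and only mirror the one-line descriptions above; they are not the plugins' implementations, and details such as whether validation requires a full or a partial match are assumptions.

```python
import re

REGEX = r"[0-9]+"  # made-up pattern for illustration


def extract(value):
    # Analogy for regexExtract: return the first match, if any.
    m = re.search(REGEX, value)
    return [] if m is None else [m.group(0)]


def validate(value):
    # Analogy for validateRegex (assumption: partial matches count as valid).
    return re.search(REGEX, value) is not None


def if_matches(value, if_true, if_false):
    # Analogy for ifMatchesRegex: choose between two inputs depending on a match.
    return if_true if re.search(REGEX, value) else if_false


def replace(value, replacement):
    # Analogy for regexReplace: replace every match.
    return re.sub(REGEX, replacement, value)


print(extract("order 42 of 7"))              # ['42']
print(validate("order 42 of 7"))             # True
print(if_matches("no digits", "yes", "no"))  # no
print(replace("order 42 of 7", "#"))         # order # of #
```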
Examples¤
Notation: Lists of values are represented via square brackets. Example: [first, second] represents a list of two values “first” and “second”.
returns only the first match, when extractAll = false (default):
- Parameters:
  - regex: [a-z]{2,4}123
- Input values:
  - [afe123_abcd23]
- Returns: [afe123]
returns all matches, when extractAll = true:
- Parameters:
  - regex: [a-z]{2,4}123
  - extractAll: true
- Input values:
  - [afe123_abcd123]
- Returns: [afe123, abcd123]
returns an empty list if nothing matches:
- Parameters:
  - regex: ^[a-z]{2,4}123
- Input values:
  - [abcde123]
- Returns: []
returns the match of the first capturing group, which includes two to four letters:
- Parameters:
  - regex: ^([a-z]{2,4})123([a-z]+)
- Input values:
  - [abcd123xyz]
- Returns: [abcd]
returns the match of the first capturing group, which includes at least one letter:
- Parameters:
  - regex: ^([a-z]+)123([a-z]{2,4})
- Input values:
  - [pqrstuvwxyz123abcd]
- Returns: [pqrstuvwxyz]
returns an empty string, because the first capturing group includes the possibility of no letters:
- Parameters:
  - regex: ^([a-z]*)123([a-z]{2,4})
- Input values:
  - [123abcd]
- Returns: []
returns an empty list, because the first capturing group excludes the possibility of no letters:
- Parameters:
  - regex: ^([a-z]+)123([a-z]{2,4})
- Input values:
  - [123abcd]
- Returns: []
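The last two examples look the same in the list notation but differ at the regex level. The sketch below shows only that regex-level difference, using Python's re module as a stand-in for the Java engine (an assumption for illustration): with the star, the regex matches and the first group captures an empty string, whereas with the plus sign the regex does not match at all.

```python
import re

value = "123abcd"

# [a-z]* may match zero letters, so the regex matches and group 1 is empty.
star = re.search(r"^([a-z]*)123([a-z]{2,4})", value)
print(repr(star.group(1)))   # ''

# [a-z]+ needs at least one letter before 123, so there is no match.
plus = re.search(r"^([a-z]+)123([a-z]{2,4})", value)
print(plus)                  # None
```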
Example 8:
- Parameters:
  - regex: "bedeutungen"\s*:\s*\[\s*(?:"([^"]*)"(?:\s*,\s*"([^"]*)")*)*\s*\]
- Input values:
  - ["bedeutungen" : [ ]]
- Returns: []
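One way to read Example 8: the regex as a whole matches the input, but the first capturing group sits inside an optional part that is never entered, so it captures nothing. The sketch below checks this with Python's re module as a stand-in for the Java engine (an assumption). The second input, with two quoted values, is made up for contrast and is not part of the original example.

```python
import re

BEDEUTUNGEN = r'"bedeutungen"\s*:\s*\[\s*(?:"([^"]*)"(?:\s*,\s*"([^"]*)")*)*\s*\]'

# Input from Example 8: the whole regex matches, group 1 does not participate.
empty = re.search(BEDEUTUNGEN, '"bedeutungen" : [ ]')
print(empty.group(0))    # "bedeutungen" : [ ]
print(empty.group(1))    # None

# Hypothetical input with values: group 1 captures the first quoted value.
filled = re.search(BEDEUTUNGEN, '"bedeutungen" : ["a", "b"]')
print(filled.group(1))   # a
```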
Parameter¤
Regex¤
Regular expression
- ID: regex
- Datatype: string
- Default Value: None
Extract all¤
If true, all matches are extracted. If false, only the first match is extracted (default).
- ID: extractAll
- Datatype: boolean
- Default Value: false
Advanced Parameter¤
None