Regex extract¤
Description¤
The Regex extract plugin extracts one or all matches of a regular expression within the input.
This plugin is an extraction transformer plugin. It is configured with the parameters regex and extractAll. The
parameter regex is the pattern used for matching. The parameter extractAll controls whether the plugin extracts
all matches (extractAll = true) or only the first match (extractAll = false, which is the default).
In addition to plain regular expressions, capturing groups can be used, as in (A)(B)(C) instead of just
ABC. If a regular expression contains capturing groups, only the first capturing group is considered. This
means the first capturing group as written in the regex, not the first group that happens to match.
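The behaviour described above can be sketched in Python as follows. This is a minimal model, not the plugin's implementation: the function name regex_extract is hypothetical, Python's re module stands in for the Java regex engine, and skipping a first group that did not take part in a match is an assumption.

```python
import re

def regex_extract(values, regex, extract_all=False):
    """Hypothetical model of the described behaviour (not the plugin's code)."""
    pattern = re.compile(regex)
    results = []
    for value in values:
        matches = list(pattern.finditer(value))
        if not extract_all:
            # Keep only the first match, if there is one.
            matches = matches[:1]
        for m in matches:
            # With capturing groups, only the first group in the regex counts;
            # without any, the whole match is used.
            text = m.group(1) if pattern.groups > 0 else m.group(0)
            if text is not None:  # assumption: skip a group that did not participate
                results.append(text)
    return results

print(regex_extract(["afe123_abcd123"], r"[a-z]{2,4}123"))                    # ['afe123']
print(regex_extract(["afe123_abcd123"], r"[a-z]{2,4}123", extract_all=True))  # ['afe123', 'abcd123']
print(regex_extract(["abcd123xyz"], r"^([a-z]{2,4})123([a-z]+)"))             # ['abcd']
```

The last call shows the capturing-group rule: the second group matches xyz, but only the content of the first group is returned.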
Notes on regular expressions¤
Commonly used regular expressions are "\\s*" for whitespace characters, [0-9]* for digits, and [a-z]* for the
lowercase English letters from a to z. The star (*) stands for an arbitrary number of occurrences (zero
included), whereas the plus sign (+) requires at least one occurrence.
An uppercase version of the predefined character classes means negation, such as "\\S*" for non-whitespace
characters, or "\\D*" for non-digits.
Similarly, the hat sign ^ can be used for negating (arbitrary) character classes, such as [^xyz] for any character
except x, y or z.
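The character classes and quantifiers above can be tried out with a short Python sketch. Python's re module is used here only as a stand-in for the Java engine; the sample text and variable names are made up for illustration.

```python
import re

text = "ab 12  cd345"

digits = re.findall(r"[0-9]+", text)         # runs of digits
letters = re.findall(r"[a-z]+", text)        # runs of lowercase letters
spaces = re.findall(r"\s+", text)            # runs of whitespace
non_spaces = re.findall(r"\S+", text)        # uppercase class: non-whitespace
non_digits = re.findall(r"\D+", text)        # uppercase class: non-digits
not_xyz = re.findall(r"[^xyz]+", "axbycz")   # negated character class

# * also accepts zero occurrences, + needs at least one.
star_accepts_empty = re.fullmatch(r"[0-9]*", "") is not None
plus_accepts_empty = re.fullmatch(r"[0-9]+", "") is not None

print(digits, letters, non_spaces, not_xyz)
print(star_accepts_empty, plus_accepts_empty)  # True False
```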
Attention: Backslashes in regular expressions have to be escaped, e.g. \s has to be written as \\s.
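As an analogy only, the following Python sketch shows what the doubling means inside a string literal: the doubled form denotes a single backslash in the pattern that the regex engine receives. Where exactly the doubling is required for the plugin depends on where the regex is written, which this sketch does not cover.

```python
import re

escaped = "\\s+"   # ordinary string literal: the backslash is doubled
raw = r"\s+"       # raw string literal: no doubling needed

# Both literals denote the same three-character pattern: backslash, s, plus.
same_pattern = escaped == raw and len(escaped) == 3

parts = re.split(escaped, "a  b\tc")
print(same_pattern, parts)  # True ['a', 'b', 'c']
```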
Note for advanced users¤
An overview of the available constructs for building regular expressions can be found in the
API documentation of the Java Pattern class.
Examples¤
Notation: Lists of values are represented by square brackets. Example: [first, second] represents a list of two values “first” and “second”.
returns only the first match, when extractAll = false (default):
- Parameters:
  - regex: [a-z]{2,4}123
- Input values: [afe123_abcd23]
- Returns: [afe123]
returns all matches, when extractAll = true:
- Parameters:
  - regex: [a-z]{2,4}123
  - extractAll: true
- Input values: [afe123_abcd123]
- Returns: [afe123, abcd123]
returns an empty list if nothing matches:
- Parameters:
  - regex: ^[a-z]{2,4}123
- Input values: [abcde123]
- Returns: []
returns the match of the first capturing group, which includes two to four letters:
- Parameters:
  - regex: ^([a-z]{2,4})123([a-z]+)
- Input values: [abcd123xyz]
- Returns: [abcd]
returns the match of the first capturing group, which includes at least one letter:
- Parameters:
  - regex: ^([a-z]+)123([a-z]{2,4})
- Input values: [pqrstuvwxyz123abcd]
- Returns: [pqrstuvwxyz]
returns an empty string, because the first capturing group includes the possibility of no letters:
- Parameters:
  - regex: ^([a-z]*)123([a-z]{2,4})
- Input values: [123abcd]
- Returns: []
returns an empty list, because the first capturing group excludes the possibility of no letters:
- Parameters:
  - regex: ^([a-z]+)123([a-z]{2,4})
- Input values: [123abcd]
- Returns: []
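The anchored examples and capturing-group examples above can be replayed with Python's re module, used here as a stand-in for the Java engine. The helper first_group is hypothetical and only models "content of the first capturing group of the first match". It makes visible the difference between a match whose first group is empty and no match at all.

```python
import re

def first_group(regex, value):
    """Return group 1 of the first match, or None if there is no match."""
    m = re.search(regex, value)
    return m.group(1) if m else None

# ^ anchors the match at the start: five letters precede 123, so {2,4} cannot fit.
no_match = re.search(r"^[a-z]{2,4}123", "abcde123")

two_to_four = first_group(r"^([a-z]{2,4})123([a-z]+)", "abcd123xyz")
one_or_more = first_group(r"^([a-z]+)123([a-z]{2,4})", "pqrstuvwxyz123abcd")

# * lets the first group match zero letters: a match exists, group 1 is empty.
empty_group = first_group(r"^([a-z]*)123([a-z]{2,4})", "123abcd")
# + needs at least one letter before 123: there is no match at all.
missing = first_group(r"^([a-z]+)123([a-z]{2,4})", "123abcd")

print(no_match, two_to_four, one_or_more, repr(empty_group), missing)
```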
Example 8:
- Parameters:
  - regex: "bedeutungen"\s*:\s*\[\s*(?:"([^"]*)"(?:\s*,\s*"([^"]*)")*)*\s*\]
- Input values: ["bedeutungen" : [ ]]
- Returns: []
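To see what happens in Example 8, the regex can be run in Python, again only as a stand-in for the Java engine. The whole pattern matches the input, but the first capturing group sits inside a repetition that runs zero times, so it captures nothing. This is consistent with the empty result shown above, although how the plugin handles this case internally is not stated here. The second input is hypothetical and added only for contrast.

```python
import re

regex = r'"bedeutungen"\s*:\s*\[\s*(?:"([^"]*)"(?:\s*,\s*"([^"]*)")*)*\s*\]'

empty = re.search(regex, '"bedeutungen" : [ ]')
filled = re.search(regex, '"bedeutungen" : ["a", "b"]')  # hypothetical input

# The pattern matches the empty list, but group 1 did not participate.
print(empty.group(0))   # "bedeutungen" : [ ]
print(empty.group(1))   # None

# With list entries present, group 1 holds the first entry.
print(filled.group(1))  # a
```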
Parameter¤
Regex¤
Regular expression
- ID: regex
- Datatype: string
- Default Value: None
Extract all¤
If true, all matches are extracted. If false, only the first match is extracted (default).
- ID: extractAll
- Datatype: boolean
- Default Value: false
Advanced Parameter¤
None
Related Plugins¤
- regexReplace — The Regex extract plugin returns what the regular expression matches, or the first capturing group if capturing groups exist. The Regex replace plugin returns the full input string after rewriting it by replacing every match with the configured replacement.
- regexSelect — The Regex selection plugin does not return matched text at all. It emits copies of a provided output value at the positions where the checked values match the regular expressions, while the Regex extract plugin returns the matched substring or capturing-group content.
- ifMatchesRegex — The If matches regex plugin uses the match only as a decision about which provided input value to return. The Regex extract plugin uses the match as the produced content, so the output is derived from the matched region rather than from branch inputs.
- validateRegex — The Validate regex plugin keeps the original value only when the full value matches the configured regular expression and otherwise fails validation. The Regex extract plugin returns match-derived output and can return an empty result when nothing matches.