Advanced Regular Expressions

Content Filtering keywords support regular expression declarations. See the following tables for more in-depth examples of regular expressions.
There are a number of websites and tutorials available online. One such site is the PerlDoc site, which can be found at:

Counting and Grouping

Element
Meaning
Example
.
The dot or period character represents any character (except the new line character).
do. matches:
doe, dog, don, dos, dot
d..r matches:
deer, door
*
The asterisk character means zero or more instances of the preceding element.
do* matches:
d, do, doo, dooo, doooo
+
The plus sign character means one or more instances of the preceding element.
do+ matches:
do, doo, dooo, doooo but not d
?
The question mark character means zero or one instances of the preceding element.
do? matches:
d or do but not doo, dooo
( )
Parentheses group whatever is between them so that it is treated as a single entity.
d(eer)+ matches:
deer or deereer or deereereer
The + sign is applied to the substring within parentheses, so the regular expression looks for d followed by one or more of the grouping eer.
[ ]
Square bracket characters indicate a set or a range of characters.
d[aeiouy]+ matches:
da, de, di, do, du, dy, daa, dae, dai
The + sign is applied to the set within brackets, so the regular expression looks for d followed by one or more of any of the characters in the set [aeiouy].
d[A-Z] matches:
dA, dB, dC, and so on up to dZ.
The set in square brackets represents the range of all upper-case letters between A and Z.
[^ ]
A caret character at the start of a set within square brackets logically negates the set or range specified, meaning the regular expression will match any character that is not in the set or range.
d[^aeiouy] matches:
db, dc or dd, d9, d#--d followed by any single character except a vowel
{ }
Curly brace characters set a specific number of occurrences of the preceding element. A single value inside the braces means that only that many occurrences will match. A pair of numbers separated by a comma represents the lowest and highest valid counts of the preceding element. A single number followed by a comma means there is no upper bound.
da{3} matches:
daaa--d followed by 3 and only 3 occurrences of “a”
da{2,4} matches:
daa, daaa, and daaaa (but not daaaaa)--d followed by 2, 3, or 4 occurrences of a
da{4,} matches:
daaaa, daaaaa, daaaaaa--d followed by 4 or more occurrences of a.
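
The counting and grouping elements above can be tried out with a short script. The following is a minimal sketch that uses Python's re module as a stand-in; the Content Filtering engine may differ in details, and the helper name whole_matches is hypothetical. It uses re.fullmatch so that each candidate must match the pattern from start to end, which mirrors how the examples in the table are listed.

```python
import re

def whole_matches(pattern, candidates):
    """Return the candidates that match the pattern from start to end."""
    return [text for text in candidates if re.fullmatch(pattern, text)]

# Dot: exactly one arbitrary character (except a new line).
print(whole_matches(r"do.", ["doe", "dog", "do", "door"]))
# Asterisk, plus sign, question mark: zero or more, one or more, zero or one.
print(whole_matches(r"do*", ["d", "do", "dooo", "da"]))
print(whole_matches(r"do+", ["d", "do", "doo"]))
print(whole_matches(r"do?", ["d", "do", "doo"]))
# Parentheses: the + applies to the whole group "eer".
print(whole_matches(r"d(eer)+", ["d", "deer", "deereer", "deerr"]))
# Square brackets: a set of characters, then a negated set.
print(whole_matches(r"d[aeiouy]+", ["da", "dae", "db"]))
print(whole_matches(r"d[^aeiouy]", ["db", "d9", "da"]))
# Curly braces: a bounded count, then a count with no upper bound.
print(whole_matches(r"da{2,4}", ["da", "daa", "daaaa", "daaaaa"]))
print(whole_matches(r"da{4,}", ["daaa", "daaaa", "daaaaaa"]))
```

Each printed list contains only the candidates that the pattern accepts in full, so rejected strings such as do for do. or daaaaa for da{2,4} are left out.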

Shorthand Classes

Element
Meaning
Example
\d
Any digit character; functionally equivalent to [0-9] or [[:digit:]]
\d+ matches:
1, 12, 123, but not 1b7--one or more of any digit characters.
\D
Any non-digit character; functionally equivalent to [^0-9] or [^[:digit:]]
\D+ matches:
a, ab, ab&, but not 1--one or more of any character but 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9.
\w
Any "word" character--that is, any alphanumeric character; functionally equivalent to [A-Za-z0-9] or [[:alnum:]]
\w+ matches:
a, ab, a1, but not !&--one or more upper- or lower-case letters or digits, but not punctuation or other special characters.
\W
Any non-alphanumeric character; functionally equivalent to [^A-Za-z0-9] or [^[:alnum:]]
\W+ matches:
*, &, but not ace or a1--one or more of any character but upper- or lower-case letters and digits.
\s
Any white space character; space, new line, tab, non-breaking space, and others; functionally equivalent to [[:space:]]
vegetable\s matches:
"vegetable" followed by any white space character
So the phrase "I like a vegetable in my soup" would trigger the regular expression, but "I like vegetables in my soup" would not.
\S
Any non-white space character; anything other than a space, new line, tab, non-breaking space, and others; functionally equivalent to [^[:space:]]
vegetable\S matches:
"vegetable" followed by any non-white space character
So the phrase "I like vegetables in my soup" would trigger the regular expression, but "I like a vegetable in my soup" would not.
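
The shorthand classes above can be checked in the same way. This is a minimal sketch using Python's re module as a stand-in for the filtering engine, so some details are assumptions: the helper name triggers is hypothetical, the re.ASCII flag restricts the classes to ASCII characters, and Python's \w also accepts the underscore, which the table does not list.

```python
import re

def triggers(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    # re.ASCII keeps \d, \w, and \s limited to ASCII characters.
    return re.search(pattern, text, flags=re.ASCII) is not None

singular = "I like a vegetable in my soup"
plural = "I like vegetables in my soup"

# \s needs white space right after "vegetable"; \S needs anything else.
print(triggers(r"vegetable\s", singular), triggers(r"vegetable\s", plural))
print(triggers(r"vegetable\S", singular), triggers(r"vegetable\S", plural))

# \d and \D are complements, as are \w and \W.
print(triggers(r"\d", "1b7"), triggers(r"\D", "123"))
print(triggers(r"\w", "!&"), triggers(r"\W", "a1"))
```

Because re.search looks for the pattern anywhere in the text, a single digit inside 1b7 is enough for \d to trigger, while \D finds nothing in 123.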

Character Classes

Element
Meaning
Example
[:alpha:]
Any alphabetic characters
.REG. [[:alpha:]] matches:
abc, def, xxx, but not 123, @#$.
[:digit:]
Any digit character; functionally equivalent to \d
.REG. [[:digit:]] matches:
1, 12, 123
[:alnum:]
Any "word" character--that is, any alphanumeric character; functionally equivalent to \w
.REG. [[:alnum:]] matches:
abc, 123, but not ~!@.
[:space:]
Any white space character; space, new line, tab, non-breaking space; functionally equivalent to \s
.REG. (vegetable)[[:space:]] matches:
"vegetable" followed by any white space character
So the phrase "I like a vegetable in my soup" would trigger the regular expression, but "I like vegetables in my soup" would not.
[:graph:]
Any characters except space, control characters, or other similar characters
.REG. [[:graph:]] matches:
123, abc, xxx, ><”, but not space or control characters.
[:print:]
Any character matched by [:graph:], plus the space character
.REG. [[:print:]] matches:
123, abc, xxx, ><”, and space characters.
[:cntrl:]
Any control character (for example, CTRL + C, CTRL + X)
.REG. [[:cntrl:]] matches:
0x03, 0x08, but not abc, 123, !@#.
[:blank:]
Space and tab characters
.REG. [[:blank:]] matches:
space and tab characters, but not 123, abc, !@#
[:punct:]
Punctuation characters
.REG. [[:punct:]] matches:
; : ? ! ~ @ # $ % & * ‘ “ , but not 123, abc
[:lower:]
Any lowercase alphabetic character
Note
"Enable case sensitive matching" must be enabled; otherwise, this class functions as [:alnum:].
.REG. [[:lower:]] matches:
abc, Def, sTress, Do, but not ABC, DEF, STRESS, DO, 123, !@#.
[:upper:]
Any uppercase alphabetic character
Note
"Enable case sensitive matching" must be enabled; otherwise, this class functions as [:alnum:].
.REG. [[:upper:]] matches:
ABC, DEF, STRESS, DO, Def, Stress, Do, but not abc, def, stress, do, 123, !@#.
[:xdigit:]
Digits allowed in a hexadecimal number (0-9a-fA-F)
.REG. [[:xdigit:]] matches:
0a, 7E, 0f
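
Python's re module does not understand the [:name:] notation, so the sketch below first expands each class into an ASCII range and then runs the search. The mapping POSIX_CLASSES and the helpers expand_posix and triggers are hypothetical names introduced for illustration, and the ASCII-only ranges are an assumption; the filtering engine may also cover characters such as the non-breaking space.

```python
import re

# ASCII approximations of the classes in the table (an assumption).
POSIX_CLASSES = {
    "alpha": "A-Za-z",
    "digit": "0-9",
    "alnum": "A-Za-z0-9",
    "space": r" \t\n\r\f\v",
    "graph": r"\x21-\x7e",
    "print": r"\x20-\x7e",
    "cntrl": r"\x00-\x1f\x7f",
    "blank": r" \t",
    "punct": r"!-/:-@\[-`{-~",
    "lower": "a-z",
    "upper": "A-Z",
    "xdigit": "0-9A-Fa-f",
}

def expand_posix(pattern):
    """Replace each [:name:] inside a bracket expression with its ASCII range."""
    return re.sub(r"\[:([a-z]+):\]", lambda m: POSIX_CLASSES[m.group(1)], pattern)

def triggers(pattern, text, ignore_case=False):
    """Return True if the expanded pattern is found anywhere in the text."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(expand_posix(pattern), text, flags) is not None

print(expand_posix("[[:alpha:]]"))
print(triggers("[[:alpha:]]", "abc"), triggers("[[:alpha:]]", "123"))
print(triggers("(vegetable)[[:space:]]", "I like a vegetable in my soup"))
print(triggers("[[:cntrl:]]", "\x03"), triggers("[[:cntrl:]]", "abc"))
# Without case-sensitive matching, [:upper:] also accepts lower-case letters.
print(triggers("[[:upper:]]", "abc"), triggers("[[:upper:]]", "abc", ignore_case=True))
```

The last line illustrates why the notes for [:lower:] and [:upper:] ask for case-sensitive matching: in this Python stand-in, turning case sensitivity off makes [[:upper:]] accept abc as well.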

Pattern Anchor Regular Expressions

Element
Meaning
Example
^
Indicates the beginning of a string
^(notwithstanding) matches:
Any block of text that begins with "notwithstanding"
So the phrase "notwithstanding the fact that I like vegetables in my soup" would trigger the regular expression, but "The fact that I like vegetables in my soup notwithstanding" would not.
$
Indicates the end of a string
(notwithstanding)$ matches:
Any block of text that ends with "notwithstanding"
So the phrase "notwithstanding the fact that I like vegetables in my soup" would not trigger the regular expression, but "The fact that I like vegetables in my soup notwithstanding" would.
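
The two anchors can be compared side by side on the phrases from the table. This is a minimal sketch using Python's re module as a stand-in for the filtering engine; the helper name triggers is hypothetical.

```python
import re

def triggers(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None

leading = "notwithstanding the fact that I like vegetables in my soup"
trailing = "The fact that I like vegetables in my soup notwithstanding"

# ^ ties the match to the beginning of the string.
print(triggers(r"^(notwithstanding)", leading), triggers(r"^(notwithstanding)", trailing))
# $ ties the match to the end of the string.
print(triggers(r"(notwithstanding)$", leading), triggers(r"(notwithstanding)$", trailing))
```

Without an anchor, the pattern (notwithstanding) would trigger on both phrases, because the word appears somewhere in each of them.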

Escape Sequences and Literal Strings

Element
Meaning
Example
\
Indicates that the character that follows is matched literally rather than given its special meaning in a regular expression (for example, +)
.REG. C\/C\+\+ matches:
‘C/C++’
.REG. \* matches:
*
.REG. \? matches:
?
\t
Indicates a tab character (ASCII 0x09 character)
(stress)\t matches:
Any block of text that contains the substring "stress" immediately followed by a tab.
\n
Indicates a new line character (ASCII 0x0A character)
Note
Different platforms represent a new line differently. On Windows, a new line is a pair of characters: a carriage return followed by a line feed. On UNIX and Linux, a new line is just a line feed, and on classic Mac OS (before Mac OS X), a new line is just a carriage return.
(stress)\n\n matches:
Any block of text that contains the substring "stress" followed immediately by two new line characters.
\r
Indicates a carriage return character (ASCII 0x0D character)
(stress)\r matches:
Any block of text that contains the substring "stress" followed immediately by one carriage return.
\xhh
Indicates an ASCII character with given hexadecimal code (where hh represents any two-digit hex value)
\x7E(\w){6} matches:
Any block of text containing a ~ (tilde) character immediately followed by at least six alphanumeric ("word") characters.
Examples that trigger a match: ~ab12cd and ~Pa3499.
\b
Indicates a backspace character
(stress)\b matches:
Any block of text that contains the substring “stress” followed immediately by one backspace (ASCII 0x08) character
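
The escape sequences above can be exercised with a short script. This is a minimal sketch using Python's re module as a stand-in for the filtering engine, and the helper name triggers is hypothetical. One difference is an assumption worth stating: in Python, \b outside square brackets means a word boundary, so the sketch writes the backspace character as [\b] instead.

```python
import re

def triggers(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None

# A backslash makes the following special character literal.
print(triggers(r"C\/C\+\+", "I write C/C++ code"))
print(triggers(r"\*", "2 * 3"), triggers(r"\?", "why?"))

# Tab and carriage return.
print(triggers(r"(stress)\t", "stress\tfree"), triggers(r"(stress)\r", "stress\r"))

# Two new lines: the Windows form (CR LF) does not match \n\n directly.
unix_text = "stress\n\nrelief"
windows_text = "stress\r\n\r\nrelief"
print(triggers(r"(stress)\n\n", unix_text), triggers(r"(stress)\n\n", windows_text))
# Allowing an optional carriage return before each line feed covers both forms.
print(triggers(r"(stress)(\r?\n){2}", unix_text), triggers(r"(stress)(\r?\n){2}", windows_text))

# \x7E is the tilde; it must be followed by six word characters.
print(triggers(r"\x7E(\w){6}", "~ab12cd"), triggers(r"\x7E(\w){6}", "~ab1"))

# Backspace (ASCII 0x08), written as [\b] in Python.
print(triggers(r"(stress)[\b]", "stress\x08"))
```

The new line example shows the practical effect of the platform note: a pattern written with \n\n triggers on text with UNIX line endings but not on the same text with Windows line endings, unless the carriage returns are accounted for.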