What are regular expressions?

Regular expressions are used to perform string matching. See the following tables for some common examples of regular expressions.
Note
Regular expressions are a powerful string matching tool, so administrators who choose to use them should be familiar and comfortable with regular expression syntax. Poorly written regular expressions can degrade performance. Trend Micro recommends starting with simple regular expressions that do not use complex syntax. When introducing a new rule, use the backup action and observe how ScanMail handles messages under that rule. When you are confident that the rule has no unexpected consequences, you can change the rule's action.

Counting and Grouping

Element
What It Means
Example
.
The dot or period character represents any single character except a new line character.
do. matches doe, dog, don, dos, dot, etc. d..r matches deer, door, etc.
*
The asterisk character means zero or more instances of the preceding element.
do* matches d, do, doo, dooo, doooo, etc.
+
The plus sign character means one or more instances of the preceding element.
do+ matches do, doo, dooo, doooo, etc., but not d
?
The question mark character means zero or one instance of the preceding element.
do?g matches dg or dog but not doog, dooog, etc.
( )
Parentheses group the elements between them so that they are treated as a single unit.
d(eer)+ matches deer, deereer, deereereer, etc. The + sign applies to the group within the parentheses, so the regex looks for d followed by one or more occurrences of "eer".
[ ]
Square bracket characters indicate a set or a range of characters.
d[aeiouy]+ matches da, de, di, do, du, dy, daa, dae, dai, etc. The + sign applies to the set within the brackets, so the regex looks for d followed by one or more of any of the characters in the set [aeiouy].
d[A-Z] matches dA, dB, dC, and so on up to dZ. The set in square brackets represents the range of all upper-case letters between A and Z.
^
A caret placed as the first character inside square brackets negates the set or range specified, meaning the regex will match any character that is not in the set or range.
d[^aeiouy] matches db, dc, dd, d9, d#, etc.: d followed by any single character except a, e, i, o, u, or y.
{ }
Curly brace characters set a specific number of occurrences of the preceding element. A single value inside the braces means that exactly that many occurrences will match. Two numbers separated by a comma set the minimum and maximum number of occurrences. A single number followed by a comma sets a minimum with no upper bound.
da{3} matches daaa: d followed by exactly 3 occurrences of "a". da{2,4} matches daa, daaa, and daaaa: d followed by 2, 3, or 4 occurrences of "a". da{4,} matches daaaa, daaaaa, daaaaaa, etc.: d followed by 4 or more occurrences of "a".
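To experiment with the counting and grouping examples outside ScanMail, the following sketch runs each pattern through Python's re module. This is only a stand-in for ScanMail's own engine, so edge cases may differ; SHOULD_MATCH, SHOULD_NOT_MATCH, and whole_match are illustrative names, not part of any Trend Micro interface. The sketch uses re.fullmatch, so a string counts as a match only when the whole string fits the pattern. The last line shows that an unanchored search still finds a shorter match inside a longer string.

```python
import re

# Patterns from the table, each with strings that should match in full.
# re.fullmatch is used so "matches" means the entire string fits the pattern;
# a content filter may instead search for the pattern anywhere in the text.
SHOULD_MATCH = {
    r"do.": ["doe", "dog", "dot"],
    r"d..r": ["deer", "door"],
    r"do*": ["d", "do", "dooo"],
    r"do+": ["do", "doo", "dooo"],
    r"do?g": ["dg", "dog"],
    r"d(eer)+": ["deer", "deereer", "deereereer"],
    r"d[aeiouy]+": ["da", "dy", "dae"],
    r"d[A-Z]": ["dA", "dB", "dZ"],
    r"d[^aeiouy]": ["db", "d9", "d#"],
    r"da{3}": ["daaa"],
    r"da{2,4}": ["daa", "daaa", "daaaa"],
    r"da{4,}": ["daaaa", "daaaaa", "daaaaaa"],
}

# (pattern, text) pairs that should not match as a whole string.
SHOULD_NOT_MATCH = [
    (r"do+", "d"),
    (r"do?g", "doog"),
    (r"d[^aeiouy]", "de"),
    (r"da{2,4}", "daaaaa"),
]


def whole_match(pattern: str, text: str) -> bool:
    """Return True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


for pattern, samples in SHOULD_MATCH.items():
    for text in samples:
        print(f"{pattern!r:14} {text!r:14} {whole_match(pattern, text)}")

for pattern, text in SHOULD_NOT_MATCH:
    print(f"{pattern!r:14} {text!r:14} {whole_match(pattern, text)}")

# Without anchoring, a search still finds "daaaa" inside "daaaaa".
print(re.search(r"da{2,4}", "daaaaa").group())
```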

Character Classes (shorthand)

Element
What It Means
Example
\d
Any digit character; functionally equivalent to [0-9] or [[:digit:]]
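\d matches a single digit, so it matches 1, 12, and 123, and also 1b7, which contains digits. To require a string made up only of digits, anchor the pattern: ^\d+$ matches 123 but not 1b7.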
\D
Any non-digit character; functionally equivalent to [^0-9] or [^[:digit:]]
\D matches a, ab, ab&, but not 1. Each matching string contains at least one character other than 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9.
\w
Any "word" character, that is, any letter, digit, or underscore; functionally equivalent to [_A-Za-z0-9] or [_[:alnum:]]
\w matches a, ab, a1, but not !&. Each matching string contains at least one upper- or lower-case letter, digit, or underscore; !& contains only punctuation.
\W
Any non-word character, that is, anything other than a letter, digit, or underscore; functionally equivalent to [^_A-Za-z0-9] or [^_[:alnum:]]
\W matches *, &, but not ace or a1. Each matching string contains at least one character that is not a letter, digit, or underscore.
\s
Any white space character; space, new line, tab, non-breaking space, etc.; functionally equivalent to [[:space:]]
vegetable\s matches "vegetable" followed by any white space character. So the phrase "I like vegetables in my soup" would not trigger the regex, but "I like a vegetable in my soup" would.
\S
Any non-white space character; anything other than a space, new line, tab, non-breaking space, etc.; functionally equivalent to [^[:space:]]
vegetable\S matches "vegetable" followed by any non-white space character. So the phrase "I like vegetables in my soup" would trigger the regex, but "I like a vegetable in my soup" would not.
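The shorthand classes are easiest to reason about as substring searches: one qualifying character anywhere in the text is enough. The following Python sketch assumes that ScanMail behaves like a call to re.search (an assumption, not a documented equivalence); triggers is a hypothetical helper. Note that in Python 3, \d, \w, and \s also accept non-ASCII Unicode digits, letters, and white space unless the re.ASCII flag is passed, so results can differ from the ASCII ranges in the table for non-English text.

```python
import re


def triggers(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text (substring search)."""
    return re.search(pattern, text) is not None


# \d is satisfied by a single digit anywhere in the text...
print(triggers(r"\d", "1b7"))      # True
# ...so anchor the pattern to require that every character is a digit.
print(triggers(r"^\d+$", "123"))   # True
print(triggers(r"^\d+$", "1b7"))   # False

# \D, \w, and \W each need only one qualifying character.
print(triggers(r"\D", "ab&"))      # True
print(triggers(r"\w", "!&"))       # False
print(triggers(r"\W", "a1"))       # False

# \s and \S after a fixed word.
with_space = "I like a vegetable in my soup"
plural = "I like vegetables in my soup"
print(triggers(r"vegetable\s", with_space), triggers(r"vegetable\s", plural))  # True False
print(triggers(r"vegetable\S", with_space), triggers(r"vegetable\S", plural))  # False True
```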

Character Classes (POSIX)

Element
What It Means
Example
[:alpha:]
Any alphabetic character
.REG. [[:alpha:]] matches abc, def, xxx, but not 123 or @#$.
[:digit:]
Any digit character; functionally equivalent to \d
.REG. [[:digit:]] matches 1, 12, 123, etc.
[:alnum:]
Any alphanumeric character (a letter or digit); equivalent to \w except that it does not include the underscore
.REG. [[:alnum:]] matches abc, 123, but not ~!@.
[:space:]
Any white space character; space, new line, tab, non-breaking space, etc.; functionally equivalent to \s
.REG. (vegetable)[[:space:]] matches "vegetable" followed by any white space character. So the phrase "I like a vegetable in my soup" would trigger the regex, but "I like vegetables in my soup" would not.
[:graph:]
Any visible character, that is, any character except the space character and control characters
.REG. [[:graph:]] matches 123, abc, xxx, ><", but not space or control characters.
[:print:]
Any printable character; the same as [:graph:] but also includes the space character
.REG. [[:print:]] matches 123, abc, xxx, ><", and space characters.
[:cntrl:]
Any control character (for example, CTRL+C or CTRL+X)
.REG. [[:cntrl:]] matches 0x03, 0x08, but not abc, 123, !@#.
[:blank:]
Space and tab characters
.REG. [[:blank:]] matches space and tab characters, but not 123, abc, or !@#.
[:punct:]
Punctuation characters
.REG. [[:punct:]] matches ; : ? ! ~ @ # $ % & * ' " , etc., but not 123 or abc.
[:lower:]
Any lowercase alphabetic character
Note
Case-sensitive matching must be enabled; otherwise, [:lower:] functions as [:alnum:].
.REG. [[:lower:]] matches abc, Def, sTress, Do, etc., but not ABC, DEF, STRESS, DO, 123, !@#.
[:upper:]
Any uppercase alphabetic character
Note
Case-sensitive matching must be enabled; otherwise, [:upper:] functions as [:alnum:].
.REG. [[:upper:]] matches ABC, DEF, STRESS, DO, Def, Stress, Do, etc., but not abc, 123, !@#.
[:xdigit:]
Digits allowed in a hexadecimal number (0-9a-fA-F)
.REG. [[:xdigit:]] matches 0a, 7E, 0f, etc.
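Python's re module does not recognize POSIX bracket classes such as [:alpha:], so trying the table's examples in Python needs a translation step. The sketch below uses a hypothetical helper, expand_posix, with an assumed ASCII-only mapping that rewrites each class into an explicit character range before searching. It is meant only for experimenting; ScanMail interprets these classes itself, and its exact character sets may differ, for example for non-ASCII text.

```python
import re

# Hypothetical ASCII-only mapping from POSIX class names to character ranges
# that Python's re module understands inside square brackets.
POSIX_CLASSES = {
    "alpha": "A-Za-z",
    "digit": "0-9",
    "alnum": "A-Za-z0-9",
    "upper": "A-Z",
    "lower": "a-z",
    "xdigit": "0-9A-Fa-f",
    "space": r" \t\n\r\f\v",
    "blank": r" \t",
    "cntrl": r"\x00-\x1f\x7f",
    "graph": r"\x21-\x7e",
    "print": r"\x20-\x7e",
    "punct": r"!-/:-@\[-`{-~",
}


def expand_posix(pattern: str) -> str:
    """Replace each [:name:] inside a bracket expression with its ASCII range."""
    return re.sub(r"\[:([a-z]+):\]", lambda m: POSIX_CLASSES[m.group(1)], pattern)


# (POSIX pattern, text that should trigger it, text that should not)
examples = [
    ("[[:alpha:]]", "abc", "123"),
    ("[[:digit:]]", "123", "abc"),
    ("[[:punct:]]", "@#$", "abc"),
    ("[[:cntrl:]]", "\x03", "abc"),
    ("[[:xdigit:]]", "7E", "xyz"),
    ("(vegetable)[[:space:]]", "I like a vegetable in my soup", "I like vegetables in my soup"),
]

for posix, hit, miss in examples:
    pattern = expand_posix(posix)
    print(f"{posix} -> {pattern}: {hit!r} {bool(re.search(pattern, hit))}, "
          f"{miss!r} {bool(re.search(pattern, miss))}")
```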

Pattern Anchor Regular Expressions

Element
What It Means
Example
^
Indicates the beginning of a string.
^(notwithstanding) matches any block of text that begins with "notwithstanding". So the phrase "notwithstanding the fact that I like vegetables in my soup" would trigger the regex, but "The fact that I like vegetables in my soup notwithstanding" would not.
$
Indicates the end of a string.
(notwithstanding)$ matches any block of text that ends with "notwithstanding". So the phrase "notwithstanding the fact that I like vegetables in my soup" would not trigger the regex, but "The fact that I like vegetables in my soup notwithstanding" would.
\
Escapes a character that has special meaning in regular expressions (for example, "+") so that it is matched literally.
  • .REG. C\\C\+\+ matches "C\C++".
  • .REG. \* matches *.
  • .REG. \? matches ?.
\t
Indicates a tab character.
(stress)\t matches any block of text that contains the substring "stress" immediately followed by a tab (ASCII 0x09) character.
\n
Indicates a new line character.
Note
Different platforms represent a new line differently. On Windows, a new line is a pair of characters: a carriage return followed by a line feed. On Unix and Linux, a new line is just a line feed, and on classic Mac OS (before Mac OS X) a new line is just a carriage return.
(stress)\n matches any block of text that contains the substring "stress" followed immediately by one new line (line feed, ASCII 0x0A) character.
\r
Indicates a carriage return character.
(stress)\r matches any block of text that contains the substring "stress" followed immediately by one carriage return (ASCII 0x0D) character.
\b
Indicates a backspace character.
(stress)\b matches any block of text that contains the substring "stress" followed immediately by one backspace (ASCII 0x08) character.
\xhh
Indicates an ASCII character with the given hexadecimal code (where hh represents any two-digit hex value).
\x7E(\w){6} matches any block of text containing a ~ (tilde) character followed by six "word" characters (letters, digits, or underscores). So ~ab12cd and ~Pa3499 would be matched, but ~oops would not, because only four word characters follow the tilde. Because the pattern is not anchored, a longer word such as ~ab12cdef would also match.
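The anchor and escape examples can be checked the same way. The following Python sketch again uses a hypothetical triggers helper built on re.search as a stand-in for ScanMail, so it illustrates the general behavior rather than ScanMail's exact engine. One difference matters here: in Python, \b outside a character class means a word boundary, not a backspace, so the sketch matches a backspace with \x08 instead.

```python
import re


def triggers(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text (substring search)."""
    return re.search(pattern, text) is not None


starts = "notwithstanding the fact that I like vegetables in my soup"
ends = "The fact that I like vegetables in my soup notwithstanding"

# ^ anchors to the start of the text, $ to the end.
print(triggers(r"^(notwithstanding)", starts), triggers(r"^(notwithstanding)", ends))  # True False
print(triggers(r"(notwithstanding)$", starts), triggers(r"(notwithstanding)$", ends))  # False True

# A backslash makes special characters literal: this matches the text C\C++.
print(triggers(r"C\\C\+\+", "C\\C++"))  # True

# Control-character escapes each match exactly one character.
print(triggers(r"(stress)\t", "stress\tlevel"))   # True
print(triggers(r"(stress)\n", "stress\r\n"))      # False: a carriage return comes first
print(triggers(r"(stress)\r\n", "stress\r\n"))    # True (Windows line ending)
print(triggers(r"(stress)\x08", "stress\x08"))    # True (backspace, written as \x08)

# \x7E is the tilde; (\w){6} requires six word characters after it.
for word in ["~ab12cd", "~Pa3499", "~oops"]:
    print(word, triggers(r"\x7E(\w){6}", word))
```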