Regular
expressions are used to perform string matching. See the following tables
for some common examples of regular expressions.
Note: Regular expressions are a powerful string-matching tool, so administrators who choose to use them should be familiar and comfortable with regular expression syntax. Poorly written regular expressions can degrade performance. Trend Micro recommends starting with simple regular expressions that do not use complex syntax. When introducing a new rule, use the backup action and observe how ScanMail handles messages under that rule. When you are confident that the rule has no unexpected consequences, you can change the action.
Counting and Grouping

| Element | What It Means | Example |
|---|---|---|
| . | The dot (period) represents any single character except a new line character. | do. matches doe, dog, don, dos, dot, etc. d..r matches deer, door, etc. |
| * | The asterisk means zero or more instances of the preceding element. | do* matches d, do, doo, dooo, doooo, etc. |
| + | The plus sign means one or more instances of the preceding element. | do+ matches do, doo, dooo, doooo, etc., but not d. |
| ? | The question mark means zero or one instance of the preceding element. | do?g matches dg or dog, but not doog, dooog, etc. |
| ( ) | Parentheses group whatever is between them so that it is treated as a single entity. | d(eer)+ matches deer, deereer, deereereer, etc. The + sign applies to the substring within the parentheses, so the regex looks for d followed by one or more occurrences of the group "eer". |
| [ ] | Square brackets indicate a set or a range of characters. | d[aeiouy]+ matches da, de, di, do, du, dy, daa, dae, dai, etc. The + sign applies to the set within the brackets, so the regex looks for d followed by one or more of any of the characters in the set [aeiouy]. d[A-Z] matches dA, dB, dC, and so on up to dZ. The set in square brackets represents the range of all upper-case letters between A and Z. |
| ^ | A caret placed first within square brackets negates the set or range specified, meaning the regex matches any character that is not in the set or range. | d[^aeiouy] matches db, dc, dd, d9, d#, etc.: d followed by any single character except a, e, i, o, u, or y. |
| { } | Curly braces set a specific number of occurrences of the preceding element. A single number inside the braces means exactly that many occurrences will match. A pair of numbers separated by a comma sets an inclusive range of valid counts. A single number followed by a comma sets a minimum count with no upper bound. | da{3} matches daaa: d followed by exactly 3 occurrences of "a". da{2,4} matches daa, daaa, and daaaa (but not daaaaa as a whole): d followed by 2, 3, or 4 occurrences of "a". da{4,} matches daaaa, daaaaa, daaaaaa, etc.: d followed by 4 or more occurrences of "a". |
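The table above can be checked mechanically. The sketch below uses Python's re module as a stand-in, because this document does not name the engine ScanMail uses, so treat results as indicative only. It tests each example against the whole string with re.fullmatch, which is how phrases such as "matches daaa" and "but not d" read. The EXAMPLES list and the check helper are hypothetical names introduced for illustration.

```python
import re

# (pattern, strings it should match in full, strings it should not match in full).
# Taken from the Counting and Grouping table; re.fullmatch compares the whole string.
EXAMPLES = [
    (r"do.", ["doe", "dog", "don", "dos", "dot"], ["do"]),
    (r"d..r", ["deer", "door"], ["dr"]),
    (r"do*", ["d", "do", "doo", "dooo"], ["dx"]),
    (r"do+", ["do", "doo", "dooo"], ["d"]),
    (r"do?g", ["dg", "dog"], ["doog", "dooog"]),
    (r"d(eer)+", ["deer", "deereer", "deereereer"], ["de"]),
    (r"d[aeiouy]+", ["da", "de", "dy", "daa", "dae"], ["db"]),
    (r"d[A-Z]", ["dA", "dB", "dZ"], ["da"]),
    (r"d[^aeiouy]", ["db", "dc", "dd", "d9", "d#"], ["da"]),
    (r"da{3}", ["daaa"], ["daa", "daaaa"]),
    (r"da{2,4}", ["daa", "daaa", "daaaa"], ["da", "daaaaa"]),
    (r"da{4,}", ["daaaa", "daaaaa", "daaaaaa"], ["daaa"]),
]


def check(examples):
    """Return (pattern, text, expected, actual) for every example that misbehaves."""
    failures = []
    for pattern, should_match, should_not in examples:
        cases = [(t, True) for t in should_match] + [(t, False) for t in should_not]
        for text, expected in cases:
            actual = re.fullmatch(pattern, text) is not None
            if actual != expected:
                failures.append((pattern, text, expected, actual))
    return failures


print(check(EXAMPLES))  # [] : every example behaves as the table describes
```

A substring search behaves differently: re.search(r"da{2,4}", "daaaaa") still succeeds because "daaaa" occurs inside "daaaaa". If ScanMail searches message text for substrings, as the later examples suggest, a pattern such as da{2,4} will also fire on longer runs unless it is anchored.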
Character Classes (shorthand)

| Element | What It Means | Example |
|---|---|---|
| \d | Any digit character; functionally equivalent to [0-9] or [[:digit:]]. | \d+ matches 1, 12, 123, etc., but not abc: one or more digit characters. |
| \D | Any non-digit character; functionally equivalent to [^0-9] or [^[:digit:]]. | \D+ matches a, ab, ab&, etc., but not 1: one or more of any character other than 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9. |
| \w | Any "word" character, that is, any letter, digit, or underscore; functionally equivalent to [_A-Za-z0-9] or [_[:alnum:]]. | \w+ matches a, ab, a1, but not !&: one or more upper- or lower-case letters, digits, or underscores, but not punctuation or other special characters. |
| \W | Any non-word character; functionally equivalent to [^_A-Za-z0-9] or [^_[:alnum:]]. | \W+ matches *, &, but not ace or a1: one or more of any character other than letters, digits, and underscores. |
| \s | Any white space character: space, new line, tab, non-breaking space, etc.; functionally equivalent to [[:space:]]. | vegetable\s matches "vegetable" followed by any white space character. So the phrase "I like a vegetable in my soup" would trigger the regex, but "I like vegetables in my soup" would not. |
| \S | Any non-white space character: anything other than a space, new line, tab, non-breaking space, etc.; functionally equivalent to [^[:space:]]. | vegetable\S matches "vegetable" followed by any non-white space character. So the phrase "I like vegetables in my soup" would trigger the regex, but "I like a vegetable in my soup" would not. |
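Each shorthand class matches exactly one character, so a quantifier such as + is needed to match a run of them. The sketch below shows this with Python's re module, used here only as a stand-in for ScanMail's engine. The first_match helper is a hypothetical name introduced for illustration.

```python
import re


def first_match(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


# \d matches a single digit; \d+ matches a whole run of digits.
print(first_match(r"\d", "abc123"))    # 1
print(first_match(r"\d+", "abc123"))   # 123
print(first_match(r"\D+", "ab&1"))     # ab&
print(first_match(r"\w+", "!&a1_b"))   # a1_b  (\w also matches the underscore)
print(first_match(r"\W+", "ace*&a1"))  # *&

# vegetable\s needs white space right after "vegetable"; vegetable\S needs a non-space.
print(first_match(r"vegetable\s", "I like a vegetable in my soup") is not None)  # True
print(first_match(r"vegetable\s", "I like vegetables in my soup") is not None)   # False
print(first_match(r"vegetable\S", "I like vegetables in my soup"))              # vegetables
```

With Python str patterns, \d, \w, and \s are Unicode-aware by default and so can match characters beyond the ASCII sets listed in the table; passing re.ASCII restricts them to those sets. Whether ScanMail's engine behaves the same way is not stated here.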
Character Classes

| Element | What It Means | Example |
|---|---|---|
| [:alpha:] | Any alphabetic character | .REG. [[:alpha:]] matches abc, def, xxx, but not 123 or @#$. |
| [:digit:] | Any digit character; functionally equivalent to \d | .REG. [[:digit:]] matches 1, 12, 123, etc. |
| [:alnum:] | Any alphanumeric character (letter or digit). Unlike \w, it does not include the underscore; \w is equivalent to [_[:alnum:]]. | .REG. [[:alnum:]] matches abc, 123, but not ~!@. |
| [:space:] | Any white space character: space, new line, tab, non-breaking space, etc.; functionally equivalent to \s | .REG. (vegetable)[[:space:]] matches "vegetable" followed by any white space character. So the phrase "I like a vegetable in my soup" would trigger the regex, but "I like vegetables in my soup" would not. |
| [:graph:] | Any visible character, that is, any character except space and control characters | .REG. [[:graph:]] matches 123, abc, xxx, ><", but not space or control characters. |
| [:print:] | Any printable character; like [:graph:] but also includes the space character | .REG. [[:print:]] matches 123, abc, xxx, ><", and the space character. |
| [:cntrl:] | Any control character (for example, CTRL+C or CTRL+X) | .REG. [[:cntrl:]] matches 0x03, 0x08, but not abc, 123, !@#. |
| [:blank:] | Space and tab characters | .REG. [[:blank:]] matches space and tab characters, but not 123, abc, !@#. |
| [:punct:] | Punctuation characters | .REG. [[:punct:]] matches ; : ? ! ~ @ # $ % & * ' " , etc., but not 123, abc. |
| [:lower:] | Any lowercase alphabetic character | .REG. [[:lower:]] matches abc, Def, sTress, Do, etc., but not ABC, DEF, STRESS, DO, 123, !@#. |
| [:upper:] | Any uppercase alphabetic character | .REG. [[:upper:]] matches ABC, DEF, STRESS, DO, Def, Stress, Do, etc., but not abc, 123, !@#. |
| [:xdigit:] | Any digit allowed in a hexadecimal number (0-9, a-f, A-F) | .REG. [[:xdigit:]] matches 0a, 7E, 0f, etc. |
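A POSIX class name is only valid inside a bracket expression, which is why the examples write [[:alpha:]] rather than [:alpha:]. Python's re module does not support POSIX classes, so the sketch below rewrites each [:name:] into an equivalent ASCII character set before matching. This is a hypothetical translation helper, assuming ASCII-only text; it is not how ScanMail evaluates these classes. The .REG. prefix from the examples is omitted because it is not part of the regular expression itself. POSIX_CLASSES, posix_to_python, and matches are illustrative names.

```python
import re
import string

# Hypothetical mapping from POSIX class names to Python set contents (ASCII only).
POSIX_CLASSES = {
    "alpha": r"A-Za-z",
    "digit": r"0-9",
    "alnum": r"A-Za-z0-9",
    "upper": r"A-Z",
    "lower": r"a-z",
    "xdigit": r"0-9A-Fa-f",
    "space": r" \t\n\r\f\v",
    "blank": r" \t",
    "cntrl": r"\x00-\x1f\x7f",
    "graph": r"!-~",  # 0x21-0x7E: visible characters, no space
    "print": r" -~",  # 0x20-0x7E: visible characters plus space
    "punct": re.escape(string.punctuation),  # escaped so ], \, ^ and - stay literal
}

_POSIX_NAME = re.compile(r"\[:([a-z]+):\]")


def posix_to_python(pattern: str) -> str:
    """Rewrite every [:name:] inside a bracket expression as Python set contents."""
    # A function replacement stops re.sub from treating backslashes in the
    # replacement text as template escapes. Unknown names raise KeyError.
    return _POSIX_NAME.sub(lambda m: POSIX_CLASSES[m.group(1)], pattern)


def matches(posix_pattern: str, text: str) -> bool:
    """True if the translated pattern is found anywhere in text."""
    return re.search(posix_to_python(posix_pattern), text) is not None


print(posix_to_python("[[:alpha:]]"))   # [A-Za-z]
print(posix_to_python("[^[:digit:]]"))  # [^0-9]
print(matches("[[:alnum:]]", "~!@"))    # False
print(matches("(vegetable)[[:space:]]", "I like a vegetable in my soup"))  # True
print(matches("[[:upper:]]", "Def"))    # True
```

Because the substitution only touches the [:name:] part, negated and combined sets such as [^[:digit:]] and [_[:alnum:]] translate naturally into [^0-9] and [_A-Za-z0-9].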
Pattern Anchor Regular Expressions

| Element | What It Means | Example |
|---|---|---|
| ^ | Indicates the beginning of a string. | ^(notwithstanding) matches any block of text that begins with "notwithstanding". So the phrase "notwithstanding the fact that I like vegetables in my soup" would trigger the regex, but "The fact that I like vegetables in my soup notwithstanding" would not. |
| $ | Indicates the end of a string. | (notwithstanding)$ matches any block of text that ends with "notwithstanding". So the phrase "notwithstanding the fact that I like vegetables in my soup" would not trigger the regex, but "The fact that I like vegetables in my soup notwithstanding" would. |
| \ | Escapes a character that has a special meaning in regular expressions (for example, "+") so that it is matched literally. | 1\+1 matches the literal text 1+1. |
| \t | Indicates a tab character. | (stress)\t matches any block of text that contains the substring "stress" immediately followed by a tab (ASCII 0x09) character. |
| \n | Indicates a new line character. | (stress)\n matches any block of text that contains the substring "stress" immediately followed by one new line (ASCII 0x0A) character. |
| \r | Indicates a carriage return character. | (stress)\r matches any block of text that contains the substring "stress" immediately followed by one carriage return (ASCII 0x0D) character. |
| \b | Indicates a backspace character. | (stress)\b matches any block of text that contains the substring "stress" immediately followed by one backspace (ASCII 0x08) character. |
| \xhh | Indicates an ASCII character with the given hexadecimal code (where hh represents any two-digit hex value). | \x7E(\w){6} matches any block of text containing a tilde (~, hex 7E) followed by six "word" characters. So the words "~ab12cd" and "~Pa3499" would be matched, but "~oops" would not. |
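The anchor and escape examples can be reproduced with a plain substring search. The sketch below again uses Python's re module as a stand-in for ScanMail's engine; the contains helper is a hypothetical name. The \b row is deliberately left out: outside a character class, Python reads \b as a word boundary rather than a backspace, so it would not reproduce the behavior the table describes.

```python
import re


def contains(pattern: str, text: str) -> bool:
    """True if pattern matches anywhere in text (substring search)."""
    return re.search(pattern, text) is not None


starts = "notwithstanding the fact that I like vegetables in my soup"
ends = "The fact that I like vegetables in my soup notwithstanding"

# ^ and $ tie the match to the start and the end of the text.
print(contains(r"^(notwithstanding)", starts), contains(r"^(notwithstanding)", ends))  # True False
print(contains(r"(notwithstanding)$", ends), contains(r"(notwithstanding)$", starts))  # True False

# A backslash makes a special character literal: 1\+1 finds "1+1",
# while unescaped 1+1 means "one or more 1s followed by 1" (for example "11").
print(contains(r"1\+1", "1+1=2"), contains(r"1+1", "1+1=2"))  # True False

# Escapes for single control characters.
print(contains(r"(stress)\t", "stress\tlevel"))  # True
print(contains(r"(stress)\n", "no stress\n"))    # True
print(contains(r"(stress)\r", "stress\r\n"))     # True

# \x7E is the tilde; (\w){6} then requires six word characters.
for word in ["~ab12cd", "~Pa3499", "~oops"]:
    print(word, contains(r"\x7E(\w){6}", word))  # True for the first two, False for ~oops
```

Two Python-specific details may differ from ScanMail: $ also matches just before a final new line character, and with re.MULTILINE both ^ and $ apply at every line rather than only at the ends of the whole text.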