About Regular Expressions
IMSVA treats all keyword expressions as regular expressions and supports the regular expression syntax described below.
While keywords or expressions can be created during policy creation, Trend Micro recommends creating keywords or expressions before you begin creating policies.
Characters
Regular Expression | Description
. (dot) | Any character (byte) except newline
x | The character 'x'
\\ | The character '\'
\a | The alert (bell) character (ASCII 0x07)
\b | The backspace character (ASCII 0x08)
\f | The form-feed character (ASCII 0x0C)
\n | The newline (line feed) character (ASCII 0x0A)
\r | The carriage-return character (ASCII 0x0D)
\t | The normal (horizontal) tab character (ASCII 0x09)
\v | The vertical tab character (ASCII 0x0B)
\n (n is an octal digit) | The character with octal value 0n (0 <= n <= 7)
\nn | The character with octal value 0nn (0 <= n <= 7)
\mnn | The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh | The character with hexadecimal value 0xhh; for example, \x20 means the space character
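As a rough illustration of a few of the escapes above, the following sketch uses Python's re module. This is an assumption made for demonstration only: Python's engine is not the one IMSVA uses, and it handles one- and two-digit octal escapes differently, so only the three-digit octal and hexadecimal forms are shown. The helper name matches is hypothetical.

```python
import re

# Python's re module stands in for the IMSVA engine here (an assumption).
def matches(pattern, text):
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# \x20 is the space character.
space_ok = matches(r"Regular\x20Expression", "Regular Expression")

# \101 is octal 0101, which is the letter 'A'.
octal_ok = matches(r"\101BC", "ABC")

# \\ matches a single backslash character.
backslash_ok = matches(r"C:\\temp", "C:\\temp")

# The dot matches any character except newline.
dot_matches_letter = matches(r".", "x")
dot_matches_newline = matches(r".", "\n")

print(space_ok, octal_ok, backslash_ok, dot_matches_letter, dot_matches_newline)
```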
Bracket Expression and Character Classes
A bracket expression is a list of characters and/or character classes enclosed in brackets []. Use a bracket expression to match a single character from the list or from a range in the list. If the first character of the list is the caret ^, the expression matches any character that is not in the list.
For example:
Expression | Matches
[abc] | a, b, or c
[a-z] | a through z
[^abc] | Any character except a, b, or c
[[:alpha:]] | Any alphabetic character (see below)
Each character class designates a set of characters equivalent to the corresponding standard C isXXX function. For example, [:alpha:] designates those characters for which isalpha() returns true (that is, any alphabetic character). A character class must appear within a bracket expression.
Character class | Description
[:alpha:] | Alphabetic characters
[:digit:] | Digits
[:alnum:] | Alphabetic characters and numeric characters
[:cntrl:] | Control characters
[:blank:] | Space and tab
[:space:] | All white space characters
[:graph:] | Non-blank (not spaces, control characters, or the like)
[:print:] | Like [:graph:], but includes the space character
[:punct:] | Punctuation characters
[:lower:] | Lowercase alphabetic
[:upper:] | Uppercase alphabetic
[:xdigit:] | Digits allowed in a hexadecimal number (0-9a-fA-F)
For a case-insensitive expression, [:lower:] and [:upper:] are equivalent to [:alpha:].
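Python's re module does not understand POSIX character classes, so the following sketch translates each [:name:] into an ASCII range before matching. The ranges assume the "C" locale, and the names POSIX_CLASSES, posix_to_python, and matches are hypothetical helpers for illustration, not part of IMSVA.

```python
import re

# ASCII ("C" locale) equivalents of the POSIX classes listed above.
# These ranges are an assumption; a real engine may follow the active locale.
POSIX_CLASSES = {
    "alpha": "A-Za-z",
    "digit": "0-9",
    "alnum": "A-Za-z0-9",
    "cntrl": r"\x00-\x1f\x7f",
    "blank": r" \t",
    "space": r" \t\n\r\f\v",
    "graph": r"\x21-\x7e",
    "print": r"\x20-\x7e",
    "punct": r"\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e",
    "lower": "a-z",
    "upper": "A-Z",
    "xdigit": "0-9A-Fa-f",
}

def posix_to_python(pattern):
    """Replace each [:name:] with an ASCII range usable inside Python's []."""
    return re.sub(r"\[:([a-z]+):\]", lambda m: POSIX_CLASSES[m.group(1)], pattern)

def matches(pattern, text, ignore_case=False):
    """Return True if the whole text matches the translated pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.fullmatch(posix_to_python(pattern), text, flags) is not None

print(posix_to_python("[[:alpha:]]"))
print(matches("[[:lower:]]+", "ABC"))
print(matches("[[:lower:]]+", "ABC", ignore_case=True))
```

The last two calls mirror the note on case-insensitive expressions: with case ignored, [:lower:] accepts uppercase letters as well.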
Boundary Matches
Expression | Description
^ | Beginning of line
$ | End of line
Greedy Quantifiers
Expression | Description
R? | Matches R, once or not at all
R* | Matches R, zero or more times
R+ | Matches R, one or more times
R{n} | Matches R, exactly n times
R{n,} | Matches R, at least n times
R{n,m} | Matches R, at least n but no more than m times
R is a regular expression.
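The quantifiers can be tried out with a short sketch in which R is the single-character expression a. Python's re module is used as a stand-in for the IMSVA engine, which is an assumption; the helper accepted and the candidate list are hypothetical.

```python
import re

CANDIDATES = ["", "a", "aa", "aaa", "aaaa"]

def accepted(pattern):
    """List the candidate strings that the pattern matches in full."""
    return [s for s in CANDIDATES if re.fullmatch(pattern, s)]

once_or_not_at_all = accepted(r"a?")
zero_or_more = accepted(r"a*")
one_or_more = accepted(r"a+")
exactly_two = accepted(r"a{2}")
at_least_two = accepted(r"a{2,}")
two_to_three = accepted(r"a{2,3}")

for name, result in [
    ("a?", once_or_not_at_all),
    ("a*", zero_or_more),
    ("a+", one_or_more),
    ("a{2}", exactly_two),
    ("a{2,}", at_least_two),
    ("a{2,3}", two_to_three),
]:
    print(name, result)
```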
Trend Micro does not recommend using ".*" in a regular expression. ".*" matches any number of characters, and the resulting large number of matches may increase memory usage and affect performance.
For example:
If the content is 123456abc, the regular expression ".*abc" produces the following matches:
123456abc
23456abc
3456abc
456abc
56abc
6abc
abc
In this example, replace ".*abc" with "abc" to prevent excessive use of resources.
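The effect can be reproduced with a small sketch that tries the expression at every starting offset of the content. Whether IMSVA enumerates matches in exactly this way is an assumption; Python's re module and the helper matches_from_each_start are used only for illustration.

```python
import re

content = "123456abc"

def matches_from_each_start(pattern, text):
    """Collect the match found when scanning begins at each offset."""
    compiled = re.compile(pattern)
    found = []
    for start in range(len(text)):
        m = compiled.match(text, start)
        if m:
            found.append(m.group())
    return found

wildcard_hits = matches_from_each_start(r".*abc", content)
plain_hits = matches_from_each_start(r"abc", content)

print(len(wildcard_hits), wildcard_hits)
print(len(plain_hits), plain_hits)
```

The leading ".*" yields one match per starting offset, while the plain keyword yields a single match for the same content.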
Logical Operators
Expression | Description
RS | R followed by S (concatenation)
R|S | Either R or S
R/S | An R but only if it is followed by S
(R) | Grouping R
R and S are regular expressions
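The following sketch illustrates the four operators with Python's re module, which is an assumption made for demonstration. Python has no R/S operator, so the lookahead form R(?=S) is swapped in as the closest equivalent: it matches R only when S follows and leaves S out of the match.

```python
import re

# Concatenation RS: "ab" followed by "cd".
concatenation = re.fullmatch(r"abcd", "abcd") is not None

# Alternation R|S: either "cat" or "dog".
alternation = [s for s in ("cat", "dog", "cow") if re.fullmatch(r"cat|dog", s)]

# Grouping (R): the quantifier applies to the whole group "ab".
grouping = re.fullmatch(r"(ab)+", "ababab") is not None

# Trailing context R/S, approximated in Python by a lookahead (assumption).
followed = re.search(r"foo(?=bar)", "foobar")
not_followed = re.search(r"foo(?=bar)", "foobaz")
trailing_match = followed.group() if followed else None

print(concatenation, alternation, grouping, trailing_match, not_followed)
```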
Shorthand and Meta-symbols
eManager provides the following shorthand for writing complicated regular expressions. eManager will pre-process expressions and translate the shorthand into regular expressions.
For example, {D}+ is translated to [0-9]+. If a shorthand expression is enclosed in a bracket expression (example: [{D}]) or in double-quotes, IMSVA does not translate it to a regular expression.
Shorthand | Description
{D} | [0-9]
{L} | [A-Za-z]
{SP} | [(),;\.\\<>@\[\]:]
{NUMBER} | [0-9]+
{WORD} | [A-Za-z]+
{CR} | \r
{LF} | \n
{LWSP} | [ \t]
{CRLF} | (\r\n)
{WSP} | [ \t\f]+
{ALLC} | .
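The pre-processing step can be sketched as a single pass over the expression. This is a minimal illustration, not eManager's actual implementation: the function expand_shorthand is hypothetical, and its handling of bracket expressions is simplified (the first unescaped ] closes the bracket).

```python
# Shorthand table from this section, keyed by the name between the braces.
SHORTHAND = {
    "D": "[0-9]",
    "L": "[A-Za-z]",
    "SP": r"[(),;\.\\<>@\[\]:]",
    "NUMBER": "[0-9]+",
    "WORD": "[A-Za-z]+",
    "CR": r"\r",
    "LF": r"\n",
    "LWSP": r"[ \t]",
    "CRLF": r"(\r\n)",
    "WSP": r"[ \t\f]+",
    "ALLC": ".",
}

def expand_shorthand(expr):
    """Expand {NAME} shorthand outside bracket expressions and double-quotes."""
    out = []
    in_bracket = False
    in_quote = False
    i = 0
    while i < len(expr):
        ch = expr[i]
        if ch == "\\" and i + 1 < len(expr):
            # Copy an escaped character unchanged.
            out.append(expr[i:i + 2])
            i += 2
            continue
        if in_quote:
            if ch == '"':
                in_quote = False
        elif in_bracket:
            if ch == "]":
                in_bracket = False
        elif ch == '"':
            in_quote = True
        elif ch == "[":
            in_bracket = True
        elif ch == "{":
            end = expr.find("}", i)
            name = expr[i + 1:end] if end != -1 else None
            if name in SHORTHAND:
                out.append(SHORTHAND[name])
                i = end + 1
                continue
        out.append(ch)
        i += 1
    return "".join(out)

print(expand_shorthand("{D}+"))
print(expand_shorthand("[{D}]"))
print(expand_shorthand('"{D}"'))
```

Quantifiers such as R{2,3} pass through untouched because the text between the braces is not a known shorthand name.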
eManager also provides the following meta-symbols. The difference between shorthand and meta-symbols is that meta-symbols can be within a bracket expression.
Meta-symbol | Description
\s | [[:space:]]
\S | [^[:space:]]
\d | [[:digit:]]
\D | [^[:digit:]]
\w | [_[:alnum:]]
\W | [^_[:alnum:]]
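Python's re module happens to offer the same meta-symbols, so the following sketch shows one used inside a bracket expression. Treating Python's behavior as representative of eManager is an assumption; the re.ASCII flag restricts \d, \w, and \s to ASCII so that they approximate the POSIX classes in the table.

```python
import re

def matches(pattern, text):
    """Return True if the whole text matches, using ASCII-only meta-symbols."""
    return re.fullmatch(pattern, text, re.ASCII) is not None

# \d inside a bracket expression, combined with literal '.' and '-'.
inside_brackets = matches(r"[\d.-]+", "2024-01.15")

# \w covers letters, digits, and the underscore.
word_chars = matches(r"\w+", "user_name1")

# \S rejects any white space character.
no_spaces = matches(r"\S+", "no-spaces")
with_space = matches(r"\S+", "two words")

print(inside_brackets, word_chars, no_spaces, with_space)
```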
Literal string and escape character of regular expressions
To match a character that has a special meaning in regular expressions (example: +), use the backslash \ escape character. For example, to match the string C/C++, use the expression C\/C\+\+.
An expression sometimes needs many escape characters (example: C\/C\+\+). In that case, enclose the string C/C++ in double-quotes (example: .REG "C/C++"); the new expression is equivalent to the old one. Characters within double-quotes are literal, except \, which remains an escape character. The following are some examples:
Expression | Description
"C/C++" | Match string C/C++ (does not include double-quotes)
"Regular\x20Expression" | Match string Regular Expression (does not include double-quotes), where \x20 means the space character.
"[xyz]\"foo" | Match the literal string: [xyz]"foo
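One way to picture the double-quote rule is a helper that turns a quoted literal into an escaped expression. The sketch below is an illustration under assumptions: literal_to_regex is a hypothetical name, Python's re module stands in for the IMSVA engine, and backslash sequences inside the quotes are passed through unchanged.

```python
import re

def literal_to_regex(quoted):
    """Convert a double-quoted literal into an equivalent escaped expression."""
    if len(quoted) < 2 or quoted[0] != '"' or quoted[-1] != '"':
        raise ValueError("expected a double-quoted string")
    body = quoted[1:-1]
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            # Backslash stays an escape character inside the quotes.
            out.append(body[i:i + 2])
            i += 2
        else:
            # Every other character is taken literally.
            out.append(re.escape(body[i]))
            i += 1
    return "".join(out)

def literal_matches(quoted, text):
    """Return True if the quoted literal matches the whole text."""
    return re.fullmatch(literal_to_regex(quoted), text) is not None

print(literal_matches('"C/C++"', "C/C++"))
print(literal_matches(r'"Regular\x20Expression"', "Regular Expression"))
print(literal_matches(r'"[xyz]\"foo"', '[xyz]"foo'))
```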
When any of the following operators appears in a regular expression, change each adjacent <space> to "\x20":
.AND.
.OR.
.NOT.
.WILD.
To match "abc AND def", use "abc\x20.AND.\x20def" instead of "abc .AND. def".
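The substitution can be automated with a short helper. This is a minimal sketch, assuming a single space on either side of the operator; the function name protect_operator_spaces is hypothetical and not part of IMSVA.

```python
import re

# An optional single space, one of the four operators, an optional single space.
_OPERATOR = re.compile(r"( ?)(\.(?:AND|OR|NOT|WILD)\.)( ?)")

def protect_operator_spaces(expr):
    r"""Rewrite spaces adjacent to .AND., .OR., .NOT., or .WILD. as \x20."""
    def repl(m):
        before = r"\x20" if m.group(1) else ""
        after = r"\x20" if m.group(3) else ""
        return before + m.group(2) + after
    return _OPERATOR.sub(repl, expr)

print(protect_operator_spaces("abc .AND. def"))
```

Spaces that are not next to an operator are left as they are.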