About Regular Expressions

IMSVA treats all keyword expressions as regular expressions. IMSVA supports the following regular expressions.

Regular Expression   Description
. (dot)              Any character (byte) except newline
x                    The character 'x'
\\                   The character '\'
\a                   The alert (bell) character (ASCII 0x07)
\b                   1. Within square brackets [] or by itself, \b is treated as the
                        backspace character (ASCII 0x08). For example, [\b] or \b
                     2. At the beginning (or end) of a regular expression, \b requires
                        the left (or right) side of the matched string to be a boundary.
                        For example:
                          • \bluck: the left side must be a boundary
                          • luck\b: the right side must be a boundary
                          • \bluck\b: both sides must be boundaries
                     3. In the middle of a regular expression, \b causes a syntax error.
\f                   The form-feed character (ASCII 0x0C)
\n                   The newline (line feed) character (ASCII 0x0A)
\r                   The carriage-return character (ASCII 0x0D)
\t                   The horizontal tab character (ASCII 0x09)
\v                   The vertical tab character (ASCII 0x0B)
\n                   The character with octal value 0n, where n is an octal digit (0 <= n <= 7)
\nn                  The character with octal value 0nn (0 <= n <= 7)
\mnn                 The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh                 The character with hexadecimal value 0xhh; for example, \x20 is the space character
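As a rough illustration of the escapes above, the following sketch uses Python's re module as a stand-in. This is an assumption: IMSVA does not use Python, and its rules for \b (backspace when used by itself, syntax error in the middle of an expression) differ from Python, where \b outside brackets is always a word boundary.

```python
import re

# Python's re module is only a stand-in here; IMSVA's engine may differ.

# \x20 is the space character (hexadecimal 0x20).
hex_space = re.compile(r"a\x20b")
assert hex_space.fullmatch("a b")

# \101 is the character with octal value 0101, which is 'A'.
octal_a = re.compile(r"\101")
assert octal_a.fullmatch("A")

# \\ matches a single literal backslash.
backslash = re.compile(r"\\")
assert backslash.fullmatch("\\")

# \b at both ends requires a boundary on each side of "luck".
word = re.compile(r"\bluck\b")
assert word.search("good luck!")        # "luck" stands alone
assert word.search("unlucky") is None   # letters on both sides

# Inside brackets, \b is the backspace character (ASCII 0x08).
backspace = re.compile(r"[\b]")
assert backspace.fullmatch("\x08")
```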

A bracket expression is a list of characters and/or character classes enclosed in brackets []. Use a bracket expression to match a single character from a list or from a range of characters. If the first character of the list is the caret ^, the expression matches any character that is not in the list.

For example:

Expression    Matches
[abc]         a, b, or c
[a-z]         a through z
[^abc]        Any character except a, b, or c
[[:alpha:]]   Any alphabetic character (see below)

Each character class designates a set of characters equivalent to the corresponding standard C isXXX function. For example, [:alpha:] designates those characters for which isalpha() returns true (that is, any alphabetic character). Character classes must appear within a bracket expression.

Character class   Description
[:alpha:]         Alphabetic characters
[:digit:]         Digits
[:alnum:]         Alphabetic characters and numeric characters
[:cntrl:]         Control characters
[:blank:]         Space and tab
[:space:]         All white space characters
[:graph:]         Non-blank (not spaces, control characters, or the like)
[:print:]         Like [:graph:], but includes the space character
[:punct:]         Punctuation characters
[:lower:]         Lowercase alphabetic
[:upper:]         Uppercase alphabetic
[:xdigit:]        Digits allowed in a hexadecimal number (0-9a-fA-F)

For a case-insensitive expression, [:lower:] and [:upper:] are equivalent to [:alpha:].
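Python's re module does not accept [:alpha:]-style classes, so the sketch below expands a few of them to ASCII ranges before compiling. The helper name expand_posix_classes and the ASCII-only ranges are assumptions made for illustration; they are not part of IMSVA.

```python
import re

# Hypothetical mapping from class names to ASCII ranges (illustration only).
POSIX_CLASSES = {
    "[:alpha:]": "A-Za-z",
    "[:digit:]": "0-9",
    "[:alnum:]": "A-Za-z0-9",
    "[:lower:]": "a-z",
    "[:upper:]": "A-Z",
    "[:xdigit:]": "0-9A-Fa-f",
    "[:blank:]": " \\t",
}

def expand_posix_classes(pattern: str) -> str:
    """Replace each known class name with an equivalent ASCII range."""
    for name, ranges in POSIX_CLASSES.items():
        pattern = pattern.replace(name, ranges)
    return pattern

# [[:alpha:]] becomes [A-Za-z]; the outer brackets belong to the bracket expression.
alpha = re.compile(expand_posix_classes("[[:alpha:]]+"))
assert alpha.fullmatch("Hello")
assert alpha.fullmatch("Hello1") is None

# A class can share a bracket expression with ^ and other list members.
not_hex = re.compile(expand_posix_classes("[^[:xdigit:]]"))
assert not_hex.fullmatch("g")
assert not_hex.fullmatch("f") is None

# Case-insensitive matching makes [:lower:] accept uppercase letters too.
lower_ci = re.compile(expand_posix_classes("[[:lower:]]+"), re.IGNORECASE)
assert lower_ci.fullmatch("ABC")
```

The plain string replacement does not check that a class name actually sits inside a bracket expression, so it is suitable only as a demonstration.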

Expression   Description
^            Beginning of line
$            End of line
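The line anchors can be tried out with Python's re module, used here only as a stand-in. One assumption to note: in Python, ^ and $ anchor at every line only when re.MULTILINE is set, which may not mirror how IMSVA treats line boundaries.

```python
import re

text = "first line\nsecond line"

# With re.MULTILINE, ^ and $ anchor at the start and end of each line.
starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
ends = re.findall(r"\w+$", text, flags=re.MULTILINE)

print(starts)  # ['first', 'second']
print(ends)    # ['line', 'line']
```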

Expression   Description
R?           Matches R, once or not at all
R*           Matches R, zero or more times
R+           Matches R, one or more times
R{n}         Matches R, exactly n times
R{n,}        Matches R, at least n times
R{n,m}       Matches R, at least n but no more than m times

R is a regular expression.
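The quantifiers above behave the same way in most regular expression engines. The following sketch checks each form with Python's re module; the helper name matches is hypothetical, and Python is an assumption used only for demonstration.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# R? : once or not at all
assert matches(r"colou?r", "color") and matches(r"colou?r", "colour")
# R* : zero or more times
assert matches(r"ab*", "a") and matches(r"ab*", "abbb")
# R+ : one or more times
assert matches(r"ab+", "ab") and not matches(r"ab+", "a")
# R{n} : exactly n times
assert matches(r"\d{3}", "123") and not matches(r"\d{3}", "12")
# R{n,} : at least n times
assert matches(r"\d{2,}", "12345") and not matches(r"\d{2,}", "1")
# R{n,m} : at least n but no more than m times
assert matches(r"\d{2,3}", "123") and not matches(r"\d{2,3}", "1234")
```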

Trend Micro does not recommend using ".*" in a regular expression. ".*" matches a sequence of characters of any length, and the large number of possible matches may increase memory usage and affect performance.

For example:

If the content is 123456abc, the regular expression ".*abc" match results are:

In this example, replace ".*abc" with "abc" to prevent excessive use of resources.
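A small comparison, sketched with Python's re module, shows that both expressions detect the keyword in the sample content. Python is an assumption here: it reports a single match per search, so this does not reproduce how IMSVA counts or stores matches. It only shows that the leading ".*" adds nothing to detection.

```python
import re

content = "123456abc"

# ".*abc" makes the engine consume the leading digits as part of the match.
greedy = re.search(r".*abc", content)
# "abc" alone finds the same keyword without the extra prefix.
plain = re.search(r"abc", content)

print(greedy.group())  # 123456abc
print(plain.group())   # abc
```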

Expression   Description
RS           R followed by S (concatenation)
R|S          Either R or S
R/S          An R but only if it is followed by S
(R)          Grouping R

R and S are regular expressions.
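The operators above can be tried out in Python's re module, with one substitution. Python has no R/S form, so the sketch uses the lookahead R(?=S) as a close analogue. That mapping is an assumption for illustration and is not IMSVA syntax.

```python
import re

# RS: concatenation of two expressions
concat = re.compile(r"foo" r"bar")
assert concat.fullmatch("foobar")

# R|S: either alternative
either = re.compile(r"cat|dog")
assert either.fullmatch("cat") and either.fullmatch("dog")

# (R): grouping, so that + applies to the whole group
group = re.compile(r"(ab)+")
assert group.fullmatch("ababab")

# R/S analogue: match "foo" only when "bar" follows; "bar" is not consumed.
trailing = re.compile(r"foo(?=bar)")
found = trailing.search("foobar")
print(found.group())  # foo
assert trailing.search("foobaz") is None
```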

eManager provides the following shorthand for writing complicated regular expressions. It pre-processes each expression and translates the shorthand into regular expressions.

For example, {D}+ would be translated to [0-9]+. If a shorthand expression is enclosed in brackets (example: {}) or double-quotes, then IMSVA will not translate that shorthand expression to a regular expression.

Shorthand   Description
{D}         [0-9]
{L}         [A-Za-z]
{SP}        [(),;\.\\<>@\[\]:]
{NUMBER}    [0-9]+
{WORD}      [A-Za-z]+
{CR}        \r
{LF}        \n
{LWSP}      [ \t]
{CRLF}      (\r\n)
{WSP}       [ \t\f]+
{ALLC}      .
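The pre-processing step can be pictured with the following sketch. It is a guess at the general idea, not eManager's actual algorithm. The function name translate_shorthand is hypothetical, and the sketch assumes that "enclosed in brackets" refers to square brackets []. It also does not handle nested classes such as [[:alpha:]].

```python
# Hypothetical sketch; the table mirrors the shorthand list above,
# but the translation algorithm itself is an assumption.
SHORTHAND = {
    "D": "[0-9]",
    "L": "[A-Za-z]",
    "SP": r"[(),;\.\\<>@\[\]:]",
    "NUMBER": "[0-9]+",
    "WORD": "[A-Za-z]+",
    "CR": r"\r",
    "LF": r"\n",
    "LWSP": r"[ \t]",
    "CRLF": r"(\r\n)",
    "WSP": r"[ \t\f]+",
    "ALLC": ".",
}

def translate_shorthand(expr: str) -> str:
    """Expand {NAME} shorthand outside double-quotes and square brackets."""
    out = []
    i = 0
    in_quotes = False    # currently inside "..."
    in_bracket = False   # currently inside [...]
    while i < len(expr):
        ch = expr[i]
        if ch == "\\" and i + 1 < len(expr):
            out.append(expr[i:i + 2])   # copy escaped pairs untouched
            i += 2
            continue
        if ch == '"' and not in_bracket:
            in_quotes = not in_quotes
        elif ch == "[" and not in_quotes and not in_bracket:
            in_bracket = True
        elif ch == "]" and in_bracket:
            in_bracket = False
        elif ch == "{" and not in_quotes and not in_bracket:
            end = expr.find("}", i)
            name = expr[i + 1:end] if end != -1 else None
            if name in SHORTHAND:
                out.append(SHORTHAND[name])
                i = end + 1
                continue
        out.append(ch)
        i += 1
    return "".join(out)

print(translate_shorthand("{D}+"))    # [0-9]+
print(translate_shorthand('"{D}"'))   # "{D}"
print(translate_shorthand("[{D}]"))   # [{D}]
print(translate_shorthand("a{3}"))    # a{3}
```

Quantifiers such as a{3} pass through unchanged because 3 is not a shorthand name.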

eManager also provides the following meta-symbols. The difference between shorthand and meta-symbols is that meta-symbols can be within a bracket expression.

Meta-symbol   Description
\s            [[:space:]]
\S            [^[:space:]]
\d            [[:digit:]]
\D            [^[:digit:]]
\w            [_[:alnum:]]
\W            [^_[:alnum:]]
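Python's re module offers the same meta-symbols, so it can illustrate their use inside a bracket expression. This is a stand-in rather than IMSVA itself. The re.ASCII flag is an assumption, added so that \w, \d, and \s stay limited to ASCII like the classes listed above.

```python
import re

# A meta-symbol can sit inside brackets next to other list members.
token = re.compile(r"[\w.-]+", re.ASCII)
tokens = token.findall("user.name-01 @ host_2")
print(tokens)  # ['user.name-01', 'host_2']

# \d and \s used on their own
digits = re.compile(r"\d+", re.ASCII)
assert digits.fullmatch("2024")
two_words = re.compile(r"\S+\s\S+", re.ASCII)
assert two_words.fullmatch("two words")

# Negated meta-symbols inside a negated list: word characters that are not digits
letters_or_underscore = re.compile(r"[^\W\d]+", re.ASCII)
assert letters_or_underscore.fullmatch("abc_")
assert letters_or_underscore.fullmatch("abc1") is None
```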

To match a character that has a special meaning in regular expressions (for example, +), precede it with the backslash \ escape character. For example, to match the string C/C++, use the expression C\/C\+\+.

Some expressions require many escape characters (for example, C\/C\+\+). In this situation, enclose the string C/C++ in double-quotes (for example, .REG "C/C++"); the new expression is equivalent to the old one. Characters within double-quotes are literal, except \, which remains an escape character. The following are some examples:

Expression                Description
"C/C++"                   Match string C/C++ (does not include double-quotes)
"Regular\x20Expression"   Match string Regular Expression (does not include double-quotes), where \x20 means the space character.
"[xyz]\"foo"              Match the literal string: [xyz]"foo
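Python has no double-quote literal syntax inside patterns, but re.escape plays a comparable role by treating every character of a string as a literal. The sketch below uses it as an analogue, which is an assumption for illustration, and compares it with hand-written escapes.

```python
import re

target = "C/C++"

# Escaping each special character by hand:
manual = re.compile(r"C\/C\+\+")
# Letting re.escape treat the whole string as a literal:
quoted = re.compile(re.escape(target))

assert manual.fullmatch(target)
assert quoted.fullmatch(target)

# Without escaping, + is a quantifier: C/C+ matches "C/CC" but not "C/C+".
unescaped = re.compile(r"C/C+")
assert unescaped.fullmatch("C/CC")
assert unescaped.fullmatch("C/C+") is None

# Brackets and quotes inside the literal are matched as ordinary characters.
literal = re.compile(re.escape('[xyz]"foo'))
assert literal.fullmatch('[xyz]"foo')
assert literal.fullmatch('x"foo') is None
```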

Change the adjacent <space> to "\x20" for the following in a regular expression:

To match "abc AND def", use "abc\x20.AND.\x20def" instead of "abc .AND. def".