Tips on Understanding Microsoft Regular Expressions
Joseph Parlas, CCSI, CCVP, CCNP, CCNA, A+, MCSE
Abstract
This white paper focuses on the regular expression process and
the syntax used by the Microsoft OCS Expert to create a dial plan
and normalization rules that will be properly interpreted and
executed.
Introduction
Microsoft has introduced regular expressions for the main
purpose of normalizing dialed numbers to the E.164 format, so that
users can keep dialing numbers in the patterns they are accustomed
to, and of defining routes that send calls to an external gateway
for PSTN connectivity. Regular expressions are also used for
Address Book translations, where numbers in users' contact
databases have to be converted to the E.164 format.
This white paper focuses on the regular expression process and
the syntax used by the Microsoft OCS (Office Communications Server)
expert to create a dial plan and normalization rules that will
be properly interpreted and executed.
We also will be introducing tool sets that can be used right on
your XP or Vista computer to test regular expression constructs
without disturbing the corporate production environment.
Let’s start by looking at some of the basic constructs of the
regular expression itself by homing in on some of the basic symbols
used. These examples are from a PDF document that can be downloaded
from http://www.addedbytes.com/.
The first building block symbols are ^ and $.
^ means the start of the string or “must start here”
$ means the end of the string or “must end here”
The starting point of your regular expression should be ^$, then
add the rest of the constructs between them.
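As a minimal sketch of what the two symbols do, the following uses
Python's re module as a stand-in for the .NET engine behind OCS. The
sample numbers are made up for illustration.

```python
import re

# Hypothetical pattern: exactly the three digits 911, nothing before or after.
anchored = re.compile(r"^911$")
# The same digits without ^ and $ can match anywhere inside a longer string.
unanchored = re.compile(r"911")

def matches(pattern, text):
    """Return True if the compiled pattern is found in text."""
    return pattern.search(text) is not None

print(matches(anchored, "911"))        # True
print(matches(anchored, "4049115"))    # False
print(matches(unanchored, "4049115"))  # True
```

Without the anchors, a rule meant for a three-digit code would also
fire on any longer number that happens to contain those digits.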
These symbols are also referred to as anchors, and represent the
start and end of whatever you are looking for.
The next series of symbols to consider, which are part of groups
or ranges, are listed below.
( ) The parentheses represent a set or a group reference, where
( defines the beginning of a group declaration and ) defines the end
of a group declaration. An example of grouping is (\d{3}). For now
don’t worry about the \d{3} within the group; that will be
discussed below. Just understand that we create the grouping of our
expressions using the ( ) symbols.
After a group is declared, it is referenced in the replacement
string as $1, where 1 refers to the first group in the pattern. We
will use this in an example to better understand the relationship.
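One way to picture the relationship between a group and its
reference is the sketch below. It is written in Python, whose re
module spells the reference \g<1> (or \1) rather than the $1 used by
.NET and OCS. The 4-digit extension rule and the +1404555 prefix are
assumptions chosen only for illustration.

```python
import re

# Hypothetical rule: a 4-digit extension becomes a full E.164 number.
PATTERN = r"^(\d{4})$"
# .NET, and therefore OCS, would write this replacement as +1404555$1.
# Python's re module writes the same group reference as \g<1>.
REPLACEMENT = r"+1404555\g<1>"

def normalize(dialed):
    """Apply the rule; return None when the dialed string does not match."""
    if re.search(PATTERN, dialed) is None:
        return None
    return re.sub(PATTERN, REPLACEMENT, dialed)

print(normalize("1234"))   # +14045551234
print(normalize("12345"))  # None
```

The digits captured by the parentheses are carried into the result
unchanged; everything else in the replacement is literal text.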
[ ] Brackets represent a range of characters you are looking for.
A range specification matches exactly one character from the range.
As an example, the North American Numbering Plan service codes are
211, 311, 411, 511, 611, 711, 811, and 911. I could represent all of
these variations using the range specification [2-9]11, where the
first digit falls in the range 2-9.
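The range specification above can be checked with a short Python
sketch. The anchors are added here so that only a three-digit string
qualifies; the helper name is hypothetical.

```python
import re

# One character from 2-9, followed by the literal digits 11.
SERVICE_CODE = re.compile(r"^[2-9]11$")

def is_service_code(dialed):
    """Return True for the N11 service codes 211 through 911."""
    return SERVICE_CODE.search(dialed) is not None

for number in ["211", "911", "111", "011", "9111"]:
    print(number, is_service_code(number))
```

Only 211 and 911 pass in this list: 111 and 011 start outside the
2-9 range, and 9111 is too long for the anchored pattern.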
\ The backslash is the escape character. Placed before a symbol
that has a special meaning, it removes that meaning so the symbol is
matched as a literal character and not used as a regular expression
operator. For instance, + means 1 or more in regular expression
language; however, I need to match + as part of an E.164 number. To
prevent it from being used as a quantifier, we add a ‘\’ before it,
as in this example: ‘^\+404’. Here we are looking for the literal
text +404 at the start of the string and do not want + to mean one
or more.
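The difference between the escaped and unescaped plus sign can be
sketched in Python as follows. The second pattern is a made-up
contrast case, not something taken from an OCS rule.

```python
import re

# Escaped: \+ matches a literal plus sign at the start of the string.
literal_plus = re.compile(r"^\+404")
# Unescaped: + is a quantifier, so 4+ means "one or more 4s".
quantifier = re.compile(r"^4+04")

def found(pattern, text):
    """Return True if the compiled pattern is found in text."""
    return pattern.search(text) is not None

print(found(literal_plus, "+4045551234"))  # True
print(found(literal_plus, "4045551234"))   # False
print(found(quantifier, "44404"))          # True
print(found(quantifier, "+404"))           # False
```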
\d This is a character class that represents a single digit,
regardless of its actual value. For instance, if I am looking for
3 digits in the range of 0-9, I could use ‘\d\d\d’ since each \d
matches one digit. This becomes awkward if we want to find, for
example, a run of 9 digits. The alternative is to follow \d with a
count in curly braces {N}, where N is an integer giving the number
of digits you are trying to match. So in our previous example, to
match any 3 digits in [0-9], I could write this two ways. The first
way, described earlier, is ‘\d\d\d’, but an easier approach would
be to use ‘\d{3}’.
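A small Python sketch can confirm that the long form and the {N}
form accept the same strings. Anchors are added so the whole string
must be three digits; the function name is hypothetical.

```python
import re

LONG_FORM = re.compile(r"^\d\d\d$")   # one \d per digit
SHORT_FORM = re.compile(r"^\d{3}$")   # \d repeated exactly 3 times

def is_three_digits(text):
    """Return True if text is exactly three digits, checking both forms agree."""
    long_result = LONG_FORM.search(text) is not None
    short_result = SHORT_FORM.search(text) is not None
    assert long_result == short_result  # the two spellings are equivalent
    return short_result

for sample in ["404", "40", "4045", "4a4"]:
    print(sample, is_three_digits(sample))
```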
Now let’s look at other symbols that are used routinely in
OCS.
Quantifiers
* - means 0 or more of the preceding item
+ - means 1 or more of the preceding item
? - means 0 or 1 of the preceding item
{6} - means exactly 6 of the preceding item
{3,} - means 3 or more of the preceding item
{5,9} - means 5, 6, 7, 8, or 9 of the preceding item
Note that the quantifiers with {} are typically used after the
\d character class.
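To see each quantifier at work after \d, here is a minimal Python
sketch. It tests each pattern against a string made of a given
number of digits; the helper name and the test counts are
assumptions for illustration.

```python
import re

def whole_string_matches(pattern, text):
    """Return True only if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# Each entry: (pattern, number of digits in the test string, expected result).
CHECKS = [
    (r"\d*", 0, True),       # zero digits is allowed
    (r"\d+", 0, False),      # at least one digit is required
    (r"\d?", 1, True),       # zero or one digit
    (r"\d?", 2, False),      # two digits is one too many
    (r"\d{6}", 6, True),     # exactly six
    (r"\d{6}", 5, False),
    (r"\d{3,}", 8, True),    # three or more
    (r"\d{3,}", 2, False),
    (r"\d{5,9}", 9, True),   # between five and nine
    (r"\d{5,9}", 10, False),
]

for pattern, count, expected in CHECKS:
    digits = "5" * count
    result = whole_string_matches(pattern, digits)
    print(f"{pattern:8} on {count:2} digits -> {result}")
    assert result == expected
```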
Assertions
?= - lookahead
?! - negative lookahead
?() - if-then condition
?()| - if-then-else condition
?# - place a comment
Each of these is written immediately after an opening parenthesis,
as in (?=...) or (?!...).
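The sketch below exercises these constructs in Python, whose re
module supports all four forms, although its conditional only tests
whether a numbered or named group matched. The patterns, the 555
prefix, and the blocked 900 area code are assumptions made up for
illustration.

```python
import re

def is_match(pattern, text):
    """Return True if the pattern is found in text."""
    return re.search(pattern, text) is not None

# Lookahead: the string must be exactly 7 digits, and must start with 555.
LOOKAHEAD = r"^(?=\d{7}$)555"
# Negative lookahead: +1 and 10 digits, unless the digits start with 900.
NEGATIVE_LOOKAHEAD = r"^\+1(?!900)\d{10}$"
# If-then-else: if group 1 (the +) matched, require 1 and 10 digits,
# otherwise require exactly 7 digits.
CONDITIONAL = r"^(\+)?(?(1)1\d{10}|\d{7})$"
# Comment: the (?# ...) text is ignored by the engine.
COMMENTED = r"^\d{7}(?# seven-digit local number)$"

print(is_match(LOOKAHEAD, "5551234"))                # True
print(is_match(LOOKAHEAD, "4041234"))                # False
print(is_match(NEGATIVE_LOOKAHEAD, "+14045551234"))  # True
print(is_match(NEGATIVE_LOOKAHEAD, "+19005551234"))  # False
print(is_match(CONDITIONAL, "+14045551234"))         # True
print(is_match(CONDITIONAL, "5551234"))              # True
print(is_match(CONDITIONAL, "+5551234"))             # False
print(is_match(COMMENTED, "5551234"))                # True
```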