Using Regular Expressions

EMC VoyenceControl
Version 4.0.1
P/N 300-007-472
REV A01
EMC Corporation
Corporate Headquarters
Hopkinton, MA 01748-9103
1-508-435-1000
www.EMC.com
COPYRIGHT
Copyright 2008 EMC Corporation. All rights reserved.
Published August 2008

EMC believes the information in this publication is accurate as of its publication date. The
information is subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED "AS IS." EMC CORPORATION MAKES
NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION
IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an
applicable software license.
For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on
EMC.com.
All other trademarks used herein are the property of their respective owners.
TABLE OF CONTENTS
COPYRIGHT
TABLE OF CONTENTS
PREFACE
Who Should Read this Document?
Related Publications
Accessing Publications Online
VOYENCECONTROL AND REGEX
Using Regular Expressions (RegEx) in VoyenceControl
BASIC REGEX INFORMATION
Simple Patterns
Character Classes
Pattern Repetition
Optional and Grouping
Quoting Special Characters
Boundary Matchers
USEFUL EXAMPLES
ADVANCED REGEX
Predefined Characters and Character Classes
Examples of Regular Expressions
Union, Intersection, and Subtraction in Character Classes
REFERENCED GROUPS
WHERE TO GET HELP
Where to get help
PREFACE
Who Should Read this Document?
This guide is intended for Network Engineers, System Administrators, and those individuals
needing conceptual knowledge of the features and functions included in this release of
VoyenceControl. Those individuals who will be installing VoyenceControl can also benefit from
this document.
Related Publications
This section lists publications related to this document that should also be reviewed.
> Installing VoyenceControl on Red Hat Enterprise Linux 4.0, P/N 300-007-465
> Installing VoyenceControl on Windows Server 2003, P/N 300-007-466
> Installing VoyenceControl on Solaris 10, P/N 300-007-467
Accessing Publications Online

The publications for this product are available online in Portable Document Format (PDF). To
locate product publications in the VoyenceControl Reference Library:
1. Click the Help option on the VoyenceControl menu bar.
2. Select Help Contents. The Online Users Guide will open.
3. Go to Accessing Help and Additional Documents, and expand that section by clicking
   the book icon.
4. Once the book has opened, select Reference Library. The related publications mentioned
   above are accessible in PDF format. These documents can now be viewed online, saved to
   a defined location, or printed.
VOYENCECONTROL AND REGEX

Using Regular Expressions (RegEx) in VoyenceControl
Regular Expressions (RegEx) are used in several places in VoyenceControl. In particular, they
are used to do the following:
> Set up Configuration Audit filters
> Create filters for table views within the product (using the RegEx filter operator)
> Write device-specific communications protocols in DASL Device Drivers
Regular expressions are a way of defining a set of matching criteria to determine if a specified
expression matches a set of target text.
For example, the regular expression test.*again, when applied against a series of target
texts, such as:
today it again works

today it tests better
today it tests better again
again, it tests today
today it tests again today
matches some of the lines, and does not match other lines. In particular, the specified regular
expression would match the following lines from the above example:
today it tests better again

today it tests again today
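The filtering described above can be sketched in code. This document does not name the regular expression engine that VoyenceControl uses, so the sketch assumes Java's java.util.regex package, whose syntax resembles the syntax described in this guide. The class and method names are illustrative only. The pattern used is test.*again (a period before the asterisk), so that any text may appear between the two words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class TargetTextFilter {
    // Returns the lines that contain at least one match for the regular expression.
    static List<String> matchingLines(String regex, List<String> lines) {
        Pattern pattern = Pattern.compile(regex);
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            // find() succeeds when the pattern matches any part of the line.
            if (pattern.matcher(line).find()) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "today it again works",
            "today it tests better",
            "today it tests better again",
            "again, it tests today",
            "today it tests again today");
        for (String line : matchingLines("test.*again", lines)) {
            System.out.println(line);
        }
    }
}
```

Only the two lines in which the word again follows test are printed.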
Regular expressions are an easy and efficient way of filtering or matching text in a variety of
ways:
> For Configuration Audit filters, regular expressions are used to specify valid and
  expected text within a configuration file, to determine if a device's configuration is in
  compliance.
> For table views, regular expressions allow filtering the table view to show only rows
  that have fields matching specific regular expressions.
> For device drivers, regular expressions allow for capturing text from a device,
  determining if specific results are obtained, and scraping useful data from the
  captured text.
Basic regular expressions are easy to understand and create. Advanced regular expressions
are extremely powerful and provide complex filtering and searching capabilities.
BASIC REGEX INFORMATION

A regular expression is a string, called a pattern, that is matched against one or more
subject strings. If the pattern matches any part (substring) of a subject string, the subject
string is said to have a match. If the pattern does not match any part of the subject string,
the subject string is said to not match.
A pattern is said to have a perfect match if the pattern matches the entire subject string.
A pattern string contains a series of normal characters, which must match consecutive
characters in the subject string to generate a match, along with special pattern characters.
These pattern characters provide a way of matching characters of variable length and variable
content in the subject string. A pattern string can contain zero or more special pattern
characters, and zero or more sequences of normal characters.
The remainder of this document describes the allowed patterns used within a pattern string.
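The difference between a match and a perfect match can be illustrated with a short sketch. It assumes Java's java.util.regex package (an assumption, since this document does not name the engine), where find() reports a match anywhere in the subject string and matches() reports a perfect match. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class MatchKinds {
    // True when the pattern matches any part of the subject string (a "match").
    static boolean hasMatch(String regex, String subject) {
        return Pattern.compile(regex).matcher(subject).find();
    }

    // True when the pattern matches the entire subject string (a "perfect match").
    static boolean isPerfectMatch(String regex, String subject) {
        return Pattern.compile(regex).matcher(subject).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasMatch("345", "1234567"));           // 345 is inside the subject
        System.out.println(isPerfectMatch("345", "1234567"));     // but it is not the whole subject
        System.out.println(isPerfectMatch("1234567", "1234567")); // the whole subject matches
    }
}
```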
Simple Patterns
A period (.) character in the pattern string will match any single character within the
subject string. For example, the pattern:
123.567
will match the subject string:
1234567
along with the subject string:
123J567
or:
1238567
but will not match strings such as:
1228555
The asterisk (*) character is the character repeat operator. This operator says that zero or
more copies of the previous character may exist in a subject string. For example, the following
pattern:
1234*567
will match the following subject strings:
123567
1234567
12344567
12344444444567
When used with a period, the asterisk can match large sections of text within the middle of a
subject string. For example, the pattern:
123.*567
Will match:
123567
1234567
123This is a test of patterns567
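The period and asterisk examples in this section can be checked with a small sketch. It assumes Java's java.util.regex package, which this document does not name explicitly; the class name and helper method are illustrative.

```java
import java.util.regex.Pattern;

class SimplePatternDemo {
    // True when the pattern matches somewhere in the subject string.
    static boolean contains(String regex, String subject) {
        return Pattern.compile(regex).matcher(subject).find();
    }

    public static void main(String[] args) {
        // Each row is a pattern followed by a subject string to test it against.
        String[][] cases = {
            {"123.567", "123J567"},
            {"123.567", "1228555"},
            {"1234*567", "123567"},
            {"1234*567", "12344444444567"},
            {"123.*567", "123This is a test of patterns567"}
        };
        for (String[] c : cases) {
            System.out.println(c[0] + " on " + c[1] + ": " + contains(c[0], c[1]));
        }
    }
}
```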
Character Classes
A period (.) is useful for matching any character. To match only one of a few specific
characters, use a character class. Characters between a pair of square brackets are
considered a character class. Any one of the characters in the class can be used to match a
single character in the subject string. For example:
[abc]
will match either the letter a, b, or c. When used in a pattern, as an example:
123[abc]567
will match:
123a567
123b567
123c567
but will not match:
123d567
1234567
123567
Any combination of characters can be used within the square brackets, in any order. For
example:
[1aj35d]
Also, if a range of characters is desired, the beginning and ending character in the range can
be used, separated by a dash (-). For example, the following pattern:
[a-e2-6]
will match any of the following characters: a, b, c, d, e, 2, 3, 4, 5 or 6.
Finally, the entire character class can be used as the subject of an asterisk (*) operator. For
example, the following regular expression:
123[abc]*567
will match any of the following strings:
123567
123a567
123ab567
123ba567
123abcaacbbbaac567
You can create a character class that contains all characters except a specific list by using
the caret (^) operator as the first character in a square bracket set. For example:
[^abc]
will match any character, except the letters a, b, or c. For example:
123[^abc]*567
will match any of the following strings:
123567
1234567
1239567
123x567
123xxxx567
1233848383838567
but will not match:
123a567
123xxbxxx567
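The character class examples above, including the negated class, can be tried out with the following sketch. It assumes Java's java.util.regex package (not confirmed by this document), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class CharacterClassDemo {
    // Returns the subjects, in their original order, that contain a match for the pattern.
    static List<String> accepted(String regex, String... subjects) {
        Pattern pattern = Pattern.compile(regex);
        List<String> result = new ArrayList<>();
        for (String subject : subjects) {
            if (pattern.matcher(subject).find()) {
                result.add(subject);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A plain class: exactly one of a, b, or c between 123 and 567.
        System.out.println(accepted("123[abc]567", "123a567", "123d567", "1234567", "123c567"));
        // A negated class with *: any run of characters other than a, b, or c.
        System.out.println(accepted("123[^abc]*567", "123567", "123x567", "123a567", "123xxbxxx567"));
        // Ranges: a through e, or 2 through 6.
        System.out.println(accepted("[a-e2-6]", "4", "7", "c", "z"));
    }
}
```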
Pattern Repetition
The asterisk (*) is called a pattern repetition operator. It describes how a previous pattern
character (or character class) can be repeated. In the case of the asterisk, the previous
pattern can be repeated zero or more times.
There are other pattern repetition operators available, each with a unique purpose. Each one,
like the asterisk, operates on the preceding pattern character (or character class):
Operator    Purpose
*           Repeat 0 or more times
?           Repeat 0 or 1 times
+           Repeat 1 or more times
{n}         Repeat exactly n times
{n,}        Repeat n or more times
{n,m}       Repeat at least n times, but no more than m times
The following are some examples of their usage:
Pattern         Matches         Does Not Match
123x*567        123567          1234567
                123x567
                123xx567
                123xxx567
123x?567        123567          123xx567
                123x567         123xxx567
123x+567        123x567         123567
                123xx567
                123xxx567
123x{3}567      123xxx567       123567
                                123x567
                                123xx567
                                123xxxx567
123x{2,}567     123xx567        123567
                123xxx567       123x567
                123xxxx567
                123xxxxx567
123x{2,4}567    123xx567        123567
                123xxx567       123x567
                123xxxx567      123xxxxx567
Notice that the following operators are equivalent:
Operator    Is Equivalent To
*           {0,}
+           {1,}
?           {0,1}
<none>      {1,1}
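The equivalences between the repetition operators can be spot-checked against a handful of sample subject strings. The sketch below assumes Java's java.util.regex package (an assumption, as the engine is not named in this document); the class, field, and method names are illustrative. Agreement on a few samples is a sanity check, not a proof of equivalence.

```java
import java.util.regex.Pattern;

class RepetitionEquivalence {
    // Sample subjects with zero to five x characters between 123 and 567.
    static final String[] SUBJECTS = {
        "123567", "123x567", "123xx567", "123xxx567", "123xxxx567", "123xxxxx567"
    };

    // Counts how many of the sample subjects are a perfect match for the pattern.
    static int acceptedCount(String regex) {
        Pattern pattern = Pattern.compile(regex);
        int count = 0;
        for (String subject : SUBJECTS) {
            if (pattern.matcher(subject).matches()) {
                count++;
            }
        }
        return count;
    }

    // True when both patterns accept and reject exactly the same sample subjects.
    static boolean sameResults(String regexA, String regexB) {
        Pattern a = Pattern.compile(regexA);
        Pattern b = Pattern.compile(regexB);
        for (String subject : SUBJECTS) {
            if (a.matcher(subject).matches() != b.matcher(subject).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameResults("123x*567", "123x{0,}567"));
        System.out.println(sameResults("123x+567", "123x{1,}567"));
        System.out.println(sameResults("123x?567", "123x{0,1}567"));
        System.out.println(sameResults("123x567", "123x{1,1}567"));
        System.out.println(acceptedCount("123x{2,4}567"));
    }
}
```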
Optional and Grouping

To match either of two different sequences of characters, use the OR operator (|). For
example, the pattern:
abc|def
will match the subject string abc or the subject string def. The OR operator may be mixed with
all of the other operators above. For example, the pattern:
ab*c|d.f
will be a perfect match for:
ac
abc
abbbbc
def
dgf
dhf
but will not be a perfect match for:
acc
df
deef
The string acc contains a match (ac), but the pattern does not match the entire string.
To limit the scope of the OR operator, parentheses ( ) can be used to group portions of an
expression. The parentheses themselves do not match characters in the subject string; they
only group portions of a RegEx pattern together. For example, the following regular expression:
abc(d*|efg|hi+j)abc
will match:
abcabc
abcdabc
abcddddabc
abcefgabc
abchijabc
abchiiiijabc
but will not match:
abc(d)abc
abcefghijabc
abcdefghijabc
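The effect of grouping on the OR operator can be seen by comparing a grouped pattern with the same pattern written without parentheses. This sketch assumes Java's java.util.regex package, which this document does not name; the class and method names are hypothetical. It also shows that the match lists in this section hold when the whole subject string must match.

```java
import java.util.regex.Pattern;

class AlternationScope {
    // True when the pattern matches the entire subject string.
    static boolean perfect(String regex, String subject) {
        return Pattern.matches(regex, subject);
    }

    // True when the pattern matches any part of the subject string.
    static boolean found(String regex, String subject) {
        return Pattern.compile(regex).matcher(subject).find();
    }

    public static void main(String[] args) {
        String grouped = "abc(d*|efg|hi+j)abc";
        String ungrouped = "abcd*|efg|hi+jabc"; // | now splits the whole pattern into three parts

        System.out.println(perfect(grouped, "abcefgabc"));   // middle alternative between abc...abc
        System.out.println(perfect(ungrouped, "abcefgabc")); // none of the three parts fits
        System.out.println(perfect(ungrouped, "efg"));       // the middle part alone fits

        // acc contains ac, so it has a match, but it is not a perfect match.
        System.out.println(found("ab*c|d.f", "acc"));
        System.out.println(perfect("ab*c|d.f", "acc"));
    }
}
```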
Quoting Special Characters
To match a pattern that contains special characters, such as (, ), |, [, etc., you can quote
them in the regular expression by preceding each one with a backslash character (\).
Regular Expression    Will Match      Will Not Match
abc(def)ghi           abcdefghi       abc(def)ghi
abc\(def\)ghi         abc(def)ghi     abcdefghi
abcd\*efg             abcd*efg        abcefg
                                      abcdefg
                                      abcdddddefg
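Quoting can be sketched as follows, assuming Java's java.util.regex package (the engine is not named in this document). In Java source code each backslash in a pattern must itself be doubled inside a string literal. The sketch also uses Pattern.quote, a Java library method that quotes an entire literal string at once; it is not mentioned in this guide and is shown only as an alternative. The class and method names are illustrative.

```java
import java.util.regex.Pattern;

class QuotingDemo {
    // True when the pattern matches the entire subject string.
    static boolean perfectMatch(String regex, String subject) {
        return Pattern.matches(regex, subject);
    }

    // Treats every character of the given text as a normal character.
    static boolean literalMatch(String literalText, String subject) {
        return Pattern.matches(Pattern.quote(literalText), subject);
    }

    public static void main(String[] args) {
        // Unquoted parentheses only group; they do not match ( and ) in the subject.
        System.out.println(perfectMatch("abc(def)ghi", "abcdefghi"));
        System.out.println(perfectMatch("abc(def)ghi", "abc(def)ghi"));
        // Quoted parentheses match the literal characters.
        System.out.println(perfectMatch("abc\\(def\\)ghi", "abc(def)ghi"));
        // A quoted asterisk is a literal asterisk, not a repeat operator.
        System.out.println(perfectMatch("abcd\\*efg", "abcd*efg"));
        System.out.println(literalMatch("abcd*efg", "abcd*efg"));
    }
}
```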
Boundary Matchers
By default, regular expression patterns will match subject strings, if the regular expression
matches any part of the subject string. For example, the regular expression:
t[hij]e
Will match:
the
as well as:
We ate at the store.
as the word the is contained in that last string. It does not matter whether the subject
string is a perfect match, only that it contains a match.
There are two operators that can be used to constrain this matching further. The first is the
beginning-of-line (^) operator. This operator constrains the regular expression to match at
the beginning of the subject string only. For example, the following regular expression:
^the
will match:
the
the fire is out
but will not match:
when will the fire be out
because the word the is not at the beginning of the string. Further, the end-of-line ($)
operator is used to constrain the regular expression to match the end of the subject string
only. For example, the following regular expression:
[0-9]+$
will match:
123456
This is a number: 1234
but will not match:
The number 1234 is a whole number
because the number is not at the end of the string.
Finally, using both the beginning-of-line and end-of-line operators will force the regular
expression to match the entire subject string, or not at all. For example:
^the$
Will match:
the
But not:
This is the time
while the regular expression the without the ^ or $ will match both.
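The boundary operators can be exercised with the sketch below, which assumes Java's java.util.regex package (an assumption; this document does not name the engine). The class and method names are hypothetical. The end-of-line example uses [0-9]+$ because, with an asterisk, the digits are optional and the pattern also matches the empty position at the end of any string.

```java
import java.util.regex.Pattern;

class BoundaryDemo {
    // True when the pattern matches somewhere in the subject string.
    static boolean found(String regex, String subject) {
        return Pattern.compile(regex).matcher(subject).find();
    }

    public static void main(String[] args) {
        System.out.println(found("^the", "the fire is out"));
        System.out.println(found("^the", "when will the fire be out"));
        System.out.println(found("[0-9]+$", "This is a number: 1234"));
        System.out.println(found("[0-9]+$", "The number 1234 is a whole number"));
        // With *, zero digits are enough, so the end of any string matches.
        System.out.println(found("[0-9]*$", "The number 1234 is a whole number"));
        System.out.println(found("^the$", "This is the time"));
    }
}
```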
USEFUL EXAMPLES
The following are some useful examples of regular expressions:
Regular Expression    Use
[0-9]+                Match any string that contains a number.
^[0-9]*$              Match any string that contains only a number, or is an empty string.
^[0-9]+$              Match any string that contains only a number, and is not empty.
[Tt]he                Match the word the, even if capitalized, such as at the beginning of a
                      sentence.
[Tt]he[^\.]*\.        Match an entire sentence that starts with the word The, and ends with
                      a period (but does not contain a period anywhere else in it). Notice that
                      the period is escaped with a \, so it is not interpreted as the
                      any-character operator in either location in the regular expression.
0x[0-9a-fA-F]+        A hexadecimal number of the form 0x13a4.
\+[0-9]+              An integer with a leading plus sign (such as +37). Notice the escaped
                      + sign.
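Several of the expressions above can be used to pull every occurrence of a pattern out of a longer text. The following is a minimal sketch that assumes Java's java.util.regex package (not confirmed by this document); the class name, method name, and sample texts are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternScan {
    // Collects every non-overlapping match of the pattern, scanning left to right.
    static List<String> findAll(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        List<String> hits = new ArrayList<>();
        while (matcher.find()) {
            hits.add(matcher.group()); // group() with no argument is the whole match
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("0x[0-9a-fA-F]+", "flags 0x13a4 and 0xFF set"));
        System.out.println(findAll("\\+[0-9]+", "offset +37, gain -2"));
        System.out.println(findAll("[Tt]he[^\\.]*\\.", "The fire is out. We saw the end."));
    }
}
```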
ADVANCED REGEX
The following section details more advanced regular expression subjects.
Predefined Characters and Character Classes

There are a number of special character sequences that have special meaning in regular
expressions, as shown in the following list:
Sequence      Meaning
.             Matches a single character of any value
\t            A tab character
\n            A newline character
\r            A carriage return character
\d            A digit (same as [0-9])
\D            A non-digit (same as [^0-9])
\s            A white-space character (space, tab, newline, formfeed, or carriage return)
\S            A non-white-space character (same as [^\s])
\w            A character used in a word (same as [a-zA-Z_0-9]; notice that it contains an
              underscore character)
\W            Not a character used in a word (same as [^\w])
\0n, \0nn,    A character with the octal value specified by n. For example, \012 is a newline
\0nnn         character, ASCII(10)
\xhh          A character with the hex value specified by hh. For example, \x0a is a newline
              character, ASCII(10)
\\            The backslash (\) character as a constant
\e            The escape character
\cx           The control character corresponding to x. For example, \cA is a Control-A
\p{Lower}     Equivalent to [a-z]
\p{Upper}     Equivalent to [A-Z]
\p{Digit}     Equivalent to [0-9]
\p{Alpha}     Equivalent to [a-zA-Z]
\p{Alnum}     Equivalent to [a-zA-Z0-9]
\p{Punct}     Any punctuation character, such as !@#$%^&*()-_=+[]{}\|;:,./<>?`~
\p{Print}     Printable characters, equivalent to [\p{Alnum}\p{Punct}]
|             Pipe (the OR operator)
\G            The end of the previous match
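The predefined sequences can be checked one character at a time. The sketch below assumes Java's java.util.regex package (an assumption, since this document does not name the engine); the class and method names are illustrative. Note that in Java source code the backslash of each sequence is doubled inside a string literal.

```java
import java.util.regex.Pattern;

class PredefinedClassCheck {
    // True when the single character c is matched by the given sequence or class.
    static boolean is(String sequence, char c) {
        return Pattern.matches(sequence, String.valueOf(c));
    }

    public static void main(String[] args) {
        System.out.println(is("\\d", '7'));         // digit
        System.out.println(is("\\D", '7'));         // non-digit
        System.out.println(is("\\s", '\t'));        // white space includes tab
        System.out.println(is("\\w", '_'));         // word characters include underscore
        System.out.println(is("\\W", '-'));         // a dash is not a word character
        System.out.println(is("\\012", '\n'));      // octal 12 is a newline
        System.out.println(is("\\x0a", '\n'));      // hex 0a is a newline
        System.out.println(is("\\p{Punct}", '!'));  // punctuation
    }
}
```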

Note: VoyenceControl uses the .+? (period, plus, and question mark) characters to
match criteria for pre-conditions. Do not use .* (period and asterisk) when using
Begins with / Ends with pre-conditions.
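This guide does not explain how .+? differs from .* in practice. In engines such as Java's java.util.regex (assumed here; how VoyenceControl applies pre-conditions internally is not described in this document), .* is greedy and takes as much text as possible, while .+? is reluctant and takes as little as possible. The sketch below illustrates that general difference only; the class name, method name, and the begin/end markers are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Returns the text captured by group 1 of the first match, or null when there is no match.
    static String firstCapture(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "begin A end begin B end";
        // Greedy: runs to the LAST "end" in the text.
        System.out.println("[" + firstCapture("begin(.*)end", text) + "]");
        // Reluctant: stops at the FIRST "end" after "begin".
        System.out.println("[" + firstCapture("begin(.+?)end", text) + "]");
    }
}
```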
Examples of Regular Expressions
Expression      Meaning
.*              Matches anything.
^.*Stuff        Matches anything up until the last occurrence of the word Stuff.
.*\t            Matches all characters up until the last tab character.
[^\t]*\t        Matches all characters up until the first tab character.
[misy]*th       Matches words such as smith, or sith, or myth.
[misy]*th?      Will match all words listed above, but also words such as mist and sit.
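The difference between matching up to the last tab and up to the first tab can be sketched as follows, assuming Java's java.util.regex package (not named in this document). The class name, method name, and sample text are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TabExamples {
    // Returns the text of the first match, or null when nothing matches.
    static String firstMatch(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String row = "name\tvalue\tcomment";
        // .* is greedy, so the match extends through the last tab.
        System.out.println(firstMatch(".*\\t", row).length());
        // [^\t]* cannot cross a tab, so the match ends at the first tab.
        System.out.println(firstMatch("[^\\t]*\\t", row).length());
        System.out.println(firstMatch("^.*Stuff", "Stuff and more Stuff here"));
    }
}
```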
Union, Intersection, and Subtraction in Character Classes
Character classes can be embedded within each other to create more complicated sets. This
embedding can be accomplished using Union, Intersection, or Subtraction rules.
For unions, a square bracket character class contained within another square bracket character
class is equivalent to combining the two character classes. For example:
[a-z[0-9]A-Z]
is equivalent to:
[a-z0-9A-Z]
Notice, however, that this is a very different pattern than:
[a-z][0-9][A-Z]
The latter matches only a three-character sequence: a lowercase letter, then a numeral, then
an uppercase letter, in that order.
A more interesting case involves intersection. This involves an embedded character class
preceded by the intersection operator (&&). For example:
[a-j&&[d-z]]
is equivalent to the intersection of [a-j] and [d-z]. In this example, it is equivalent to:
[d-j]
The overall usefulness of this is questionable, but there may be cases (when combined with
the predefined character classes) where this can be useful.
A potentially useful example, however, is when the second character class is a negated
character class (using the ^ operator). This is called subtraction, and can be demonstrated
by the following example:
[a-z&&[^d-f]]
This says any character, a-z, other than the characters d, e, or f. Or, written as a single
character class, it is equivalent to:
[a-cg-z]
This can be helpful when used with pre-defined character classes. The following are some
helpful examples of this:
Regular Expression      Meaning
[a-z&&[^aeiou]]         All consonant characters (non-vowels)
[\p{Punct}&&[^:;]]      All punctuation other than colon or semi-colon
[\s&&[^\t]]             All white space, except a tab character
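One way to see what a combined character class accepts is to list every ASCII character it matches. The sketch below assumes Java's java.util.regex package, whose character class syntax supports the union and && forms shown in this section (this document does not name the engine). The class and method names are illustrative.

```java
import java.util.regex.Pattern;

class ClassMembers {
    // Returns every ASCII character (codes 0 through 127) matched by the character class.
    static String members(String classRegex) {
        Pattern pattern = Pattern.compile(classRegex);
        StringBuilder matched = new StringBuilder();
        for (char c = 0; c < 128; c++) {
            if (pattern.matcher(String.valueOf(c)).matches()) {
                matched.append(c);
            }
        }
        return matched.toString();
    }

    public static void main(String[] args) {
        System.out.println(members("[a-j&&[d-z]]"));     // intersection
        System.out.println(members("[a-z&&[^d-f]]"));    // subtraction
        System.out.println(members("[a-z&&[^aeiou]]"));  // consonants
        // A union lists the same characters as the combined class.
        System.out.println(members("[a-z[0-9]A-Z]").equals(members("[a-z0-9A-Z]")));
    }
}
```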
REFERENCED GROUPS
Grouping with the ( ) operators has previously been detailed. However, multiple and nested
groups are also allowed. For example, the following is a legal regular expression:
a*(b+(c*)([de]*(f))f+g)
In this example, there are four nested groups. Groups are sometimes called referenced
groups, and each is assigned a number that corresponds to the part of the string that matches
the grouped portion of the regular expression.
By convention, group 0 is the entire regular expression, group 1 is the regular expression
contained in the first group, group 2 is the regular expression contained in the second group,
etc.
Nested groups are numbered by reading left-to-right, and sequentially assigning integers 1
through n based on the order in which the left parentheses are found. In the example above,
the following group numbers are assigned to the portions of the regular expression:
Group Number    Regular Expression Segment
Group 0         a*(b+(c*)([de]*(f))f+g)
Group 1         (b+(c*)([de]*(f))f+g)
Group 2         (c*)
Group 3         ([de]*(f))
Group 4         (f)
When a grouped regular expression is found to match a subject string, the groups are used to
extract the matched portion of the subject string. For example, apply the above regular
expression to the following subject string:
xxxaaabcccdeedeedffffghhhh
The above regular expression does indeed match this subject string (at least a contained
portion of it). The groups assigned to the regular expression can then be used to extract the
matched portion of the regular expression.
For example, group 0 contains the portion of the subject string that matches the entire regular
expression. In this example, that portion is:
aaabcccdeedeedffffg
since the leading xxx and trailing hhhh do not match the regular expression (and the regular
expression does not contain the beginning-of-line ^ or end-of-line $ operators). These
sequences are not contained in group 0.
Further, the other groups return the portion of the string that matches their regular expression
segment. The following is a table that corresponds to the entire results of matching the
example regular expression with the example subject string:
Group Number    Regular Expression Segment    Matched Subject String
Group 0         a*(b+(c*)([de]*(f))f+g)       aaabcccdeedeedffffg
Group 1         (b+(c*)([de]*(f))f+g)         bcccdeedeedffffg
Group 2         (c*)                          ccc
Group 3         ([de]*(f))                    deedeedf
Group 4         (f)                           f
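The group results for the example regular expression and subject string can be reproduced with a short sketch. It assumes Java's java.util.regex package, in which groups are numbered by the position of their left parenthesis and group 0 is the whole match (this document does not name the engine). The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDump {
    // Returns the text of group 0 through group n for the first match,
    // or an empty array when the pattern does not match at all.
    static String[] groups(String regex, String subject) {
        Matcher matcher = Pattern.compile(regex).matcher(subject);
        if (!matcher.find()) {
            return new String[0];
        }
        String[] result = new String[matcher.groupCount() + 1];
        for (int i = 0; i <= matcher.groupCount(); i++) {
            result[i] = matcher.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] found = groups("a*(b+(c*)([de]*(f))f+g)", "xxxaaabcccdeedeedffffghhhh");
        for (int i = 0; i < found.length; i++) {
            System.out.println("Group " + i + ": " + found[i]);
        }
    }
}
```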
This numbered grouping can be quite useful in extracting information from subject strings. For
example, the following regular expression:
total is ([0-9]*)
when applied to the following subject string:
The grand total is 324, but the partial amount is only 15, ok?
will match the subject string. After the match, group 0 will contain total is 324, but group 1
will contain 324. Notice that this regular expression made it possible to extract a useful
number from a complex subject string that contained a great deal of unwanted information.
Referenced groups are used most often in regular expression search/replace capabilities, and
similar actions. They are also used extensively in DASL Device Drivers for extracting
information from captured device text. In the first case, referenced groups can be used in the
replacement text. In the latter case, referenced groups are used to pull information out of
device output.
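Both uses of referenced groups can be sketched in a few lines. The sketch assumes Java's java.util.regex package, where a replacement string refers to group 1 as $1; how VoyenceControl or DASL Device Drivers refer to groups is not covered in this document, so this syntax is an assumption. The class and method names, and the replacement label, are invented for illustration. The pattern uses [0-9]+ rather than [0-9]* so that group 1 always contains at least one digit.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReferencedGroupUse {
    // Extracts the number that follows "total is", or returns -1 when there is none.
    static int extractTotal(String text) {
        Matcher matcher = Pattern.compile("total is ([0-9]+)").matcher(text);
        return matcher.find() ? Integer.parseInt(matcher.group(1)) : -1;
    }

    // Rewrites "total is N" as "total=N", reusing the captured digits through $1.
    static String relabel(String text) {
        return text.replaceAll("total is ([0-9]+)", "total=$1");
    }

    public static void main(String[] args) {
        String subject = "The grand total is 324, but the partial amount is only 15, ok?";
        System.out.println(extractTotal(subject));
        System.out.println(relabel(subject));
    }
}
```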
WHERE TO GET HELP

Where to get help
EMC support, product, and licensing information can be obtained as follows.
Product information: For documentation, release notes, software updates, or for information
about EMC products, licensing, and service, go to the EMC Powerlink website (registration
required) at:
http://Powerlink.EMC.com
Technical support: For technical support, go to EMC Customer Service on Powerlink. To open
a service request through Powerlink, you must have a valid support agreement. Please contact
your EMC sales representative for details about obtaining a valid support agreement or to
answer any questions about your account.
Sales and customer service contacts: For the list of EMC sales locations, please access the
EMC home page at:
http://EMC.com/contact