VoyenceControl
Version 4.0.1
Using Regular Expressions
P/N 300-007-472
REV A01
EMC Corporation
Corporate Headquarters
Hopkinton, MA 01748-9103
1-508-435-1000
www.EMC.com
COPYRIGHT
Copyright 2008 EMC Corporation. All rights reserved.
TABLE OF CONTENTS
COPYRIGHT
TABLE OF CONTENTS
PREFACE
  Who Should Read this Document?
  Related Publications
  Accessing Publications Online
INDEX
PREFACE
Who Should Read this Document?
This guide is intended for Network Engineers, System Administrators, and those individuals
needing conceptual knowledge of the features and functions included in this release of
VoyenceControl. Those individuals who will be installing VoyenceControl can also benefit from
this document.
Related Publications
This section lists publications related to this document that should also be reviewed.
3. Go to Accessing Help and Additional Documents, and expand that section by clicking
   the book icon.
4. Once the book has opened, select Reference Library. The related publications mentioned
   above are accessible in PDF format. These documents can now be viewed online, saved to
   a defined location, or printed.
Regular expressions are a way of defining a set of matching criteria to determine if a specified
expression matches a set of target text.
For example, the regular expression test*again, when applied against a series of target
texts, matches some of the lines and does not match others.
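As a minimal sketch of this behavior, the following uses Python's re module (an assumption; this guide does not name a programming language) and a set of hypothetical target lines that are not taken from this guide. Note that the asterisk repeats only the character before it, so test*again means tes, followed by zero or more t characters, followed by again.

```python
import re

# Hypothetical target lines, invented for illustration only.
targets = [
    "tesagain",
    "testagain",
    "testttagain",
    "test again",
    "test it again",
]

pattern = re.compile(r"test*again")

# re.search reports a match if the pattern occurs anywhere in the line.
matched = [line for line in targets if pattern.search(line)]
unmatched = [line for line in targets if not pattern.search(line)]

print("matched:  ", matched)
print("unmatched:", unmatched)
```

The lines containing a space do not match, because a space is not a t and the pattern allows nothing else between tes and again.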
Regular expressions are an easy and efficient way of filtering or matching text in a variety of
ways:
For Configuration Audit filters, regular expressions are used to specify valid and
expected text within a configuration file, to determine if a device's configuration is in
compliance.
For table views, regular expressions allow filtering the table view to show only rows
that have fields matching specific regular expressions.
For device drivers, regular expressions allow for capturing text from a device, and
determining if specific results are obtained, and/or scraping useful data from the
captured text.
Basic regular expressions are easy to understand and create. Advanced regular expressions
are extremely powerful and provide complex filtering and searching capabilities.
Simple Patterns
A period (.) character in the pattern string will match any single character within the
subject string. For example, the pattern:
123.567
will match the subject string:
1234567
along with the subject string:
123J567
or:
1238567
but will not match strings such as:
1228555
The asterisk (*) character is the character repeat operator. This operator says that zero or
more copies of the previous character may exist in a subject string. For example, the following
pattern:
1234*567
will match the following subject strings:
123567
1234567
12344567
12344444444567
When used with a period, the asterisk can match large sections of text within the middle of a
subject string. For example, the pattern:
123.*567
will match:
123567
1234567
123This is a test of patterns567
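The period and asterisk examples in this section can be checked with a short script. This is a sketch using Python's re module, which is an assumption on our part; the helper name contains_match is hypothetical. The patterns and subject strings are the ones listed in this section.

```python
import re


def contains_match(pattern: str, subject: str) -> bool:
    """Return True if the pattern matches any part of the subject string."""
    return re.search(pattern, subject) is not None


# (pattern, subject) pairs taken from the Simple Patterns examples.
checks = [
    (r"123.567", "1234567"),
    (r"123.567", "123J567"),
    (r"123.567", "1228555"),
    (r"1234*567", "123567"),
    (r"1234*567", "12344444444567"),
    (r"123.*567", "123This is a test of patterns567"),
]

results = [(p, s, contains_match(p, s)) for p, s in checks]
for pattern, subject, ok in results:
    print(f"{pattern!r:12} {subject!r:40} {'match' if ok else 'no match'}")
```

Only the third pair reports no match, because 1228555 does not begin with 123.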
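Pattern Repetition
Operator    Purpose
*           Repeat 0 or more times
?           Repeat 0 or 1 times
+           Repeat 1 or more times
{n}         Repeat exactly n times
{n,}        Repeat at least n times
{n,m}       Repeat at least n times, but no more than m times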
Pattern         Matches         Does Not Match
123x*567        123567          1234567
                123x567
                123xx567
                123xxx567
123x?567        123567          123xx567
                123x567         123xxx567
123x+567        123x567         123567
                123xx567
                123xxx567
123x{3}567      123xxx567       123567
                                123x567
                                123xx567
                                123xxxx567
123x{2,}567     123xx567        123567
                123xxx567       123x567
                123xxxx567
                123xxxxx567
123x{2,4}567    123xx567        123567
                123xxx567       123x567
                123xxxx567      123xxxxx567
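The repetition examples above can be verified mechanically. The sketch below assumes Python's re module, whose repetition operators follow the same rules; the names table and row_holds are hypothetical. For each pattern it lists the subject strings expected to match and those expected not to match.

```python
import re

# Each pattern maps to (subjects that should match, subjects that should not).
table = {
    r"123x*567": (["123567", "123x567", "123xx567", "123xxx567"],
                  ["1234567"]),
    r"123x?567": (["123567", "123x567"],
                  ["123xx567", "123xxx567"]),
    r"123x+567": (["123x567", "123xx567", "123xxx567"],
                  ["123567"]),
    r"123x{3}567": (["123xxx567"],
                    ["123567", "123x567", "123xx567", "123xxxx567"]),
    r"123x{2,}567": (["123xx567", "123xxx567", "123xxxx567", "123xxxxx567"],
                     ["123567", "123x567"]),
    r"123x{2,4}567": (["123xx567", "123xxx567", "123xxxx567"],
                      ["123567", "123x567", "123xxxxx567"]),
}


def row_holds(pattern, should_match, should_not_match):
    """Check one row: every expected match is found, every non-match is not."""
    hits = all(re.search(pattern, s) for s in should_match)
    misses = not any(re.search(pattern, s) for s in should_not_match)
    return hits and misses


all_rows_hold = all(row_holds(p, yes, no) for p, (yes, no) in table.items())
print("all rows hold:", all_rows_hold)
```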
Operator    Is Equivalent To
*           {0,}
+           {1,}
?           {0,1}
<none>      {1,1}
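These equivalences can be spot-checked by confirming that each operator and its brace form give the same result on a few subject strings. This is a sketch assuming Python's re module; the names equivalents, subjects, and outcomes are hypothetical, and the subject strings are borrowed from the repetition examples.

```python
import re

# Operator -> brace form. The empty string stands for <none>.
equivalents = {"*": "{0,}", "+": "{1,}", "?": "{0,1}", "": "{1,1}"}
subjects = ["123567", "123x567", "123xx567", "123xxx567"]


def outcomes(quantifier):
    """Match/no-match result for 123x<quantifier>567 against every subject."""
    pattern = re.compile("123x" + quantifier + "567")
    return [pattern.search(s) is not None for s in subjects]


same = {
    (op or "<none>"): outcomes(op) == outcomes(braces)
    for op, braces in equivalents.items()
}
print(same)
```

Agreement on a handful of strings is not a proof, but it illustrates that the shorthand operators are abbreviations of the brace forms.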
Quoting Special Characters
Regular Expression    Will Match     Will Not Match
abc(def)ghi           abcdefghi      abc(def)ghi
abc\(def\)ghi         abc(def)ghi    abcdefghi
abcd\*efg             abcd*efg       abcefg
                                     abcdefg
                                     abcdddddefg
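The effect of the backslash can be seen by running the patterns above against their subject strings. This sketch assumes Python's re module; contains_match is a hypothetical helper. It also shows re.escape, a Python standard-library function that adds the backslashes to a literal string automatically.

```python
import re


def contains_match(pattern: str, subject: str) -> bool:
    """Return True if the pattern matches any part of the subject string."""
    return re.search(pattern, subject) is not None


# Unescaped parentheses form a group and match no literal characters.
grouped_hits_plain = contains_match(r"abc(def)ghi", "abcdefghi")
grouped_hits_parens = contains_match(r"abc(def)ghi", "abc(def)ghi")

# Escaped parentheses match literal ( and ) characters.
escaped_hits_parens = contains_match(r"abc\(def\)ghi", "abc(def)ghi")
escaped_hits_plain = contains_match(r"abc\(def\)ghi", "abcdefghi")

# re.escape quotes the special characters in a literal string.
quoted = re.escape("abcd*efg")
quoted_hits_literal = contains_match(quoted, "abcd*efg")
quoted_hits_repeat = contains_match(quoted, "abcdddddefg")

print(grouped_hits_plain, grouped_hits_parens)
print(escaped_hits_parens, escaped_hits_plain)
print(quoted, quoted_hits_literal, quoted_hits_repeat)
```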
Boundary Matchers
By default, a regular expression pattern matches a subject string if it matches any part of
the subject string. For example, the regular expression:
t[hij]e
will match:
the
as well as:
We ate at the store.
because the word the is contained in that last string. It does not matter whether the subject
string is a perfect match, only that it contains a match.
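The difference between containing a match and being a perfect match can be sketched as follows, assuming Python's re module. The second pattern adds the beginning-of-line (^) and end-of-line ($) operators to constrain the same expression to the whole subject string; the variable names are hypothetical.

```python
import re

unanchored = re.compile(r"t[hij]e")
anchored = re.compile(r"^t[hij]e$")

subjects = ["the", "We ate at the store."]

# Without anchors, a match anywhere in the subject string is enough.
unanchored_results = [unanchored.search(s) is not None for s in subjects]

# With ^ and $, the entire subject string must match the pattern.
anchored_results = [anchored.search(s) is not None for s in subjects]

print("unanchored:", unanchored_results)
print("anchored:  ", anchored_results)
```

The unanchored pattern matches both subject strings, while the anchored pattern matches only the subject string that consists of the word alone.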
USEFUL EXAMPLES
The following are some useful examples of regular expressions:
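Regular Expression    Use
[0-9]*                Match zero or more digits. Because zero digits is allowed and the
                      pattern is not anchored, this matches any string.
^[0-9]*$              Match any string that contains only digits, including an empty string.
^[0-9]+$              Match any string that contains only a number, and is not empty.
[Tt]he                Match any string that contains the or The (including inside a longer
                      word, such as other).
[Tt]he[^\.]*\.        Match an entire sentence that starts with the word The, and ends with
                      a period (but does not contain a period anywhere else in it). Notice that
                      the period is escaped with a \, so it is not interpreted as the "any
                      character" operator in either location in the regular expression.
0x[0-9a-fA-F]+        A hexadecimal number with a leading 0x (such as 0x1F).
\+[0-9]+              An integer with a leading plus sign (such as +37). Notice the escaped
                      + sign.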
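A few of these expressions are exercised below as a sketch, assuming Python's re module. The subject strings are hypothetical and chosen only to illustrate each case.

```python
import re

# ^[0-9]+$ accepts only non-empty strings made up entirely of digits.
digits_only = re.compile(r"^[0-9]+$")
digit_checks = [digits_only.search(s) is not None for s in ["12345", "", "12a45"]]

# ^[0-9]*$ also accepts the empty string.
empty_ok = re.search(r"^[0-9]*$", "") is not None

# [Tt]he[^\.]*\. extracts one sentence starting at the or The.
sentence = re.search(r"[Tt]he[^\.]*\.", "Yes. The cat sat. It left.").group(0)

# 0x[0-9a-fA-F]+ finds a hexadecimal number inside other text.
hex_number = re.search(r"0x[0-9a-fA-F]+", "value 0x1F here").group(0)

# \+[0-9]+ requires a literal plus sign before the digits.
signed = re.search(r"\+[0-9]+", "offset +37 units").group(0)

print(digit_checks, empty_ok)
print(sentence)
print(hex_number, signed)
```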
ADVANCED REGEX
The following section details more advanced regular expression subjects.
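Sequence     Meaning
\t           A tab character
\n           A newline character
\r           A carriage return character
\d           A digit; equivalent to [0-9]
\D           A non-digit; equivalent to [^0-9]
\s           A whitespace character (space, tab, newline, carriage return, form feed,
             or vertical tab)
\S           A non-whitespace character; equivalent to [^\s]
\w           A word character; equivalent to [a-zA-Z_0-9]
\W           A non-word character; equivalent to [^\w]
\0n          The character with octal value 0n
\0nn         The character with octal value 0nn
\0nnn        The character with octal value 0nnn
\xhh         The character with hexadecimal value 0xhh
\\           A backslash character
\e           The escape character
\cx          The control character corresponding to x
\p{Lower}    Equivalent to [a-z]
\p{Upper}    Equivalent to [A-Z]
\p{Digit}    Equivalent to [0-9]
\p{Alpha}    Equivalent to [a-zA-Z]
\p{Alnum}    Equivalent to [a-zA-Z0-9]
\p{Punct}    A punctuation character
\p{Print}    A printable character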
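Several of these sequences can be tried out as follows. This is a sketch assuming Python's re module, which differs from the syntax above in two ways: the re.ASCII flag is needed to limit \d, \w, and \s to the ASCII ranges shown, and the \p{...} classes are not supported, so the explicit equivalent [a-z] is used in place of \p{Lower}. The subject strings are hypothetical.

```python
import re

# \d matches digits; re.ASCII limits it to 0-9.
numbers = re.findall(r"\d+", "port 8080 vlan 12", re.ASCII)

# \w matches letters, digits, and the underscore.
words = re.findall(r"\w+", "eth0_up now!", re.ASCII)

# \s matches whitespace, including tabs and newlines.
fields = re.split(r"\s+", "a \t b\nc")

# \D matches anything that is not a digit; here it is stripped out.
digits_kept = re.sub(r"\D", "", "555-1234", flags=re.ASCII)

# [a-z] stands in for \p{Lower}, which Python's re module does not accept.
lower_runs = re.findall(r"[a-z]+", "abc DEF ghi")

print(numbers, words, fields, digits_kept, lower_runs)
```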
Pipe
\r
Carriage Return
\S
\G
Note:
Examples of Regular Expressions
Expression     Meaning
               Matches anything.
^.*
.*\t
[^\t]*t
[misy]*th
[misy]*th?     Will match all words listed above, but also words such as mist and sit.
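The difference made by the trailing ? in the last two expressions can be sketched as follows, assuming Python's re module. The words myth and smith are hypothetical examples of words that [misy]*th matches; mist and sit are the words named above. Each word is tested as a whole, using re.fullmatch.

```python
import re

words = ["myth", "smith", "mist", "sit"]

# [misy]*th requires the word to end in th.
with_th = [w for w in words if re.fullmatch(r"[misy]*th", w)]

# [misy]*th? makes the final h optional, so words ending in t also match.
with_optional_h = [w for w in words if re.fullmatch(r"[misy]*th?", w)]

print("[misy]*th :", with_th)
print("[misy]*th?:", with_optional_h)
```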
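Union, Intersection, and Subtraction in Character Classes
Regular Expression     Meaning
[a-z&&[^aeiou]]        Any lowercase letter except a vowel (a, e, i, o, or u)
[\p{Punct}&&[^:;]]     Any punctuation character except a colon or semicolon
[\s&&[^\t]]            Any whitespace character except a tab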
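Python's re module does not accept the && syntax, so the following sketch swaps in a different technique to show the same subtraction: a negative lookahead, (?!...), placed in front of an ordinary character class. The variable names and subject strings are hypothetical, and string.punctuation is used as a stand-in for \p{Punct} on the assumption that both cover the ASCII punctuation characters.

```python
import re
import string

# Stand-in for [a-z&&[^aeiou]]: a lowercase letter that is not a vowel.
consonant = re.compile(r"(?![aeiou])[a-z]")

# Stand-in for [\p{Punct}&&[^:;]]: punctuation other than : and ;
punct_class = "[" + re.escape(string.punctuation) + "]"
punct_no_colons = re.compile(r"(?![:;])" + punct_class)

# Stand-in for [\s&&[^\t]]: whitespace other than a tab.
space_no_tab = re.compile(r"(?!\t)\s")

consonants = consonant.findall("regex")
punctuation = punct_no_colons.findall("a:b;c,d.")
spaces = space_no_tab.findall(" \t\n")

print(consonants, punctuation, spaces)
```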
REFERENCED GROUPS
Grouping with the ( ) operators has previously been detailed. However, multiple and nested
groups are also allowed. For example, the following is a legal regular expression:
a*(b+(c*)([de]*(f))f+g)
In this example, there are four nested groups. Groups are sometimes called referenced
groups, and each is assigned a number that corresponds to the part of the string that matches
the grouped portion of the regular expression.
By convention, group 0 is the entire regular expression, group 1 is the regular expression
contained in the first group, group 2 is the regular expression contained in the second group,
etc.
Nested groups are numbered by reading left-to-right, and sequentially assigning integers 1
through n based on the order in which the left parentheses are found. In the example above,
the following group numbers are assigned to the portions of the regular expression:
Group Number    Regular Expression
0               a*(b+(c*)([de]*(f))f+g)
1               (b+(c*)([de]*(f))f+g)
2               (c*)
3               ([de]*(f))
4               (f)
When a grouped regular expression is found to match a subject string, the groups are used to
extract the matched portion of the subject string. For example, apply the above regular
expression to the following subject string:
xxxaaabcccdeedeedffffghhhh
The above regular expression does indeed match this subject string (at least a contained
portion of it). The groups assigned to the regular expression can then be used to extract the
matched portions of the subject string.
For example, group 0 contains the portion of the subject string that matches the entire regular
expression. In this example, that portion is:
aaabcccdeedeedffffg
since the leading xxx and trailing hhhh do not match the regular expression (and the regular
expression does not contain the beginning-of-line ^ or end-of-line $ operators). These
sequences are not contained in group 0.
Group Number    Regular Expression          Segment
Group 0         a*(b+(c*)([de]*(f))f+g)     aaabcccdeedeedffffg
Group 1         (b+(c*)([de]*(f))f+g)       bcccdeedeedffffg
Group 2         (c*)                        ccc
Group 3         ([de]*(f))                  deedeedf
Group 4         (f)                         f
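The group numbering and the extracted segments can be reproduced with a short script. This is a sketch assuming Python's re module, which numbers groups by the position of the left parenthesis in the same way; the variable names are hypothetical.

```python
import re

pattern = re.compile(r"a*(b+(c*)([de]*(f))f+g)")
subject = "xxxaaabcccdeedeedffffghhhh"

# search finds the first portion of the subject string that matches.
match = pattern.search(subject)

# group(0) is the whole match; group(1) through group(4) are the nested groups.
segments = [match.group(n) for n in range(5)]
for number, segment in enumerate(segments):
    print(f"Group {number}: {segment}")
```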
This numbered grouping can be quite useful in extracting information from subject strings. For
example, the following regular expression:
total is ([0-9]*)
when applied to the following subject string:
The grand total is 324, but the partial amount is only 15, ok?
will match the subject string. After the match, group 0 will contain total is 324, and group 1
will contain 324. Notice that this regular expression made it possible to extract a useful
number from a complex subject string that contained a great deal of unwanted information.
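The same extraction, sketched with Python's re module (an assumption; the variable names are hypothetical):

```python
import re

subject = "The grand total is 324, but the partial amount is only 15, ok?"

match = re.search(r"total is ([0-9]*)", subject)

whole_match = match.group(0)   # the text matched by the entire expression
total_text = match.group(1)    # only the digits captured by the group
total = int(total_text)        # convert the captured text to a number

print(whole_match)
print(total)
```

The number 15 later in the subject string is ignored, because it is not preceded by the text total is.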
Referenced groups are used most often in regular expression search/replace capabilities, and
similar actions. They are used extensively in DASL Device drivers for extracting information
from captured device text. In the first case, referenced groups can be used in the replacement
text. In the latter case, referenced groups are used to extract information from device
output.
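As a sketch of the search/replace case, the following uses re.sub from Python's re module, where \1 in the replacement text refers to group 1. The replacement syntax used by VoyenceControl itself may differ; the replacement text sum= is a hypothetical choice made for illustration.

```python
import re

subject = "The grand total is 324, but the partial amount is only 15, ok?"

# Replace the matched text, reusing the digits captured by group 1.
replaced = re.sub(r"total is ([0-9]*)", r"sum=\1", subject)

print(replaced)
```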
INDEX
A
Accessing Publications Online, 4
Advanced RegEx, 14
advanced regular expression subjects, 14
Advanced regular expressions, 5
B
Basic RegEx Information, 6
Basic regular expressions, 5
beginning-of-line (^) operator, 12
beginning-of-line and end-of-line operators, 12
Boundary Matchers, 11
C
character classes, 7
Character Classes, 7, 14
combination of characters, 7
complex filtering, 5
Configuration Audit filters, 5
constrain the regular expression, 12
copyright, 2
Create filters, 5
D
DASL Device Drivers, 5
E
entire character class, 7
Examples of Regular Expressions, 15
extracting information, 18
F
filtering the table view, 5
G
group numbers, 17
group of characters, 8
group portions, 10
I
Intersection, 16
M
multiple and nested groups, 17
N
Nested groups, 17
O
Optional and Grouping, 10
optional operator, 10
P
pattern, 6
pattern characters, 6
pattern does not match, 6
pattern matches, 6
Pattern Repetition, 9
pattern repetition operator, 9
pattern string, 6
perfect match, 6
pre-defined character classes, 16
Predefined Characters, 14
Preface, 4
previous pattern character, 9
Q
Quoting Special Characters, 11
R
range of characters, 7
referenced groups, 17
Referenced groups, 18
Referenced Groups, 17
RegEx, 5
regex pattern, 10
regular expressions, 13
Regular Expressions, 5
Related Publications, 4
S
searching capabilities, 5
sequences of characters, 10
Simple Patterns, 6
special character sequences, 14
subject string, 6, 18
Subtraction, 16
Subtraction rules, 16
T
table of contents, 3
target text, 5
U
Union, 16
Union, Intersection, and Subtraction in Character Classes, 16
Useful Examples, 13
Using Regular Expressions, 5
V
variable content, 6
variable length, 6
VoyenceControl and RegEx, 5
W
Who Should Read this Document, 4