
EMC

VoyenceControl

Version 4.0.1

Using Regular
Expressions
P/N 300-007-472
REV A01

EMC Corporation
Corporate Headquarters
Hopkinton, MA 01748-9103
1-508-435-1000
www.EMC.com

Using Regular Expressions

COPYRIGHT
Copyright 2008 EMC Corporation. All rights reserved.

Published August 2008


EMC believes the information in this publication is accurate as of its publication date. The
information is subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED "AS IS." EMC CORPORATION MAKES
NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION
IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an
applicable software license.
For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on
EMC.com.
All other trademarks used herein are the property of their respective owners.

Using Regular Expressions (RegEx)

Version 4.0.1
Page 2


TABLE OF CONTENTS
COPYRIGHT
PREFACE
    Who Should Read this Document?
    Related Publications
    Accessing Publications Online
VOYENCECONTROL AND REGEX
    Using Regular Expressions (RegEx) in VoyenceControl
BASIC REGEX INFORMATION
    Simple Patterns
    Character Classes
    Pattern Repetition
    Optional and Grouping
    Quoting Special Characters
    Boundary Matchers
USEFUL EXAMPLES
ADVANCED REGEX
    Predefined Characters and Character Classes
        Examples of Regular Expressions
    Union, Intersection, and Subtraction in Character Classes
REFERENCED GROUPS
WHERE TO GET HELP
    Where to get help


PREFACE
Who Should Read this Document?
This guide is intended for Network Engineers, System Administrators, and those individuals
needing conceptual knowledge of the features and functions included in this release of
VoyenceControl. Those individuals who will be installing VoyenceControl can also benefit from
this document.

Related Publications
This section lists publications related to this document that should also be reviewed.

> Installing VoyenceControl on Red Hat Enterprise Linux 4.0, P/N 300-007-465
> Installing VoyenceControl on Windows Server 2003, P/N 300-007-466
> Installing VoyenceControl on Solaris 10, P/N 300-007-467

Accessing Publications Online


The publications for this product are available online in Portable Document Format (PDF). To
locate product publications in the VoyenceControl Reference Library:
1. Click the Help option on the VoyenceControl menu bar.
2. Select Help Contents. The Online Users Guide will open.
3. Go to Accessing Help and Additional Documents, and expand that section by clicking
   the book icon.
4. Once the book has opened, select Reference Library. The related publications mentioned
   above are accessible in PDF format. These documents can now be viewed online, saved to
   a defined location, or printed.


VOYENCECONTROL AND REGEX


Using Regular Expressions (RegEx) in VoyenceControl
Regular Expressions (RegEx) are used in several places in VoyenceControl. In particular, they
are used to do the following:

> Set up Configuration Audit filters
> Create filters for table views within the product (using the RegEx filter operator)
> Write device-specific communications protocols in DASL Device Drivers

A regular expression defines a set of matching criteria that determines whether a given piece
of target text matches.
For example, the regular expression test.*again, when applied against a series of target
texts, such as:

today it again works
today it tests better
today it tests better again
again, it tests today
today it tests again today

matches some of the lines, and does not match other lines. In particular, the specified regular
expression would match the following lines from the above example:

today it tests better again
today it tests again today
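
The filtering described above can be sketched as follows. This is a minimal illustration that assumes Python's re module as a stand-in, since this guide does not name the engine VoyenceControl uses. The pattern is written as test.*again (the literal text test, any run of characters, then again), and the variable names are hypothetical.

```python
import re

# Target lines from the example above.
lines = [
    "today it again works",
    "today it tests better",
    "today it tests better again",
    "again, it tests today",
    "today it tests again today",
]

# "test", then any run of characters, then "again".
pattern = re.compile(r"test.*again")

# re.search succeeds when the pattern matches anywhere in the line.
matching = [line for line in lines if pattern.search(line)]

for line in matching:
    print(line)
```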

Regular expressions are an easy and efficient way of filtering or matching text in a variety of
ways:

> For Configuration Audit filters, regular expressions are used to specify valid and
  expected text within a configuration file, to determine if a device's configuration is in
  compliance.
> For table views, regular expressions allow filtering the table view to show only rows
  that have fields matching specific regular expressions.
> For device drivers, regular expressions allow for capturing text from a device, and
  determining if specific results are obtained, and/or scraping useful data from the
  captured text.

Basic regular expressions are easy to understand and create. Advanced regular expressions
are extremely powerful and provide complex filtering and searching capabilities.


BASIC REGEX INFORMATION


A regular expression is a string, called a pattern, that is matched against one or more
subject strings. If a pattern matches any contiguous portion of a subject string, the subject
string is said to have a match. If a pattern does not match any portion of the subject string,
the subject string is said to not match.
A pattern is said to have a perfect match if the pattern matches the entire subject string.
A pattern string contains a series of normal characters, which must match consecutive
characters in the subject string to generate a match, along with special pattern characters.
These pattern characters provide a way of matching variable-length and variable-content
portions of the subject string. A pattern string can contain zero or more special pattern
characters, and zero or more sequences of normal characters.
The remainder of this document describes the allowed patterns used within a pattern string.

Simple Patterns
A period (.) character in the pattern string will match any single character within the
subject string. For example, the pattern:
123.567
will match the subject string:
1234567
along with the subject string:
123J567
or:
1238567
but will not match strings such as:
1228555
The asterisk (*) character is the character repeat operator. This operator says that zero or
more copies of the previous character may exist in a subject string. For example, the following
pattern:
1234*567
will match the following subject strings:
123567
1234567
12344567
12344444444567
When used with a period, the asterisk can match large sections of text within the middle of a
subject string. For example, the pattern:
123.*567
will match:
123567
1234567
123This is a test of patterns567
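
The period and asterisk examples above can be checked with a short sketch. It assumes Python's re module as a stand-in for the product's engine, and uses re.fullmatch so that each subject string must match in its entirety. The dictionary names are illustrative only.

```python
import re

# A period matches any single character.
dot = re.compile(r"123.567")
dot_results = {s: bool(dot.fullmatch(s))
               for s in ["1234567", "123J567", "1238567", "1228555"]}

# An asterisk allows zero or more copies of the preceding character.
star = re.compile(r"1234*567")
star_results = {s: bool(star.fullmatch(s))
                for s in ["123567", "1234567", "12344567", "12344444444567"]}

# A period followed by an asterisk matches any run of characters, including none.
dot_star = re.compile(r"123.*567")
dot_star_results = {s: bool(dot_star.fullmatch(s))
                    for s in ["123567", "1234567", "123This is a test of patterns567"]}

print(dot_results)
print(star_results)
print(dot_star_results)
```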



Character Classes
A period (.) is useful for matching any character. To match only one of a few specific
characters, use a character class. Characters between a pair of square brackets are
considered a character class. Any one of the characters in the class can match a single
character in the subject string. For example:
[abc]
will match either the letter a, b, or c. When used in a pattern, as an example:
123[abc]567
will match:
123a567
123b567
123c567
but will not match:
123d567
1234567
123567
Any combination of characters can be used within the square brackets, in any order. For
example:
[1aj35d]
Also, if a range of characters is desired, the beginning and ending character in the range can
be used, separated by a dash (-). For example, the following pattern:
[a-e2-6]
will match any of the following characters: a, b, c, d, e, 2, 3, 4, 5 or 6.
Finally, the entire character class can be used as the subject of an asterisk (*) operator. For
example, the following regular expression:
123[abc]*567
will match any of the following strings:
123567
123a567
123ab567
123ba567
123abcaacbbbaac567
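
A minimal sketch of the character class examples above, assuming Python's re module as a stand-in for the product's engine. The helper whole_match is a hypothetical name introduced here for illustration.

```python
import re

def whole_match(pattern, subject):
    """Return True if the pattern matches the entire subject string."""
    return re.fullmatch(pattern, subject) is not None

# A class matches exactly one of the listed characters.
single = {s: whole_match(r"123[abc]567", s)
          for s in ["123a567", "123b567", "123c567", "123d567", "1234567", "123567"]}

# Ranges: [a-e2-6] accepts a through e and 2 through 6.
in_range = [c for c in "abcdefg1234567" if whole_match(r"[a-e2-6]", c)]

# A class followed by * may repeat zero or more times, mixing members freely.
repeated = all(whole_match(r"123[abc]*567", s)
               for s in ["123567", "123a567", "123ab567", "123ba567",
                         "123abcaacbbbaac567"])

print(single)
print(in_range)
print(repeated)
```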



You can create a character class that contains all characters except a specific list by using
the caret (^) operator as the first character inside the square brackets. For example:
[^abc]
will match any character except the letters a, b, or c. When used in a pattern, as an example:
123[^abc]*567
will match any of the following strings:
123567
1234567
1239567
123x567
123xxxx567
1233848383838567
but will not match:
123a567
123xxbxxx567
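
The negated class example can be sketched the same way. This assumes Python's re module as a stand-in. re.search is used here, so the rejected strings are shown to contain no match anywhere, not merely to fail as whole strings.

```python
import re

# [^abc] accepts any character except a, b, or c.
negated = re.compile(r"123[^abc]*567")

accepted = ["123567", "1234567", "1239567", "123x567", "123xxxx567",
            "1233848383838567"]
rejected = ["123a567", "123xxbxxx567"]

accepted_ok = all(negated.search(s) for s in accepted)
rejected_ok = not any(negated.search(s) for s in rejected)

print(accepted_ok, rejected_ok)
```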



Pattern Repetition
The asterisk (*) is called a pattern repetition operator. It describes how a previous pattern
character (or character class) can be repeated. In the case of the asterisk, the previous
pattern can be repeated zero or more times.
There are other pattern repetition operators available, each with a unique purpose, and each,
like the asterisk, operates on the previous pattern character:

Operator    Purpose
*           Repeat 0 or more times
?           Repeat 0 or 1 times
+           Repeat 1 or more times
{n}         Repeat exactly n times
{n,}        Repeat n or more times
{n,m}       Repeat at least n times, but no more than m times

The following are some examples of their usage:

Pattern         Matches         Does Not Match

123x*567        123567          1234567
                123x567
                123xx567
                123xxx567

123x?567        123567          123xx567
                123x567         123xxx567

123x+567        123x567         123567
                123xx567
                123xxx567

123x{3}567      123xxx567       123567
                                123x567
                                123xx567
                                123xxxx567

123x{2,}567     123xx567        123567
                123xxx567       123x567
                123xxxx567
                123xxxxx567

123x{2,4}567    123xx567        123567
                123xxx567       123x567
                123xxxx567      123xxxxx567
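
The repetition examples in the table above can be verified with a table-driven sketch. It assumes Python's re module as a stand-in for the product's engine. The names cases and check are hypothetical.

```python
import re

# (pattern, strings that should match, strings that should not match)
cases = [
    (r"123x*567", ["123567", "123x567", "123xx567", "123xxx567"], ["1234567"]),
    (r"123x?567", ["123567", "123x567"], ["123xx567", "123xxx567"]),
    (r"123x+567", ["123x567", "123xx567", "123xxx567"], ["123567"]),
    (r"123x{3}567", ["123xxx567"],
     ["123567", "123x567", "123xx567", "123xxxx567"]),
    (r"123x{2,}567", ["123xx567", "123xxx567", "123xxxx567", "123xxxxx567"],
     ["123567", "123x567"]),
    (r"123x{2,4}567", ["123xx567", "123xxx567", "123xxxx567"],
     ["123567", "123x567", "123xxxxx567"]),
]

def check(pattern, should_match, should_not_match):
    """True when every expected match succeeds and every expected miss fails."""
    compiled = re.compile(pattern)
    return (all(compiled.fullmatch(s) for s in should_match)
            and not any(compiled.fullmatch(s) for s in should_not_match))

results = {pattern: check(pattern, yes, no) for pattern, yes, no in cases}

for pattern, ok in results.items():
    print(pattern, ok)
```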



Notice that the following operators are equivalent:

Operator    Is Equivalent To
*           {0,}
+           {1,}
?           {0,1}
<none>      {1,1}
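
As a rough check of these equivalences, the sketch below compares each shorthand operator with its brace form over a few sample strings. It assumes Python's re module as a stand-in, and the helper name accepted is hypothetical.

```python
import re

samples = ["123567", "123x567", "123xx567", "123xxx567"]

# Each shorthand operator paired with its brace form.
equivalents = [("x*", "x{0,}"), ("x+", "x{1,}"), ("x?", "x{0,1}"), ("x", "x{1,1}")]

def accepted(repeat):
    """List the samples that match 123<repeat>567 in their entirety."""
    pattern = re.compile("123" + repeat + "567")
    return [s for s in samples if pattern.fullmatch(s)]

# True for a pair when both forms accept exactly the same samples.
agree = {short: accepted(short) == accepted(braced) for short, braced in equivalents}

print(agree)
```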

Optional and Grouping


To match either of two different sequences of characters, use the OR operator (|). For
example:
abc|def
will match the subject string abc or the subject string def. The OR operator may be mixed with
all of the other operators above. For example:
ab*c|d.f
will match:
ac
abc
abbbbc
def
dgf
dhf
but is not a perfect match for:
acc
df
deef
To limit the scope of the OR operator, parentheses ( ) can be used to group portions of an
expression. The parentheses themselves do not appear in the subject string, but are used to
group portions of a RegEx pattern together. For example, the following regular expression:
abc(d*|efg|hi+j)abc
will match:
abcabc
abcdabc
abcddddabc
abcefgabc
abchijabc
abchiiiijabc
but will not match:
abc(d)abc
abcefghijabc
abcdefghijabc
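
The OR and grouping examples can be sketched as below, assuming Python's re module as a stand-in for the product's engine. Whole-string matching (re.fullmatch) is used for the first pattern, because a string such as acc still contains the shorter match ac.

```python
import re

# Alternation between two sub-patterns.
alt = re.compile(r"ab*c|d.f")
alt_yes = ["ac", "abc", "abbbbc", "def", "dgf", "dhf"]
alt_no = ["acc", "df", "deef"]

# Parentheses limit the alternation to the middle of the pattern.
grouped = re.compile(r"abc(d*|efg|hi+j)abc")
grouped_yes = ["abcabc", "abcdabc", "abcddddabc", "abcefgabc",
               "abchijabc", "abchiiiijabc"]
grouped_no = ["abc(d)abc", "abcefghijabc", "abcdefghijabc"]

print([bool(alt.fullmatch(s)) for s in alt_yes + alt_no])
print([bool(grouped.fullmatch(s)) for s in grouped_yes + grouped_no])

# "acc" is not a whole-string match, but it does contain the match "ac".
print(alt.search("acc").group(0))
```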



Quoting Special Characters
To match a pattern that contains special characters, such as (, ), |, [, etc., you can quote
them in the regular expression by preceding each one with a backslash character (\).

Regular Expression    Will Match      Will Not Match

abc(def)ghi           abcdefghi       abc(def)ghi

abc\(def\)ghi         abc(def)ghi     abcdefghi

abcd\*efg             abcd*efg        abcefg
                                      abcdefg
                                      abcdddddefg
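
The quoting rules in the table can be tried out with the sketch below, which assumes Python's re module as a stand-in. re.escape is a Python convenience for quoting a literal string; whether VoyenceControl offers anything similar is not stated in this guide.

```python
import re

# Unescaped parentheses only group; they never match a literal "(" or ")".
grouping = re.compile(r"abc(def)ghi")

# Backslashes turn the parentheses and the asterisk into ordinary characters.
literal_parens = re.compile(r"abc\(def\)ghi")
literal_star = re.compile(r"abcd\*efg")

# re.escape quotes the special characters in a literal string for you.
quoted = re.escape("abc(def)ghi")

print(bool(grouping.fullmatch("abcdefghi")), bool(grouping.fullmatch("abc(def)ghi")))
print(bool(literal_parens.fullmatch("abc(def)ghi")))
print(bool(literal_star.fullmatch("abcd*efg")), bool(literal_star.fullmatch("abcdddddefg")))
print(bool(re.fullmatch(quoted, "abc(def)ghi")))
```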

Boundary Matchers
By default, a regular expression pattern matches a subject string if it matches any part of
the subject string. For example, the regular expression:
t[hij]e
will match:
the
as well as:
We ate at the store.
as the word the is contained in that last string. It does not matter whether the subject
string is a perfect match, only that it contains a match.



There are two operators that can be used to constrain this matching further. The first is the
beginning-of-line (^) operator. This operator constrains the regular expression matching, to
match the beginning of the subject string only. For example, the following regular expression:
^the
will match:
the
the fire is out
but will not match:
when will the fire be out
because the word the is not at the beginning of the string. Further, the end-of-line ($)
operator is used to constrain the regular expression to match the end of the subject string
only. For example, the following regular expression:
[0-9]+$
will match:
123456
This is a number: 1234
but will not match:
The number 1234 is a whole number
because the number is not at the end of the string.
Finally, using both the beginning-of-line and end-of-line operators will force the regular
expression to match the entire subject string, or not at all. For example:
^the$
Will match:
the
But not:
This is the time
while the regular expression the without the ^ or $ will match both.
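
A minimal sketch of the boundary matchers, assuming Python's re module as a stand-in for the product's engine. The end-of-line example is written with + (one or more digits), because a pattern that allows zero digits would also match the empty text at the end of any string.

```python
import re

subjects = ["the", "the fire is out", "when will the fire be out", "This is the time"]

# No anchors: a match anywhere in the subject string is enough.
contains = [s for s in subjects if re.search(r"the", s)]

# ^ constrains the match to the beginning of the subject string.
starts = [s for s in subjects if re.search(r"^the", s)]

# ^ and $ together force the whole subject string to match.
exact = [s for s in subjects if re.search(r"^the$", s)]

# $ constrains the match to the end of the subject string.
numbers = ["123456", "This is a number: 1234", "The number 1234 is a whole number"]
numeric_tail = [s for s in numbers if re.search(r"[0-9]+$", s)]

print(contains)
print(starts)
print(exact)
print(numeric_tail)
```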


USEFUL EXAMPLES
The following are some useful examples of regular expressions:

Regular Expression    Use

[0-9]+                Match any string that contains a number.

^[0-9]*$              Match any string that contains only a number, or is an empty string.

^[0-9]+$              Match any string that contains only a number, and is not empty.

[Tt]he                Match the word the, even if capitalized, such as at the beginning of a
                      sentence.

[Tt]he[^\.]*\.        Match an entire sentence that starts with the word The, and ends with
                      a period (but does not contain a period anywhere else in it). Notice that
                      the period is escaped with a \, so it is not interpreted as the
                      any-character operator in either location in the regular expression.

0x[0-9a-fA-F]+        Match a hexadecimal number of the form 0x13a4.

\+[0-9]+              Match an integer with a leading plus sign (such as +37). Notice the
                      escaped + sign.
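
Several of these examples can be exercised together. The sketch below assumes Python's re module as a stand-in for the product's engine; the sample text and variable names are made up for illustration.

```python
import re

# Hypothetical sample text.
text = "The value 0x13a4 was read. Next, +37 was added."

hex_numbers = re.findall(r"0x[0-9a-fA-F]+", text)
plus_integers = re.findall(r"\+[0-9]+", text)
sentences = re.findall(r"[Tt]he[^\.]*\.", text)

# The anchored patterns differ only in whether an empty string is accepted.
candidates = ["", "123", "12a"]
digits_or_empty = [s for s in candidates if re.search(r"^[0-9]*$", s)]
digits_only = [s for s in candidates if re.search(r"^[0-9]+$", s)]

print(hex_numbers, plus_integers, sentences)
print(digits_or_empty, digits_only)
```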


ADVANCED REGEX
The following section details more advanced regular expression subjects.

Predefined Characters and Character Classes


There are a number of special character sequences that have special meaning in regular
expressions, as shown in the following list:

Sequence     Meaning

.            Matches a single character of any value
\t           A tab character
\n           A newline character
\r           A carriage return character
\d           A digit (same as [0-9])
\D           A non-digit (same as [^0-9])
\s           A white-space character (space, tab, newline, formfeed, or carriage return)
\S           A non-white-space character (same as [^\s])
\w           A character used in a word (same as [a-zA-Z_0-9]; notice that it contains an
             underscore character)
\W           Not a character used in a word (not one of the characters above)
\0n          A character with octal value specified by n. For example, \012 is a new-line
\0nn         character, ASCII(10)
\0nnn
\xhh         A character with hex value specified by hh. For example, \x0a is a new-line
             character, ASCII(10)
\\           The backslash (\) character as a constant
\e           The escape character
\cx          The control character corresponding to x. For example, \cA is a Control-A
\p{Lower}    Equivalent to [a-z]
\p{Upper}    Equivalent to [A-Z]
\p{Digit}    Equivalent to [0-9]
\p{Alpha}    Equivalent to [a-zA-Z]
\p{Alnum}    Equivalent to [a-zA-Z0-9]
\p{Punct}    Any punctuation character, such as !@#$%^&*()-_=+[]{}\|;:,./<>?`~
\p{Print}    Printable characters, equivalent to [\p{Alnum}\p{Punct}]
\|           Pipe
\G           The end of the previous match
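
A few of these sequences can be tried with the sketch below. It assumes Python's re module as a stand-in, with the re.ASCII flag so that \d and \w mean [0-9] and [a-zA-Z_0-9] as listed above. Python's re module does not support the \p{...} classes, so they are not shown. The sample line is hypothetical.

```python
import re

# Hypothetical device output: a name, a tab, a state, spaces, a number, a newline.
line = "eth0\tup  1500\n"

digits = re.findall(r"\d+", line, flags=re.ASCII)
words = re.findall(r"\w+", line, flags=re.ASCII)

# \s+ treats any run of spaces, tabs, or newlines as one separator.
fields = re.split(r"\s+", line.strip())

# The escape, hex, and octal forms all name the same new-line character.
newline_ok = all(re.fullmatch(p, "\n") for p in (r"\n", r"\x0a", r"\012"))

print(digits, words, fields, newline_ok)
```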


Note: VoyenceControl uses .+? (period, plus, and question mark) characters to
match criteria for pre-conditions. Do not use .* (period and asterisk) when using
Begins with / Ends with pre-conditions.
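
The general difference between .* and .+? can be sketched as follows. This assumes Python's re module as a stand-in and shows only the usual greedy and reluctant behaviour of the two forms. How VoyenceControl applies them to pre-conditions is not described in this guide, and the sample text is hypothetical.

```python
import re

# Hypothetical text with one "begin" marker and two "end" markers.
text = "begin alpha end beta end"

# .* is greedy: it runs to the last possible "end".
greedy = re.search(r"begin.*end", text).group(0)

# .+? is reluctant: it stops at the first "end" after at least one character.
reluctant = re.search(r"begin.+?end", text).group(0)

print(greedy)
print(reluctant)
```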



Examples of Regular Expressions

Expression     Meaning

.*             Matches anything.

^.*Stuff       Matches anything up to and including the last occurrence of the word Stuff.

.*\t           Matches all characters up to and including the last tab character.

[^\t]*\t       Matches all characters up to and including the first tab character.

[misy]*th      Matches words such as smith, or sith, or myth.

[misy]*th?     Will match all words listed above, but also words such as mist and sit.
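
The first-tab and last-tab patterns, and the word patterns, can be compared with the sketch below. It assumes Python's re module as a stand-in, and the sample row is hypothetical.

```python
import re

# Hypothetical tab-separated row.
row = "name\tvalue\tcomment"

# Greedy .* runs to the last tab in the row.
up_to_last_tab = re.match(r".*\t", row).group(0)

# [^\t]* cannot cross a tab, so the match ends at the first one.
up_to_first_tab = re.match(r"[^\t]*\t", row).group(0)

words = ["smith", "sith", "myth", "mist", "sit"]
with_th = [w for w in words if re.fullmatch(r"[misy]*th", w)]
with_optional_h = [w for w in words if re.fullmatch(r"[misy]*th?", w)]

print(repr(up_to_last_tab), repr(up_to_first_tab))
print(with_th, with_optional_h)
```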



Union, Intersection, and Subtraction in Character Classes
Character classes can be embedded within each other to create more complicated sets. This
embedding can be accomplished using Union, Intersection, or Subtraction rules.
For unions, a square bracket character class contained within another square bracket
character class is equivalent to combining the two character classes. For example:
[a-z[0-9]A-Z]
is equivalent to:
[a-z0-9A-Z]
Notice, however, that this is a very different pattern than:
[a-z][0-9][A-Z]
The latter will only match lower case characters, then numerals, then uppercase letters,
without allowing intertwining of the different character types.
A more interesting case involves intersection. This involves an embedded character class
preceded by the intersection operator (&&). For example:
[a-j&&[d-z]]
is equivalent to the intersection of [a-j] and [d-z]. In this example, it is equivalent to:
[d-j]
The overall usefulness of this is questionable, but there may be cases (when combined with
the predefined character classes) where this can be useful.
A potentially useful example, however, is when the second character class is a negated
class (using the ^ operator). This is called subtraction, and can be demonstrated
by the following example:
[a-z&&[^d-f]]
This says any character, a-z, other than the characters d, e, or f. Or, written as a regular
expression, it is equivalent to:
[a-cg-z]
This can be helpful when used with pre-defined character classes. The following are some
helpful examples of this:

Regular Expression

Meaning

[a-z&&[^aeiou]]

All consonant characters (non-vowels)

[\p{Punct}&&[^:;]]

All punctuation other than colon or semi-colon

[\s&&[^\t]]

All white space, except a tab character
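
Python's re module, used as a stand-in throughout these sketches, has no && operator inside character classes. The sketch below therefore swaps in a different technique, a lookahead placed in front of an ordinary class, to reproduce the same intersection and subtraction results. It illustrates the sets involved and is not the syntax described above.

```python
import re
import string

letters = string.ascii_lowercase

def members(pattern):
    """Return the lowercase letters that the single-character pattern accepts."""
    return "".join(c for c in letters if re.fullmatch(pattern, c))

# Intersection of [a-j] and [d-z]: the character must be in both classes.
intersection = members(r"(?=[d-z])[a-j]")

# Subtraction: in [a-z] but not in [d-f].
subtraction = members(r"(?![d-f])[a-z]")

# Consonants: in [a-z] but not a vowel.
consonants = members(r"(?![aeiou])[a-z]")

print(intersection)
print(subtraction)
print(consonants)
```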


REFERENCED GROUPS
Grouping with the ( ) operators has previously been detailed. However, multiple and nested
groups are also allowed. For example, the following is a legal regular expression:
a*(b+(c*)([de]*(f))f+g)
In this example, there are four nested groups. Groups are sometimes called referenced
groups, and each is assigned a number that corresponds to the part of the string that matches
the grouped portion of the regular expression.
By convention, group 0 is the entire regular expression, group 1 is the regular expression
contained in the first group, group 2 is the regular expression contained in the second group,
etc.
Nested groups are numbered by reading left to right, and sequentially assigning the integers
1 through n in the order in which the left parentheses are found. In the example above, the
following group numbers are assigned to the portions of the regular expression:

Group Number    Regular Expression Segment
0               a*(b+(c*)([de]*(f))f+g)
1               (b+(c*)([de]*(f))f+g)
2               (c*)
3               ([de]*(f))
4               (f)

When a grouped regular expression is found to match a subject string, the groups are used to
extract the matched portion of the subject string. For example, apply the above regular
expression to the following subject string:
xxxaaabcccdeedeedffffghhhh
The above regular expression does indeed match this subject string (at least a contained
portion of it). The groups assigned to the regular expression can then be used to extract the
matched portions of the subject string.
For example, group 0 contains the portion of the subject string that matches the entire regular
expression. In this example, that portion is:
aaabcccdeedeedffffg
since the leading xxx and trailing hhhh do not match the regular expression (and the regular
expression does not contain the beginning-of-line ^ or end-of-line $ operators). These
sequences are not contained in group 0.



Further, the other groups return the portion of the string that matches their regular expression
segment. The following is a table that corresponds to the entire results of matching the
example regular expression with the example subject string:

Group Number    Regular Expression Segment    Matched Subject String
Group 0         a*(b+(c*)([de]*(f))f+g)       aaabcccdeedeedffffg
Group 1         (b+(c*)([de]*(f))f+g)         bcccdeedeedffffg
Group 2         (c*)                          ccc
Group 3         ([de]*(f))                    deedeedf
Group 4         (f)                           f
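
The group numbering and the matched portions can be reproduced with the sketch below, which assumes Python's re module as a stand-in for the product's engine. In Python, match.group(i) returns the text captured by group i, and group 0 is the whole match.

```python
import re

pattern = re.compile(r"a*(b+(c*)([de]*(f))f+g)")
subject = "xxxaaabcccdeedeedffffghhhh"

# search finds the match inside the subject; the xxx and hhhh are left out.
match = pattern.search(subject)

# pattern.groups is the number of parenthesized groups (4 here).
groups = [match.group(i) for i in range(pattern.groups + 1)]

for number, text in enumerate(groups):
    print("Group", number, text)
```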

This numbered grouping can be quite useful in extracting information from subject strings. For
example, the following regular expression:
total is ([0-9]*)
when applied to the following subject string:
The grand total is 324, but the partial amount is only 15, ok?
will match the subject string. After the match, group 0 will contain total is 324, and group 1
will contain just 324. Notice that this regular expression made it possible to extract a
useful number from a complex subject string that contained a great deal of unwanted
information.
Referenced groups are used most often in regular expression search/replace capabilities, and
similar actions. They are used extensively in DASL Device Drivers for stripping information
from captured device text. In the first case, referenced groups can be used in the replace
substitutions. In the latter case, referenced groups are used to extract information from
device output.
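
Both uses of referenced groups can be sketched as follows, assuming Python's re module as a stand-in. The \1 replacement syntax and the sum= label are Python-specific choices made for this illustration; the syntax used in VoyenceControl or DASL may differ.

```python
import re

subject = "The grand total is 324, but the partial amount is only 15, ok?"

# Extraction: group 1 captures only the digits after "total is ".
match = re.search(r"total is ([0-9]*)", subject)
whole = match.group(0)
number = int(match.group(1))

# Replacement: \1 in the replacement text refers back to group 1.
replaced = re.sub(r"total is ([0-9]+)", r"sum=\1", subject)

print(whole)
print(number)
print(replaced)
```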


WHERE TO GET HELP


Where to get help
EMC support, product, and licensing information can be obtained as follows.
Product information: For documentation, release notes, software updates, or for information
about EMC products, licensing, and service, go to the EMC Powerlink website (registration
required) at:
http://Powerlink.EMC.com
Technical support: For technical support, go to EMC Customer Service on Powerlink. To open
a service request through Powerlink, you must have a valid support agreement. Please contact
your EMC sales representative for details about obtaining a valid support agreement or to
answer any questions about your account.
Sales and customer service contacts: For the list of EMC sales locations, please access the
EMC home page at:
http://EMC.com/contact


