
The Character Classes

09/24/08 © 2008 Haim Michael 2



What is a Character Class?


A character class is a set of characters enclosed within square
brackets.
A character class matches a single character of the input string:
the match succeeds when that character is one of the characters in
the set.
Example
The “[abc]” regular expression construct represents a simple class composed of 'a', 'b'
and 'c'. When this construct is part of a regular expression, it matches exactly one
character, which can be 'a', 'b' or 'c'.


Browsing the Pattern API documentation you will find tables that
summarize the available regular expression constructs. Among
these constructs we can find the available character classes.
http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html
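
The point that a character class consumes exactly one character can be checked with a minimal sketch such as the one below. The class name CharClassBasics and its matches helper are hypothetical names chosen for this illustration (they are not part of the slides or of the RXTest application); the only API used is Pattern.matches from java.util.regex.

```java
import java.util.regex.Pattern;

public class CharClassBasics {

    // Hypothetical helper: true when the WHOLE input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[abc]", "a"));  // true: 'a' is in the class
        System.out.println(matches("[abc]", "d"));  // false: 'd' is not in the class
        System.out.println(matches("[abc]", "ab")); // false: the class consumes one character only
    }
}
```

Pattern.matches requires the entire input to match, which is why “ab” is rejected even though both of its characters belong to the class.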


The Available Character Classes


[abc]           a, b, or c (simple)
[^abc]          any character except for a, b, or c (negation)
[a-zA-Z]        a through z or A through Z (range)
[a-f[s-v]]      a through f, or s through v (union)
[a-m&&[f-h]]    f, g or h (intersection)
[a-k&&[^gh]]    a, b, c, d, e, f, i, j or k (subtraction)
[a-k&&[^e-g]]   a, b, c, d, h, i, j or k (subtraction)


Simple Character Classes


The simplest form of a character class is created by placing
a set of characters within square brackets.
Examples
The [fkj] regular expression matches “f”, “k” or “j”.
The [fkj]bc regular expression matches any three-letter string that starts with 'f', 'k'
or 'j' and whose remainder is “bc” (e.g. “fbc”, “jbc” etc.).


Try to run the RXTest application from the previous topic with
various inputs...

Given the “[abc]gh” regular expression and the “bgh” string, we
can say that this string matches this regular expression.

Given the “[xyz]bc” regular expression and the “xbc” string, we
can say that this string matches this regular expression.
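
The two cases above, plus a search for [fkj]bc inside a longer text, can be sketched as follows. This is not the RXTest application; SimpleClassDemo and findAll are hypothetical names used only for this illustration, built on the standard Pattern and Matcher classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleClassDemo {

    // Hypothetical helper: collects every substring of the input that the regex matches.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(Pattern.matches("[abc]gh", "bgh")); // true
        System.out.println(Pattern.matches("[xyz]bc", "xbc")); // true
        // "abc" is skipped because 'a' is not one of f, k, j.
        System.out.println(findAll("[fkj]bc", "fbc abc jbc kbc")); // [fbc, jbc, kbc]
    }
}
```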


The Negation Character Class


The negation character class has the '^' character immediately
after its opening bracket.
The negation character class matches any single character except
the ones listed within the square brackets.
Examples
The [^fkj] regular expression matches any character except for “f”, “k” and “j”.
The [^fkj]bc regular expression matches any three-character string that starts with a
character different from 'f', 'k' and 'j' and whose remainder is “bc” (e.g. “xbc”,
“abc” etc.).


Try to run the RXTest application from the previous topic with
various inputs...

Given the “[^abc]gh” regular expression and the “xgh” string, we
can say that this string matches this regular expression.

Given the “[^xyz]bc” regular expression and the “abc” string, we
can say that this string matches this regular expression.
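
A minimal sketch of the negation cases follows. NegationClassDemo and its matches helper are hypothetical names for this illustration only. Besides the two cases above, it shows that a negated class still has to consume one character, and that this character does not have to be a letter.

```java
import java.util.regex.Pattern;

public class NegationClassDemo {

    // Hypothetical helper: true when the whole input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[^abc]gh", "xgh")); // true
        System.out.println(matches("[^xyz]bc", "abc")); // true
        System.out.println(matches("[^fkj]bc", "fbc")); // false: 'f' is excluded
        System.out.println(matches("[^fkj]bc", "bc"));  // false: no character left for the class
        System.out.println(matches("[^fkj]bc", "7bc")); // true: any other character qualifies
    }
}
```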


The Ranges Character Class


The ranges character class includes a range of characters,
specified using the '-' metacharacter between the first and the
last characters of the range.
It is possible to place several ranges side by side within the same
character class.
Examples
The [a-d] regular expression matches any character in the range a-d.
The [x-zX-Z] regular expression matches any character in the range x-z or X-Z.
The [A-X]abc regular expression matches any four-letter string whose first letter is a
capital letter in the range A-X and whose remaining three letters are “abc”.


Try to run the RXTest application from the previous topic with
various inputs...

Given the “[a-c]gh” regular expression and the “agh” string, we
can say that this string matches this regular expression.

Given the “[x-z]bc” regular expression and the “ybc” string, we
can say that this string matches this regular expression.
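
The range cases can be sketched in the same way. RangeClassDemo and its matches helper are hypothetical names for this illustration. The last line shows that [A-X] stops at 'X', so a string starting with 'Y' is rejected.

```java
import java.util.regex.Pattern;

public class RangeClassDemo {

    // Hypothetical helper: true when the whole input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[a-c]gh", "agh"));   // true
        System.out.println(matches("[x-z]bc", "ybc"));   // true
        System.out.println(matches("[x-zX-Z]", "Y"));    // true: second range covers X-Z
        System.out.println(matches("[A-X]abc", "Babc")); // true
        System.out.println(matches("[A-X]abc", "Yabc")); // false: 'Y' is outside A-X
    }
}
```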


The Unions Character Class


The unions character class is the result of combining two (or
more) separate character classes by nesting one class within
the other.
Examples
The [a-d[3-5]] regular expression matches any character in the range a-d or 3-5. In
other words, this regular expression matches any of the following characters: a, b, c,
d, 3, 4 or 5.


Try to run the RXTest application from the previous topic with
various inputs...
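
One possible way to see which characters a union accepts is to test a list of candidate characters one by one, as in the sketch below. UnionClassDemo, accepted and the candidate string are assumptions made for this illustration. The second call uses the flat form [a-d3-5], which accepts the same candidates here.

```java
import java.util.regex.Pattern;

public class UnionClassDemo {

    // Hypothetical helper: returns the characters of 'candidates'
    // that the single-character regular expression accepts.
    static String accepted(String regex, String candidates) {
        Pattern pattern = Pattern.compile(regex);
        StringBuilder result = new StringBuilder();
        for (char c : candidates.toCharArray()) {
            if (pattern.matcher(String.valueOf(c)).matches()) {
                result.append(c);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String candidates = "abcdef0123456";
        System.out.println(accepted("[a-d[3-5]]", candidates)); // abcd345
        System.out.println(accepted("[a-d3-5]", candidates));   // abcd345
    }
}
```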


The Intersections Character Class


The intersections character class is the result of intersecting
two separate character classes using the '&&' operator. It matches
only the characters that belong to both classes.
Examples
The [a-z&&[b-d]] regular expression matches any of the following characters: b, c or d.


Try to run the RXTest application from the previous topic with
various inputs...
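
A minimal sketch that lists the lowercase letters accepted by an intersection is shown below. IntersectionClassDemo and acceptedLetters are hypothetical names for this illustration; only the letters a-z are probed, which is an assumption that suits these two patterns.

```java
import java.util.regex.Pattern;

public class IntersectionClassDemo {

    // Hypothetical helper: lists the lowercase letters a-z
    // that the single-character regular expression accepts.
    static String acceptedLetters(String regex) {
        Pattern pattern = Pattern.compile(regex);
        StringBuilder result = new StringBuilder();
        for (char c = 'a'; c <= 'z'; c++) {
            if (pattern.matcher(String.valueOf(c)).matches()) {
                result.append(c);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(acceptedLetters("[a-z&&[b-d]]")); // bcd
        System.out.println(acceptedLetters("[a-m&&[f-h]]")); // fgh
    }
}
```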


The Subtraction Character Class


The subtraction character class is the result of intersecting
a character class with the negation of another one. It matches
the characters of the first class that do not belong to the second.
Examples
The [a-f&&[^bcd]] regular expression matches any of the following characters: a, e or f.
The [a-z&&[^e-z]] regular expression matches any of the following characters: a, b, c
or d.


Try to run the RXTest application from the previous topic with
various inputs...
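
Subtraction can also be observed by masking the accepted characters in a sample string, as in the sketch below. SubtractionClassDemo, mask and the sample inputs are assumptions made for this illustration; String.replaceAll is the standard method that takes a regular expression.

```java
public class SubtractionClassDemo {

    // Hypothetical helper: replaces every character accepted by the class with '*'.
    static String mask(String regex, String input) {
        return input.replaceAll(regex, "*");
    }

    public static void main(String[] args) {
        // a, e and f are masked; b, c and d were subtracted from the class.
        System.out.println(mask("[a-f&&[^bcd]]", "abcdef"));  // *bcd**
        // a through d are masked; e through z were subtracted from the class.
        System.out.println(mask("[a-z&&[^e-z]]", "abcdefg")); // ****efg
    }
}
```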


Predefined Character Classes


The Regular Expression API includes several commonly used
predefined character classes:
\w    A word character            [a-zA-Z_0-9]
\W    A non-word character        [^\w]
\s    A whitespace character      [ \t\n\x0B\f\r]
\S    A non-whitespace character  [^\s]
\d    A digit                     [0-9]
\D    A non-digit                 [^0-9]
.     Any character (may or may not match line terminators)


Try to run the RXTest application from the previous topic with
various inputs...
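
The predefined classes can be tried with a minimal sketch like the one below. PredefinedClassDemo and its matches helper are hypothetical names for this illustration. Note that inside a Java string literal each backslash has to be doubled, and that '.' matches a line terminator only when the Pattern.DOTALL flag is used.

```java
import java.util.regex.Pattern;

public class PredefinedClassDemo {

    // Hypothetical helper: true when the whole input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Hypothetical helper: same check, with '.' allowed to match line terminators.
    static boolean matchesDotAll(String regex, String input) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("\\d\\d", "42")); // true: two digits
        System.out.println(matches("\\D", "7"));     // false: '7' is a digit
        System.out.println(matches("\\w", "_"));     // true: underscore is a word character
        System.out.println(matches("\\W", "_"));     // false
        System.out.println(matches("\\s", " "));     // true: space is whitespace
        System.out.println(matches("\\S", "\t"));    // false: tab is whitespace
        System.out.println(matches(".", "\n"));      // false: default mode excludes line terminators
        System.out.println(matchesDotAll(".", "\n")); // true
    }
}
```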
