
Use

regular expressions effectively in several


Learn how to validate and format input
Manage words, lines, special characters, and
Find solutions for using regular expressions in URLs,
Learn the nuances of more advanced regex features

Regular Expressions

Understand the basics of regular expressions

Understand how regular expressions APIs, syntax,


Write better regular expressions for custom needs

Regular
Expressions
Cookbook

Jan Goyvaerts, Steven Levithan

-
2010



. . . . .: , 2010. 608 ., .
ISBN 978-5-93286-181-3
100 , .
, : C#, Java,
JavaScript, Perl, PHP, Python, Ruby VB.NET. : URL , ,
,
HTML, XML, CSV .
, , , , . , ,
, .

ISBN 978-5-93286-181-3
ISBN 978-0-596-52068-7 ()
-, 2010
Authorized translation of the English edition 2009 O'Reilly. This translation is published and sold by permission of O'Reilly, Inc., the owner of all rights to publish and sell the same.
, . , , .

-. 199034, -, 16 , 7,
. (812) 380-5007, www.symbol.ru. N 000054 25.12.98.

005-93, 2; 953000 .
30.10.2009. 70100 1/16. .
38 . . 2000 .

199034, -, 9 , 12.




.

, . , . .
, . ,
. , , ,
, ,
-. , , .
, , ,
, , , ,
.
, ,

. ,
,
.


,
, ,
,
. , ,
.
. -

- . , .
, ,
Perl, , . Perl ,
Perl .
,
, . , , , .
,

,.


, , , , .
. ,
.
- ,
.


,
, . ,
, ,
, . ,
. ,
.
, . , - , regex ( ),
, . , , , .
3 ,
. ,
,
.


.NET, Java, JavaScript, PCRE, Perl, Python Ruby . , .
. ,
, , .
, ( 3) ,
C#, Java, JavaScript, PHP, Perl,
Python, Ruby VB.NET. . , , , ,
, ,
.




, . , - .
1 ,
, .
2 ,
.
3
,
, .
4
, , , , .
5 , ,
.
6 ,
.
7 URL, , ,
Windows .
8 HTML, XML, CSV (comma-separated
values , ) INI.

, URL, , .

;
, ;


, , , . , .

, , ,
, .


, .
, , .

, ,
. .

, .
...
, , . , .
CR , LF CRLF

CR, LF CRLF , , \r, \n \r\n. Enter


, C#,
, , Python.
Return Enter, . , , . Enter
.


, .
.


.
. , . ,
, . O'Reilly. , . .
,
.
, ISBN. : Regular Expressions Cookbook by Jan Goyvaerts and Steven Levithan. Copyright 2009
Jan Goyvaerts and Steven Levithan, 978-0-596-52068-7.


permissions@oreilly.com.

Safari Books Online



Safari Books Online, ,
O'Reilly Network Safari Bookshelf.
Safari , .
, , , ,
. http://safari.oreilly.com.



, , :
O'Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 ( )
707-829-0515 ( )
707-829-0104 ()
, :
http://www.regexcookbook.com

http://oreilly.com/catalog/9780596520687

:
bookquestions@oreilly.com
, ,
O'Reilly :
http://www.oreilly.com

(Andy Oram), O'Reilly Media, Inc., , . (Jeffrey Friedl), (Zak Greant), (Nikolaj Lindberg) c (Ian Morse), ,
.


, - ,
, .
, :
4 8.
. , ( , ), , . , , ,
, , ,
.

, , , , .



, .
, ,
;
, ; , , -


1.

; .
, , .



, , . (). , (backtracking).
,
grep, . , , Perl .
(). . -
: , ,
.

, , . , , , , . , 4.1, , ,
.
,
, , 1:
1


: http://regex.info/blog/2006-09-15/247.


-, , : -, ,
, .

, , , ,
, . . . , .


, .
. . ,
,
, . , ,
, , ,
, . .
, . , , ?
, , , Perl.
Perl. , .
. regex regexp, regexes .
-
. , .
, . ,

.

.
3; , . ,


. , , , .

,

, . Perl. . , , ,
.
, .
-
, . , ,
, .
, ,
. , , ,
, :
Perl
, Perl , .
Perl 5.6, 5.8 5.10.
, , , Perl ,
Perl. , Perl, .
, ,
. Perl.
PCRE
PCRE Perl-Compatible Regular Expressions (Perl- ),
C (Philip Hazel).
: http://www.pcre.org.
PCRE 4 7.


, PCRE Perl,
, , , Perl. , , ,
Perl , Perl.
PCRE . PHP Delphi. , Perl , , PCRE.
.NET
Microsoft .NET Framework Perl
System.Text.RegularExpressions. .NET 1.0 3.5. , System.Text.RegularExpressions: 1.0 2.0.
.NET 1.1, 3.0 3.5 Regex
.
,
.NET, C#, VB.NET, Delphi for .NET COBOL.NET,
.NET. ,
.NET, , , .NET, , Perl. Visual Studio (VS). VS - , ,
Perl.
Java
Java 4 , , java.util.regex.
Java. , ,
Perl- , ,
C. java.util.regex, Java 4, 5 6.


,
Java , , , Java.
JavaScript
JavaScript , ECMA-262 3.
ECMAScript, - JavaScript JScript. : Internet
Explorer 5.5 8.0, Firefox, Opera Safari ECMA-262.
.
, .
- ,
-, JavaScript
,
. Microsoft VBScript Adobe ActionScript.
Python
Python re. Python 2.4 2.5.
Python .
Ruby
Ruby
, Perl. Ruby 1.8 1.9. Ruby 1.8 ,
Ruby. Ruby 1.9 Oniguruma.
Ruby 1.8 Oniguruma, Ruby 1.9
Ruby.
Ruby Ruby 1.8, Oniguruma
Ruby 1.9.
, Ruby
,
a++ . Ruby 1.8, ,
, Ruby 1.9
, .


Oniguruma
Ruby 1.8, ,
. , , (?m) ,
(?s).



, . , ,
, , , .
, , , .
, , , . , 2.20 2.21.
,
2.22. 3 3.16 , .



, , , . .
, . .
,

.

, .
, ,
, -


, PCRE. PCRE , , . PCRE , . ,


PCRE,
. , -.
, .
:
Perl
Perl ,
s/regex/replace/. Perl .
Perl 5.6 Perl 5.10.


.
PHP
PHP preg_replace.
PCRE PHP
.
,
PCRE, PHP . ,
,

PHP .
PHP ereg_replace.
(POSIX ERE)
. ereg PHP .
.NET
System.Text.RegularExpressions , . .NET .NET. .NET . -


, .NET 2.0,
.
Java
java.util.regex . Java 4, 5 6.
.
JavaScript
JavaScript
, , ECMA-262 3.
Python
re Python
sub(). Python Python. Python 2.4 2.5. Python
.
Ruby
Ruby , .
Ruby 1.8 1.9. Ruby 1.8
, Ruby,
Ruby 1.9 Oniguruma. Ruby 1.8 Oniguruma, Ruby 1.9
Ruby.
Ruby Ruby 1.8,
Oniguruma Ruby 1.9.
Ruby 1.8 1.9 , , Ruby 1.9

. , Ruby 1.9.



,
-


. , ,
( UNIX).
.
3 , .
,

. 3.1.
,
.
, , , , .

.

RegexBuddy
RegexBuddy (. 1.1) , ,
, . , ,
.
RegexBuddy (Jan Goyvaerts), . RegexBuddy , RegexBuddy
O'Reilly.
(. 1.1) ,
, , RegexBuddy. . .
- , , RegexBuddy.


, .
Create () .

. , Insert Token ( ),
. , ,
RegexBuddy .

. 1.1. RegexBuddy

Test
() Highlight () RegexBuddy , .
,
, :
List All ( )
.


Replace ()
Replace (), ,
. Replace () Test () ,
.
Split () ( Test, )
, .
List All ( )
Update Automatically ( ),
.
, ( )
, Test ()
, , Debug (). RegexBuddy Debug () . , , , .
,
, .
Use ()
. RegexBuddy
. .

, GREP (, ) .

, , , RegexBuddy
Paste (), ,
. RegexBuddy
,
,


.
, Copy (), , .
Library () .
. .

Forum () Login ().
RegexBuddy , ,
OK
RegexBuddy. .
RegexBuddy Windows 98, ME, 2000, XP Vista. Linux Apple
VMware, Parallels, CrossOver
Office , , WINE.
RegexBuddy http://www.regexbuddy.com/RegexBuddyCookbook.exe. , .

RegexPal
RegexPal (. 1.2) -, .
(Steven Levithan), . ,
, -. RegexPal JavaScript.
JavaScript, -,
-.
- ,
http://www.regexpal.com Enter regex here. RegexPal
, . RegexPal ,
JavaScript. -
, RegexPal
.


. 1.2. RegexPal

Enter test data


here, RegexPal ,
.
, , RegexPal - .

-

- . , 3 , . , , .

regex.larsolavtorvik.com
(Lars Olav Torvik) , ,
http://regex.larsolavtorvik.com (. 1.3).


. 1.3. regex.larsolavtorvik.com

, . PHP PCRE, PHP POSIX JavaScript. PHP


PCRE PCRE, PHP preg. POSIX , ereg PHP, . JavaScript JavaScript .
Pattern ()
Subject ( ).
Matches () , . Code ( ) , .



. , , Result ().
- Ajax,

.
, PHP , .
.
. , , .
,
, , . 3.

Nregex
http://www.nregex.com (. 1.4) -, , (David Seruyange) .NET.
, , ,
.NET 1.x, , .
. Regular Expression
( ), . , , If I just had $5.00
then "she" wouldn't be so @#$! mad..
-, URL Load Target From URL ( URL) Load (), . , , Browse (), Load (), .
Matches & Replacements (
), , , . - Replacement String ( ),
. (...).
.NET, , . ,


. 1.4. Nregex

, - . Manually Evaluate Regex ( ), . Evaluate


(), Matches & Replacements ( ).

Rubular
http://www.rubular.com -, (. 1.5), Ruby 1.8.


. 1.5. Rubular

Your regular expression ( ), .


, i , . , m . im . ,
Ruby, /regex/im,
Ruby.
Your test string (
) . Match result ( )
, .

myregexp.com
(Sergey Evdokimov) Java,
. http://www.myregexp.com (. 1.6) -
. Java-, . , Java 4 ( ). -


. 1.6. myregexp.com

java.util.regex, Java 4. Java .


Regular Expression ( ). Flags (). ,
.
,
Java, , myregexp.com
Edit () Paste Regex from
Java String ( Java).
Copy Regex for Java Source (
Java). Edit
() JavaScript XML.


, :
Find ()
, . Matcher.find() Java.
Match ()
, . ,
. String.matches() Matcher.matches().
Split ()
, String.split() Pattern.split()
.
Replace ()
,
String.replaceAll() Matcher.replaceAll().
,
, , http://www.myregexp.com. Eclipse, IntelliJ IDEA.

reAnimator
reAnimator http://osteele.com/tools/reanimator
(. 1.7), (Oliver Steele), .
,
,
.
, reAnimator, . , . ,
reAnimator,
, , ,
. , , reAnimator, .
. 20.


. 1.7. reAnimator

Pattern ()
Edit (). Pattern () Set ().
Input ().

, , . , , . , . ,
.
reAnimator , ,
^ $ .
, .




Expresso
Expresso ( espresso (), ) .NET,
. http://www.ultrapico.com/Expresso.htm. .NET 2.0
.
60- .
, Expresso .
, Ultrapico
. .
Expresso . 1.8. Regular Expression ( ), , . . Regex Analyzer ( )
. .
Design Mode ( )
,
Ignore Case ( ). , . , , Undock
() , . ,
Test Mode ( ).
Test Mode ( ) Run Match ( ),
Search Results ( ) .
. , .
Expression Library ( )
. , Run Match (-


. 1.8. Expresso

). Library () .

The Regulator
Regulator, http://sourceforge.net/projects/regulator,
.NET, . .NET 2.0 . , .NET 1.x. Regulator ,

.


. 1.9. The Regulator

Regulator (. 1.9). New Document ( ) , . , ,


, .
, .
. , , .
Input (), ,
. Replace with ( ) , .
. , Match (), Replace ()
Split () .
.
.
Regex Analyzer ( )
,
.
, Regex Analyzer (
) View (), .
.


grep
grep g/re/p, ed
UNIX, , .
UNIX, grep,
. UNIX, Linux OS X, man grep ,
.
Windows, , grep, .

PowerGREP
PowerGREP (Jan Goyvaerts), . , ,
grep Windows (. 1.10).
PowerGREP
, , . RegexBuddy
JGsoft.
,
Clear () Action () Search (), Action (). File Selector ( ) , File Selector ( ) Include File or Folder ( ) Include Folder and Subfolders ( ).
Execute () Action () .
, Action type ( ), Action (), Search-and-replace ( ). Search () Replace ().
. .
PowerGREP ,
. ,
, grep, -


. 1.10. PowerGREP

PowerGREP,
.
PowerGREP Windows 98, ME, 2000, XP
Vista.
http://www.powergrep.com/PowerGREPCookbook.exe. ,
15 . , Results (),
, .

Windows Grep
Windows Grep (http://www.wingrep.com) grep- Windows. (. 1.11),
, .
POSIX ERE.
, , , . Windows Grep -


. 1.11. Windows Grep


(shareware), , , , .
, Search ()
Search ().
Options () : Beginner Mode ( ) Expert Mode ( ). , .
Windows Grep
, . , ,
.
, All Matches ( )
View ().
, Search () Replace ().


RegexRenamer
RegexRenamer (. 1.12)
grep-. ,
. http://regexrenamer.sourceforge.net. RegexRenamer 2.0 Microsoft .NET.
Match (), Replace (). , /i,
/g ,
.
/x ,
,
.
, , .

. , . , -

. 1.12. RegexRenamer



, , .
,
. , EditPad Pro,
, . .
, :
Boxer Text Editor (PCRE)

Dreamweaver (JavaScript)

EditPad Pro ( ,
, . RegexBuddy
JGsoft)
Multi-Edit (PCRE, Perl)
NoteTab (PCRE)

UltraEdit (PCRE)

TextMate (Ruby 1.9 [Oniguruma])



, , , . , , , , . , ,
. ,
,
. , , , 2.1.
,
. ,
.
,
, , Mastering Regular Expressions
(Jeffrey E. F. Friedl),
O'Reilly,1 .
,
. .
,
. , ,

, 3- . . .
.: -, 2008.


4 8,
.

- , . . 3,
,
. ,
, , . 22.

2.1.

, : The punctuation characters in the ASCII table are: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~.

The punctuation characters in the ASCII table are: !"#\$%&'\(\)\*\+,-\./:;<=>\?@\[\\]\^_`\{\|}~

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

, $()*+.?
[\^{|, . , Mary had a little lamb, Mary had a little lamb . , Regular Expression ( ) .

, , .
,
, , .
\$\(\)\*\+\.\?\[\\\^\{\|


$()*+.?[\^{|


2.

, : ], - }.
,
[, } {.
}. , [ ], 2.3.
-
, , , .
-
, .
, ,
, .
, .
. ,
,
.
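The escaping rule in this recipe can be checked with a small sketch. It assumes Python's `re` module as the host flavor (the recipe itself is flavor-neutral), and the variable names are illustrative only. The hand-escaped pattern puts a backslash in front of the metacharacters `$()*+.?[\^{|`, while `re.escape` escapes every potentially special character automatically.

```python
import re

# The subject text of this recipe, with the full ASCII punctuation set.
subject = ("The punctuation characters in the ASCII table are: "
           "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~")

# Hand-escaped regex: only the metacharacters carry a backslash.
manual = (r"The punctuation characters in the ASCII table are: "
          r"""!"#\$%&'\(\)\*\+,-\./:;<=>\?@\[\\]\^_`\{\|}~""")

# re.escape() backslash-escapes everything that could be special.
automatic = re.escape(subject)

manual_ok = re.fullmatch(manual, subject) is not None
automatic_ok = re.fullmatch(automatic, subject) is not None
print(manual_ok, automatic_ok)
```

Both patterns match the subject text literally. Letting the library do the escaping is usually the safer choice when the literal text comes from user input.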


The punctuation characters in the ASCII table are: \Q!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~\E

:
: Java 6, PCRE, Perl

Perl, PCRE Java \Q \E .


\Q , , \E .
\E , , \Q , .

\Q...\E ,
, \.\.\..
, Java 4 5 , , .
,
\Q...\E ,
PCRE, Perl Java 6.
Java 6, ,
PCRE Perl.



ascii
:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
(?i)ascii
:
: .NET, Java, PCRE, Perl, Python, Ruby
. regex regex, Regex, REGEX
ReGeX. regex
, .

. , , ,
. 3.4, , , , .

, (?i) , (?i)regex .
.NET, Java, PCRE, Perl, Python Ruby.

.NET, Java, PCRE, Perl Ruby , . , sensitive(?i)


caseless(?-i)sensitive sensitiveCASELESSsensitive,
SENSITIVEcaselessSENSITIVE. (?i) , (?-i) . .

2.10 ,
.
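A minimal sketch of case-insensitive matching, assuming Python's `re` module. The `re.IGNORECASE` flag plays the role of the option set outside the regex, and `(?i)` is the mode modifier. Python (3.6 and later) does not toggle a modifier in the middle of a pattern with `(?i)` and `(?-i)`; a scoped group `(?i:...)` is used here as the nearest equivalent.

```python
import re

# Option set outside the regex.
flag_hits = re.findall(r"ascii", "ASCII, Ascii, ascii", re.IGNORECASE)

# The same option as a mode modifier at the start of the regex.
inline_hits = re.findall(r"(?i)ascii", "ASCII, Ascii, ascii")

# Case insensitivity limited to the middle part of the regex.
scoped = re.compile(r"sensitive(?i:caseless)sensitive")
scoped_ok = scoped.fullmatch("sensitiveCASELESSsensitive") is not None
scoped_bad = scoped.fullmatch("SENSITIVEcaselessSENSITIVE") is not None

print(flag_hits, inline_hits, scoped_ok, scoped_bad)
```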

.
2.3 5.14.


2.2.

, ,
ASCII: bell, escape,
form feed, line feed, carriage return, horizontal tab, vertical tab. ASCII: 07, 1B, 0C,
0A, 0D, 09, 0B.

\a\e\f\n\r\t\v

:
: .NET, Java, PCRE, Python, Ruby
\x07\x1B\f\n\r\t\v

:
: .NET, Java, JavaScript, PCRE, Python, Ruby
\a\e\f\n\r\t\x0B

:
: .NET, Java, PCRE, Perl, Python, Ruby


ASCII .
. ,
. . 2.1 .
2.1.

Escape  Meaning              Hex code
\a      bell                 0x07
\e      escape               0x1B
\f      form feed            0x0C
\n      line feed (newline)  0x0A
\r      carriage return      0x0D
\t      horizontal tab       0x09
\v      vertical tab         0x0B

ECMA-262 \a \e .
JavaScript ,
\a \e . Perl \v ( ), Perl (\x0B) (\013) .

, , , .


26
\cG\x1B\cL\cJ\cM\cI\cK
:
: .NET, Java, JavaScript, PCRE, Perl, Ruby 1.9

\cA \cZ 26 ,
1 26 ASCII. c . , ,
,
. Java .
, , Ctrl . Ctrl-H (backspace).
\cH .

Python Ruby Ruby 1.8,


Oniguruma Ruby 1.9.
escape, 27 ASCII,
, \x1B .

7-
\x07\x1B\x0C\x0A\x0D\x09\x0B

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


\x ,
,
ASCII. . 2.1 \x00 \x7F ASCII. ,
, .
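The hexadecimal form is the most portable of the variants in this recipe. The sketch below assumes Python's `re` module and checks that the seven two-digit `\x` escapes match the seven control characters; the subject string is built with Python's own string escapes.

```python
import re

# bell, escape, form feed, line feed, carriage return, tab, vertical tab
subject = "\a\x1b\f\n\r\t\v"

# The regex engine itself interprets the \xHH escapes (raw string).
hex_regex = re.compile(r"\x07\x1B\x0C\x0A\x0D\x09\x0B")

matched = hex_regex.fullmatch(subject) is not None
codes = [f"0x{ord(ch):02X}" for ch in subject]
print(matched, codes)
```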

\x80 \xFF ,
. \x80 \xFF .
, 2.7.

Ruby 1.8 PCRE, UTF-8,


. Ruby 1.8 PCRE UTF-8 8- .
. \xAA 0xAA ,
0xAA .

. 2.1. ASCII

.
2.7.

2.3.

,
calendar, , . -


,
a e. , . ,
, .

calendar
c[ae]l[ae]nd[ae]r

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


[a-fA-F0-9]

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


[^a-fA-F0-9]

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

, , .
, .
, a e. . calendar, a, e a.
.
: \, ^, - ]. Java .NET [
. . [$()*+.?{|] , .
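The three character classes of this recipe can be tried out as follows. This is a sketch assuming Python's `re` module; the word list and the probe string are made up for illustration.

```python
import re

# One of two characters at each of three positions.
misspellings = re.compile(r"c[ae]l[ae]nd[ae]r")
words = ["calendar", "calender", "celendar", "calandar", "colander"]
accepted = [w for w in words if misspellings.fullmatch(w)]

# A hexadecimal digit, and its negation.
hex_digit = re.compile(r"[a-fA-F0-9]")
non_hex = re.compile(r"[^a-fA-F0-9]")
hex_chars = hex_digit.findall("0xBEEF g7")
other_chars = non_hex.findall("0xBEEF g7")

print(accepted, hex_chars, other_chars)
```

Note that a negated class still matches exactly one character; it matches the space in the probe string, for example.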

, ,
. , -


.
, .
, . [][^-]
, ,
JavaScript, . ;
: [\]\[\^\-] . .

-
, , (
). , 2.2, , . , . , [\r\n]
(\r) (\n).

(^) ,
. ,
. , .
(-) , . ,
, , , ,
.
, , ASCII .
[A-z] ,
ASCII A z. ; , [A-Z\[\\\]\^_`a-z]
, . , , .

, [z-a] , .
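The range pitfalls described above are easy to demonstrate. The sketch assumes Python's `re` module: `[A-z]` also covers the six ASCII punctuation characters that sit between `Z` and `a`, and a reversed range such as `[z-a]` is rejected when the regex is compiled.

```python
import re

too_wide = re.compile(r"[A-z]")
letters_only = re.compile(r"[A-Za-z]")

probe = "Z[\\]^_`a"          # Z, the six characters between Z and a, then a
wide_hits = too_wide.findall(probe)
letter_hits = letters_only.findall(probe)

try:
    re.compile(r"[z-a]")
    reversed_range_rejected = False
except re.error:
    reversed_range_rejected = True

print(wide_hits, letter_hits, reversed_range_rejected)
```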

[a-fA-F\d]


:
: .NET, Java, PCRE, Perl, Python, Ruby
, ,
. ,
. \d [\d]
. , , , ,
. , \D
, , [^\d] .

\w .
, . , . , ,
. \W , .

Java, JavaScript, PCRE Ruby \w [a-zA-Z0-9_] . .NET Perl


- (,
). Python , UNICODE U. \d . .NET Perl
, Python ,
UNICODE U.

\s . , . .NET, Perl
JavaScript \s , . ,
JavaScript \s , \d \w
ASCII. \S , \s .

\b . \b ,
. , \b , \w ,
ASCII, \w ASCII, .
. 70, 2.6.
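How far the shorthand classes reach depends on the flavor and its version. The text above describes Python 2.4 and 2.5, where the UNICODE flag had to be set explicitly. In Python 3 the default is reversed for text patterns: `\d` and `\w` are Unicode-aware unless `re.ASCII` is given. The sketch below illustrates this for Python 3 only; the sample characters are arbitrary.

```python
import re

# A shorthand used inside a character class.
hex_hits = re.findall(r"[a-fA-F\d]", "x1Fz")

arabic_three = "\u0663"            # ARABIC-INDIC DIGIT THREE
cyrillic = "\u0436\u0443\u043a"    # three Cyrillic letters

# Python 3 default: Unicode-aware shorthands.
unicode_digit = re.fullmatch(r"\d", arabic_three) is not None
unicode_word = re.fullmatch(r"\w+", cyrillic) is not None

# re.ASCII restricts \d to [0-9] and \w to [a-zA-Z0-9_].
ascii_digit = re.fullmatch(r"\d", arabic_three, re.ASCII) is not None
ascii_word = re.fullmatch(r"\w+", cyrillic, re.ASCII) is not None

print(hex_hits, unicode_digit, unicode_word, ascii_digit, ascii_word)
```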



(?i)[A-F0-9]

:
: .NET, Java, PCRE, Perl, Python, Ruby
(?i)[^A-F0-9]

:
: .NET, Java, PCRE, Perl, Python, Ruby
,
( 3.4) ( 2.1),
. ,
.
JavaScript , (?i) . JavaScript, /i .

,
.NET
[a-zA-Z0-9-[g-zG-Z]]

, . - , , , g z.

: [class-[subtract]] .

, , . , \p{IsThai}
. \P{N} ,
Number.
[\p{IsThai}-[\P{N}]] 10 .

,
Java
[a-f[A-F][0-9]]
[a-f[A-F[0-9]]]

Java .

.


, . , , :
[\w&&[a-fA-F0-9\s]]

.
. . , . , [g-zG-Z_] , :

[a-zA-Z0-9&&[^g-zG-Z]]

. - ,
, , g z. : [class&&[^subtract]] .

,
, , . , \p{InThai} . \p{N} , Number.
[\p{InThai}&&[\p{N}]] 10 .

\p , 2.7.
.
2.1, 2.2 2.7.

2.4.

, ,
.
, , . , .


,
.

: ( )
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

,
.

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
[\s\S]

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby



. .
, .
, , ,
. ,
, , , . , ,

. , 3.21 .
(Larry Wall), Perl, Perl
,
(\n).
, , .
. , .


,

, . . Perl (single line mode), Java
(dot all mode).
3.4 . , , . , .
JavaScript, , . 2.3, \s , \S ,
\s . ,
[\s\S] , , . [\d\D] [\w\W] .
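The difference between the plain dot, the dot in "dot matches line breaks" mode, and the portable `[\s\S]` workaround can be seen in a short sketch (Python's `re` module assumed; the two-line text is illustrative).

```python
import re

text = "line one\nline two"

plain = re.findall(r".+", text)          # the dot stops at the line break
dotall = re.findall(r"(?s).+", text)     # the dot also matches \n
portable = re.findall(r"[\s\S]+", text)  # works without any mode modifier

print(plain, dotall, portable)
```

The `[\s\S]` form is the one to reach for in flavors that lack the mode, such as the JavaScript flavor described in this book.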


, . \d\d.\d\d.\d\d
. 05/06/08,
99/99/99. , 12345678.

,
, .
. \d\d[/.\-]\d\d[/.\-]\d\d -
, . -
99/99/99, 12345678 .
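The abuse of the dot discussed above is easy to reproduce. In this sketch (Python's `re` module assumed) the loose regex with dots accepts a plain eight-digit number, while the version with an explicit class of separators does not. Neither regex checks that the digits form a valid date.

```python
import re

loose = re.compile(r"\d\d.\d\d.\d\d")
strict = re.compile(r"\d\d[/.\-]\d\d[/.\-]\d\d")

samples = ["05/06/08", "99/99/99", "12345678"]
loose_ok = [s for s in samples if loose.fullmatch(s)]
strict_ok = [s for s in samples if strict.fullmatch(s)]

print(loose_ok, strict_ok)
```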

,
.
. , , , -
.

, . .


(?s).

:
: .NET, Java, PCRE, Perl, Python
(?m).

:
: Ruby

, . 2.1
. 51 , JavaScript .

(?s)
.NET, Java, PCRE, Perl Python. s
single line ( ), Perl .
,
Ruby . Ruby (?m) . , . Ruby 1.9 (?m) . (?m) Perl 2.5.

.
2.3, 3.4 3.21.

2.5.
/

. alpha,
. omega, . begin,
.
end, .



^alpha

: ( ^ $
)
: .NET, Java, JavaScript, PCRE, Perl, Python
\Aalpha

:
: .NET, Java, PCRE, Perl, Python, Ruby


omega$

: ( ^ $
)
: .NET, Java, JavaScript, PCRE, Perl, Python
omega\Z

:
: .NET, Java, PCRE, Perl, Python, Ruby


^begin

: ^ $
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


end$

: ^ $
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^ , $ , \A , \Z \z . - ,
,
.
,
.
,
. , ,


one two, ,
four:
one
two
four

one
two
four.

\A , .
, .
\A , , . A
.

JavaScript \A .

^ \A
, ^ $
.
, Ruby. Ruby
.
JavaScript,
^ \A . \A ,
.

\Z \z . \Z \z , ,
.

.NET, Java, PCRE, Perl Ruby \Z \z . Python \Z .


JavaScript .

\Z \z ,
. \Z
, , . , omega\Z , . ,


, ; \Z . \z , ,
.

$ \Z ,
, ^ $
.
, Ruby. Ruby . \Z , $ , .
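A sketch of the end-of-subject anchors, assuming Python's `re` module. Python is the odd one out here: its `\Z` matches only at the very end of the subject, like `\z` in the other flavors, whereas `$` also matches before a single trailing line break. The sample line imitates text read from a file together with its line break.

```python
import re

line = "alpha omega\n"   # the trailing line break is part of the subject

dollar = re.search(r"omega$", line) is not None       # before the final \n
strict_end = re.search(r"omega\Z", line) is not None  # only at the very end
start_a = re.search(r"\Aalpha", line) is not None

# Without a line-mode option, ^ matches only at the start of the subject.
caret_second_line = re.search(r"^omega", "alpha\nomega") is not None

print(dollar, strict_end, start_a, caret_second_line)
```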

,
Perl. , $/ ( ) \n, Perl , ( ):
$line = <>;

Perl $line.
, endofinput.\z ,
.
endofinput.\Z endofinput.$ , .

, Perl :
chomp $line;


. ( chomp
- .)
JavaScript, \Z $ . \Z
,
.

^ , \A . Ruby ^
. ,
. .


, .
, , 2.4. ,
. .

^ .
,
, , . \n^ , ^
\n .

$
,
\Z . Ruby $ . ,
.

$ . (, , .) $\n , $ \n .


, . ,
.
, .
. \A \Z - . ^ $ ^ $ -
.



. \A\Z , , . \A\z
. ^$ ^ $ .

(?m)^begin

:
: .NET, Java, PCRE, Perl, Python
(?m)end$

:
: .NET, Java, PCRE, Perl, Python
^ $
, . JavaScript 2.1 . 51.
(?m) ^ $
.NET, Java, PCRE, Perl Python. m
multiline () , Perl ^ $ .
, ,
Ruby . (?m) Ruby
.
Ruby (?m)
^ $ . Ruby ^ $
.
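The effect of the `(?m)` mode modifier on `^` and `$` can be sketched as follows (Python's `re` module assumed; the four-line sample text is made up).

```python
import re

text = "begin one\nmiddle\nthe end\nbegin two end"

# With (?m), ^ and $ match at every line break.
starts = [m.start() for m in re.finditer(r"(?m)^begin", text)]
ends = re.findall(r"(?m)end$", text)

# Without it, ^ matches only at the start of the whole subject.
without_mode = re.findall(r"^begin", text)

print(starts, ends, without_mode)
```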

, Ruby
^ $
. JavaScript,
.

EditPad Pro PowerGREP. ^ and $ match at line breaks ( ^ $ ), dot matches newlines ( ). -


(?-m) , , ,
\A \Z .

.
3.4 3.21.

2.6.

, cat
My cat is brown,
category bobcat. ,
cat staccato,
.


\bcat\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


\Bcat\B

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b . . . \b ,
, , .

, \b :
,
.


, .

, , .
, , ,
. ,
, , . \b
, \b . \b \bx !\b . \b x\b
\b! . x\bx
!\b! .


, , \bcat\b . \b , c , . \b , t
, .

. \b , .
,
. ,
,
. \b (?m) , ,
^ $
.

\B ,
\b . , \B ,
.

, \B :
,
.

, .


, .
.

\Bcat\B cat
staccato, My cat is brown, category bobcat.
,
( staccato, category bobcat, My cat
is brown), \Bcat cat\B \Bcat|cat\B . \Bcat cat
staccato bobcat. cat\B cat category ( staccato, \Bcat ). 2.8.
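The word-boundary examples from this recipe, run against its four sample texts. The sketch assumes Python's `re` module; `hits` is a hypothetical helper introduced only for this illustration.

```python
import re

texts = ["My cat is brown", "category", "bobcat", "staccato"]

def hits(pattern):
    """Return the sample texts in which the pattern finds a match."""
    return [t for t in texts if re.search(pattern, t)]

whole_word = hits(r"\bcat\b")    # cat as a complete word
inside = hits(r"\Bcat\B")        # cat with word characters on both sides
partial = hits(r"\Bcat|cat\B")   # cat touching a word character on either side

print(whole_word, inside, partial)
```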



, . , . 2.3
. 57 , \w , . , \b .

, , , \b \B, , - , .

.NET, JavaScript, PCRE, Perl, Python Ruby


\b ,
\w , \W . \B
, \w ,
\W .

JavaScript, PCRE Ruby ASCII. \w [a-zA-Z0-9_] .


, , A Z , .
, .

.NET Perl .


, ,
.
Python . ,
ASCII, , UNICODE U. \b \w .

Java .
\w ASCII.
\b
. Java \b\w\b , , . \b\b , \b . \w+
, \w ASCII.

.
2.3.

2.7. ,
,


(), .
;
,
. 2.1.
,
, Currency Symbol ( ). .
,
, Greek Extended.
,
, .
, , ,
.



\u2122

:
: .NET, Java, JavaScript, Python
Python,
, : u"\u2122".
\x{2122}

:
: PCRE, Perl, Ruby 1.9
PCRE UTF-8,
PHP UTF-8 /u. Ruby 1.8
.

,
\p{Sc}

:
: .NET, Java, PCRE, Perl, Ruby 1.9
PCRE UTF-8,
PHP UTF-8 /u. JavaScript Python . Ruby 1.8 .


\p{IsGreekExtended}

:
: .NET, Perl
\p{InGreekExtended}

:
: Java, Perl
JavaScript, PCRE, Python Ruby
.


\p{Greek}

:
: PCRE, Perl, Ruby 1.9


PCRE 6.5
, , UTF-8. PHP UTF-8 /u. .NET, JavaScript Python
. Ruby 1.8 .


\X

:
: PCRE, Perl
PCRE Perl , , , .
\P{M}\p{M}*

:
: .NET, Java, PCRE, Perl, Ruby 1.9
PCRE UTF-8,
PHP UTF-8 /u. JavaScript Python . Ruby 1.8 .


.
, , ,
. ,
, .
U+2122 . \u2122
\x{2122} , .

\u . , U+0000
U+FFFF. \x
, U+000000 U+10FFFF.
U+00E0 \x{E0} , \x{00E0} . U+100000 -


.

.

,

, .
. 30 , 7 :

\p{L}
.

\p{Ll}
, .

\p{Lu}
, .

\p{Lt}

, ,
.

\p{Lm}
, .

\p{Lo}
,
.

\p{M}
, ( , , ).

\p{Mn}
, ,
(, , ).

\p{Mc}
, , ( ).

\p{Me}
, (, , ).


\p{Z}
-.

\p{Zs}
,
.

\p{Zl}
- U+2028.

\p{Zp}
- U+2029.


\p{S}

, ,
, , .

\p{Sm}
.

\p{Sc}
.

\p{Sk}
() .

\p{So}
,
,
.

\p{N}
.

\p{Nd}
0 9 , .

\p{Nl}
, , .

\p{No}
,
0...9 ( ).

\p{P}
.

\p{Pd}
.


\p{Ps}
.

\p{Pe}

\p{Pi}
.

\p{Pf}
.

\p{Pc}
, , .

\p{Po}
, , ,
, , .

\p{C}

.

\p{Cc}
0x00-0x1F ASCII 0x80-0x9F Latin-1.

\p{Cf}
, .

\p{Co}
, .

\p{Cs}
UTF-16.

\p{Cn}
, .

\p{Ll} , Ll, lowercase letter ( ). \p{L} [\p{Ll}\p{Lu}\p{Lt}\p{Lm}\p{Lo}] , letter ().

\P \p. \P{Ll} , Ll. \P{L} , - letter


(). [\P{Ll}\P{Lu}\P{Lt}\P{Lm}\P{Lo}] ,
. \P{Ll} , Lu ( , Ll), \P{Lu}
, Ll.
.
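Python's standard `re` module is listed above among the flavors without `\p{...}` support. As a substitute technique (not a regex feature), the general category of each character can be looked up with the standard `unicodedata` module. The helper name `chars_in_category` and the sample strings are assumptions made for this sketch.

```python
import unicodedata

def chars_in_category(text, prefix):
    """Characters whose general category starts with prefix, e.g. 'Sc' or 'L'."""
    return [ch for ch in text if unicodedata.category(ch).startswith(prefix)]

sample = "Price: \u20ac5 or $7"                    # contains the euro sign and $
currency = chars_in_category(sample, "Sc")         # like \p{Sc}
lowercase = chars_in_category("aB\u00e9", "Ll")    # like \p{Ll}
letters = chars_in_category("aB\u00e9 1", "L")     # like \p{L}

print(currency, lowercase, letters)
```

Passing a one-letter prefix such as `"L"` covers all of its subcategories, which mirrors the way `\p{L}` combines `Ll`, `Lu`, `Lt`, `Lm` and `Lo`.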



. .
U+0000 U+FFFF 105 :
U+0000..U+007F  <\p{InBasic_Latin}>
U+0080..U+00FF  <\p{InLatin-1_Supplement}>
U+0100..U+017F  <\p{InLatin_Extended-A}>
U+0180..U+024F  <\p{InLatin_Extended-B}>
U+0250..U+02AF  <\p{InIPA_Extensions}>
U+02B0..U+02FF  <\p{InSpacing_Modifier_Letters}>
U+0300..U+036F  <\p{InCombining_Diacritical_Marks}>
U+0370..U+03FF  <\p{InGreek_and_Coptic}>
U+0400..U+04FF  <\p{InCyrillic}>
U+0500..U+052F  <\p{InCyrillic_Supplementary}>
U+0530..U+058F  <\p{InArmenian}>
U+0590..U+05FF  <\p{InHebrew}>
U+0600..U+06FF  <\p{InArabic}>
U+0700..U+074F  <\p{InSyriac}>
U+0780..U+07BF  <\p{InThaana}>
U+0900..U+097F  <\p{InDevanagari}>
U+0980..U+09FF  <\p{InBengali}>
U+0A00..U+0A7F  <\p{InGurmukhi}>
U+0A80..U+0AFF  <\p{InGujarati}>
U+0B00..U+0B7F  <\p{InOriya}>
U+0B80..U+0BFF  <\p{InTamil}>
U+0C00..U+0C7F  <\p{InTelugu}>
U+0C80..U+0CFF  <\p{InKannada}>
U+0D00..U+0D7F  <\p{InMalayalam}>
U+0D80..U+0DFF  <\p{InSinhala}>
U+0E00..U+0E7F  <\p{InThai}>
U+0E80..U+0EFF  <\p{InLao}>
U+0F00..U+0FFF  <\p{InTibetan}>
U+1000..U+109F  <\p{InMyanmar}>
U+10A0..U+10FF  <\p{InGeorgian}>
U+1100..U+11FF  <\p{InHangul_Jamo}>
U+1200..U+137F  <\p{InEthiopic}>
U+13A0..U+13FF  <\p{InCherokee}>
U+1400..U+167F  <\p{InUnified_Canadian_Aboriginal_Syllabics}>
U+1680..U+169F  <\p{InOgham}>
U+16A0..U+16FF  <\p{InRunic}>
U+1700..U+171F  <\p{InTagalog}>
U+1720..U+173F  <\p{InHanunoo}>
U+1740..U+175F  <\p{InBuhid}>
U+1760..U+177F  <\p{InTagbanwa}>
U+1780..U+17FF  <\p{InKhmer}>
U+1800..U+18AF  <\p{InMongolian}>
U+1900..U+194F  <\p{InLimbu}>
U+1950..U+197F  <\p{InTai_Le}>
U+19E0..U+19FF  <\p{InKhmer_Symbols}>
U+1D00..U+1D7F  <\p{InPhonetic_Extensions}>
U+1E00..U+1EFF  <\p{InLatin_Extended_Additional}>
U+1F00..U+1FFF  <\p{InGreek_Extended}>
U+2000..U+206F  <\p{InGeneral_Punctuation}>
U+2070..U+209F  <\p{InSuperscripts_and_Subscripts}>
U+20A0..U+20CF  <\p{InCurrency_Symbols}>
U+20D0..U+20FF  <\p{InCombining_Diacritical_Marks_for_Symbols}>
U+2100..U+214F  <\p{InLetterlike_Symbols}>
U+2150..U+218F  <\p{InNumber_Forms}>
U+2190..U+21FF  <\p{InArrows}>
U+2200..U+22FF  <\p{InMathematical_Operators}>
U+2300..U+23FF  <\p{InMiscellaneous_Technical}>
U+2400..U+243F  <\p{InControl_Pictures}>
U+2440..U+245F  <\p{InOptical_Character_Recognition}>
U+2460..U+24FF  <\p{InEnclosed_Alphanumerics}>
U+2500..U+257F  <\p{InBox_Drawing}>
U+2580..U+259F  <\p{InBlock_Elements}>
U+25A0..U+25FF  <\p{InGeometric_Shapes}>
U+2600..U+26FF  <\p{InMiscellaneous_Symbols}>
U+2700..U+27BF  <\p{InDingbats}>
U+27C0..U+27EF  <\p{InMiscellaneous_Mathematical_Symbols-A}>
U+27F0..U+27FF  <\p{InSupplemental_Arrows-A}>
U+2800..U+28FF  <\p{InBraille_Patterns}>
U+2900..U+297F  <\p{InSupplemental_Arrows-B}>
U+2980..U+29FF  <\p{InMiscellaneous_Mathematical_Symbols-B}>
U+2A00..U+2AFF  <\p{InSupplemental_Mathematical_Operators}>
U+2B00..U+2BFF  <\p{InMiscellaneous_Symbols_and_Arrows}>
U+2E80..U+2EFF  <\p{InCJK_Radicals_Supplement}>
U+2F00..U+2FDF  <\p{InKangxi_Radicals}>
U+2FF0..U+2FFF  <\p{InIdeographic_Description_Characters}>
U+3000..U+303F  <\p{InCJK_Symbols_and_Punctuation}>
U+3040..U+309F  <\p{InHiragana}>
U+30A0..U+30FF  <\p{InKatakana}>
U+3100..U+312F  <\p{InBopomofo}>
U+3130..U+318F  <\p{InHangul_Compatibility_Jamo}>
U+3190..U+319F  <\p{InKanbun}>
U+31A0..U+31BF  <\p{InBopomofo_Extended}>
U+31F0..U+31FF  <\p{InKatakana_Phonetic_Extensions}>
U+3200..U+32FF  <\p{InEnclosed_CJK_Letters_and_Months}>
U+3300..U+33FF  <\p{InCJK_Compatibility}>
U+3400..U+4DBF  <\p{InCJK_Unified_Ideographs_Extension_A}>
U+4DC0..U+4DFF  <\p{InYijing_Hexagram_Symbols}>
U+4E00..U+9FFF  <\p{InCJK_Unified_Ideographs}>
U+A000..U+A48F  <\p{InYi_Syllables}>
U+A490..U+A4CF  <\p{InYi_Radicals}>
U+AC00..U+D7AF  <\p{InHangul_Syllables}>
U+D800..U+DB7F  <\p{InHigh_Surrogates}>
U+DB80..U+DBFF  <\p{InHigh_Private_Use_Surrogates}>
U+DC00..U+DFFF  <\p{InLow_Surrogates}>
U+E000..U+F8FF  <\p{InPrivate_Use_Area}>
U+F900..U+FAFF  <\p{InCJK_Compatibility_Ideographs}>
U+FB00..U+FB4F  <\p{InAlphabetic_Presentation_Forms}>
U+FB50..U+FDFF  <\p{InArabic_Presentation_Forms-A}>
U+FE00..U+FE0F  <\p{InVariation_Selectors}>
U+FE20..U+FE2F  <\p{InCombining_Half_Marks}>
U+FE30..U+FE4F  <\p{InCJK_Compatibility_Forms}>
U+FE50..U+FE6F  <\p{InSmall_Form_Variants}>
U+FE70..U+FEFF  <\p{InArabic_Presentation_Forms-B}>
U+FF00..U+FFEF  <\p{InHalfwidth_and_Fullwidth_Forms}>
U+FFF0..U+FFFF  <\p{InSpecials}>

. ,
, , 100% . .


Currency . Basic_Latin
Latin-1_Supplement. Currency
Symbol.
\p{InCurrency} \p{Sc} .

, \p{Cn} .

.

\p{InBlockName} .NET
Perl. Java \p{IsBlockName} .

Perl Is,
In, . Perl \p{Script} \p{IsScript} , \p{InScript} .


, , .

. U+FFFF :
<\p{Common}>
<\p{Arabic}>
<\p{Armenian}>
<\p{Bengali}>
<\p{Bopomofo}>
<\p{Braille}>
<\p{Buhid}>
<\p{CanadianAboriginal}>
<\p{Cherokee}>
<\p{Cyrillic}>
<\p{Devanagari}>
<\p{Ethiopic}>
<\p{Georgian}>
<\p{Greek}>
<\p{Gujarati}>
<\p{Gurmukhi}>
<\p{Han}>
<\p{Hangul}>
<\p{Hanunoo}>
<\p{Hebrew}>
<\p{Hiragana}>
<\p{Inherited}>
<\p{Kannada}>
<\p{Katakana}>
<\p{Khmer}>
<\p{Lao}>
<\p{Latin}>
<\p{Limbu}>
<\p{Malayalam}>
<\p{Mongolian}>
<\p{Myanmar}>
<\p{Ogham}>
<\p{Oriya}>
<\p{Runic}>
<\p{Sinhala}>
<\p{Syriac}>
<\p{Tagalog}>
<\p{Tagbanwa}>
<\p{TaiLe}>
<\p{Tamil}>
<\p{Telugu}>
<\p{Thaana}>
<\p{Thai}>
<\p{Tibetan}>
<\p{Yi}>

,
. , Thai,
. , Latin,
. . , Japanese,
Hiragana, Katakana, Han Latin,
.
, , Common. , , ,
.


, .
U+0061 a , U+00E0 a . , .
U+0300 . . ,
U+0061 U+0300, , , , U+00E0. U+0300
U+0061.
, , . ,
,
, , .
, ,
, , , , . , . , .
U+0061 U+0300, , Java, \u0061\u0300,


U+0061, a U+0300. .. .

Perl PCRE \X, . ,


- ..
, , , .
\P{M}\p{M}* , .
\X
, .
\u00E0\u0061\u0300, \u00E0, \u0061\u0300.
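The two spellings of the accented letter discussed above can be compared in code. Python's standard `re` module has neither `\X` nor `\p{M}`, so this sketch uses the `unicodedata` module instead: normalization converts between the precomposed and the decomposed form, and the hypothetical helper `graphemes` gives a rough, loop-based imitation of `\P{M}\p{M}*`.

```python
import unicodedata

precomposed = "\u00e0"    # a with grave accent as a single code point
decomposed = "a\u0300"    # a followed by COMBINING GRAVE ACCENT

same_code_points = precomposed == decomposed
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed
nfd_equal = unicodedata.normalize("NFD", precomposed) == decomposed

def graphemes(text):
    """Group each base character with the combining marks that follow it."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch      # attach the mark to the previous cluster
        else:
            clusters.append(ch)     # start a new cluster
    return clusters

clusters = graphemes("\u00e0a\u0300")
print(same_code_points, nfc_equal, nfd_equal, len(clusters))
```

The three code points split into two clusters, which is the grouping the text above describes for `\u00E0\u0061\u0300`.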

\P \p . ,
\P{Sc} , Currency Symbol. \P
, \p ,
, .

, \u , \x , \p \P , .
, , ,
, . , , , ,
(U+2122):
[\p{Pi}\p{Pf}\x{2122}]

:
: .NET, Java, PCRE, Perl, Ruby 1.9



, , ,
, . :
,
. , Greek Extended U+1F00 U+1FFF:
[\u1F00-\u1FFF]


: .NET, Java, JavaScript, Python


[\x{1F00}-\x{1FFF}]

:
: PCRE, Perl, Ruby 1.9
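In a flavor without block tokens, the block is written as a plain code point range. A sketch assuming Python's `re` module, which understands `\uFFFF` escapes inside the regex itself; the sample text is illustrative.

```python
import re

greek_extended = re.compile(r"[\u1F00-\u1FFF]")   # the Greek Extended block
trademark = re.compile(r"\u2122")                 # a single code point

# U+1F08 is in Greek Extended; U+03B1 is in the Greek and Coptic block.
text = "\u1f08\u03b1 Regex\u2122"

block_hits = greek_extended.findall(text)
tm_found = trademark.search(text) is not None
print(len(block_hits), tm_found)
```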
. ,
, . Greek:
[\u0370-\u0373\u0375\u0376-\u0377\u037A\u037B-\u037D\u0384\u0386
\u0388-\u038A\u038C\u038E-\u03A1\u03A3-\u03E1\u03F0-\u03F5\u03F6
\u03F7-\u03FF\u1D26-\u1D2A\u1D5D-\u1D61\u1D66-\u1D6A\u1DBF\u1F00-\u1F15
\u1F18-\u1F1D\u1F20-\u1F45\u1F48-\u1F4D\u1F50-\u1F57\u1F59\u1F5B\u1F5D
\u1F5F-\u1F7D\u1F80-\u1FB4\u1FB6-\u1FBC\u1FBD\u1FBE\u1FBF-\u1FC1
\u1FC2-\u1FC4\u1FC6-\u1FCC\u1FCD-\u1FCF\u1FD0-\u1FD3\u1FD6-\u1FDB
\u1FDD-\u1FDF\u1FE0-\u1FEC\u1FED-\u1FEF\u1FF2-\u1FF4\u1FF6-\u1FFC
\u1FFD-\u1FFE\u2126]

:
: .NET, Java, JavaScript, Python
,
Greek http://www.unicode.org/Public/UNIDATA/Scripts.txt :

1. ;.* . ,
.

2. ^ ,
^ $
\u
\u. \.\.
-\u .

3. , , \s+ , . . , \u / \u , ,
Scripts.txt.

,
, .
, .
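The three search-and-replace steps above can also be scripted. The following is a sketch under stated assumptions: it uses Python's `re` module, the function name `script_lines_to_class` is hypothetical, and the two sample lines only imitate the layout of Scripts.txt (code point or range, semicolon, script name, comment) rather than quoting the file.

```python
import re

def script_lines_to_class(lines: str) -> str:
    """Turn Scripts.txt-style lines into a character class of \\uFFFF tokens."""
    # Step 1: delete the semicolon and everything after it on each line.
    text = re.sub(r";.*", "", lines)
    # Step 2: put \u in front of each line and turn ".." into "-\u".
    text = re.sub(r"(?m)^(?=\S)", r"\\u", text)
    text = re.sub(r"\.\.", r"-\\u", text)
    # Step 3: remove all whitespace, line breaks included.
    text = re.sub(r"\s+", "", text)
    return "[" + text + "]"

sample = (
    "0370..0373    ; Greek # L&   [4] example comment\n"
    "0375          ; Greek # Sk       example comment\n"
)
greek_class = script_lines_to_class(sample)
print(greek_class)
```

The order of the steps matters: the comments are removed first so that a `..` inside a character name cannot be mistaken for a range.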


\x{}
:

1. ;.* . ,
.

2. ^
^ $
\x{,
\x{. \.\.
}-\x{ .

3. , , \s+ , } .
. ,
\x{ / \x{ ,
Scripts.txt.

:
[\x{0370}-\x{0373}\x{0375}\x{0376}-\x{0377}\x{037A}\x{037B}-\x{037D}
\x{0384}\x{0386}\x{0388}-\x{038A}\x{038C}\x{038E}-\x{03A1}
\x{03A3}-\x{03E1}\x{03F0}-\x{03F5}\x{03F6}\x{03F7}-\x{03FF}
\x{1D26}-\x{1D2A}\x{1D5D}-\x{1D61}\x{1D66}-\x{1D6A}\x{1DBF}
\x{1F00}-\x{1F15}\x{1F18}-\x{1F1D}\x{1F20}-\x{1F45}\x{1F48}-\x{1F4D}
\x{1F50}-\x{1F57}\x{1F59}\x{1F5B}\x{1F5D}\x{1F5F}-\x{1F7D}
\x{1F80}-\x{1FB4}\x{1FB6}-\x{1FBC}\x{1FBD}\x{1FBE}\x{1FBF}-\x{1FC1}
\x{1FC2}-\x{1FC4}\x{1FC6}-\x{1FCC}\x{1FCD}-\x{1FCF}\x{1FD0}-\x{1FD3}
\x{1FD6}-\x{1FDB}\x{1FDD}-\x{1FDF}\x{1FE0}-\x{1FEC}\x{1FED}-\x{1FEF}
\x{1FF2}-\x{1FF4}\x{1FF6}-\x{1FFC}\x{1FFD}-\x{1FFE}\x{2126}
\x{10140}-\x{10174}\x{10175}-\x{10178}\x{10179}-\x{10189}
\x{1018A}\x{1D200}-\x{1D241}\x{1D242}-\x{1D244}\x{1D245}]

:
: PCRE, Perl, Ruby 1.9

.
http://www.unicode.org - Unicode
Consortium, , , .
, .
Unicode Explained,
(Jukka K. Korpella) (OReilly).


, , , .
, . ASCII .

2.8.

, Mary, Jane, and Sue went to Mary's house


Mary, Jane, Sue Mary.
.

Mary|Jane|Sue

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


. Mary|Jane|Sue Mary,
Jane, Sue.
, .
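As a quick check of the alternation, here is a minimal Python sketch that applies Mary|Jane|Sue repeatedly to the sample sentence. The variable names are chosen for this example only.

```python
import re

subject = "Mary, Jane, and Sue went to Mary's house"

# findall() applies the regex repeatedly, moving left to right through the subject.
names = re.findall(r"Mary|Jane|Sue", subject)
print(names)
```

The matches come back in the order they occur in the subject text, not in the order the alternatives are listed in the regex.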

, ,
, . , . 1 , -

, . , , , , , ,
. , , , , , 1.
Perl, ,
, .


, .

Mary|Jane|Sue Mary,
Jane, and Sue went to Mary's house,
Mary .
, Find Next ( ) , Mary . .
Jane , . Sue . .
,
.

J,
Mary . , J, Jane , Jane .

, Jane
, Mary,
Mary Jane . . . ,
, .

,
Sue.
Mary. , , , house.
,
. ,
Jane|Janet ,
Her name is Janet. . , Jane Janet Her name is Janet , .

Jane|Janet Jane Her name is Janet, ,


, . , , ,
. ,
.

Jane|Janet J Her name is


Janet, , Jane .
. ,
t.
.

Jane Janet .
, :
Janet|Jane . ,
: , . , .

, \bJane\b|\bJanet\b \bJanet\b|\bJane\b
Janet Her name is Janet. . .

2.12 : \bJanet?\b .
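The effect of the order of alternatives can be verified with a small Python sketch. The variable names are assumptions for this example.

```python
import re

subject = "Her name is Janet."

# The leftmost alternative that can match wins, even if a longer one would also fit.
first = re.search(r"Jane|Janet", subject).group()
swapped = re.search(r"Janet|Jane", subject).group()

# Word boundaries make the order irrelevant, because Jane cannot end before the t.
bounded = re.search(r"\bJane\b|\bJanet\b", subject).group()
optional_t = re.search(r"\bJanet?\b", subject).group()

print(first, swapped, bounded, optional_t)
```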

.
2.9.

2.9.

, Mary, Jane
Sue, . , .
,
yyyy-mm-dd, ,
. , , . , ,
, .
, 9999-99-99, .


\b(Mary|Jane|Sue)\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
\b(\d\d\d\d)-(\d\d)-(\d\d)\b

: None
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

, , . , : \bMary|Jane|Sue\b ,
: \bMary , Jane Sue\b .
Jane Her
name is Janet.

- , . . , . \b(Mary|Jane|Sue)\b , Mary , Jane Sue , .


Her name
is Janet.

J
Janet ,
. . , Mary , . , Jane , . . \b .
e t . J .

, .
Mary-Jane-Sue
, . , , :
\b(\d\d\d\d)-(\d\d)-(\d\d)\b .

yyyy-mm-dd.
\b\d\d\d\d-\d\d\d\d\b . -


, , .

\b(\d\d\d\d)-(\d\d)-(\d\d)\b
. , .
(\d\d\d\d) 1.
(\d\d) 2. (\d\d) 3.

, , , , .
2008-05-24, 2008
, 05
24 .
. 2.10 , . 2.21 , . 3.9, , ,
.

\b(Mary|Jane|Sue)\b .
:
\b(?:Mary|Jane|Sue)\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

(?: . ) .
, .

, , . : , ,
.
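A minimal Python sketch of capturing and noncapturing groups, using the regexes shown above. The subject strings are invented for the example.

```python
import re

# Three capturing groups: year, month, day (numbered 1 to 3, left to right).
date = re.search(r"\b(\d\d\d\d)-(\d\d)-(\d\d)\b", "Released on 2008-05-24.")
year, month, day = date.groups()
print(year, month, day)

# The group limits the alternation, so \b applies to every name.
no_janet = re.search(r"\b(Mary|Jane|Sue)\b", "Her name is Janet")
print(no_janet)

# A noncapturing group groups without storing anything.
grouped_only = re.search(r"\b(?:Mary|Jane|Sue)\b", "Mary and Sue")
print(grouped_only.group(), grouped_only.groups())
```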



.
( 2.10), ( 2.21) ( 3.9), ,
, .

,
.


2.1 , .NET, Java, PCRE, Perl Ruby , : sensitive(?i)caseless(?-i)sensitive . , (?i) ,
.

:
\b(?i:Mary|Jane|Sue)\b

:
: .NET, Java, PCRE, Perl, Ruby
sensitive(?i:caseless)sensitive

:
: .NET, Java, PCRE, Perl, Ruby
, ,
, .
. ,
(?i:...)
.
, : (?ism:group) . , , : (?-ism:group)
. (?i-sm:group)
(i) (s) ^ $
(m). 2.4 2.5.


.
2.10, 2.11, 2.21 3.9.

2.10.

,
yyyy-mm-dd. ,
, . , 2008-08-08. , , , . ,
9999-99-99, . .

\b\d\d(\d\d)-\1-\1\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

, , . , 2.9.
.
, . 10 99 \10 \99 .

\01 . .
, \xFF .

\b\d\d(\d\d)-\1-\1\b , 2008-08-08, \d\d


20. , .


\d\d 08 , . 08
1. , . . : 08. .
, . .
. , 2008-08-08. -
08.
( 2.12) ( 2.13), . ,
.
2008-05-24 2007-07-07, , \b\d\d(\d\d) 2008, 08
( ) . . 08 05 .

, . . , 0 , \1
.

,
\b\d\d(\d\d) 2007,
07. .
07 . ,
,
. 2007-07-07.
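The backreference behavior walked through above can be reproduced with a short Python sketch. The name magic_date is an assumption for this example.

```python
import re

magic_date = re.compile(r"\b\d\d(\d\d)-\1-\1\b")

# \1 must repeat exactly the text that group 1 captured.
same = magic_date.search("2008-08-08")
different = magic_date.search("2008-05-24")
second = magic_date.search("2008-05-24 2007-07-07")

print(same.group(), same.group(1))
print(different)
print(second.group())
```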

, . \b\d\d\1-(\d\d)-\1
\b\d\d\1-\1-(\d\d)\b . , . JavaScript,
, ,
.


, , ,
, . , ,
. (^)\1 , ^ , ; \1 . , .

JavaScript ,
. JavaScript
, , ,
JavaScript, ,
, , , .
\b\d\d\1-\1-(\d\d)\b JavaScript
12--34.

.
2.9, 2.11, 2.21 3.9.

2.11.

,
yyyy-mm-dd ,
. , , . , , , . ,
year, month day.
,
yyyy-mm-dd. , , .
, 2008-08-08. (08 ) magic.
, , , . -


, 9999-99-99, .


\b(?<year>\d\d\d\d)-(?<month>\d\d)-(?<day>\d\d)\b

:
: .NET, PCRE 7, Perl 5.10, Ruby 1.9
\b(?'year'\d\d\d\d)-(?'month'\d\d)-(?'day'\d\d)\b

:
: .NET, PCRE 7, Perl 5.10, Ruby 1.9
\b(?P<year>\d\d\d\d)-(?P<month>\d\d)-(?P<day>\d\d)\b

:
: PCRE 4 and later, Perl 5.10, Python


\b\d\d(?<magic>\d\d)-\k<magic>-\k<magic>\b

:
: .NET, PCRE 7, Perl 5.10, Ruby 1.9
\b\d\d(?'magic'\d\d)-\k'magic'-\k'magic'\b

:
: .NET, PCRE 7, Perl 5.10, Ruby 1.9
\b\d\d(?P<magic>\d\d)-(?P=magic)-(?P=magic)\b

:
: PCRE 4 and later, Perl 5.10, Python

2.9 2.10
. ,
. , .
.
, .
.
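In Python, the (?P<name>...) and (?P=name) forms shown above can be tried directly. The following is a small sketch; the variable names are assumptions for this example.

```python
import re

named_date = re.compile(r"\b(?P<year>\d\d\d\d)-(?P<month>\d\d)-(?P<day>\d\d)\b")
m = named_date.search("2008-05-24")
print(m.group("year"), m.group("month"), m.group("day"))
print(m.groupdict())

# (?P=magic) is a named backreference; it behaves like \1 here.
magic_date = re.compile(r"\b\d\d(?P<magic>\d\d)-(?P=magic)-(?P=magic)\b")
magic = magic_date.search("2008-08-08")
print(magic.group("magic"))
```

Named groups still receive numbers, so m.group(1) returns the same text as m.group("year").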



. ,
, .
Python , .
: (?P<name>regex) .
, \w . (?P<name> ,
) .

Regex .NET
, . (?<name>regex) Python, P.
, \w . (?<name> , ) .

, XML, DocBook XML. .NET


: (?'name'regex) . . , . .

.NET
Python .NET. Perl 5.10 Oniguruma Ruby 1.9.
PCRE Python , Perl . PCRE 7, Perl 5.10, .NET,
Python. , - PCRE
, Perl 5.10 Python. PCRE Perl 5.10 .NET
Python .
, .
PHP
PHP,
PCRE, Python.
,
.NET Ruby, .NET,
. PHP/PCRE
Python. , PCRE,


,
. , .NET Ruby,
P .
PCRE 7 Perl 5.10 Python, , . , PCRE
PHP.



. ,

,
.
.
Python name (?P=name) . , , , .
. (?P=name) , \1 .

.NET \k<name>
\k'name' . . , ,
, , .

.
, . Perl 5.10 Ruby 1.9 .NET, , .NET
. , .
, .

.
2.9, 2.10, 2.21 3.9.


2.12.

, :
( , 100 ).
32- .

32- h.

,
. .

\b\d{100}\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


\b[a-f0-9]{1,8}\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


\b[a-f0-9]{1,8}h?\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


\d*\.\d+(e\d+)?

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

{n} , n ,
n . -


\d{100} \b\d{100}\b 100


. , \d 100 .


{0} ,
. ab{0}c ac.

{1} ,
. ab{1}c abc .


{n,m} , n m , n. \b[a-f0-9]{1,8}\b
. .
2.13.

n m , . \b\d{100,100}\b
\b\d{100}\b .

{n,} , n , . , ,
.

\d{1,} , \d+ . , , . 2.13 , .

\d{0,} , , \d* . . ,
, .


, n , , , . h{0,1} h
. h,
h{0,1} . h{0,1} ,


, h. h
( h).

h? , h{0,1} . , , , .
, .
,
, . Perl , , Perl
. , , , . ,
. , (? .
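The quantifiers from this recipe can be exercised with a brief Python sketch. The test strings are invented, and re.IGNORECASE is added here as an assumption so that uppercase hexadecimal digits are accepted.

```python
import re

# {n}: exactly n repetitions.
hundred_digits = re.compile(r"\b\d{100}\b")
exact = hundred_digits.fullmatch("7" * 100)
too_short = hundred_digits.fullmatch("7" * 99)

# {n,m} followed by an optional suffix: 1 to 8 hex digits, then an optional h.
hex_number = re.compile(r"\b[a-f0-9]{1,8}h?\b", re.IGNORECASE)
hex_found = hex_number.findall("DEADBEEF 1fh 123456789")

# ? after a group makes the whole exponent optional.
floating = re.compile(r"\d*\.\d+(e\d+)?")
with_exponent = floating.fullmatch(".5e10")
no_fraction = floating.fullmatch("12.")

print(bool(exact), bool(too_short), hex_found, bool(with_exponent), bool(no_fraction))
```

The nine-digit number is rejected because no word boundary exists after its eighth digit.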


,
, . (?:abc){3}
abcabcabc .

. (e\d+)? e, , . , , .
. 2.9, ,
, , . (\d\d){1,3} , . . 123456, 56, 56 . , 12 34, .

(\d\d){3} , \d\d\d\d(\d\d) . ,
, , , ,
, -


: ((?:\d\d){1,3}) . ,
. : ((\d\d){1,3}) .
123456, \1 123456, \2 56.

.NET , . Value
, , 56,
.
56, CaptureCollection , 56, 34 12.
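A short Python sketch of what a repeated capturing group stores. Python, like most flavors discussed here, keeps only the last iteration; the variable names are assumptions for this example.

```python
import re

# The group is overwritten on each iteration, so only the last pair survives.
repeated = re.fullmatch(r"(\d\d){1,3}", "123456")
print(repeated.group(1))

# Capture everything by wrapping a noncapturing repeated group.
wrapped = re.fullmatch(r"((?:\d\d){1,3})", "123456")
print(wrapped.group(1))

# Two capturing groups: the outer one holds all pairs, the inner one the last pair.
both = re.fullmatch(r"((\d\d){1,3})", "123456")
print(both.groups())
```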

.
2.9, 2.13, 2.14.

2.13.

, <p> </p>
XHTML . XHTML.

<p>.*?</p>

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

, 2.12,
, , ,
.
XHTML ( XML , ). XHTML:

<p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
<p>
Then you have to find the end of the paragraph
</p>

<p> </p>. <p> </p>, . , <em>, ,


<.
:
<p>.*</p>

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
. , 2.12.
<p>
, .* . , . .
, .*
, . : .*
XHTML, .

.* , < . . :
.
, , .
. ,
, , .

< , , .* .
< . ,


, <
. , < . <
, .

< , / . . , </p> .

? ,
XHTML, <p> </p>. , XHTML, <p> </p>,
.
.
, : *? , +? , ?? {7,42}? .

,
. , .
,
. , , .

<p>.*?</p> ,
XHTML. <p> , .*? . </p>
<p>, . .*? , . </p> , .*? .
, </p> .*?
. , .*?
XHTML.
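The difference between the lazy and the greedy version can be seen in a small Python sketch using the two sample paragraphs. re.DOTALL is passed because the sample spans several lines; the variable names are assumptions for this example.

```python
import re

xhtml = (
    "<p>\nThe very <em>first</em> task is to find the beginning of a paragraph.\n</p>\n"
    "<p>\nThen you have to find the end of the paragraph\n</p>"
)

# Lazy: .*? expands only as far as needed, so each paragraph is a separate match.
lazy = re.findall(r"<p>.*?</p>", xhtml, re.DOTALL)

# Greedy: .* runs to the end and backtracks only to the LAST </p>.
greedy = re.findall(r"<p>.*</p>", xhtml, re.DOTALL)

print(len(lazy), len(greedy))
```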


* *?
. ,
.
. .
, , . , , 2.12, - , . , ,
, , , . \d , \b , \d , ,
( ).

\d+\b \d+?\b
.
,
-.

\d+\b 1234 \d+ .


\b . \d+?\b , \d+? 1.
\b 1 2 . \d+? 12 \b . ,
\d+? 1234, \b .

1234X, , \d+\b , - 1234.


\b . \d+
, 123. \b . \d+ , , 1, \b .
.

\d+?\b 1234X
\d+?
1. \b
1 2 . \d+?
12, \b .
, \d+? 1234, -


\b . \d+? , \d
X .
.
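The walkthrough of \d+\b and \d+?\b can be confirmed with a few lines of Python. This is a sketch with invented variable names: both regexes reach the same result, they only get there by different routes.

```python
import re

greedy_ok = re.search(r"\d+\b", "1234").group()
lazy_ok = re.search(r"\d+?\b", "1234").group()

# With an X glued to the digits there is no word boundary after any digit.
greedy_fail = re.search(r"\d+\b", "1234X")
lazy_fail = re.search(r"\d+?\b", "1234X")

# Without the \b, laziness does change the match itself.
lazy_alone = re.search(r"\d+?", "1234").group()

print(greedy_ok, lazy_ok, greedy_fail, lazy_fail, lazy_alone)
```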

\d+ ,
,
. . , \b\d+\b , . , \b\d++\b ,
.

.
2.8, 2.9, 2.12, 2.14 2.15.

2.14.


, .
.

\b\d+\b ,
\b\d+?\b . .
. . , .

\b\d++\b

:
: Java, PCRE, Perl 5.10, Ruby 1.9

.
.
\b(?>\d+)\b

:
: .NET, Java, PCRE, Perl, Ruby


.
,
, .
JavaScript Python , .
.

: . ,
,
,
. .
,
. , : *+ , ++ , ?+ {7,42}+ .

Java 4 java.util.regex. PCRE, ( 4 7), . Perl , Perl 5.10.


Ruby , Oniguruma, Ruby 1.9, .
, .
, , ,
. : (?>regex) , regex .
, . , (?> .

\b\d++\b ( ) 123abc 456 \b


, \d++
123. \b\d+\b (
). 3 a
\b .

.
,


, .
1.
,
, . , ,
2.5. , 4,
456.
,
\b
.
( ) \b 2 3, 1 2.


.
\b(?>\d+)\b () 123abc 456 . \d+ 123. , \d+ , . \b , ,
. , 456.

, , .
, ,
. x++ (?>x+) , .
, .


,

,
.

\w++\d++ (?>\w+\d+) .
\w++\d++ , (?>\w+)(?>\d+) ,


abc123. \w++
abc123,
\d++ .
, , \d++ . .


(?>\w+\d+) , . . , . abc123 \w+ abc123. . \d+ , \w+ . \d+ 3. ,
\w+ \d+ .
, . .

, , , (?>\w+\d+)\d+ , , \w++\d++ . \d+ . , .


.
, ,
, .
, , . ,
.

.
2.12 2.15.


2.15.

,
HTML, html, head, title
body .
HTML, .

<html>(?>.*?<head>)(?>.*?<title>)(?>.*?</title>)
(?>.*?</head>)(?>.*?<body[^>]*>)(?>.*?</body>).*?</html>

: ,
: .NET, Java, PCRE, Perl, Ruby
JavaScript Python . .
JavaScript Python
, , .

, :
<html>.*?<head>.*?<title>.*?</title>
.*?</head>.*?<body[^>]*>.*?</body>.*?</html>

: ,
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
,
HTML, . .*? , . , ,
. 2.4 2.13.

, , - HTML.
</html>.


, .*? .
</html> , .*?
. ,
.

. .*? , . .*? , , </body>. , , </body> . .*? ,


.
.*? , .

O(n^7)
, . , .
,
128 ,
.
.
, .
,
.
,
,

. ,

.

, .
.*? , </body> . </html> ,
html.


, , : (?>.*?</body>) . , (?>.*?</body>) , </body> .


</html> , (?>.*?</body>) .

.*? ,
.
- , .
O(n), . , .

,
(x+x+)+y xxxxxxxxxx.
,
x. ,

.
x, Perl .

, , Perl ,

.
O(2^n). y , x+ , . , , ,
x+ xxx, x+ x, x+ x. x 1024 .
32, 4 , , , , .



, xx+y , , . .
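The cost of the nested quantifiers can be observed with a rough Python timing sketch. The helper name failed_search_seconds is an assumption for this example, the subject lengths are kept small on purpose, and the absolute timings depend on the machine and the Python version.

```python
import re
import time

def failed_search_seconds(pattern: str, n: int) -> float:
    """Time a search that cannot succeed: n x's with no y after them."""
    subject = "x" * n
    start = time.perf_counter()
    result = re.search(pattern, subject)
    elapsed = time.perf_counter() - start
    assert result is None
    return elapsed

for n in (8, 12, 16):
    nested = failed_search_seconds(r"(x+x+)+y", n)
    simple = failed_search_seconds(r"xx+y", n)
    print(f"n={n:2d}  (x+x+)+y: {nested:.5f}s   xx+y: {simple:.5f}s")
```

Both regexes accept the same strings, so rewriting the nested version costs nothing in terms of what is matched.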

, ,

. ,
.
() , , , .

.
2.13 2.14.

2.16.

, <b> </b>
HTML,
. , My <b>cat</b> is furry,
cat.

(?<=<b>)\w+(?=</b>)

:
: .NET, Java, PCRE, Perl, Python, Ruby 1.9
JavaScript and Ruby 1.8 support the lookahead (?=</b>) but not the lookbehind (?<=<b>).


, ,
,
. -


, ,
.
.
, , . (?<=text) .
(?<= . , text , .
, , (?<=<b>) , .

, ,
, , ,
. My <b>cat</b> is furry (?<=<b>) , c.
,
. <b>
c. , , . , , , . c . <b> . , ,
. .

\w+ . cat. \w+ - ,


cat . , \w+ cat, , .

,
. . , . (?=regex) . (?=
. , , regex .


\w+ (?<=<b>)\w+(?=</b>) cat My <b>cat</b> is furry, .


, ,
, . </b> , . , . , , , .
cat. ,
cat.

(?!regex) . , , , ,
, , ,
.
.
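A minimal Python sketch of the lookbehind and lookahead from the solution. The tags are checked but are not part of the match; the variable names are assumptions for this example.

```python
import re

subject = "My <b>cat</b> is furry"

m = re.search(r"(?<=<b>)\w+(?=</b>)", subject)
print(m.group(), m.span())

# Negative lookahead: a word that is NOT followed by </b>.
not_bold = re.search(r"\b\w+\b(?!</b>)", subject)
print(not_bold.group())
```

The span shows that the overall match covers only the three letters of the word.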


. ,
. , .

(?<!text) . ,
, , .


.
, , . ,
, .


. , .
. . : ,
, .
. Perl, Python Ruby 1.9 ,
. (?<=one|two|three|fortytwo|gr[ae]y) ,
.

Perl, Python Ruby 1.9


. , one|two , , gray|grey ,
, three , , , , forty-two .

PCRE Java .
, . ,
, * , + {42,} .
PCRE Java , .
. ,
,
, .

, . , . JavaScript Ruby 1.8,


.


, .
.NET 1,
. ,
.



,
, - , -
.
, .
2.3 , . 58 ( 2.3) , Thai.
.NET
Java.
, ()
Thai ( ).
:
(?=\p{Thai})\p{N}

:
: PCRE, Perl, Ruby 1.9
, , 2.7.

, .

, RegexBuddy,
, () , RegexOptions.RightToLeft, .NET, .



(?=\p{Thai})\p{N} , , , .
Thai ( \p{Thai} ), . ,
.

Thai, \p{Thai} . (?=\p{Thai}) . , . Thai . \p{N} . , \p{N} ,


\p{Thai} . Number ,
\p{N} . \p{N}
- ,
.


, , .
, , ,
. . 2.15 .
.
, , , . , ,
.

( ,
) . ,
, , . ,


, . , , .
,
, ,
.
:
(?=(\d+))\w+\1

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
123x12. \d+ 12 , \w+ 3x,
, , \1 12.

.
.
\d+ 123.
. , , , 123, .

\w+ . 123x12. \1 , 123, . \w+ . \1 . \w+ ,


1. \1 , 1, .

12 \1 ,
123 12, .

.
\w+ , , , \d+ , . .
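The atomic behavior of the lookahead can be checked with a small Python sketch. The second subject, 456x56, is an extra test string added for this example.

```python
import re

atomic_lookahead = re.compile(r"(?=(\d+))\w+\1")

# Inside the lookahead, \d+ takes all the digits it can and is never asked to give
# any back, so group 1 is 123 at the first position and \1 can never match.
no_match = atomic_lookahead.search("123x12")
print(no_match)

# Starting at the 5, group 1 holds 56, which does occur again after the x.
match = atomic_lookahead.search("456x56")
print(match.group(), match.group(1))
```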



,
Python JavaScript, .
,
, .
.
(<b>)(\w+)(?=</b>)

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
<b> .
, \w+ , .

My <b>cat</b> is
furry, <b>cat. <b>, cat.
, cat (
<b>), , ,
, , .

,
,
. ,
, . ,
, .
2.21.
, . . , , , .
, ,
( \z $ ). . , , .


JavaScript :
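var mainregexp = /\w+(?=<\/b>)/;
var lookbehind = /<b>$/;
var match;
if (match = mainregexp.exec("My <b>cat</b> is furry")) {
    // Found a word before a closing tag </b>
    var potentialmatch = match[0];
    var leftContext = match.input.substring(0, match.index);
    if (lookbehind.exec(leftContext)) {
        // Lookbehind matched:
        // potentialmatch occurs between a pair of <b> tags
    } else {
        // Lookbehind failed:
        // potentialmatch is no good
    }
} else {
    // Unable to find a word before a closing tag </b>
}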

.
5.5 5.6.

2.17.

, one, two
three, .
.

\b(?:(?:(one)|(two)|(three))(?:,|\b)){3,}(?(1)|(?!))(?(2)|(?!))(?(3)|(?!))

:
: .NET, JavaScript, PCRE, Perl, Python
Java Ruby . Java Ruby ( ) ,
,
.
\b(?:(?:(one)|(two)|(three))(?:,|\b)){3,}

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


.NET, JavaScript, PCRE, Perl Python


. (?(1)then|else) , . , then . , else .

, . . then else . , - . .

then else . .
, then . , , .

, (?!),
else. ,
, , . (?(1)|(?!)) , -.
,
,
.
.NET .
(?(name)then|else) , name .

, ,
(a)?b(?(1)c|d) . , , abc|bd) .

a, .
. , , ,
. a ,


-. .

(a?) , .
, .
a .

a , b . .
,
( ), c . d .

, (a)?b(?(1)c|d)
ab, c, b,
d.
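Python supports conditionals that test a capturing group, so both regexes from this recipe can be tried as written. This is a sketch; the variable names and test strings are assumptions for this example.

```python
import re

# If group 1 took part in the match, c must follow; otherwise d must follow.
conditional = re.compile(r"(a)?b(?(1)c|d)")
results = {s: bool(conditional.fullmatch(s)) for s in ("abc", "bd", "abd", "bc")}
print(results)

# Each empty "then" part succeeds; the empty negative lookahead (?!) always fails.
all_three = re.compile(
    r"\b(?:(?:(one)|(two)|(three))(?:,|\b)){3,}(?(1)|(?!))(?(2)|(?!))(?(3)|(?!))"
)
complete = all_three.search("three,one,two,one")
incomplete = all_three.search("one,one,one")
print(complete.group(), incomplete)
```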
.NET, PCRE Perl, Python, . (?(?=if)then|else) (?=if) , . , 2.16.
, then . else .
, then
else ,
if ,
.

. , , , then else.
, ,
, : (?=if)then|(?!if)else . ,
then .
.
. , if ,
, (?=if)
.
else . ,
if .


.
2.9 2.16.

2.18.

\d{4}-\d{2}-\d{2} yyyy-mmdd, . , , .
,
, .

\d{4}    # Year
-        # Separator
\d{2}    # Month
-        # Separator
\d{2}    # Day

:
: .NET, Java, PCRE, Perl, Python, Ruby


. , , .
, , JavaScript, , .
,
. .
.NET RegexOptions.IgnorePatternWhitespace. Java Pattern.COMMENTS. Python re.VERBOSE, Perl Ruby /x.
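In Python, free-spacing mode is enabled with re.VERBOSE, or with the (?x) mode modifier at the very start of the regex. The following minimal sketch uses the date regex from this recipe; the variable names are assumptions for this example.

```python
import re

date_verbose = re.compile(r"""
    \d{4}   # Year
    -       # Separator
    \d{2}   # Month
    -       # Separator
    \d{2}   # Day
""", re.VERBOSE)

# The same regex with the mode modifier instead of the flag.
date_inline = re.compile(r"(?x) \d{4} - \d{2} - \d{2}  # yyyy-mm-dd")

# In free-spacing mode a literal space needs a character class,
# and a literal hash needs a backslash.
literal_chars = re.compile(r"a [ ] b  |  a \# b", re.VERBOSE)

print(bool(date_verbose.fullmatch("2008-05-24")), bool(date_inline.fullmatch("2008-05-24")))
```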


. (#), . ,
( , ).
, , .
, [#]
\# .

, , , ,
. ,
[] \ . \x20 \u0020 \x{0020} . , \t . \r\n (Windows) \n
(UNIX/Linux/OS X).


. .
, .
, .

Java,

, . Java .
Java . Java ,
. [] [#] . \u0020 \# .


(?#Year)\d{4}(?#Separator)-(?#Month)\d{2}-(?#Day)\d{2}

:
: .NET, PCRE, Perl, Python, Ruby
- , (?#comment) . (?# ) .

, JavaScript, , . Java .
(?x)\d{4}    # Year
-            # Separator
\d{2}        # Month
-            # Separator
\d{2}        # Day

:
: .NET, Java, PCRE, Perl, Python, Ruby

,
(?x) . , (?x) .
.

2.1
. 51.

2.19.

,
, : $%\*$1\1.

$%\*$$1\1

: .NET, JavaScript


\$%\\*\$1\\1

: Java
$%\*\$1\\1

: PHP
\$%\*\$1\\1

: Perl
$%\*$1\\1

: Python, Ruby


, .
, , . .
,
,
,
. $1 / \1 . 2.21 , .

, ,
.

.NET JavaScript
.NET JavaScript .
, .
. , , , , , ,
. , .
, ,
.
, .

$$%\*$$1\1

: .NET, JavaScript
, .NET ,
. .NET ${group} . JavaScript .

Java
Java
. . , .

PHP
PHP , , , .
. , , \\\\ . .

Perl
Perl : . , , $1 , Perl . ,
, .

, Perl \1 . , ,
. , , , .

. , , \\\\ . .


Python Ruby
Python Ruby . , ,
.

Python \1 \9 \g< . .
Ruby ,
, , , .
,
. , , \\\\ . .



,
.
.

, replace()
, . ,
, , , , , .
RegexBuddy , .
, .
, , . .

.
3.14.


2.20.

, URL
HTML, , URL .
, URL http:, . , Please visit http://www.regexcookbook.com Please visit <a href="http://www.regexcookbook.com">http://www.regexcookbook.com</a>.


http:\S+

:
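Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


<a href="$&">$&</a>

Replacement text flavors: .NET, JavaScript, Perl


<a href="$0">$0</a>

Replacement text flavors: .NET, Java, PHP


<a href="\0">\0</a>

Replacement text flavors: PHP, Ruby
<a href="\&">\&</a>

Replacement text flavors: Ruby
<a href="\g<0>">\g<0></a>

Replacement text flavors: Python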

, .
, - ,
Python.
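In Python, the overall match is inserted with \g<0>. Here is a minimal sketch of the URL replacement using the sample sentence from this recipe; the variable names are assumptions for this example.

```python
import re

subject = "Please visit http://www.regexcookbook.com"

# \g<0> stands for the whole regex match in Python replacement text.
linked = re.sub(r"http:\S+", r'<a href="\g<0>">\g<0></a>', subject)
print(linked)
```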


Perl $& .
Perl .

.NET JavaScript $& . Ruby ,


\& .

Java, PHP Python


, , .
0.
Python,
.
Python \0 .

.NET Ruby
, ,
. .

.
1
3.15.

2.21.

10
, 1234567890.
, (123) 456-7890.


\b(\d{3})(\d{3})(\d{4})\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


($1) $2-$3

Replacement text flavors: .NET, Java, JavaScript, PHP, Perl

(${1}) ${2}-${3}

Replacement text flavors: .NET, PHP, Perl


(\1) \2-\3

Replacement text flavors: PHP, Python, Ruby


2.10 , , . , , . , , .
, Python Ruby, \1 ,
.
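A short Python sketch of the phone number reformatting, using both numbered and named groups. The variable names are assumptions for this example.

```python
import re

subject = "1234567890"

numbered = re.sub(r"\b(\d{3})(\d{3})(\d{4})\b", r"(\1) \2-\3", subject)

named = re.sub(
    r"\b(?P<area>\d{3})(?P<exchange>\d{3})(?P<number>\d{4})\b",
    r"(\g<area>) \g<exchange>-\g<number>",
    subject,
)

# \g<1>0 is group 1 followed by a literal zero; \10 would mean group 10.
group_then_zero = re.sub(r"(\d)", r"\g<1>0", "7")

print(numbered, named, group_then_zero)
```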
Perl $1 , . PHP
.

Perl $1 , .
, . .NET, Java,
JavaScript PHP $1
.
. 3.

$10
, ,
99 .
$10 \10 . 10, , 0 .

.NET, PHP Perl


, . ${10} 10, ${1}0
, 0 .

Java JavaScript $10 . , -


, . , , , . $23 23,
.
,
3 .

.NET, PHP, Perl, Python Ruby $10 \10 10 .


, ,
.


, ,
. $4 \4 , .
.

Java Python ,
. . (
.) , $4 \4
, . 2.19.

PHP, Perl Ruby , , . , , , .


, .NET JavaScript , .
,
, .
.

,

\b(?<area>\d{3})(?<exchange>\d{3})(?<number>\d{4})\b

:
: .NET, PCRE 7, Perl 5.10, Ruby 1.9

\b(?'area'\d{3})(?'exchange'\d{3})(?'number'\d{4})\b

:
Regex flavors: .NET, PCRE 7, Perl 5.10, Ruby 1.9
\b(?P<area>\d{3})(?P<exchange>\d{3})(?P<number>\d{4})\b

:
Regex flavors: PCRE 4 and later, Perl 5.10, Python


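(${area}) ${exchange}-${number}

Replacement text flavors: .NET
(\g<area>) \g<exchange>-\g<number>

Replacement text flavors: Python
(\k<area>) \k<exchange>-\k<number>

Replacement text flavors: Ruby 1.9
(\k'area') \k'exchange'-\k'number'

Replacement text flavors: Ruby 1.9
($1) $2-$3

Replacement text flavors: .NET, PHP, Perl 5.10


(${1}) ${2}-${3}

Replacement text flavors: .NET, PHP, Perl 5.10


(\1) \2-\3

Replacement text flavors: PHP, Python, Ruby 1.9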

,
.NET, Python Ruby 1.9 , .
.NET Python
, .
.
Ruby
, .
Ruby 1.9 : \k<group> \k'group' . .


Perl 5.10 PHP ( PCRE) ,


. , , .
Perl 5.10 PCRE
.
.NET, Python Ruby 1.9 .
.NET , 2.11.
.NET, Python Ruby . , , . .

.
1
2.9, 2.10, 2.11 3.15.

2.22.

, , ,
,
. , BeforeMatchAfter
Match, BeforeBeforeMatchAfterAfter, BeforeBeforeBeforeMatchAfterAfterAfter.

$`$`$&$'$'

Replacement text flavors: .NET, Perl
\`\`\&\'\'

Replacement text flavors: Ruby
$`$`$&$'$'

Replacement text flavors: JavaScript


, . : ,
. ,
, , , , , . , .

.NET Perl $` , $ $_ ,

. Perl ,
, , . $` .
U.S. , 1, . $ .
. , U.S.,
Enter. $_
. .NET Perl JavaScript
$` $ . JavaScript , .
, $& .

Ruby \` \ , \& . JavaScript,


, .
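Python's replacement text has no tokens for the match context, but a replacement function can build the same result from the match object. The following is a sketch; the function name with_context is an assumption for this example.

```python
import re

def with_context(m: re.Match) -> str:
    """Return left context twice, the match, then right context twice."""
    before = m.string[:m.start()]   # text to the left of the match
    after = m.string[m.end():]      # text to the right of the match
    return before + before + m.group() + after + after

result = re.sub(r"Match", with_context, "BeforeMatchAfter")
print(result)
```

The replacement is inserted between the untouched left and right context, which is why each of them appears three times in the result.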

.
1
3.15.





,
. ,
; .

.
, ,

.
.
,
, , , . , - , , , ,
.
4 8 , .
. , , ,
,
.


,
, . . , ,
. ,

.
. 21 , .
. 25 , , . , , .

,
. , .
,
.
, .
C#
C# Microsoft
.NET. System.Text.RegularExpressions
.NET.
C# 1.0 3.5 Visual Studio
2002 2008.
VB.NET
VB.NET Visual Basic.NET,
, Visual Basic 2002 , Visual Basic 6 . Visual Basic Microsoft .NET. System.Text.RegularExpressions .NET. Visual Basic 2002
2008.
Java
Java 4 , java.util.regex. -


java.util.regex Java. Java 4, 5 6.


JavaScript
, JavaScript. -
: Internet Explorer (
5.5), Firefox, Opera, Safari Chrome. JavaScript .
, JavaScript , ECMA-262. ECMAScript, JavaScript JScript.
, ECMA-262v3 ,
JavaScript.
JavaScript.
PHP
PHP ,
. preg. preg,
PHP, 4.2.0. PHP 4 5. preg PHP PCRE.
PCRE PCRE.
PCRE , , PHP preg_replace().
PHP.
mb_ereg
PHP, ,
, . PHP 5 mb_ereg Onigurama,
Ruby.
Onigurama Ruby 1.9. mb_ereg ,
mb_ PHP.


ereg
PHP
, PHP 5.3.0.

POSIX ERE.
.
POSIX ERE Ruby 1.9
PCRE. ,
ereg, mb_ereg preg. preg Perl
( 3.1).
Perl
, Perl , .
, m// s/// Perl, Perl. Perl 5.6,
5.8 5.10.
Python
Python re.
, ,
Python. Python 2.4 2.5.
Ruby
Ruby
. Ruby 1.8
1.9. Ruby
. Ruby 1.9
Oniguruma, , Ruby 1.8. , ,
. 22.

Ruby 1.8 1.9.
, Ruby 1.9. Ruby,
, , , Ruby Oniguruma. Ruby 1.8
Oniguruma.



, , ,
.
, , .
ActionScript
ActionScript
ECMA-262, Adobe. 3.0,
ActionScript ECMA-262v3. JavaScript. ,
ActionScript JavaScript. , JavaScript,
ActionScript.
C
C . , , , ,
PCRE, . C http://
www.pcre.org. ,
.
C++
C++ . , , , , PCRE, .
C API
- C++,
PCRE (
http://www.pcre.org).
Delphi for Win32
Delphi
Win32 .
VCL,
. ,
PCRE. Delphi
C , VCL, PCRE,
. .exe.


TPerlRegEx http://www.regexp.info/delphi.html. VCL,


, . PCRE Delphi
TJclRegEx, JCL (http://www.delphi-jedi.
org). TJclRegEx TObject, .
Mozilla Public License.
Delphi Prism
Delphi Prism , .NET.
System.Text.RegularExpressions uses
Delphi Prism, .
, ,
C# VB.NET.
Groovy
Groovy
, java.util.
regex, Java. , Java, Groovy.
Groovy .
, , java.lang.String,
=~ java.util.regex.Matcher.
Groovy Java .
PowerShell
PowerShell , Microsoft, .NET. -match
-replace, PowerShell, .NET, .
R
R grep, sub regexpr base. perl,
FALSE, . TRUE, -


PCRE, . , PCRE 7, R 2.5.0 . R ,


PCRE 4 . , R, .
REALbasic
REALbasic RegEx. PCRE UTF-8. , PCRE, TextConverter REALbasic UTF-8, RegEx.
, PCRE 6, REALbasic.
REALbasic ,
^ $ ( )
. , ,
,
REALbasic.
Scala
Scala scala.util.matching. java.util.
regex Java. , Java Scala,
Java.
Visual Basic 6
Visual Basic 6 Visual Basic,
.NET. , Visual
Basic 6 , .NET.
VB.NET, ,
VB 6.
Visual Basic 6 , ActiveX COM.
Microsoft VBScript,
, 5.5, . -


, JavaScript, ECMA-262v3.
Internet Explorer 5.5 . ,
Windows XP
Vista, Windows, ,
Internet Explorer 5.5 . , Windows .
Visual
Basic, Project|References
VB. Microsoft VBScript
Regular Expressions 5.5,
Microsoft VBScript Regular Expressions 1.0. , 5.5, 1.0. 1.0 , .
, . View|Object Browser.
Object Browser VBScript_RegExp_55.

3.1.

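The regular expression [$"'\n\d/\\] consists of a single character class that matches a dollar sign, a double quote, a single quote, a line feed, any digit between 0 and 9, a forward slash, or a backslash. The listings below show it hardcoded in source code as a string constant or regular expression operator.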

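C#
As a normal string:
"[$\"'\n\\d/\\\\]"

As a verbatim string:
@"[$""'\n\d/\\]"

VB.NET
"[$""'\n\d/\\]"

Java
"[$\"'\n\\d/\\\\]"

JavaScript
/[$"'\n\d\/\\]/

PHP
'%[$"\'\n\d/\\\\]%'

Perl
Pattern-matching operator:
/[\$"'\n\d\/\\]/
m![\$"'\n\d/\\]!

Substitution operator:
s![\$"'\n\d/\\]!!

Python
Raw triple-quoted string:
r"""[$"'\n\d/\\]"""

Normal string:
"[$\"'\n\\d/\\\\]"

Ruby
Literal regex delimited with forward slashes:
/[$"'\n\d\/\\]/

Literal regex delimited with punctuation of your choice:
%r![$"'\n\d/\\]!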


( , ), .
. , RegexBuddy


RegexPal,
. ,
, .

, . ,
,
, ,
. , - . , , , . , . , , .
.

.

C#
C# Regex() - Regex. , , .
C# . , , C++ Java. . , ,
\n . RegexOptions.IgnorePatternWhitespace
( 3.4) , 2.18, \n \\n.
\n ,
. \\n , \n , .

@ . , . -


, . @\n \n , , . \n ,
. .

:
C#, .

VB.NET
VB.NET
Regex() - Regex. , , .
Visual Basic .
. - .

Java
Java
Pattern.compile()
String. , ,
.
Java . .
, ,
\n , , \uFFFF .

Pattern.COMMENTS ( 3.4) , 2.18,


\n \\n. \n , .
\\n , \n , .

JavaScript
JavaScript . . , .


, RegExp
, , .
, .

PHP
, preg PHP, . JavaScript Perl, .
.
ereg mb_ereg. Perl - PCRE
PHP .
Perl. , ,
Perl /regex/, preg PHP
/regex/. Perl, . -
, . , -, . ,
, .
, ,
Perl JavaScript Ruby.
PHP .
( ),
, , . .
,
.

Perl
Perl .

, . -


. - , , , $ @, ,
.

,
, m. (, ),
, m{regex}. , . .
. , , ,
.
.
m s . : s[regex]
[replace].
: s/regex/replace/.
Perl .
m/I am $name/ $name Jan, IamJan . ,
Perl $ ,

.

,
( 2.5).
. Perl
, ,
, ,
, , , . m/^regex$/ , , regex .

@ , Perl .
,
Perl,
.


Python
re Python
.
, Python. , , .
(raw) .
Python -
. . r\d+ , \\d+,
.
, , . ,
, . ,
Python .
.

, 2.7, .
, u.
, \n.
. re. , . \n -
.
re.VERBOSE ( 3.4) , 2.18, \n
\\n r\n . \n , . \\n
r\n , \n , .

, r\n, , . -


, \n ,
, .
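The difference between raw and normal Python strings can be illustrated with a small sketch. It assumes the regex of this recipe is a character class containing a dollar sign, both quote characters, a line feed, \d, a forward slash, and a backslash; the variable names are made up for the example.

```python
import re

# Raw string: backslashes reach the regex engine untouched.
raw_form = r"""[$"'\n\d/\\]"""

# Normal string: backslashes and the double quote must be escaped for Python first.
# Here \n becomes a real line feed before the regex engine ever sees it.
normal_form = "[$\"'\n\\d/\\\\]"

members = ["$", '"', "'", "\n", "7", "/", "\\"]
raw_ok = all(re.fullmatch(raw_form, ch) for ch in members)
normal_ok = all(re.fullmatch(normal_form, ch) for ch in members)

print(raw_ok, normal_ok, raw_form == normal_form)
```

The two strings differ character by character, yet they describe the same character class.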

Ruby
Ruby .
. , .
, %r .
, Regex
, , .
, .
Ruby JavaScript, , Ruby
Regexp, JavaScript
RegExp, camel caps.

.
2.3 ,
, .
3.4 , , .

3.2.

, .


, ,
.

C#
using System.Text.RegularExpressions;

VB.NET
Imports System.Text.RegularExpressions

Java
import java.util.regex.*;

Python
import re

. -, .
, .
. .

C#

C# using, ,
. , System.Text.RegularExpressions.Regex()
Regex().

VB.NET

VB.NET Imports,
, -


. , System.Text.RegularExpressions.
Regex() Regex().

Java
Java , java.
util.regex.

JavaScript
JavaScript .

PHP
preg PHP
4.2.0 .

Perl
Perl
.

Python
Python, re.

Ruby
Ruby .

3.3.

-
, .

C#
, :
Regex regexObj = new Regex("regex pattern");



( UserInput):
try {
Regex regexObj = new Regex(UserInput);
} catch (ArgumentException ex) {
//
}

VB.NET
, :
Dim RegexObj As New Regex("regex pattern")


( UserInput):
Try
Dim RegexObj As New Regex(UserInput)
Catch ex As ArgumentException

End Try

Java
, :
Pattern regex = Pattern.compile("regex pattern");


( userInput):
try {
Pattern regex = Pattern.compile(userInput);
} catch (PatternSyntaxException ex) {
//
}


, Matcher:
Matcher regexMatcher = regex.matcher(subjectString);

, Matcher, , :
regexMatcher.reset(anotherSubjectString);


JavaScript
:
var myregexp = /regex pattern/;

,
userinput:
var myregexp = new RegExp(userinput);

Perl
$myregex = qr/regex pattern/

,
$userinput:
$myregex = qr/$userinput/

Python
reobj = re.compile("regex pattern")

,
userinput:
reobj = re.compile(userinput)

Ruby
:
myregexp = /regex pattern/;

,
userinput:
myregexp = Regexp.new(userinput);

, .
. , , , ,
. , , . , -


, ,
.

.NET
C# VB.NET System.Text.RegularExpressions.Regex.
: .
, Regex() ArgumentException. , . , ,

.
, , ,
.
, . , , ,
, .

, Regex. . Regex, , Regex, .
- ,
Regex, . Regex , , 15 . , Regex.CacheSize.
.
.
, , , .


Java
Java Pattern
. Pattern.compile(), : .
,
Pattern.compile() PatternSyntaxException.
, .
,
,
. , , ,
.
,
. , , ,
, .
, Pattern , String. , ,
. .

.
Pattern .
Matcher. Matcher,
matcher() .
matcher() .
matcher() , . ,
, ,
. Pattern
Matcher . , Pattern.compile()
.


, Matcher,
reset(), . , Matcher.
reset() Matcher, , , ,
regexMatcher.reset(nextString).find().

JavaScript
, 3.2,
. , .
(, , ), RegExp(). ,

. JavaScript RegExp , .
, JavaScript
.

, , . .

PHP
PHP . ,
- , preg .
preg 4 096 . , , ,
, , , ,


Perl
Perl qr// . , ,
3.1, , m,
qr.
Perl .
qr//
. 3.5.
qr// , (,
). qr/$regexstring/
,
$regexstring.
m/$regexstring/ , m/$regexstring/o . /o 3.4.

Python
Python, re, compile(), .

, compile(). , re,
compile(), .
compile() 100 . 100 . , .
, ,
re. , compile().


Ruby
, 3.2, .
,
.
(, , ), Regexp.new()
Regexp.compile(). , . Ruby Regexp ,
.
, Ruby .
,
,
.

.


CIL
C#
Regex regexObj = new Regex("regex pattern", RegexOptions.Compiled);

VB.NET
Dim RegexObj As New Regex("regex pattern", RegexOptions.Compiled)

Regex .NET , . 154. Regex() RegexOptions.


Compiled, Regex :
CIL, MSIL. CIL Common Intermediate Language ( ). , , C# Visual Basic.


.NET CIL. , .NET


CIL ,
.

RegexOptions.Compiled ,
10 , ,
.
, , . ,
CIL , . CIL .
RegexOptions.Compiled ,
, , ,
. ,
.

.
3.1, 3.2 3.4.

3.4.

: , ,
^ $ .

C#
Regex regexObj = new Regex("regex pattern",
    RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase |
    RegexOptions.Singleline | RegexOptions.Multiline);

VB.NET
Dim RegexObj As New Regex("regex pattern",
    RegexOptions.IgnorePatternWhitespace Or RegexOptions.IgnoreCase Or
    RegexOptions.Singleline Or RegexOptions.Multiline)

Java
Pattern regex = Pattern.compile("regex pattern",
    Pattern.COMMENTS | Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE |
    Pattern.DOTALL | Pattern.MULTILINE);

JavaScript
:
var myregexp = /regex pattern/im;

, :
var myregexp = new RegExp(userinput, "im");

PHP
$regexstring = '/regex pattern/simx';

Perl
m/regex pattern/simx;

Python
reobj = re.compile("regex pattern",
    re.VERBOSE | re.IGNORECASE |
    re.DOTALL | re.MULTILINE)

Ruby
:
myregexp = /regex pattern/mix;

, :
myregexp = Regexp.new(userinput,
    Regexp::EXTENDED | Regexp::IGNORECASE |
    Regexp::MULTILINE);

, ,
, ,
.
. ,


, , . .
,
. .

, .

.NET
Regex() . RegexOptions.
: RegexOptions.IgnorePatternWhitespace
: RegexOptions.IgnoreCase
: RegexOptions.Singleline
^ $ : RegexOptions.Multiline

Java
Pattern.compile() . Pattern
.
,
|.
: Pattern.COMMENTS
: Pattern.CASE_INSENSITIVE |
Pattern.UNICODE_CASE
: Pattern.DOTALL
^ $ : Pattern.MULTILINE
,
, , , .
Pattern.CASE_INSENSITIVE,

A Z.
, . , Pattern.UNICODE_CASE,
,
, -


ASCII. (?i) ASCII, (?iu


) .

JavaScript
, - JavaScript, RegExp , . /i /m,
.
.
RegExp()
. , . .
: JavaScript
: /i
: JavaScript
^ $ : /m

PHP
3.1 , preg PHP ,
, , .
, .
,
,
. /x, ,
.
: /x
: /i
: /s
^ $ : /m

Perl

-


.
/x, ,
.
: /x
: /i
: /s
^ $ : /m

Python
compile() ( )
. |,
, re. re, , .
.
. .
,
, .
, ,
Perl.
: re.VERBOSE re.X
: re.IGNORECASE re.I
: re.DOTALL re.S
^ $ : re.MULTILINE re.M
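The four Python options listed above can each be observed in isolation. The following is a minimal sketch; the subject strings and variable names are assumptions for this example.

```python
import re

# re.IGNORECASE: letters match regardless of case.
case_insensitive = re.search(r"mary", "MARY", re.IGNORECASE)

# re.DOTALL: the dot also matches a line break.
dot_default = re.search(r"a.b", "a\nb")
dot_all = re.search(r"a.b", "a\nb", re.DOTALL)

# re.MULTILINE: ^ and $ also match at line breaks inside the subject.
lines = re.findall(r"^\w+$", "one\ntwo", re.MULTILINE)

# Options are combined with the | operator.
combined = re.compile(r"^ mary . jane $  # free-spacing comment",
                      re.VERBOSE | re.IGNORECASE | re.DOTALL | re.MULTILINE)
combined_match = combined.search("x\nMARY\nJANE\ny")

print(bool(case_insensitive), dot_default, bool(dot_all), lines, combined_match.group())
```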

Ruby
- Ruby, Regexp , . /i /m,
. .
Regexp.new()
. nil, , Regexp, |.


: /x Regexp::EXTENDED
:/i Regexp::IGNORECASE
: /m Regexp::MULTILINE.
Ruby m multi line, ,
s single line.
^ $ : c Ruby .
.
\A \Z .

,

.NET
RegexOptions.ExplicitCapture , , . (group) (?:group) .
,
, (?:group) .
RegexOptions.ExplicitCapture (?n) . 2.9, 2.11 .

.NET JavaScript, , RegexOptions.ECMAScript.


JavaScript ASP.NET.
, , ,
\w \d ASCII,
JavaScript.

Java
Java Pattern.CANON_EQ,
. . 81,
. -


, . , \u00E0 \u00E0, \u0061\u0300, .



. \u00E0 \u0061\u0300.
, .

, Pattern.UNIX_LINES
\n , ,
. , .

JavaScript

,
, , /g,
global ().

PHP
/u PCRE ,
UTF-8.
, \p{FFFF} \p{L} .
2.7. PCRE , .

/U , . .* , .*? . /U .* , .*? .

,
, /U, PHP. ,
/U /u, - . .

Perl

(, -


, ),
/g, global ().
, m/I am
$name/, Perl
, $name . /o.
m/I am $name/o
, . $name , . , , 3.3.

Python
Python , ( 2.6)
\w , \d \s , ( 2.3). ASCII, .

re.LOCALE, re.L, . , ,
. , , , .
re.UNICODE, re.U,
. , , , . , , , .

Ruby
Regexp.new()
, , . ,
, .
.
,
. .
:


n
None (). , . ASCII.
e
EUC .
s
Shift-JIS.
u
UTF-8,
, ( ,
).
, /n, /e, /s /u.
. /x, /i /m.
/s Ruby Perl, Java .NET. Ruby
/s Shift-JIS. Perl

.
Ruby /m.

.
, , 2. .
: 2.18
: . 51 2.1
: 2.4
^ $ : 2.5
3.1 3.3 . .


3.5.

,
. . , regexpattern The regex pattern can be found. - , , .

C#

:
bool foundMatch = Regex.IsMatch(subjectString, "regex pattern");

, :
bool foundMatch = false;
try {
foundMatch = Regex.IsMatch(subjectString, UserInput);
} catch (ArgumentNullException ex) {
// null
//
} catch (ArgumentException ex) {
//
}

, Regex:
Regex regexObj = new Regex("regex pattern");
bool foundMatch = regexObj.IsMatch(subjectString);

, Regex :
bool foundMatch = false;
try {
    Regex regexObj = new Regex(UserInput);
    try {
        foundMatch = regexObj.IsMatch(subjectString);
    } catch (ArgumentNullException ex) {
        // Cannot pass null as the subject string
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}

VB.NET

:
Dim FoundMatch = Regex.IsMatch(SubjectString, "regex pattern")

, :
Dim FoundMatch As Boolean
Try
    FoundMatch = Regex.IsMatch(SubjectString, UserInput)
Catch ex As ArgumentNullException
    'Cannot pass Nothing as the regular expression or subject string
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try

, Regex:
Dim RegexObj As New Regex("regex pattern")
Dim FoundMatch = RegexObj.IsMatch(SubjectString)

IsMatch() SubjectString , RegexObj,


Regex:
Dim FoundMatch = RegexObj.IsMatch(SubjectString)

, Regex :
Dim FoundMatch As Boolean
Try
    Dim RegexObj As New Regex(UserInput)
    Try
        FoundMatch = RegexObj.IsMatch(SubjectString)
    Catch ex As ArgumentNullException
        'Cannot pass Nothing as the subject string
    End Try
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try

Java

, Matcher:
Pattern regex = Pattern.compile("regex pattern");
Matcher regexMatcher = regex.matcher(subjectString);
boolean foundMatch = regexMatcher.find();

, :
boolean foundMatch = false;
try {
Pattern regex = Pattern.compile(UserInput);
Matcher regexMatcher = regex.matcher(subjectString);
foundMatch = regexMatcher.find();
} catch (PatternSyntaxException ex) {
//
}

JavaScript
if (/regex pattern/.test(subject)) {
//
} else {
//
}

PHP
if (preg_match('/regex pattern/', $subject)) {
#
} else {
#
}

Perl
$_:
if (m/regex pattern/) {
#
} else {
#
}


$subject:
if ($subject =~ m/regex pattern/) {
#
} else {
#
}

, :
$regex = qr/regex pattern/;
if ($subject =~ $regex) {
#
} else {
#
}

Python

:
if re.search("regex pattern", subject):
    # Successful match
    pass
else:
    # Match attempt failed
    pass

, :
reobj = re.compile("regex pattern")
if reobj.search(subject):
    # Successful match
    pass
else:
    # Match attempt failed
    pass

Ruby
if subject =~ /regex pattern/
#
else
#
end

:
if /regex pattern/ =~ subject
#
else
#
end



.
,
, .
, , - . , .
,
.

. , (,
), .

C# VB.NET
Regex
IsMatch(), . IsMatch() . . , . null.
IsMatch() ArgumentNullException.
,
Regex.IsMatch(), Regex.
,
. ,
IsMatch() ArgumentException. , true , , false
.

, , Regex, IsMatch() . . , . , , , .


, .
. IsMatch()
ArgumentOutOfRangeException.
, , .
IsMatch() , . ,
Regex.Match(subject, start, stop) Success Match.
3.8.

Java
, Matcher,
3.3. find()
Matcher.
String.matches(), Pattern.matches() Matcher.matches() , , .

JavaScript
, test() .
.
regexp.test() true, , false
.

PHP
preg_match() . : , .
, preg_match() 1,
,
0.
,
preg_match(), .


Perl
Perl m//
, . //m
$_.
m// , =~,
. true,
,
false .
, !~,
=~.

Python
re search(), . ,
.
.
re.search() re.compile() search() .
: .
, search()
MatchObject. ,
search() None. if MatchObject ,
True, None , False. , , MatchObject.
search() match().
match() . match()
.
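The difference between search() and match() shows up as soon as the match does not begin at the start of the subject. This is a minimal sketch using the sample subject from this recipe; the variable names are assumptions for this example.

```python
import re

subject = "The regex pattern can be found"

# search() looks for a match anywhere in the subject.
found = re.search(r"regex pattern", subject)

# match() only attempts the match at the very start of the subject.
at_start = re.match(r"regex pattern", subject)

# A compiled regular expression object offers the same two methods.
reobj = re.compile(r"regex pattern")
position = reobj.search(subject).start()

print(bool(found), at_start, position)
```

A match object is truthy and None is falsy, which is why the result can be used directly in an if statement.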

Ruby
=~ .
, . . , nil.


, Regexp String.
Ruby 1.8 ,
. Ruby 1.9
, . 3.9.

Ruby
=~, .
Perl, Ruby =~,
Ruby 1.9,
, - .

.
3.6 3.7.

3.6.

,
. , , , . , ,
regexpattern , , regex pattern, The regex pattern can be found.

C#

:
bool foundMatch = Regex.IsMatch(subjectString, @"\Aregex pattern\Z");

Regex:
Regex regexObj = new Regex(@"\Aregex pattern\Z");
bool foundMatch = regexObj.IsMatch(subjectString);


VB.NET

:
Dim FoundMatch = Regex.IsMatch(SubjectString, "\Aregex pattern\Z")

Regex:
Dim RegexObj As New Regex("\Aregex pattern\Z")
Dim FoundMatch = RegexObj.IsMatch(SubjectString)

IsMatch() SubjectString , RegexObj,


Regex:
Dim FoundMatch = RegexObj.IsMatch(SubjectString)

Java
:
boolean foundMatch = subjectString.matches("regex pattern");


:
Pattern regex = Pattern.compile("regex pattern");
Matcher regexMatcher = regex.matcher(subjectString);
boolean foundMatch = regexMatcher.matches();

JavaScript
if (/^regex pattern$/.test(subject)) {
//
} else {
//
}

PHP
if (preg_match('/\Aregex pattern\Z/', $subject)) {
#
} else {
#
}

Perl
if ($subject =~ m/\Aregex pattern\Z/) {
    # Successful match
} else {
    # Match attempt failed
}

Python

:
if re.match(r"regex pattern\Z", subject):
    # Successful match
    pass
else:
    # Match attempt failed
    pass


:
reobj = re.compile(r"regex pattern\Z")
if reobj.match(subject):
    # Successful match
    pass
else:
    # Match attempt failed
    pass

Ruby
if subject =~ /\Aregex pattern\Z/
#
else
#
end

,
- . , - ,
. , , , , .
IP- , .

, $ \Z ,
( 3.21),

. 2.5,
, .


C# VB.NET
Regex .NET , , .
,
\A , ,
\Z , . ,
. , one|two|three ,
: \A(?:one|two|three)\Z .

, , IsMatch(), .

Java
Java matches(). , . ,
.
matches() String
. true false
.
matches() Pattern : , . Pattern.matches() CharSequence.
String.matches() Pattern.matches().
, String.matches() Pattern.matches(),
, Pattern.
compile(regex).matcher(subjectString).matches(). , (, )
.
. , PatternSyntaxException.

-


Matcher, 3.3. matches() Matcher. , .

JavaScript
JavaScript , ,
.
,
^ ,
$ . /m . /m
. /m .


regexp.test(), .

PHP
PHP , , . ,
\A , , \Z , . , . ,
one|two|three ,
: \A(?:one|two|three)\Z .

, ,
preg_match(), .

Perl
Perl ,
. ,
,
\A , , \Z , . , . ,


one|two|three ,
: \A(?:one|two|three)\Z .

, , ,
.

Python
match() search(),
. ,
match() , . , match() None. search(), ,
, ,
.
match() , . , , .
,
, \Z , .
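As a minimal sketch of the Python behaviour described here (the helper name and sample strings are illustrative assumptions, not part of the recipe): re.match() anchors the attempt at the start of the subject only, so \Z is what forces the regex to consume the whole string, and a noncapturing group keeps any alternation inside the anchor.

```python
import re

def matches_entirely(pattern, subject):
    # re.match() anchors at the start only; \Z anchors the end.
    # The noncapturing group keeps alternation such as one|two|three
    # inside the anchor.
    return re.match(r"(?:" + pattern + r")\Z", subject) is not None

# Without \Z, a match at the start of the subject is enough.
partial = re.match(r"\d+", "123abc") is not None
# With \Z, trailing text makes the match fail.
whole = matches_entirely(r"\d+", "123abc")
alternation = matches_entirely(r"one|two|three", "two")
```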

Ruby
Regex Ruby , ,
.
,
\A , ,
\Z , . , .
, one|two|three , : \A(?:one|two|three)\Z .

, , =~, .

.
, ,
2.5.
2.8 2.9 .


- , . ,
, , .
, 3.5.

3.7. Retrieve the Matched Text

,
. . , . , \d+ Do you like 13 or 42?
13.

C#

:
string resultString = Regex.Match(subjectString, @"\d+").Value;

, :
string resultString = null;
try {
resultString = Regex.Match(subjectString, @"\d+").Value;
} catch (ArgumentNullException ex) {
// null
//
} catch (ArgumentException ex) {
//
}

, Regex:
Regex regexObj = new Regex(@"\d+");
string resultString = regexObj.Match(subjectString).Value;


, Regex :
string resultString = null;
try {
Regex regexObj = new Regex(@"\d+");
try {
resultString = regexObj.Match(subjectString).Value;
} catch (ArgumentNullException ex) {
// null
//
}
} catch (ArgumentException ex) {
//
}

VB.NET

:
Dim ResultString = Regex.Match(SubjectString, "\d+").Value

, :
Dim ResultString As String = Nothing
Try
    ResultString = Regex.Match(SubjectString, "\d+").Value
Catch ex As ArgumentNullException
    'The regular expression or the subject string is Nothing
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try

, Regex:
Dim RegexObj As New Regex("\d+")
Dim ResultString = RegexObj.Match(SubjectString).Value

, Regex :
Dim ResultString As String = Nothing
Try
    Dim RegexObj As New Regex("\d+")
    Try
        ResultString = RegexObj.Match(SubjectString).Value
    Catch ex As ArgumentNullException
        'The subject string is Nothing
    End Try
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try

Java
,
Matcher:
String resultString = null;
Pattern regex = Pattern.compile("\\d+");
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
resultString = regexMatcher.group();
}

, :
String resultString = null;
try {
Pattern regex = Pattern.compile("\\d+");
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
resultString = regexMatcher.group();
}
} catch (PatternSyntaxException ex) {
//
}

JavaScript
var result = subject.match(/\d+/);
if (result) {
result = result[0];
} else {
result = '';
}

PHP
if (preg_match('/\d+/', $subject, $groups)) {
    $result = $groups[0];
} else {
    $result = '';
}


Perl
if ($subject =~ m/\d+/) {
$result = $&;
} else {
$result = '';
}

Python

:
matchobj = re.search("regex pattern", subject)
if matchobj:
    result = matchobj.group()
else:
    result = ""

, :
reobj = re.compile("regex pattern")
matchobj = reobj.search(subject)
if matchobj:
    result = matchobj.group()
else:
    result = ""

Ruby
=~ $&:
if subject =~ /regex pattern/
result = $&
else
result = ""
end

, match() Regexp:
matchobj = /regex pattern/.match(subject)
if matchobj
result = matchobj[0]
else
result = ""
end


, ,
, . , , . ,
.

.NET
Regex .NET -,
, .
Match(), Match.
Match Value, , .
,
Match, Value
.
Match(). ,
. null. Match()
ArgumentNullException.
, .
. .
, ArgumentException.

, , Regex
Match() . . ,
. , ,
. , . .
Match() ArgumentOutOfRangeException.


, , . ( ) ( ). , regexObj.Match(123456, 3, 2) 45. , , Match() ArgumentOutOfRangeException. , , : IndexOutOfRangeException. ,


Match(), .
,
, .

Java
, ,
Matcher,
3.3, find() . find() true, group() , , .
find() false, group() , IllegalStateException.
Matcher.find() . , ,
. , .

, IndexOutOfBoundsException.
, find() , ,
. find() Pattern.
matcher() Matcher.reset(), .

JavaScript
string.match()
. ,
. , string.match() regexp.


, string.match()
null. ,
.
, ,
null
.
, string.match()
.
, .
/g. string.match()
, 3.10.

PHP
preg_match(), , , , . preg_match() 1, . . 3.9.

Perl
m// , .
, $&, , .
.

Python
3.5 search().
MatchObject, search(), . , , group() .
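A short sketch of this pattern, using the sample sentence from the problem statement (the helper name first_match is an assumption made for illustration): search() returns None when nothing matches, so the result has to be tested before group() is called.

```python
import re

def first_match(pattern, subject):
    matchobj = re.search(pattern, subject)
    # search() returns None on failure; calling group() on None would fail.
    return matchobj.group() if matchobj else ""

found = first_match(r"\d+", "Do you like 13 or 42?")
missing = first_match(r"\d+", "no digits here")
```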

Ruby
3.8 $~ MatchData. , .
, .


$& , .
$~[0],
.

.
3.5, 3.8, 3.9, 3.10 3.11.

3.8. Determine the Position and Length of the Match

, , , .

, .

C#

:
int matchstart, matchlength = -1;
Match matchResult = Regex.Match(subjectString, @"\d+");
if (matchResult.Success) {
matchstart = matchResult.Index;
matchlength = matchResult.Length;
}

Regex:
int matchstart, matchlength = -1;
Regex regexObj = new Regex(@"\d+");
Match matchResult = regexObj.Match(subjectString);
if (matchResult.Success) {
matchstart = matchResult.Index;
matchlength = matchResult.Length;
}

VB.NET

:


Dim MatchStart = -1
Dim MatchLength = -1
Dim MatchResult = Regex.Match(SubjectString, "\d+")
If MatchResult.Success Then
MatchStart = MatchResult.Index
MatchLength = MatchResult.Length
End If

Regex:
Dim MatchStart = -1
Dim MatchLength = -1
Dim RegexObj As New Regex("\d+")
Dim MatchResult = RegexObj.Match(SubjectString)
If MatchResult.Success Then
MatchStart = MatchResult.Index
MatchLength = MatchResult.Length
End If

Java
int matchStart, matchLength = -1;
Pattern regex = Pattern.compile("\\d+");
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
matchStart = regexMatcher.start();
matchLength = regexMatcher.end() - matchStart;
}

JavaScript
var matchstart = -1;
var matchlength = -1;
var match = /\d+/.exec(subject);
if (match) {
matchstart = match.index;
matchlength = match[0].length;
}

PHP
if (preg_match('/\d+/', $subject, $groups, PREG_OFFSET_CAPTURE)) {
$matchstart = $groups[0][1];
$matchlength = strlen($groups[0][0]);
}

Perl
if ($subject =~ m/\d+/g) {
$matchlength = length($&);
$matchstart = length($`);
}

Python

:
matchobj = re.search(r"\d+", subject)
if matchobj:
matchstart = matchobj.start()
matchlength = matchobj.end() - matchstart


:
reobj = re.compile(r"\d+")
matchobj = reobj.search(subject)
if matchobj:
matchstart = matchobj.start()
matchlength = matchobj.end() - matchstart

Ruby
=~ $~:
if subject =~ /regex pattern/
  matchstart = $~.begin(0)
  matchlength = $~.end(0) - matchstart
end

, match() Regexp:
matchobj = /regex pattern/.match(subject)
if matchobj
  matchstart = matchobj.begin(0)
  matchlength = matchobj.end(0) - matchstart
end

.NET

Regex.Match(), .
Index Length Match, Regex.Match().
Index , . -


, Index . ,
Index . Index . , .
, , \Z , .

Length . , . , , \b , .

, Regex.
Match() Match, Index Length . . ,
\A , . Match.Index Match.Length, ,
.
Match.Success.

Java
, Matcher.find(), . find() true, Matcher.start()
, . end() , . , , , . start()
end(), find(),
IllegalStateException.

JavaScript
exec() regexp .
. index , . , index
. . ,
length.


,
regexp.exec() null.
lastIndex ,
exec(), . JavaScript lastIndex , regexp. regexp.lastIndex .
- ( 3.11). ,
, ,
match.index match[0].length.

PHP
, ,
, preg_match(). , PREG_OFFSET_CAPTURE. ,
preg_match() , 1.
, , , . PREG_OFFSET_CAPTURE, . ( ), , , ( ). : , .
,
, .
strlen(), . 1 , , .

Perl
, $&, . , $`, , .


Python
start() MatchObject , . end()
, . , .
start() end() ,
. , start(1) , end(2) . Python
99 . 0 . start() end()
, , ( 99), IndexError. , , , start() end() -1.
, span() .
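The following sketch illustrates start(), end(), and span() on a match object. The regex (a)|(\d+) is an assumption chosen so that group 1 does not take part in the match; it is not taken from the recipe.

```python
import re

subject = "Do you like 13 or 42?"
matchobj = re.search(r"(a)|(\d+)", subject)

# span() returns start and end of the overall match in one call.
matchstart, matchend = matchobj.span()
matchlength = matchend - matchstart

# Group 1 exists in the regex but did not participate in the match.
unused_group_start = matchobj.start(1)

# Asking for a group number the regex does not have raises IndexError.
try:
    matchobj.start(3)
    bad_group_raises = False
except IndexError:
    bad_group_raises = True
```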

Ruby
3.5 =~. $~ MatchData.
.
,
=~; , -
- .
, match() Regexp.
. ,
MatchData nil . ,
MatchData $~, MatchData, . MatchData . 3.7
3.9 , , .
begin() , . end() -


, . offset() . . , , 0.
, ,
. , begin(1) .
length() size(), . ,
MatchData ,
, 3.9.

.
3.5 3.9.

3.9. Retrieve Part of the Matched Text

3.7, ,
,
.
, , 2.9.
, Please visit http://www.regexcookbook.com for more
information http://([a-z0-9.-]+)
http://www.regexcookbook.com. ,
,
www.regexcookbook.com. , , .


. 7 , URL.

C#

:
string resultString = Regex.Match(subjectString,
    "http://([a-z0-9.-]+)").Groups[1].Value;


Regex:
Regex regexObj = new Regex("http://([a-z0-9.-]+)");
string resultString = regexObj.Match(subjectString).Groups[1].Value;

VB.NET

:
Dim ResultString = Regex.Match(SubjectString,
    "http://([a-z0-9.-]+)").Groups(1).Value

Regex:
Dim RegexObj As New Regex("http://([a-z0-9.-]+)")
Dim ResultString = RegexObj.Match(SubjectString).Groups(1).Value

Java
String resultString = null;
Pattern regex = Pattern.compile("http://([a-z0-9.-]+)");
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
resultString = regexMatcher.group(1);
}

JavaScript
var result = "";
var match = /http:\/\/([a-z0-9.-]+)/.exec(subject);
if (match) {
    result = match[1];
} else {
    result = "";
}

PHP
if (preg_match('%http://([a-z0-9.-]+)%', $subject, $groups)) {
    $result = $groups[1];
} else {
    $result = '';
}

Perl
if ($subject =~ m!http://([a-z0-9.-]+)!) {
    $result = $1;
} else {
    $result = '';
}

Python

:
matchobj = re.search("http://([a-z0-9.-]+)", subject)
if matchobj:
    result = matchobj.group(1)
else:
    result = ""


:
reobj = re.compile("http://([a-z0-9.-]+)")
matchobj = reobj.search(subject)
if matchobj:
    result = matchobj.group(1)
else:
    result = ""

Ruby
=~ , $1:
if subject =~ %r!http://([a-z0-9.-]+)!
result = $1
else
result = ""
end

, match() Regexp:
matchobj = %r!http://([a-z0-9.-]+)!.match(subject)
if matchobj
result = matchobj[1]
else
result = ""
end

2.10 2.21 ,
,


.
, .
.
. , ,
,
, ,
.
. , , , , . , , , .

.NET
, - Regex.Match(), 3.7. Match
Groups. GroupCollection. , . Groups[1] , Groups[2]
, .
Groups Group . Group , Match,
Groups. Match.Groups[1].Value
, ,
, Match.Value
. Match.Groups[1].Index
Match.Groups[1].Length , . Index Length 3.8.
Groups[0] , . Match.Value Match.Groups[0].Value .
Groups
. ,
Groups[-1] Group, -1. Success. Groups[-1].Success false.


, Match.Groups.Count. Count
, Count
.NET: , , . Groups Groups[0] Groups[1].
Groups.Count 2.

Java
,
, , ,

. group(), start() end() Matcher
. ,
, , .
.
, . , , IndexOutOfBoundsException. , , group(n) null,
start(n) end(n) -1.

JavaScript
, exec() .
. , , ,
.
, regexp.exec() null.

PHP
3.7 , , , preg_match().
preg_match() 1, . .


, , .
.
.
preg_match() PREG_OFFSET_CAPTURE, , -
. .
. ,
.

Perl
m// ,
. $1, $2, $3 , , .

Python
,
3.7. group() . group(1)
, , group(2)
, . Python 99 . 0
. , , group()
IndexError. , , group()
None.
group() ,
. .
, groups() MatchObject. None , . groups()
, None
, .


, groups()
groupdict(). groupdict() ,
None , .
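To illustrate group(), groups(), and groupdict() together, here is a hedged sketch based on the recipe's sample URL. The second, optional group (/\S*)? is an assumption added only to show how a group that does not participate in the match is reported.

```python
import re

subject = "Please visit http://www.regexcookbook.com for more information"
matchobj = re.search(r"http://(?P<domain>[a-z0-9.-]+)(/\S*)?", subject)

# A named group can be retrieved by number or by name.
domain_by_number = matchobj.group(1)
domain_by_name = matchobj.group("domain")

# groups() returns one item per capturing group, with None for groups
# that did not participate. groupdict() only covers the named groups.
all_groups = matchobj.groups()
named_groups = matchobj.groupdict()
```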

Ruby
3.8 $~ MatchData. , ,
, . 1,
. .
$1, $2 , . $1 $~[1];

. $2 .


,
.

C#

:
string resultString = Regex.Match(subjectString,
    "http://(?<domain>[a-z0-9.-]+)").Groups["domain"].Value;

Regex:
Regex regexObj = new Regex("http://(?<domain>[a-z0-9.-]+)");
string resultString = regexObj.Match(subjectString).Groups["domain"].Value;

C# Group
. Groups , . , .NET , . Match.
Groups[nosuchgroup].Success false.


VB.NET

:
Dim ResultString = Regex.Match(SubjectString,
    "http://(?<domain>[a-z0-9.-]+)").Groups("domain").Value

Regex:
Dim RegexObj As New Regex("http://(?<domain>[a-z0-9.-]+)")
Dim ResultString = RegexObj.Match(SubjectString).Groups("domain").Value

VB.NET Group
. Groups , . , .NET , . Match.
Groups[nosuchgroup].Success false.

PHP
if (preg_match('%http://(?P<domain>[a-z0-9.-]+)%', $subject, $groups)) {
    $result = $groups['domain'];
} else {
    $result = '';
}


, $groups .

. , .
$groups[0] , $groups[1] $groups[domain]
, .

Perl
if ($subject =~ m!http://(?<domain>[a-z0-9.-]+)!) {
    $result = $+{domain};
} else {
    $result = '';
}

Perl,
5.10. $+
. Perl


. $1 $+{name}
, .

Python
matchobj = re.search("http://(?P<domain>[a-z0-9.-]+)", subject)
if matchobj:
    result = matchobj.group("domain")
else:
    result = ""


, group() .

.
2.9, .
2.11, .

3.10. Retrieve a List of All Matches

, . , ,
.
. , The lucky numbers are 7, 13, 16, 42, 65, and 99 \d+ : 7, 13,
16, 42, 65 99.

,
, .

C#
:
MatchCollection matchlist = Regex.Matches(subjectString, @"\d+");

Regex:
Regex regexObj = new Regex(@"\d+");
MatchCollection matchlist = regexObj.Matches(subjectString);


VB.NET
:
Dim MatchList = Regex.Matches(SubjectString, "\d+")

Regex:
Dim RegexObj As New Regex("\d+")
Dim MatchList = RegexObj.Matches(SubjectString)

Java
List<String> resultList = new ArrayList<String>();
Pattern regex = Pattern.compile("\\d+");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
resultList.add(regexMatcher.group());
}

JavaScript
var list = subject.match(/\d+/g);

PHP
preg_match_all('/\d+/', $subject, $result, PREG_PATTERN_ORDER);
$result = $result[0];

Perl
@result = $subject =~ m/\d+/g;

,
, . 2.9.

Python
:
result = re.findall(r"\d+", subject)


:
reobj = re.compile(r"\d+")
result = reobj.findall(subject)


Ruby
result = subject.scan(/\d+/)

.NET
Matches() Regex , . MatchCollection, .
.
. null. Matches()
ArgumentNullException.
, Matches(). , . .
, Regex
Matches().
. , . , , , . , . . Matches()
ArgumentOutOfRangeException.
, , .
Matches(), , . Regex.Match(subject, start, stop) ,
,
.

Java
Java , . , -


3.7. find() if, while.


List ArrayList, , import java.util.*; .

JavaScript
string.match(), ,
3.7. , : /g. 3.4.
/g match() . list[0]
, list[1] . list.length. string.match() null, .
.
/g string.match() .
,
, 3.11.

PHP
PHP preg_match(), . preg_match_all() . , , .
preg_match_all()
preg_match(): , , , , . , , .
PREG_PATTERN_ORDER, PREG_SET_ORDER.
, PREG_PATTERN_
ORDER.
PREG_PATTERN_ORDER
, , . , -


. preg_match(). , , , preg_match(),
, preg_match_all().
, preg_match_all().
, ,
PREG_PATTERN_ORDER .
, PREG_PATTERN_ORDER
. , preg_match_
all(%http://([a-z0-9.-]+)%, $subject, $result) $result[1] URL, .
PREG_SET_ORDER,
, . , preg_match_all(). , , . PREG_SET_ORDER, $result[0] , preg_match().
PREG_OFFSET_CAPTURE PREG_
PATTERN_ORDER PREG_SET_ORDER. , PREG_OFFSET_CAPTURE preg_match()
.
, , .

Perl
3.4 , /g,

. /g .
-, .
-
, . , . -


,
. , . 2.9.

Python

findall() re. , . .
re.findall() re.compile(), findall() . : .
findall() , re.findall().
,
findall() . ,
findall() . , .
,
.
,
findall(), . , ,

.
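The shape of the list returned by findall() depends on how many capturing groups the regex has. A minimal sketch using the recipe's sample sentence (the two regexes with groups are assumptions chosen for illustration):

```python
import re

subject = "The lucky numbers are 7, 13, 16, 42, 65, and 99"

# No capturing groups: a list of the overall matches.
no_groups = re.findall(r"\d+", subject)

# One capturing group: a list of the text matched by that group only.
one_group = re.findall(r"(\d)\d", subject)

# Several capturing groups: a list of tuples, one item per group.
two_groups = re.findall(r"(\d)(\d)", subject)
```

When the overall match is needed alongside the groups, finditer() is usually the more convenient choice, because each item is then a full match object.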

Ruby
scan() String
. . scan() ,
.
, scan() . .
scan() .
. -


, . .
, .
Ruby scan() , . , .

.
3.7, 3.11 3.12.

3.11. Iterate over All Matches

,
, .
.

C#
:
Match matchResult = Regex.Match(subjectString, @"\d+");
while (matchResult.Success) {
// ,
// matchResult
matchResult = matchResult.NextMatch();
}

Regex:
Regex regexObj = new Regex(@"\d+");
Match matchResult = regexObj.Match(subjectString);
while (matchResult.Success) {
// ,
// matchResult
matchResult = matchResult.NextMatch();
}


VB.NET
:
Dim MatchResult = Regex.Match(SubjectString, "\d+")
While MatchResult.Success
    'Here you can process the match stored in MatchResult
    MatchResult = MatchResult.NextMatch
End While

Regex:
Dim RegexObj As New Regex("\d+")
Dim MatchResult = RegexObj.Match(SubjectString)
While MatchResult.Success
    'Here you can process the match stored in MatchResult
    MatchResult = MatchResult.NextMatch
End While

Java
Pattern regex = Pattern.compile("\\d+");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
// ,
// regexMatcher
}

JavaScript
,
, exec() :
var regex = /\d+/g;
var match = null;
while (match = regex.exec(subject)) {
// , Firefox,
//
if (match.index == regex.lastIndex) regex.lastIndex++;
// ,
// match
}


, ,
:
var regex = /\d+/g;
var match = null;
while (match = regex.exec(subject)) {
// ,
// match
}

PHP
preg_match_all('/\d+/', $subject, $result, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($result[0]); $i++) {
# = $result[0][$i];
}

Perl
while ($subject =~ m/\d+/g) {
# = $&
}

Python
:
for matchobj in re.finditer(r"\d+", subject):
# ,
# matchobj

:
reobj = re.compile(r"\d+")
for matchobj in reobj.finditer(subject):
# ,
# matchobj

Ruby
subject.scan(/\d+/) {|match|
# ,
# match
}


.NET
3.7 , - Match() Regex .
, ,
Match() . Match() Match,
matchResult.
Success matchResult true, .
Match . 3.7
Value, 3.8 Index Length 3.9
- Groups.
, NextMatch() matchResult. Match.
NextMatch() Match, Regex.
Match(). .
, matchResult.
NextMatch(), matchResult
. matchResult.Success, , NextMatch() . NextMatch()
, Match, Success false.
matchResult, while
NextMatch().
NextMatch() Match, .
Match .
NextMatch() .
, Regex.
Match(). Match .
Regex.Match() ,

. Regex.Match() , Match
.
Match.NextMatch()


, Match,
Regex.Match(). Regex , Regex.Match()
,
.

Java
Java . while find(), 3.7. find() , Matcher, ,
.

JavaScript
, /g.
3.4. while (regexp.exec()) , regexp = /\d+/g.
/\d+/, while (regexp.exec()) ,
.
, while (/\d+/g.exec()) ( /g)
, ,
JavaScript, while.
, . ,
.
, regexp.exec(),
3.8 3.9. , exec() . , .
/g
lastIndex regexp, exec() . , .
exec() lastIndex. lastIndex
, .
lastIndex . ECMA-262v3 JavaScript ,
exec() lastIndex -


, . , , , ,
.
, ( JavaScript), ,

, .
3.8 lastIndex ,
Internet Explorer .
Firefox ECMA-262v3, , regexp.
exec() .
. , re = /^.*$/gm; while (re.exec()),
, Firefox .
, 1
lastIndex, exec() .
, JavaScript. , , .
string.
match() ( 3.14)
string.replace() ( 3.10). , lastIndex, ECMA-262v3 , lastIndex 1 .

PHP
preg_match() ,
, . 3.8 preg_match()
$matchstart + $matchlength
,
, preg_match() 0.
3.18.

preg_match() -

. preg_match()



.
preg_match_all(), ,
.

Perl
3.4 , /g, . /g
, .
while . ,
$& ( 3.7),
while.

Python
finditer() re ,
. , . , .
re.finditer() re.compile(), finditer() . : .
finditer() ,
re.finditer(). ,
finditer() . , . , . , .
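A small sketch of iterating with finditer(), assuming the sample sentence from the previous recipe: each item is a full match object, so the text and the position of every match are available inside the loop.

```python
import re

subject = "The lucky numbers are 7, 13, 16, 42, 65, and 99"
reobj = re.compile(r"\d+")

found = []
for matchobj in reobj.finditer(subject):
    # Each iteration receives a match object, not just the matched text.
    found.append((matchobj.group(), matchobj.start(), matchobj.end()))
```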

Ruby
scan() String
. , .
,
.
.



, .
, . . , .
subject.scan(/(a)(b)(c)/) {|a, b, c|
# a, b c
}

, , , .
, ,
nil.
,
,
. .
, :
subject.scan(/(a)(b)(c)/) {|abc|
# abc[0], abc[1] abc[2]
#
}

.
3.7, 3.8, 3.10 3.12.

3.12. Validate Matches in Procedural Code

3.10 , , , .
, , ( ) .
, , 13.


C#
:
StringCollection resultList = new StringCollection();
Match matchResult = Regex.Match(subjectString, @"\d+");
while (matchResult.Success) {
if (int.Parse(matchResult.Value) % 13 == 0) {
resultList.Add(matchResult.Value);
}
matchResult = matchResult.NextMatch();
}

Regex:
StringCollection resultList = new StringCollection();
Regex regexObj = new Regex(@"\d+");
Match matchResult = regexObj.Match(subjectString);
while (matchResult.Success) {
if (int.Parse(matchResult.Value) % 13 == 0) {
resultList.Add(matchResult.Value);
}
matchResult = matchResult.NextMatch();
}

VB.NET
:
Dim ResultList = New StringCollection
Dim MatchResult = Regex.Match(SubjectString, "\d+")
While MatchResult.Success
If Integer.Parse(MatchResult.Value) Mod 13 = 0 Then
ResultList.Add(MatchResult.Value)
End If
MatchResult = MatchResult.NextMatch
End While

Regex:
Dim ResultList = New StringCollection
Dim RegexObj As New Regex("\d+")
Dim MatchResult = RegexObj.Match(SubjectString)
While MatchResult.Success
If Integer.Parse(MatchResult.Value) Mod 13 = 0 Then
ResultList.Add(MatchResult.Value)
End If
MatchResult = MatchResult.NextMatch
End While

Java
List<String> resultList = new ArrayList<String>();
Pattern regex = Pattern.compile("\\d+");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
if (Integer.parseInt(regexMatcher.group()) % 13 == 0) {
resultList.add(regexMatcher.group());
}
}

JavaScript
var list = [];
var regex = /\d+/g;
var match = null;
while (match = regex.exec(subject)) {
// , Firefox,
//
if (match.index == regex.lastIndex) regex.lastIndex++;
// , match
if (match[0] % 13 == 0) {
list.push(match[0]);
}
}

PHP
preg_match_all('/\d+/', $subject, $matchdata, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($matchdata[0]); $i++) {
if ($matchdata[0][$i] % 13 == 0) {
$list[] = $matchdata[0][$i];
}
}

Perl
while ($subject =~ m/\d+/g) {
if ($& % 13 == 0) {
push(@list, $&);
}
}


Python
:
list = []
for matchobj in re.finditer(r"\d+", subject):
if int(matchobj.group()) % 13 == 0:
list.append(matchobj.group())

:
list = []
reobj = re.compile(r"\d+")
for matchobj in reobj.finditer(subject):
if int(matchobj.group()) % 13 == 0:
list.append(matchobj.group())

Ruby
list = []
subject.scan(/\d+/) {|match|
list << match if (Integer(match) % 13 == 0)
}

. \d+ , , ,
, .

- , , 13, , , , , .
, , . .
, , . , 13 .
, . , .


.
3.7, 3.10 3.11.

3.13. Find a Match Within Another Match

, .
,
.
, HTML,
, <b>. , . ,
<b> , . , 1 <b>2</b> 3 4 <b>5 6 7</b>
: 2, 5, 6 7.

C#
StringCollection resultList = new StringCollection();
Regex outerRegex = new Regex("<b>(.*?)</b>", RegexOptions.Singleline);
Regex innerRegex = new Regex(@"\d+");
//
Match outerMatch = outerRegex.Match(subjectString);
while (outerMatch.Success) {
//
Match innerMatch = innerRegex.Match(outerMatch.Groups[1].Value);
while (innerMatch.Success) {
resultList.Add(innerMatch.Value);
innerMatch = innerMatch.NextMatch();
}
//
outerMatch = outerMatch.NextMatch();
}

VB.NET
Dim ResultList = New StringCollection
Dim OuterRegex As New Regex("<b>(.*?)</b>", RegexOptions.Singleline)
Dim InnerRegex As New Regex("\d+")

Dim OuterMatch = OuterRegex.Match(SubjectString)
While OuterMatch.Success

Dim InnerMatch = InnerRegex.Match(OuterMatch.Groups(1).Value)
While InnerMatch.Success
ResultList.Add(InnerMatch.Value)
InnerMatch = InnerMatch.NextMatch
End While
OuterMatch = OuterMatch.NextMatch
End While

Java
, Java 4 :
List<String> resultList = new ArrayList<String>();
Pattern outerRegex = Pattern.compile("<b>(.*?)</b>", Pattern.DOTALL);
Pattern innerRegex = Pattern.compile("\\d+");
Matcher outerMatcher = outerRegex.matcher(subjectString);
while (outerMatcher.find()) {
Matcher innerMatcher = innerRegex.matcher(outerMatcher.group());
while (innerMatcher.find()) {
resultList.add(innerMatcher.group());
}
}

(
innerMatcher ), Java 5 :
List<String> resultList = new ArrayList<String>();
Pattern outerRegex = Pattern.compile("<b>(.*?)</b>", Pattern.DOTALL);
Pattern innerRegex = Pattern.compile("\\d+");
Matcher outerMatcher = outerRegex.matcher(subjectString);
Matcher innerMatcher = innerRegex.matcher(subjectString);
while (outerMatcher.find()) {
innerMatcher.region(outerMatcher.start(), outerMatcher.end());
while (innerMatcher.find()) {
resultList.add(innerMatcher.group());
}
}

JavaScript
var result = [];
var outerRegex = /<b>([\s\S]*?)<\/b>/g;
var innerRegex = /\d+/g;
var outerMatch = null;

while (outerMatch = outerRegex.exec(subject)) {


if (outerMatch.index == outerRegex.lastIndex)
outerRegex.lastIndex++;
var innerSubject = subject.substr(outerMatch.index,
outerMatch[0].length);
var innerMatch = null;
while (innerMatch = innerRegex.exec(innerSubject)) {
if (innerMatch.index == innerRegex.lastIndex)
innerRegex.lastIndex++;
result.push(innerMatch[0]);
}
}

PHP
$list = array();
preg_match_all('%<b>(.*?)</b>%s', $subject, $outermatches,
               PREG_PATTERN_ORDER);
for ($i = 0; $i < count($outermatches[0]); $i++) {
    if (preg_match_all('/\d+/', $outermatches[0][$i], $innermatches,
                       PREG_PATTERN_ORDER)) {
        $list = array_merge($list, $innermatches[0]);
    }
}

Perl
while ($subject =~ m!<b>(.*?)</b>!gs) {
push(@list, ($& =~ m/\d+/g));
}

, ( \d+ ) , .
2.9.

Python
list = []
innerre = re.compile(r"\d+")
for outermatch in re.finditer("(?s)<b>(.*?)</b>", subject):
list.extend(innerre.findall(outermatch.group(1)))

Ruby
list = []
subject.scan(/<b>(.*?)<\/b>/m) {|outergroups|
list += outergroups[0].scan(/\d+/)
}



, .
, , , , , . , .
.
. ,
, , ,
. . ,
, 1.
. HTML- <b> <b>(.*?)</b> 2.
\d+ .
, :

\d+(?=(?:(?!<b>).)*</b>)

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
,
, . , , ; -
, . .

,
.
,
, .

,
, . JavaScript <b>
([\s\S]*?)</b> .


, ,
.
, , , . , <b>(.*?)</b> ,
;
,
, .

, , , . . HTML- <b> ,
.
,
3.11. , , , ,
, .
.
, 3.10. ,
,
.
, ( )
. , .
, , .
, ,
.
, . , . , (HTML- <b>) , , , <b>. , ,
, ,
.
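As a rough comparison of the two approaches discussed in this recipe, the sketch below runs both the two-regex solution and the single lookahead regex on the sample string from the problem statement. It assumes the bold tags are not nested, which is the case the lookahead version relies on.

```python
import re

subject = "1 <b>2</b> 3 4 <b>5 6 7</b>"

# Two steps: the outer regex finds each bold section, and the inner
# regex collects the numbers within the text captured by group 1.
two_step = []
for outermatch in re.finditer(r"(?s)<b>(.*?)</b>", subject):
    two_step.extend(re.findall(r"\d+", outermatch.group(1)))

# One step: a number only matches if a closing tag follows it before
# any new opening tag does.
one_step = re.findall(r"(?s)\d+(?=(?:(?!<b>).)*</b>)", subject)
```

Both lists hold the same four numbers here, but the two-step version is easier to read and to adapt when the outer or inner criteria change.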


.
3.8, 3.10 3.11.

3.14. Replace All Matches


before after .

C#
:
string resultString = Regex.Replace(subjectString, "before", "after");

, :
string resultString = null;
try {
resultString = Regex.Replace(subjectString, "before", "after");
} catch (ArgumentNullException ex) {
// null
// ,
} catch (ArgumentException ex) {
//
}

Regex:
Regex regexObj = new Regex("before");
string resultString = regexObj.Replace(subjectString, "after");

, Regex :
string resultString = null;
try {
    Regex regexObj = new Regex("before");
    try {
        resultString = regexObj.Replace(subjectString, "after");
    } catch (ArgumentNullException ex) {
        // The subject string or the replacement text is null
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}

VB.NET
:
Dim ResultString = Regex.Replace(SubjectString, "before", "after")

, :
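Dim ResultString As String = Nothing
Try
    ResultString = Regex.Replace(SubjectString, "before", "after")
Catch ex As ArgumentNullException
    'The regular expression, subject string, or replacement text is Nothing
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try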

Regex:
Dim RegexObj As New Regex("before")
Dim ResultString = RegexObj.Replace(SubjectString, "after")

, Regex :
Dim ResultString As String = Nothing
Try
    Dim RegexObj As New Regex("before")
    Try
        ResultString = RegexObj.Replace(SubjectString, "after")
    Catch ex As ArgumentNullException
        'The subject string or the replacement text is Nothing
    End Try
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try


Java
:
String resultString = subjectString.replaceAll("before", "after");

, :
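try {
    String resultString = subjectString.replaceAll("before", "after");
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
} catch (IllegalArgumentException ex) {
    // Syntax error in the replacement text (unescaped $ sign?)
} catch (IndexOutOfBoundsException ex) {
    // The replacement text references a nonexistent group
}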

Matcher:
Pattern regex = Pattern.compile("before");
Matcher regexMatcher = regex.matcher(subjectString);
String resultString = regexMatcher.replaceAll("after");

, Matcher :
String resultString = null;
try {
    Pattern regex = Pattern.compile("before");
    Matcher regexMatcher = regex.matcher(subjectString);
    try {
        resultString = regexMatcher.replaceAll("after");
    } catch (IllegalArgumentException ex) {
        // Syntax error in the replacement text (unescaped $ sign?)
    } catch (IndexOutOfBoundsException ex) {
        // The replacement text references a nonexistent group
    }
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}

JavaScript
result = subject.replace(/before/g, "after");


PHP
$result = preg_replace('/before/', 'after', $subject);

Perl
$_,
$_:
s/before/after/g;

$subject, $subject:
$subject =~ s/before/after/g;

$subject, $result:
($result = $subject) =~ s/before/after/g;

Python
:
result = re.sub("before", "after", subject)

:
reobj = re.compile("before")
result = reobj.sub("after", subject)

Ruby
result = subject.gsub(/before/, "after")

.NET

.NET Regex.
Replace(). Replace() 10 .
.
MatchEvaluator 3.16.
, Replace(), , , -


.
null. Replace() ArgumentNullException. Replace() , .
, . . . .
, ArgumentException.

,
, Regex, Replace() . , . .
Replace() Regex
, . ,
. Replace()
,
.
, , .
, . , Replace(subject, replacement, 3) ,
. , . , , . ,
. -1, . -1, Replace() ArgumentOutOfRangeException.
, ,
, .
, , , , . , . -



. Replace() ArgumentOutOfRangeException.
Match(), Replace() ,
, .

Java

, replaceFirst() replaceAll() . :
. : Pattern.
compile(before).matcher(subjectString).replaceFirst(after) Pattern.
compile(before).matcher(subjectString).replaceAll(after).
, Matcher,
3.3, replaceFirst()
replaceAll() , .
, . PatternSyntaxException
Pattern.compile(), String.replaceFirst() String.replaceAll(), .
IllegalArgumentException replaceFirst() replaceAll(),
.
, , IndexOutOfBoundsException.

JavaScript

, replace() .
, . replace() .
, /g. , 3.4. /g .

PHP

, preg_replace().


, , .
, .

.
-1, . 0, . , preg_replace() , .
, .
, .
.
preg_replace() ,
.
,
preg_replace() ,
.
, preg_replace()
, . ,

. , ( ) . preg_replace()
. , , .
, ksort(),
preg_replace().
$replace :
$regex[0] = '/a/';
$regex[1] = '/b/';
$regex[2] = '/c/';
$replace[2] = '3';
$replace[1] = '2';
$replace[0] = '1';
echo preg_replace($regex, $replace, "abc");
ksort($replace);
echo preg_replace($regex, $replace, "abc");

preg_replace() 321,
, .
ksort() 123, . ksort() , (true false)
preg_replace().

Perl
Perl s///
. s/// ,
$_, $_.
, =~, .
. ,
.
s/// , .
, ,
, .
Perl ,
.
,
/g, 3.4.
Perl .

Python

, sub() re. ,
, . sub() .
re.sub() re.compile() sub() . : .


sub() ,
. ,
.
, . , . ,
.
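A brief sketch of sub() with and without a replacement limit. The sample string is an assumption, and the limit is passed through the optional count argument of sub().

```python
import re

subject = "before and before"

# Without a limit, every match is replaced.
replace_all = re.sub("before", "after", subject)

# With count=1, only the first match is replaced.
reobj = re.compile("before")
replace_first = reobj.sub("after", subject, count=1)
```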

Ruby

, gsub() String. ,
. ,
. , gsub() .
gsub() , . ,
gsub!(). , gsub!() nil.
, ,
.

.
1
3.15.

3.15. Replace Matches Reusing Parts of the Match

,
. ,
, , 2.9.
, , , .


C#
:
string resultString = Regex.Replace(subjectString, @"(\w+)=(\w+)", "$2=$1");

Regex:
Regex regexObj = new Regex(@"(\w+)=(\w+)");
string resultString = regexObj.Replace(subjectString, "$2=$1");

VB.NET
:
Dim ResultString = Regex.Replace(SubjectString, "(\w+)=(\w+)", "$2=$1")

Regex:
Dim RegexObj As New Regex("(\w+)=(\w+)")
Dim ResultString = RegexObj.Replace(SubjectString, "$2=$1")

Java
String.replaceAll():
String resultString = subjectString.replaceAll("(\\w+)=(\\w+)", "$2=$1");

Matcher:
Pattern regex = Pattern.compile("(\\w+)=(\\w+)");
Matcher regexMatcher = regex.matcher(subjectString);
String resultString = regexMatcher.replaceAll("$2=$1");

JavaScript
result = subject.replace(/(\w+)=(\w+)/g, "$2=$1");

PHP
$result = preg_replace('/(\w+)=(\w+)/', '$2=$1', $subject);

Perl
$subject =~ s/(\w+)=(\w+)/$2=$1/g;


Python
:
result = re.sub(r"(\w+)=(\w+)", r"\2=\1", subject)

:
reobj = re.compile(r"(\w+)=(\w+)")
result = reobj.sub(r"\2=\1", subject)

Ruby
result = subject.gsub(/(\w+)=(\w+)/, '\2=\1')

(\w+)=(\w+)
. , , , , ,
.
,
, , ,
. -.
. 1, 2.21 , .

.NET
.NET Regex.Replace(), ,
.
.NET,
2.21.

Java
Java replaceFirst()
replaceAll(), . Java, .


JavaScript
JavaScript string.replace(), .

JavaScript, .

PHP
PHP preg_replace(),
. PHP, .

Perl
Perl replace s/regex/replace/ . $&, $1, $2 ,
3.7 3.9. , , .
Perl. ,
.
, , ,
. , , $1 \1. $1 .

Python
Python sub(),
.

Python, .
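To show the replacement text syntax in action, here is a minimal sketch that swaps the two halves of each name=value pair, first with numbered and then with named backreferences. The sample subject string is an assumption.

```python
import re

subject = "width=100 height=50"

# Numbered backreferences in the replacement text: \1 and \2.
swapped = re.sub(r"(\w+)=(\w+)", r"\2=\1", subject)

# Named backreferences in the replacement text use \g<name>.
swapped_named = re.sub(
    r"(?P<left>\w+)=(?P<right>\w+)", r"\g<right>=\g<left>", subject
)
```

Raw strings keep the backslashes intact; in a normal string literal, "\1" would be read as the character with code 0x01.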

Ruby
Ruby String.gsub(),
. Ruby, .
Ruby , $1, . Ruby gsub(). -


gsub() ,
. $1, , gsub().
, \1 . gsub() . . , ,
. \1 \\1
, \1
0x01.

C#
:
string resultString = Regex.Replace(subjectString,
    @"(?<left>\w+)=(?<right>\w+)", "${right}=${left}");

Regex:
Regex regexObj = new Regex(@"(?<left>\w+)=(?<right>\w+)");
string resultString = regexObj.Replace(subjectString, "${right}=${left}");

VB.NET
:
Dim ResultString = Regex.Replace(SubjectString,
    "(?<left>\w+)=(?<right>\w+)", "${right}=${left}")

Regex:
Dim RegexObj As New Regex("(?<left>\w+)=(?<right>\w+)")
Dim ResultString = RegexObj.Replace(SubjectString, "${right}=${left}")


PHP
$result = preg_replace('/(?P<left>\w+)=(?P<right>\w+)/', '$2=$1', $subject);

preg PHP PCRE, . preg_match()


preg_match_all() . ,
preg_replace() . , , . .

Perl
$subject =~ s/(?<left>\w+)=(?<right>\w+)/$+{right}=$+{left}/g;

Perl,
5.10. $+
, . , .

Python
:
result = re.sub(r"(?P<left>\w+)=(?P<right>\w+)", r"\g<right>=\g<left>",
                subject)

:
reobj = re.compile(r"(?P<left>\w+)=(?P<right>\w+)")
result = reobj.sub(r"\g<right>=\g<left>", subject)

Ruby
result = subject.gsub(/(?<left>\w+)=(?<right>\w+)/, '\k<right>=\k<left>')

.
1,
.
2.21, , .


3.16. Replace Matches with Replacements Generated in Code

, . , .
, ,
.

C#
:
string resultString = Regex.Replace(subjectString, @"\d+",
new MatchEvaluator(ComputeReplacement));

Regex:
Regex regexObj = new Regex(@"\d+");
string resultString = regexObj.Replace(subjectString,
new MatchEvaluator(ComputeReplacement));

ComputeReplacement. , :
public String ComputeReplacement(Match matchResult) {
int twiceasmuch = int.Parse(matchResult.Value) * 2;
return twiceasmuch.ToString();
}

VB.NET
:
Dim MyMatchEvaluator As New MatchEvaluator(AddressOf ComputeReplacement)
Dim ResultString = Regex.Replace(SubjectString, "\d+", MyMatchEvaluator)

Regex:
Dim RegexObj As New Regex("\d+")
Dim MyMatchEvaluator As New MatchEvaluator(AddressOf ComputeReplacement)
Dim ResultString = RegexObj.Replace(SubjectString, MyMatchEvaluator)


ComputeReplacement. , :
Public Function ComputeReplacement(ByVal MatchResult As Match) As String
    Dim TwiceAsMuch = Integer.Parse(MatchResult.Value) * 2
    Return TwiceAsMuch.ToString()
End Function

Java
StringBuffer resultString = new StringBuffer();
Pattern regex = Pattern.compile("\\d+");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
Integer twiceasmuch = Integer.parseInt(regexMatcher.group()) * 2;
regexMatcher.appendReplacement(resultString, twiceasmuch.toString());
}
regexMatcher.appendTail(resultString);

JavaScript
var result = subject.replace(/\d+/g,
function(match) { return match * 2; }
);

PHP
:
$result = preg_replace_callback('/\d+/', 'compute_replacement', $subject);
function compute_replacement($groups) {
return $groups[0] * 2;
}

:
$result = preg_replace_callback(
    '/\d+/',
    create_function(
        '$groups',
        'return $groups[0] * 2;'
    ),
    $subject
);

Perl
$subject =~ s/\d+/$& * 2/eg;


Python
:
result = re.sub(r"\d+", computereplacement, subject)

:
reobj = re.compile(r"\d+")
result = reobj.sub(computereplacement, subject)

computereplacement. ,
sub().
def computereplacement(matchobj):
return str(int(matchobj.group()) * 2)

Ruby
result = subject.gsub(/\d+/) {|match|
Integer(match) * 2
}

,
.
- , , .

C#
3.14
Regex.Replace(), .
, . Regex(), Replace() , .
MatchEvaluator. -, , .
new MatchEvaluator().
- MatchEvaluator() .


, , System.Text.RegularExpressions.Match .
Match, - Regex.Match(), .
Replace() MatchEvaluator
, , , .
.
Match. , , matchResult.Value.
, , matchResult.Groups[].
- , matchResult.Value. null ,
( ).

VB.NET
3.14
Regex.Replace(), .
, . Dim ,
Replace() , .
MatchEvaluator. , , .
MatchEvaluator Dim. MatchEvaluator() AddressOf,
-. AddressOf
, .
, MatchEvaluator,
System.Text.
RegularExpressions.Match .
Match, - Regex.
Match(), . ,
ByVal.


Replace() MatchEvaluator
, , , .
.
Match. , , MatchResult.Value.
, , MatchResult.Groups[].
- , MatchResult.Value. Nothing , (. . ).

Java
Java . , 3.11. appendReplacement()
Matcher. find() , appendTail(). appendReplacement() appendTail() .
appendReplacement() .
StringBuffer
.
, find(). , $1. ,
IllegalArgumentException.
, IndexOutOfBoundsException. appendReplacement()
find(),
IllegalStateException.
appendReplacement() . -, , ,
. , , . -,
,
.
- , . -


,
appendReplacement() .
, ,
appendReplacement(). -
appendReplacement() ,
,
.
, appendTail().
, , appendReplacement().

JavaScript
JavaScript , .
, , string.replace() , .
, .
, , ,
.
. , , .

.
, ,
JavaScript , , . JavaScript .

PHP
preg_replace_callback() , preg_replace(), 3.14. , , , .

.
preg_replace_callback() , , . , , -


create_function().
( -, ).
, preg_replace_callback() , . .
,
. ,
.

Perl
s/// , m//: /e. /e, execute
(),
, ,
Perl, , . $&
. .

Python
sub() Python
. .
, . ,
MatchObject, , search().
( ) . 3.7 3.9.
.

Ruby
gsub() String
: .
.
gsub() . . -


,
nil, .
, ,
. , , $~, $&
$1. 3.7, 3.8 3.9.
, \1 . .

See also: Recipes 3.9 and 3.15.

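3.17. Replace All Matches Within the Matches of Another Regex

You want to replace all matches of one regular expression, but only within certain sections of the subject string. A second regular expression matches each of those sections.
Suppose you have an HTML file in which some passages are marked as bold with <b> tags. Between each pair of bold tags, you want to replace every match of the regex before with the replacement text after. For example, the string before <b>first before</b> before <b>before before</b> should become: before <b>first after</b> before <b>after after</b>.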

C#
Regex outerRegex = new Regex("<b>.*?</b>", RegexOptions.Singleline);
Regex innerRegex = new Regex("before");
string resultString = outerRegex.Replace(subjectString,
                      new MatchEvaluator(ComputeReplacement));

public String ComputeReplacement(Match matchResult) {
    // Run the inner regex on each match of the outer regex
    return innerRegex.Replace(matchResult.Value, "after");
}

VB.NET
Dim OuterRegex As New Regex("<b>.*?</b>", RegexOptions.Singleline)
Dim InnerRegex As New Regex("before")
Dim MyMatchEvaluator As New MatchEvaluator(AddressOf ComputeReplacement)
Dim ResultString = OuterRegex.Replace(SubjectString, MyMatchEvaluator)

Public Function ComputeReplacement(ByVal MatchResult As Match) As String
    'Run the inner regex on each match of the outer regex
    Return InnerRegex.Replace(MatchResult.Value, "after")
End Function

Java
StringBuffer resultString = new StringBuffer();
Pattern outerRegex = Pattern.compile("<b>.*?</b>");
Pattern innerRegex = Pattern.compile("before");
Matcher outerMatcher = outerRegex.matcher(subjectString);
while (outerMatcher.find()) {
    outerMatcher.appendReplacement(resultString,
        innerRegex.matcher(outerMatcher.group()).replaceAll("after"));
}
outerMatcher.appendTail(resultString);

JavaScript
var result = subject.replace(/<b>.*?<\/b>/g,
    function(match) {
        return match.replace(/before/g, "after");
    }
);

PHP
$result = preg_replace_callback('%<b>.*?</b>%',
                                'replace_within_tag', $subject);
function replace_within_tag($groups) {
    return preg_replace('/before/', 'after', $groups[0]);
}

Perl
$subject =~ s%<b>.*?</b>%($match = $&) =~ s/before/after/g; $match;%eg;

Python
innerre = re.compile("before")
def replacewithin(matchobj):
    return innerre.sub("after", matchobj.group())

result = re.sub("<b>.*?</b>", replacewithin, subject)

Ruby
innerre = /before/
result = subject.gsub(/<b>.*?<\/b>/) {|match|
    match.gsub(innerre, "after")
}

. <b>.*?</b> <b> . before, after.

3.16 , , .
. ,
<b>, , 3.14. ,
.

See also: Recipes 3.11, 3.13, and 3.16.

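3.18. Replace All Matches Between the Matches of Another Regex

You want to replace all matches of one regular expression, but only within certain sections of the subject string. A second regular expression matches the text between those sections. In other words, you want to search and replace through every part of the subject string that the second regex does not match.
Suppose you have an HTML file in which you want to replace straight double quotes with typographic (curly) double quotes, but only outside HTML tags. Quotes inside HTML tags must remain plain ASCII quotes, or the HTML will no longer parse. For example, "text" <span class="middle">"text"</span> "text" should become “text” <span class="middle">“text”</span> “text”.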

C#
string resultString = null;
Regex outerRegex = new Regex("<[^<>]*>");
Regex innerRegex = new Regex("\"([^\"]*)\"");
// Find the first section
int lastIndex = 0;
Match outerMatch = outerRegex.Match(subjectString);
while (outerMatch.Success) {
    // Search and replace through the text between this match
    // and the previous one
    string textBetween =
        subjectString.Substring(lastIndex, outerMatch.Index - lastIndex);
    resultString = resultString +
        innerRegex.Replace(textBetween, "\u201C$1\u201D");
    lastIndex = outerMatch.Index + outerMatch.Length;
    // Copy the text in the section unchanged
    resultString = resultString + outerMatch.Value;
    // Find the next section
    outerMatch = outerMatch.NextMatch();
}
// Search and replace through the remainder after the last regex match
string textAfter = subjectString.Substring(lastIndex,
                   subjectString.Length - lastIndex);
resultString = resultString + innerRegex.Replace(textAfter,
               "\u201C$1\u201D");

VB.NET
Dim ResultString As String = Nothing
Dim OuterRegex As New Regex("<[^<>]*>")
Dim InnerRegex As New Regex("""([^""]*)""")
'Find the first section
Dim LastIndex = 0
Dim OuterMatch = OuterRegex.Match(SubjectString)
While OuterMatch.Success
    'Search and replace through the text between this match
    'and the previous one
    Dim TextBetween = SubjectString.Substring(LastIndex, _
                      OuterMatch.Index - LastIndex)
    ResultString = ResultString + InnerRegex.Replace(TextBetween, _
                   ChrW(&H201C) + "$1" + ChrW(&H201D))
    LastIndex = OuterMatch.Index + OuterMatch.Length
    'Copy the text in the section unchanged
    ResultString = ResultString + OuterMatch.Value
    'Find the next section
    OuterMatch = OuterMatch.NextMatch
End While
'Search and replace through the remainder after the last regex match
Dim TextAfter = SubjectString.Substring(LastIndex, _
                SubjectString.Length - LastIndex)
ResultString = ResultString + _
    InnerRegex.Replace(TextAfter, ChrW(&H201C) + "$1" + ChrW(&H201D))

Java
StringBuffer resultString = new StringBuffer();
Pattern outerRegex = Pattern.compile("<[^<>]*>");
Pattern innerRegex = Pattern.compile("\"([^\"]*)\"");
Matcher outerMatcher = outerRegex.matcher(subjectString);
int lastIndex = 0;
while (outerMatcher.find()) {
    // Search and replace through the text between this match
    // and the previous one
    String textBetween = subjectString.substring(lastIndex,
                                                 outerMatcher.start());
    Matcher innerMatcher = innerRegex.matcher(textBetween);
    resultString.append(innerMatcher.replaceAll("\u201C$1\u201D"));
    lastIndex = outerMatcher.end();
    // Append the regex match itself, unchanged
    resultString.append(outerMatcher.group());
}
// Search and replace through the remainder after the last regex match
String textAfter = subjectString.substring(lastIndex);
Matcher innerMatcher = innerRegex.matcher(textAfter);
resultString.append(innerMatcher.replaceAll("\u201C$1\u201D"));

JavaScript
var result = "";
var outerRegex = /<[^<>]*>/g;
var innerRegex = /"([^"]*)"/g;
var outerMatch = null;
var lastIndex = 0;
while (outerMatch = outerRegex.exec(subject)) {
    if (outerMatch.index == outerRegex.lastIndex) outerRegex.lastIndex++;
    // Search and replace through the text between this match
    // and the previous one
    var textBetween = subject.substring(lastIndex, outerMatch.index);
    result = result + textBetween.replace(innerRegex, "\u201C$1\u201D");
    lastIndex = outerMatch.index + outerMatch[0].length;
    // Append the regex match itself, unchanged
    result = result + outerMatch[0];
}
// Search and replace through the remainder after the last regex match
var textAfter = subject.substr(lastIndex);
result = result + textAfter.replace(innerRegex, "\u201C$1\u201D");

PHP
$result = '';
$lastindex = 0;
while (preg_match('/<[^<>]*>/', $subject, $groups, PREG_OFFSET_CAPTURE,
                  $lastindex)) {
    $matchstart = $groups[0][1];
    $matchlength = strlen($groups[0][0]);
    // Search and replace through the text between this match
    // and the previous one
    $textbetween = substr($subject, $lastindex, $matchstart-$lastindex);
    $result .= preg_replace('/"([^"]*)"/', '“$1”', $textbetween);
    // Append the regex match itself, unchanged
    $result .= $groups[0][0];
    // Move the starting position for the next match
    $lastindex = $matchstart + $matchlength;
    if ($matchlength == 0) {
        // Don't get stuck in an infinite loop
        // if the regex allows zero-length matches
        $lastindex++;
    }
}
// Search and replace through the remainder after the last regex match
$textafter = substr($subject, $lastindex);
$result .= preg_replace('/"([^"]*)"/', '“$1”', $textafter);

Perl
use encoding "utf-8";
$result = '';
while ($subject =~ m/<[^<>]*>/g) {
    $match = $&;
    $textafter = $';
    ($textbetween = $`) =~ s/"([^"]*)"/\x{201C}$1\x{201D}/g;
    $result .= $textbetween . $match;
}
$textafter =~ s/"([^"]*)"/\x{201C}$1\x{201D}/g;
$result .= $textafter;

Python
innerre = re.compile('"([^"]*)"')
result = ""
lastindex = 0
for outermatch in re.finditer("<[^<>]*>", subject):
    # Search and replace through the text between this match
    # and the previous one
    textbetween = subject[lastindex:outermatch.start()]
    result += innerre.sub(u"\u201C\\1\u201D", textbetween)
    lastindex = outermatch.end()
    # Append the regex match itself, unchanged
    result += outermatch.group()
# Search and replace through the remainder after the last regex match
textafter = subject[lastindex:]
result += innerre.sub(u"\u201C\\1\u201D", textafter)

Ruby
result = ''
textafter = ''
subject.scan(/<[^<>]*>/) {|match|
    textafter = $'
    textbetween = $`.gsub(/"([^"]*)"/, '“\1”')
    result += textbetween + match
}
result += textafter.gsub(/"([^"]*)"/, '“\1”')

3.13 , , ( )
( ).
.
, ,
, , . ,
, , .
, -


. , ^ ,
, , , ^ .

, ,
.
. <[^<>]*> , , . HTML.
, HTML -
, ( ) . 3.11. , , , , , .

, 3.14. , , . . , .
,
,
.

([^]*) ,
, ,
, .
.

, . U+201C U+201D.
. Visual Studio 2008, , .
\u201C \x{201C} ,


, , . , ,
, . , , ,
. , C# Java \u201C
, VB.NET
. VB.NET
ChrW().

Perl Ruby
Perl Ruby , , .
$` ( ) , , $ ( ) , . , , . $` , .

Python
, . ,
encode(), :
print result.encode('1252')

See also: Recipes 3.11, 3.13, and 3.16.

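3.19. Split a String

You want to split a string using a regular expression. After the split, you will have an array or list of strings holding the text between the regex matches.
For example, you want to split a string that contains HTML tags along those tags. Splitting "I like <b>bold</b> and <i>italic</i> fonts" should yield an array of five strings: "I like ", "bold", " and ", "italic", and " fonts".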

C#
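Use the static call when you process only a few strings with the same regular expression:
string[] splitArray = Regex.Split(subjectString, "<[^<>]*>");

If the regex is supplied by the end user, use the static call with full exception handling:
string[] splitArray = null;
try {
    splitArray = Regex.Split(subjectString, "<[^<>]*>");
} catch (ArgumentNullException ex) {
    // Cannot pass null as the regular expression
    // or as the subject string
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}

Construct a Regex object to use the same regular expression with many strings:
Regex regexObj = new Regex("<[^<>]*>");
string[] splitArray = regexObj.Split(subjectString);

If the regex is supplied by the end user, use the Regex object with full exception handling:
string[] splitArray = null;
try {
    Regex regexObj = new Regex("<[^<>]*>");
    try {
        splitArray = regexObj.Split(subjectString);
    } catch (ArgumentNullException ex) {
        // Cannot pass null as the subject string
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}

VB.NET
Use the static call when you process only a few strings with the same regular expression:
Dim SplitArray = Regex.Split(SubjectString, "<[^<>]*>")

If the regex is supplied by the end user, use the static call with full exception handling:
Dim SplitArray As String()
Try
    SplitArray = Regex.Split(SubjectString, "<[^<>]*>")
Catch ex As ArgumentNullException
    'Cannot pass Nothing as the regular expression
    'or as the subject string
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try

Construct a Regex object to use the same regular expression with many strings:
Dim RegexObj As New Regex("<[^<>]*>")
Dim SplitArray = RegexObj.Split(SubjectString)

If the regex is supplied by the end user, use the Regex object with full exception handling:
Dim SplitArray As String()
Try
    Dim RegexObj As New Regex("<[^<>]*>")
    Try
        SplitArray = RegexObj.Split(SubjectString)
    Catch ex As ArgumentNullException
        'Cannot pass Nothing as the subject string
    End Try
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try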

Java
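To split just one string with a given regular expression, call String.split() directly:
String[] splitArray = subjectString.split("<[^<>]*>");

If the regex is supplied by the end user, add exception handling:
try {
    String[] splitArray = subjectString.split("<[^<>]*>");
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}

Construct a Pattern object to use the same regular expression with many strings:
Pattern regex = Pattern.compile("<[^<>]*>");
String[] splitArray = regex.split(subjectString);

If the regex is supplied by the end user, use the Pattern object with exception handling:
String[] splitArray = null;
try {
    Pattern regex = Pattern.compile("<[^<>]*>");
    splitArray = regex.split(subjectString);
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}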

JavaScript
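Using string.split():
result = subject.split(/<[^<>]*>/);

string.split() with a regular expression behaves inconsistently across browsers. For reliable results, build the list in your own code:
var list = [];
var regex = /<[^<>]*>/g;
var match = null;
var lastIndex = 0;
while (match = regex.exec(subject)) {
    // Don't let browsers such as Firefox get stuck
    // in an infinite loop on a zero-length match
    if (match.index == regex.lastIndex) regex.lastIndex++;
    // Add the text before the match
    list.push(subject.substring(lastIndex, match.index));
    lastIndex = match.index + match[0].length;
}
// Add the remainder after the last match
list.push(subject.substr(lastIndex));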


PHP
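$result = preg_split('/<[^<>]*>/', $subject);

Perl
@result = split(m/<[^<>]*>/, $subject);

Python
If you have only a few strings to split, you can use the global function:
result = re.split("<[^<>]*>", subject)

To use the same regular expression repeatedly, use a compiled object:
reobj = re.compile("<[^<>]*>")
result = reobj.split(subject)

Ruby
result = subject.split(/<[^<>]*>/)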

, 3.10. , .
.

C# VB.NET
, .NET Regex.Split().
, , . null. Split() ArgumentNullException.
Split() .
, . .
. , ArgumentException.



,
, Regex, Split() . .
Split() Regex
, . ,
. Split()
, .
, , . , regexObj.Split(subject, 3) ,
. Split() , , . .
, , Split()
,
. regexObj.Split(subject, 1) , . regexObj.
Split(subject, 0) ,
Split() .
, Split()
ArgumentOutOfRangeException.
, , ,
, . , ,
, ,
. ,

.
, ,
, .
, , , .
. Split()


ArgumentOutOfRangeException. Match() Split()


, .
,
. ,
, - ,
.
,
.

Java
, split() . .
Pattern.compile(regex).split(subjectString).
, Pattern Pattern.compile().
. split() Pattern, .
Matcher. Matcher split().
Pattern.split() , String.split() .
, . , Pattern.split(subject, 3) , . split() , , .
. , , split() ,
. Pattern.split(subject, 1) , .
,
. ,
, - ,
.
,
.


Java . , , Pattern.split()
. , . , ,
. Java
.

JavaScript
JavaScript, split() . , ,
split() .

. .
, . ,
. /g ( 3.4) .
, -
split(), JavaScript.
, , . , , , . , , split(),
( 2.9).
JavaScript . , .
, , JavaScript,
. , , .
3.12.
, , .
, ,
3.8.


String.prototype.split, , , (Steven Levithan),


: http://blog.stevenlevithan.com/archives/cross-browser-split.

PHP

preg_split().
, . , $_.
, . , preg_split($regex, $subject, 3) ,
. preg_split() ,
,
.
. ,
, preg_split() ,
. -1, .
,
. ,
, - ,
. , .
preg_split() . ,
PREG_SPLIT_NO_EMPTY .

Perl

split(). , .
, . ,
split(/regex/, subject, 3) ,


. split() , , . . ,
, split() ,
.
, Perl . -, , . , , .
, , . , ($one,
$two, $three) = split(/,/)
$_ 4 .
,
. ,
, - ,
.
,
.

Python
, split() re. , . split()
.
re.split() re.split() split()
. : .
split()
.
,
. , .
, ,
. -


, .

. , .
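
The behavior of Python's split() described above can be sketched as follows. This is a minimal illustration in Python 3, assuming the HTML example string from this recipe; the variable names are chosen here for illustration only.

```python
import re

subject = "I like <b>bold</b> and <i>italic</i> fonts"
tag = re.compile(r"<[^<>]*>")

# Split at every tag; the tags themselves are discarded.
parts = tag.split(subject)
print(parts)     # ['I like ', 'bold', ' and ', 'italic', ' fonts']

# maxsplit limits the number of splits: at most 2 splits
# yield at most 3 strings, the last holding the unsplit remainder.
limited = tag.split(subject, maxsplit=2)
print(limited)   # ['I like ', 'bold', ' and <i>italic</i> fonts']

# Adjacent matches, and matches at the very start or end of the
# subject, produce empty strings in the result.
adjacent = tag.split("<b><i>x</i></b>")
print(adjacent)  # ['', '', 'x', '', '']
```

If empty strings are unwanted, they can be filtered out afterward, for example with a list comprehension.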

Ruby

, split()
.
split() , , . , subject.split(re, 3) , . split()
,
,
. . ,
, split()
, . split(re, 1)
, .
,
. ,
, - ,
.
,
.
Ruby . , , split() .
,
. , ,
. Ruby
.

See also: Recipe 3.20.


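3.20. Split a String, Keeping the Regex Matches

You want to split a string using a regular expression. After the split, you will have an array or list of strings holding the text between the regex matches, as well as the regex matches themselves.
Suppose you want to split a string that contains HTML tags along those tags, and also keep the tags. Splitting "I like <b>bold</b> and <i>italic</i> fonts" should yield an array of nine strings: "I like ", "<b>", "bold", "</b>", " and ", "<i>", "italic", "</i>", and " fonts".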

C#
Use the static call when you process only a few strings with the same regular expression:
string[] splitArray = Regex.Split(subjectString, "(<[^<>]*>)");

Construct a Regex object to use the same regular expression with many strings:
Regex regexObj = new Regex("(<[^<>]*>)");
string[] splitArray = regexObj.Split(subjectString);

VB.NET
Use the static call when you process only a few strings with the same regular expression:
Dim SplitArray = Regex.Split(SubjectString, "(<[^<>]*>)")

Construct a Regex object to use the same regular expression with many strings:
Dim RegexObj As New Regex("(<[^<>]*>)")
Dim SplitArray = RegexObj.Split(SubjectString)

Java
List<String> resultList = new ArrayList<String>();
Pattern regex = Pattern.compile("<[^<>]*>");
Matcher regexMatcher = regex.matcher(subjectString);
int lastIndex = 0;
while (regexMatcher.find()) {
    resultList.add(subjectString.substring(lastIndex,
                                           regexMatcher.start()));
    resultList.add(regexMatcher.group());
    lastIndex = regexMatcher.end();
}
resultList.add(subjectString.substring(lastIndex));

JavaScript
var list = [];
var regex = /<[^<>]*>/g;
var match = null;
var lastIndex = 0;
while (match = regex.exec(subject)) {
    // Don't let browsers such as Firefox get stuck
    // in an infinite loop on a zero-length match
    if (match.index == regex.lastIndex) regex.lastIndex++;
    // Add the text before the match, and the match itself
    list.push(subject.substring(lastIndex, match.index), match[0]);
    lastIndex = match.index + match[0].length;
}
// Add the remainder after the last match
list.push(subject.substr(lastIndex));

PHP
$result = preg_split('/(<[^<>]*>)/', $subject, -1,
                     PREG_SPLIT_DELIM_CAPTURE);

Perl
@result = split(m/(<[^<>]*>)/, $subject);

Python
If you have only a few strings to split, you can use the global function:
result = re.split("(<[^<>]*>)", subject)

To use the same regular expression repeatedly, use a compiled object:
reobj = re.compile("(<[^<>]*>)")
result = reobj.split(subject)

Ruby
list = []
lastindex = 0
subject.scan(/<[^<>]*>/) {|match|
    # Exclusive range: yields "" when the match starts at lastindex
    list << subject[lastindex...$~.begin(0)]
    list << $&
    lastindex = $~.end(0)
}
list << subject[lastindex..-1]

Regex.Split() .NET , . .NET 1.0


1.1
. .NET 2.0 . ,
. .NET
2.0 , , .
, Split() ,
. regexObj.Split(subject, 4) ,
, , . , , , . ,
: Ilike, <b>, bold, </b>, and, <i> italic</i>fonts.
10 ,
regexObj.Split(subject, 4) .NET 2.0
34 .
.NET . ,
. , RegexOptions.ExplicitCapture
(
).

Java
Pattern.split() Java .
3.12, ,
. , , 3.8.


JavaScript
string.split() JavaScript
. JavaScript . , - , .
, ,
3.12
,
. ,
, 3.8.

PHP
preg_split()
PREG_SPLIT_DELIM_CAPTURE, . PREG_SPLIT_DELIM_CAPTURE PREG_SPLIT_NO_EMPTY
|.
, preg_split() , . , , , , ,
.
,
, , . ,
: Ilike, <b>, bold, </b>, and, <i> italic</i>fonts.

Perl
split() Perl
. ,
.
, split() ,
. split(/(<[^<>]*>)/, $subject, 4)
, , , . , , ,


. , : Ilike, <b>, bold, </b>, and, <i> italic</i>fonts. 10 , split($regex,


$subject, 4) 34 .
Perl .
,
.

Python
split() Python . , .
, split() , . split((<[^<>]*>), subject, 3)
, , , . , , ,
. , : I like, <b>, bold, </b>, and , <i> italic</i> fonts.
10 ,
split(regex, subject, 3) 34 .
Python . , .
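
The effect of a capturing group on Python's split(), with and without a split limit, can be sketched like this. It is a small Python 3 illustration using the example string from this recipe; the variable names are assumptions made for the example.

```python
import re

subject = "I like <b>bold</b> and <i>italic</i> fonts"

# A capturing group around the whole regex makes split()
# include the matched delimiters in the returned list.
with_tags = re.split(r"(<[^<>]*>)", subject)
print(len(with_tags))  # 9
print(with_tags)

# maxsplit counts splits, not list items: 3 splits produce
# 3 delimiters plus 4 pieces of text, or 7 strings in total.
limited = re.split(r"(<[^<>]*>)", subject, maxsplit=3)
print(limited)
```

With a noncapturing group, (?:<[^<>]*>), the same calls would return only the text between the tags.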

Ruby
String.split() Ruby .
3.12, ,
. , , 3.8.

See also: Recipes 2.9 and 2.11.


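3.21. Search Line by Line

Traditional grep tools apply a regular expression to one line of text at a time and display the lines that the regex matches (or does not match). You have an array of strings, or a multiline string, that you want to process in the same way.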

C#
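If you have a multiline string, first split it into an array of strings, each holding one line of text:
string[] lines = Regex.Split(subjectString, "\r?\n");

Then iterate over the lines array:
Regex regexObj = new Regex("regex pattern");
for (int i = 0; i < lines.Length; i++) {
    if (regexObj.IsMatch(lines[i])) {
        // The regex matches lines[i]
    } else {
        // The regex does not match lines[i]
    }
}

VB.NET
If you have a multiline string, first split it into an array of strings, each holding one line of text:
Dim Lines = Regex.Split(SubjectString, "\r?\n")

Then iterate over the Lines array:
Dim RegexObj As New Regex("regex pattern")
For i As Integer = 0 To Lines.Length - 1
    If RegexObj.IsMatch(Lines(i)) Then
        'The regex matches Lines(i)
    Else
        'The regex does not match Lines(i)
    End If
Next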


Java
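If you have a multiline string, first split it into an array of strings, each holding one line of text:
String[] lines = subjectString.split("\r?\n");

Then iterate over the lines array:
Pattern regex = Pattern.compile("regex pattern");
Matcher regexMatcher = regex.matcher("");
for (int i = 0; i < lines.length; i++) {
    regexMatcher.reset(lines[i]);
    if (regexMatcher.find()) {
        // The regex matches lines[i]
    } else {
        // The regex does not match lines[i]
    }
}

JavaScript
If you have a multiline string, first split it into an array of strings, each holding one line of text. As Recipe 3.19 explains, some browsers drop empty lines from the array.
var lines = subject.split(/\r?\n/);

Then iterate over the lines array:
var regexp = /regex pattern/;
for (var i = 0; i < lines.length; i++) {
    if (lines[i].match(regexp)) {
        // The regex matches lines[i]
    } else {
        // The regex does not match lines[i]
    }
}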

PHP
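If you have a multiline string, first split it into an array of strings, each holding one line of text:
$lines = preg_split('/\r?\n/', $subject);

Then iterate over the $lines array:
foreach ($lines as $line) {
    if (preg_match('/regex pattern/', $line)) {
        // The regex matches $line
    } else {
        // The regex does not match $line
    }
}

Perl
If you have a multiline string, first split it into an array of strings, each holding one line of text:
@lines = split(m/\r?\n/, $subject);

Then iterate over the @lines array:
foreach $line (@lines) {
    if ($line =~ m/regex pattern/) {
        # The regex matches $line
    } else {
        # The regex does not match $line
    }
}

Python
If you have a multiline string, first split it into a list of strings, each holding one line of text:
lines = re.split("\r?\n", subject)

Then iterate over the lines:
reobj = re.compile("regex pattern")
for line in lines[:]:
    if reobj.search(line):
        # The regex matches line
        pass
    else:
        # The regex does not match line
        pass

Ruby
If you have a multiline string, first split it into an array of strings, each holding one line of text:
lines = subject.split(/\r?\n/)

Then iterate over the lines array:
re = /regex pattern/
lines.each { |line|
    if line =~ re
        # The regex matches line
    else
        # The regex does not match line
    end
}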

, ,
, .
, , . , . ,
, .
. , ,
, , . , , , ,
.
, . , . . , ,
, ,
,
.
3.19 , ,
. \r\n

, Microsoft Windows. \n , UNIX , Linux OS X.



, .
, .

, , -


\r?\n .
,
Windows,
UNIX.
,
. , , , 3.5.
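
The line-by-line approach can be wrapped in a small helper. The sketch below is in Python 3; the function name grep and the sample text are assumptions made for illustration. It splits on \r?\n so that both Windows and UNIX line breaks are handled, then sorts each line by whether the regex matches it.

```python
import re

def grep(pattern, text):
    """Return (matching_lines, other_lines) for a multiline string."""
    regex = re.compile(pattern)
    matching, rest = [], []
    # \r?\n handles both Windows (CR LF) and UNIX (LF) line breaks.
    for line in re.split(r"\r?\n", text):
        if regex.search(line):
            matching.append(line)
        else:
            rest.append(line)
    return matching, rest

sample = "alpha 1\r\nbeta\ngamma 22"
matching, rest = grep(r"\d+", sample)
print(matching)  # ['alpha 1', 'gamma 22']
print(rest)      # ['beta']
```

Because each line is searched separately, a match can never span a line break, which is the behavior of traditional grep tools.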

See also: Recipes 3.11 and 3.19.


, . ,
, , ,
.
, , ,
.
,
. , , , . , .

, ,
, .

,
.

4.1. Validate Email Addresses

- , . ,


. .

.
@ :
^\S+@\S+$

:
: .NET, Java, JavaScript, PCRE, Perl, Python
\A\S+@\S+\Z

:
: .NET, Java, PCRE, Perl, Python, Ruby



, , @, , . , , @, , , ,
:
^[A-Z0-9+_.-]+@[A-Z0-9.-]+$

:
: .NET, Java, JavaScript, PCRE, Perl, Python
\A[A-Z0-9+_.-]+@[A-Z0-9.-]+\Z

:
: .NET, Java, PCRE, Perl, Python, Ruby



.
, ,
RFC 2822, . , , ()


(|). , ,
, ,
SQL:
^[\w!#$%&*+/=?`{|}~^.-]+@[A-Z0-9.-]+$

:
: .NET, Java, JavaScript, PCRE, Perl, Python
\A[\w!#$%&*+/=?`{|}~^.-]+@[A-Z0-9.-]+\Z

:
: .NET, Java, PCRE, Perl, Python, Ruby

,

, , ,
. ,
:
^[\w!#$%&*+/=?`{|}~^-]+(?:\.[\w!#$%&*+/=?`{|}~^-]+)*@
[A-Z0-9-]+(?:\.[A-Z0-9-]+)*$

:
: .NET, Java, JavaScript, PCRE, Perl, Python
\A[\w!#$%&*+/=?`{|}~^-]+(?:\.[\w!#$%&*+/=?`{|}~^-]+)*@
[A-Z0-9-]+(?:\.[A-Z0-9-]+)*\Z

:
: .NET, Java, PCRE, Perl, Python, Ruby



: . , , , secondlevel.com thirdlevel.secondlevel.com. .com . , , . (.com) (.museum):
^[\w!#$%&*+/=?`{|}~^-]+(?:\.[\w!#$%&*+/=?`{|}~^-]+)*@
(?:[A-Z0-9-]+\.)+[A-Z]{2,6}$


: .NET, Java, JavaScript, PCRE, Perl, Python


\A[\w!#$%&*+/=?`{|}~^-]+(?:\.[\w!#$%&*+/=?`{|}~^-]+)*@
(?:[A-Z0-9-]+\.)+[A-Z]{2,6}\Z

:
: .NET, Java, PCRE, Perl, Python, Ruby


- ,
,
, , . ,
,
, . , , , .
, .
RFC 2822, , asdf@asdf.asdf .
,
, . asdf.
,
, john.doe@somewhere.com
, .
-, somewhere.com , , John Doe Delete , -.
, ,
. , , .
, ,
. , , #$%@.-, ,
, .


,
. 276 .

. , ,
, . , ,
,
,
. .
,
.
, com|net|org|mil|edu . , ,
.


, , . 2, 90% , .
, .
.
[A-Za-z] [A-Z] , .
.
X [Xx] .

2.3, \S \w . \S
, \w .

@ \. @ , . ,
. @
,
. , , 2.1.


[A-Z0-9.-] ,
, . A Z, 0 9,
. , , .
2.3, ,
: [\w!#$%&*+/=?`{|}~^.-] .
, 19 .

+ * , . .
. , [A-Z0-9.-]+ , , / .

, (?:[A-Z0-9-]+\.)+ , / ,
.
. , ,
, , .
2.12.
(?:group) .
,
. (group) , .
, ,
(?: ( .

- , , .
2.9.

^ $ , . , .
.
- drop database; -- joe@server.com haha!
.
-


, joe@server.
com . 2.5. ,
^ $ .
Ruby . , ,
Ruby, . , ,
^ $ , drop
database; -joe@server.com
haha!,

, \
A \Z .
,
, JavaScript. JavaScript \A \Z . 2.5.

^ $ , \A \Z
, . . , , JavaScript Ruby .
, , Ruby .
Ruby, \A
\Z , .
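
The difference between line anchors and string anchors can be demonstrated with a short sketch. It is written in Python 3, where ^ and $ match at line breaks only when the MULTILINE flag is set; that flag stands in here for the behavior Ruby always shows. The sample input is the injection-style string discussed above, with line breaks assumed around the address.

```python
import re

# Hostile input with a valid-looking address on a line of its own.
text = "drop database; --\njoe@server.com\nhaha!"
address = r"[A-Z0-9+_.-]+@[A-Z0-9.-]+"

# With MULTILINE, ^ and $ match at embedded line breaks,
# so the address line alone satisfies the regex.
loose = re.search("^" + address + "$", text, re.IGNORECASE | re.MULTILINE)
print(loose.group())  # joe@server.com

# \A and \Z always anchor to the whole string, so the input is rejected.
strict = re.search(r"\A" + address + r"\Z", text, re.IGNORECASE)
print(strict)         # None

# A clean address still passes the strict check.
clean = re.search(r"\A" + address + r"\Z", "joe@server.com", re.IGNORECASE)
print(clean is not None)  # True
```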


, .
, RegexBuddy.
. .
,
. . ^\S+@\S+$ :
, @ .
, ,
.
, .
, , , , ,
.

, , , ,
, ^ $ . . , , , , , asdf@asdf.as asdf@asdf.as99.

, , .

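To find email addresses inside a larger body of text, replace the anchors ^ and $ with word boundaries \b. For example, ^[A-Z0-9+_.-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,6}$ becomes \b[A-Z0-9+_.-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,6}\b.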


. 275
. 276. .

.
RFC 2822 , , . RFC 2822 http://www.ietf.org/rfc/rfc2822.txt.


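4.2. Validate and Format North American Phone Numbers

You want to determine whether a user entered a North American phone number, including the area code, in a common format. These formats include 1234567890, 123-456-7890, 123.456.7890, 123 456 7890, (123) 456 7890, and all related combinations. If the number is valid, you want to convert it to your standard format, (123) 456-7890, so that your phone number records are consistent.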

,
- . ,
.


^\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


($1) $2-$3

: .NET, Java, JavaScript, Perl, PHP


(\1) \2-\3

: Python, Ruby

C#
Regex regexObj =
    new Regex(@"^\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$");
if (regexObj.IsMatch(subjectString)) {
    string formattedPhoneNumber =
        regexObj.Replace(subjectString, "($1) $2-$3");
} else {
    // Invalid phone number
}


JavaScript
var regexObj = /^\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$/;
if (regexObj.test(subjectString)) {
    var formattedPhoneNumber =
        subjectString.replace(regexObj, "($1) $2-$3");
} else {
    // Invalid phone number
}



3.5 3.15.

. ,
-
(, ). , ,
:
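^         # Assert position at the beginning of the string.
\(        # Match a literal "("...
  ?       #   between zero and one time.
(         # Capture the enclosed match to backreference 1...
  [0-9]   #   Match a digit...
    {3}   #     exactly three times.
)         # End capturing group 1.
\)        # Match a literal ")"...
  ?       #   between zero and one time.
[-. ]     # Match one character from the set "-. "...
  ?       #   between zero and one time.
...       # [Match the remaining digits and separator.]
$         # Assert position at the end of the string.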

^ $ ,
, .
, . , ^
, $ . ,
, 123-456-78901.

, , -


,
. ,
, . \( \) , , . ,
. ,
.

, , , , , .
, .
. . [0-9] , . , ,
\d , ,
\d , ,
. \d 2.3.

[-.] , -. .
,
, [0-9] .
, . , [.\-] .

, . {3} , ,
. [0-9]{3}
[0-9][0-9][0-9] , . ( ) ,
. {0,1} .
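
As a companion to the C# and JavaScript listings, the same validate-then-format step could look like this in Python 3. This is a minimal sketch: the function name format_phone is an assumption, and the replacement text uses the \1 backreference syntax listed above for Python.

```python
import re

PHONE = re.compile(r"^\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$")

def format_phone(number):
    """Return the number as (123) 456-7890, or None if it is invalid."""
    if PHONE.match(number):
        # Reuse the three captured digit groups in the standard layout.
        return PHONE.sub(r"(\1) \2-\3", number)
    return None

print(format_phone("123.456.7890"))    # (123) 456-7890
print(format_phone("(123) 456 7890"))  # (123) 456-7890
print(format_phone("123-456-78901"))   # None
```

The last call fails because the $ anchor rejects the extra trailing digit.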


, , . , , .
, , , ,
NANP (North American Numbering Plan ). NANP , 1. , ,
16 . .


, 10- . , , :
29,
08 .

, ,
, 29,
.
, ,
.

.
^\(?([2-9][0-8][0-9])\)?[-. ]?([2-9][0-9]{2})[-. ]?([0-9]{4})$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, . ,
. , , ; , , , .




:
\(?\b([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^ $, \b , - . , \b
, . ,

^ $ . ( \b ), ,
.

( 2.6).
,
. , ,
, . , ,
.

1

1, (
,
), ,
:
^(?:\+?1[-. ]?)?\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, , , +1
(123) 456-7890 1-123-456-7890. ,
(?:...) . ,
, , .


, , .
, ,
.
, ,
, $1 $2 ( ).

(?:\+?1[-.]?)?. 1
, 1
- (, ).
, ,
1 ,
-
, 1.


, , :
^(?:\(?([0-9]{3})\)?[-. ]?)?([0-9]{3})[-. ]?([0-9]{4})$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
,
($1)$2-$3 , , : () 1234567, . , , .

.
4.3 ,
.
(NANP)
, , 16 .
http://www.nanpa.com.


4.3. Validate International Phone Numbers

. , , .


^\+(?:[0-9] ?){6,14}[0-9]$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

JavaScript
function validate (phone) {
    var regex = /^\+(?:[0-9] ?){6,14}[0-9]$/;
    if (regex.test(phone)) {
        // Valid international phone number
    } else {
        // Invalid international phone number
    }
}



3.5.

, , , . , , ITU-T E.123. , (
, )
. ,

(~) -


, ,
( ,
) . (ITU-T E.164) , 15 . .
, ,
. , \x20 :

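^          # Assert position at the beginning of the string.
\+         # Match a literal "+" character.
(?:        # Group but don't capture...
  [0-9]    #   Match a digit.
  \x20     #   Match a space character...
    ?      #     between zero and one time.
)          # End the noncapturing group.
  {6,14}   #   Repeat the preceding group between 6 and 14 times.
[0-9]      # Match a digit.
$          # Assert position at the end of the string.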

:
: .NET, Java, PCRE, Perl, Python, Ruby

^ $
, . , (?:...) , , . {6,14} , - .
[0-9] ( 6
14 7 15) ,
.
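
The digit limits described above can be checked with a few sample inputs. The following Python 3 sketch assumes a helper name, is_valid_international, chosen for illustration; the regex is the one from this recipe.

```python
import re

INTL_PHONE = re.compile(r"^\+(?:[0-9] ?){6,14}[0-9]$")

def is_valid_international(phone):
    """True if phone is '+' followed by 7 to 15 digits, spaces allowed."""
    return INTL_PHONE.match(phone) is not None

print(is_valid_international("+1 234 567 8901"))    # True  (11 digits)
print(is_valid_international("+1234567"))           # True  (7 digits, the minimum)
print(is_valid_international("+123456"))            # False (only 6 digits)
print(is_valid_international("+1234567890123456"))  # False (16 digits)
print(is_valid_international("1234567"))            # False (no leading +)
```

The group repeated 6 to 14 times plus the final digit gives the overall range of 7 to 15 digits.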


EPP
^\+[0-9]{1,3}\.[0-9]{4,14}(?:x.+)?$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby



, (Extensible Provisioning Protocol, EPP).
(
2004 ) .
, .com, .info, .net, .org .us. ,
EPP , ( ) .
EPP +CCC.NNNNNNNNNNxEEEE, C , 1
3 , N 14 E
. , ,
. x .

.
4.2 .
ITU-T recommendation E.123 (Notation for national and international telephone numbers, e-mail addresses and Web addresses
,
-)
http://www.itu.int/rec/T-REC-E.123.
ITU-T Recommendation E.164 (The international public
telecommunication numbering plan
) http://www.itu.int/rec/T-REC-E.164.
http://
www.itu.int/ITU-T/inr/nnp.
RFC 4933
EPP, .
RFC 4933 http://tools.ietf.org/html/rfc4933.

4.4. Validate Traditional Date Formats

mm/
dd/yy, mm/dd/yyyy, dd/mm/yy dd/mm/yyyy. -


, , , ,
31 .

, :
^[0-3]?[0-9]/[0-3]?[0-9]/(?:[0-9]{2})?[0-9]{2}$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, :
^[0-3][0-9]/[0-3][0-9]/(?:[0-9][0-9])?[0-9][0-9]$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
m/d/yy mm/dd/yyyy. :
^(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2}$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
mm/dd/yyyy, :
^(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])/[0-9]{4}$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
d/m/yy dd/mm/yyyy. :
^(3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9])/(?:[0-9]{2})?[0-9]{2}$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
dd/mm/yyyy, :
^(3[01]|[12][0-9]|0[1-9])/(1[0-2]|0[1-9])/[0-9]{4}$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby



, :
^(?:(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])|
(3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9]))/(?:[0-9]{2})?[0-9]{2}$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

, :
^(?:(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])|
(3[01]|[12][0-9]|0[1-9])/(1[0-2]|0[1-9]))/[0-9]{4}$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
:
^(?:
# m/d mm/dd
(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])
|
# d/m dd/mm
(3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9])
)
# /yy /yyyy
/(?:[0-9]{2})?[0-9]{2}$

:
: .NET, Java, PCRE, Perl, Python, Ruby
^(?:
# mm/dd
(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])
|
# dd/mm
(3[01]|[12][0-9]|0[1-9])/(1[0-2]|0[1-9])
)
# /yyyy
/[0-9]{4}$

:
: .NET, Java, PCRE, Perl, Python, Ruby

, - ,
, .
. -,


,
. , - 4/1 . - 1, . , , .
, .
, , , :
1 31. . 3[01]|[12][0-9]|0?[1-9] , 3, 0 1,
1 2, , 0, 1 9. , [1-9] , .
, , 0 9, ASCII .
6.

. ,
, , \d{2}/\d{2}/\d{4} .
, , 99/99/9999,
, . , .

, ,
0/0/00 31/31/2008. , - , ( 2.3), , ( 2.12), . (?:[0-9]{2})?[0-9]{2} , . [0-9]{2} . (?:[0-9]{2})?
. ( 2.9) , {2} . [0-9]{2}? ,
[0-9]{2} . ,

4 . . .


, {2} , .
3 6 1
12, 1 31. ( 2.8), .
, ,
.
,
. . JavaScript . , . , ,
12/31 31/12 ,
31/31.
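
The trade-off between the simplest regex and the stricter month/day version can be seen by running both against a few inputs. This Python 3 sketch uses two regexes from this recipe; the variable names are assumptions made for illustration.

```python
import re

# Simplest form: only checks the overall digit layout.
loose = re.compile(r"^[0-3]?[0-9]/[0-3]?[0-9]/(?:[0-9]{2})?[0-9]{2}$")

# m/d/yy and mm/dd/yyyy: month 1-12 and day 1-31 are enforced.
month_day = re.compile(
    r"^(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2}$")

for candidate in ["12/31/2008", "31/12/2008", "39/39/9999", "2/30/08"]:
    print(candidate,
          bool(loose.match(candidate)),
          bool(month_day.match(candidate)))
```

The stricter regex rejects 31/12/2008 and 39/39/9999, but it still accepts 2/30/08, because it has no notion of how many days each month has. Recipe 4.5 addresses that.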

, , , ^ $ .


. , , 12/12/2001
9912/12/200199. ,
, .
.
.
^ $ \b . :

\b(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])/[0-9]{4}\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

See also: Recipes 4.5, 4.6, and 4.7.


4.5. Accurately Validate Traditional Date Formats

mm/dd/yy, mm/
dd/yyyy, dd/mm/yy dd/mm/yyyy. , , 31 .

C#
Month before day:
DateTime foundDate;
Match matchResult = Regex.Match(SubjectString,
    "^(?<month>[0-3]?[0-9])/(?<day>[0-3]?[0-9])/" +
    "(?<year>(?:[0-9]{2})?[0-9]{2})$");
if (matchResult.Success) {
    int year = int.Parse(matchResult.Groups["year"].Value);
    if (year < 50) year += 2000;
    else if (year < 100) year += 1900;
    try {
        foundDate = new DateTime(year,
            int.Parse(matchResult.Groups["month"].Value),
            int.Parse(matchResult.Groups["day"].Value));
    } catch {
        // Invalid date
    }
}

Day before month:
DateTime foundDate;
Match matchResult = Regex.Match(SubjectString,
    "^(?<day>[0-3]?[0-9])/(?<month>[0-3]?[0-9])/" +
    "(?<year>(?:[0-9]{2})?[0-9]{2})$");
if (matchResult.Success) {
    int year = int.Parse(matchResult.Groups["year"].Value);
    if (year < 50) year += 2000;
    else if (year < 100) year += 1900;
    try {
        foundDate = new DateTime(year,
            int.Parse(matchResult.Groups["month"].Value),
            int.Parse(matchResult.Groups["day"].Value));
    } catch {
        // Invalid date
    }
}

Perl
Month before day:
@daysinmonth = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
$validdate = 0;
if ($subject =~ m!^([0-3]?[0-9])/([0-3]?[0-9])/((?:[0-9]{2})?[0-9]{2})$!) {
$month = $1;
$day = $2;
$year = $3;
$year += 2000 if $year < 50;
$year += 1900 if $year < 100;
if ($month == 2 && $year % 4 == 0 && ($year % 100 != 0 ||
$year % 400 == 0)) {
$validdate = 1 if $day >= 1 && $day <= 29;
} elsif ($month >= 1 && $month <= 12) {
$validdate = 1 if $day >= 1 && $day <= $daysinmonth[$month-1];
}
}

Day before month:
@daysinmonth = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
$validdate = 0;
if ($subject =~ m!^([0-3]?[0-9])/([0-3]?[0-9])/((?:[0-9]{2})?[0-9]{2})$!) {
$day = $1;
$month = $2;
$year = $3;
$year += 2000 if $year < 50;
$year += 1900 if $year < 100;
if ($month == 2 && $year % 4 == 0 && ($year % 100 != 0 ||
$year % 400 == 0)) {
$validdate = 1 if $day >= 1 && $day <= 29;
} elsif ($month >= 1 && $month <= 12) {
$validdate = 1 if $day >= 1 && $day <= $daysinmonth[$month-1];
}
}
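
The same idea, a simple regex to extract the numbers followed by procedural code to validate them, can be sketched in Python 3. Here the standard datetime.date constructor does the calendar checks, including leap years. The function name parse_month_day_year is an assumption; the two-digit-year pivot at 50 follows the listings above.

```python
import re
from datetime import date

DATE = re.compile(r"^([0-3]?[0-9])/([0-3]?[0-9])/((?:[0-9]{2})?[0-9]{2})$")

def parse_month_day_year(text):
    """Return a date for m/d/yy or mm/dd/yyyy input, or None if invalid."""
    match = DATE.match(text)
    if match is None:
        return None
    month, day, year = (int(group) for group in match.groups())
    if len(match.group(3)) == 2:
        # Two-digit years: 00-49 map to 20xx, 50-99 map to 19xx.
        year += 2000 if year < 50 else 1900
    try:
        # date() raises ValueError for impossible dates such as 2/29/2007.
        return date(year, month, day)
    except ValueError:
        return None

print(parse_month_day_year("2/29/2008"))  # 2008-02-29
print(parse_month_day_year("2/29/2007"))  # None
print(parse_month_day_year("12/31/99"))   # 1999-12-31
```

For day-before-month input, swap the order in which the first two groups are assigned.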



Month before day:
^(?:
  # February (29 days every year)
  (?<month>0?2)/(?<day>[12][0-9]|0?[1-9])
|
  # 30-day months
  (?<month>0?[469]|11)/(?<day>30|[12][0-9]|0?[1-9])
|
  # 31-day months
  (?<month>0?[13578]|1[02])/(?<day>3[01]|[12][0-9]|0?[1-9])
)
# Year
/(?<year>(?:[0-9]{2})?[0-9]{2})$
:
: .NET
^(?:
  # February (29 days every year)
  (0?2)/([12][0-9]|0?[1-9])
|
  # 30-day months
  (0?[469]|11)/(30|[12][0-9]|0?[1-9])
|
  # 31-day months
  (0?[13578]|1[02])/(3[01]|[12][0-9]|0?[1-9])
)
# Year
/((?:[0-9]{2})?[0-9]{2})$

:
: .NET, Java, PCRE, Perl, Python, Ruby
^(?:(0?2)/([12][0-9]|0?[1-9])|(0?[469]|11)/(30|[12][0-9]|0?[1-9])|
(0?[13578]|1[02])/(3[01]|[12][0-9]|0?[1-9]))/((?:[0-9]{2})?[0-9]{2})$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Day before month:
^(?:
  # February (29 days every year)
  (?<day>[12][0-9]|0?[1-9])/(?<month>0?2)
|
  # 30-day months
  (?<day>30|[12][0-9]|0?[1-9])/(?<month>0?[469]|11)
|
  # 31-day months
  (?<day>3[01]|[12][0-9]|0?[1-9])/(?<month>0?[13578]|1[02])
)
# Year
/(?<year>(?:[0-9]{2})?[0-9]{2})$


:
: .NET
^(?:
  # February (29 days every year)
  ([12][0-9]|0?[1-9])/(0?2)
|
  # 30-day months
  (30|[12][0-9]|0?[1-9])/(0?[469]|11)
|
  # 31-day months
  (3[01]|[12][0-9]|0?[1-9])/(0?[13578]|1[02])
)
# Year
/((?:[0-9]{2})?[0-9]{2})$

:
: .NET, Java, PCRE, Perl, Python, Ruby
^(?:([12][0-9]|0?[1-9])/(0?2)|(30|[12][0-9]|0?[1-9])/(0?[469]|11)|
(3[01]|[12][0-9]|0?[1-9])/(0?[13578]|1[02]))/((?:[0-9]{2})?[0-9]{2})$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

, . , , //, ,
. , 0 39 .
mm/dd/yyyy dd/mm/yyyy , , .
, . . C#
DateTime, .NET.
. ,
. -


, , , .
,
, ,

. . , . 1 2. 30
3 4. 31
5 6. 7
.
.NET . .NET ( 2.11)

.
.NET , , month day, , . , , ,

, , . .
, , , , . , ,
. .
, . , , 2 2007 29 2008 ,
d/m/yy dd/mm/yyyy:
# 2 2007 29 2008
^(?:
# 2 2007 31 2007
(?:
# 2 31
(?<day>3[01]|[12][0-9]|0?[2-9])/(?<month>0?5)/(?<year>2007)
|
# 1 31
(?:

# 30
(?<day>30|[12][0-9]|0?[1-9])/(?<month>0?[69]|11)
|
# 31
(?<day>3[01]|[12][0-9]|0?[1-9])/(?<month>0?[78]|1[02])
)
/(?<year>2007)
)
|
# 1 2008 29 2008
(?:
# 1 29
(?<day>[12][0-9]|0?[1-9])/(?<month>0?8)/(?<year>2008)
|
# 1 30
(?:
#
(?<day>[12][0-9]|0?[1-9])/(?<month>0?2)
|
# 30
(?<day>30|[12][0-9]|0?[1-9])/(?<month>0?[46])
|
# 31
(?<day>3[01]|[12][0-9]|0?[1-9])/(?<month>0?[1357])
)
/(?<year>2008)
)
)$

:
: .NET, Java, PCRE, Perl, Python, Ruby

See also: Recipes 4.4, 4.6, and 4.7.

4.6. Validate Traditional Time Formats

, hh:mm hh:mm:ss 12-


24- .


12- :
^(1[0-2]|0?[1-9]):([0-5]?[0-9])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
24- :
^(2[0-3]|[01]?[0-9]):([0-5]?[0-9])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, 12- :
^(1[0-2]|0?[1-9]):([0-5]?[0-9]):([0-5]?[0-9])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, 24- :
^(2[0-3]|[01]?[0-9]):([0-5]?[0-9]):([0-5]?[0-9])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

. , .

, .
60 , 60 . , .
. [0-5]?[0-9] 0 5, 0 9. 0 59.
.
0 9. 10 00 09, . 2.3 2.12
, .

( 2.8).


,
. 12- , 0, 10 , 1, 0, 1 2.
1[0-2]|0?[1-9] . 24- ,
0 1, 10 ,
2, 0 3. ,
2[0-3]|[01]?[0-9] . 10 . , , .

, , . , , . 2.9 , .
3.9 , , .
, , , .
, . , , , ,
.

, , ,
,
^ $ . . , , 12:12 9912:1299.
, , .

.
.
^ $ \b . :

\b(2[0-3]|[01]?[0-9]):([0-5]?[0-9])\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


, . , 24- , 16:08 The


time is 16:08:42 sharp. ,
1 ,
. 8 , , \b .

,
( 2.16). The
time is 16:08:42 sharp. , :
(?<![:\w])(2[0-3]|[01]?[0-9]):([0-5]?[0-9])(?![:\w])

:
: .NET, Java, PCRE, Perl, Python, Ruby 1.9
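
The difference between the word-boundary version and the lookaround version can be checked against the example sentence from this recipe. This Python 3 sketch assumes the variable names shown; the two regexes are taken from the text above.

```python
import re

sentence = "The time is 16:08:42 sharp."

# Word boundaries: a colon counts as a boundary, so hh:mm matches
# the first part of what is really an hh:mm:ss time.
boundary = re.compile(r"\b(2[0-3]|[01]?[0-9]):([0-5]?[0-9])\b")
print([m.group() for m in boundary.finditer(sentence)])  # ['16:08']

# Lookaround: refuse matches that touch a colon or a word character.
strict = re.compile(r"(?<![:\w])(2[0-3]|[01]?[0-9]):([0-5]?[0-9])(?![:\w])")
print([m.group() for m in strict.finditer(sentence)])    # []

# A plain hh:mm time is still found by the lookaround version.
print([m.group() for m in strict.finditer("Meet at 16:08 sharp.")])  # ['16:08']
```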

See also: Recipes 4.4, 4.5, and 4.7.

4.7. Validate ISO 8601 Dates and Times

/ ISO 8601, . ,
XML Schema date, time dateTime
ISO 8601.

,
2008-08. :
^([0-9]{4})-(1[0-2]|0[1-9])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
^(?<year>[0-9]{4})-(?<month>1[0-2]|0[1-9])$

:
: .NET, PCRE 7, Perl 5.10, Ruby 1.9

^(?P<year>[0-9]{4})-(?P<month>1[0-2]|0[1-9])$

:
: PCRE, Python
, 2008-08-30. . YYYYMMDD
YYYYMM-DD, ISO 8601:
^([0-9]{4})-?(1[0-2]|0[1-9])-?(3[0-1]|0[1-9]|[1-2][0-9])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
^(?<year>[0-9]{4})-?(?<month>1[0-2]|0[1-9])-?
(?<day>3[0-1]|0[1-9]|[1-2][0-9])$

:
: .NET, PCRE 7, Perl 5.10, Ruby 1.9
, 2008-08-30. .
YYYY-MMDD YYYYMM-DD. :
^([0-9]{4})(-)?(1[0-2]|0[1-9])(?(2)-)(3[0-1]|0[1-9]|[1-2][0-9])$

:
: .NET, PCRE, Perl, Python
, 2008-08-30. .
YYYY-MMDD YYYYMM-DD. :
^([0-9]{4})(?:(1[0-2]|0[1-9])|-?(1[0-2]|0[1-9])-?)
(3[0-1]|0[1-9]|[1-2][0-9])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, 2008-W35. :
^([0-9]{4})-?W(5[0-3]|[1-4][0-9]|0[1-9])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
^(?<year>[0-9]{4})-?W(?<week>5[0-3]|[1-4][0-9]|0[1-9])$

:
: .NET, PCRE 7, Perl 5.10, Ruby 1.9


, 2008-W35-6. :
^([0-9]{4})-?W(5[0-3]|[1-4][0-9]|0[1-9])-?([1-7])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
^(?<year>[0-9]{4})-?W(?<week>5[0-3]|[1-4][0-9]|0[1-9])-?(?<day>[1-7])$

:
: .NET, PCRE 7, Perl 5.10, Ruby 1.9
, 2008-243. :
^([0-9]{4})-?(36[0-6]|3[0-5][0-9]|[12][0-9]{2}|0[1-9][0-9]|00[1-9])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
^(?<year>[0-9]{4})-?
(?<day>36[0-6]|3[0-5][0-9]|[12][0-9]{2}|0[1-9][0-9]|00[1-9])$

:
: .NET, PCRE 7, Perl 5.10, Ruby 1.9
, 17:21. :
^(2[0-3]|[01]?[0-9]):?([0-5]?[0-9])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
^(?<hour>2[0-3]|[01]?[0-9]):?(?<minute>[0-5]?[0-9])$

:
: .NET, PCRE 7, Perl 5.10, Ruby 1.9
, , 17:21:59. :
^(2[0-3]|[01]?[0-9]):?([0-5]?[0-9]):?([0-5]?[0-9])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
^(?<hour>2[0-3]|[01]?[0-9]):?(?<minute>[0-5]?[0-9]):?
(?<second>[0-5]?[0-9])$

:
: .NET, PCRE 7, Perl 5.10, Ruby 1.9
, Z, +07 +07:00. :
^(Z|[+-](?:2[0-3]|[01]?[0-9])(?::?(?:[0-5]?[0-9]))?)$


:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, ,
17:21:59+07:00. .
:
^(2[0-3]|[01]?[0-9]):?([0-5]?[0-9]):?([0-5]?[0-9])
(Z|[+-](?:2[0-3]|[01]?[0-9])(?::?(?:[0-5]?[0-9]))?)$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
^(?<hour>2[0-3]|[01]?[0-9]):?(?<minute>[0-5]?[0-9]):?(?<sec>[0-5]?[0-9])
(?<timezone>Z|[+-](?:2[0-3]|[01]?[0-9])(?::?(?:[0-5]?[0-9]))?)$

:
: .NET, PCRE 7, Perl 5.10, Ruby 1.9
Date with an optional time zone, such as 2008-08-30 or 2008-08-30+07:00. The hyphens are required. This is the date type of XML Schema:
^(-?(?:[1-9][0-9]*)?[0-9]{4})-(1[0-2]|0[1-9])-(3[0-1]|0[1-9]|[1-2][0-9])
(Z|[+-](?:2[0-3]|[0-1][0-9]):[0-5][0-9])?$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
^(?<year>-?(?:[1-9][0-9]*)?[0-9]{4})-(?<month>1[0-2]|0[1-9])-
(?<day>3[0-1]|0[1-9]|[1-2][0-9])
(?<timezone>Z|[+-](?:2[0-3]|[0-1][0-9]):[0-5][0-9])?$

:
: .NET, PCRE 7, Perl 5.10, Ruby 1.9
, 01:45:36 01:45:36.123+07:00. time XML
Schema:
^(2[0-3]|[0-1][0-9]):([0-5][0-9]):([0-5][0-9])(\.[0-9]+)?
(Z|[+-](?:2[0-3]|[0-1][0-9]):[0-5][0-9])?$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
^(?<hour>2[0-3]|[0-1][0-9]):(?<minute>[0-5][0-9]):(?<second>[0-5][0-9])
(?<ms>\.[0-9]+)?(?<timezone>Z|[+-](?:2[0-3]|[0-1][0-9]):[0-5][0-9])?$

:
: .NET, PCRE 7, Perl 5.10, Ruby 1.9


Date and time with optional fractional seconds and time zone, e.g., 2008-08-30T01:45:36 or 2008-08-30T01:45:36.123Z. This is the XML Schema dateTime type:
^(-?(?:[1-9][0-9]*)?[0-9]{4})-(1[0-2]|0[1-9])-(3[0-1]|0[1-9]|[1-2][0-9])
T(2[0-3]|[0-1][0-9]):([0-5][0-9]):([0-5][0-9])(\.[0-9]+)?
(Z|[+-](?:2[0-3]|[0-1][0-9]):[0-5][0-9])?$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
^(?<year>-?(?:[1-9][0-9]*)?[0-9]{4})-(?<month>1[0-2]|0[1-9])-
(?<day>3[0-1]|0[1-9]|[1-2][0-9])T(?<hour>2[0-3]|[0-1][0-9]):
(?<minute>[0-5][0-9]):(?<second>[0-5][0-9])(?<ms>\.[0-9]+)?
(?<timezone>Z|[+-](?:2[0-3]|[0-1][0-9]):[0-5][0-9])?$

Regex options: None
Regex flavors: .NET, PCRE 7, Perl 5.10, Ruby 1.9

ISO 8601 . , ,
,
, ISO 8601, . , XML Schema
. ,
, . , , . , (?:group) . ,
.

, ISO 8601, . , 1733:26 ISO 8601


, . ,
, , . ,
, XML Scheme, , ,
.

, , ,
, , , . 2.9 , .


3.9 , , , .
,
, .

. .
.NET, PCRE 7, Perl 5.10 Ruby 1.9 (?<name>group) . PCRE Python,
, (?P<name>group) ,
P . 2.11 3.9.
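
As a rough illustration of the (?P<name>group) syntax in Python, the sketch below matches a calendar date with optional hyphens and then lets the standard library reject impossible dates such as February 31. The names ISO_DATE and parse_iso_date are hypothetical and were chosen for this example only.

```python
import re
from datetime import date

# Named groups use Python's (?P<name>...) syntax.
ISO_DATE = re.compile(
    r"^(?P<year>[0-9]{4})-?(?P<month>1[0-2]|0[1-9])-?"
    r"(?P<day>3[0-1]|0[1-9]|[1-2][0-9])$"
)

def parse_iso_date(text):
    """Return a date for text such as 2008-08-30, or None if it is invalid."""
    m = ISO_DATE.match(text)
    if m is None:
        return None
    try:
        # The regex only checks digit ranges; date() checks the calendar.
        return date(int(m.group("year")), int(m.group("month")), int(m.group("day")))
    except ValueError:
        return None
```

The regex alone accepts day 31 for every month; the date() call is what rules out combinations that do not exist.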


. , 01
31. 32- 13- .
, 31 .
4.5 , .
, , ,
, 4.4 4.6.

.
4.4, 4.5 4.6.

4.8.
-

- .


. , . , , ,
. .


^[A-Z0-9]+$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Ruby
if subject =~ /^[A-Z0-9]+$/i
    puts "Subject is alphanumeric"
else
    puts "Subject is not alphanumeric"
end



3.4 3.5.

, :
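^          # Assert position at the beginning of the string.
[A-Z0-9]   # Match a character from "A" to "Z" or from "0" to "9"...
  +        #   between one and unlimited times.
$          # Assert position at the end of the string.

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby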

^ $ , . , . + . , + * . *
, .

ASCII

128 7- ASCII. 33 :
^[\x00-\x7F]+$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby



ASCII
,
ASCII, .
( 0x0A 0x0D, ) , \n (
) \r ( ):

^[\n\r\x20-\x7E]+$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

,
ISO-8859-1 Windows-1252
ISO-8859-1 Windows-1252 ( ANSI) , Latin-1 ( , ISO/IEC 8859-1).
0x80
0x9F. ISO-8859-1
, Windows-1252
.
,
, , ,
Windows.
, ISO8859-1 Windows-1252 ( ):
^[\x00-\x7F\xA0-\xFF]+$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
,
, [A-Z0-9] , . : \x00-\x7F \xA0-\xFF.

-

. -


,
, :
^[\p{L}\p{N}]+$

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Ruby 1.9
, , . , JavaScript, Python
Ruby 1.8. , PCRE, , PCRE UTF-8.
preg PHP ( PCRE) /u .

Python:
^[^\W_]+$

Regex options: Unicode
Regex flavors: Python
Python UNICODE U . , . \w ,
-
. \W ,
. , , ,
.1
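
A minimal sketch of the [^\W_] trick, assuming Python 3, where \w and \W are already Unicode-aware for str patterns so no UNICODE flag is needed. The function name is_alphanumeric is hypothetical.

```python
import re

# [^\W_] reads as "not a non-word character and not an underscore",
# which leaves letters and digits in any script.
ALNUM = re.compile(r"[^\W_]+")

def is_alphanumeric(text):
    """True if text consists only of letters and digits (at least one)."""
    return ALNUM.fullmatch(text) is not None
```

fullmatch() plays the role of the ^ and $ anchors in the regex shown above.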

.
4.9 , ,
.

( ) , , ( 2.16) ( ,
. 58 2.3).


4.9.

, 1 10 A Z.

, , . , JavaScript length, , .
, , . , 1 10 , A Z. , ,
, AZ.


^[A-Z]{1,10}$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Perl
if ($ARGV[0] =~ /^[A-Z]{1,10}$/) {
    print "Input is valid\n";
} else {
    print "Input is invalid\n";
}



3.5.

,
:
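^          # Assert position at the beginning of the string.
[A-Z]      # Match one letter from "A" to "Z"...
  {1,10}   #   between 1 and 10 times.
$          # Assert position at the end of the string.

Regex options: Free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby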

^ $ ,
,
10 . [A-Z] A Z, {1,10}
1 10 . , , ,
.

, [A-Z]
.
a z,
[A-Za-z]
. 3.4 , .


, , [A-z] .
,
. AZ az
ASCII .
[A-z] [A-Z[\]^_`a-z] .

, {1,10} ,
, , , , .
2.16, ( ) , , ^ $ , . ,
. , (?=...) , , -


, . , . :
^(?=.{1,10}$).*

Regex options: Dot matches line breaks
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
^(?=[\S\s]{1,10}$)[\S\s]*

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

, $ , , . , . .* ( [\S\s]* , , JavaScript) , .

, , . , 3.4. JavaScript , , . , .
60 2.4.


, 10 100 :
^\s*(?:\S\s*){10,100}$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Java, PCRE, Python Ruby \s ASCII, \S . Python \s ,


UNICODE U . Java,


PCRE Ruby 1.9, ,


, ( 2.7):
^[\p{Z}\s]*(?:[^\p{Z}\s][\p{Z}\s]*){10,100}$

:
: .NET, Java, PCRE, Perl, Ruby 1.9
PCRE, PCRE UTF-8. PHP UTF-8
/u.
Separator
, \p{Z} , \s , . , \p{Z}
\s , . \s
0x09 0x0D (, , , ), Separator. \p{Z} \s .

{10,100} , . ,
. , .


, ,
, . , 10 100 , , ,
:
^\W*(?:\w+\b\W*){10,100}$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Java, JavaScript, PCRE Ruby \w , ,


AZ, az, 09 _ , ,


ASCII. .NET Perl \w


( ,
\W , \b ) . Python , ,
, UNICODE U .

,
, ASCII, .
^[^\p{L}\p{N}_]*(?:[\p{L}\p{N}_]+\b[^\p{L}\p{N}_]*){10,100}$

:
: .NET, Java, Perl
^[^\p{L}\p{N}_]*(?:[\p{L}\p{N}_]+(?:[^\p{L}\p{N}_]+|$)){10,100}$

:
: .NET, Java, PCRE, Perl, Ruby 1.9
PCRE UTF-8. UTF-8
PHP /u.
, ( ) , ,
. 70,
2.6.

,
( \p{L} \p{N} ), , , \w
\W .

, , . \W ( [^\p{L}\p{N}_] )
, . , , , ,
\b \w \W ( [\p{L}\p{N}_] [^\p{L}\p{N}_] ), , .



,
.
(
PCRE Ruby 1.9) . ( ) ( ) , . , , , PCRE Ruby \b . Java \
b , , \w .

,
, , ASCII, JavaScript Ruby 1.8. ,
,
, :
^\s*(?:\S+(?:\s+|$)){10,100}$

:
: .NET, Java, JavaScript, Perl, PCRE, Python, Ruby
,
, . , , , ( far-reaching), , .
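
The whitespace-delimited word count can be tried out with a small Python sketch. The helper word_count_in_range is a hypothetical name, and the limits are passed as parameters instead of the fixed 10 and 100 used above.

```python
import re

def word_count_in_range(text, low, high):
    """True if text holds between low and high whitespace-separated words."""
    # Each word is a run of non-whitespace followed by whitespace or the end.
    pattern = r"^\s*(?:\S+(?:\s+|$)){%d,%d}$" % (low, high)
    return re.search(pattern, text) is not None
```

Because only whitespace separates words here, a hyphenated term such as far-reaching counts as a single word.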

.
4.8 4.10.

4.10.

,
, .

, , , ,


. , . , , ,
MS-DOS/Windows ( \r\n ), Mac OS
( \r ) UNIX/Linux/OS X ( \n ).



.
, (?:...) , , (?>...) ,
, . , ( \A ^ ,
\z , \Z $ ). . :

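\A(?>(?>\r\n?|\n)?[^\r\n]*){0,5}\z

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Ruby
\A(?:(?:\r\n?|\n)?[^\r\n]*){0,5}\Z

Regex options: None
Regex flavors: Python
^(?:(?:\r\n?|\n)?[^\r\n]*){0,5}$

Regex options: None
Regex flavors: JavaScript

PHP (PCRE)
if (preg_match('/\A(?>(?>\r\n?|\n)?[^\r\n]*){0,5}\z/', $_POST['subject'])) {
    print 'Subject contains five or fewer lines';
} else {
    print 'Subject contains more than five lines';
}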



3.5.


, ,
, , MS-DOS/Windows, Mac OS UNIX/
Linux/OS X,
. , , .
JavaScript, . JavaScript , .
:
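^                # Assert position at the beginning of the string.
(?:              # Group but don't capture...
  (?:            #   Group but don't capture...
    \r           #     Match a carriage return (CR, ASCII position 0x0D).
    \n           #     Match a line feed (LF, ASCII position 0x0A)...
      ?          #       between zero and one time.
   |             #    or...
    \n           #     Match a line feed character.
  )              #   End the noncapturing group.
    ?            #     Repeat the preceding group between zero and one time.
  [^\r\n]        #   Match any single character except CR or LF...
    *            #     between zero and unlimited times.
)                # End the noncapturing group.
  {0,5}          #   Repeat the preceding group between zero and five times.
$                # Assert position at the end of the string.

Regex options: Free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby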

^ . , ,
, ,
.

, . , ( ). -


. ,
, .
( , , ).
, (?:[^\r\n]*(?:\r\n?|\n)?) ,
.
,
.

, ( \r\n , , MS-DOS/
Windows)

( \r ,
Mac OS)

(\n, ,
UNIX/Linux/OS X)

.
( , Python JavaScript) , . , , ,
(
2.15).
, . ^ $
. , , , ,
\A , \Z \z .
, , .
....

Perl ,
. Perl


$ , . $ Perl . , Perl
, : \Z \z .
\Z Perl ,
$ , , , ^ $ . \z , . , , , \z , , .


Perl , /. .NET, Java, PCRE Ruby , \Z \z , ,
Perl. Python \Z (
), ,
, \z (
) Perl. JavaScript z, , , , $ ( , ^ $ ).

\A .
, , JavaScript
( ).
, , , , . ,
, , .
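
For Python, where \Z matches only at the very end of the string, the line limit might be checked as in the following sketch. The names MAX_FIVE_LINES and has_at_most_five_lines are hypothetical.

```python
import re

# Python's \Z never matches before a trailing line break,
# so a final newline starts another (empty) line.
MAX_FIVE_LINES = re.compile(r"\A(?:(?:\r\n?|\n)?[^\r\n]*){0,5}\Z")

def has_at_most_five_lines(text):
    """True if text contains five or fewer lines (CRLF, CR, or LF breaks)."""
    return MAX_FIVE_LINES.search(text) is not None
```

Python has no atomic groups in the re module covered here, so the sketch uses plain noncapturing groups, as in the Python variant of the regex above.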


, , , MSDOS/Windows, Mac OS UNIX/Linux/OS X. -



, . , .
\A(?>\R?\V*){0,5}\z

:
: PCRE 7 (with the PCRE_BSR_UNICODE option), Perl 5.10
\A(?>(?>\r\n?|[\n-\f\x85\x{2028}\x{2029}])?
[^\n-\r\x85\x{2028}\x{2029}]*){0,5}\z

:
: PCRE, Perl
\A(?>(?>\r\n?|[\n-\f\x85\u2028\u2029])?[^\n-\r\x85\u2028\u2029]*){0,5}\z

:
: .NET, Java, Ruby
\A(?:(?:\r\n?|[\n-\f\x85\u2028\u2029])?[^\n-\r\x85\u2028\u2029]*){0,5}\Z

:
: Python
^(?:(?:\r\n?|[\n-\f\x85\u2028\u2029])?[^\n-\r\x85\u2028\u2029]*){0,5}$

:
: JavaScript
, . 4.1 .
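Table 4.1. Line separators

Unicode sequence   Regex equivalent     Name                                    Used when
U+000D U+000A      \r\n                 Carriage return and line feed (CRLF)    Windows and MS-DOS text files
U+000A             \n                   Line feed (LF)                          UNIX, Linux, and OS X text files
U+000B             \v                   Line tabulation, or vertical tab (VT)   (Rare)
U+000C             \f                   Form feed (FF)                          (Rare)
U+000D             \r                   Carriage return (CR)                    Mac OS text files
U+0085             \x85                 Next line (NEL)                         IBM mainframe text files (rare)
U+2028             \u2028 or \x{2028}   Line separator                          (Rare)
U+2029             \u2029 or \x{2029}   Paragraph separator                     (Rare)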

.
4.9.

4.11.

,
, .
,
, true, t, yes, y, okay, ok 1, .

,
.


^(?:1|t(?:rue)?|y(?:es)?|ok(?:ay)?)$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


JavaScript
var yes = /^(?:1|t(?:rue)?|y(?:es)?|ok(?:ay)?)$/i;
if (yes.test(subject)) {
    alert("Yes");
} else {
    alert("No");
}



3.4 3.5.

,
. , ,
:
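^            # Assert position at the beginning of the string.
(?:          # Group but don't capture...
  1          #   Match a literal "1".
 |           #  or...
  t(?:rue)?  #   Match "t", optionally followed by "rue".
 |           #  or...
  y(?:es)?   #   Match "y", optionally followed by "es".
 |           #  or...
  ok(?:ay)?  #   Match "ok", optionally followed by "ay".
)            # End the noncapturing group.
$            # Assert position at the end of the string.

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby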

. .

, ^(?:[1ty]|true|yes|ok(?:ay)?)$
. , ^(?:1|t|true|y|yes|ok|okay)$ , , , , , -


| ( ? ).
,
,
. .

, . , , , ^true|yes$ ,
, true
yes, . ^(?:true|yes)$

, true yes, .
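
The difference between the grouped and the ungrouped alternation can be checked with a short Python sketch. The names AFFIRMATIVE, LOOSE, and is_affirmative are hypothetical.

```python
import re

# Grouped: the anchors apply to every alternative.
AFFIRMATIVE = re.compile(r"^(?:1|t(?:rue)?|y(?:es)?|ok(?:ay)?)$", re.IGNORECASE)

# Ungrouped: "^" binds only to "true" and "$" only to "yes".
LOOSE = re.compile(r"^true|yes$", re.IGNORECASE)

def is_affirmative(text):
    """True if text is one of the accepted affirmative responses."""
    return AFFIRMATIVE.match(text) is not None
```

LOOSE finds a match in any string that merely starts with true or ends with yes, which is why the noncapturing group is needed.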

.
5.2 5.3.

4.12.

, .

,
,
. ,
, , , . .


^(?!000|666)(?:[0-6][0-9]{2}|7(?:[0-6][0-9]|7[0-2]))-
(?!00)[0-9]{2}-(?!0000)[0-9]{4}$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


Python
import re
import sys

if re.match(r"^(?!000|666)(?:[0-6][0-9]{2}|7(?:[0-6][0-9]|7[0-2]))-"
            r"(?!00)[0-9]{2}-(?!0000)[0-9]{4}$", sys.argv[1]):
    print("SSN is valid")
else:
    print("SSN is invalid")



3.5.

AAA-GG-SSSS:

. 000 666

, 772.

, 01 99.
,
0001 9999.

,
. , :
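^                # Assert position at the beginning of the string.
(?!000|666)      # Assert that neither "000" nor "666" can be matched here.
(?:              # Group but don't capture...
  [0-6]          #   Match a character in the range between "0" and "6".
  [0-9]{2}       #   Match a digit, exactly two times.
 |               #  or...
  7              #   Match a literal "7".
  (?:            #   Group but don't capture...
    [0-6]        #     Match a character in the range between "0" and "6".
    [0-9]        #     Match a digit.
   |             #    or...
    7            #     Match a literal "7".
    [0-2]        #     Match a character in the range between "0" and "2".
  )              #   End the noncapturing group.
)                # End the noncapturing group.
-                # Match a literal "-".
(?!00)           # Assert that "00" cannot be matched here.
[0-9]{2}         # Match a digit, exactly two times.
-                # Match a literal "-".
(?!0000)         # Assert that "0000" cannot be matched here.
[0-9]{4}         # Match a digit, exactly four times.
$                # Assert position at the end of the string.

Regex options: Free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby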

^ $ ,
,
, . . - , , , , .

, . (?!000|666) , 000 666. 772.

,
, . -, ,
, 0 6,
,
000 666. : [0-6][0-9]{2} . , 7, , , , (?:[0-6][0-9]{2}|7) ,
.

, 7, ,
700 772, , ,
7. 0 6, . 7,
, 0 2.
, 7, 7(?:[0-6][0-9]|7[0-2]) , 7,

.

, ,
, (?:[0-6][0-9]{2}|7(?:[0-6][0-9]|7[0-2])) .
, 000 772.




, ^ $ . - .

\b(?!000|666)(?:[0-6][0-9]{2}|7(?:[0-6][0-9]|7[0-2]))-
(?!00)[0-9]{2}-(?!0000)[0-9]{4}\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
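
As a sketch of how the word-boundary variant might be used to pull candidate numbers out of running text in Python (find_ssns is a hypothetical helper name):

```python
import re

FIND_SSN = re.compile(
    r"\b(?!000|666)(?:[0-6][0-9]{2}|7(?:[0-6][0-9]|7[0-2]))-"
    r"(?!00)[0-9]{2}-(?!0000)[0-9]{4}\b"
)

def find_ssns(text):
    """Return every substring of text that is formatted like a valid SSN."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return FIND_SSN.findall(text)
```

A match only says that the number is well formed; it does not show that the number was ever issued.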

.
http://www.socialsecurity.
gov ,
.
(Social Security
Number Verification Service, SSNVS), : http://www.socialsecurity.
gov/employer/ssnv.htm, , .
,
, , 6.5.

4.13. ISBN
(International Standard Book Number, ISBN), (ISBN-10) (ISBN-13) .
ISBN
. ISBN 9780-596-52068-7, ISBN-13: 978-0-596-52068-7, 978 0 596 52068 7, 9780596520687,
ISBN-10 0-596-52068-9 0-596-52068-9
ISBN.

ISBN
, .


, , ISBN, .


ISBN-10:
^(?:ISBN(?:-10)?:? )?(?=[-0-9X ]{13}$|[0-9X]{10}$)[0-9]{1,5}[- ]?
(?:[0-9]+[- ]?){2}[0-9X]$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
ISBN-13:
^(?:ISBN(?:-13)?:? )?(?=[-0-9 ]{17}$|[0-9]{13}$)97[89][- ]?[0-9]{1,5}
[- ]?(?:[0-9]+[- ]?){2}[0-9]$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
ISBN-10 or ISBN-13:
^(?:ISBN(?:-1[03])?:? )?(?=[-0-9 ]{17}$|[-0-9X ]{13}$|[0-9X]{10}$)
(?:97[89][- ]?)?[0-9]{1,5}[- ]?(?:[0-9]+[- ]?){2}[0-9X]$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

JavaScript
// `regex` checks for ISBN-10 or ISBN-13 format
var regex = /^(?:ISBN(?:-1[03])?:? )?(?=[-0-9 ]{17}$|[-0-9X ]{13}$|[0-9X]{10}$)(?:97[89][- ]?)?[0-9]{1,5}[- ]?(?:[0-9]+[- ]?){2}[0-9X]$/;
if (regex.test(subject)) {
    // Remove the optional "ISBN" prefix and the separators,
    // then split the remaining characters into an array
    var chars = subject.replace(/[- ]|^ISBN(?:-1[03])?:?/g, "").split("");
    // Remove the final ISBN digit from `chars`,
    // and assign it to `last`
    var last = chars.pop();
    var sum = 0;
    var digit = 10;
    var check;
    if (chars.length == 9) {
        // Compute the ISBN-10 check digit
        for (var i = 0; i < chars.length; i++) {
            sum += digit * parseInt(chars[i], 10);
            digit -= 1;
        }
        check = 11 - (sum % 11);
        if (check == 10) {
            check = "X";
        } else if (check == 11) {
            check = "0";
        }
    } else {
        // Compute the ISBN-13 check digit
        for (var i = 0; i < chars.length; i++) {
            sum += (i % 2 * 2 + 1) * parseInt(chars[i], 10);
        }
        check = 10 - (sum % 10);
        if (check == 10) {
            check = "0";
        }
    }
    if (check == last) {
        alert("Valid ISBN");
    } else {
        alert("Invalid ISBN check digit");
    }
} else {
    alert("Invalid ISBN");
}

Python
import re
import sys

# `regex` checks for ISBN-10 or ISBN-13 format
regex = re.compile("^(?:ISBN(?:-1[03])?:? )?(?=[-0-9 ]{17}$|"
                   "[-0-9X ]{13}$|[0-9X]{10}$)(?:97[89][- ]?)?[0-9]{1,5}[- ]?"
                   "(?:[0-9]+[- ]?){2}[0-9X]$")
subject = sys.argv[1]
if regex.search(subject):
    # Remove the optional "ISBN" prefix and the separators,
    # then split the remaining characters into a list
    chars = list(re.sub("[- ]|^ISBN(?:-1[03])?:?", "", subject))
    # Remove the final ISBN digit from `chars`,
    # and assign it to `last`
    last = chars.pop()
    if len(chars) == 9:
        # Compute the ISBN-10 check digit
        val = sum((x + 2) * int(y) for x, y in enumerate(reversed(chars)))
        check = 11 - (val % 11)
        if check == 10:
            check = "X"
        elif check == 11:
            check = "0"
    else:
        # Compute the ISBN-13 check digit
        val = sum((x % 2 * 2 + 1) * int(y) for x, y in enumerate(chars))
        check = 10 - (val % 10)
        if check == 10:
            check = "0"
    if str(check) == last:
        print("Valid ISBN")
    else:
        print("Invalid ISBN check digit")
else:
    print("Invalid ISBN")



3.5.

ISBN
. 10- ISBN ISO 2108 1970 . ISBN, 1 2007 , 13 .
ISBN-10 ISBN-13
, . ,
. . :
13- ISBN 978 979.

,
.
.

ISBN.


.
ISBN-10 0
9 X ( 10 ),
ISBN-13 0 9. ISBN , .
ISBN-10
ISBN-13, . ,
.
Java ,
:
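^                   # Assert position at the beginning of the string.
(?:                 # Group but don't capture...
  ISBN              #   Match the text "ISBN".
  (?:-1[03])?       #   Optionally match the text "-10" or "-13".
  :?                #   Optionally match a literal ":".
  \                 #   Match a space character (escaped).
)?                  # Repeat the group between zero and one time.
(?=                 # Assert that the following can be matched here...
  [-0-9\ ]{17}$     #   Match 17 hyphens, digits, and spaces, then the end
 |                  #   of the string. Or...
  [-0-9X\ ]{13}$    #   Match 13 hyphens, digits, Xs, and spaces, then the
 |                  #   end of the string. Or...
  [0-9X]{10}$       #   Match 10 digits and Xs, then the end of the string.
)                   # End the positive lookahead.
(?:                 # Group but don't capture...
  97[89]            #   Match the text "978" or "979".
  [-\ ]?            #   Optionally match a hyphen or space.
)?                  # Repeat the group between zero and one time.
[0-9]{1,5}          # Match a digit between one and five times.
[-\ ]?              # Optionally match a hyphen or space.
(?:                 # Group but don't capture...
  [0-9]+            #   Match a digit between one and unlimited times.
  [-\ ]?            #   Optionally match a hyphen or space.
){2}                # Repeat the group exactly two times.
[0-9X]              # Match a digit or "X".
$                   # Assert position at the end of the string.

Regex options: Free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby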


(?:ISBN(?:-1[03])?:?)? , , ( , , ):
ISBN

ISBN-10
ISBN-13
ISBN:

ISBN-10:
ISBN-13:

( ISBN )

(?=[-0-9 ]{17}$|[-0-9X ]{13}$|[0-9X]{10}$) , ( | ),
.
( )
$ , , :

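[-0-9 ]{17}$
Allows an ISBN-13 with four separators (17 characters in total)

[-0-9X ]{13}$
Allows an ISBN-13 without separators or an ISBN-10 with three separators (13 characters in total)

[0-9X]{10}$
Allows an ISBN-10 without separators (10 characters in total)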

, ISBN, . (?:97[89][-]?)? 978 979, ISBN-13. , , ISBN-10. [0-9]{1,5}
[-]?
, . (?:[0-9]+[-]?){2} , , . , [0-9X]$
.

, ,
(-


X), , ISBN. ,
ISBN , ( , ISBN-10 ISBN-13). JavaScript Python.
, .

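ISBN-10 checksum
The check digit of an ISBN-10 ranges from 0 to 10 (with X standing for 10). It is computed as follows:
1. Multiply each of the first 9 digits by a number in the descending sequence from 10 to 2, and sum the results.
2. Divide the sum by 11.
3. Subtract the remainder (not the quotient) from 11.
4. If the result is 11, use the digit 0; if it is 10, use the letter X.

Here is how the ISBN-10 check digit is derived for 0-596-52068-?:
Step 1:
sum = 10×0 + 9×5 + 8×9 + 7×6 + 6×5 + 5×2 + 4×0 + 3×6 + 2×8
    = 0 + 45 + 72 + 42 + 30 + 10 + 0 + 18 + 16
    = 233
Step 2:
233 ÷ 11 = 21, remainder 2
Step 3:
11 − 2 = 9
Step 4:
9 [no substitution required]

The check digit is 9, so the complete number is ISBN 0-596-52068-9.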

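ISBN-13 checksum
The check digit of an ISBN-13 ranges from 0 to 9 and is computed in a similar way:
1. Multiply each of the first 12 digits by 1 or 3, alternating from left to right, and sum the results.
2. Divide the sum by 10.
3. Subtract the remainder (not the quotient) from 10.
4. If the result is 10, use the digit 0.

Here is how the ISBN-13 check digit is derived for 978-0-596-52068-?:
Step 1:
sum = 1×9 + 3×7 + 1×8 + 3×0 + 1×5 + 3×9 + 1×6 + 3×5 + 1×2 + 3×0 + 1×6 + 3×8
    = 9 + 21 + 8 + 0 + 5 + 27 + 6 + 15 + 2 + 0 + 6 + 24
    = 123
Step 2:
123 ÷ 10 = 12, remainder 3
Step 3:
10 − 3 = 7
Step 4:
7 [no substitution required]

The check digit is 7, so the complete number is ISBN 978-0-596-52068-7.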

ISBN
ISBN-10 ISBN-13
,
ISBN . , ISBN
. -,
( 10- 13-
) -, ISBN :
\bISBN(?:-1[03])?:? (?=[-0-9 ]{17}$|[-0-9X ]{13}$|[0-9X]{10}$)
(?:97[89][- ]?)?[0-9]{1,5}[- ]?(?:[0-9]+[- ]?){2}[0-9X]\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

ISBN
ISBN-10, ISBN-13, .
( 2.17), ,
ISBN-10 ISBN-13 ISBN .
ISBN-10 ISBN-13,


. , ,
, ISBN-10 ISBN-13,
. :
^
(?:ISBN(-1(?:(0)|3))?:?\ )?
(?(1)
(?(2)
(?=[-0-9X ]{13}$|[0-9X]{10}$)
[0-9]{1,5}[- ]?(?:[0-9]+[- ]?){2}[0-9X]$
|
(?=[-0-9 ]{17}$|[0-9]{13}$)
97[89][- ]?[0-9]{1,5}[- ]?(?:[0-9]+[- ]?){2}[0-9]$
)
|
(?=[-0-9 ]{17}$|[-0-9X ]{13}$|[0-9X]{10}$)
(?:97[89][- ]?)?[0-9]{1,5}[- ]?(?:[0-9]+[- ]?){2}[0-9X]$
)
$

:
: .NET, PCRE, Perl, Python

.
ISBN Users Manual ISBN http://
www.isbn-international.org.

http://www.isbn-international.org/en/identifiers/allidentifiers.html
, 1 5 , ISBN.

4.14.

ZIP- ( ),
:
(ZIP + 4). 12345 12345-6789
1234, 123456, 123456789 1234-56789.



^[0-9]{5}(?:-[0-9]{4})?$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

VB.NET
If Regex.IsMatch(subjectString, "^[0-9]{5}(?:-[0-9]{4})?$") Then
    Console.WriteLine("Valid ZIP code")
Else
    Console.WriteLine("Invalid ZIP code")
End If



3.5.

ZIP-, :
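^           # Assert position at the beginning of the string.
[0-9]{5}    # Match a digit, exactly five times.
(?:         # Group but don't capture...
  -         #   Match a literal "-".
  [0-9]{4}  #   Match a digit, exactly four times.
)           # End the noncapturing group.
  ?         #   Repeat the preceding group between zero and one time.
$           # Assert position at the end of the string.

Regex options: Free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby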
,
. ZIP-
,
^ $ , \b[0-9]{5}(?:-[0-9]{4})?\b .

.
4.15, 4.16 4.17.


4.15. ,

, , .

^(?!.*[DFIOQU])[A-VXY][0-9][A-Z] [0-9][A-Z][0-9]$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

, , D, F, I, O, Q U. [A-VXY] , , W Z . , . , K1A
0B1, , .
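
A minimal Python sketch of the same check, assuming the two halves of the code are separated by a single space as in K1A 0B1 and that input is already uppercase. The function name is_canadian_postal_code is hypothetical.

```python
import re

# The lookahead rejects D, F, I, O, Q, and U anywhere in the code;
# [A-VXY] additionally rejects W and Z in the first position.
CA_POSTAL = re.compile(r"^(?!.*[DFIOQU])[A-VXY][0-9][A-Z] [0-9][A-Z][0-9]$")

def is_canadian_postal_code(text):
    """True if text is formatted like a Canadian postal code."""
    return CA_POSTAL.match(text) is not None
```

Putting the excluded letters in one negative lookahead keeps the three letter positions simple, since only the first position needs its own restricted class.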

.
4.14, 4.16 4.17.

4.16. ,

,
, .

^[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][ABD-HJLNP-UW-Z]{2}$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby



- , . ,
, , .
.

.
BS7666 http://www.govtalk.gov.uk/gdsc/
html/frames/PostCode.htm, .
4.14, 4.16 4.17.

4.17. ,

, ,
, .


^(?:Post (?:Office )?|P[. ]?O\.? )?Box\b

Regex options: Case insensitive, ^ and $ match at line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

C#
Regex regexObj = new Regex(
    @"^(?:Post (?:Office )?|P[. ]?O\.? )?Box\b",
    RegexOptions.IgnoreCase | RegexOptions.Multiline
);
if (regexObj.IsMatch(subjectString)) {
    Console.WriteLine("The value does not appear to be a street address");
} else {
    Console.WriteLine("Good to go");
}




3.5.

, :
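^                 # Assert position at the beginning of a line.
(?:               # Group but don't capture...
  Post\           #   Match "Post ".
  (?:Office\ )?   #   Optionally match "Office ".
 |                #  or...
  P[.\ ]?         #   Match "P" and an optional period or space character.
  O\.?\           #   Match "O", an optional period, and a space character.
)?                # Make the group optional.
Box               # Match "Box".
\b                # Assert position at a word boundary.

Regex options: Case insensitive, ^ and $ match at line breaks
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby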
, , :
Post Office Box
post box
P.O. box
P O Box
Po. box
PO Box
Box

,
,
, .
, , . ,
, , , .


.
4.14, 4.15 4.16.

4.18.
,


, ,
.
, , , .

, . ,
, .

. , ,
.


.


^(.+?) ([^\s,]+)(,? (?:[JS]r\.?|III?|IV))?$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


$2, $1$3

Replacement text flavors: .NET, Java, JavaScript, Perl, PHP


\2, \1\3

Replacement text flavors: Python, Ruby

JavaScript
function formatName (name) {
    return name.replace(/^(.+?) ([^\s,]+)(,? (?:[JS]r\.?|III?|IV))?$/i,
                        "$2, $1$3");
}



3.15.

. , ,
, .

, :
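^              # Assert position at the beginning of the string.
(              # Capture the enclosed match to backreference 1...
  .+?          #   Match one or more characters, as few times as possible.
)              # End the capturing group.
\              # Match a literal space character.
(              # Capture the enclosed match to backreference 2...
  [^\s,]+      #   Match one or more characters that are neither
               #   whitespace nor commas.
)              # End the capturing group.
(              # Capture the enclosed match to backreference 3...
  ,?\          #   Match ", " or " ".
  (?:          #   Group but don't capture...
    [JS]r\.?   #     Match "Jr", "Jr.", "Sr", or "Sr.".
   |           #    or...
    III?       #     Match "II" or "III".
   |           #    or...
    IV         #     Match "IV".
  )            #   End the noncapturing group.
)?             # Make the group optional.
$              # Assert position at the end of the string.

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby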
:


, ,
( ).
.

, , : Jr, Jr., Sr, Sr., II, III IV .


, :

, . ,
Sacha Baron Cohen Cohen, Sacha Baron,
Baron Cohen, Sacha.

, (, Charles de Gaulle (
) de Gaulle, Charles 15 Chicago
Manual of Style,
- (Merriam-Websters Biographical Dictionary)).
^ $
, , . , (, ), .

,
, .
.
.+?
, von de. , . ,
, , Mary Lou Norma Jeane, ,
. .

[^\s,]+ . , , ,
Latin.


, Jr. III, . , .
. +? ,
+ ?
(
) , ( ) .
, , , , .
, ,
.

. 4.2

.
Table 4.2. Formatted names

Input                   Output
Robert Downey, Jr.      Downey, Robert, Jr.
John F. Kennedy         Kennedy, John F.
Scarlett O'Hara         O'Hara, Scarlett
Pepé Le Pew             Pew, Pepé Le
J.R.R. Tolkien          Tolkien, J.R.R.
Catherine Zeta-Jones    Zeta-Jones, Catherine
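
The same reformatting can be sketched in Python, which uses the \2, \1\3 replacement syntax. The function name format_name is hypothetical, and the sketch assumes Python 3.5 or later, where an unmatched group in the replacement text is treated as an empty string.

```python
import re

NAME = re.compile(r"^(.+?) ([^\s,]+)(,? (?:[JS]r\.?|III?|IV))?$", re.IGNORECASE)

def format_name(name):
    """Turn "First Last[, Suffix]" into "Last, First[, Suffix]"."""
    # Group 1: first and middle names, group 2: last name, group 3: suffix.
    return NAME.sub(r"\2, \1\3", name)
```

A name that does not match, such as a single word, is returned unchanged.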


,
, -


. , : De, Du, La, Le, St,


St., Ste, Ste., Van Von.
(, de la):
^(.+?) ((?:(?:D[eu]|L[ae]|Ste?\.?|V[ao]n) )*[^\s,]+)
(,? (?:[JS]r\.?|III?|IV))?$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
$2, $1$3

Replacement text flavors: .NET, Java, JavaScript, Perl, PHP


\2, \1\3

Replacement text flavors: Python, Ruby

4.19.
,
, .
, , , ,
.
, . , -.
10 30 .


, ,
. . :
[ -]

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
3.14 , .



, ,

, . , , :
^(?:
(?<visa>4[0-9]{12}(?:[0-9]{3})?) |
(?<mastercard>5[1-5][0-9]{14}) |
(?<discover>6(?:011|5[0-9][0-9])[0-9]{12}) |
(?<amex>3[47][0-9]{13}) |
(?<diners>3(?:0[0-5]|[68][0-9])[0-9]{11}) |
(?<jcb>(?:2131|1800|35\d{3})\d{11})
)$

:
: .NET, PCRE 7, Perl 5.10, Ruby 1.9
^(?:
(?P<visa>4[0-9]{12}(?:[0-9]{3})?) |
(?P<mastercard>5[1-5][0-9]{14}) |
(?P<discover>6(?:011|5[0-9][0-9])[0-9]{12}) |
(?P<amex>3[47][0-9]{13}) |
(?P<diners>3(?:0[0-5]|[68][0-9])[0-9]{11}) |
(?P<jcb>(?:2131|1800|35\d{3})\d{11})
)$

:
: PCRE, Python
Java, Perl 5.6, Perl 5.8 Ruby 1.8 . . 1 Visa,
2 MasterCard , 6 JCB:
^(?:
  (4[0-9]{12}(?:[0-9]{3})?) |          # Visa
  (5[1-5][0-9]{14}) |                  # MasterCard
  (6(?:011|5[0-9][0-9])[0-9]{12}) |    # Discover
  (3[47][0-9]{13}) |                   # AMEX
  (3(?:0[0-5]|[68][0-9])[0-9]{11}) |   # Diners Club
  ((?:2131|1800|35\d{3})\d{11})        # JCB
)$

Regex options: Free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby


JavaScript . , :
^(?:(4[0-9]{12}(?:[0-9]{3})?)|(5[1-5][0-9]{14})|
(6(?:011|5[0-9][0-9])[0-9]{12})|(3[47][0-9]{13})|
(3(?:0[0-5]|[68][0-9])[0-9]{11})|((?:2131|1800|35\d{3})\d{11}))$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
,
:
^(?:
  4[0-9]{12}(?:[0-9]{3})? |          # Visa
  5[1-5][0-9]{14} |                  # MasterCard
  6(?:011|5[0-9][0-9])[0-9]{12} |    # Discover
  3[47][0-9]{13} |                   # AMEX
  3(?:0[0-5]|[68][0-9])[0-9]{11} |   # Diners Club
  (?:2131|1800|35\d{3})\d{11}        # JCB
)$

Regex options: Free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
JavaScript:
^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|6(?:011|5[0-9][0-9])[0-9]{12}|
3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|(?:2131|1800|35\d{3})\d{11})$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
3.6 , . , 3.9, , . , .

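Example web page with JavaScript
<html>
<head>
<title>Credit Card Test</title>
</head>
<body>
<h1>Credit Card Test</h1>
<form>
<p>Please enter your credit card number:</p>
<p><input type="text" size="20" name="cardnumber"
    onkeyup="validatecardnumber(this.value)"></p>
<p id="notice">(no card number entered)</p>
</form>
<script>
function validatecardnumber(cardnumber) {
    // Strip spaces and dashes
    cardnumber = cardnumber.replace(/[ -]/g, "");
    // See if the card is valid
    // The regex will capture the number in one of the capturing groups
    var match = /^(?:(4[0-9]{12}(?:[0-9]{3})?)|(5[1-5][0-9]{14})|(6(?:011|5[0-9][0-9])[0-9]{12})|(3[47][0-9]{13})|(3(?:0[0-5]|[68][0-9])[0-9]{11})|((?:2131|1800|35\d{3})\d{11}))$/.exec(cardnumber);
    if (match) {
        // List of card types, in the same order as the regex capturing groups
        var types = ["Visa", "MasterCard", "Discover", "American Express",
                     "Diners Club", "JCB"];
        // Find the capturing group that matched
        // (skip the zeroth element of the match array, the overall match)
        for (var i = 1; i < match.length; i++) {
            if (match[i]) {
                // Display the card type for that group
                document.getElementById("notice").innerHTML = types[i - 1];
                break;
            }
        }
    } else {
        document.getElementById("notice").innerHTML = "(invalid card number)";
    }
}
</script>
</body>
</html>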



, . .
,
, .


, , ,
, , ,
. , ,
, , .

[-] .
.
. [-] , , \D ,
, .


, ,
. , .
:
Visa
13 16 , 4.
MasterCard
16 , 51 55.
Discover
16 , 6011, 65.
American Express
15 , 34 37.
Diners Club
14 , 300 305, 36 38.
JCB
15 , 2131 1800, 16 , 35.
,
. JCB | .
|| |) ,



.
,
Visa, MasterCard AMEX, :
^(?:
  4[0-9]{12}(?:[0-9]{3})? |   # Visa
  5[1-5][0-9]{14} |           # MasterCard
  3[47][0-9]{13}              # AMEX
)$

Regex options: Free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
:
^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13})$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, \b .

-
- JavaScript
. 347 , . onkeyup,
validatecardnumber().
,

. .
, regexp.
exec() null (
). , regexp.exec() .
. 1 6 ,
.

, . ,
.
( undefined ,
). , . -


, .




, . ,
. ,
.
, ,
luhn(cardnumber); else validatecardnumber().
, . .
.
JavaScript ,
:
function luhn(cardnumber) {
    // Build an array with the digits in the card number
    var getdigits = /\d/g;
    var digits = [];
    var match;
    while (match = getdigits.exec(cardnumber)) {
        digits.push(parseInt(match[0], 10));
    }
    // Run the Luhn algorithm on the array
    var sum = 0;
    var alt = false;
    for (var i = digits.length - 1; i >= 0; i--) {
        if (alt) {
            digits[i] *= 2;
            if (digits[i] > 9) {
                digits[i] -= 9;
            }
        }
        sum += digits[i];
        alt = !alt;
    }
    // Check the result
    if (sum % 10 == 0) {
        document.getElementById("notice").innerHTML += "; Luhn check passed";
    } else {
        document.getElementById("notice").innerHTML +=
            "; Luhn check failed";
    }
}


. .
validatecardnumber()
,
.

\d , , .
/g. match[0] .
( ), parseInt(), , ,
sum , ,
.

. 10
, . .
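
For comparison, the card-type detection and the Luhn check can be sketched together in Python, using the (?P<name>...) form of the named-capture regex shown earlier. The names CARD_TYPES, card_type, and luhn_ok are hypothetical.

```python
import re

CARD_TYPES = re.compile(r"""^(?:
    (?P<visa>4[0-9]{12}(?:[0-9]{3})?) |
    (?P<mastercard>5[1-5][0-9]{14}) |
    (?P<discover>6(?:011|5[0-9][0-9])[0-9]{12}) |
    (?P<amex>3[47][0-9]{13}) |
    (?P<diners>3(?:0[0-5]|[68][0-9])[0-9]{11}) |
    (?P<jcb>(?:2131|1800|35\d{3})\d{11})
)$""", re.VERBOSE)

def card_type(number):
    """Return the name of the matching card brand, or None."""
    digits = re.sub(r"[ -]", "", number)      # strip spaces and hyphens
    m = CARD_TYPES.match(digits)
    # Only one named group takes part in a match; lastgroup is its name.
    return m.lastgroup if m else None

def luhn_ok(number):
    """Run the Luhn checksum over the digits found in number."""
    digits = [int(c) for c in re.findall(r"\d", number)]
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return bool(digits) and total % 10 == 0
```

Passing both checks shows only that the number is well formed, not that the card exists or has funds.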

4.20.

- , .
:
, (
) , - ( ),
,
( ) . , . , . ,


, , .
, .
,
, . , JavaScript,
, , .
,
.


, , . ,
, :
[-. ]

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
3.14 , . , -
, , .
.


27 , , :
^(
  (AT)?U[0-9]{8} |                               # Austria
  (BE)?0?[0-9]{9} |                              # Belgium
  (BG)?[0-9]{9,10} |                             # Bulgaria
  (CY)?[0-9]{8}L |                               # Cyprus
  (CZ)?[0-9]{8,10} |                             # Czech Republic
  (DE)?[0-9]{9} |                                # Germany
  (DK)?[0-9]{8} |                                # Denmark
  (EE)?[0-9]{9} |                                # Estonia
  (EL|GR)?[0-9]{9} |                             # Greece
  (ES)?[0-9A-Z][0-9]{7}[0-9A-Z] |                # Spain
  (FI)?[0-9]{8} |                                # Finland
  (FR)?[0-9A-Z]{2}[0-9]{9} |                     # France
  (GB)?([0-9]{9}([0-9]{3})?|[A-Z]{2}[0-9]{3}) |  # United Kingdom
  (HU)?[0-9]{8} |                                # Hungary
  (IE)?[0-9]S[0-9]{5}L |                         # Ireland
  (IT)?[0-9]{11} |                               # Italy
  (LT)?([0-9]{9}|[0-9]{12}) |                    # Lithuania
  (LU)?[0-9]{8} |                                # Luxembourg
  (LV)?[0-9]{11} |                               # Latvia
  (MT)?[0-9]{8} |                                # Malta
  (NL)?[0-9]{9}B[0-9]{2} |                       # Netherlands
  (PL)?[0-9]{10} |                               # Poland
  (PT)?[0-9]{9} |                                # Portugal
  (RO)?[0-9]{2,10} |                             # Romania
  (SE)?[0-9]{12} |                               # Sweden
  (SI)?[0-9]{8} |                                # Slovenia
  (SK)?[0-9]{10}                                 # Slovakia
)$

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
, . , -
, . , JavaScript . :
^((AT)?U[0-9]{8}|(BE)?0?[0-9]{9}|(BG)?[0-9]{9,10}|(CY)?[0-9]{8}L|
(CZ)?[0-9]{8,10}|(DE)?[0-9]{9}|(DK)?[0-9]{8}|(EE)?[0-9]{9}|
(EL|GR)?[0-9]{9}|(ES)?[0-9A-Z][0-9]{7}[0-9A-Z]|(FI)?[0-9]{8}|
(FR)?[0-9A-Z]{2}[0-9]{9}|(GB)?([0-9]{9}([0-9]{3})?|[A-Z]{2}[0-9]{3})|
(HU)?[0-9]{8}|(IE)?[0-9]S[0-9]{5}L|(IT)?[0-9]{11}|
(LT)?([0-9]{9}|[0-9]{12})|(LU)?[0-9]{8}|(LV)?[0-9]{11}|(MT)?[0-9]{8}|
(NL)?[0-9]{9}B[0-9]{2}|(PL)?[0-9]{10}|(PT)?[0-9]{9}|(RO)?[0-9]{2,10}|
(SE)?[0-9]{12}|(SI)?[0-9]{8}|(SK)?[0-9]{10})$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
3.6 .




, , . ,
DE 123.456.789.
, 27 , . ,
.

[-.] , . , .
. , [-.]
[^A-Z0-9] .


, ,
. ,
,
, , . JavaScript , .

27 , .
:

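Austria
    U99999999
Belgium
    999999999 or 0999999999
Bulgaria
    999999999 or 9999999999
Cyprus
    99999999L
Czech Republic
    99999999, 999999999, or 9999999999
Germany
    999999999
Denmark
    99999999
Estonia
    999999999
Greece
    999999999
Spain
    X9999999X
Finland
    99999999
France
    XX999999999
United Kingdom
    999999999, 999999999999, or XX999
Hungary
    99999999
Ireland
    9S99999L
Italy
    99999999999
Lithuania
    999999999 or 99999999999
Luxembourg
    99999999
Latvia
    99999999999
Malta
    99999999
Netherlands
    999999999B99
Poland
    999999999
Portugal
    999999999
Romania
    99, 999, 9999, 99999, 999999, 9999999, 99999999, 999999999, or 9999999999
Sweden
    99999999999
Slovenia
    99999999
Slovakia
    999999999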
, . , .
. , ,
. ,
.

, ,
. , | ,
. , , || . || , , ,
,
.

27 . ,

.
.

, \b .


,
27 , , .
, 27 .
, , ,
:

^(AT)?U[0-9]{8}$

^(BE)?0?[0-9]{9}$

^(BG)?[0-9]{9,10}$

^(CY)?[0-9]{8}L$

^(CZ)?[0-9]{8,10}$

^(DE)?[0-9]{9}$

^(DK)?[0-9]{8}$

^(EE)?[0-9]{9}$

^(EL|GR)?[0-9]{9}$

^(ES)?[0-9A-Z][0-9]{7}[0-9A-Z]$

^(FI)?[0-9]{8}$

^(FR)?[0-9A-Z]{2}[0-9]{9}$

^(GB)?([0-9]{9}([0-9]{3})?|[A-Z]{2}[0-9]{3})$

^(HU)?[0-9]{8}$


^(IE)?[0-9]S[0-9]{5}L$

^(IT)?[0-9]{11}$

^(LT)?([0-9]{9}|[0-9]{12})$

^(LU)?[0-9]{8}$

^(LV)?[0-9]{11}$

^(MT)?[0-9]{8}$

^(NL)?[0-9]{9}B[0-9]{2}$

^(PL)?[0-9]{10}$

^(PT)?[0-9]{9}$

^(RO)?[0-9]{2,10}$

^(SE)?[0-9]{12}$

^(SI)?[0-9]{8}$

^(SK)?[0-9]{10}$

3.6.
, .
, , . , . ,
3.9. ,
, . -


,
.
. EL , GR
.

.
, . , . , , .
,
- , http://ec.europa.eu/taxation_
customs/vies/vieshome.do.
, ,
2.3, 2.5 2.8.

,
, . , ,
, . ,
,
.
,
.
, .
,
.

5.1.

cat,
.
, .
, ,
hellcat, application Catwoman.

:
\bcat\b


:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, 3.7. , 3.14.

, , , cat , . , , cat
, , ,
.
,
, .
2.6.
JavaScript, PCRE Ruby ,
ASCII.
,
^|[^A-Za-z0-9_] [A-Za-z0-9_]
[A-Za-z0-9_] [^A-Za-z0-9_]|$ . Python, UNICODE U. \b , , Latin. , JavaScript, PCRE Ruby
\büber\b darüber, dar über.
.
, , r. ,
, , .

, ( ). . PCRE ( UTF-8) Ruby 1.9


, (?<=\P{L}|^)cat(?=\P{L}|$) . -


Letter ( \P{L} ), 2.7. 2.16. ,


( \b ), \P{L} [^\p{L}\p{N}_] .

JavaScript Ruby 1.8 , . ,


, ,
, ( 3.15). , ( , \w \W ASCII) , .
Letter , , \p{L} . [A-Za-z\xAA\xB5\xBA\xC0-\xD6\xD8-\xF6\xF8-\xFF] , , 256 0x00 0xFF (
. 5.1). (,
, ) , ASCII.

. 5.1.


, cat
dog JavaScript. , cat . \b \w :

// 8-bit-wide letter characters
var L = "A-Za-z\xAA\xB5\xBA\xC0-\xD6\xD8-\xF6\xF8-\xFF";
var pattern = "([^{L}]|^)cat([^{L}]|$)".replace(/{L}/g, L);
var regex = new RegExp(pattern, "gi");
// Replace cat with dog, and put back any
// additional matched characters
subject = subject.replace(regex, "$1dog$2");

,
JavaScript \xHH ( HH ). , L, ,
. , \xHH, (, \\xHH) . .

.
5.2, 5.3 5.4.

5.2.

,
.



:
\b(?:one|two|three)\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


5.3.

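JavaScript
var subject = "One times two plus one equals three.";
var regex = /\b(?:one|two|three)\b/gi;
subject.match(regex);
// returns an array with four matches: ["One","two","one","three"]

// This function does the same thing but accepts an array of words to
// match. Any regex metacharacters within the accepted words are escaped
// with a backslash before searching.
function match_words (subject, words) {
    var regex_metachars = /[(){}[\]*+?.\\^$|,\-]/g;
    for (var i = 0; i < words.length; i++) {
        words[i] = words[i].replace(regex_metachars, "\\$&");
    }
    var regex = new RegExp("\\b(?:" + words.join("|") + ")\\b", "gi");
    return subject.match(regex) || [];
}
match_words(subject, ["one","two","three"]);
// returns an array with four matches: ["One","two","one","three"]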


: , (
| ). , .
\bone\b|\btwo\b|\bthree\b . .

, , -


,
, . , . ,
,
, awe|awesome awesome, awe
.

,
, .

two three
, ,
\b(?:one|t(?:wo|hree))\b . 5.3
, , .

JavaScript
JavaScript .
match(),
JavaScript, . match() , /g
(global ). , , null, .
(match_words()), , ,
. , , , .

, . (/i)
.

.
5.1, 5.3 5.4.


5.3.

color colour .

, bat, cat rat.

, phobia.

Steven: Steve, Steven Stephen.


regular expression.

,
. .

Color colour
\bcolou?r\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Bat, cat rat


\b[bcr]at\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

, phobia
\b\w*phobia\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Steve, Steven Stephen


\bSte(?:ven?|phen)\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

regular expression
\breg(?:ular expressions?|ex(?:ps?|e[sn])?)\b


:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby



( \b ) .
, .

Color Colour
color colour,
colorblind. ? , u. , ? ,
, .
, ( u ), , .
?
. , , , ,
. , , , ,
.

Bat, cat rat


b, c
r ,
at. \b(?:b|c|r)at\b , \b(?:bat|cat|rat)\b \bbat\b|\bcat\b|\brat\b .
, , . , ( ,
AZ),
. -


, ,
, .
, . ,
. ,
, , . , ,
. , , .
:

, <[cat]{3}> cat, act,
ttt .
, <[^cat]>,
, c, a t.

, . <[a|b|c]> abc|. , ,
, . , .
, , 2.3.

, phobia
, ,
. , , arachnophobia hexakosioihexekontahexaphobia, * ,
phobia. ,
phobia ,
* + .

Steve, Steven Stephen


, . , (?:...) , | . ? ,


,
n . ( )
\bSte(?:ve|ven|phen)\b .
Ste , , \b(?:Steve|Steven|Stephen)\b \bSteve\b|\bSteven\b|\bStephen\b .
, , ,
, Ste. , S.
,
, ,

. ,
( Ste), . \bSte(?:ven?|phen)\b , ,
, .

2.13.

regular expression
, ,
regular expression.
,
.
, ,
,
JavaScript.
, :
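\b                 # Assert position at a word boundary.
reg                # Match "reg".
(?:                # Group but don't capture...
  ular\            #   Match "ular ".
  expressions?     #   Match "expression" or "expressions".
 |                 #  or...
  ex               #   Match "ex".
  (?:              #   Group but don't capture...
    ps?            #     Match "p" or "ps".
   |               #    or...
    e              #     Match "e".
    [sn]           #     Match one character from the set "sn".
  )                #   End the noncapturing group.
    ?              #     Repeat the preceding group between zero and one time.
)                  # End the noncapturing group.
\b                 # Assert position at a word boundary.

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby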
:
regular expressions
regular expression
regexps
regexp

regexes
regexen
regex

.
5.1, 5.2 5.4.

5.4. ,


, cat. Catwoman ,
cat, , cat .


, :
\b(?!cat\b)\w+

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


,
( [^...] ) , , [^cat] , ,
cat. [^cat] , , c, a t. , \b[^cat]+\b cat, cup, c.
\b[^c][^a][^t]\w* , , c, a
t. ,
, , , ,
.

, , , , :
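\b       # Assert position at a word boundary.
(?!      # Assert that the regex below cannot be matched
         # starting at this position...
  cat    #   Match "cat".
  \b     #   Assert position at a word boundary.
)        # End the negative lookahead.
\w+      # Match one or more word characters.

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby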
, (?!...) . cat, ,
,
,
.
, . + \w+ ,
.


categorically match any word except cat, :
categorically, match, any, word except.
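
The example above can be reproduced with a few lines of Python. The sketch also includes the stricter variant that rejects any word containing cat; the names NOT_CAT and NO_CAT_INSIDE are hypothetical.

```python
import re

# Any whole word except "cat" itself.
NOT_CAT = re.compile(r"\b(?!cat\b)\w+", re.IGNORECASE)

# Any whole word that does not contain "cat" anywhere:
# the lookahead is tested before every single character.
NO_CAT_INSIDE = re.compile(r"\b(?:(?!cat)\w)+\b", re.IGNORECASE)

text = "categorically match any word except cat"
words_except_cat = NOT_CAT.findall(text)
words_without_cat = NO_CAT_INSIDE.findall(text)
```

With the first pattern categorically is kept because the lookahead only rejects cat when a word boundary follows it; the second pattern drops it as well.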


,
, , cat, ,
cat, :
\b(?:(?!cat)\w)+\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
,
.
, , ; ,
, ,
. , , .

cat, .
, .

cat.

.
2.16, ( , ).
5.1, 5.5, 5.6 5.11.

5.5. ,

,
cat,
, ,
.


:
\b\w+\b(?!\W+cat\b)

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
3.7 3.14 ,
.

,
( \b ) ( \w ). 2.6.
(?!...) , . , , , , .
, ,
, .
, .

,
, , ,
. ( ) 2.16.

: \W+ , , cat , ,
, cat, , ,
cat.

,
cat, , cat.
cat,


5.4,
\b(?!cat\b)\w+\b(?!\W+cat\b) .

, cat ( cat , ), :
\b\w+\b(?=\W+cat\b)

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

.
2.16, ( , ).
5.4 5.6.

5.6. ,

,
cat, , .



.
, , , . 2.16.
, (?<!...) . , ,
, , .


-.
. 11, .

, cat
(?<!\bcat\W+)\b\w+

:
: .NET
(?<!\bcat\W{1,9})\b\w+

:
: .NET, Java, PCRE
(?<!\bcat)(?:\W+|^)(\w+)

:
: .NET, Java, PCRE, Perl, Python, Ruby 1.9


JavaScript Ruby 1.8 ,
. , , , ,
,
JavaScript:
var subject = "My cat is furry.",
    main_regex = /\b\w+/g,
    lookbehind = /\bcat\W+$/i,
    lookbehind_type = false, // negative lookbehind
    matches = [],
    match,
    left_context;
while (match = main_regex.exec(subject)) {
    left_context = subject.substring(0, match.index);
    if (lookbehind_type == lookbehind.test(left_context)) {
        matches.push(match[0]);
    } else {
        main_regex.lastIndex = match.index + 1;
    }
}
// matches: ["My","cat","furry"]


,

(?<!\bcat\W+) . + , , , .NET. ,
, ,
.

+ {1,9} . .NET, Java


PCRE, , .
, .
.
, , , .NET.
.NET
, ,
, .
, ,
.
\W , , , ,
. , , ( ) , , ( ) .
, . , ,
1, ,
. , , 3.9.
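
Python only supports fixed-length lookbehind, so of the three regexes above it can use the last one, with the wanted word in capturing group 1. A minimal sketch, with the hypothetical name words_not_after_cat:

```python
import re

# The lookbehind covers only the fixed-length part (\bcat); the separating
# nonword characters are matched outside it, and the word itself is group 1.
NOT_AFTER_CAT = re.compile(r"(?<!\bcat)(?:\W+|^)(\w+)", re.IGNORECASE)

def words_not_after_cat(text):
    """Return every word that is not immediately preceded by the word cat."""
    return [m.group(1) for m in NOT_AFTER_CAT.finditer(text)]
```

Because the overall match includes the leading separator, the word has to be read from group 1 instead of from the whole match.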



JavaScript ,
JavaScript , , , .
()
.

(?<!\bcat\W+)\b\w+ : \bcat\W+ , ( \b\w+ ). $ ,
. lookbehind ( /m), $
$(?!\s) , . lookbehind_type ,
: true , false
.

main_
regex() exec() (
3.11). , , , (left_context) lookbehind
. lookbehind , , . lookbehind_type, ,
.
. , matches. , , ( main_regex.lastIndex), main_regex, exec .
! !
lastIndex, , /g (global ). lastIndex . , , -


, .
, .
, . . -
, .

,
cat ( cat , ), :
(?<=\bcat\W+)\b\w+

:
: .NET
(?<=\bcat\W{1,9})\b\w+

:
: .NET, Java, PCRE
(?<=\bcat)(?:\W+|^)(\w+)

:
: .NET, Java, PCRE, Perl, Python, Ruby 1.9

.
2.16, ( , ).
5.4 5.5.

5.7.

NEAR , . , , : ,
, NOT OR, -


NEAR. word1 NEAR


word2 word1 word2, ,
, .

,
, word1,
word2, , ,
.
, :
\b(?:word1\W+(?:\w+\W+){0,5}?word2|word2\W+(?:\w+\W+){0,5}?word1)\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
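\b(?:
  word1                 # first term
  \W+ (?:\w+\W+){0,5}?  # up to five words
  word2                 # second term
 |                      #  or, the same pattern in reverse...
  word2                 # second term
  \W+ (?:\w+\W+){0,5}?  # up to five words
  word1                 # first term
)\b

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby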
. . JavaScript , . 3.5 3.7 , .

, , . word1 word2,


.
word1 word2, .

{0,5}?.
, . word1 word2 word2, word1 word2,
() .
, 0 5
. , {1,15}? , , 15 .

, ,
( \w \W , ),
, (, ).
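
Since the two search words and the maximum distance usually vary, the regex can be assembled at run time. The following Python sketch does that; near_pattern and its max_between parameter are hypothetical names.

```python
import re

def near_pattern(word1, word2, max_between=5):
    """Build a regex matching word1 and word2, in either order,
    separated by at most max_between other words."""
    w1, w2 = re.escape(word1), re.escape(word2)
    # Nonword separator, then lazily up to max_between intervening words.
    gap = r"\W+(?:\w+\W+){0,%d}?" % max_between
    return re.compile(
        r"\b(?:%s%s%s|%s%s%s)\b" % (w1, gap, w2, w2, gap, w1),
        re.IGNORECASE,
    )
```

For example, near_pattern("cat", "dog") finds a match in "the cat chased the big dog" because only three words separate the two terms.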


.
, ,
.
. , .
, ,
, . , .
, word1 word2
. ,
; , word2:
\b(?:word1|(word2))\W+(?:\w+\W+){0,5}?(?(1)word1|word2)\b

:
: .NET, PCRE, Perl, Python


,
, , :
\b(?:(?<w1>word1)|(?<w2>word2))\W+(?:\w+\W+){0,5}?(?(w2)(?&w1)|(?&w2))\b

:
: PCRE 7, Perl 5.10

word1 word2
, (?<name>...) . (?&name) ,
,
. , . , \k<name>
(.NET, PCRE 7, Perl 5.10) (?P=name) (PCRE 4 , Perl 5.10,
Python), , .
, (?&name) , , .
, , . , , , , . , , , .

,

.
, ,
. ,
. , ,
?
(. 5.2). n!, 1 n (n ). 24 . 10 , . , ,
, .
. , , ( ,


), ,
. , ,
:
\b(?:(?>(word1)|(word2)|(word3)|(?(1)|(?(2)|(?(3)|(?!))))\w+)\b\W*?){3,8}
(?(1)(?(2)(?(3)|(?!))|(?!))|(?!))

:
: .NET, PCRE, Perl

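Two values:
[ 12, 21 ]
= 2 possible arrangements
Three values:
[ 123, 132,
  213, 231,
  312, 321 ]
= 6 possible arrangements
Four values:
[ 1234, 1243, 1324, 1342, 1423, 1432,
  2134, 2143, 2314, 2341, 2413, 2431,
  3124, 3142, 3214, 3241, 3412, 3421,
  4123, 4132, 4213, 4231, 4312, 4321 ]
= 24 possible arrangements
Factorials:
2! = 2 × 1 = 2
3! = 3 × 2 × 1 = 6
4! = 4 × 3 × 2 × 1 = 24
5! = 5 × 4 × 3 × 2 × 1 = 120
...
10! = 10 × 9 × 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1 = 3628800

Figure 5.2. Number of possible arrangements of a set of values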

, ( 2.14) Python:
\b(?:(?:(word1)|(word2)|(word3)|(?(1)|(?(2)|(?(3)|(?!))))\w+)\b\W*?){3,8}
(?(1)(?(2)(?(3)|(?!))|(?!))|(?!))

:
: .NET, PCRE, Perl, Python

{3,8} , , -


. , (?!) , , ,
. .
, \w+ , , . , , ,
.

, , ,
,
, , .
. ,
. , .
, , , ,
Java Ruby (
).
, , . ,
.
\b(?:(?>word1()|word2()|word3()|(?>\1|\2|\3)\w+)\b\W*?){3,8}\1\2\3

:
: .NET, Java, PCRE, Perl, Ruby
\b(?:(?:word1()|word2()|word3()|(?:\1|\2|\3)\w+)\b\W*?){3,8}\1\2\3

:
: .NET, Java, PCRE, Perl, Python, Ruby

. , , ,
:

\b(?:(?>word1()|word2()|word3()|word4()|
(?>\1|\2|\3|\4)\w+)\b\W*?){4,9}\1\2\3\4

:
: .NET, Java, PCRE, Perl, Ruby
\b(?:(?:word1()|word2()|word3()|word4()|
(?:\1|\2|\3|\4)\w+)\b\W*?){4,9}\1\2\3\4

:
: .NET, Java, PCRE, Perl, Python, Ruby
. , \1 , , , , ,
. , ,
, .

(?>\1|\2|\3)
\w+ , . , , .

Python ,
, Python ,
. , . ,

, .
JavaScript
. JavaScript
, Python,
,
, . ,
, . JavaScript , -


, , . : , ,
, ,
, .
JavaScript , , , ((a)|(b))+ . , , , , . , (?:(a)|(b))+ ab,
a. JavaScript,
, . (?:(a)|(b))+ ab,
1 , , JavaScript
undefined, , RegExp.prototype.exec().

JavaScript ,
, .


, , , .

,
, , .
\A(?=.*?\bword1\b)(?=.*?\bword2\b).*\Z

: ,
: .NET, Java, PCRE, Perl, Python, Ruby
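The lookahead version can be tried out with a short Python sketch. Because each lookahead starts scanning at the beginning of the subject, the order of the two terms does not matter. The helper name `contains_both` is hypothetical, and the `re.DOTALL` flag is an assumption that stands in for the "dot matches line breaks" option.

```python
import re

# Each lookahead independently checks for one whole word anywhere in the subject.
BOTH = re.compile(r"\A(?=.*?\bword1\b)(?=.*?\bword2\b).*\Z", re.DOTALL)

def contains_both(text):
    """True if both placeholder words occur somewhere in text, in any order."""
    return BOTH.search(text) is not None
```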

^(?=[\s\S]*?\bword1\b)(?=[\s\S]*?\bword2\b)[\s\S]*$

: ( ^ $ )
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
,
.
. JavaScript
, JavaScript \A \Z .

, 3.6. word1
word2 , . , , . , \A(?=.*?\bword1\b)(?=.*?\bword2\b)(?=.*?\bword3\b).*\Z .

.
5.5 5.6.

5.8. Find Repeated Words

,
. , The the.
, ,
.

, , :
\b([A-Z]+)\s+\1\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
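A minimal Python sketch of this recipe follows. It assumes case-insensitive matching, so that a pair such as "The the" counts as a repetition; the function name `repeated_words` is made up for the example.

```python
import re

# \1 must repeat the captured word; the final \b rejects "this thistle".
REPEATED = re.compile(r"\b([A-Z]+)\s+\1\b", re.IGNORECASE)

def repeated_words(text):
    """Return every doubled word pair found in text."""
    return [match.group(0) for match in REPEATED.finditer(text)]
```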
,
, 1. -


, ,
(, HTML),
. 3.15 , ,
.
,
, , 3.7, , .
, grep, ,

1, , , , .

, ,
.
, , . , . (\w)\1 \w{2} . , .
2.10.

. , ,
, A
Z a z ( ). , , Letter ( \p{L} ), ( . 72).

\s+ , , ,
, .
, , ( ), \s+ [^\S\r\n] .
,
. PCRE 7 Perl 5.10 \h , -


, ,
, .
, ,
, this thistle.
, ,
. ,
, that that
had had. , , (
oink oink ha ha) . .

.
2.10, .
5.9, , .

5.9. Remove Duplicate Lines

,
- . , .

( uniq
UNIX Get-Unique PowerShell Windows), , . , , , .
, , 1 . ( , 1

. . .


,
) , .

1:

, ,
, ,
.

.
, :
^(.*)(?:(?:\r?\n|\r)\1)+$

: ^ $ ( )
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
:
$1

: .NET, Java, JavaScript, Perl, PHP


\1

: Python, Ruby
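The first option can be sketched in Python as shown below. It only collapses duplicates that sit on adjacent lines, so the input is assumed to be sorted already; the function name `drop_adjacent_duplicates` is hypothetical.

```python
import re

# With MULTILINE, ^ and $ match at line breaks; the dot must not match line breaks.
ADJACENT_DUPLICATES = re.compile(r"^(.*)(?:(?:\r?\n|\r)\1)+$", re.MULTILINE)

def drop_adjacent_duplicates(text):
    """Replace each run of identical adjacent lines with a single copy."""
    return ADJACENT_DUPLICATES.sub(r"\1", text)
```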
,
, ( ). ,
. 3.15 ,
.

2:

, , , , :
^([^\r\n]*)(?:\r?\n|\r)(?=.*^\1$)


: , ^ $
: .NET, Java, PCRE, Perl, Python, Ruby
JavaScript,
:
^(.*)(?:\r?\n|\r)(?=[\s\S]*^\1$)

: ^ $ ( )
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
:
( , .)
:

3:

, .
, :
^([^\r\n]*)$(.*?)(?:(?:\r?\n|\r)\1$)+

: ^ $ ( )
: .NET, Java, PCRE, Perl, Python, Ruby
JavaScript, -
, .
^(.*)$([\s\S]*?)(?:(?:\r?\n|\r)\1$)+

: ^ $ ( )
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
:
$1$2

: .NET, Java, JavaScript, Perl, PHP


\1\2

: Python, Ruby


1 2,

.
, , .
.

1:

, , . , . .

-, ( ^ ) .
,
^ $ ( 3.4 , ). .*
( ), 1. , , , .

(?:\r?\n|\r) , , Windows ( \r\n ), UNIX/Linux/OS X ( \n ) Mac OS ( \r ). \1
,
.
,
. , + ( 1), .

, , . , ,
,
.
, ( -


). 1, .

2:

1,
,
. -,
, JavaScript, [^\r\n] (
, ) . , ,
, . -, , , .
,
(
), . ,
.

3:

,
( , ,
, ),
3 2. ,
(
2), , , ,
.
1, ( )
2. 1 2 ,
, - .
. -,
,
, , , , . -,
, ,



,
.
.

,
,
. , , :
value1
value2
value2
value3
value3
value1
value2

.
. 5.1.
5.1.

value1

value1

value1

value1

value2

value2

value2

value2

value2

value2

value3

value3

value3

value3

value2

value3

value3

value1

value2

value2
/

.
2.10, .
5.8, , .


5.10. Match Complete Lines That Contain a Word

,
ninja .

^.*\bninja\b.*$

: , ^ $
( )
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
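A short Python sketch shows how the pattern above picks out whole lines. The function name `lines_with_ninja` and the sample text in the test are assumptions for illustration.

```python
import re

# MULTILINE anchors ^ and $ at line breaks; IGNORECASE also accepts "Ninja".
NINJA_LINE = re.compile(r"^.*\bninja\b.*$", re.MULTILINE | re.IGNORECASE)

def lines_with_ninja(text):
    """Return each complete line that contains the whole word 'ninja'."""
    return NINJA_LINE.findall(text)
```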


.
, ninja, \bninja\b .
2.6, ,
, ,
ninja,
.


, .* .
. ,
. ninja, , , .

,
, ,
. , $ , . .
, , ,
. ,
, , , -


,
- .
, ,
(
^ $ , ), . , , ^ $ , , , . 3.4
, . JavaScript Ruby ,
, , JavaScript , , Ruby
.

, ,
:
^.*\b(one|two|three)\b.*$

: , ^ $
( )
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
,
: one, two three. ,
, . -, , -,
1 , . ,
, .
, , , ,
, . , ^.*?\b(one|two|three)\b.*$ , 1 , .

,
, :
^(?=.*?\bone\b)(?=.*?\btwo\b)(?=.*?\bthree\b).+$

: , ^ $
( )
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


,
, . .+ ,
, .

.
5.11, , , .

5.11. Match Complete Lines That Do Not Contain a Word

, ninja.

^(?:(?!\bninja\b).)*$

: , ^ $
( )
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
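The negative lookahead is tested before every single character of the line, which the following Python sketch makes easy to try out. The function name `lines_without_ninja` and the sample text in the test are hypothetical.

```python
import re

# (?!\bninja\b). consumes one character only if "ninja" does not start there.
NO_NINJA_LINE = re.compile(r"^(?:(?!\bninja\b).)*$", re.MULTILINE | re.IGNORECASE)

def lines_without_ninja(text):
    """Return each complete line that does not contain the whole word 'ninja'."""
    return NO_NINJA_LINE.findall(text)
```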

,
, ( 2.16). , .
\bninja\b . ^ $ , , .

, , ,
. , ^ $
, , , .


,
, ninja.

. , , , . 3.21, .

.
5.10, , , .

5.12. Trim Leading and Trailing Whitespace

,
, , :
^\s+

: ( ^ $
)
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
\s+$

: ( ^ $
)
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, . , 3.14.
,
.


, . : , ( ^
$ , ), ,
( \s ), ,
( + ).

, trim() strip(), . . 5.2 ,



.
5.2.

Language(s)      Function
C#, VB.NET       String.Trim([chars])
Java             string.trim()
PHP              trim($string)
Python, Ruby     string.strip()

JavaScript Perl , :
Perl:
sub trim {
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

JavaScript:
function trim (string) {
    return string.replace(/^\s+/, '').replace(/\s+$/, '');
}
// Alternatively, make trim a method of all strings:
String.prototype.trim = function () {
    return this.replace(/^\s+/, '').replace(/\s+$/, '');
};
, Perl JavaScript, \s ,
, , , ,
.

, . ( ) , , .
, .
JavaScript, JavaScript , , , [\s\S] .
.

string.replace(/^\s+|\s+$/g, '');
, , . ( 2.8) /g (global ),
,
(
). , , .
string.replace(/^\s*([\s\S]*?)\s*$/, '$1')

( ) 1. 1 .
,
, .
,


, [\s\S] . , ( \s*$ ).
, - ,
.

string.replace(/^\s*([\s\S]*\S)?\s*$/, '$1')
, ,
, .
, \S . , -
,
, ? .

, .
[\s\S]* ,
.
, ,
\S
,
.
, , , . , .

string.replace(/^\s*(\S*(?:\s+\S+)*)\s*$/, '$1')
,
. , , . , ,
, , ,
, .
, , ,
.



, , , , . ,
,
,
.

.
5.13.

5.13. Replace Repeated Whitespace with a Single Space

,
. , .

,
.
3.14 ,
.


\s+

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


[ \t]+

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
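The difference between the two patterns shows up as soon as the subject contains line breaks. The Python sketch below assumes a single space as the replacement text; both function names are invented for the example.

```python
import re

def collapse_whitespace(text):
    """Replace every run of whitespace, line breaks included, with one space."""
    return re.sub(r"\s+", " ", text)

def collapse_horizontal(text):
    """Replace runs of spaces and tabs only, leaving line breaks untouched."""
    return re.sub(r"[ \t]+", " ", text)
```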


. -


, HTML
( ),
- .


( , , ) . + ,
( \s ), , , , . +
{2,} ,
.
, , ,
, . , .


, , , .
.

.
5.12.

5.14. Escape Regular Expression Metacharacters

,
. , , , .

,
-


, . , ,
JavaScript, ,
( . 5.3). ,
,
, .


. 5.3 ,
.
5.3. ,

C#, VB.NET     Regex.Escape(str)
Java           Pattern.quote(str)
Perl           quotemeta(str)
PHP            preg_quote(str, [delimiter])
Python         re.escape(str)
Ruby           Regexp.escape(str)

JavaScript
, .


, ,
, ( ).
,
.
3.15 , , . , :
[[\]{}()*+?.\\|^$\-,&#\s]


:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


. , , . 2.19.
\$&

: .NET, JavaScript
\\$&

: Perl
\\$0

: Java, PHP
\\\0

: PHP, Ruby
\\\&

: Ruby
\\\g<0>

: Python
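In Python the built-in `re.escape()` function makes the manual replacement unnecessary. The sketch below uses it to search for a string as literal text; the helper name `count_literal` and the sample strings in the test are assumptions for this example.

```python
import re

def count_literal(needle, haystack):
    """Count occurrences of needle in haystack, treating needle as literal text."""
    # re.escape() backslash-escapes the metacharacters in needle.
    return len(re.findall(re.escape(needle), haystack))
```

Without escaping, a needle such as `1+1` would be read as "one or more 1s followed by a 1" and would match `11` instead of the literal text.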

JavaScript
, , RegExp.escape() JavaScript:
RegExp.escape = function (str) {
    return str.replace(/[[\]{}()*+?.\\|^$\-,&#\s]/g, "\\$&");
};
// Test it...
var str = "Hello.World?";
var escaped_str = RegExp.escape(str);
alert(escaped_str == "Hello\\.World\\?"); // -> true



. , . , :
[] {} ()
, <[> <]>, . , <{> <}>, ,
, . , <(> <)>, , .
* + ?
,
,
, .
(
Perl 5.10 PCRE 7).
. \ |
,

, .
^ $
, . ,
.
, , .
.
. , , . ,

,
, .


,
, <{1,5}>. ,
, ( ) ,
, .
&
,
Java , , . , ,
.
#
(
<\s>) , . .

, ( $& ,
\& , $0 , \0 \g<0> ) . Perl
$& ,
.
$& -
Perl, ,
. $& $1.

2.1
. 50, , \Q...\E . Java, PCRE Perl, .
, \E , . , ,
.


.
2.1, ,
. , , , , ,
,
, .


. 56 , , , 5 6. , ,
\d ( 2.3). , .
, 56 , ,
, :-) , \p{P}{3} .


, ,
,
, , :
1 100?. . , ,
. ,
, ,
.

6.1. Integer Numbers

,
.


:
\b[0-9]+\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, :
\A[0-9]+\Z

:
: .NET, Java, PCRE, Perl, Python, Ruby
^[0-9]+$

:
: .NET, Java, JavaScript, PCRE, Perl, Python

:
(?<=^|\s)[0-9]+(?=$|\s)

:
: .NET, Java, PCRE, Perl, Python, Ruby 1.9

. :
(^|\s)([0-9]+)(?=$|\s)

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, :
[+-]?\b[0-9]+\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, ,
:
\A[+-]?[0-9]+\Z

:
: .NET, Java, PCRE, Perl, Python, Ruby

^[+-]?[0-9]+$

:
: .NET, Java, JavaScript, PCRE, Perl, Python
.
, :
([+-]*)?\b[0-9]+\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
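Two of the patterns above can be exercised with a short Python sketch: one validates a whole string, the other finds integers inside longer text. The function names `is_integer` and `find_integers` are hypothetical.

```python
import re

# \A and \Z anchor at the very start and end of the subject string.
INTEGER = re.compile(r"\A[+-]?[0-9]+\Z")

def is_integer(text):
    """True if text consists solely of an optionally signed integer."""
    return INTEGER.search(text) is not None

def find_integers(text):
    """Return every run of digits delimited by word boundaries."""
    return re.findall(r"\b[0-9]+\b", text)
```

The word boundaries explain why `$123,456.78` yields three separate matches while the `4` in `A4` is skipped.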


. ( 2.3) ( 2.12): [0-9]+ .


[0-9] \d.
.NET Perl \d
, [0-9] 10 ASCII. -

, , ASCII, \d [0-9] .

, , ASCII, ,

, , \d [0-9] . , , ,

, ,
ASCII. ,
, ,
.

,
. , A4 , .
, , .
, , , . -


\A \Z , . , JavaScript . JavaScript
^ $ ,
/m,
. Ruby , .

, ( 2.6). , , . ,
4 4 A4. 4\b
, 4 - . \b4 \b4\b A4, \b A 4.
, .

,
, ,
,
. +4
+4B, \b\+4\b \+4\b . +4, ,
. \b\+4\b +4
3+4, 3 , + .

\+4\b . \b \+\b4\b .

\b + 4 . \b , . \+?\b4\b 4 A4, \+?4\b
.

.
$123,456.78.
\b[0-9]+\b , 123, 456 78. ,
, .
, , .


, ,
. (?=$|\s)
( ). (?<=^|\s) . \s
, ,
. 2.16.

JavaScript Ruby 1.8


. ,
, .
, , .
, , , .
, .

.
2.3 2.12.

6.2. Hexadecimal Numbers


,
.

x :
\b[0-9A-F]+\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
\b[0-9A-Fa-f]+\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


, :
\A[0-9A-F]+\Z

:
: .NET, Java, PCRE, Perl, Python, Ruby
^[0-9A-F]+$

:
: .NET, Java, JavaScript, PCRE, Perl, Python
x 0x:
\b0x[0-9A-F]+\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
x &H:
&H[0-9A-F]+\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
x , H:
\b[0-9A-F]+H\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, 8- :
\b[0-9A-F]{2}\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, 16- :
\b[0-9A-F]{4}\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, 32-
:
\b[0-9A-F]{8}\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, 64- :

\b[0-9A-F]{16}\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
(
):
\b(?:[0-9A-F]{2})+\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
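As a small sketch of how these patterns might be used, the Python code below extracts `0x`-prefixed values from text and checks whether a string holds a whole number of bytes. Case-insensitive matching is assumed, and both function names are invented for the example.

```python
import re

HEX_0X = re.compile(r"\b0x[0-9A-F]+\b", re.IGNORECASE)
WHOLE_BYTES = re.compile(r"\b(?:[0-9A-F]{2})+\b", re.IGNORECASE)

def hex_values(text):
    """Return the integer value of every 0x-prefixed hexadecimal number in text."""
    return [int(match, 16) for match in HEX_0X.findall(text)]

def has_whole_bytes(text):
    """True if text contains a hexadecimal word with an even number of digits."""
    return WHOLE_BYTES.search(text) is not None
```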

,
. ,
, , A F. ,
. , ,
.
.
[0-9a-f]
, [0-9A-F] .
, [0-9a-fA-F] . , , 3.4.

. .

, , .
, A-F a-f .

(?:[0-9A-F]{2})+ . [0-9A-F]{2}
. (?:[0-9A-F]{2})+ .
( 2.9) ,
+ -


{2} . [0-9A-F]{2}+ Java, PCRE Perl 5.10, , . + {2} .


, {2} .

, , ,
. , ,
. , 10 9 11 F 11.
( 2.6).
. , , &H, . , .
, , .
, , , . \A \Z ,
. , JavaScript .
JavaScript ^ $ ,
/m,
. Ruby ,
.

.
2.3 2.12.

6.3. Binary Numbers


,
.


:
\b[01]+\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, :
\A[01]+\Z

:
: .NET, Java, PCRE, Perl, Python, Ruby
^[01]+$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, B:
\b[01]+B\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, 8- :
\b[01]{8}\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, 16- :
\b[01]{16}\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
( , 8):
\b(?:[01]{8})+\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

,
. ,
0 1. -


, :
[01] .

.
2.3 2.12.

6.4. Strip Leading Zeros

, , ,
.


\b0*([1-9][0-9]*|0)\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


$1

: .NET, Java, JavaScript, PHP, Perl


\1

: PHP, Python, Ruby

Perl
while ($subject =~ m/\b0*([1-9][0-9]*|0)\b/g) {
push(@list, $1);
}

PHP
$result = preg_replace('/\b0*([1-9][0-9]*|0)\b/', '$1', $subject);
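For comparison, the same search-and-replace can be sketched in Python, which uses `\1` in the replacement text. The function name `strip_leading_zeros` is an assumption for this example.

```python
import re

def strip_leading_zeros(text):
    """Remove leading zeros from every integer in text, keeping a lone zero."""
    # 0* consumes the leading zeros; the group keeps the digits worth preserving.
    return re.sub(r"\b0*([1-9][0-9]*|0)\b", r"\1", text)
```

The `|0` alternative is what keeps a number made up entirely of zeros from being deleted.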


. 0* , . [1-9][0-9]*
, , -


.
, . , , ,
6.1.
,
,
, 3.11.
( ) , 3.9. ,
Perl.

. .
( ) , . , PHP.

.
3.15 6.1.

6.5. Numbers Within a Certain Range

, . , , .

1 12 ( )
^(1[0-2]|[1-9])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
1 24 ():
^(2[0-4]|1[0-9]|[1-9])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


1 31 ( ):
^(3[01]|[12][0-9]|[1-9])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
1 53 ( ):
^(5[0-3]|[1-4][0-9]|[1-9])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
0 59 ( ):
^[1-5]?[0-9]$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
0 100 ():
^(100|[1-9]?[0-9])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
1 100:
^(100|[1-9][0-9]?)$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
32 126 ( ASCII):
^(12[0-6]|1[01][0-9]|[4-9][0-9]|3[2-9])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
0 127 ( ):
^(12[0-7]|1[01][0-9]|[1-9]?[0-9])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
128 127 ( ):
^(12[0-7]|1[01][0-9]|[1-9]?[0-9]|-(12[0-8]|1[01][0-9]|[1-9]?[0-9]))$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


0 255 ( ):
^(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
1 366 ( ):
^(36[0-6]|3[0-5][0-9]|[12][0-9]{2}|[1-9][0-9]?)$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
1900 2099 ():
^(19|20)[0-9]{2}$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
0 32767 ( ):
^(3276[0-7]|327[0-5][0-9]|32[0-6][0-9]{2}|3[01][0-9]{3}|[12][0-9]{4}|
[1-9][0-9]{1,3}|[0-9])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
32768 32767 ( ):
^(3276[0-7]|327[0-5][0-9]|32[0-6][0-9]{2}|3[01][0-9]{3}|[12][0-9]{4}|
[1-9][0-9]{1,3}|[0-9]|-(3276[0-8]|327[0-5][0-9]|32[0-6][0-9]{2}|
3[01][0-9]{3}|[12][0-9]{4}|[1-9][0-9]{1,3}|[0-9]))$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
0 65535 ( ):
^(6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}|[1-5][0-9]{4}|
[1-9][0-9]{1,3}|[0-9])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
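Range patterns like these are easy to get subtly wrong, so it can help to test them exhaustively. The Python sketch below checks two of the patterns above against every integer below 1000; the helper name `accepted` and its `limit` parameter are assumptions for this example.

```python
import re

BYTE = re.compile(r"^(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])$")
DAY_OF_YEAR = re.compile(r"^(36[0-6]|3[0-5][0-9]|[12][0-9]{2}|[1-9][0-9]?)$")

def accepted(pattern, limit=1000):
    """Return every integer below limit whose decimal form the pattern matches."""
    return [n for n in range(limit) if pattern.search(str(n))]
```

Comparing the result with `list(range(256))` or `list(range(1, 367))` confirms that a pattern accepts exactly the intended numbers.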

,
. .
.


, 0 255, . [0-255] . , , ,
, 0 255. , [0125] , , 0, 1, 2 5.


, , , ,
, . 6.1 ,
. . , , , .

.
,
,
. ( 2.3) ( 2.8).
, [0-5] . , 0 9
ASCII .
[0-5] , ,
[j-o] [\x09-\x0E]
.

,
. .
, 12 24, . ,
1 12, .
, ,
, . 40 59 . 44 55
.


,
, 40 59.
, .
,
.
[45][0-9]

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
40 59 . . 4 5. [45]
. 10 .
[0-9] .

[0-9] \d . [0-9] , . , Java,


.

44 55 , . 4 5.
4, 4 9,
44 49. 5, 0 5, 50 55.
:
4[4-9]|5[0-5]

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

, 4[4-9] 5[0-5] . , , (4[4-9])|(5[0-5]) .


, . 34 65
.
3 6. 3, -


4 9. 4 5,
. 6, 0 5:
3[4-9]|[45][0-9]|6[0-5]

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, , , ,
.
1 12 . 1 9, , 10 12, .
, :
1[0-2]|[1-9]

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

. , . ,
. ,
12, 1[0-2]|[1-9] 12, [1-9]|1[0-2] 1 . [1-9] .
1 ,
1[0-2] , .

85 117 ,
. 85 99 ,
100 117 . , . , : 8, 5 9.
9, . , : 1. 0, . 1, 0 7. : 85 89, 90 99, 100 109 110 117. ,




POSIX- . . , ,
, ,
POSIX. , [1-9]|1[0-2]
1 12.


. . ,
^([1-9]|1[0-2])$ ^(1[0-2]|[1-9])$ , , , 12 12,
POSIX- . ,
, .
,
. 20 1.

, , :
8[5-9]|9[0-9]|10[0-9]|11[0-7]

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, ,
. ,
, , .
,
. , , 0 65535,
:
6553[0-5]|655[0-2][0-9]|65[0-4][0-9][0-9]|6[0-4][0-9][0-9][0-9]|
[1-5][0-9][0-9][0-9][0-9]|[1-9][0-9][0-9][0-9]|[1-9][0-9][0-9]|
[1-9][0-9]|[0-9]


:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
,
,
. ,
(, , 6),
. , . ,
.

, .
, . 2.12.
6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}|[1-5][0-9]{4}|
[1-9][0-9]{3}|[1-9][0-9]{2}|[1-9][0-9]|[0-9]

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

[1-9][0-9]{3}|[1-9][0-9]{2}|[1-9][0-9]
. . [1-9][0-9]{1,3} .

6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}|[1-5][0-9]{4}|
[1-9][0-9]{1,3}|[0-9]

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
.
, 6 :
6(?:553[0-5]|55[0-2][0-9]|5[0-4][0-9]{2}|[0-4][0-9]{3})|[1-5][0-9]{4}|
[1-9][0-9]{1,3}|[0-9]

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

,
, 6 . -


, .
.

.
2.8, 4.12 6.1.

6.6. Hexadecimal Numbers Within a Certain Range

, . , , .

1 C ( 1 12: ):
^[1-9a-c]$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
1 18 ( 1 24: ):
^(1[0-8]|[1-9a-f])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
1 1F ( 1 31: ):
^(1[0-9a-f]|[1-9a-f])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
1 35 ( 1 53: ):
^(3[0-5]|[12][0-9a-f]|[1-9a-f])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
0 3B ( 0 59: ):
^(3[0-9a-b]|[12]?[0-9a-f])$


: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


0 64 ( 0 100: ):
^(6[0-4]|[1-5]?[0-9a-f])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
1 64 ( 1 100):
^(6[0-4]|[1-5][0-9a-f]|[1-9a-f])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
20 7E ( 32 126: ASCII):
^(7[0-9a-e]|[2-6][0-9a-f])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
0 7F ( 0 127: 7- ):
^[1-7]?[0-9a-f]$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
0 FF ( 0 255: 8- ):
^[1-9a-f]?[0-9a-f]$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
1 16E ( 1 366: ):
^(16[0-9a-e]|1[0-5][0-9a-f]|[1-9a-f][0-9a-f]?)$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
76C 833 ( 1900 2099: ):
^(83[0-3]|8[0-2][0-9a-f]|7[7-9a-f][0-9a-f]|76[c-f])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
0 7FFF: ( 0 32767: 15- ):
^([1-7][0-9a-f]{3}|[1-9a-f][0-9a-f]{1,2}|[0-9a-f])$


: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


0 FFFF: ( 0 65535: 16- ):
^([1-9a-f][0-9a-f]{1,3}|[0-9a-f])$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

. , ,
.
.
ASCII , ,
[0-F] , .
,
, ASCII
.
: [0-9A-F] .

.
. [0-9A-F] , [0-9a-f] . [0-9A-Fa-f] .


. .
, 3.4.

.
2.8 6.2.

6.7. Floating Point Numbers
, , , , -


. , , , 3.12.

: , , :
^[-+][0-9]+\.[0-9]+[eE][-+]?[0-9]+$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
: , , :
^[-+][0-9]+\.[0-9]+$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, , :
^[-+]?[0-9]+\.[0-9]+$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
: , , :
^[-+]?[0-9]*\.[0-9]+$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, . , .
, . .
^[-+]?([0-9]+(\.[0-9]+)?|\.[0-9]+)$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, . , .
, . .
^[-+]?([0-9]+(\.[0-9]*)?|\.[0-9]+)$


: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


, . , .
, . .
^[-+]?([0-9]+(\.[0-9]+)?|\.[0-9]+)([eE][-+]?[0-9]+)?$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, . , .
, . .
^[-+]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][-+]?[0-9]+)?$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
,
:
[-+]?(\b[0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][-+]?[0-9]+\b)?

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
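The most permissive anchored pattern above (optional sign, integer part, fraction, and exponent) can be tried out with a few sample strings. This Python sketch assumes the whole input must be a number; the function name `is_float` is hypothetical.

```python
import re

FLOAT = re.compile(r"^[-+]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][-+]?[0-9]+)?$")

def is_float(text):
    """True if text is a floating point number in the notation described above."""
    return FLOAT.search(text) is not None
```

The alternation inside the first group is what stops a lone dot or an empty string from being accepted, because every alternative requires at least one digit.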
( 2.5), ,
,
. ,
, 6.1.
, ,
:
. ( 2.3) ,
e. + ? ( 2.12) .

.
, ,
. + *
.

, , .
, , , -


. [-+]?[0-9]*\.?[0-9]* ,
, ,

.
123abc456, {$&} , {123}{}a{}b{}c{456}{}. 123 456, .

, , ,
, .
.
, ,
, . , , 123. , , , . , , .
, , , ( 2.8)
( 2.9). [0-9]+(\.[0-9]+)?
. \.[0-9]+ .

[0-9]+(\.[0-9]+)?|\.[0-9]+
. ,
, .
.
, , .

[0-9]+(\.[0-9]+)?|\.[0-9]+ , . , [0-9]+(\.[0-9]*)?|\.[0-9]+ .
- ? , . , . + ( ) *
( ).


, .
.
, . , , . , , . , , .

.
2.3, 2.8, 2.9 2.12.

6.8. Numbers with Thousand Separators

, , .

:
^[0-9]{1,3}(,[0-9]{3})*\.[0-9]+$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
.
, .
^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
.
, .
^([0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?|\.[0-9]+)$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


,
:
\b[0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?\b|\.[0-9]+\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
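Once a string has passed validation, the separators can be dropped and the number converted. The Python sketch below uses the anchored pattern with an optional fraction; the function name `parse_grouped` and the choice to return `None` for invalid input are assumptions for this example.

```python
import re

THOUSANDS = re.compile(r"^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?$")

def parse_grouped(text):
    """Convert a number with comma thousand separators to a float, or return None."""
    if not THOUSANDS.search(text):
        return None
    return float(text.replace(",", ""))
```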

, ,
, .
,
[0-9]+ [0-9]{1,3}(,[0-9]{3})* .
1 3 ,
, .

[0-9]{0,3}(,[0-9]{3})* ,
,
, , ,123. , . , , , . .
, , .
, ,
, .

.
2.3, 2.9 2.12.

6.9. Roman Numerals

,
IV, XIII MVIII.

:
^[MDCLXVI]+$


:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, :
^(?=[MDCLXVI])M*(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, :
^(?=[MDCLXVI])M*(C[MD]|D?C*)(X[CL]|L?X*)(I[XV]|V?I*)$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
:
^(?=[MDCLXVI])M*D?C{0,4}L?X{0,4}V?I{0,4}$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

M, D, C, L, X, V I,
1000, 500, 100, 50, 10, 5 1 .
, ,
, .
( )
. . , 4
IV, IIII.
, .

. 1999
MCMXCIX, M 1000, CM 900, XC 90 IX 9.
: M , M* .

10 , .
C[MD] CM CD,
900 400. D?C{0,3} DCCC, DCC,
DC, D, CCC, CC, C , 800, 700, 600, 500,
300, 200, 100 . 10 .


X[CL]|L?X{0,3} ,
I[XV]|V?I{0,3} .
, .

, .
, . . , , ,
. .
(?=[MDCLXVI]) . ,
2.16,
. ,
.

.
, IIII, IV.
, , - .
4 IIII, IV.
.
( 2.5), ,
, .
,
^ $ \b .


Perl
, , . [MDLV]|C[MD]?|X[CL]?|I[XV]? ,
:

sub roman2decimal {
    my $roman = shift;
    if ($roman =~
        m/^(?=[MDCLXVI])
          (M*)              # 1000s
          (C[MD]|D?C{0,3})  # 100s
          (X[CL]|L?X{0,3})  # 10s
          (I[XV]|V?I{0,3})  # 1s
          $/ix)
    {
        # Roman numeral found
        my %r2d = ('I' => 1, 'IV' => 4, 'V' => 5, 'IX' => 9,
                   'X' => 10, 'XL' => 40, 'L' => 50, 'XC' => 90,
                   'C' => 100, 'CD' => 400, 'D' => 500, 'CM' => 900,
                   'M' => 1000);
        my $decimal = 0;
        while ($roman =~ m/[MDLV]|C[MD]?|X[CL]?|I[XV]?/ig) {
            $decimal += $r2d{uc($&)};
        }
        return $decimal;
    } else {
        # Not a Roman numeral
        return 0;
    }
}
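The same two-step approach (validate with the strict pattern, then sum the value of each token) can be sketched in Python. This is an illustrative port rather than code from the recipe; the names `roman_to_decimal`, `ROMAN`, `TOKEN`, and `VALUES` are made up for the example.

```python
import re

# Strict pattern from this recipe: validates the numeral as a whole.
ROMAN = re.compile(
    r"^(?=[MDCLXVI])M*(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$",
    re.IGNORECASE,
)
# Splits a valid numeral into tokens, each with a fixed value.
TOKEN = re.compile(r"[MDLV]|C[MD]?|X[CL]?|I[XV]?", re.IGNORECASE)
VALUES = {
    "I": 1, "IV": 4, "V": 5, "IX": 9,
    "X": 10, "XL": 40, "L": 50, "XC": 90,
    "C": 100, "CD": 400, "D": 500, "CM": 900,
    "M": 1000,
}

def roman_to_decimal(roman):
    """Return the value of a Roman numeral, or 0 if the input is not valid."""
    if not ROMAN.search(roman):
        return 0
    return sum(VALUES[token.upper()] for token in TOKEN.findall(roman))
```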

.
2.3, 2.8, 2.9, 2.12, 2.16, 3.9 3.11.

URL,
, ,
, , ,
:
URL, URN

IP-

Windows
URL ,
, (World Wide
Web). , ,
.

7.1. Validating URLs

, URL, .

URL:
^(https?|ftp|file)://.+$

:
: .NET, Java, JavaScript, PCRE, Perl, Python

\A(https?|ftp|file)://.+\Z

:
: .NET, Java, PCRE, Perl, Python, Ruby

:
\A                         # Anchor
(https?|ftp)://            # Scheme
[a-z0-9-]+(\.[a-z0-9-]+)+  # Domain
([/?].*)?                  # Path and/or parameters
\Z                         # Anchor

: ,

: .NET, Java, PCRE, Perl, Python, Ruby
^(https?|ftp)://[a-z0-9-]+(\.[a-z0-9-]+)+
([/?].+)?$

:
: .NET, Java, JavaScript, PCRE, Perl, Python

. (http ftp),
(www ftp):
\A                             # Anchor
((https?|ftp)://|(www|ftp)\.)  # Scheme or subdomain
[a-z0-9-]+(\.[a-z0-9-]+)+      # Domain
([/?].*)?                      # Path and/or parameters
\Z                             # Anchor

: ,

: .NET, Java, PCRE, Perl, Python, Ruby
^((https?|ftp)://|(www|ftp)\.)[a-z0-9-]+(\.[a-z0-9-]+)+([/?].*)?$

:
: .NET, Java, JavaScript, PCRE, Perl, Python
, .
, :
\A                         # Anchor
(https?|ftp)://            # Scheme
[a-z0-9-]+(\.[a-z0-9-]+)+  # Domain
(/[\w-]+)*                 # Path
/[\w-]+\.(gif|png|jpg)     # File
\Z                         # Anchor

: ,


: .NET, Java, PCRE, Perl, Python, Ruby


^(https?|ftp)://[a-z0-9-]+(\.[a-z0-9-]+)+(/[\w-]+)*/[\w-]+\.(gif|png|jpg)$

:
: .NET, Java, JavaScript, PCRE, Perl, Python
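To see what the domain-checking pattern accepts and rejects, a few sample strings can be run through a short Python sketch. Case-insensitive matching is assumed, and the function name `looks_like_url` is hypothetical.

```python
import re

# Requires a scheme and a domain name with at least one dot.
URL = re.compile(
    r"^(https?|ftp)://[a-z0-9-]+(\.[a-z0-9-]+)+([/?].*)?$",
    re.IGNORECASE,
)

def looks_like_url(text):
    """True if text has an http, https, or ftp scheme followed by a dotted domain."""
    return URL.search(text) is not None
```

A pattern like this only checks the shape of the string; it says nothing about whether the address actually exists.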

, URL - URL. ,
URL
.
URL , , .
URL , .
URL, -.
URL :
scheme://user:password@domain.name:80/path/file.ext?param=value&param2
=value2#fragment

. URL
file: . URL http:
.
, URL , -: http, https, ftp file. ^ ( 2.5). ( 2.8) . https? http|https .


, http file, , . .+$ , , , .

( 2.4) , . Ruby . Ruby , \A \Z ( 2.5). ,


, Ruby. -


, URL,
.
( 2.18) . , , ,
. JavaScript
.
URL
FTP , HTTP FTP , .
ASCII.
(IDN) .
, . ( 2.3), .
, . ( , , Perl .)

. .* , . ,
, [/?].* ,
? ( 2.12).

, ,
URL. URL
.
- URL, , , . , www.regexbuddy.com
http://www.regexbuddy.com. URL,
, www. ftp..

(https?|ftp)://|(www|ftp)\. .
, . https? ftp ,
:// . www ftp , . -


,
.
URL ,
, ASCII,
GIF, PNG JPEG.
, . , , \w ( 2.3).

? , . .
. , .
404, . , URL.

.
2.3, 2.8, 2.9 2.12.

7.2. Finding URLs Within Full Text

URL . URL
, ,
.

URL :
\b(https?|ftp|file)://\S+

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
URL :
\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|$!:,.;]*
[A-Z0-9+&@#/%=~_|$]

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
URL . , www ftp:

\b((https?|ftp|file)://|(www|ftp)\.)[-A-Z0-9+&@#/%?=~_|$!:,.;]*
[A-Z0-9+&@#/%=~_|$]

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
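The effect of the final character class, which excludes trailing punctuation, can be seen in a short Python sketch. It uses the second pattern above with case-insensitive matching; the function name `find_urls` is an assumption for this example.

```python
import re

# The last class omits punctuation, so a URL cannot end in a comma or period.
URL_IN_TEXT = re.compile(
    r"\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|$!:,.;]*[A-Z0-9+&@#/%=~_|$]",
    re.IGNORECASE,
)

def find_urls(text):
    """Return every URL found in text, without trailing punctuation."""
    return [match.group(0) for match in URL_IN_TEXT.finditer(text)]
```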

:
Visit http://www.somesite.com/page, where you will find more information.

URL?
http://www.somesite.com/page, : URL. , (%20). . - .
, , URL, :
http://www.somesite.com/page, where you will find more information.

, , ,
URL, , URL . \S , , . , , S , \S , \s .
.
2.3.

.
, URL . , URL, , .

\S
.
, .
,
URL. (-


2.12), URL . ,
, URL ,
. , .
3.4.

URL, , . , ,
URL.
- URL, , ,
. , www.regexbuddy.com http://www.regexbuddy.com. URL, ,
www. ftp..

(https?|ftp)://|(www|ftp)\.
. , , , . https? ftp , :// . www ftp ,
. ,
, .

.
2.3 2.6.

7.3. Finding Quoted URLs in Full Text

URL . URL
,
, URL. URL ,
,
URL.


\b(?:(?:https?|ftp|file)://|(www|ftp)\.)[-A-Z0-9+&@#/%?=~_|$!:,.;]*
[-A-Z0-9+&@#/%=~_|$]
|(?:(?:https?|ftp|file)://|(www|ftp)\.)[^\r\n]+
|(?:(?:https?|ftp|file)://|(www|ftp)\.)[^\r\n]+

: ,
, ,

: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

URL
, URL.

, ,
.
,
, URL . , , URL
. URL , : https?|ftp|file www|ftp .
, , URL.

. , URL URL. URL . URL .


, ,
, , URL.
,
URL HTML XHTML. , -, , URL.

.
2.8 2.9.


7.4. Finding URLs with Parentheses in Full Text

URL . URL , , URL. URL, ,


, URL.

\b(?:(?:https?|ftp|file)://|www\.|ftp\.)
(?:\([-A-Z0-9+&@#/%=~_|$?!:,.]*\)|[-A-Z0-9+&@#/%=~_|$?!:,.])*
(?:\([-A-Z0-9+&@#/%=~_|$?!:,.]*\)|[A-Z0-9+&@#/%=~_|$])

: ,

: .NET, Java, PCRE, Perl, Python, Ruby
\b(?:(?:https?|ftp|file)://|www\.|ftp\.)(?:\([-A-Z0-9+&@#/%=~_|$?!:,.]*\)
|[-A-Z0-9+&@#/%=~_|$?!:,.])*(?:\([-A-Z0-9+&@#/%=~_|$?!:,.]*\)|
[A-Z0-9+&@#/%=~_|$])

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
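The following Python sketch applies the single-line pattern above to sample text, assuming case-insensitive matching. It shows that a balanced pair of parentheses inside the URL is kept, while a closing parenthesis that belongs to the surrounding sentence is left out. The function name `find_urls_with_parens` is hypothetical.

```python
import re

URL_WITH_PARENS = re.compile(
    r"\b(?:(?:https?|ftp|file)://|www\.|ftp\.)"
    r"(?:\([-A-Z0-9+&@#/%=~_|$?!:,.]*\)|[-A-Z0-9+&@#/%=~_|$?!:,.])*"
    r"(?:\([-A-Z0-9+&@#/%=~_|$?!:,.]*\)|[A-Z0-9+&@#/%=~_|$])",
    re.IGNORECASE,
)

def find_urls_with_parens(text):
    """Return URLs from text, keeping only parentheses that are paired inside them."""
    return URL_WITH_PARENS.findall(text)
```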

URL ,
.
, , .
- :
http://en.wikipedia.org/wiki/PC_Tools_(Central_Point_Software)
http://msdn.microsoft.com/en-us/library/aa752574(VS.85).aspx

, URL . , URL. , , URL, , URL, :


RegexBuddys web site (at http://www.regexbuddy.com) is really cool.

, , , -


.
,
URL, . URL
Microsoft .
, ,
. , .
, ,
7.2. : , URL, , URL
, URL, (
). 7.2 ,
URL .
. :
[-A-Z0-9+&@#/%=~_|$?!:,.]


\([-A-Z0-9+&@#/%=~_|$?!:,.]*\)|[-A-Z0-9+&@#/%=~_|$?!:,.]

:
[A-Z0-9+&@#/%=~_|$]


\([-A-Z0-9+&@#/%=~_|$?!:,.]*\)|[A-Z0-9+&@#/%=~_|$]

( 2.8).

,
( 2.9).

\([-A-Z0-9+&@#/%=~_|$?!:,.]*\) , .
, URL,
.

, URL
, .
, URL, ,


URL URL,
, , , .
, URL, .
URL. , . , .
, ,
(ab*c|d)* , a c ,
b d .
(ab*c|d*)* .
,
d , * d . d ,
. (d*)* dddd . , , .
,
2-1-1, 1-2-1 1-1-2.
, 2-2, 1-3 3-1. ,
.
, 2.15.
, , , - ,
URL, -
, .

.
2.8 2.9.

7.5. Turn URLs into Links

, URL. , HTML- a , URL. URL ,


, .


URL ,
7.2 7.4. :
<a href="$&">$&</a>

: .NET, JavaScript, Perl


<a href="$0">$0</a>

: .NET, Java, PHP


<a href="\0">\0</a>

: PHP, Ruby
<a href="\&">\&</a>

: Ruby
<a href="\g<0>">\g<0></a>

: Python
, 3.15.
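Putting the pieces together, a minimal Python sketch of this recipe might look as follows. It pairs the URL pattern from Recipe 7.2 with the Python replacement text, where `\g<0>` stands for the whole match. The function name `linkify` is hypothetical, and the sketch assumes the input is plain text that needs no HTML escaping.

```python
import re

URL_IN_TEXT = re.compile(
    r"\b(?:https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|$!:,.;]*[A-Z0-9+&@#/%=~_|$]",
    re.IGNORECASE,
)

def linkify(text):
    """Wrap every URL in text in an HTML anchor that points to that URL."""
    return URL_IN_TEXT.sub(r'<a href="\g<0>">\g<0></a>', text)
```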

. , URL,

ahref=URL URL /a , URL URL. , . ,
, . 2.20.

.
2.21, 3.15, 7.2 7.4.

7.6. Validating URNs

, (Uniform Resource Name, URN),


RFC 2141, URN .


, :
\Aurn:
# Namespace Identifier
[a-z0-9][a-z0-9-]{0,31}:
# Namespace Specific String
[a-z0-9()+,\-.:=@;$_!*'%/?#]+
\Z

: ,

: .NET, Java, PCRE, Perl, Python, Ruby
^urn:[a-z0-9][a-z0-9-]{0,31}:[a-z0-9()+,\-.:=@;$_!*'%/?#]+$

:
: .NET, Java, JavaScript, PCRE, Perl, Python
URN :
\burn:
# Namespace Identifier
[a-z0-9][a-z0-9-]{0,31}:
# Namespace Specific String
[a-z0-9()+,\-.:=@;$_!*'%/?#]+

: ,

: .NET, Java, PCRE, Perl, Python, Ruby
\burn:[a-z0-9][a-z0-9-]{0,31}:[a-z0-9()+,\-.:=@;$_!*'%/?#]+

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
URN , ,
URN (), URN:
\burn:
# Namespace Identifier
[a-z0-9][a-z0-9-]{0,31}:
# Namespace Specific String
[a-z0-9()+,\-.:=@;$_!*'%/?#]*[a-z0-9+=@$/]

: ,

: .NET, Java, PCRE, Perl, Python, Ruby
\burn:[a-z0-9][a-z0-9-]{0,31}:[a-z0-9()+,\-.:=@;$_!*'%/?#]*[a-z0-9+=@$/]


:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
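A Python sketch of the validating and the searching pattern is shown below. The character class for the namespace-specific string includes the apostrophe, which RFC 2141 permits there. Case-insensitive matching is assumed, and the function names `is_urn` and `find_urns` are invented for the example.

```python
import re

# NID: 1 to 32 characters, not starting with a hyphen. NSS: at least one character.
URN = re.compile(
    r"^urn:[a-z0-9][a-z0-9-]{0,31}:[a-z0-9()+,\-.:=@;$_!*'%/?#]+$",
    re.IGNORECASE,
)
# For full-text search the URN must not end with punctuation.
URN_IN_TEXT = re.compile(
    r"\burn:[a-z0-9][a-z0-9-]{0,31}:[a-z0-9()+,\-.:=@;$_!*'%/?#]*[a-z0-9+=@$/]",
    re.IGNORECASE,
)

def is_urn(text):
    """True if text as a whole has the form of a URN."""
    return URN.search(text) is not None

def find_urns(text):
    """Return every URN found in text, without trailing punctuation."""
    return URN_IN_TEXT.findall(text)
```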

URN .
urn:, .
(Namespace
Identifier, NID). 1 32 . . ,
. ( 2.3): , 0 31 , . URN ,
.
URN c
(Namespace Specific String, NSS). .
. , , ( 2.12).
,
URN, ,
. ^ $
, Ruby, \A \Z , JavaScript. 2.5.

, URN . , URL 7.2,


URN. , :
The URN is urn:nid:nss, isnt it?

,
URN. URN, , , , , , , URN. , , RFC
2141. URN , NSS,
,
, URN.


, ( ) ( )
, . , , , NSS , , ,
, .

.
2.3 2.12.

7.7. Validating Generic URLs

, URL RFC 3986.

\A
(#
[a-z][a-z0-9+\-.]*:
(#
//
([a-z0-9\-._~%!$&()*+,;=]+@)?
([a-z0-9\-._~%]+
|\[[a-f0-9:.]+\]
|\[v[a-f0-9][a-z0-9\-._~%!$&()*+,;=:]+\])

#
#
#
#
#
#
#


IP- IPv6
IP-
IPv...

(:[0-9]+)?
(/[a-z0-9\-._~%!$&()*+,;=:@]+)*/?
|#
(/?[a-z0-9\-._~%!$&()*+,;=:@]+(/[a-z0-9\-._~%!$&()*+,;=:@]+)*/?)?
)
|# URL ( )
(#
[a-z0-9\-._~%!$&()*+,;=@]+(/[a-z0-9\-._~%!$&()*+,;=:@]+)*/?
|#
(/[a-z0-9\-._~%!$&()*+,;=:@]+)+/?
)
)
#
(\?[a-z0-9\-._~%!$&()*+,;=:@/?]*)?
#
(\#[a-z0-9\-._~%!$&()*+,;=:@/?]*)?
\Z


: ,

: .NET, Java, PCRE, Perl, Python, Ruby
\A
(#
(?<scheme>[a-z][a-z0-9+\-.]*):
(#
//
(?<user>[a-z0-9\-._~%!$&()*+,;=]+@)?
(?<host>[a-z0-9\-._~%]+
|
\[[a-f0-9:.]+\]
|
\[v[a-f0-9][a-z0-9\-._~%!$&()*+,;=:]+\])

#
#
#
#
#
#
#
#


IP- IPv6
IP-

IPv...

(?<port>:[0-9]+)?
(?<path>(/[a-z0-9\-._~%!$&()*+,;=:@]+)*/?)
|#
(?<path>/?[a-z0-9\-._~%!$&()*+,;=:@]+
(/[a-z0-9\-._~%!$&()*+,;=:@]+)*/?)?
)
|# URL ( )
(?<path>
#
[a-z0-9\-._~%!$&()*+,;=@]+(/[a-z0-9\-._~%!$&()*+,;=:@]+)*/?
|#
(/[a-z0-9\-._~%!$&()*+,;=:@]+)+/?
)
)
#
(?<query>\?[a-z0-9\-._~%!$&()*+,;=:@/?]*)?
#
(?<fragment>\#[a-z0-9\-._~%!$&()*+,;=:@/?]*)?
\Z

: ,

: .NET
\A
(#
(?<scheme>[a-z][a-z0-9+\-.]*):
(#
//
(?<user>[a-z0-9\-._~%!$&()*+,;=]+@)?
(?<host>[a-z0-9\-._~%]+
|
\[[a-f0-9:.]+\]
|
\[v[a-f0-9][a-z0-9\-._~%!$&()*+,;=:]+\])

(?<port>:[0-9]+)?
(?<hostpath>(/[a-z0-9\-._~%!$&()*+,;=:@]+)*/?)

#
#
#
#
#
#
#
#


IP- IPv6
IP-

IPv...

|#
(?<schemepath>/?[a-z0-9\-._~%!$&()*+,;=:@]+
(/[a-z0-9\-._~%!$&()*+,;=:@]+)*/?)?
)
|# URL ( )
(?<relpath>
#
[a-z0-9\-._~%!$&()*+,;=@]+(/[a-z0-9\-._~%!$&()*+,;=:@]+)*/?
|#
(/[a-z0-9\-._~%!$&()*+,;=:@]+)+/?
)
)
#
(?<query>\?[a-z0-9\-._~%!$&()*+,;=:@/?]*)?
#
(?<fragment>\#[a-z0-9\-._~%!$&()*+,;=:@/?]*)?
\Z

: ,

: .NET, PCRE 7, Perl 5.10, Ruby 1.9
\A
(#
(?P<scheme>[a-z][a-z0-9+\-.]*):
(#
//
(?P<user>[a-z0-9\-._~%!$&()*+,;=]+@)?
(?P<host>[a-z0-9\-._~%]+
|
\[[a-f0-9:.]+\]
|
\[v[a-f0-9][a-z0-9\-._~%!$&()*+,;=:]+\])

#
#
#
#
#
#
#
#


IP- IPv6
IP-

IPv...

(?P<port>:[0-9]+)?
(?P<hostpath>(/[a-z0-9\-._~%!$&()*+,;=:@]+)*/?)
|#
(?P<schemepath>/?[a-z0-9\-._~%!$&()*+,;=:@]+
(/[a-z0-9\-._~%!$&()*+,;=:@]+)*/?)?
)
|# URL ( )
(?P<relpath>
#
[a-z0-9\-._~%!$&()*+,;=@]+(/[a-z0-9\-._~%!$&()*+,;=:@]+)*/?
|#
(/[a-z0-9\-._~%!$&()*+,;=:@]+)+/?
)
)
#
(?P<query>\?[a-z0-9\-._~%!$&()*+,;=:@/?]*)?
#
(?P<fragment>\#[a-z0-9\-._~%!$&()*+,;=:@/?]*)?
\Z


: ,

: PCRE 4 , Perl 5.10, Python
^([a-z][a-z0-9+\-.]*:(\/\/([a-z0-9\-._~%!$&()*+,;=]+@)?([a-z0-9\-._~%]+|
\[[a-f0-9:.]+\]|\[v[a-f0-9][a-z0-9\-._~%!$&()*+,;=:]+\])(:[0-9]+)?
(\/[a-z0-9\-._~%!$&()*+,;=:@]+)*\/?|(\/?[a-z0-9\-._~%!$&()*+,;=:@]+
(\/[a-z0-9\-._~%!$&()*+,;=:@]+)*\/?)?)|([a-z0-9\-._~%!$&()*+,;=@]+
(\/[a-z0-9\-._~%!$&()*+,;=:@]+)*\/?|(\/[a-z0-9\-._~%!$&()*+,;=:@]+)
+\/?))
(\?[a-z0-9\-._~%!$&()*+,;=:@\/?]*)?(#[a-z0-9\-._~%!$&()*+,;=:@\/?]*)?$

:
: .NET, Java, JavaScript, PCRE, Perl, Python

,
URL, , ,
URL. ,
, URL , URL.

URL.
URL ,
, , , URL, URL .
URL,
. , , .
RFC 3986 , URL.
URL, URL URL ,
. RFC 3986 , , , .
. URL , . , URL.
RFC 3986 URL, . , - - URL , RFC 3986 , %20.


URL ,
http: ftp:. . , .
: [a-z][a-z0-9+\-.]* .

URL , RFC 3986 (authority).


IP- , .
, .
IP- @.
[a-z0-9\-._~%!$&()*+,;=]+@ .

RFC 3986 , . 7.15 , : , , . , RFC 3986


.
UTF-8,
, , , , %FF, FF .
, , . ,
URL. , [a-z0-9\-._~%]+ , IPv4 ( RFC 3986).

IPv4,
IPv6, , IP-, . IPv6
\[[a-f0-9:.]+\] , ,
, \[v[a-f0-9][a-z0-9\-._~%!$&()*+,;=:]+\] . IP- , , IPv6.
,
URL. 7.17 , IPv6 .

, , , . :[0-9]+ ,
.


, , . , ,
. , . . .
C (/[a-z0-9\-._~%!$&()*+,;=:@]+)*/? .

URL , , .
, . , , : /?[a-z0-9\-._~%!$&()*+,;=:@]+(/[a-z0-9\-._~%!$&()*+,;=:@]+)*/? .

URL , , , .
, . URL , .
,
. ,
URL, . [a-z0-9\-._~%!$&()*+,;=@]+(/[a-z0-9\-._~%!$&()*+,;=:@]+)*/? .
, , . , , , .
(/[a-z0-9\-._~%!$&()*+,;=:@]+)+/? .
, URL, , , , , .
.

. , . . ,
URL,
\?[a-z0-9\-._~%!$&()*+,;=:@/?]* . .
. , .


URL ,
.
URL. \#[a-z0-9\-._~%!$&()*+,;=:@/?]* .

URL, . 2.11 ,
, . .NET
,
, .
,
URL , / .
, path, ,
URL / .
, ,
. , , . , ,
. .

.
2.3, 2.8, 2.9 2.12.

7.8. Extracting the Scheme from a URL

, URL, . , http http://www.regexcookbook.com.

URL
^([a-z][a-z0-9+\-.]*):

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
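Extracting the scheme then comes down to reading the first capturing group, as this Python sketch shows. Case-insensitive matching is assumed, and the function name `extract_scheme` is hypothetical.

```python
import re

SCHEME = re.compile(r"^([a-z][a-z0-9+\-.]*):", re.IGNORECASE)

def extract_scheme(url):
    """Return the scheme of url, or None if url does not start with one."""
    match = SCHEME.search(url)
    return match.group(1) if match else None
```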


URL
\A
([a-z][a-z0-9+\-.]*):
(#
//
([a-z0-9\-._~%!$&()*+,;=]+@)?
([a-z0-9\-._~%]+
|\[[a-f0-9:.]+\]
|\[v[a-f0-9][a-z0-9\-._~%!$&()*+,;=:]+\])

#
#
#
#
#
#
#


IP- IPv6
IP-
IPv...

(:[0-9]+)?
(/[a-z0-9\-._~%!$&()*+,;=:@]+)*/?
|#
(/?[a-z0-9\-._~%!$&()*+,;=:@]+(/[a-z0-9\-._~%!$&()*+,;=:@]+)*/?)?
)
#
(\?[a-z0-9\-._~%!$&()*+,;=:@/?]*)?
#
(\#[a-z0-9\-._~%!$&()*+,;=:@/?]*)?
\Z

: ,

: .NET, Java, PCRE, Perl, Python, Ruby
^([a-z][a-z0-9+\-.]*):(//([a-z0-9\-._~%!$&()*+,;=]+@)?([a-z0-9\-._~%]+|
\[[a-f0-9:.]+\]|\[v[a-f0-9][a-z0-9\-._~%!$&()*+,;=:]+\])(:[0-9]+)?
(/[a-z0-9\-._~%!$&()*+,;=:@]+)*/?|(/?[a-z0-9\-._~%!$&()*+,;=:@]+
(/[a-z0-9\-._~%!$&()*+,;=:@]+)*/?)?)(\?[a-z0-9\-._~%!$&()*+,;=:@/?]*)?
(#[a-z0-9\-._~%!$&()*+,;=:@/?]*)?$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

URL , , URL. URL URL. ( 2.5).


, , , , .
[a-z][a-z0-9+\-.]* ( 2.3).

URL . ,
, URL . URL . URL, -


RFC 3986, URL, , .


, , 7.7. ,
URL, .
( ), . , ( ) . 2.9. 3.9 , ,
, .
,
URL,
7.7. , URL,
. .
URL,
,
, . , ,
URL.

.
2.9, 3.9 7.7.

7.9. Extracting the User from a URL

, URL.
, jan ftp://jan@www.regexcookbook.com.


URL
^[a-z0-9+\-.]+://([a-z0-9\-._~%!$&()*+,;=]+)@

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


URL
\A
[a-z][a-z0-9+\-.]*://                      # Scheme
([a-z0-9\-._~%!$&()*+,;=]+)@               # User
([a-z0-9\-._~%]+                           # Named or IPv4 host
|\[[a-f0-9:.]+\]                           # IPv6 host
|\[v[a-f0-9][a-z0-9\-._~%!$&()*+,;=:]+\])  # IPvFuture host
(:[0-9]+)?                                 # Port
(/[a-z0-9\-._~%!$&()*+,;=:@]+)*/?          # Path
(\?[a-z0-9\-._~%!$&()*+,;=:@/?]*)?         # Query
(\#[a-z0-9\-._~%!$&()*+,;=:@/?]*)?         # Fragment
\Z

: ,

: .NET, Java, PCRE, Perl, Python, Ruby
^[a-z][a-z0-9+\-.]*://([a-z0-9\-._~%!$&()*+,;=]+)@([a-z0-9\-._~%]+|
\[[a-f0-9:.]+\]|\[v[a-f0-9][a-z0-9\-._~%!$&()*+,;=:]+\])(:[0-9]+)?
(/[a-z0-9\-._~%!$&()*+,;=:@]+)*/?(\?[a-z0-9\-._~%!$&()*+,;=:@/?]*)?
(#[a-z0-9\-._~%!$&()*+,;=:@/?]*)?$

:
: .NET, Java, JavaScript, PCRE, Perl, Python

URL , ,
URL. , URL,
, URL .
@. @ , , URL,
@ URL.
, .
,
, , URL. [a-z0-9+\-.]+ ://.
, .
@, , . [a-z0-9\-._~%!$&()*+,;=] , .


,
URL .
URL, .
.
, -
URL ( ) . 2.9. 3.9 , , , .
,
URL, 7.7.
, URL, . , , URL, , .
. ,
, 7.8
URL,
,
, . , , URL.
, URL, , , 7.7. 7.7
( ) .
@.
@,
.

.
2.9, 3.9 7.7.

7.10. Extracting the Host from a URL

, URL. , www.regexcookbook.com http://www.regexcookbook.com.



URL
\A
[a-z][a-z0-9+\-.]*://               # Scheme
([a-z0-9\-._~%!$&()*+,;=]+@)?       # User
([a-z0-9\-._~%]+                    # Named or IPv4 host
|\[[a-z0-9\-._~%!$&()*+,;=:]+\])    # IPv6+ host

: ,

: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
^[a-z][a-z0-9+\-.]*://([a-z0-9\-._~%!$&()*+,;=]+@)?([a-z0-9\-._~%]+|
\[[a-z0-9\-._~%!$&()*+,;=:]+\])

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

URL
\A
[a-z][a-z0-9+\-.]*://                       # Scheme
([a-z0-9\-._~%!$&()*+,;=]+@)?               # User
([a-z0-9\-._~%]+                            # Named host
|\[[a-f0-9:.]+\]                            # IPv6 host
|\[v[a-f0-9][a-z0-9\-._~%!$&()*+,;=:]+\])   # IPvFuture host
(:[0-9]+)?                                  # Port
(/[a-z0-9\-._~%!$&()*+,;=:@]+)*/?           # Path
(\?[a-z0-9\-._~%!$&()*+,;=:@/?]*)?          # Query
(\#[a-z0-9\-._~%!$&()*+,;=:@/?]*)?          # Fragment
\Z

: ,

: .NET, Java, PCRE, Perl, Python, Ruby
^[a-z][a-z0-9+\-.]*://([a-z0-9\-._~%!$&()*+,;=]+@)?([a-z0-9\-._~%]+|
\[[a-f0-9:.]+\]|\[v[a-f0-9][a-z0-9\-._~%!$&()*+,;=:]+\])(:[0-9]+)?
(/[a-z0-9\-._~%!$&()*+,;=:@]+)*/?(\?[a-z0-9\-._~%!$&()*+,;=:@/?]*)?
(#[a-z0-9\-._~%!$&()*+,;=:@/?]*)?$

:
: .NET, Java, JavaScript, PCRE, Perl, Python

URL , , URL.


\A ^ , . [a-z][az0-9+\-.]*:// , ([a-z0-9\-._~%!$&()*+,;=]+@)?
. .

RFC 3986 . IPv4 ,


IPv6 IP- . , , .
,
, IPv4 . ( )
.

[a-z0-9\-._~%]+ IPv4. \[[a-z0-9\-._~%!$&()*+,;=:]+\]


IPv6 . ,
( 2.8), . , .

,
URL . ,
. , - URL . IPv6
. 2.9. 3.9 , ,
, .
,
URL,
7.7. , URL, . . , 7.9.
, , 7.7.
, , . , , URL.
,
URL, , ,


7.7. 7.7
( ) .

.
2.9, 3.9 7.7.

7.11. Extracting the Port from a URL

, URL. , 80 http://www.regexcookbook.com:80/.


URL
\A
[a-z][a-z0-9+\-.]*://                # Scheme
([a-z0-9\-._~%!$&()*+,;=]+@)?        # User
([a-z0-9\-._~%]+                     # Named or IPv4 host
|\[[a-z0-9\-._~%!$&()*+,;=:]+\])     # IPv6+ host
:(?<port>[0-9]+)                     # Port number

: ,

: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
^[a-z][a-z0-9+\-.]*://([a-z0-9\-._~%!$&()*+,;=]+@)?
([a-z0-9\-._~%]+|\[[a-z0-9\-._~%!$&()*+,;=:]+\]):([0-9]+)

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
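A short Python sketch, using the numbered-group version of the regex above, shows how the port could be read from the third capturing group and converted to an integer. The name `extract_port` is hypothetical.

```python
import re

# Single-line pattern from this recipe; group 3 captures the port digits.
PORT_RE = re.compile(
    r"^[a-z][a-z0-9+\-.]*://([a-z0-9\-._~%!$&()*+,;=]+@)?"
    r"([a-z0-9\-._~%]+|\[[a-z0-9\-._~%!$&()*+,;=:]+\]):([0-9]+)",
    re.IGNORECASE,
)


def extract_port(url):
    """Return the port of url as an int, or None if no port is given.

    extract_port is an illustrative helper, not a library function.
    """
    match = PORT_RE.search(url)
    return int(match.group(3)) if match else None


print(extract_port("http://www.regexcookbook.com:80/"))
print(extract_port("http://www.regexcookbook.com/"))
```

The colon and the digits are not optional here, so a URL that relies on the default port yields no match at all.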

URL
\A
[a-z][a-z0-9+\-.]*://                       # Scheme
([a-z0-9\-._~%!$&()*+,;=]+@)?               # User
([a-z0-9\-._~%]+                            # Named host
|\[[a-f0-9:.]+\]                            # IPv6 host
|\[v[a-f0-9][a-z0-9\-._~%!$&()*+,;=:]+\])   # IPvFuture host
:([0-9]+)                                   # Port
(/[a-z0-9\-._~%!$&()*+,;=:@]+)*/?           # Path
(\?[a-z0-9\-._~%!$&()*+,;=:@/?]*)?          # Query
(\#[a-z0-9\-._~%!$&()*+,;=:@/?]*)?          # Fragment
\Z


: ,

: .NET, Java, PCRE, Perl, Python, Ruby
^[a-z][a-z0-9+\-.]*:\/\/([a-z0-9\-._~%!$&()*+,;=]+@)?
([a-z0-9\-._~%]+|\[[a-f0-9:.]+\]|\[v[a-f0-9][a-z0-9\-._~%!$&()*+,;=:]
+\]):([0-9]+)(\/[a-z0-9\-._~%!$&()*+,;=:@]+)*\/?
(\?[a-z0-9\-._~%!$&()*+,;=:@\/?]*)?(#[a-z0-9\-._~%!$&()*+,;=:@\/?]*)?$

:
: .NET, Java, JavaScript, PCRE, Perl, Python

URL , ,
URL. \A ^ , . [a-z][a-z0-9+\-.]*:// , ([a-z0-9\-._~%!$&()*+,;=]+@)? . ([a-z0-9\-._~%]+|\[[a-z0-9\._~%!$&()*+,;=:]+\]) .

,
.
,
[0-9]+ .

,
URL . , ,
. , -
URL .
. 2.9. 3.9 , ,
, .
,
URL,
7.7. , URL, . . , 7.10.
, ,


, , , .
3.
,
URL, , , 7.7. 7.7 ( ) .

.
2.9, 3.9 7.7.

7.12. Extracting the Path from a URL

, URL. ,
/index.html http://www.regexcookbook.
com/index.html /index.html#fragment.

URL, , URL.
URL, , :
\A
# ,
([a-z][a-z0-9+\-.]*:(//[^/?#]+)?)?
#
([a-z0-9\-._~%!$&()*+,;=:@/]*)

: ,

: .NET, Java, PCRE, Perl, Python, Ruby
^([a-z][a-z0-9+\-.]*:(//[^/?#]+)?)?([a-z0-9\-._~%!$&()*+,;=:@/]*)

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
URL, , URL.
URL, :
\A
# ,
([a-z][a-z0-9+\-.]*:(//[^/?#]+)?)?

#
(/?[a-z0-9\-._~%!$&()*+,;=@]+(/[a-z0-9\-._~%!$&()*+,;=:@]+)*/?|/)
# , URL
([#?]|\Z)

: ,

: .NET, Java, PCRE, Perl, Python, Ruby
^([a-z][a-z0-9+\-.]*:(//[^/?#]+)?)?(/?[a-z0-9\-._~%!$&()*+,;=@]+
(/[a-z0-9\-._~%!$&()*+,;=:@]+)*/?|/)([#?]|$)

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
URL, , URL. URL,
, :
\A
# ,
(?>([a-z][a-z0-9+\-.]*:(//[^/?#]+)?)?)
#
([a-z0-9\-._~%!$&()*+,;=:@/]+)

: ,

: .NET, Java, PCRE, Perl, Ruby

, , , ,
URL. 7.7 , / URL, ,
URL .
\A
^, . [a-z]
[a-z0-9+\-.
]*: , //[^/?#]+ . ,
, , URL , . ( ), ( ) ( ). ,
, ( 2.3).


, , :
(//[^/?#]+)? . URL.
, .
, ,
,
.

, URL, [a-z0-9\._~%!$&()*+,;=:@/]* , . , .

,
. , ,
URL. - , .
7.7
URL / . ,
, .
, . URL http://www.
regexcookbook.com, , . . , .
( ), .
, .
, . (, , 2.13.) , , , , : , (//[^/?#]+)? .
[a-z0-9\-._~%!$&()*+,;=:@/]+ //www.regexcookbook.
com, , , -


.
, , . , ,
http .
. , ,
URL,
. , URL .

, . ( 2.15) , , JavaScript Python. ,
. , , ,
, ,
, . ,
, .
. ,
,
null JavaScript.
,
URL, 7.7. .NET ,
.NET path ,
URL. , ,
: hostpath, schemepath
relpath. -,
, , . , .
,
7.7.
6, 7 8. ,


, . JavaScript . , , JavaScript undefined.


3.9 , .

.
2.9, 3.9 7.7.

7.13. Extracting the Query from a URL

, URL. , param=value http://www.regexcookbook.com?param=value /index.html?param=value.

^[^?#]+\?([^#]+)

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
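The regex above is short enough to try directly. The Python sketch below (with the hypothetical helper `extract_query`) returns the query without the leading question mark and stops before any fragment.

```python
import re

# Pattern from this recipe: skip everything up to the first "?", provided
# no "#" comes first, then capture everything up to the fragment.
QUERY_RE = re.compile(r"^[^?#]+\?([^#]+)")


def extract_query(url):
    """Return the query of url, or None if it has none.

    extract_query is a hypothetical name used only in this example.
    """
    match = QUERY_RE.search(url)
    return match.group(1) if match else None


print(extract_query("http://www.regexcookbook.com?param=value"))
print(extract_query("/index.html?param=value#top"))
print(extract_query("http://www.regexcookbook.com#top?not-a-query"))
```

The third call shows why the first character class excludes the hash sign: a question mark that appears inside the fragment is not treated as the start of a query.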

URL ,
,
URL. URL . , URL.
^[^?#]+\? .

, , . ^ ( 2.5),
^ ( 2.3).

URL () , . , , URL
URL, ,
\? ^[^?#]+\? .

URL, . URL .


, , [^#]+
.
, .
URL, . URL, , URL,
[^#]+ , , . , URL (
) . 2.9. 3.9 ,
, , .

,
URL,
7.7. 7.7
, URL,
12.

.
2.9, 3.9 7.7.

7.14. Extracting the Fragment from a URL

, URL.
, top http://www.regexcookbook.
com#top /index.html#top.

#(.+)

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
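A minimal Python sketch of the same idea, assuming the subject string is already known to be a URL; `extract_fragment` is an illustrative name.

```python
import re

# Pattern from this recipe: the first "#" starts the fragment, and the
# capturing group takes everything after it.
FRAGMENT_RE = re.compile(r"#(.+)")


def extract_fragment(url):
    """Return the fragment of url without the "#", or None if absent."""
    match = FRAGMENT_RE.search(url)
    return match.group(1) if match else None


print(extract_fragment("http://www.regexcookbook.com#top"))
print(extract_fragment("/index.html#top"))
print(extract_fragment("/index.html"))
```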

URL ,
, URL. URL


. URL,
, URL.
, .
#.+ . , .

URL, . , , URL. , #,
.
,
URL,
7.7. 7.7
, URL,
13.

.
2.9, 3.9 7.7.

7.15. Validating Domain Names

,
, .

, :
^([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}$

:
: .NET, Java, JavaScript, PCRE, Perl, Python
\A([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}\Z

:
: .NET, Java, PCRE, Perl, Python, Ruby
:
\b([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}\b


:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
,
63 :
\b((?=[a-z0-9-]{1,63}\.)[a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,63}\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

punycode:
\b((xn--)?[a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
,
63 ,
punycode:
\b((?=[a-z0-9-]{1,63}\.)(xn--)?[a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,63}\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
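To see how the length-limiting lookahead and the optional punycode prefix behave, here is a hedged Python sketch. It combines the last regex above with the ^ and $ anchors of the first variant so that the whole string is validated; that combination, and the name `is_domain_name`, are assumptions of this example rather than a regex listed in the recipe.

```python
import re

# Anchored variant: each label is checked by the lookahead (1 to 63
# characters followed by a dot) before the stricter label pattern runs.
DOMAIN_RE = re.compile(
    r"^((?=[a-z0-9-]{1,63}\.)(xn--)?[a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,63}$",
    re.IGNORECASE,
)


def is_domain_name(text):
    """Return True if text looks like a syntactically valid domain name."""
    return DOMAIN_RE.match(text) is not None


for candidate in [
    "www.regexcookbook.com",
    "xn--bcher-kva.com",      # punycode label
    "-bad.com",               # label may not start with a hyphen
    "a" * 64 + ".com",        # label longer than 63 characters
]:
    print(candidate[:30], is_domain_name(candidate))
```

Without the lookahead, the 64-character label would be accepted, because the label pattern alone places no upper bound on its length.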

domain.tld subdomain.domain.tld,
.
(top-level domain, tld) .
: [a-z]{2,} .

, .
.
[a-z0-9]+(-[a-z0-9]+)* . ,
, ,
. , ( 2.3), ,

.

, \. . URL
, , :


([a-z0-9]+(-[a-z0-9]+)*\.)+.
,
, , .
, , ,
. ^ $
Ruby \A \Z JavaScript. 2.5.

,
\b ( 2.6).

,
63 .
, , , [a-z0-9]+(-[a-z0-9]+)* ,
,
63 .

[-a-z0-9]{1,63} , 63 , \b([a-z0-9]{1,63}\.)+[a-z]{2,63} , . .

,
. , , 2.16.
[a-z0-9]+(-[a-z09]+)*\. , ,
, [-a-z0-9]{1,63}\. , ,
63 . (?=[-a-z0-9]{1,63}\.)
[a-z0-9]+(-[a-z0-9]+)*\. .

(?=[-a-z0-9]{1,63}\.) ,
1 63 , . .
63 - 63 , . ,
63 .
. ,
[a-z0-9]+(-[a-z0-9]+)*\. , . , -


63 , ,

.
(Internationalized Domain Names, IDN) . , . , .es
,
.
, punycode.
punycode, , , ,
, . , , punycode, xn--. ,
(xn--)? , .

.
2.3, 2.12 2.16.

7.16. Matching IPv4 Addresses
, IPv4 255.255.255.255.
32- .


, IP-:
^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, IP-:
^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


, IP-
:
\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, IP-
:
\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, IP-:
^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, IP-:
^(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Perl
# Capture the four decimal numbers, then combine them into one 32-bit integer.
if ($subject =~ m/^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$/)
{
    $ip = $1 << 24 | $2 << 16 | $3 << 8 | $4;
}
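The same conversion can be sketched in Python. This version uses the accurate regex with four capturing groups, so numbers above 255 are rejected before any arithmetic is done; the function name `ipv4_to_int` is hypothetical.

```python
import re

# One number from 0 to 255, as explained in this recipe.
OCTET = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# Four captured octets separated by literal dots.
IPV4_RE = re.compile(r"^" + r"\.".join([OCTET] * 4) + r"$")


def ipv4_to_int(text):
    """Return the 32-bit integer for a dotted IPv4 address, or None."""
    match = IPV4_RE.match(text)
    if match is None:
        return None
    a, b, c, d = (int(part) for part in match.groups())
    # Shift each octet into place, most significant first.
    return a << 24 | b << 16 | c << 8 | d


print(ipv4_to_int("192.168.0.1"))
print(ipv4_to_int("255.255.255.255"))
print(ipv4_to_int("256.1.1.1"))
```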

IP- 4 255.255.255.255,
0 255. IP- .
.
, .

IP- [0-9]
{1,3} . 0 999,
0 255. ,


, IP-,
IP- .
IP- 25[0-5]|2[04][0-9]|[01]?[0-9][0-9]? .
0 255,
10 99
0 9. 25[0-5]
250 255, 2[0-4][0-9]
200 249 [01]?[0-9][0-9]? 0
199, . 6.5.

, IP, , .
, , 2.5. IP-
, ,
\b
( 2.6).

(?:number\.){3}
number . IP-
( 2.9), ( 2.12). ,
IP-. IP-.
.

IP- , .
. , , , .
, IP-.
32- . , , Perl $1, $2, $3 $4. 3.9. Perl , , (<<). String.toInteger() , .


.
2.3, 2.8, 2.9 2.12.

7.17. Matching IPv6 Addresses
,
IPv6 , / .


IPv6 ,
16-
, (: 1762:0:0:0:0:B03:1:AF18).
.
, IPv6 :
^(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}$

:
: .NET, Java, JavaScript, PCRE, Perl, Python
\A(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}\Z

:
: .NET, Java, PCRE, Perl, Python, Ruby
IPv6 :
(?<![:.\w])(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}(?![:.\w])

:
: .NET, Java, PCRE, Perl, Python, Ruby 1.9
JavaScript Ruby 1.8
. , IPv6
.
:
\b(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby



IPv6 , 16- , .
, . . . IPv4 IPv6
IPv4 IPv6. IPv6 : 1762:0:0:0:0:B03:127.32.67.15.
, IPv6 :
^(?:[A-F0-9]{1,4}:){6}(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
IPv6 :
(?<![:.\w])(?:[A-F0-9]{1,4}:){6}
(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?![:.\w])

:
: .NET, Java, PCRE, Perl, Python, Ruby 1.9
JavaScript Ruby 1.8
. , IPv6
.
:
\b(?:[A-F0-9]{1,4}:){6}(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


IPv6
.
, IPv6 :
\A
(?:[A-F0-9]{1,4}:){6}                                   # 6 words
(?:[A-F0-9]{1,4}:[A-F0-9]{1,4}                          # 2 words
|  (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}    # or 4 bytes
   (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
)\Z

: ,

: .NET, Java, PCRE, Perl, Python, Ruby
^(?:[A-F0-9]{1,4}:){6}(?:[A-F0-9]{1,4}:[A-F0-9]{1,4}|
(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))$

:
: .NET, Java, JavaScript, PCRE, Perl, Python
IPv6 :
(?<![:.\w])                                             # Anchor address
(?:[A-F0-9]{1,4}:){6}                                   # 6 words
(?:[A-F0-9]{1,4}:[A-F0-9]{1,4}                          # 2 words
|  (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}    # or 4 bytes
   (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
)(?![:.\w])                                             # Anchor address

: ,

: .NET, Java, PCRE, Perl, Python, Ruby 1.9
JavaScript Ruby 1.8
. , IPv6
.
:
\b                                                      # Word boundary
(?:[A-F0-9]{1,4}:){6}                                   # 6 words
(?:[A-F0-9]{1,4}:[A-F0-9]{1,4}                          # 2 words
|  (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}    # or 4 bytes
   (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
)\b                                                     # Word boundary

: ,

: .NET, Java, PCRE, Perl, Python, Ruby
\b(?:[A-F0-9]{1,4}:){6}(?:[A-F0-9]{1,4}:[A-F0-9]{1,4}|
(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))\b

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby



IPv6 .
,
,
, , ,
. , .
, , .
IP-,
. IP- ,
, .
, 1762::B03:1:AF18 1762:0:0:0:0:B03:1:AF18 . , IPv6. , IPv6
:
\A(?:
#
(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}
# , 7
|(?=(?:[A-F0-9]{0,4}:){0,7}[A-F0-9]{0,4}
\Z) #
# 1
(([0-9A-F]{1,4}:){1,7}|:)((:[0-9A-F]{1,4}){1,7}|:)
)\Z

: ,

: .NET, Java, PCRE, Perl, Python, Ruby
^(?:(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}|
(?=(?:[A-F0-9]{0,4}:){0,7}[A-F0-9]{0,4}$)(([0-9A-F]{1,4}:){1,7}|:)
((:[0-9A-F]{1,4}){1,7}|:))$

:
: .NET, Java, JavaScript, PCRE, Perl, Python
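The following Python sketch exercises the single-line regex above on a few sample addresses. It is only an illustration: the helper name `is_ipv6_standard` is an assumption, and the samples were chosen to show the full form, the compressed form, and two inputs that should be rejected.

```python
import re

# Standard notation, with or without "::" compression (regex from above).
IPV6_RE = re.compile(
    r"^(?:(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}|"
    r"(?=(?:[A-F0-9]{0,4}:){0,7}[A-F0-9]{0,4}$)"
    r"(([0-9A-F]{1,4}:){1,7}|:)((:[0-9A-F]{1,4}){1,7}|:))$",
    re.IGNORECASE,
)


def is_ipv6_standard(text):
    """Return True for an IPv6 address in standard or compressed notation."""
    return IPV6_RE.match(text) is not None


for address in [
    "1762:0:0:0:0:B03:1:AF18",   # eight words, no compression
    "1762::B03:1:AF18",          # zeros compressed
    "::1",                       # compression at the start
    "1762:0:0:0:0:B03:1",        # only seven words, no "::"
    "1:2:3:4:5:6:7:8:9",         # nine words
]:
    print(address, is_ipv6_standard(address))
```

The lookahead limits the total number of colons, while the two groups that follow require exactly one double colon; an address with too few words and no compression therefore fails even though the lookahead alone would accept it.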
IPv6 :
(?<![:.\w])(?:
#
(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}
# , 7
|(?=(?:[A-F0-9]{0,4}:){0,7}[A-F0-9]{0,4}
(?![:.\w])) #


# 1
(([0-9A-F]{1,4}:){1,7}|:)((:[0-9A-F]{1,4}){1,7}|:)
)(?![:.\w])

: ,

: .NET, Java, PCRE, Perl, Python, Ruby 1.9
JavaScript Ruby 1.8
, , IPv6
. ,
,
:
(?:
#
(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}
# , 7
|(?=(?:[A-F0-9]{0,4}:){0,7}[A-F0-9]{0,4}
(?![:.\w])) #
# 1
(([0-9A-F]{1,4}:){1,7}|:)((:[0-9A-F]{1,4}){1,7}|:)
)(?![:.\w])

: ,

: .NET, Java, PCRE, Perl, Python, Ruby
(?:(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}|(?=(?:[A-F0-9]{0,4}:){0,7}
[A-F0-9]{0,4}(?![:.\w]))(([0-9A-F]{1,4}:){1,7}|:)((:[0-9A-F]{1,4})
{1,7}|:))(?![:.\w])

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


IPv6 . , , , ,
, . ,
.
, ,
, . , , -


.
IP-, , .
, 1762::B03:127.32.67.15 1762:0:0:0:0:B03:127.32.67.15.
IPv6 ,
, .
, IPv6 :
\A
(?:
#
(?:[A-F0-9]{1,4}:){6}
# , 6
|(?=(?:[A-F0-9]{0,4}:){0,6}
(?:[0-9]{1,3}\.){3}[0-9]{1,3} # 4
\Z) #
# 1
(([0-9A-F]{1,4}:){0,5}|:)((:[0-9A-F]{1,4}){1,5}:|:)
)
# 255.255.255.
(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
# 255
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
\Z

: ,

: .NET, Java, PCRE, Perl, Python, Ruby
^(?:(?:[A-F0-9]{1,4}:){6}|(?=(?:[A-F0-9]{0,4}:){0,6}(?:[0-9]{1,3}\.)
{3}[0-9]{1,3}$)(([0-9A-F]{1,4}:){0,5}|:)((:[0-9A-F]{1,4}){1,5}:|:))
(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4]
[0-9]|[01]?[0-9][0-9]?)$

:
: .NET, Java, JavaScript, PCRE, Perl, Python
IPv6
:
(?<![:.\w])
(?:
#
(?:[A-F0-9]{1,4}:){6}
# , 6
|(?=(?:[A-F0-9]{0,4}:){0,6}
(?:[0-9]{1,3}\.){3}[0-9]{1,3} # 4
(?![:.\w])) #
# 1
(([0-9A-F]{1,4}:){0,5}|:)((:[0-9A-F]{1,4}){1,5}:|:)


)
# 255.255.255.
(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
# 255
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
(?![:.\w])

: ,

: .NET, Java, PCRE, Perl, Python, Ruby 1.9
JavaScript Ruby 1.8
, , IPv6
. ,
,
:
(?:
#
(?:[A-F0-9]{1,4}:){6}
# , 6
|(?=(?:[A-F0-9]{0,4}:){0,6}
(?:[0-9]{1,3}\.){3}[0-9]{1,3} # 4
(?![:.\w])) #
# 1
(([0-9A-F]{1,4}:){0,5}|:)((:[0-9A-F]{1,4}){1,5}:|:)
)
# 255.255.255.
(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
# 255
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
(?![:.\w])

: ,

: .NET, Java, PCRE, Perl, Python, Ruby
(?:(?:[A-F0-9]{1,4}:){6}|(?=(?:[A-F0-9]{0,4}:){0,6}(?:[0-9]{1,3}\.){3}
[0-9]{1,3}(?![:.\w]))(([0-9A-F]{1,4}:){0,5}|:)((:[0-9A-F]{1,4}){1,5}:|:))
(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?![:.\w])

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

,
IPv6 , : , .


, IPv6:
\A(?:
#
(?:
#
(?:[A-F0-9]{1,4}:){6}
# , 6
|(?=(?:[A-F0-9]{0,4}:){0,6}
(?:[0-9]{1,3}\.){3}[0-9]{1,3} # 4
\Z) #
# 1
(([0-9A-F]{1,4}:){0,5}|:)((:[0-9A-F]{1,4}){1,5}:|:)
)
# 255.255.255.
(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
# 255
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
|#
(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}
|# , 7
(?=(?:[A-F0-9]{0,4}:){0,7}[A-F0-9]{0,4}
\Z) #
# 1
(([0-9A-F]{1,4}:){1,7}|:)((:[0-9A-F]{1,4}){1,7}|:)
)\Z

: ,

: .NET, Java, PCRE, Perl, Python, Ruby
^(?:(?:(?:[A-F0-9]{1,4}:){6}|(?=(?:[A-F0-9]{0,4}:){0,6}(?:[0-9]{1,3}\.){3}
[0-9]{1,3}$)(([0-9A-F]{1,4}:){0,5}|:)((:[0-9A-F]{1,4}){1,5}:|:))
(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)|(?:[A-F0-9]{1,4}:){7}
[A-F0-9]{1,4}|(?=(?:[A-F0-9]{0,4}:){0,7}[A-F0-9]{0,4}$)
(([0-9A-F]{1,4}:){1,7}|:)((:[0-9A-F]{1,4}){1,7}|:))$

:
: .NET, Java, JavaScript, PCRE, Perl, Python
IPv6 , :
(?<![:.\w])(?:
#
(?:
#
(?:[A-F0-9]{1,4}:){6}
# , 6
|(?=(?:[A-F0-9]{0,4}:){0,6}
(?:[0-9]{1,3}\.){3}[0-9]{1,3} # 4
(?![:.\w])) #


# 1
(([0-9A-F]{1,4}:){0,5}|:)((:[0-9A-F]{1,4}){1,5}:|:)
)
# 255.255.255.
(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
# 255
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
|#
(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}
|# , 7
(?=(?:[A-F0-9]{0,4}:){0,7}[A-F0-9]{0,4}
(?![:.\w])) #
# 1
(([0-9A-F]{1,4}:){1,7}|:)((:[0-9A-F]{1,4}){1,7}|:)
)(?![:.\w])

: ,

: .NET, Java, PCRE, Perl, Python, Ruby 1.9
JavaScript Ruby 1.8 , , IPv6 .
,
, :
(?:
#
(?:
#
(?:[A-F0-9]{1,4}:){6}
# , 6
|(?=(?:[A-F0-9]{0,4}:){0,6}
(?:[0-9]{1,3}\.){3}[0-9]{1,3} # 4
(?![:.\w])) #
# 1
(([0-9A-F]{1,4}:){0,5}|:)((:[0-9A-F]{1,4}){1,5}:|:)
)
# 255.255.255.
(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
# 255
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
|#
(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}
|# , 7
(?=(?:[A-F0-9]{0,4}:){0,7}[A-F0-9]{0,4}
(?![:.\w])) #
# 1
(([0-9A-F]{1,4}:){1,7}|:)((:[0-9A-F]{1,4}){1,7}|:)
)(?![:.\w])


: ,

: .NET, Java, PCRE, Perl, Python, Ruby
(?:(?:(?:[A-F0-9]{1,4}:){6}|(?=(?:[A-F0-9]{0,4}:){0,6}(?:[0-9]{1,3}\.){3}
[0-9]{1,3}(?![:.\w]))(([0-9A-F]{1,4}:){0,5}|:)((:[0-9A-F]{1,4}){1,5}:|:))
(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|
[01]?[0-9][0-9]?)|(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}|
(?=(?:[A-F0-9]{0,4}:){0,7}[A-F0-9]{0,4}(?![:.\w]))
(([0-9A-F]{1,4}:){1,7}|:)((:[0-9A-F]{1,4}){1,7}|:))(?![:.\w])

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

- IPv6
, IPv4.
. : .
, .
, , .
.
, IPv6,
IP-
.
IP- , 2.5. JavaScript ^
$ , Ruby \A
\Z . . Ruby ^ $ ,
, . Ruby
,
.

IPv6

(? ![:.\w]) (?![:.\w]) ,
(, ), . , . 2.16. ,


,
() . . , ,
, ,
. 2.6.


IPv6
. , . [A-F0-9]{1,4} 1 4 , 16- .
( 2.3) . .
3.4.

(?:[A-F0-9]{1,4}:){7} , .
. , 2.9, .

,
. .
, .


IPv6 . (?:[A-F0-9]
{1,4}:){6} , ,
.


IPv4. 7.16.



, , .
32
IPv6. 16- ,
4 , IPv4.
, , . 32 . 2.8, ( ) . ,
.
, , .
IPv4.


,
. ,
. , 1:0:0:0:0:6:0:0,
1::6:0:0 1:0:0:0:0:6::
IPv6. , . ,
,
.
. IPv6 ,
,
. :
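(
  ([0-9A-F]{1,4}:){1,7}    # 1 to 7 words to the left
| :                        # or a double colon at the start
)
(
  (:[0-9A-F]{1,4}){1,7}    # 1 to 7 words to the right
| :                        # or a double colon at the end
)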

: ,

: .NET, Java, PCRE, Perl, Python, Ruby


IPv6 , , .
,
, JavaScript,
.
JavaScript , , ,
, .

.
1 7 , , , .
1 7 ,
, , . ,
1 7 , 1 7 , 1 7 .
. 1 7
, , ,
, c
7. IPv6 8 . , , , ,
7.
.
1 7 .
, 7 -
.
, . , -,
aaaaxbbb. 1 8 0 7 a, x 0 7 b.

. , . ,
.
, .

\A(?:a{7}x
| a{6}xb?
| a{5}xb{0,2}
| a{4}xb{0,3}
| a{3}xb{0,4}
| a{2}xb{0,5}
| axb{0,6}
| xb{0,7}
)\Z

:
: .NET, Java, PCRE, Perl, Python, Ruby
a . b,
a x. , .
, , IPv6 . , , 2.16. , .
\A
(?=[abx]{1,8}\Z)
a{0,7}xb{0,7}
\Z

:
: .NET, Java, PCRE, Perl, Python, Ruby
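The simplified a/x/b regex above can be tried out directly. The Python sketch below is only an illustration of the lookahead technique; `is_valid` is a hypothetical helper name.

```python
import re

# The lookahead caps the total length at 8 characters; the remainder of the
# regex enforces the order: some a's, exactly one x, then some b's.
PATTERN = re.compile(r"\A(?=[abx]{1,8}\Z)a{0,7}xb{0,7}\Z")


def is_valid(text):
    """Return True if text has the form a...axb...b and at most 8 characters."""
    return PATTERN.match(text) is not None


for sample in ["aaaaxbbb", "x", "aaaaaaax", "aaaaaaaxbbbbbbb", "axba", "ab"]:
    print(sample, is_valid(sample))
```

The 15-character sample satisfies a{0,7}xb{0,7} on its own and is rejected only by the lookahead, while axba passes the lookahead and is rejected by the part that follows it. Both tests are needed.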

\A . . 1 8 a , b / x , , . \Z .
,
.

\A \Z -. aaaaxbbb , . , ,
- , ,
, ,


. ,
.
, . , a{0,7} ,
. ,
, ,

.

, a{0,7}xb{0,7}
15 , ,
8-, 8 . ,
a{0,7}xb{0,7} , . a*xb*
, a{0,7}xb{0,7} .

\Z
.
, . ,
- axba,
1 8
.


, , .
, ,
, .
, IPv6 ,
IPv4, .
,
, , .
.
()
,
IPv4 . ,
7.16, IPv4,
,
.


IPv4 ,
, , IPv6 . ,

IPv4. IPv4 , . IPv4, , , .

,

. IPv6 :
, .

()
. , , IPv6.
, . IPv6
, , . IPv6 . () .
,
,
.

IPv4.
:
^(6words|compressed6words)ip4$

:
^(8words|compressed8words)$

:
^((6words|compressed6words)ip4|8words|compressed8words)$

:
^((6words|compressed6words)ip4|(8words|compressed8words))$


.
2.16 7.16.

7.18. Validating Windows Paths

,
Microsoft
Windows.


\A
[a-z]:\\                    # Drive
(?:[^\\/:*?<>|\r\n]+\\)*    # Folder
[^\\/:*?<>|\r\n]*           # File
\Z

: ,

: .NET, Java, PCRE, Perl, Python, Ruby
^[a-z]:\\(?:[^\\/:*?<>|\r\n]+\\)*[^\\/:*?<>|\r\n]*$

:
: .NET, Java, JavaScript, PCRE, Perl, Python

UNC
\A
(?:[a-z]:|\\\\[a-z0-9_.$]+\\[a-z0-9_.$]+)\\    # Drive
(?:[^\\/:*?<>|\r\n]+\\)*                       # Folder
[^\\/:*?<>|\r\n]*                              # File
\Z

: ,

: .NET, Java, PCRE, Perl, Python, Ruby
^(?:[a-z]:|\\\\[a-z0-9_.$]+\\[a-z0-9_.$]+)\\(?:[^\\/:*?<>|\r\n]+\\)*
[^\\/:*?<>|\r\n]*$

:
: .NET, Java, JavaScript, PCRE, Perl, Python


, UNC
\A
(?:(?:[a-z]:|\\\\[a-z0-9_.$]+\\[a-z0-9_.$]+)\\|    # Drive
   \\?[^\\/:*?<>|\r\n]+\\?)                         # Relative path
(?:[^\\/:*?<>|\r\n]+\\)*                            # Folder
[^\\/:*?<>|\r\n]*                                   # File
\Z

: ,

: .NET, Java, PCRE, Perl, Python, Ruby
^(?:(?:[a-z]:|\\\\[a-z0-9_.$]+\\[a-z0-9_.$]+)\\|\\?[^\\/:*?<>|\r\n]+\\?)
(?:[^\\/:*?<>|\r\n]+\\)*[^\\/:*?<>|\r\n]*$

:
: .NET, Java, JavaScript, PCRE, Perl, Python


,
, .
, . [a-z]:\\ . , , , .

Windows ,
: \/:*? |. . ,
,
[^\\/:*? |\r\n]+ . , . \r \n . 2.3. ( 2.12) ,
.

.
(?:[^\\/:*? |\r\n]+\\)* , , ,
( 2.9) ,
( 2.12).

[^\\/:*?
|\r\n]* . , -


.
, ,
* + .

UNC
,
,
(Universal Naming Convention, UNC).
UNC \\server\share\folder\file.
,
, , UNC. [az]: , , ,
.

(?:[a-z]:|\\\\[a-z0-9_.$]+\\
[a-z0-9_.$]+) . ( 2.8).
[a-z]:
\\\\[a-z0-9_.$]+\\[a-z0-9_.$]+ .
.
, . 2.9, (?: . .

. UNC , .

, UNC
(, .., ) . ,
, . ,
.

\\?[^\\/:*? |\r\n]+\\? . ,
. \\? , , . [^\\/:*? |\r\n]+ .
, \\? , , , -


.
, \\? ,
. , , , , , , .

, ,
, . ,
, , .
, ,
, . , , . , .
, .

, , , . .

.
2.3, 2.8, 2.9 2.12.

7.19. Splitting Windows Paths into Their Parts

, , Microsoft
Windows. ,
Windows, ,
.


\A
(?<drive>[a-z]:)\\
(?<folder>(?:[^\\/:*?<>|\r\n]+\\)*)


(?<file>[^\\/:*?<>|\r\n]*)
\Z

: ,

: .NET, PCRE 7, Perl 5.10, Ruby 1.9
\A
(?P<drive>[a-z]:)\\
(?P<folder>(?:[^\\/:*?<>|\r\n]+\\)*)
(?P<file>[^\\/:*?<>|\r\n]*)
\Z

: ,

: PCRE 4 , Perl 5.10, Python
\A
([a-z]:)\\
((?:[^\\/:*?<>|\r\n]+\\)*)
([^\\/:*?<>|\r\n]*)
\Z

: ,

: .NET, Java, PCRE, Perl, Python, Ruby
^([a-z]:)\\((?:[^\\/:*?<>|\r\n]+\\)*)([^\\/:*?<>|\r\n]*)$

:
: .NET, Java, JavaScript, PCRE, Perl, Python
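As a rough sketch of how the named-capture variant for Python might be used, the snippet below splits a drive-letter path into its three parts. The function name `split_path` is hypothetical, and the character classes are copied as they appear in this section.

```python
import re

# Python-style named groups: drive, folder, file.
PATH_RE = re.compile(
    r"^(?P<drive>[a-z]:)\\"
    r"(?P<folder>(?:[^\\/:*?<>|\r\n]+\\)*)"
    r"(?P<file>[^\\/:*?<>|\r\n]*)$",
    re.IGNORECASE,
)


def split_path(path):
    """Return a dict with drive, folder and file, or None if no match."""
    match = PATH_RE.match(path)
    return match.groupdict() if match else None


print(split_path(r"c:\folder\subfolder\file.ext"))
print(split_path(r"c:\file.ext"))
print(split_path("file.ext"))
```

Note that the folder group keeps its trailing backslash and is an empty string, not None, when the file sits directly in the root of the drive.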

UNC
\A
(?<drive>[a-z]:|\\\\[a-z0-9_.$]+\\[a-z0-9_.$]+)\\
(?<folder>(?:[^\\/:*?<>|\r\n]+\\)*)
(?<file>[^\\/:*?<>|\r\n]*)
\Z

: ,

: .NET, PCRE 7, Perl 5.10, Ruby 1.9
\A
(?P<drive>[a-z]:|\\\\[a-z0-9_.$]+\\[a-z0-9_.$]+)\\
(?P<folder>(?:[^\\/:*?<>|\r\n]+\\)*)
(?P<file>[^\\/:*?<>|\r\n]*)
\Z

: ,

: PCRE 4 , Perl 5.10, Python


, UNC
.
.
\A
(?<drive>[a-z]:\\|\\\\[a-z0-9_.$]+\\[a-z0-9_.$]+\\|\\?)
(?<folder>(?:[^\\/:*?<>|\r\n]+\\)*)
(?<file>[^\\/:*?<>|\r\n]*)
\Z

: ,

: .NET, PCRE 7, Perl 5.10, Ruby 1.9
\A
(?P<drive>[a-z]:\\|\\\\[a-z0-9_.$]+\\[a-z0-9_.$]+\\|\\?)
(?P<folder>(?:[^\\/:*?<>|\r\n]+\\)*)
(?P<file>[^\\/:*?<>|\r\n]*)
\Z

: ,

: PCRE 4 , Perl 5.10, Python
\A
([a-z]:\\|\\\\[a-z0-9_.$]+\\[a-z0-9_.$]+\\|\\?)
((?:[^\\/:*?<>|\r\n]+\\)*)
([^\\/:*?<>|\r\n]*)
\Z

: ,

: .NET, Java, PCRE, Perl, Python, Ruby
^([a-z]:\\|\\\\[a-z0-9_.$]+\\[a-z0-9_.$]+\\|\\?)
((?:[^\\/:*?<>|\r\n]+\\)*)([^\\/:*?<>|\r\n]*)$

:
: .NET, Java, JavaScript, PCRE, Perl, Python


. ,
, .


, ,
,


. ,
: drive (), folder () file ().
( 2.11), .

: 1, 2 3. 3.9 , , , .

UNC
, UNC,
.

, UNC
, . , . . , , .
, UNC
, .
, .
, . ,
, , .
Windows. , , ,
, . , , , .
: .

, , (
), , ( ):

\A
(?:
(?<drive>[a-z]:|\\\\[a-z0-9_.$]+\\[a-z0-9_.$]+)\\
(?<folder>(?:[^\\/:*?<>|\r\n]+\\)*)
(?<file>[^\\/:*?<>|\r\n]*)
| (?<relativefolder>\\?(?:[^\\/:*?<>|\r\n]+\\)+)
(?<file2>[^\\/:*?<>|\r\n]*)
| (?<relativefile>[^\\/:*?<>|\r\n]+)
)
\Z

: ,

: .NET, PCRE 7, Perl 5.10, Ruby 1.9
\A
(?:
(?P<drive>[a-z]:|\\\\[a-z0-9_.$]+\\[a-z0-9_.$]+)\\
(?P<folder>(?:[^\\/:*?<>|\r\n]+\\)*)
(?P<file>[^\\/:*?<>|\r\n]*)
| (?P<relativefolder>\\?(?:[^\\/:*?<>|\r\n]+\\)+)
(?P<file2>[^\\/:*?<>|\r\n]*)
| (?P<relativefile>[^\\/:*?<>|\r\n]+)
)
\Z

: ,

: PCRE 4 , Perl 5.10, Python
\A
(?:
([a-z]:|\\\\[a-z0-9_.$]+\\[a-z0-9_.$]+)\\
((?:[^\\/:*?<>|\r\n]+\\)*)
([^\\/:*?<>|\r\n]*)
| (\\?(?:[^\\/:*?<>|\r\n]+\\)+)
([^\\/:*?<>|\r\n]*)
| ([^\\/:*?<>|\r\n]+)
)
\Z

: ,

: .NET, Java, PCRE, Perl, Python, Ruby
^(?:([a-z]:|\\\\[a-z0-9_.$]+\\[a-z0-9_.$]+)\\((?:[^\\/:*?<>|\r\n]+\\)*)
([^\\/:*?<>|\r\n]*)|(\\?(?:[^\\/:*?<>|\r\n]+\\)+)([^\\/:*?<>|\r\n]*)|
([^\\/:*?<>|\r\n]+))$

:
: .NET, Java, JavaScript, PCRE, Perl, Python


- ,
, , .
,


, .
.NET
. .NET ,
. .NET,
folder () file (),
, folder file :

\A
(?:
(?<drive>[a-z]:|\\\\[a-z0-9_.$]+\\[a-z0-9_.$]+)\\
(?<folder>(?:[^\\/:*?<>|\r\n]+\\)*)
(?<file>[^\\/:*?<>|\r\n]*)
| (?<folder>\\?(?:[^\\/:*?<>|\r\n]+\\)+)
(?<file>[^\\/:*?<>|\r\n]*)
| (?<file>[^\\/:*?<>|\r\n]+)
)
\Z

: ,

: .NET

.
2.9, 2.11, 3.9 7.18.

7.20. Extracting the Drive Letter from a Windows Path

, ()
Windows .
, . , c c:\folder\file.ext.


^([a-z]):

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

,
, ,
, .
, UNC.
Windows
, , . , ,
, , .

^ ( 2.5).
, Ruby , ,
Windows . [a-z] ( 2.3).
( ),
, . , , .

.
2.9, .
3.9, , , , .
7.19, , Windows.

7.21. Extracting the Server and Share from a UNC Path

, ()
Windows .


UNC,
, . , server share \\server\share\folder\file.ext.

^\\\\([a-z0-9_.$]+)\\([a-z0-9_.$]+)

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
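A small Python sketch shows the two capturing groups in use. The helper name `extract_server_share` is an assumption made for this illustration.

```python
import re

# Two leading backslashes, the server name, one backslash, the share name.
UNC_RE = re.compile(r"^\\\\([a-z0-9_.$]+)\\([a-z0-9_.$]+)", re.IGNORECASE)


def extract_server_share(path):
    """Return (server, share) for a UNC path, or None for any other path."""
    match = UNC_RE.match(path)
    return match.groups() if match else None


print(extract_server_share(r"\\server\share\folder\file.ext"))
print(extract_server_share(r"c:\folder\file.ext"))
```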

,
, , ,
UNC.
.
UNC .
Windows , , UNC. ,
, ,

^ ( 2.5).
, Ruby , ,
Windows . \\\\
. ,
, . [a-z0-9_.$]+ . , , .
, , , . \\server\share.

.
2.9, .
3.9, , , , .
7.19, , Windows.


7.22. Extracting the Folder from a Windows Path
, ()
Windows . . , \folder\subfolder\ c:\folder\subfolder\file.ext
\\server\share\folder\subfolder\file.ext.

^([a-z]:|\\\\[a-z0-9_.$]+\\[a-z0-9_.$]+)?((?:\\|^)
(?:[^\\/:*?<>|\r\n]+\\)+)

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

, Windows, ,
UNC, . .

, ^([a-z]:|\\\\[a-z0-9_.$]+\\[a-z09_.$]+)? , , . . ,
7.20,
, UNC, 7.21. 2.8.

, , .
, .

(?:[^\\/:*?|\r\n]+\\)+. .

,
,
. . , , . , , , .

, .
. ,


, , .
,
.
.
, ,
, e\, , \\server\share\. , , , (\\|^) , \\? .

, \\server\shar , e\ ,
2.13. , . :
^([a-z]:|\\\\[a-z0-9_.$]+\\[a-z0-9_.$]+)?
(\\?(?:[^\\/:*?<>|\r\n]+\\)+)

, , , , , , . \\server\share \\server\
share
, , , .

, [a-z09_.$]+ , , .
+ , .
, , .

, , : e\.
, (?:[^\\/:*? |\r\n]+\\)+ , . , .

(\\|^) \\? , . ,
, .
,


, . , ,
, (\\|^) . , . , , , ,
, . (\\|^)
,
, (?:[^\\/:*? |\r\n]+\\)+
,
,
.

7.18 7.19,
.
, ,
, , .
, ,
7.19.
, , .

.
.
, , .

.
2.9, .
3.9, , , , .
7.19, , Windows.

7.23. Extracting the Filename from a Windows Path
, ()
Windows .
. ,
file.ext c:\folder\file.ext.


[^\\/:*?<>|\r\n]+$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

, , , , .
.
, , , /
.

$ ( 2.5).
, Ruby , ,
Windows . [^\\/:*? |\r\n]+ ( 2.3) , . , , ,
.

, ,
,
. , ,

.

.
3.7, , , , .
7.19, , Windows.


7.24. Extracting the File Extension from a Windows Path
, ()
Windows . , . , .ext c:\folder\file.ext.

\.[^.\\/:*?<>|\r\n]+$

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
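The sketch below applies the regex above in Python. The name `extract_extension` is hypothetical; the samples include a filename with several dots and a path whose only dot is in a folder name.

```python
import re

# A dot followed by characters that are neither dots nor path delimiters,
# anchored to the end of the string.
EXT_RE = re.compile(r"\.[^.\\/:*?<>|\r\n]+$")


def extract_extension(path):
    """Return the extension including its dot, or None if there is none."""
    match = EXT_RE.search(path)
    return match.group(0) if match else None


print(extract_extension(r"c:\folder\file.ext"))
print(extract_extension("Version 2.0.txt"))
print(extract_extension(r"c:\folder.old\file"))
```

Because the character class excludes both the dot and the backslash, only the last dot of the final path component can begin the match.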


, 7.23.
, . 7.23 . 7.23
, .
.
\. ,
.

, Version 2.0.txt, . ,
. .
, . ,
. $ , .txt, .0.


, , .
, ,
, .

.
7.19, , Windows.


7.25. Stripping Invalid Characters from Filenames

, Windows. , , , Save ().


[\\/:*?<>|]+

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


.
: .NET, Java, JavaScript, PHP, Perl,
Python, Ruby
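In Python, the search-and-replace with an empty replacement text could look like the sketch below. The function name `strip_invalid_chars` is an assumption, and the character class is the one listed in this recipe.

```python
import re

# Runs of characters that this recipe treats as invalid in filenames.
INVALID_RE = re.compile(r"[\\/:*?<>|]+")


def strip_invalid_chars(name):
    """Return name with all invalid filename characters removed."""
    return INVALID_RE.sub("", name)


print(strip_invalid_chars("report: draft/v2?.txt"))
```

The + quantifier lets a whole run of invalid characters be removed in a single replacement, which makes no difference to the result but saves the regex engine some work.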

Windows \/:*? |.
, , - .


[\\/:*? |] . , . .

+ . , . , , ,
,
,
.


, .

.
3.14, ,
.


,
: HTML, XHTML, XML, CSV INI.
, , ,
, ,
. , .
,
.
, , , . , ,
,
, ,
(, ). , .
, , . , , ,
.
(Hypertext Markup Language, HTML)
HTML ,
- . HTML-, ,


8.

. HTML, -, - HTML.

HTML:
( , ), , . HTML 4.01, 1999 , .
HTML . , . ( , , , )
( , , ).
(, <html>), (, </html>). , . ,
. ,
(, <div><div></div></div> , <div><span></div></span> ).
( <p>, ) . ,
. ( <br>, ), . .
HTML AZ. . .
<script> <style> ,
.
</style> </script>,
, .
,
, .
-. , <a>
(anchor ) Click me!:


<a href="http://www.regexcookbook.com"
   title = 'Regex Cookbook'>Click me!</a>

,
. . ,
, ( ). , A-Z, a-z, 0-9,
, , ( , <^[-.0-9:A-Z_a-z]+$>). ( selected checked, ) , , . ,
, . ,
(, selected=selected).
AZ. . , .
HTML 4 252
( ).

&#nnnn;
&#xhhhh;, nnnn 0-9, hhhh 0-9 A-F ( ). &; ( ,
HTML) , ,
, (&lt; &gt;), (&quot;) (&amp;).
&nbsp; ( ,
0xA0), ,
HTML ,
. , ,
, . (&) .
HTML :
<!-- -->
<!-- ,
-->


. -- > . ( 1995 ) HTML <script> <style>.


.
, HTML ( DOCTYPE), . DOCTYPE HTML,
, ,
HTML 4.01 Strict:
<!DOCTYPE html PUBLIC -//W3C//DTD HTML 4.01//EN
http://www.w3.org/TR/html4/strict.dtd>

, , HTML.
, HTML
, . , , , HTML . , , ,
OReilly (Chuck
Musciano) (Bill Kennedy) HTML & XHTML: The
Definitive Guide1, .
HTML
XHTML XML ( ), , .

(Extensible Hypertext Markup


Language, XHTML)
XHTML HTML 4.01 HTML SGML XML. HTML, XHTML
HTML. XHTML 1.0 1.1.
1

, HTML XHTML. ,
6- . . . .: -, 2008.


, HTML, ,
HTML, :

XHTML XML,
<?xml version=1.0 encoding=UTF-8?>.

.
/>.

XML ,
HTML, .
, . .
.

HTML XHTML , , . HTML XHTML http://www.w3.org/TR/


xhtml1/#diffs http://wiki.whatwg.org/wiki/HTML_vs._XHTML.
XHTML HTML XML,
,
. ,
(X)HTML, HTML XHTML. ,

HTML XHTML, -
.

(Extensible Markup Language, XML)


XML , ,
, . ,
XHTML, .
XML 1.0 1.1.
XML , HTML, :


XML XML,
<?xml version=1.0 encoding=UTF-8?>, , . , <?xml-stylesheet type=text/xsl href=transform.xslt?> , transform.xslt XSL.
DOCTYPE , . :
<!DOCTYPE example [
<!ENTITY copy &#169;>
<!ENTITY copyright-notice Copyright &copy; 2008, OReilly Media>
]>

CDATA .
<![CDATA[ ]]>.

. ,
/>.

XML ( , , )
. AZ, az,
(:) (_), 09, (-) (.).
8.4.
. .

, , XML , , XML. (
HTML, )
.
XML
HTML XHTML, , . , XML, XML, XHTML HTML.


, (Comma-Separated Values, CSV)


CSV , ,
. CSV

. - CSV, 2005 RFC 4180, ,
IANA, MIME text/csv.
RFC - , Microsoft Excel. RFC , ,
Excel, .
CSV, RFC
4180 Microsoft Excel 2003
.
,
, , . ,
, . .
.
.
CSV . , . , , , .
, , .
CSV , . , CSV, , .
RFC 4180 , . Excel , Excel 2003 RFC. RFC
, - . , Excel, ,
. , ,
,
.


CSV, , . , :
aaa,b b,"""c"" cc"
1,,"333, three,
still more threes"

. 8.1 CSV .
8.1. CSV
aaa    b b        "c" cc
1      (empty)    333, three,
                  still more threes

, CSV, , , CSV.
csv
,
. ( ),
.
(INI)
INI
. , , , .
INI, .
INI -, , .
, .
,
, . ,
.
.
, .


. .
INI ( ),
(user post) (name, title content):
; last modified 2008-12-25
[user]
name=J. Random Hacker
[post]
title = Regular Expressions Rock!
content = Let me count the ways...

8.1. Finding XML-Style Tags

HTML, XHTML
XML, , , , - .

, ,
. , , ,
. ,
, ,
, .
, , , .
, , , (X)HTML ( ) , , .


,
,
.
, ,
, . -


<, >:
<[^>]*>

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

>
;
.
,
(X)HTML. ,
> :
<(?:[^>"']|"[^"]*"|'[^']*')*>

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
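The difference between the quick-and-dirty regex and the quote-aware one is easiest to see side by side. The Python sketch below is an illustration only; the sample HTML string and the variable names are assumptions of this example.

```python
import re

# Quick and dirty: stops at the first ">", even inside an attribute value.
NAIVE_TAG = re.compile(r"<[^>]*>")

# Quote-aware: a ">" inside a quoted attribute value does not end the tag.
QUOTE_AWARE_TAG = re.compile(r"""<(?:[^>"']|"[^"]*"|'[^']*')*>""")

html = '<input type="button" value=">>"> text <br>'

naive_matches = NAIVE_TAG.findall(html)
aware_matches = QUOTE_AWARE_TAG.findall(html)

print(naive_matches)
print(aware_matches)
```

The first pattern cuts the input tag short at the > inside the value attribute, while the second returns the complete tag.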
, :
<
(?: [^>"']   # Non-quoted character, or...
  | "[^"]*"  # Double-quoted attribute value, or...
  | '[^']*'  # Single-quoted attribute value
)*
>

:
: .NET, Java, PCRE, Perl, Python, Ruby
,
. , JavaScript,
, JavaScript .

(X)HTML ( )
> , (X)HTML. , , , , DOCTYPE < .
, ,
,
. , , . -


1 , :
</?([A-Za-z][^\s>/]*)(?:[^>"']|"[^"]*"|'[^']*')*>

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
:
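<
/?                    # Permit closing tags
([A-Za-z][^\s>/]*)    # Capture the tag name to backreference 1
(?: [^>"']            # Non-quoted character, or...
  | "[^"]*"           # Double-quoted attribute value, or...
  | '[^']*'           # Single-quoted attribute value
)*
>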
:
: .NET, Java, PCRE, Perl, Python, Ruby
, JavaScript, .

(X)HTML ()
, ,
, (X)HTML, . , . , , (X)
HTML, ,
(, , , ). HTML XHTML,
. 1 2 ( ), , :
<(?:([A-Z][-:A-Z0-9]*)(?:\s+[A-Z][-:A-Z0-9]*(?:\s*=\s*(?:"[^"]*"|
'[^']*'|[-.:\w]+))?)*\s*/?|/([A-Z][-:A-Z0-9]*)\s*)>

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
,
:

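<
(?:                         # Branch for opening tags:
  ([A-Z][-:A-Z0-9]*)        #   Capture the opening tag name to backreference 1
  (?:                       #   Permit zero or more attributes:
    \s+                     #     ...separated by whitespace
    [A-Z][-:A-Z0-9]*        #     Attribute name
    (?:                     #     Optional attribute value:
      \s*=\s*               #       Name-value delimiter
      (?: "[^"]*"           #       Double-quoted value
        | '[^']*'           #       Single-quoted value
        | [-.:\w]+          #       Unquoted value (HTML)
      )
    )?                      #     Permit attribute minimization (HTML)
  )*
  \s*                       #   Permit trailing whitespace
  /?                        #   Permit self-closed tags (XHTML)
|                           # Branch for closing tags:
  /
  ([A-Z][-:A-Z0-9]*)        #   Capture the closing tag name to backreference 2
  \s*                       #   Permit trailing whitespace
)
>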

: ,
: .NET, Java, PCRE, Perl, Python, Ruby

XML ()
XML , , .
HTML , :
<(?:([_:A-Z][-.:\w]*)(?:\s+[_:A-Z][-.:\w]*\s*=\s*(?:"[^"]*"|'[^']*'))*\s*
/?|/([_:A-Z][-.:\w]*)\s*)>

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
:
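<
(?:                         # Branch for opening tags:
  ([_:A-Z][-.:\w]*)         #   Capture the opening tag name to backreference 1
  (?:                       #   Permit zero or more attributes:
    \s+                     #     ...separated by whitespace
    [_:A-Z][-.:\w]*         #     Attribute name
    \s*=\s*                 #     Name-value delimiter
    (?: "[^"]*"             #     Double-quoted value
      | '[^']*'             #     Single-quoted value
    )
  )*
  \s*                       #   Permit trailing whitespace
  /?                        #   Permit self-closed tags
|                           # Branch for closing tags:
  /
  ([_:A-Z][-.:\w]*)         #   Capture the closing tag name to backreference 2
  \s*                       #   Permit trailing whitespace
)
>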

: ,
: .NET, Java, PCRE, Perl, Python, Ruby
,
(X)HTML, 1 2, . ,
XML, , (X)HTML,
, HTML (
). ,
.


, XML- ,
, , . -
XML (X)HTML, .
, , ,
. , , HTML, , (Document Object
Model, DOM). SAX XPath. ,
.


, , .
, XML .
. HTML XHTML,
, .
, HTML
XHTML, <br />
XHTML.


,
, , .
XML (X)HTML.
, ,
,
,
. , , , .

< ( ). [^>]* , , >.


, /. ( [^>]*? ), , ,
( 2.13). > .


[^>] , .
, ( .*? ) ( JavaScript [\s\S]*? ). ( <.*> ) ,
< > ,
.


. :
<div>
</div>
<div class=box>
<div id=pandoras-box class=box />
<!-- comment -->
<!DOCTYPE html>
<< < w00t! >
<>

, .
,
<input> <input type="button" value=">>"> <input
type="button" onclick="alert(2 > 1)">.
>, . , CDATA XML, DOCTYPE, <script> ,
>.
- , , , ,
, .

>
,
, , , , . ,
XML- ,
, .
, >, . , <input> : <input type=button value=>>> <input type=button
onclick=alert(2 > 1)>.

[^>], ,

( ), ( ). , , . , . -


, , , ( , ).

( [^]* [^]* ). , >,
, .

,
,
.
, ,
.

(!)
,
,
* + ( [^>] ). , , .

.

, .
<, >, , , 2.15.
( ) ,
<, , . !
, (


JavaScript Python , ), , . ,

. ,
( ), .
:
<(?>(?:(?>[^>"']+)|"[^"]*"|'[^']*')*)>

:
: .NET, Java, PCRE, Perl, Ruby
:
<(?:[^>"']++|"[^"]*"|'[^']*')*+>

:
: Java, PCRE, Perl 5.10, Ruby 1.9

(X)HTML ( )

,
- (X)HTML .
, , , . , HTML, , ,
, ,
.
, ,
(<) AZ az,
/ ( ). <
, , DOCTYPE, XML, CDATA


. -,
, , <textarea> . (X)HTML XML . 534 ,
. , .

< . /? , , . ([A-Za-z][^\s>/]*) ,
1. (, ),
( , ). .
, [A-Za-z] , .
, [^\s>/] ,
, . ( \s , ), > ( ) / (
> XHTML).
( )
. ,
. ,
, , DOM , , , .

, ,
: (?:[^>]|[^]*|[^]*)* . , , .

, ,
() :

<([A-Za-z][^\s>/]*)(?:[^>"'/]|"[^"]*"|'[^']*')*>

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

(/), , .



<([A-Za-z][^\s>/]*)(?:[^>"']|"[^"]*"|'[^']*')*/>

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
.

<([A-Za-z][^\s>/]*)(?:[^>"']|"[^"]*"|'[^']*')*>

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
. /? , < .


</([A-Za-z][^\s>/]*)(?:[^>"']|"[^"]*"|'[^']*')*>

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
. ,
,
. , , , , .
(!) . 528 ,
. , ,
[^\s>/] , , , , .

,
. , // , :

</?([A-Za-z](?>[^\s>/]*))(?>(?:(?>[^>"']+)|"[^"]*"|'[^']*')*)>

:
: .NET, Java, PCRE, Perl, Ruby
</?([A-Za-z][^\s>/]*+)(?:[^>"']++|"[^"]*"|'[^']*')*+>

:
: Java, PCRE, Perl 5.10, Ruby 1.9

(X)HTML ()
, , HTML XHTML,
,
. :
AZ az,
AZ, az, 09,
( :
^[-:A-Za-z0-9]+$ ).

, .
, ( )
(/).

, , AZ, az, 09, , , (


: ^[-.:A-Za-z0-9_]+$ ).

( , ), 1 2, . ,
.
. 1:

<([A-Z][-:A-Z0-9]*)(?:\s+[A-Z][-:A-Z0-9]*(?:\s*=\s*
(?:"[^"]*"|'[^']*'|[-.:\w]+))?)*\s*/?>

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


/? , > ,
, . , .
(
/ ), .


</([A-Z][-:A-Z0-9]*)\s*>

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
,

.


, ,
. , , * , + ? ( ). , , .

(X)HTML XML , , <script> . .

XML ()
XML , .
XML,
, XML, .
(X)HTML (), HTML, XML:
(, ). , .
XML ( -


) , ,
.
,
[_:A-Z][-.:\w]* 8.4. , XML.

(X)HTML,
1 2, , / .
,
.
. 1:

<([_:A-Z][-.:\w]*)(?:\s+[_:A-Z][-.:\w]*\s*=\s*
(?:"[^"]*"|'[^']*'))*\s*/?>

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

/? , > ,
, . , .
, .

</([_:A-Z][-.:\w]*)\s*>

:
: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
,
, CDATA
DOCTYPE.

(X)HTML XML
XML- ,
, ,
. (X)HTML
XML, , , . ,


, (X)HTML XML.
, , ( ), CDATA XML .
, ,
.
3.18 ,
.
: . , ,
. , (X)HTML XML.
.
(X)HTML.
, <script>,
<style>, <textarea> <xmp>1 ( ):
<!--.*?--\s*>|<(script|style|textarea|xmp)\b(?:[^>"']|"[^"]*"|
'[^']*')*?(?:/>|>.*?</\1\s*>)

Regex options: Case insensitive, dot matches line breaks
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
, ,
:
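# Comment
<!-- .*? --\s*>
|
# Special element and its content
<( script | style | textarea | xmp )\b
    (?: [^>"']   # Any character except >, ", or '
      | "[^"]*"  # Double-quoted attribute value
      | '[^']*'  # Single-quoted attribute value
    )*?
(?: # Singleton tag
    />
  | # Otherwise, include the element's content and matching closing tag
    > .*? </\1\s*>
)

Regex options: Case insensitive, dot matches line breaks, free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

1 <xmp> is a little-known but widely supported element similar to <pre>. Like <pre>, it preserves all whitespace and uses a fixed-width font by default, but it goes one step further and displays all of its contents (including HTML tags) as plain text. <xmp> was deprecated in HTML 3.2 and removed entirely from HTML 4.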
, , JavaScript, JavaScript .
, [\
s\S] , JavaScript:

<!--[\s\S]*?--\s*>|<(script|style|textarea|xmp)\b(?:[^>"']|"[^"]*"|
'[^']*')*?(?:/>|>[\s\S]*?</\1\s*>)

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
:
<script>, <style>, <textarea> <xmp>,
() , .
, . ()
1, , ( ,
).
XML. , CDATA DOCTYPE. , | :

<!--.*?--\s*>|<!\[CDATA\[.*?]]>|<!DOCTYPE\s(?:[^<>"']|"[^"]*"|
'[^']*'|<!(?:[^>"']|"[^"]*"|'[^']*')*>)*>

Regex options: Case insensitive, dot matches line breaks
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
:
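# Comment
<!-- .*? --\s*>
|
# CDATA section
<!\[CDATA\[ .*? ]]>
|
# Document type declaration
<!DOCTYPE\s
    (?: [^<>"']  # Any character except <, >, ", or '
      | "[^"]*"  # Double-quoted value
      | '[^']*'  # Single-quoted value
      | <!(?:[^>"']|"[^"]*"|'[^']*')*>  # Markup declaration
    )*
>

Regex options: Case insensitive, dot matches line breaks, free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby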
, JavaScript ( ):
<!--[\s\S]*?--\s*>|<!\[CDATA\[[\s\S]*?]]>|<!DOCTYPE\s(?:[^<>"']|"[^"]*"|
'[^']*'|<!(?:[^>"']|"[^"]*"|'[^']*')*>)*>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

HTML 4
HTML,
HTML, .
91 HTML 4.
HTML, <blink>, <bgsound>, <embed> <nobr>. , XHTML 1.1 (
XHTML 1.0 ), , HTML 5:
</?(a|abbr|acronym|address|applet|area|b|base|basefont|bdo|big|blockquote|
body|br|button|caption|center|cite|code|col|colgroup|dd|del|dfn|dir|div|
dl|dt|em|fieldset|font|form|frame|frameset|h1|h2|h3|h4|h5|h6|head|hr|html|
i|iframe|img|input|ins|isindex|kbd|label|legend|li|link|map|menu|meta|
noframes|noscript|object|ol|optgroup|option|p|param|pre|q|s|samp|script|
select|small|span|strike|strong|style|sub|sup|table|tbody|td|textarea|
tfoot|th|thead|title|tr|tt|u|ul|var)\b(?:[^>"']|"[^"]*"|'[^']*')*>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, , | . , ,
.
, . , ,
<0.
, 0, , , 91 0.
( ,
), ,
, 91 19.
, :

</?(a(?:bbr|cronym|ddress|pplet|rea)?|b(?:ase(?:font)?|do|ig|lockquote|
ody|r|utton)?|c(?:aption|enter|ite|o(?:de|l(?:group)?))|d(?:[dlt]|el|fn|
i[rv])|em|f(?:ieldset|o(?:nt|rm)|rame(?:set)?)|h(?:[1-6r]|ead|tml)|
i(?:frame|mg|n(?:put|s)|sindex)?|kbd|l(?:abel|egend|i(?:nk)?)|m(?:ap|
e(?:nu|ta))|no(?:frames|script)|o(?:bject|l|p(?:tgroup|tion))|p(?:aram|
re)?|q|s(?:amp|cript|elect|mall|pan|t(?:rike|rong|yle)|u[bp])?|t(?:able|
body|[dhrt]|extarea|foot|head|itle)|ul?|var)\b(?:[^>"']|"[^"]*"|'[^']*')*>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
,
:
<
/?                     # Permit closing tags
(                      # Capture the tag name to backreference 1
    a(?:bbr|cronym|ddress|pplet|rea)?|
    b(?:ase(?:font)?|do|ig|lockquote|ody|r|utton)?|
    c(?:aption|enter|ite|o(?:de|l(?:group)?))|
    d(?:[dlt]|el|fn|i[rv])|
    em|
    f(?:ieldset|o(?:nt|rm)|rame(?:set)?)|
    h(?:[1-6r]|ead|tml)|
    i(?:frame|mg|n(?:put|s)|sindex)?|
    kbd|
    l(?:abel|egend|i(?:nk)?)|
    m(?:ap|e(?:nu|ta))|
    no(?:frames|script)|
    o(?:bject|l|p(?:tgroup|tion))|
    p(?:aram|re)?|
    q|
    s(?:amp|cript|elect|mall|pan|t(?:rike|rong|yle)|u[bp])?|
    t(?:able|body|[dhrt]|extarea|foot|head|itle)|
    ul?|
    var
) \b                   # Reject partial name matches
(?: [^>"']             # Any character except >, ", or '
  | "[^"]*"            # Double-quoted attribute value
  | '[^']*'            # Single-quoted attribute value
)*
>

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby


, :
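<
/?                            # Permit closing tags
(                             # Capture the tag name to backreference 1
    a (?: bbr                 # <abbr>
        | cronym              # <acronym>
        | ddress              # <address>
        | pplet               # <applet>
        | rea                 # <area>
      )?|                     # (optional group also allows <a>)
    b (?: ase (?:font)?       # <base>, <basefont>
        | do                  # <bdo>
        | ig                  # <big>
        | lockquote           # <blockquote>
        | ody                 # <body>
        | r                   # <br>
        | utton               # <button>
      )?|                     # (optional group also allows <b>)
    c (?: aption              # <caption>
        | enter               # <center>
        | ite                 # <cite>
        | o (?:de|l(?:group)?)  # <code>, <col>, <colgroup>
      ) |
    d (?: [dlt]               # <dd>, <dl>, <dt>
        | el                  # <del>
        | fn                  # <dfn>
        | i[rv]               # <dir>, <div>
      ) |
    em |                      # <em>
    f (?: ieldset             # <fieldset>
        | o (?:nt|rm)         # <font>, <form>
        | rame (?:set)?       # <frame>, <frameset>
      ) |
    h (?: [1-6r]              # <h1>, <h2>, <h3>, <h4>, <h5>, <h6>, <hr>
        | ead                 # <head>
        | tml                 # <html>
      ) |
    i (?: frame               # <iframe>
        | mg                  # <img>
        | n (?:put|s)         # <input>, <ins>
        | sindex              # <isindex>
      )?|                     # (optional group also allows <i>)
    kbd |                     # <kbd>
    l (?: abel                # <label>
        | egend               # <legend>
        | i (?:nk)?           # <li>, <link>
      ) |
    m (?: ap                  # <map>
        | e (?:nu|ta)         # <menu>, <meta>
      ) |
    no (?: frames             # <noframes>
         | script             # <noscript>
       ) |
    o (?: bject               # <object>
        | l                   # <ol>
        | p (?:tgroup|tion)   # <optgroup>, <option>
      ) |
    p (?: aram                # <param>
        | re                  # <pre>
      )?|                     # (optional group also allows <p>)
    q |                       # <q>
    s (?: amp                 # <samp>
        | cript               # <script>
        | elect               # <select>
        | mall                # <small>
        | pan                 # <span>
        | t (?:rike|rong|yle) # <strike>, <strong>, <style>
        | u[bp]               # <sub>, <sup>
      )?|                     # (optional group also allows <s>)
    t (?: able                # <table>
        | body                # <tbody>
        | [dhrt]              # <td>, <th>, <tr>, <tt>
        | extarea             # <textarea>
        | foot                # <tfoot>
        | head                # <thead>
        | itle                # <title>
      ) |
    ul? |                     # <u>, <ul>
    var                       # <var>
) \b                          # Reject partial name matches
(?: [^>"']                    # Any character except >, ", or '
  | "[^"]*"                   # Double-quoted attribute value
  | '[^']*'                   # Single-quoted attribute value
)*
>

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby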
XHTML, , , XHTML 1.0 , 14 : <applet>, <basefont>, <center>, <dir>, <font>, <frame>,
<frameset>, <iframe>, <isindex>, <menu>, <noframes>, <s>, <strike> <u>.
XHTML 1.1 XHTML 1.0
( ): <rb>, <rbc>, <rp>, <rt>, <rtc> <ruby>.
, XHTML 1.0 1.1, .

.

, ; 8.2 , .
8.4, ,
XML .

8.2. Replace <b> Tags with <strong>

<b> <strong>,
.

<b> , :
<(/?)b\b((?:[^>"']|"[^"]*"|'[^']*')*)>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


:
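<                # Tag opening
(/?)             # Capture the optional slash to backreference 1
b \b             # Tag name, followed by a word boundary
(                # Capture any attributes to backreference 2
    (?: [^>"']   # Any character except >, ", or '
      | "[^"]*"  # Double-quoted attribute value
      | '[^']*'  # Single-quoted attribute value
    )*
)
>                # Tag closing

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby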
, :
<$1strong$2>

Replacement text flavors: .NET, Java, JavaScript, Perl, PHP

<\1strong\2>

Replacement text flavors: Python, Ruby
, ,
2 :
<$1strong>

Replacement text flavors: .NET, Java, JavaScript, Perl, PHP

<\1strong>

Replacement text flavors: Python, Ruby
, , 3.15.
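
A minimal JavaScript sketch of this search-and-replace follows. The function name boldToStrong and the sample markup are assumptions made for illustration; the regex and replacement text are the ones given above.

```javascript
// Replace <b> and </b> with <strong> and </strong>, keeping any attributes.
// $1 holds the optional slash, $2 holds the attributes.
function boldToStrong(html) {
    return html.replace(
        /<(\/?)b\b((?:[^>"']|"[^"]*"|'[^']*')*)>/gi,
        "<$1strong$2>"
    );
}

var converted = boldToStrong('<b class="x">Bold</b> and <br> <B>B</B>');
console.log(converted);
```

The word boundary after the tag name is what keeps <br> untouched here, and the case-insensitive flag lets the same regex handle <B>.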

XML- .
. <b> <strong> , .

<
. -


, , /? ,
. ( , )
- .

, <b>. . ,
B.

( \b ), , ,
. <b>,
<br>, <body>, <blockquote> , b.
( \s ), ,
. .

XML XHTML ,
XML
, , . ,
-, <b-sharp>. ,

(?=[\s/>]) . ,
.

((?:[^>]|[^]*|[^]*)*) , , ,
. , ,
( ) .
, . ,
[^>] , , >, . ,
, , ,
, .



, .
. ( | ).

<b>, <i>, <em> <big>. , ,


<strong> </strong>, :
<(/?)([bi]|em|big)\b((?:[^>"']|"[^"]*"|'[^']*')*)>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
:
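<                   # Tag opening
(/?)                # Capture the optional slash to backreference 1
([bi]|em|big) \b    # Capture the tag name to backreference 2
(                   # Capture any attributes to backreference 3
    (?: [^>"']      # Any character except >, ", or '
      | "[^"]*"     # Double-quoted attribute value
      | '[^']*'     # Single-quoted attribute value
    )*
)
>                   # Tag closing

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby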
<b> <i>
[bi] ,
(|), <em> <big>. , ,
.
, .

, , , , 3. , <strong>
, ,

.


, :
<$1strong$3>

Replacement text flavors: .NET, Java, JavaScript, Perl, PHP

<\1strong\3>

Replacement text flavors: Python, Ruby
,
3 :
<$1strong>

Replacement text flavors: .NET, Java, JavaScript, Perl, PHP

<\1strong>

Replacement text flavors: Python, Ruby

.
8.1, , XML- ,
.
8.3,
, , , .

8.3. Remove All XML-Style Tags Except <em> and <strong>

,
<em> <strong>.
,
<em> <strong>, <em> <strong>,
.

( 2.16). , , , < </ .


( , 3.14),
.

Solution 1: Match tags except <em> and <strong>
</?(?!(?:em|strong)\b)[a-z](?:[^>"']|"[^"]*"|'[^']*')*>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
:
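< /?                    # Permit closing tags
(?!                     # Negative lookahead
    (?: em | strong )   # List of tags to avoid matching
    \b                  # Word boundary avoids partial name matches
)
[a-z]                   # Tag name initial character must be a-z
(?: [^>"']              # Any character except >, ", or '
  | "[^"]*"             # Double-quoted attribute value
  | '[^']*'             # Single-quoted attribute value
)*
>

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby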

Solution 2: Match tags except <em> and <strong>, and any tags that contain attributes

( \b \s*> ), <em> <strong>,


:
</?(?!(?:em|strong)\s*>)[a-z](?:[^>"']|"[^"]*"|'[^']*')*>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
:
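< /?                    # Permit closing tags
(?!                     # Negative lookahead
    (?: em | strong )   # List of tags to avoid matching
    \s* >               # Only avoid these tags when they have no attributes
)
[a-z]                   # Tag name initial character must be a-z
(?: [^>"']              # Any character except >, ", or '
  | "[^"]*"             # Double-quoted attribute value
  | '[^']*'             # Single-quoted attribute value
)*
>

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby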
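
To make the difference between the two solutions concrete, the following sketch runs both regexes over the same string and replaces every match with the empty string. The variable names and the input markup are assumptions chosen for this example.

```javascript
// Solution 1: keep every <em> and <strong> tag, with or without attributes.
var stripSolution1 = /<\/?(?!(?:em|strong)\b)[a-z](?:[^>"']|"[^"]*"|'[^']*')*>/gi;
// Solution 2: keep <em> and <strong> only when they carry no attributes.
var stripSolution2 = /<\/?(?!(?:em|strong)\s*>)[a-z](?:[^>"']|"[^"]*"|'[^']*')*>/gi;

var input = '<p>One <em style="color:red">two</em> <strong>three</strong> <i>four</i></p>';

var kept1 = input.replace(stripSolution1, "");
var kept2 = input.replace(stripSolution2, "");

console.log(kept1); // the <em> tag survives together with its style attribute
console.log(kept2); // the attributed <em ...> is removed, its </em> is not
```

Note that Solution 2 can leave an orphaned closing tag behind, as it does here, because the closing </em> has no attributes and is therefore kept.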

, XML- . ,
, (X)HTML ( ) 8.1.
, 1.
. 1 <em> <strong>
, , . 2 ,
1, <em>
<strong>, . . 8.2 , .
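Table 8.2. Example subject strings

Subject string                              Solution 1    Solution 2
<i>                                         Match         Match
</i>                                        Match         Match
<i style="font-size:500%; color:red;">      Match         Match
<em>                                        No match      No match
</em>                                       No match      No match
<em style="font-size:500%; color:red;">     No match      Match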


, (
), 2 <em> <strong>
.


( ) , , ,
, . ,
,
HTML.
HTML
(XSS),
<, > &
(&lt;, &gt; &amp;)
, , , ( ). style ,
CSS . , <, > & ,
&lt;(/?)em&gt;

<$1em> ( Python Ruby, <\1em> ).


: , <a>, <em> <strong>, . <a>, , href title,
, , <em> <strong>,
- . .
, ,
(<a>, <em> <strong>).
href title, <a>.
.
, , , :
</?(?!(?:em|strong|a(?:\s+(?:href|title)\s*=\s*(?:"[^"]*"|'[^']*'))*)\s*>)
[a-z](?:[^>"']|"[^"]*"|'[^']*')*>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
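< /?                         # Permit closing tags
(?!                          # Negative lookahead
    (?: em                   # Don't match <em>...
      | strong               # or <strong>...
      | a                    # or <a>...
        (?:                  # Only avoid matching <a> tags whose
                             #   attributes are limited to...
            \s+              #   href and/or title
            (?:href|title)
            \s*=\s*
            (?:"[^"]*"|'[^']*')  # Quoted attribute value
        )*
    )
    \s* >                    # Only avoid these tags when limited to the above
)
[a-z]                        # Tag name initial character must be a-z
(?: [^>"']                   # Any character except >, ", or '
  | "[^"]*"                  # Double-quoted attribute value
  | '[^']*'                  # Single-quoted attribute value
)*
>

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby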
, . , , , , 3.11 3.16, , ,
( ,
- ).

.
8.1, , XML- ,
.
8.2, , .

8.4. Match XML Names

, XML ( ). XML ,
, ,


, .
, , , , , , .
, . XML.
, , , , XML,
.
:
thing

_thing_2_

:-

fantastic4:the.thing

,
, Latin, ,
.
, 09.

, :
thing!

thing with spaces

.thing.with.a.dot.in.front
-thingamajig
2nd_thing


XML , , . XML 1.0, ( ), XML 1.1
1.0, . XML 1.1 , 1.0,
. , . , -


, .
, XML 1.0,
XML 1.0. XML 1.1, XML 1.0. W3C 2008 , XML 1.1.

,
( ^...$ \A...\Z ), . ,
XML,
. 2.5.

XML 1.0 names (approximate)
\A[:_\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nl}][:_\-.\p{L}\p{M}\p{Nd}\p{Nl}]*\Z

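Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Ruby 1.9
PCRE must be compiled with UTF-8 support for the Unicode properties ( \p{...} ) to work. In PHP, turn on UTF-8 support with the /u pattern modifier.

Unicode properties are not supported by JavaScript, Python, or Ruby 1.8.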


XML 1.1, ,
. . 553 , XML 1.1, ,
.

XML 1.1 names (exact)
,
.
, . FF ( 255) <\u>
\x{...} .

\A[:_A-Za-z\xC0-\xD6\xD8-\xF6\xF8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C
\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]
[:_\-.A-Za-z0-9\xB7\xC0-\xD6\xD8-\xF6\xF8-\u036F\u0370-\u037D\u037F-\u1FFF
\u200C\u200D\u203F\u2040\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-
\uFDCF\uFDF0-\uFFFD]*\Z

Regex options: None
Regex flavors: .NET, Java, Python, Ruby 1.9
^[:_A-Za-z\xC0-\xD6\xD8-\xF6\xF8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C
\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]
[:_\-.A-Za-z0-9\xB7\xC0-\xD6\xD8-\xF6\xF8-\u036F\u0370-\u037D\u037F-\u1FFF
\u200C\u200D\u203F\u2040\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-
\uFDCF\uFDF0-\uFFFD]*$

Regex options: None ("^ and $ match at line breaks" must not be set)
Regex flavors: .NET, Java, JavaScript, Python
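
The JavaScript variant can be tried out with a small validation helper. In this sketch the pattern is assembled from string pieces only because a regex literal cannot span lines; the names xml11Name and isXml11Name are assumptions made for this example, and, as with the regex above, characters beyond U+FFFF are not covered.

```javascript
// XML 1.1 name regex (JavaScript variant using ^ and $), built from strings
// so that the long character classes can be wrapped across lines.
var xml11Name = new RegExp(
    "^[:_A-Za-z\\xC0-\\xD6\\xD8-\\xF6\\xF8-\\u02FF\\u0370-\\u037D\\u037F-\\u1FFF\\u200C" +
    "\\u200D\\u2070-\\u218F\\u2C00-\\u2FEF\\u3001-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFFD]" +
    "[:_\\-.A-Za-z0-9\\xB7\\xC0-\\xD6\\xD8-\\xF6\\xF8-\\u036F\\u0370-\\u037D\\u037F-\\u1FFF" +
    "\\u200C\\u200D\\u203F\\u2040\\u2070-\\u218F\\u2C00-\\u2FEF\\u3001-\\uD7FF\\uF900-" +
    "\\uFDCF\\uFDF0-\\uFFFD]*$"
);

// Hypothetical helper: true when the whole string is a valid XML 1.1 name.
function isXml11Name(name) {
    return xml11Name.test(name);
}

console.log(isXml11Name("fantastic4:the.thing")); // a colon and dots are allowed
console.log(isXml11Name("2nd_thing"));            // a digit may not come first
```

The regex must not be compiled with the multiline flag, since ^ and $ are meant to anchor the whole string here.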
\A[:_A-Za-z\xC0-\xD6\xD8-\xF6\xF8-\x{2FF}\x{370}-\x{37D}\x{37F}-\x{1FFF}
\x{200C}\x{200D}\x{2070}-\x{218F}\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}
\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}][:_\-.A-Za-z0-9\xB7\xC0-\xD6\xD8-\xF6
\xF8-\x{36F}\x{370}-\x{37D}\x{37F}-\x{1FFF}\x{200C}\x{200D}\x{203F}
\x{2040}\x{2070}-\x{218F}\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-
\x{FDCF}\x{FDF0}-\x{FFFD}]*\Z

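Regex options: None
Regex flavors: PCRE, Perl
PCRE must be compiled with UTF-8 support for the \x{...} metasequences to work with values greater than FF hexadecimal. In PHP, turn on UTF-8 support with the /u pattern modifier.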

Ruby 1.8 , . 555


, .
, ,
, XML 1.1,
,
16- ( 0x0 0xFFFF).
XML 1.1 917503 , 0x10000 0xEFFFF. PCRE, Perl Python
0xFFFF,
( ,
- ).
, PCRE Perl \x{10000}-


\x{EFFFF} , Python \U00010000-\U000EFFFF ( , U ). XML 1.1 , XML 1.0.

XML, ,
, , .
, .
, , .

XML 1.0
, ,
XML 1.0
. (:), (_) :
(Ll)

(Lu)

(Lt)
(Lo)
(Nl)


(-), (.) :
(M), : ,
(Mn), , (Mc), (Me)
(Lm)
(Nd)


. :
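\A                                   # Beginning of the string
[:_\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nl}]   # Initial name character
[:_\-.\p{L}\p{M}\p{Nd}\p{Nl}]*       # Name characters (zero or more)
\Z                                   # End of the string

Regex options: Free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Ruby 1.9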
, PCRE
UTF-8. PHP UTF-8 /u.
,
(Ll, Lu, Lt, Lo Lm) \p{L} .

, . . -,
XML 1.0 ( ) . -,
XML 1.0 2.0,
1996 .
,
XML 1.0.
, , 2.0,
, . ,
XML 1.0 (http://
www.w3.org/TR/2006/REC-xml-20060816), 2.3 Common Syntactic
Constructs B Character Classes.

.
Perl PCRE (Ll), (Lu) (Lt) , (L&). , \p{...} , .
, \pL\pM \p{L}\p{M} :

\A[:_\p{L&}\p{Lo}\p{Nl}][:_\-.\pL\pM\p{Nd}\p{Nl}]*\Z

Regex options: None
Regex flavors: PCRE, Perl
.NET , Lm
L ,
:


\A[:_\p{L}\p{Nl}-[\p{Lm}]][:_\-.\p{L}\p{M}\p{Nd}\p{Nl}]*\Z

Regex options: None
Regex flavors: .NET
Java, , PCRE Perl, .
, (
) Lm L:
\A[:_\pL\p{Nl}&&[^\p{Lm}]][:_\-.\pL\pM\p{Nd}\p{Nl}]*\Z

Regex options: None
Regex flavors: Java
JavaScript, Python Ruby 1.8 . Ruby 1.9 ,
, .

XML 1.1
XML 1.0 2.0. , , (, , ). XML
, XML 1.1
XML 1.0. , ,
, , , 2.0, , .
, , ,
, , .
XML 1.1
, XML 1.0 .

(, 8.1), , XML,
, -


, .
, . ,
(
), .

, ,
.

,
-,
:
[^\d\s/<=>][^\s/<=>]*

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, .
, .
, +
:

(?!\d)[^\s/<=>]+

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

.
(John Cowan),
XML 1.1, , XML 1.1
, : http://recycledknowledge.blogspot.
com/2008/02/which-characters-are-excluded-in-xml.html.
Background to Changes in XML 1.0, 5th Edition
http://www.w3.org/XML/2008/02/xml10_5th_edition_background.html,
,
XML 1.1, XML 1.0, .


8.5. Convert Plain Text to HTML by Adding <p> and <br> Tags

, , ,
HTML -. , ,
<p>...</p>. , <br>.

, .
.

Step 1: Replace HTML special characters with character entity references


HTML, HTML &, < > (. 8.3).
-.
Table 8.3. HTML special character substitutions

Search for    Replace with
&             &amp;
<             &lt;
>             &gt;

(&),
, .

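Step 2: Replace all line breaks with <br>
Search for:
\r\n?|\n

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
\R

Regex options: None
Regex flavors: PCRE 7, Perl 5.10
Replace with:
<br>

Replacement text flavors: .NET, Java, JavaScript, Perl, PHP, Python, Ruby

Step 3: Replace double <br> tags with </p><p>
Search for:
<br>\s*<br>

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Replace with:
</p><p>

Replacement text flavors: .NET, Java, JavaScript, Perl, PHP, Python, Ruby

Step 4: Wrap the entire string with <p>...</p>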

.

JavaScript
, JavaScript
html_from_plaintext. , , ,
HTML:
function html_from_plaintext (subject) {
    // Step 1 (plain text searches)
    subject = subject.replace(/&/g, "&amp;").
                      replace(/</g, "&lt;").
                      replace(/>/g, "&gt;");
    // Step 2
    subject = subject.replace(/\r\n?|\n/g, "<br>");
    // Step 3
    subject = subject.replace(/<br>\s*<br>/g, "</p><p>");
    // Step 4
    subject = "<p>" + subject + "</p>";
    return subject;
}

/*
html_from_plaintext("Test.")            -> "<p>Test.</p>"
html_from_plaintext("Test.\n")          -> "<p>Test.<br></p>"
html_from_plaintext("Test.\n\n")        -> "<p>Test.</p><p></p>"
html_from_plaintext("Test1.\nTest2.")   -> "<p>Test1.<br>Test2.</p>"
html_from_plaintext("Test1.\n\nTest2.") -> "<p>Test1.</p><p>Test2.</p>"
html_from_plaintext("< AT&T >")         -> "<p>&lt; AT&amp;T &gt;</p>"
*/


. , JavaScript, /g, , replace , . \n, ,
JavaScript ( 0x0A
ASCII) .

Step 1: Replace HTML special characters with character entity references

,
( . 8.3, ,
). JavaScript
, ,
.

Step 2: Replace all line breaks with <br>
Windows/MS-DOS (CRLF), UNIX/Linux/OS X (LF) Mac OS (CR) \r\n?|\n .
Perl 5.10 PCRE 7 \R ( R) , .

<br> ,


</p><p> .
HTML , .
XHTML,
<br> <br/> .
, , , .

Step 3: Replace double <br> tags with </p><p>
, ,
,
</p>, <p>. ( ), . 2 ( <br>),

. . HTML.
XHTML
<br/>, <br>,
, , <br/>\s*<br/> .

Step 4: Wrap the entire string with <p>...</p>


3 . <p>
</p> . , 1 100.

4.10, \R
Perl PCRE , , ,
\R .

8.6. Find a Specific Attribute in XML-Style Tags

(X)HTML XML , , id.


.
, , , :
id.

<div> id.

id, my-id.

, my-class
class ( ).

id ( )
, ,
( ) :
<[^>]+\sid\b[^>]*>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

:
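<            # Tag opening
[^>]+        # Tag name, attributes, and so on
\s id \b     # The target attribute name, as a whole word
[^>]*        # The remainder of the tag, including the id attribute's value
>            # Tag closing

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby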

id ( )
, , >, , id
:
<(?:[^>"']|"[^"]*"|'[^']*')+?\sid\s*=\s*("[^"]*"|'[^']*')
(?:[^>"']|"[^"]*"|'[^']*')*>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


:
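<                          # Tag opening
(?: [^>"']                 # Tag name, attributes, and so on:
  | "[^"]*"                #   double-quoted attribute values
  | '[^']*'                #   single-quoted attribute values
)+?                        # As few times as possible (lazy)
\s id                      # The target attribute name, preceded by whitespace
\s* = \s*                  # Attribute name-value delimiter
( "[^"]*" | '[^']*' )      # Capture the attribute value to backreference 1
(?: [^>"']                 # Any remaining characters:
  | "[^"]*"                #   double-quoted attribute values
  | '[^']*'                #   single-quoted attribute values
)*
>                          # Tag closing

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby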
id 1. .
,
\s*=\
s*([^]*|[^]*) \b . , id,
.

<div> id
, . < div\s . \s ( ) , div. ,
, ,
, , ,
(id). +?\sid *?\
bid , , id
( ) :

<div\s(?:[^>"']|"[^"]*"|'[^']*')*?\bid\s*=\s*("[^"]*"|'[^']*')
(?:[^>"']|"[^"]*"|'[^']*')*>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


:
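<div \s                    # Tag name and the following whitespace character
(?: [^>"']                 # Attributes and so on:
  | "[^"]*"                #   double-quoted attribute values
  | '[^']*'                #   single-quoted attribute values
)*?                        # As few times as possible (lazy)
\b id                      # The target attribute name, as a whole word
\s* = \s*                  # Attribute name-value delimiter
( "[^"]*" | '[^']*' )      # Capture the attribute value to backreference 1
(?: [^>"']                 # Any remaining characters:
  | "[^"]*"                #   double-quoted attribute values
  | '[^']*'                #   single-quoted attribute values
)*
>                          # Tag closing

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby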

id, my-id
,
id ( ) . 561
id, . , ([^]*|[^]*) (?:my-id|my-id) :

<(?:[^>"']|"[^"]*"|'[^']*')+?\sid\s*=\s*(?:"my-id"|'my-id')
(?:[^>"']|"[^"]*"|'[^']*')*>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
:
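<                          # Tag opening
(?: [^>"']                 # Tag name, attributes, and so on:
  | "[^"]*"                #   double-quoted attribute values
  | '[^']*'                #   single-quoted attribute values
)+?                        # As few times as possible (lazy)
\s id                      # The target attribute name, preceded by whitespace
\s* = \s*                  # Attribute name-value delimiter
(?: "my-id"                # The target attribute value in double quotes
  | 'my-id' )              #   or in single quotes
(?: [^>"']                 # Any remaining characters:
  | "[^"]*"                #   double-quoted attribute values
  | '[^']*'                #   single-quoted attribute values
)*
>                          # Tag closing

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby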

(?:"my-id"|'my-id') ,
my-id ( ), (["'])my-id\1 . , .

, my-class class

, , , , . ,
. ,
class (
) , , my-class.
:
<(?:[^>"']|"[^"]*"|'[^']*')+>

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
8.1 XML- . ,
,
, .

, ,
3.13, class , :
^(?:[^>"']|"[^"]*"|'[^']*')+?\sclass\s*=\s*("[^"]*"|'[^']*')

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
class 1. , class, ^(?:[^>]|[^]*|[^]*)+? ,
, class .
, class. , ,


, .

. ,

( )
, .
, , 1 ,
:
(?:^|\s)my-class(?:\s|$)

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
, myclass , . , ,
.
\bmy-class\b
not-my-class.
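
The multi-step approach for the class attribute can be sketched in JavaScript as three regexes applied in sequence: find each tag, pull out the class value, then look for the class name as a whitespace-delimited word. The function name tagsWithClass and the sample markup are assumptions made for this illustration.

```javascript
// Step 1: match any XML-style tag.
var tagRe = /<(?:[^>"']|"[^"]*"|'[^']*')+>/g;
// Step 2: within one tag, capture the quoted class attribute value.
var classAttrRe = /^(?:[^>"']|"[^"]*"|'[^']*')+?\sclass\s*=\s*("[^"]*"|'[^']*')/i;
// Step 3: within the value, find my-class delimited by whitespace or the ends.
var classNameRe = /(?:^|\s)my-class(?:\s|$)/;

// Hypothetical helper: return the tags whose class attribute lists my-class.
function tagsWithClass(html) {
    var found = [], tags = html.match(tagRe) || [];
    for (var i = 0; i < tags.length; i++) {
        var attr = classAttrRe.exec(tags[i]);
        if (!attr) continue;                 // tag has no class attribute
        var value = attr[1].slice(1, -1);    // strip the surrounding quotes
        if (classNameRe.test(value)) found.push(tags[i]);
    }
    return found;
}

var html = '<div class="box my-class">a</div>' +
           '<p class="not-my-class">b</p>' +
           '<span id="my-class">c</span>';
var found = tagsWithClass(html);
console.log(found);
```

Splitting the work this way keeps each regex simple; a single regex that did all three jobs would be considerably harder to read and to get right.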

, , .
, ,
, . ,
, XPath,
SAX DOM.
, , , . ,
, , .

.
8.7, , .


8.7. Add a cellspacing Attribute to <table> Tags That Do Not Already Include It

(X)HTML cellspacing=0
, cellspacing.
XML-
, .
, .

1:
<table>, cellspacing,
, :
<table\b(?![^>]*?\scellspacing\b)([^>]*)>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
:
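<table \b              # Match "<table", followed by a word boundary
(?!                    # Assert that the regex below cannot be matched here
    [^>]               #   Match any character except ">"...
      *?               #     zero or more times, as few as possible (lazy)
    \s cellspacing \b  #   Match "cellspacing" as a complete word
)
(                      # Capture the regex below to backreference 1
    [^>]               #   Match any character except ">"...
      *                #     zero or more times, as many as possible (greedy)
)
>                      # Match a literal ">" to end the tag

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby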

2:
[^>]
(?:[^>]|[^]*|[^]*) . , -,
, >, -,


, cellspacing .
:
<table\b(?!(?:[^>"']|"[^"]*"|'[^']*')*?\scellspacing\b)
((?:[^>"']|"[^"]*"|'[^']*')*)>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
:
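<table \b              # Match "<table", followed by a word boundary
(?!                    # Assert that the regex below cannot be matched here
    (?: [^>"']         #   Match any character except >, ", or '
      | "[^"]*"        #   or a double-quoted value
      | '[^']*'        #   or a single-quoted value
    )*?                #   zero or more times, as few as possible (lazy)
    \s cellspacing \b  #   Match "cellspacing" as a complete word
)
(                      # Capture the regex below to backreference 1
    (?: [^>"']         #   Match any character except >, ", or '
      | "[^"]*"        #   or a double-quoted value
      | '[^']*'        #   or a single-quoted value
    )*                 #   zero or more times, as many as possible (greedy)
)
>                      # Match a literal ">" to end the tag

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby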


, , ,
<table> (
) 1. , , cellspacing. :
<table cellspacing="0"$1>

Replacement text flavors: .NET, Java, JavaScript, Perl, PHP

<table cellspacing="0"\1>

Replacement text flavors: Python, Ruby
3.15 , , .
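
Applied from JavaScript, the second (more reliable) regex and its replacement text could look like the sketch below. The function name addCellspacing and the test markup are assumptions made for this example.

```javascript
// Insert cellspacing="0" into <table> tags that lack a cellspacing attribute.
// $1 re-inserts whatever attributes the tag already had.
function addCellspacing(html) {
    return html.replace(
        /<table\b(?!(?:[^>"']|"[^"]*"|'[^']*')*?\scellspacing\b)((?:[^>"']|"[^"]*"|'[^']*')*)>/gi,
        '<table cellspacing="0"$1>'
    );
}

var before = '<table><tr></tr></table><table border="1"><table cellspacing="2">';
var after = addCellspacing(before);
console.log(after);
```

The negative lookahead cannot look past the end of the current tag, so an existing cellspacing attribute on a later table does not stop an earlier table from being updated.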


, ,
. , ,
.

, <table\b , <table, ( \b ). , table. , (X)HTML ( , , tablet, tableau


tablespoon), , , ,
.

, (?![^>]*?\scellspacing\b) ,
. , , , , cellspacing - . cellspacing
, ,
.

,
[^>]*? , , , ,
( >). ( \scellspacing\b ) cellspacing, .
( \s ),
.
, cellspacing ,
.

, : ([^>]*) .
, . , ,
. ,
.


, >
.
2, , , , ,
[^>] (?:[^>]|[^]*|[^]*) .
.

, <table> , cellspacing , , ( 1).

.
8.6, , .

8.8. Remove XML-Style Comments

(X)HTML XML. , -,
,
, .

. , :
<!--.*?-->

Regex options: Dot matches line breaks
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
. , - JavaScript

, , . JavaScript:

<!--[\s\S]*?-->

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
,
( ). 3.14 ,
.
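
In JavaScript, removing the comments comes down to a single replace call with an empty replacement string. The function name removeComments below is an assumption made for this sketch.

```javascript
// Remove every XML-style comment. [\s\S] stands in for a dot that matches
// line breaks, which JavaScript regexes of this era do not offer.
function removeComments(markup) {
    return markup.replace(/<!--[\s\S]*?-->/g, "");
}

var cleaned = removeComments("a<!-- one\n two -->b<!--x-->c");
console.log(cleaned);
```

The lazy quantifier matters: a greedy [\s\S]* would run from the first <!-- to the last --> and delete the text between separate comments as well.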



<!-- --> .
( , ), . .*?
[\s\S]*? .

, .
JavaScript [\s\
S] . .
\s , \S .
.

*?
,
. , -->, ,
,
, -->. ( , , 2.13.) , XML- . ,
() -->.


- , HTML <script> <style> .
, , . ,
(X)HTML


JavaScript CSS. ,
<textarea>, CDATA .
, . , ,
(X)HTML XML, . ,
, .
, , , , , , , , .
, .
, .
(X)HTML XML . 534 XML- .
. , 3.18 , , , , ( ):
<(script|style|textarea|xmp)\b(?:[^>"']|"[^"]*"|'[^']*')*?
(?:/>|>.*?</\1\s*>)|<[a-z](?:[^>"']|"[^"]*"|'[^']*')*>|<!\[CDATA\[.*?]]>

Regex options: Case insensitive, dot matches line breaks
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
,
:
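# Special element: tag and its content
<( script | style | textarea | xmp )\b
    (?: [^>"']   # Any character except >, ", or '
      | "[^"]*"  # Double-quoted attribute value
      | '[^']*'  # Single-quoted attribute value
    )*?
(?: # Singleton tag
    />
  | # Otherwise, include the element's content and matching closing tag
    > .*? </\1\s*>
)
|
# Standard element: tag only
<[a-z]           # Tag name initial character
    (?: [^>"']   # Any character except >, ", or '
      | "[^"]*"  # Double-quoted attribute value
      | '[^']*'  # Single-quoted attribute value
    )*
>
|
# CDATA section
<!\[CDATA\[ .*? ]]>

Regex options: Case insensitive, dot matches line breaks, free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby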
JavaScript, :
:
<(script|style|textarea|xmp)\b(?:[^>"']|"[^"]*"|'[^']*')*?
(?:/>|>[\s\S]*?</\1\s*>)|<[a-z](?:[^>"']|"[^"]*"|'[^']*')*>|<!\[CDATA\[
[\s\S]*?]]>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

XML-
, (X)HTML XML,
, !-- -->. :

, .
, <!-- com--ment --> , , .

,
. , <!-- comment --->
, <!---> .

-- > . , <!-- comment -- > .


:

<!--[^-]*(?:-[^-]+)*--\s*>

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
,
, <!---->. , , . , , . , ( 2.13).
, , ,
[^-] ,
, ( , <!--(?:-?[^-]+)*-\s*> ). , 2.15.

, .
,
, . , (?:-?[^-]+)* : , , , .


. , , , * . --> ( , , ), ,
. , . ,
. , (?:-[^-]+)*
, , + ,
.


,
, . , JavaScript
Python:
<!--(?>-?[^-]+)*--\s*>

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Ruby
( ) 2.14.

C-
,
. C
/* */, // .
:
/\*[\s\S]*?\*/|//.*

Regex options: None ("dot matches line breaks" must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

.
8.9, , , XML- .

8.9. Find Words Within XML-Style Comments

TODO ( ) (X)HTML XML. , :


This TODO is not within a comment, but the next one is. <!-- TODO:
Come up with a cooler comment for this example. -->

, , ,
. , -


. 41, ,
, . ,
, , ,
.
grep, ,

1.
,
.
.


, :
TODO.
:
<!--.*?-->

Regex options: Dot matches line breaks
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
JavaScript ,
, , :
<!--[\s\S]*?-->

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
,
, , TODO . , TODO, :

\bTODO\b

PowerGREP,
1, , .


Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
3.13 ,


( 2.16) ,
. , ,
TODO -->, . , ,
, <!-- -->:
\bTODO\b(?=(?:(?!<!--).)*?-->)

Regex options: Dot matches line breaks
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
JavaScript , [\s\S] :

\bTODO\b(?=(?:(?!<!--)[\s\S])*?-->)

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
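
The single-regex approach can be checked against the example string from the problem statement. In this sketch the variable names are assumptions; the regex is the JavaScript-compatible version shown above, used with the global flag so that every match position is collected.

```javascript
// Match TODO only when a --> follows without an intervening <!--.
var todoInComment = /\bTODO\b(?=(?:(?!<!--)[\s\S])*?-->)/g;

var subject = "This TODO is not within a comment, but the next one is. <!-- TODO:\n" +
              "Come up with a cooler comment for this example. -->";

// Collect the index of each match.
var hits = [], m;
while ((m = todoInComment.exec(subject)) !== null) {
    hits.push(m.index);
}
console.log(hits); // only the TODO inside the comment is reported
```

The first TODO is rejected because the lookahead reaches <!-- before it finds any -->, which is exactly the situation the nested negative lookahead is there to detect.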


3.13 , . . , , ,
\bTODO\b .

, , *? ,
, . 2.13, --> ( ),
.



.
, . , ,

:
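\b TODO \b          # Match the characters "TODO", as a complete word
(?=                 # Followed by:
    (?:             #   Group but don't capture...
        (?! <!-- )  #     Not followed by "<!--"
        .           #     Match any single character
    )*?             #   Repeat zero or more times, as few as possible (lazy)
    -->             #   Match the characters "-->"
)

Regex options: Dot matches line breaks, free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby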


JavaScript - ,
.
,
,
. , TODO
-->
!--.

, , : . -
, .
,
TODO, -->. \bTODO
\b(?=.*?-->) ( ),
<!--TODO-->.
.*? ,
,
TODO -->, - . *?


, , , , -->.

\bTODO(?=.*?-->)\b , \b , - .
, , ( . 111).
, .
, , ,
, ,
TODO .

, , \bTODO\b(?=.*?-->) , , TODO
<!-- separate comment -->? TODO, -->, ,
. , , , ,
!--, . ,
[^<!-] , <, ! -, <!--.

,
. (?!<!--).
, , . , (?:(?!<!--).) , *? , .

, , , : \bTODO\
b(?=(?:(?!<!--).)*?-->) . JavaScript, , : \
bTODO\b(?=(?:(?!<!--)[\s\S])*?-->) .

, TODO -->
<!-- , : <!-- --> .
, :


, ,
,
.
,
( ).

: , TODO ,

, .NET.
.NET
:
(?<=<!--(?:(?!-->).)*?)\bTODO\b(?=(?:(?!<!--).)*?-->)

Regex options: Dot matches line breaks
Regex flavors: .NET
, .NET,
, , , . , <!--, , , -->.
, TODO. .

.
8.8, ,
XML- .

8.10. Change the Delimiter Used in CSV Files

, CSV,
. , , , .


CSV -, . , ( )

. , , ,
, 2,
1.

CSV,
, ,
(Comma-Separated Values, CSV).
(,|\r?\n|^)([^",\r\n]+|"(?:[^"]|"")*")?

Regex options: None ("^ and $ match at line breaks" must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

::
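( , | \r?\n | ^ )     # Capturing group 1 matches a field delimiter
                      #   or the beginning of the string
(                     # Capturing group 2 matches a single field:
    [^",\r\n]+        #   a non-quoted field
  |                   #  or...
    " (?:[^"]|"")* "  #   a quoted field (may contain escaped double quotes)
)?                    # The group is optional because fields may be empty

Regex options: Free-spacing ("^ and $ match at line breaks" must not be set)
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby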

3.11, CSV 1 .
.
.
, (
, ). CSV
2 , -


. , , , 1.

JavaScript
-,

Replace () . ( Input ()) , -
( Output ()).
CSV, , . ,
.html -:
<html>
<head>
<title>Change CSV delimiters from commas to tabs</title>
</head>
<body>
<p>Input:</p>
<textarea id="input" rows="5" cols="75"></textarea>
<p><input type="button" value="Replace" onclick="commas_to_tabs()"></p>
<p>Output:</p>
<textarea id="output" rows="5" cols="75"></textarea>
<script>
function commas_to_tabs () {
    var input = document.getElementById("input"),
        output = document.getElementById("output"),
        regex = /(,|\r?\n|^)([^",\r\n]+|"(?:[^"]|"")*")?/g,
        result = "",
        match;
    while (match = regex.exec(input.value)) {
        // Check the value of backreference 1
        if (match[1] == ",") {
            // Add a tab (in place of the matched comma) and backreference 2
            // to the result. If backreference 2 is undefined (because the
            // optional second capturing group did not participate in the
            // match), use an empty string instead.
            result += "\t" + (match[2] || "");
        } else {
            // Add the entire match to the result
            result += match[0];
        }
        // Prevent some browsers from getting stuck in an infinite loop
        if (match.index == regex.lastIndex) regex.lastIndex++;
    }
    output.value = result;
}
</script>
</body>
</html>

, , CSV ( , ) .
.

, (,|\r?\n|^) ,
, .
, CSV. , . , ,
^ . ,
, , (, ).

, , , . , , . ,
, , 1 - . , , CSV , , ?
, ,
, -


, .NET (
. 113).
.NET , , .
, ,
2. CSV, . ,
.
, 2 , | . ,
[^,\r\n]+ , ,
( + ), , . , , .

2, (?:[^]|)* ,
, . ,
, , , () , .

* , , .
,
CSV. , , , .

.
8.11, ,
CSV, .


8.11. Extract CSV Fields from a Specific Column

CSV.

8.10 CSV. , ( ) .
( )
CSV , .
,
CSV . , ,

.
CSV,
, ,
(Comma-Separated Values, CSV)
. 519.
(,|\r?\n|^)([^",\r\n]+|"(?:[^"]|"")*")?

Regex options: None ("^ and $ match at line breaks" must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
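( , | \r?\n | ^ )     # Capturing group 1 matches a field delimiter
                      #   or the beginning of the string
(                     # Capturing group 2 matches a single field:
    [^",\r\n]+        #   a non-quoted field
  |                   #  or...
    " (?:[^"]|"")* "  #   a quoted field (may contain escaped double quotes)
)?                    # The group is optional because fields may be empty

Regex options: Free-spacing ("^ and $ match at line breaks" must not be set)
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby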


, 8.10, CSV. , CSV.

JavaScript
-,
Extract Column 3 ( 3) . Input ()
, , , ,
(
) Output (). ,
.html -:
<html>
<head>
<title>Extract the third column from a CSV string</title>
</head>
<body>
<p>Input:</p>
<textarea id="input" rows="5" cols="75"></textarea>
<p><input type="button" value="Extract Column 3"
    onclick="display_csv_column(2)"></p>
<p>Output:</p>
<textarea id="output" rows="5" cols="75"></textarea>
<script>
function display_csv_column (index) {
    var input = document.getElementById("input"),
        output = document.getElementById("output"),
        column_fields = get_csv_column(input.value, index);
    if (column_fields.length > 0) {
        // Show each record on its own line, separated by a line feed (\n)
        output.value = column_fields.join("\n");
    } else {
        output.value = "[No data found to extract]";
    }
}

// Return an array of the CSV fields found at the given zero-based index
function get_csv_column (csv, index) {
    var regex = /(,|\r?\n|^)([^",\r\n]+|"(?:[^"]|"")*")?/g,
        result = [],
        column_index = 0,
        match;
    while (match = regex.exec(csv)) {
        // Check the value of backreference 1. If it is a comma,
        // increment column_index. Otherwise, reset it to zero.
        if (match[1] == ",") {
            column_index++;
        } else {
            column_index = 0;
        }
        if (column_index == index) {
            // Add the field (backreference 2) to the end of the result array
            result.push(match[2]);
        }
        // Prevent some browsers from getting stuck in an infinite loop
        if (match.index == regex.lastIndex) regex.lastIndex++;
    }
    return result;
}
</script>
</body>
</html>


8.10, , .
JavaScript,

CSV .
get_csv_column()
. 1. ,
, ,
column_index, . 1
( ), ,
, column_index .


, column_index
, . ,
, 2 (, ). , get_csv_column() ,
(, ).
,
(\n).
, , , . get_csv_column() ,
, (index).


CSV ,
. , (
).
,
.
,
, .

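Match a CSV record and capture the field in column 1 to backreference 1
^([^",\r\n]+|"(?:[^"]|"")*")?(?:,(?:[^",\r\n]+|"(?:[^"]|"")*")?)*

Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby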

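Match a CSV record and capture the field in column 2 to backreference 1
^(?:[^",\r\n]+|"(?:[^"]|"")*")?,([^",\r\n]+|"(?:[^"]|"")*")?
(?:,(?:[^",\r\n]+|"(?:[^"]|"")*")?)*

Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby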


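Match a CSV record and capture the field in column 3 to backreference 1
^(?:[^",\r\n]+|"(?:[^"]|"")*")?(?:,(?:[^",\r\n]+|"(?:[^"]|"")*")?){1},
([^",\r\n]+|"(?:[^"]|"")*")?(?:,(?:[^",\r\n]+|"(?:[^"]|"")*")?)*

Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby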

{1} ,
3. , {2} , 4, {3} 5 . 3, {1}
, .
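
As a quick check of the column 3 regex, the sketch below applies it record by record in JavaScript with the global and multiline flags. The names thirdColumnRegex and thirdColumn and the sample data are assumptions made for this illustration; the pattern is assembled from two string pieces only to keep the lines short.

```javascript
// One CSV field: unquoted, or quoted with "" as an escaped double quote.
var field = '(?:[^",\\r\\n]+|"(?:[^"]|"")*")?';
// Column 3 regex from above: field 1, one further field ({1}), then the
// captured field, then any remaining fields of the record.
var thirdColumnRegex = new RegExp(
    "^" + field + "(?:," + field + "){1}," +
    '([^",\\r\\n]+|"(?:[^"]|"")*")?' + "(?:," + field + ")*",
    "gm"
);

// Hypothetical helper: return the third field of every record that has one.
function thirdColumn(csv) {
    var values = [], match;
    thirdColumnRegex.lastIndex = 0;
    while ((match = thirdColumnRegex.exec(csv)) !== null) {
        values.push(match[1] || ""); // group 1 is undefined for an empty field
    }
    return values;
}

var cols = thirdColumn('a,b,c,d\n1,"2,5",3\nx,y');
console.log(cols); // the record "x,y" has no third column and is skipped
```

Changing {1} to {2} in the pattern would capture column 4 instead, following the rule described above.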



( 1).
1 .
$1

Replacement text flavors: .NET, Java, JavaScript, Perl, PHP

\1

Replacement text flavors: Python, Ruby

8.12. Match INI Section Headers

INI.

. INI (,
[Section1]). :
^\[[^\]\r\n]+]

Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby


,
:

^ $
^
.

\[ [. [

, .
[^\]\r\n] ,
, ], (\r)
(\n). +,
,
....

] ] . ,
.

,
. Section1:
^\[Section1]

Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
[Section1]
, . ( ) , , (, Item1=[Value1]).

.
8.13, , INI.
8.14, ,
-.

8.13. Match INI Section Blocks

INI (
- ),


INI .

8.12 , INI. ,
, , ,
[ (
):
^\[[^\]\r\n]+](?:\r?\n(?:[^[\r\n].*)?)*

Regex options: ^ and $ match at line breaks ("dot matches line breaks" must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
:
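^ \[ [^\]\r\n]+ ]    # Match a section header
(?:                  # Followed by the rest of the section...
    \r?\n            #   Match a line break character sequence
    (?:              #   After each line starts, match...
        [^[\r\n]     #     any character except "[" or a line break character
        .*           #     and the rest of the line
    )?               #   The group is optional to allow matching empty lines
)*                   # Continue until the end of the section

Regex options: ^ and $ match at line breaks, free-spacing ("dot matches line breaks" must not be set)
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby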

INI ^\[[^\]\r\n]+]
, , [. :

[Section1]
Item1=Value1
Item2=[Value2]
; [SectionA]
; The SectionA header has been commented out
ItemA=ValueA ; ItemA is not commented out, and is part of Section1

[Section2]
Item3=Value3
Item4 = Value4

.
[Section2],
. Section2 .
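
A short JavaScript sketch shows how the section block regex splits a file of this kind. The variable names and the abbreviated INI text are assumptions made for this example; the multiline flag supplies the "^ and $ match at line breaks" option, and the dot does not match line breaks by default in JavaScript.

```javascript
// Match a section header plus all following lines that do not start with "[".
var sectionBlock = /^\[[^\]\r\n]+](?:\r?\n(?:[^[\r\n].*)?)*/gm;

var ini = "[Section1]\nItem1=Value1\nItem2=[Value2]\n; [SectionA]\n" +
          "ItemA=ValueA\n\n[Section2]\nItem3=Value3\nItem4 = Value4";

var blocks = ini.match(sectionBlock);
console.log(blocks.length); // one block per uncommented section header
console.log(blocks[1]);
```

The commented-out "; [SectionA]" line stays inside the first block because it does not begin with "[", while "Item2=[Value2]" is safe because its bracket is not at the start of the line.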

.
8.12, ,
INI.
8.14, , .

8.14. Match INI Name-Value Pairs

- INI
(, Item1=Value1), . 1
(Item1), 2 (Value1).

,
( ):
^([^=;\r\n]+)=([^;\r\n]*)

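Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
^                # Start of a line
( [^=;\r\n]+ )   # Capture the name to backreference 1
=                # Name-value delimiter
( [^;\r\n]* )    # Capture the value to backreference 2

Regex options: ^ and $ match at line breaks, free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby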
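
The following JavaScript sketch collects the name-value pairs into an object. The function name parsePairs and the sample text are assumptions made for this example, and trimming the captured name and value is an extra step that the regex itself does not perform.

```javascript
// Hypothetical helper: return an object mapping each INI name to its value.
function parsePairs(ini) {
    // Group 1 is the name, group 2 the value; the m flag makes ^ match at
    // the start of every line.
    var pairRegex = /^([^=;\r\n]+)=([^;\r\n]*)/gm,
        pairs = {},
        m;
    while ((m = pairRegex.exec(ini)) !== null) {
        pairs[m[1].trim()] = m[2].trim(); // trimming is an addition here
    }
    return pairs;
}

var pairs = parsePairs(
    "[Section1]\nItem1=Value1\n; Note=ignored\nItem4 = Value4 ; trailing comment"
);
console.log(pairs);
```

Section headers and comment lines produce no pairs, and a trailing comment is cut off because the value class excludes the semicolon.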


,
INI, .
^ ,
( ^ $ ). , , .

, ( [^=;\r\n] ), ( + ),
1. , : , , ( \r ) ( \n ). INI, ,
.

(
) . , , ,
. -, ( ). -,
* , ( , , ).

.
8.12, ,
INI.
8.13, , INI.


C
7- , 53
$ ( )
, 49
, 63, 65, 66,
147, 279, 283
, 406
, 125
$~[0], , 188
$`, , 134, 192, 253
$_, , 134
$', , 134, 253
$&, , 129, 187, 192
$~, (Ruby), 187
, 200
$+, , 201, 237
& ()
, 407
| ( )
, 85, 301, 365
, 406
- ()
, 55
, 49
, 345, 348
, 406
, ()
, 57
, 433
, 407
* ()
, 279
,
101
, 49
, 406
? ( )
, 102
, 99
, 49
, 406

+ ( )

, 104
, 279
, 49
, 406
[] ( )
, 55
, 49
, 279
, 406
@ ( at)
Perl, 147
, 278
() ( )
, 108, 318
, 49
, 91
, 87
, 406
^ ()
, 49
, 63, 65,
279, 283
, 55
, 406
(?:), , 279
\\ ( )
, 142
, 55
, 49
, 49
, 125, 406
, 403
!~, (Perl), 174
=~, (Perl), 174
=~, (Ruby), 174, 193
# ()
, 123
# (), 407
TM
( ), , 71

594
. ()
, 61
, 49
, 59
, 406
{} ( )
, 97
, 49
, 406

A
\\A, , 63, 64, 65,
280
\\a ( ASCII bell),
52
ActionScript, ,
139
appendReplacement(), , 242
appendTail(), , 242

B
\\b, , , 57, 68,
286, 302, 362
\\B, , 69
<b>, , <strong>,
541
begin(), , Ruby ( MatchData),
193

C
\\cA \\cZ, , 53
CacheSize, (.NET), 154
CANON_EQ, (Java), 164
CASE_INSENSITIVE, (Java),
161
CDATA, XML, 518
cellspacing, ,
<table>, 566
CIL (Common Intermediate Language ), 158
Comma-Separated Values (, , CSV), 519

, 584
, 579
COMMENTS, (Java), 161
compile(),
Java ( Pattern), 145, 155, 161
Python ( re), 157, 163
Ruby ( Regexp), 158
Count, (Match.Groups), 198
Currency, , 79


c
(Namespace Specific String, NSS),
451
C, , 139
C#, , 136
, 150
,
144
, 154
C++, , 139

D
\\D, , 57
\\d, , 57, 411
Delphi for Win32, , 139
Delphi Prism, ,
140
DOCTYPE, (HTML), 516
DOTALL, (Java), 161

E
\\E, , , 50
\\e ( ASCII escape),
52
ECMA-262, , 137, 139
ECMAScript, (RegexOptions),
164
ECMAScript ,
137
end(),
Java ( Matcher), 191, 198
Python ( MatchObject), 193
Ruby ( MatchData), 193
EPP (Extensible Provisioning Protocol
), 290
ereg, (PHP), 138
,
146
exec(), , JavaScript, 191, 198
, 212
ExplicitCapture,
(RegexOptions), 164
EXTENDED, (Regexp), 164
Extensible Hypertext Markup Language
( , XHTML), 516
<table>,
566

595


, 521
,
, 545
Extensible Markup Language ( , XML), 517
<table>,
566
, 560
, 574
, 521, 524
, 549
,
, 545
, 569

F
\\f ( ASCII form
feed), 52
findall(), , Python ( re), 207
finditer(), , Python ( re), 214
find(), , Java ( Matcher), 173,
186, 191, 205
, 212

G
Get-Unique, (PowerShell), 389
grep, , R,
140
Groovy, , 140
GroupCollection, (.NET), 197
groupdict(), , Python (
MatchObject), 200
groups(), , Python (
MatchObject), 199
Groups, .NET (
Match()), 197
Group, (.NET), 197
group(),
Java ( Matcher), 198
re, (Python), 199
gsub(), , Ruby ( String), 232,
244
, 235

H
Hypertext Markup Language ( , HTML), 513
<table>,
566
<p>
, 557
<b> <strong>,
541

, 521
, 537
,
, 545

I
(?-i), , 51
(?i), , 51
IGNORECASE, (Regexp), 164
IgnoreCase, (RegexOptions),
161
IgnorePatternWhitespace,
(RegexOptions), 161
IndexOutOfBoundsException, , 229
index,
JavaScript, 191
.NET, 190
INI,
, 588
, 591
, 589
IPv4, , , 476
IPv6, , , 479
IsMatch(), (.NET), 172
ISO 8601, , 303

J
JavaScript, ,
137
,
151
,
145
, 27
,
162, 165
,
24, 137
, 156
java.util.regex, , 136
, 151
Java, , 136
, 151
,
145
, 27

596
,
161, 164
,
23, 136
, 155


MULTILINE,
Java, 161
Regexp, 164
Multiline, (RegexOptions), 161
m//, (Perl), 174, 187
, 199

ksort(), (PHP), 231

.NET, , 136
,
161, 164
, 154
\\n ( )
C#, 144
Java, 145
Python, 143, 148
\\n ( ASCII
newline), 52
NEAR-, 379
new(), (Ruby), 158, 163
Nregex, - , 34

L
lastIndex, , JavaScript (
RegExp), 192, 212, 378
length(), , Ruby ( MatchData),
194
length,
JavaScript, 191
.NET, 190

M
(?m), , 67
MatchData, (Ruby), 187
begin(), , 193
end(), , 193
length(), , 194
offset(), , 194
size(), , 194
Matcher, (Java), 156
end(), , 191, 198
find(), , 173, 186, 191, 205, 212
group(), , 198
reset(), , 156, 186
start(), , 191, 198
Matches(), (.NET), 204
MatchEvaluator, (.NET), 240
MatchObject, (Python)
end(), , 193
start(), , 193
Match, (.NET)
NextMatch(), , 211
Index Length, 190
match(),
Ruby ( Regexp), 193
match(), (JavaScript), 205
Match(), (.NET), 190
Groups, , 197
Value, , 185
, 211
mb_ereg, (PHP), 137
,
146
Microsoft VBScript, , 141

O
offset(), , Ruby ( MatchData),
194
Onigurama, , 138

P
\\P{}, , 76
\\p{}, , 72
PatternSyntaxException, ,
155
Pattern, (Java), 155
compile(), , 145, 155, 161
split(), , 259
, 161, 164
PCRE (Perl-Compatible Regular
Expressions Perl- ), , 22,
137
, 26
Perl, , 138
,
151
,
146
, 26
,
163, 166

597


,
21, 138
, 157
PHP, , 137
, 151
,
146
, 26
,
162, 165
,
137
POSIX- , 425
PowerGREP, , 43
PowerShell, , 140
PREG_OFFSET_CAPTURE, ,
192, 199, 206
PREG_PATTERN_ORDER, ,
205
PREG_SET_ORDER, , 205
PREG_SPLIT_DELIM_CAPTURE, , 267
PREG_SPLIT_NO_EMPTY, ,
261, 267
preg, (PHP), 137
preg_match_all(), , 205, 214
preg_match(), , 173, 187, 213
preg_replace_callback(), ,
243
preg_replace(), , 137, 229,
235, 237
preg_split(), 267
preg_split(), , 261
, 151
,
146
,
162
, 156
Python, , 138
, 151
,
148
, 27
,
163, 166
,
24

, 157
,
138

Q
\\Q, , , 50
qr//, (Perl), 157

R
\\r ( ASCII
carriage return), 52
REALbasic, ,
141
RegexOptions, , 161, 164
RegexOptions,
Compiled, , 158
regexpr, , R, 140
RegExp, (JavaScript)
exec(), , 198, 212
index, , 191
lastIndex, , 192, 212, 378
length, , 191
test(), , 173
Regexp, (Ruby)
compile(), , 158
new(), , 158, 163
RegExp(), (JavaScript), 156
RegexRenamer, , 46
Regex, (.NET)
IsMatch(), , 172
Matches(), , 204
Replace(), , 227, 234, 240, 241
Split(), , 257, 266
RegEx, (REALbasic), 141
Regex(), (.NET), 144, 154
RegexOptions, ( ), 158
replaceAll(), , Java ( String),
229
, 234
replaceFirst(), , Java ( String),
229
, 234
replace(), , JavaScript (
String), 229
, 235
Replace(), (.NET), 227, 240, 241
, 234
replace, ( String)
, 400

598

reset(), , Java ( Matcher), 156,


186
re, (Python), 27, 138, 148
compile(), , 157
findall(), , 207
finditer(), , 214
group(), , 199
search(), , 174, 187
split(), , 262, 268
sub(), , 231, 235, 244
, 151
,
163
RFC 3986, , 455
RFC 4180, , 519
Ruby, , 138
,
151
,
149
, 27
,
163, 166
,
24
, 158
,
138
R, , 140

Ruby ( String), 263


, 268
Perl, 261
, 267
start(), ,
Java ( Matcher), 191, 198
Python ( MatchObject), 193
String, (Java)
replaceAll(), , 229, 234
replaceFirst(), , 229, 234
String, (JavaScript)
match(), , 186, 205
replace(), , 229, 235
replace, , String, 400
split(), , 260
String, (Ruby)
gsub(), , 232, 235, 244
scan(), , 214
split(), , 263, 268
strlen(), , 192
sub(), , Python ( re), 27, 231,
244
, 235
sub, , R,
140
System.Text.RegularExpressions.Regex,
, 154
s///, (Perl), 231
/e, , 244
, 235

<script>, (HTML), 514


<style>, (HTML), 514
\\S, , 278
\\s, , 315
\\s, , 57
scala.util.matching, , 141
Scala, , 141
scan(), , Ruby ( String), 214
search(), , Python ( re),
174, 187
Singleline, (RegexOptions), 161
size(), , Ruby ( MatchData),
194
split(), ,
JavaScript ( String), 260
Java ( String), 259
.NET, 257
, 266
Python ( re), 262
, 268

T
<table>, , , 566
\\t ( ASCII horizontal tab), 52
test(), (JavaScript), 173
TPerlRegEx, , 140

U
\\u,
Java, 145
UNC, , 497, 501, 504
UNICODE_CASE, (Java), 161
UNICODE ( U), , 57, 311
uniq, (UNIX), 389
UNIX_LINES, (Java), 165
URL,
, 460
, 462
, 465
, 467



, 471
, 458
, 472
, 442, 444, 446
, 438
, , 452

V
\\v ( ASCII vertical tab), 52
Value, , .NET ( Match), 185
VB.NET, , 136, 141
, 150
,
145
, 154
Visual Basic 6, , 141

W
\\W, , 57, 70
, 315
\\w, , 278
, 315
\\w, , 57, 70
Windows Grep, , 44

X
XML 1.0, , 550, 551, 553
XML 1.1, , 550, 551, 555

Z
\\Z, , 63, 64, 66, 280
\\z, , 63, 64

, 351
- , , 308
, 72, 80
, 82
, 108, 110, 318
, 116
( ), 514
, 548
, 560
,

, 548
, 377
, 98
,
, 149
, 72, 77
, 82
,
Windows, 503

, , 418
, , 398
, 429
, 573
, 58
, 101
, , 104
, 109

, , 300
, 133
, 128

,
129

, 364
, 222

, URL,
448
, 68, 302
, 412
, 57
, 73, 81
, 87
, 108, 110,
318
, 131
, 118
, 99
, 111,
313, 362

, 119
, 384
, 131

, , 416
, , 436
, 81

,
25
, 22, 138
, 55
, , 425
, , 188
, , 312
, , 473
, , 389


, , 352
(single line), , 61

,
, 398
, 224
, 232

, 133

, 128
, 129

, 238
, 125

, 245
, 247
, CSV, 519
, 104

, 75


,
(Comma-Separated Values, CSV), 519

, 584
, 579


(Namespace Identifier, NID), 451
, 202
, 181
,
194
,
CSV, 579
XML, 518
, 549
,
Windows, 506

Windows, 508
,
511
, , 341
, 96
, 93, 94, 131,
200, 236

(.NET), 299
,
376, 378
, 149
, URL, 460
,
Windows, 504
, URL,
462

, 27
Expresso, 40
grep, 43
myregexp.com, 36
Nregex, 34
reAnimator, 38
RegexBuddy, 28
RegexPal, 31
Rubular, 35
The Regulator, 41
, 47
, 315



,
, 142
, 109

, 72, 74
, 284
, 98
, 573
, 98
,
97
, 72, 73
, 77
, 76
, 81

HTML, 515
XML, 569, 570, 574
INI, 520
, 122
(IPv6), 482,
490
, 483, 485, 493, 494
, 377

, 423
, 133
ISBN-10, 334
ISBN-13, 334

, 134
, 142
, , 49
, 51
, 351

, 100
, 100
,
104
,
, 288
, 49
, 55
, 403
,
, 85

, , 85
, 100
, 100,
102
,
104
, 515, 557
,
515, 557
(multiline) , 66

, 317
, 51

, 90

,
113, 371, 374
,
113, 375

Python, 148
, , 52
(?:), 279
, 89
, , , 161
, , 425
, , 345
, , 325

, 418

, 91
, 94, 96
, 384
, 131
, 232
, 387
, 208
, 58
, , 151
(HTML),
516
, 314

, 315
, 317
, 85
(Perl), 147

(Perl), 146
, 112, 313
, 113, 371, 374
, 112
, 188
Windows, 500, 501
, 55

, INI, 520
,
, 159
, 222
, ,
391, 393
, 98
, 58


, 100

, 97
, 108
, 232
,
, 402
, , 387
,
112

, 386
, 112
, , 188
, 397
URL , 442, 444, 446
, , 339

XML, 560
, 269
, 364, 367, 371, 373, 375, 379,
387, 361
, 389, 395, 397
XML, 521


, 409, 413, 416
, 25
, 124
, 224
,

, 238
HTML
, 557


, 245

, 247
, 232
, 541
, CSV, 519
, 584
, URL, 465
, , 390, 393
, 269
, , 367
,
, 338
338
, 336
, 134
, 108

, 341

, 123
, 74
, , 402
, 57

,
398

, 353, 355
, 345, 348
, 283
URL, 438
, 274




, 168
,
300
, 290
ISO 8601,
303
, 473
, 352
XML, 570
, 288
,
325
, 215
Windows, 495
, 112
, 175
, 111, 313, 362
, 113
, 121
URN, 449
, 282
URL, 452
, , 119
HTML, 537
, 312, 317, 323, 328, 336,
345, 351
ISBN, 328
, 345
,
325

- , 308
ANSI, 310
, 312
,
317
, 336
, 323
, HTML,
557
, 384
(Windows)
, 498
, 503
, 504
, 506
, 508

, 510
, 495
,
511
, URL, 467

, 253
, 264
CSV, ,
579
, INI, 520
, 58
, 321
HTML, 557
(), Windows, 510

(Extensible Provisioning Protocol, EPP), 290

(Extensible Markup Language, XML), 517
<table>,
566
, 574
, 521, 524
, 549
,
, 545
, 569
(Extensible Hypertext Markup Language, XHTML), 516
<table>,
566
, 521
,
, 545

, 278
, 58
,
51

, 403
Perl, 21
, 122
Python, 148
, 222
, 112, 313, 363
, 376, 378

, 118
, 113, 375
, 112
, 434

, 222
, , 122
, 161
, 72
^ $
, , 159
, ,
71

ANSI, , 310
ISO-8859-1, , 310
Windows-1252, ,
310
, 72
, 57, 70
, 55, 284
, 123
, 82
, 58
, 58
, 58
, 56
,
, 379
XML- , 574
,
, 373
, , 375
, , 371
, 364
, 361
, 387
, 367
, , 397
, , 395
, 68
, 403
(IPv6), 480,
489
, 483, 485, 493, 494


,
219
, 66
, 151
, 56
, 397
, 51, 278
/ , 62
, 245
, 133
,
232
, 208
, 219
, 119
, 222
IPv4, 476
IPv6, 479
, 222
INI,
588
XML, 549
, 49
, 59
, 52
,
85
, 54
- INI,
591
, 97, 100, 108
INI, 589
, 91
, 521
, 68
, 390, 392
, 87
, 194
, 93, 200,
236, 299
, 131
, 118
, 384
, 232



, 91
, 387
, 119
, 131
(IPv6), 479

(PHP), 146

C#, 144
Java, 145
Perl, 147
PHP, 146
VB.NET, 145

(Python), 148
URL, 438
, 312
URN, 449
, 253
, 403
,
URL, 471
, 389
, 397
, 395
, 390, 392

, 398
, 389
, 402
, URL, 458
(Python), 148

( ), 514
<table>,
566
, 541
, 521
,
, 545

, 288
, 282
(dot all), , 61
, , 61, 159

, 418

, 511
, 402
URL, ,
452
(Uniform Resource Name, URN), , 449

, 76
, 381
, 159
, 323

(INI), 520
, 377
, 97
ISO 8601, 303

, 300, 303
, 290, 303
, 282
, 341

, 290
, URL,
472


, ,
419, 427
, 409
, 433
, 418
(), 411
, 75
, 57

IPv4, , , 476
IPv6, , , 479
, 429
, , 416

, , 352
ISBN, , 328

, ,
345
, 325
, 434
, 433
, 418
, 409, 419
, 427
, 413
, ,
515

, , 413
, 427

, 49
, 50, 407
, 55
, 403
, 125
, ,
274
, , 514

,
, 71
, , 80
(Hypertext Markup Language, HTML), 513
<table>,
566
<p>
, 557
<b> <strong>,
541
, 521
, 537
,
, 545
, 63
