
Mastering

Regular Expressions
Third Edition

Jeffrey E.F. Friedl


2008

, 3 . . . .: ,
2008. 608 ., .
ISBN13: 9785932861219
ISBN10: 5932861215

. 

15 . 
, Perl, PHP, Java,
Python, Ruby, MySQL, VB.NET, C# ( .NET), 

.
PHP 
. ,
,
java.util.regex Sun, 
Java 1.4.2 Java 1.5/1.6.
, 
, 
, !



. ,
, .
ISBN 0-596-52812-4 (English edition)
, 2008
Authorized translation of the English edition © 2006 O'Reilly Media, Inc. This
translation is published and sold by permission of O'Reilly Media, Inc., the owner
of all rights to publish and sell the same.
, 
. 
, , .

. 199034, , 16 , 7,
. (812) 3245353, www.symbol.ru. N 000054 25.12.98.
25.07.2008. 701001/16. .
38 . . 2000 .

199034, , 9 , 12.

.
, ,
.

Preface
1. Introduction to Regular Expressions
    egrep

2. Extended Introductory Examples
    Perl
    HTML

3. Overview of Regular Expression Features and Flavors

4. The Mechanics of Expression Processing
    ?+, *+, ++, {min,max}+
    POSIX

5. Practical Regex Techniques
    IP
    HTML
    HTTP URL
    URL

6. Crafting an Efficient Expression
    POSIX
    PHP
    Java
    VB.NET
    Ruby
    Python
    Tcl

7. Perl
    qr//
    /g
    /e
    split
    local
    my
    study

8. Java
    \p{} and \P{} in Java
    java.util.regex
    Pattern.compile()
    Pattern.matcher()
    Matcher
    Pattern
    split (Pattern)
    WIDTH and HEIGHT in <img>
    HTML
    CSV
    Java 1.4.2 and 1.5.0
    Java 1.5.0 and 1.6.0

9. .NET
    Regex
    Match
    Group
    Capture

10. PHP
    preg
    preg_match_all
    preg_replace
    preg_replace_callback
    preg_split
    preg_grep
    preg_quote
    preg_regex_to_pattern
    S: Study
    CSV with PHP



. 

, .
, 
, , 

.
(
, , . .),

Java and JScript, Visual Basic and VBScript, JavaScript and
ECMAScript, C, C++, C#, elisp, Perl, Python, Tcl, Ruby, PHP, sed, and
awk.
, .

, 
.
. ,
, , ,
. 
. 
.


1996 .
, . 
,
.
, ,
. 
, , .
, 
,
( ) 


. 

. Perl, Python, Tcl, Java Visual Basic

.
( Ruby, PHP C#). 


.
.
, 
, 
. 2002 ,
java.util.regex, .NET Framework
Microsoft Perl 5.8. 
. , , , 
PHP.
PHP
, .
PHP
; , , 
PHP 
. ,
, Java, 
Java 1.5 Java 1.6.


, 
. , 
,
. ,
.

, 
, . .
, (, 
), 
, , , 
, .

.
, 
( , 
). 


,
. ,

, .


, 
, . , 
, 
, 
. .
,
. ,

,
, .
, ,
,
Perl, Java,
.NET PHP. 
.

. ,
(, . 124) ,
. 
,
.
,
, .


:

1 
.
2
.
3 ,
.

4
.


5 
4.
6 .

7 Perl.
8 java.util.regex 
Java.
9
.NET.
10 preg, 
PHP.



. , 
3 
.

1
. 

egrep. , 
,
.
, ,
.

2 
,
.

, 
.
,
, 

.

3 : 
, 
. 
, , 
, .
, 
, .
.


, 
.


, 
? ?. ,
, 
.
4
. 

. ,
,
.
5 

. 
( ) 

.
6 

, .
, 4 5,

, .


4, 5 6, 
.
,
.
7 Perl Perl , 

. Perl 
,  
, . .
, 
. ,
,
. ,
.
8 Java 
java.util.regex,
Java 1.4.


Java 1.5, 
1.4.2 1.6.

9 .NET 
.NET (
Microsoft ). 
VB.NET, C#, C++,
JScript, VBScript, ECMAScript .NET.

10 PHP 
,
PHP,
preg, PCRE.



(, , ).

, 
.

: this. 
, 
. (, ,
) : this. , 
, 
. 
, 
.


. , [] 
, [. . .] .

,
b. 
, ( 
), . 
, b .


, :

TAB

NL

CR




. :
... cat It indicates your cat
is, cat, ...

.
, 
:

... , (Subject|Date . 
: (Subject|Date): .

, 1200
, .
: 123, 
123.
: ... . 8.2 (439).


, 
. , ; 
, , 
. , .
, . , 
. 
, ,
, 
.
. ,
, . ,
, . 
, ,
.

, ,
, 
, ,
URL :
http://regex.info/


, 
, .
(
:) ).

 , jfriedl@regex.info.
:
O'Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
(800) 998-9938 (in the United States or Canada)
(707) 829-0515 (international/local)
(707) 829-0104 (fax)
bookquestions@oreilly.com
, , 
O'Reilly Network:
http://www.oreilly.com

Safari Enabled
Safari Enabled,
, O'Reilly Network
Safari Bookshelf.
Safari , .
,
, , 
,
. http://safari.oreilly.com.


,

.
,
 .
, 
. , ;
,
, .



. 

.
(Stephen Friedl)
, 
. ( , , 
, Tech Tips http://www.unixwiz.net/.)
(Zak Greant), (Ian Morse),
(Philip Hazel), (Stuart Gill), 
. (William F. Maton) (Andy
Oram).
Java 
(Mike "madbot" McCloskey) ( 
Sun Microsystems, Google), (Mark
Reinhold) (Dr Cliff Click), Sun Microsystems.
.NET 
(David Gutierrez), (Kit George)
(Ryan Byington). PHP
(Andrei Zmievski) Yahoo!
(Ken Lunde) Adobe Systems,

.1 Heisei Mincho W3, 
Munhwa
.  
: ,
.
http://regex.info
(Jeffrey Papen) Peak Web Hosting (http://www.PeakWebhosting.com/).

. . .


: 
(, this this). 
, 
. :
; 
,
; ( Escape
ANSI) 
.
,
, , 
.
,
(, The the) (,
, . .) .
, HTML.
HTML , 
: it is <B>very</B> very
important.
! ,
. 

.
, ,
, 
.


, 
. 
, 
.

, ,

. , 
, ,
. , 
. 
, 
.

.
/ 
. , 
( ,
). , 
, .
Perl Java.
(Perl, Java, VB.NET . .) 
, . 
, 
, , 
. 
,
( , , 
. .).


,
, , . 
,
( , 
).
, , 
, 
. , 

, 
,
.


. 
( 70 )
, SetSize
, ResetSize.
, (. .
setSIZE SetSize ). , 
32 000 , , 
. 
,
.
! 
, 
. 15 
2 . ! ( 
, , . 62.)
:
, ,
. 
, 

,  . 

, .
, .
! 
( egrep, 
) From: Subject: .
egrep, ( )
,
^(From|Subject):.
, 
, 5000 !

. (sed) 
, 
.
.

. ,
, 
. 
, , .
,
, , 
.


,
, .1

. , , 
. , ,
.


, ,
^(From|Subject): 
.
. 
, 
. , 
, 
. 
, 
.


, ,
, .
, .
, (
, report.txt). UNIX DOS/Windows
,
*.txt. ( (
, (wildcards))
, . (*) 
, (?) 
. , *.txt * 
.txt.
,
.txt.

, 
. , ,

.
,
. , , , HTML,
, ... .
1

TiVo, !


, (, 
) .

, 
.  
,
.


.
( * ) .
, . . , (
. 

.
, ,

.

, ,
.
,
. , 
^(From|Subject):
, From: Subject:.
, .
( 
) .
, 
. ,
!1 
,
s!<emphasis>([0-9]+(\.[0-9]+){3})</emphasis>!<inet>$1</inet>!
1

!
: 3 , (
. ,
, , ,
. 

,
, .
, 
,
. ,
, !


.
Perl, 
.
<emphasis> IP ( 
, , 209.204.146.22).
Perl, /
,

<emphasis>([0-9]+(\.[0-9]+){3})</emphasis>

<inet>, <emphasis> 
. ,
,
.


 <emphasis> 
<inet>, ( (
. ,
.
,
.


,
. , 

. 
, ,

. , 

, ,
.
( 
), ,
, . 
, .
. .

,

, ,
 , , ,
. 


,  
.


. 
, , ,
, , 
.

: egrep



. egrep.
egrep
. 
,
. egrep
, DOS, MacOS, Windows, UNIX . .
. ,
,
. 1.1. egrep 
,
. : , . 1.1,
, 
.1 egrep 
.
, ( 
)

. ,
.
1

, 

. , , 
. , 

. ,
, * 
, 
, egrep. , ,

. COMMAND.COM CMD.EXE
Windows .


,
egrep

% egrep '^(From|Subject): ' mailboxfile


Figure 1.1. Invoking egrep from the command line
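The egrep invocation in Figure 1.1 can be approximated outside the shell. The following is only a sketch in Python: the `header_lines` helper and the sample mailbox lines are made up for illustration, and Python's `re` module stands in for egrep.

```python
import re

# Same pattern as in the egrep command: a line that starts with
# "From: " or "Subject: " (note the space after the colon).
HEADER = re.compile(r"^(From|Subject): ")

def header_lines(lines):
    """Return only the lines the pattern matches, as egrep would print them."""
    return [line for line in lines if HEADER.search(line)]

# Hypothetical mailbox contents, used only for illustration.
mailbox = [
    "From: elvis@tabloid.org (The King)",
    "Subject: be seein' ya around",
    "Reply-To: nobody@example.invalid",
    "  From: indented, so not at the start of the line",
]

for line in header_lines(mailbox):
    print(line)
```

Like egrep, the sketch reports whole lines and cares only whether the pattern matches somewhere in the line.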


, , , 
, 
. , ^ | 

.
,
, egrep,
. 
, cat
, cat. 
, ,
vacation.
cat, cat
vacation .
, 
, egrep .
, egrep
, 
, , 
.

egrep
, egrep
.
, .
,
.
, , 
.
, 
.



, ^ (, 
) $ (), 
. , 
cat cat
, ^cat
, cat ^
( )
. , cat$ cat
, ,
scat.

. , :

^cat , cat.
:

^cat , , 
, ,
t.
, 
,
. ^cat$, ^$ 
^? 
, .
^ $ , 
, . ,
.
 
, .



, grey,
gray. [], 
(character class), 
, . e
, a
, ea .
, gr[ea]y : g,
r, e a, 
y.

. ,

sep[ea]r[ea]te, , 


seperate, separate, separete 


.
, 
(, g r gr[ae]y)
g, 
r . .). 
. , 
,
.
:

[Ss]mith. , 
, smith ( Smith)
, blacksmith.
, . 

, .
. ,

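[123456] matches any one of the listed digits, so <H[123456]>
matches <H1>, <H2>, <H3>, and so on, which is handy when searching
for HTML headers.
Within a character class, the dash (-) indicates a range of
characters: <H[1-6]> is identical to the previous example.
[0-9] and [a-z] are common shorthands for classes that match digits
and lowercase English letters.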
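To make the behavior of character classes concrete, here is a small sketch in Python. The `found` helper is hypothetical; it simply mimics egrep's "does this line contain a match" check.

```python
import re

def found(pattern, text):
    """True if the pattern matches somewhere in the text, as egrep would report."""
    return re.search(pattern, text) is not None

print(found(r"gr[ea]y", "grey skies"))       # class matches e
print(found(r"gr[ea]y", "gray skies"))       # class matches a
print(found(r"gr[ea]y", "groy"))             # o is not in the class
print(found(r"<H[1-6]>", "<H3>Title</H3>"))  # range 1 through 6
print(found(r"<H[1-6]>", "<H7>"))            # 7 is outside the range
print(found(r"[Ss]mith", "blacksmith"))      # also matches inside a longer word
```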

^cat$, ^$ ^
. 32.

^cat$ : , (
, ),
cat,
.
: cat 
, , 
... cat .

^$
: , ,
.
: (
, ).

^
: , .
: ! 
, !
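The readings of ^cat$, ^$ and ^ can be checked mechanically. This is a minimal sketch in Python, assuming each string is one line without its trailing newline; the `found` helper is hypothetical.

```python
import re

def found(pattern, line):
    """True if the pattern matches somewhere in the line."""
    return re.search(pattern, line) is not None

checks = [
    (r"^cat", "category"),         # cat at the start of the line
    (r"^cat", "scat"),             # cat present, but not at the start
    (r"cat$", "scat"),             # cat at the end of the line
    (r"^cat$", "cat"),             # the line is exactly cat
    (r"^cat$", "cat food"),        # extra text after cat
    (r"^$", ""),                   # empty line
    (r"^", "any line at all"),     # every line has a start
]
for pattern, line in checks:
    print(pattern, repr(line), found(pattern, line))
```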


,
[0123456789abcdefABCDEF] can be written as [0-9a-fA-F] (or
[A-Fa-f0-9], since the order of the ranges does not matter).
. 
: [0-9A-Z_!.?]
, , 
, , .
:

. ,
. 
, ,
. , 

, .
, [0-9A-Z_!.?]
.

. ( )
.
.


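If you use [^...] instead of [...], the class matches any character
that is not listed. For example, [^1-6] matches a character that is
not 1 through 6.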
^  , 
, , 
, .
, , 
^, . 
, . , 

; . 
. 

( ).
^
, ,
, ( 
). 
,
; .
. , 
q u.


, q  

q[^u]. . 
, ! , 
.
:
% egrep 'q[^u]' word.list
Iraqi
Iraqian
miqra
qasida
qintar
qoph
zaqqum

Qantas ( ) Iraq.
word.list, 
. ? ,
, .
:
, , 
, . ,
, Iraq
. 
, 
, .


. ()
, . 
, 
. , 
, 19/03/76, 19-03-76
19.03.76. , 
, 
(/, -, .), 19[-./]03[-./]76. 
19.03.76.
. 
19[-./]03[-./]76 ,
( : 

). ,
[ [^.
(, [-./]),
, .


19.03.76 , 
, /,  ..
, 
, 
, , lottery numbers: 19 219303 7639.
, 19[-./]03[-./]76 
, .
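The difference between the explicit separator class and the bare dots is easy to see side by side. A minimal sketch in Python, using the sample strings from the text (the variable names are illustrative):

```python
import re

strict = re.compile(r"19[-./]03[-./]76")  # separator must be -, . or /
loose = re.compile(r"19.03.76")           # each dot matches any character

samples = [
    "19/03/76",
    "19-03-76",
    "19.03.76",
    "lottery numbers: 19 219303 7639",
]

strict_hits = [s for s in samples if strict.search(s)]
loose_hits = [s for s in samples if loose.search(s)]

print(strict_hits)  # only the three dates
print(loose_hits)   # the lottery line slips in as well
```

The looser pattern is shorter but accepts more; which one is appropriate depends on what the searched data can contain.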

19.03.76 , .
? , 

. 35.
q[^u] Qantas Iraq?
Qantas ,
q , Qantas
. Q[^u], 
, . 
[Qq][^u] .
Iraq . 
q, ,
u, .

, egrep 
(, , !), q
. ,
u, .
 , 
.1 : egrep
( )
Iraq ,
, . 

, : (
, , (
( .
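The point about Qantas and Iraq can be reproduced in a few lines. This is only a sketch in Python; the word list is a made-up stand-in for word.list, and `hits` is a hypothetical helper.

```python
import re

# Hypothetical stand-in for the word list searched in the text.
words = ["Iraqi", "qasida", "Qantas", "Iraq", "quit"]

def hits(pattern):
    """Words in which the pattern matches somewhere."""
    return [w for w in words if re.search(pattern, w)]

print(hits(r"q[^u]"))     # Qantas has no lowercase q; Iraq has nothing after q
print(hits(r"[Qq][^u]"))  # accepting either case picks up Qantas
```

A negated class still has to match a character, which is why a q at the very end of the word does not count.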
1


, miss. : miss.
, , :
Miss M 
. 
.
.


.

. , , 
19.03.76
, 
. , 
.


| . 
,
. , Bob Robert 
, a Bob|Robert ,
. , , 
(alternatives).
gr[ea]y. :
grey|gray gr(a|e)y.

(, , ). 
gr[a|e]y 
| , a e.
gr(a|e)y ,
gra|ey gra ey , 
. .
: (First|1st) [Ss]treet.1 ,
First 1st st,
(Fir|1)st [Ss]treet. , 
, , (first|1st) (fir|1)st 
.

. , 
:

Jeffrey|Jeffery
Jeff(rey|ery)

Jeff(re|er)y

,
, :

(Geoff|Jeff)(rey|ery)
(Geo|Je)ff(rey|ery)

(Geo|Je)ff(re|er)y

, .
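That the different spellings of these alternations accept the same names can be checked directly. A minimal sketch in Python; the `accepted` helper and the list of spellings are illustrative only.

```python
import re

spellings = ["Jeffrey", "Jeffery", "Geoffrey", "Geoffery", "Jeffry"]

def accepted(pattern):
    """Spellings matched in full by the pattern."""
    return [s for s in spellings if re.fullmatch(pattern, s)]

# Three ways to write the same two-name alternation.
print(accepted(r"Jeffrey|Jeffery"))
print(accepted(r"Jeff(rey|ery)"))
print(accepted(r"Jeff(re|er)y"))

# The four-name version.
print(accepted(r"(Geo|Je)ff(re|er)y"))

# Inside a class, | is an ordinary character, not alternation.
print(re.fullmatch(r"gr[a|e]y", "gr|y") is not None)
```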


,
( ) Jeffrey|Geoffery|Jeffery|Geoffrey.
.
, gr[ea]y gr(a|e)y
. 
. 
. 
, 
. , ,
 (, ,
), 
.
, .
, ^ $ 
. :

^From|Subject|Date: ^(From|Subject|Date):. 
,
( , ). 
; ^From, Subject,
Date: . ,
^ : .
:

^(From|Subject|Date):

, 
: ,
From, Subject Date :. 
:
1. , From, :
2. , Subject, :
3. , Date, :.
, , 
From:, Subject:, Date:, ,

.
:
% egrep '^(From|Subject|Date): ' mailbox
From: elvis@tabloid.org (The King)
Subject: be seein' ya around
Date: Mon, 23 Oct 2006 11:04:13
From: The Prez <president@whitehouse.gov>
Date: Wed, 25 Oct 2006 8:36:24
Subject: now, about your vote
.
.
.



,
.

(, Subject From), 
, 
DATE from . , 
.
From [Ff][Rr][Oo][Mm], from, 
. ,
egrep 
, . . . 
;
, . 
egrep 
-i. 
:
% egrep -i '^(From|Subject|Date): ' mailbox

, ,
:
SUBJECT: MAKE MONEY FAST

-i 
. 
.
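egrep's -i option is a feature of the tool rather than of the regular expression itself. Other tools offer the same facility in their own way; in Python, for instance, it is a flag passed when compiling the pattern. A minimal sketch, with made-up sample lines:

```python
import re

# re.IGNORECASE plays the role of egrep's -i option.
header = re.compile(r"^(From|Subject|Date): ", re.IGNORECASE)

lines = [
    "SUBJECT: MAKE MONEY FAST",
    "from: someone@example.invalid",
    "Date: Mon, 23 Oct 2006 11:04:13",
    "To: you@example.invalid",
]

kept = [line for line in lines if header.search(line)]
print(kept)
```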


, , 
, 
, . 
cat, gray Smith. ,
egrep , 
egrep , 
( ).
\<
\>, egrep. 
^ $ , 
, , . ^ $,


. \<cat\> 
, 
cat, . ,


Figure 1.2. Start-of-word (\<) and end-of-word (\>) positions in the
sample text: That dang-tootin' #@!%* varmint's cost me $199.95!

cat.
\<cat cat\> , 
cat.
: < >
+ 
. .
, 
, 
.
, egrep
,
.
, 
 ; 
, . . 1.2 
.
( egrep) , 
; , .
,
 ,
.
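Not every tool spells word boundaries as \< and \>. Python's `re` module, for example, has no such metasequences and uses \b for both ends of a word, so the sketch below uses \bcat\b as an assumed equivalent of \<cat\>.

```python
import re

# \b marks a word boundary in Python; it stands in for both \< and \> here.
word_cat = re.compile(r"\bcat\b")

for text in ["my cat sleeps", "vacation", "concatenate", "the cat's toy"]:
    print(repr(text), word_cat.search(text) is not None)
```

As in the text, "word" here means a run of alphanumeric characters, so the apostrophe in "cat's" ends the word.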


. 1.1 , 
.
.

, 
,
( ). ,
, . ,
( ) ,
. , ^ 
, 
[ .


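Table 1.1. Summary of the metacharacters seen so far

Metacharacter   Name                      Matches
.               dot                       any one character
[...]           character class           any character listed
[^...]          negated character class   any character not listed
^               caret                     the position at the start of the line
$               dollar                    the position at the end of the line
\<              backslash less-than       the position at the start of a word (*)
\>              backslash greater-than    the position at the end of a word (*)
|               or; bar                   either expression it separates
(...)           parentheses               used to limit the scope of |, plus
                                          additional uses yet to be discussed

(*) not supported by all versions of egrep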

. [abc]
(a|b|c) ,
.
, 
.
, 
, 
: \<(1,000,0001|million |thousandthou)\>. 
, .

,
, . ,

[^x] , x,
, 
x. 
, . 
,

[^x] .
-i 
( 39).1
, , 
,
, .

, 
, , 39 .



color colour.
u, 
colou?r. ? ( ) ,
. 
, 
, 
.
, ,

. , colou?r c,
o, l, o, u? r.
u? :
u, 
. ,
?, . 
, , ?, 
. , semicolon
colo u? 
( colo u, ). 
r ,
semicolon colou?r.
, 4 (July),
July July Jul,
fourth, 4th 4. ,
(July|Jul) (fourth|4th|4),
.
, (July|Jul) (July?). 
, . 
| , .
, 
, July?
. July? (fourth|4th|4).
. 4th|4
4(th)?. , ? 
.
, . 
? ( , 
) .

, July? (fourth|4(th)?).
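The final form of the date expression, with both optional items, can be exercised as follows. A minimal sketch in Python; the list of candidate strings is made up for illustration.

```python
import re

# y is optional after Jul; th is optional after 4.
fourth_of_july = re.compile(r"July? (fourth|4(th)?)")

candidates = ["July fourth", "Jul 4th", "July 4", "Jul fourth", "Ju 4", "July 4t"]
matched = [c for c in candidates if fourth_of_july.fullmatch(c)]
print(matched)
```

The ? applies only to the item immediately before it: a single character in July?, a parenthesized group in 4(th)?.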

, . 
, 
, (, , 


) .
, ,
. (
)

.
, .

:
+ () * ((
). +
, a * 
( ). , *
, ,
. +
( 
), 
. ,
, ?,

+ *, ,
, .
*, ?, . 
, ( )
. +,
.
, ?
, * .
. 33
<H[1-6]> . HTML1 , 
> 
, <H3 > <H4   >. * 
, ( )
, <H[1-6] *>. 
<H1>, ,
.
HTML
<HR SIZE=14>, ,
14 . <H3>, 
. 
, =. , 
1

HTML, . , 
, 
. , HTML,
, .


HR SIZE,
. *,
+. , 
. 
*, .
<HR +SIZE *= *14 *>.

, . 
(, 14)
. 14
.
[0-9], 
+, 14
[0-9]+. 
, +, ? . . 
.
<HR +SIZE *= *[0-9]+ *>
, 
. , egrep
-i ( 39), 
[Hh][Rr] HR. 
<HR +SIZE *= *[0-9]+ *> 
. ,
, 
. 
,
, , , j 4 ( , 
, 
, egrep ).

. , 
,
<HR> ( , >
). 
, ?
,
( ). , .
(
), , ?, * +
.
. 1.2.
: 
,
( ).
.


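Table 1.2. Summary of the quantifier "repetition metacharacters"

      Minimum required   Maximum to try   Meaning
?     none               1                one allowed; none required ("one optional")
*     none               no limit         unlimited allowed; none required ("any amount OK")
+     1                  no limit         unlimited allowed; one required ("at least one")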

egrep

: {min, max}.
. , {3,12} 12 ,
, 3 . 
[a-zA-Z]{1,5}
( ).
{0,1} ?.

egrep. ,
3
, .



: |
(
, ? *). 
,
egrep ( GNU),
.

, 
.
, .
, 
, the the. ,
the theory, , 



. 44.
, 
, .
?. 
, 
: ()?
:

<HR( +SIZE *= *[0-9]+)? *>

: * .
,
<HR>. , 

SIZE.
, + SIZE 
. , 
HR
SIZE. <HR> .
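The two versions of the HR expression, with and without the optional SIZE part, can be compared on a few tags. This is a sketch in Python, where `re.IGNORECASE` is assumed to play the role of egrep's -i option; the sample tags are made up.

```python
import re

# SIZE attribute required.
with_size = re.compile(r"<HR +SIZE *= *[0-9]+ *>", re.IGNORECASE)
# Whole SIZE part wrapped in parentheses and made optional with ?.
size_optional = re.compile(r"<HR( +SIZE *= *[0-9]+)? *>", re.IGNORECASE)

tags = ["<HR SIZE=14>", "<hr   size = 30 >", "<HR>", "<HRSIZE=14>", "<HR SIZE=>"]

print([t for t in tags if with_size.search(t)])
print([t for t in tags if size_optional.search(t)])
```

The + after the first space keeps HR and SIZE from running together, while the * after the other spaces makes them optional.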
egrep 
, . 39: \<the the\>.
+,
.

. , : 
. egrep 
(backreferencing), 
. ,
, ,
.
\<the +the\> the
[A-Za-z]+.
, , 
. , the
\1. \<([A-Za-z]+) +\1\>.

, ,
\1 (
) .
,
\1, \2, \3
. .


, ([a-z])([0-9])\1\2 \1 , [a-z],
a \2 , [0-9].
the the [A-Za-z]+ 
the. , 
the \1
+ , \1
the. , \> , 
( 
the theft). 
, . ,
(, 
that ),
.
, 
( egrep \<\> ).

The the, -i,
. 39.1
:
% egrep -i '\<([a-z]+) +\1\>'

, 
!
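The doubled-word expression carries over to other tools almost unchanged. The sketch below uses Python, with \b assumed as the counterpart of \< and \>, and `re.IGNORECASE` in place of -i; unlike the GNU egrep behavior described in the footnote, Python applies case-insensitivity to the backreference as well. The `find_doubled` helper is hypothetical.

```python
import re

# ([a-z]+) captures a word; \1 must then repeat the same text.
doubled = re.compile(r"\b([a-z]+) +\1\b", re.IGNORECASE)

def find_doubled(line):
    """Return every doubled word found in the line."""
    return [m.group(0) for m in doubled.finditer(line)]

print(find_doubled("the the theory"))  # a real doubled word
print(find_doubled("the theory"))      # the closing boundary rejects this
print(find_doubled("The the cat"))     # case differences are ignored
```

Like the egrep version, this only sees doubled words on a single line.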

, 
.

. egrep 
,
, 
. , 
.


, 
? , ega.att.com 
ega.att.com 
megawatt computing. , 
: . .
1

, GNU egrep -i 
,  .
, egrep the the, The the.


, ,
\: ega\.att\.com. 
\. (escaped) . 

.1
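The effect of escaping the dots is easy to demonstrate. A minimal sketch in Python; `re.escape` is a standard-library helper that adds the backslashes automatically, which is a convenience of that library and not something egrep offers.

```python
import re

unescaped = re.compile(r"ega.att.com")   # each dot matches any character
escaped = re.compile(r"ega\.att\.com")   # each dot must be a literal period

text = "megawatt computing"
print(unescaped.search(text) is not None)
print(escaped.search(text) is not None)
print(escaped.search("connect to ega.att.com") is not None)

# Building the escaped form from the literal text.
print(re.escape("ega.att.com"))
```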

. 
+ 
, .
: (, (very))
\([a-zA-Z]+\). 
\ \( \)
() ,
.
\ , 
, 
. , ,
egrep \<, \>, \1 . . 
. .


,
. , , ,
.


, 
egrep. 
, .
.
, ,
, . 

 . 
, 
(flavors) .
.


, 
egrep , \ 
, .




( egrep ), 
.
, :

, ;

, .

, egrep , 
, 
. , ,
:
zip is 44272. If you write, send $4.95 to cover postage and

[0-9]+, ,
. (
( , , 
. .
), , ,
.


, ,
,
.

, . ,
.
, 
, . . . 
, 
.



( . .), 
,
. 
[a-zA-Z_][a-zA-Z_0-9]*. 
, (
*) .
, , 32 ,
{0,31}, 
{min, max} ( 
. 45).


,
, , :

"[^"]*".
, , 
. 
, ... ! 
[^"] , ", 
, .
( )
",
\ , "nail the 2\"x4\" plank".
,
, .


( )
: \$[0-9]+(\.[0-9][0-9])?.

: \$, + ()?.
 , 
,  .
 (
),  ,
.
. , 
,
$1000, $1,000. 
, 
egrep. egrep , . 

.
, ,
, 
^$.
, ( )
,
.
, $.49.
+ *, .
?
5 ( 245).
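The three small expressions from this section (variable names, quoted strings, dollar amounts) can be tried out together. This is a sketch in Python with made-up test strings; it illustrates the limitations mentioned in the text and is not a complete solution to any of them.

```python
import re

identifier = re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*")
quoted = re.compile(r'"[^"]*"')
dollars = re.compile(r"\$[0-9]+(\.[0-9][0-9])?")

# Variable names: a letter or underscore, then letters, digits, underscores.
print(identifier.fullmatch("_tmp1") is not None)
print(identifier.fullmatch("9lives") is not None)

# Quoted string: opening quote, anything but a quote, closing quote.
print(quoted.search('print "hello, world" twice').group(0))

# Dollar amounts, with optional cents.
print(dollars.fullmatch("$4.95") is not None)
print(dollars.fullmatch("$1000") is not None)
print(dollars.fullmatch("$1,000") is not None)  # commas are not handled
print(dollars.fullmatch("$.49") is not None)    # at least one digit is required
```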


(URL) HTTP/HTML
URL , 
URL 
.
URL 
. ? 
, URL, 
, .
URL HTML/HTTP 
:
http://hostname/path.html

.htm.
, , ( )
(. . , www.yahoo.com),
, ,
http:// ,
[-a-z0-9_.]+.
,

[-a-z0-9_:@&?=+,.!/~*%$]*. : 
,
, ( 34).
, 
:
% egrep -i '\<http://[-a-z0-9_.:]+/[-a-z0-9_:@&?=+,.!/~*%$]*\.html?\>'

,
http://..../foo.html, 
, , URL.
? , . 

. ,
:
% egrep -i '\<http://[^ ]*\.html?\>'

,

. URL
.
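The longer URL expression can be tested on a few strings. A minimal sketch in Python, assuming \b as the counterpart of \< and \> and `re.IGNORECASE` in place of -i; the URLs are made up for illustration.

```python
import re

# Host part, a slash, a path, and a final .htm or .html.
# The dash comes first in each class so that it is taken literally.
url = re.compile(
    r"\bhttp://[-a-z0-9_.:]+/[-a-z0-9_:@&?=+,.!/~*%$]*\.html?\b",
    re.IGNORECASE,
)

m = url.search("see http://www.yahoo.com/index.html for details")
print(m.group(0))
print(url.search("http://example.com/notes.htm") is not None)
print(url.search("http://example.com/page.php") is not None)
```

As the text notes, such an expression is a heuristic: it accepts some strings that are not valid URLs and misses valid ones that do not end in .htm or .html.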

HTML
,  , 
HTML, egrep.
HTML 
, 
.


<TITLE> <HR>
<.*>.
. , . 
<.*> , : <, 
, 
>. ,

, this <I>short</I> example.
,
, 
. ,
, 
. 
, .
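The surprise with <.*> shows up immediately when the matched text is printed. A minimal sketch in Python, using the sample line from the text:

```python
import re

text = "this <I>short</I> example"

# .* takes as much as it can, so the match runs from the first <
# to the last > on the line.
match = re.search(r"<.*>", text)
print(match.group(0))
```

With egrep this goes unnoticed, because egrep reports only whether a line matches; it matters as soon as a tool uses the matched text itself.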

9:17 am 12:30 pm

. ,

[0-9]?[0-9]:[0-9][0-9] (am|pm)

9:17 am 12:30 pm, 
99:99 pm.
, ,
. 1?[0-9] 
19 ( 0 ), 
: 1[012] [1-9] .
(1[012]|[1-9]).
. 
[0-5], [0-9].
, (1[012]|[1-9]):[0-5][0-9] (am|pm).

24 ,
0 23. , 
, , 09:59.
, 
.
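The tightened 12-hour expression can be compared with the first attempt. This is a sketch in Python; the list of candidate times is made up for illustration.

```python
import re

first_try = re.compile(r"[0-9]?[0-9]:[0-9][0-9] (am|pm)")
twelve_hour = re.compile(r"(1[012]|[1-9]):[0-5][0-9] (am|pm)")

times = ["9:17 am", "12:30 pm", "10:05 am", "99:99 pm", "0:30 am", "13:00 pm"]

print([t for t in times if first_try.fullmatch(t)])    # accepts nonsense times
print([t for t in times if twelve_hour.fullmatch(t)])  # hours 1-12, minutes 00-59
```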


(regex)
, 
(regular expression) , 
, . 
regex ().
( FedEx, g (), 
regular (), , Regina)


,
..., .
,
, .

, ,
, . 
a cat,
. 
, .

(
) , 
. , * 
, 
. ,
. , . , 
\* \\* (
\ ),
.

, 
. 3 .

, 
,
, , .
. egrep 
\<\>, .
,
\b (
). 
. ,
, .

. 

.

\<\>, ,
. (
, .
. ,
,


24
. 52.
, 
.
: ( 00 09 , ),
( 10 19 ) ( 20 23 ). 
: 0?[09]|1[09]|2[03].
, ,
: [01]?[0-9]|2[0-3].
, 
. ,
, , 
.
[01]?[0-9]|2[0-3]                  [01]?[4-9]|[012]?[0-3]

00 01 02 03 04 05 06 07 08 09      00 01 02 03 04 05 06 07 08 09
10 11 12 13 14 15 16 17 18 19      10 11 12 13 14 15 16 17 18 19
20 21 22 23                        20 21 22 23
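
The two ways of slicing the 24-hour range shown above can be compared mechanically. The sketch below (the names `accepted`, `BY_ROWS` and `BY_COLUMNS` are hypothetical) enumerates one- and two-digit strings and checks which ones each expression accepts in full.

```python
import re

# Two groupings of the same set of hours, as in the figure above.
BY_ROWS = re.compile(r"[01]?[0-9]|2[0-3]")
BY_COLUMNS = re.compile(r"[01]?[4-9]|[012]?[0-3]")

# Every one- and two-digit candidate: "0".."99" and "00".."99".
CANDIDATES = [str(n) for n in range(100)] + [f"{n:02d}" for n in range(100)]

def accepted(pattern: re.Pattern) -> set:
    """Return the candidates that the pattern matches in their entirety."""
    return {s for s in CANDIDATES if pattern.fullmatch(s)}

if __name__ == "__main__":
    print(accepted(BY_ROWS) == accepted(BY_COLUMNS))
    print(sorted(accepted(BY_ROWS), key=lambda s: (len(s), s)))
```

Both expressions accept the same strings; which one to prefer is a matter of readability rather than of what is matched.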


. , (
)
. , , 
egrep, .
1990 Perl 
,

Perl ( 
, Perl).
PHP, Python,
Java, Microsoft .NET Framework, Tcl, 
. . 
. Perl
( 
, ). 
.

(subexpression)
, , 
, 


. , ^(Subject|Date): Subject|Date
. Subject
Date . S 
, u, b j ...
16 [16]*, 

. , H, 16 * 
[16]*.
, (*, + ?)

. mis+pell
s, mis is. , 
,
( ) .

. , 
, .
, (
. ,
64 53 @ 5
ASCII, , , EBCDIC 
(  
).

.
.
, Latin1
` , 1
A

. : ,
, 
, .

ASCII, 
. 

( 3, 140).
,

. , 3
.
1


(Ken Lunde) CJKV Information Processing. 
CJKV , , (
, .



, .

, , 
, 

, .

 , 
.

a*((ab)*|b*) cid.
, ,
, 
,
. 
. , 
.
,

egrep. ,
. , ,
.
, 
,
.
,  , 
, 
. 
.
, .
, 
. ,
, 
.
, .
, , 

. 
, , 
. ,
.
.
:


. 

, egrep.

, ,
. 
.

. 

, 
. (
) 
.
, . 
3.

. 
( ) , 
, .
,
. , 
,

. 4, 5 6.


.
,
. 
.
,  : , 
, .
. , 
, 
.
, 
. , ,
.
2 
. 3 
( 
). 4 . 5
. 6 
, ,
.
, ( 4, 5 6),
.


. 1.3 egrep,
.
1.3. egrep
,

[]

[^]


\, 

(
)


( )

, 

{min, max}

min , 
max

\<

\>

()

,


\1,\2,...

, , 
. .

egrep.


, 
:

egrep .

.
( 38),
( 45) ( 37).


,
( 34).

. , 
( 37).


, . 
,
, ( 34).

i
( 39).

:
1. \ + , 
(, \* 
).
2. \ + ,
(, \<
).
3. \ + ( 
, \ ).

, egrep

.
, ? *,

. , 
( 43).


, , 
, 

, egrep, . 
, ,



. , 
, ,
, . ., : 
?
(schaff(
kopf) ,
. ,
, . 
, , 
: ,
? 
. ,
. , ,
, 
.
,  ,
. ,

, 
. , 
. , 
,
.

2

?
, Perl 
. , :
$/ = ".\n";
while (<>) {
  next if !s/\b([a-z]+)((?:\s|<[^>]+>)+)(\1\b)/\e[7m$1\e[m$2\e[7m$3\e[m/ig;
  s/^(?:[^\e]*\n)+//mg;   # Remove any unmarked lines.
  s/^/$ARGV: /mg;         # Ensure lines begin with filename.
  print;
}

, .
Perl, ,
(!). , , ,
egrep
. 
:

\b([a-z]+)((?:\s|<[^>]+>)+)(\1\b)
^(?:[^\e]*\n)+
^

Perl,
( 
) , PHP, Python, Java, Visual
Basic .NET, Tcl . .
, ^ , 
, egrep. , Perl egrep
.
, Perl ( 


) . 
.


 (
; ;
HTML) 
.
, 
, . 
, 
egrep, 
.

, PHP, Java VB.NET, 
Perl. ( )

egrep, 
 .
Perl  ,

. , Perl
, 
.
, . 26, 
, ResetSize 
SetSize. 
Perl, :
% perl -0ne 'print "$ARGV\n" if s/ResetSize//ig != s/SetSize//ig' *

( , , ,
).
Perl, .
: .
, :
, 
Pascal1.
Perl,
,
( 7,
1

Pascal , 
. . (William F. Matton)
.


Perl, ).
, Perl
;
.
Perl
,
. , 
Perl. (
.

Perl
Perl 1980 
, .

, awk sed, 
Pascal.
Perl , DOS/Windows, Mac
OS, OS/2, VMS Unix. 

WWW. ,
Perl ,
www.perl.com.
Perl 5.8,
5.005.
:
$celsius = 30;
$fahrenheit = ($celsius * 9 / 5) + 32;  # calculate Fahrenheit
print "$celsius C is $fahrenheit F.\n"; # report both temperatures

:
30 C is 86 F.

( $fahrenheit $celsius)

( ). 
# .
, #, Java VB.NET,
, Perl 
, . "$celsius is $fahrenhe
it F.\n" .
( \n
).
Perl 
:

$celsius = 20;
while ($celsius <= 45)
{
  $fahrenheit = ($celsius * 9 / 5) + 32; # calculate Fahrenheit
  print "$celsius C is $fahrenheit F.\n";
  $celsius = $celsius + 5;
}

, while, 
, ( $celsius <= 45) .
(, temps),
.
:
% perl -w temps
20 C is 68 F.
25 C is 77 F.
30 C is 86 F.
35 C is 95 F.
40 C is 104 F.
45 C is 113 F.

w 
. Perl
, 
(, 
. . Perl ).
, .
, Perl 
.


Perl 
. 
, . 
$reply ,
:
if ($reply =~ m/^[0-9]+$/) {
print "only digits\n";
} else {
print "not only digits\n";
}

. 
^[09]+$, m// 
Perl, . m (
, / 


.1 =~ 
m// , (
$reply).
=~ = ==. == ,
( , 
eq). = , 
$celsius = 20. , =~
, ( 
m/^[09]+$/,
$reply). 
,
.
=~ , .
, :
if ($reply =~ m/^[0-9]+$/)

:
^[09]+$ , 
$reply, ...
$reply =~ m/^[09]+$/ true,
^[09]+$ $reply, false
. if
.
: $reply =~ m/[09]+/ ( ,
, ^ $) true,
$reply . ^$ ,
$reply .
. 

, , . 
, 
.
:
print "Enter a temperature in Celsius:\n";
$celsius = <STDIN>; # this reads one line from the user
chomp($celsius);    # this removes the ending newline from $celsius
if ($celsius =~ m/^[0-9]+$/) {
  $fahrenheit = ($celsius * 9 / 5) + 32; # calculate Fahrenheit
1

m 
. : $reply =~ /^[09]+$/.
, Perl, 
. , m 
, 
.

  print "$celsius C is $fahrenheit F\n";
} else {
  print "Expecting a number, so I don't understand \"$celsius\".\n";
}

\ 
print. 
, . 
,
. Perl
,
Java, Python . 
( 71)
. VB.NET, 
(\"),
("").
, c2f. 
:
% perl -w c2f
Enter a temperature in Celsius:
22
22 C is 71.599999999999994316 F

 . , print
.
Perl,
,

printf:
printf "%.2f C is %.2f F\n", $celsius, $fahrenheit;

printf printf format


Pascal, Tcl, elisp Python. 
, . 
:
Enter a temperature in Celsius:
22
22.00 C is 71.60 F


, ,
. 
Perl
. 
,
. ?


[+]?, .

, (\.[09]*)?. \. 
, \.[09]* 
,
. \.[09]*

( )?, ( 
\.?[09]*, 
, 
\.).
, :
if ($celsius =~ m/^[-+]?[0-9]+(\.[0-9]*)?$/) {

, 32, 3.723 +98.6. 


, , 
(, .357). , 
(. . 0.357),
.
, 5 ( 245).
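
As a rough illustration of the check developed in this section, here is a Python sketch that validates the reply against `^[-+]?[0-9]+(\.[0-9]*)?$` before converting it. The function name `celsius_to_fahrenheit` and the choice to return `None` for rejected input are assumptions made for the example.

```python
import re

# Optional sign, at least one digit, then an optional decimal part.
NUMBER = re.compile(r"[-+]?[0-9]+(\.[0-9]*)?")

def celsius_to_fahrenheit(reply: str):
    """Return the Fahrenheit value, or None when the input is not a number."""
    if NUMBER.fullmatch(reply) is None:
        return None
    return float(reply) * 9 / 5 + 32

if __name__ == "__main__":
    for reply in ["22", "-3.723", "+98.6", ".357", "oops"]:
        print(repr(reply), celsius_to_fahrenheit(reply))
```

Note that `.357` is rejected because the expression requires at least one digit before the decimal point, exactly as discussed above.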


, 
, . 

F. , 
[CF]. 
, 
.
, egrep 
\1, \2, \3, 
, 
( 45). Perl

, 
, 
.
, ,
( 178), Perl 
$1, $2, $3 . ., ,
, . . .
, 
. . Perl

.


: \1
, 
, $1
.
,
,
. , $1 
, :
$celsius =~ m/^[-+]?[0-9]+[CF]$/
$celsius =~ m/^([-+]?[0-9]+)([CF])$/

?
, :
* 

|
, 
.
,
. . 2.1, $1 , $2
(C F).  . 2.2 ,

.

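[Figure 2.1: capturing parentheses in $celsius =~ m/^([-+]?[0-9]+)([CF])$/; the first pair fills $1, the second fills $2]

print "Enter a temperature (e.g., 32F, 100C):\n";
$input = <STDIN>; # This reads one line from the user.
chomp($input);    # This removes the ending newline from $input.
if ($input =~ m/^([-+]?[0-9]+)([CF])$/)
{
    # If we get in here, we had a match. $1 is the number, $2 is "C" or "F".
    $InputNum = $1; # Save to named variables to make the ...
    $type     = $2; # ... rest of the program easier to read.

    if ($type eq "C") { # 'eq' tests if two strings are equal
        # The input was Celsius, so calculate Fahrenheit
        $celsius = $InputNum;
        $fahrenheit = ($celsius * 9 / 5) + 32;
    } else {
        # If not "C", it must be an "F", so calculate Celsius
        $fahrenheit = $InputNum;
        $celsius = ($fahrenheit - 32) * 5 / 9;
    }
    # At this point we have both temperatures, so display the results:
    printf "%.2f C is %.2f F\n", $celsius, $fahrenheit;
} else {
    # The initial regex did not match, so issue a warning.
    print "Expecting a number followed by \"C\" or \"F\",\n";
    print "so I don't understand \"$input\".\n";
}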

convert, 
:
% perl -w convert
Enter a temperature (e.g., 32F, 100C):
39F
3.89 C is 39.00 F
% perl -w convert
Enter a temperature (e.g., 32F, 100C):
39C
39.00 C is 102.20 F


(
)

. 2.2. (

% perl -w convert
Enter a temperature (e.g., 32F, 100C):
oops
Expecting a number followed by "C" or "F",
so I don't understand "oops".



, Perl, 

. 
: ,
, f c 
, .
98.6f.
, 
(\.[09]*)?:
if ($input =~ m/^([-+]?[0-9]+(\.[0-9]*)?)([CF])$/)

:
. 
, . 
,
, 
. 
,
$2. . 2.3.
, 
[CF] 
, . ,
, ,
, , $type $3
$2 (, 
).
[Figure 2.3: nested parentheses in $input =~ m/^([-+]?[0-9]+(\.[0-9]*)?)([CF])$/; groups are numbered by their opening parenthesis: $1, $2, $3]


.
,
, *,
(
):
if ($input =~ m/^([-+]?[0-9]+(\.[0-9]*)?) *([CF])$/)

,
, 
,
(whitespace).
. ,
TAB * ,

: [ TAB ]*.
, (*| TAB *)?
, 
.

TAB . , 
.  [
]*,
, , 
.
Perl \t,
. 
, 
. , [ TAB ]* [\t]*.
: \n ( 
), \f ( ) \b (). , \b 
, .
? 
.


\n , 
, . Perl
, 
. 
(, , VB.NET 
).

. , \t 
,
\t , 
.


: (?:)
. 2.3 (\.[09]*)? 
, 
? \.[09]* 
. , 
, 
$2, . ,

,
( ),
.
Perl,
. 
() ,
(?:), 
. 
(?:,
.
? (
. 122 , ).
, :
if ($input =~ m/^([-+]?[0-9]+(?:\.[0-9]*)?)([CF])$/)

, [CF] 
, 
$2, (?:) 
.
. 

( 
6). 

, .
, (?:)
. 
? 
, . ,
, 
( 
).
() 
() , 
, .


. 71.
[ TAB ]* *| TAB *?
(*| TAB *) *, TAB *. 
, ( ),
( ). 
.
C , [ TAB ]* [ TAB ], 
. TAB
, .
[ TAB ]* (| TAB )*,
, 4, 
.

, ,
. \t
, , 
, , 
, .

. 1 egrep , 
, . egrep
, 
. ,
, ,
.
, 
, 
( DOS ).
.

, egrep
(. 
$, *, ? . ., 
.
, 
Perl 
, 
. 
, 
.


\b? Perl 
, 
. ,
Perl  
( ).

( \s)

[\t]*. , 
: \s.
\t, , 
\s
, .
, , . 

, \s* , [\t]*.

\s* .
:
$input =~ m/^([-+]?[0-9]+(\.[0-9]*)?)\s*([CF])$/

, , 
. 
: [CFcf]. , 
:
$input =~ m/^([-+]?[0-9]+(\.[0-9]*)?)\s*([CF])$/i

i . m// 
Perl ,
. i ,
m// Perl,
. 
, i 
egrep ( 39).
, i , 
/i ( 
/ ). /i 
Perl;
, , 
.
, /g (
) /x ( 
), .
:
% perl -w convert
Enter a temperature (e.g., 32F, 100C):
32 f
0.00 C is 32.00 F
% perl -w convert
Enter a temperature (e.g., 32F, 100C):
50 c
10.00 C is 50.00 F

! 50 , 
50 ! 
. , ?
:
if ($input =~ m/^([-+]?[0-9]+(\.[0-9]*)?)\s*([CF])$/i)
{
.
.
.
$type = $3; # $type $3
if ($type eq "C") { # 'eq'
.
.
.
} else {
.
.
.

, f
, 
. , $type
C, ,
.
c, $type:
if ($type eq "C" or $type eq "c") {

, ,
:
if ($type =~ m/c/i) {

, .
. 
, 
.

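print "Enter a temperature (e.g., 32F, 100C):\n";
$input = <STDIN>; # This reads one line from the user.
chomp($input);    # This removes the ending newline from $input.
if ($input =~ m/^([-+]?[0-9]+(\.[0-9]*)?)\s*([CF])$/i)
{
    # If we get in here, we had a match. $1 is the number, $3 is "C" or "F".
    $InputNum = $1; # Save to named variables to make the ...
    $type     = $3; # ... rest of the program easier to read.

    if ($type =~ m/c/i) { # Is it "c" or "C"?
        # The input was Celsius, so calculate Fahrenheit
        $celsius = $InputNum;
        $fahrenheit = ($celsius * 9 / 5) + 32;
    } else {
        # If not "C", it must be an "F", so calculate Celsius
        $fahrenheit = $InputNum;
        $celsius = ($fahrenheit - 32) * 5 / 9;
    }
    # At this point we have both temperatures, so display the results:
    printf "%.2f C is %.2f F\n", $celsius, $fahrenheit;
} else {
    # The initial regex did not match, so issue a warning.
    print "Expecting a number followed by \"C\" or \"F\",\n";
    print "so I don't understand \"$input\".\n";
}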
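
For readers following along in another language, here is a rough Python rendering of the finished converter. It is only a sketch: the function name `convert` and the decision to return `None` for unrecognized input (instead of printing a warning) are assumptions, and non-capturing parentheses are used so that the unit lands in group 2.

```python
import re

# Number (optional sign, optional fraction), optional whitespace, then C or F.
TEMPERATURE = re.compile(r"^([-+]?[0-9]+(?:\.[0-9]*)?)\s*([CF])$", re.IGNORECASE)

def convert(line: str):
    """Return 'x C is y F' for input such as '39F' or '50 c', else None."""
    m = TEMPERATURE.match(line)
    if m is None:
        return None
    number = float(m.group(1))
    if m.group(2).lower() == "c":
        celsius = number
        fahrenheit = celsius * 9 / 5 + 32
    else:
        fahrenheit = number
        celsius = (fahrenheit - 32) * 5 / 9
    return f"{celsius:.2f} C is {fahrenheit:.2f} F"

if __name__ == "__main__":
    for line in ["39F", "39C", "50 c", "oops"]:
        print(line, "->", convert(line))
```

With the unit test done case-insensitively, `50 c` is now treated as Celsius rather than falling through to the Fahrenheit branch.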


Perl,
, 
:
1. Perl 
egrep; 
. Perl 
, 
. (Java, Python, .NET Tcl) 
, Perl.
2. 
Svariable =~ m/ _/. m
(match), / 
, . 
, true false.
3. ( ) 
.

, ,
.
( , ,
. .), 
Perl, PHP, Java, Tcl, GNU Emacs, awk, Python
. ,

 .
4. , 
Perl, (
).





(, 
, , . .)
\S , \s
\w [a-zA-Z0-9_] ( \w+
)
\W , \w, . . [^a-zA-Z0-9_]
\d [0-9], . .
\D , \d, . . [^0-9]
5. /i 
.
/i, 
i.
6.
(?:).
7. $1, $2, $3
, 
.
(

; 
).

, 1.
, (Washington(DC)?). (
) ()
, 
.
\t
\n
\r
\s



, ,
, , . 
, ,
Perl .
, $var =~ m//
true
false . $var =
=~ s///
: $var, 
.


, m//,
( /)
, . , 
( , 
$1, $2 . .,
).
, $var =~ s/// 
$var (, , 
). , 
$var JeffFriedl,
$var =~ s/Jeff/Jeffrey/;

$var JeffreyFriedl.
$var JeffreyFriedl,
JeffreyreyFriedl.
.
, egrep

\< \> . Perl


\b:
$var =~ s/\bJeff\b/Jeffrey/.

. s///,
m//, (, /i . . 74).
. 
?
$var =~ s/\bJeff\b/Jeff/i;

:
,
. ,
,
:
Dear =FIRST=,
You have been chosen to win a brand new =TRINKET=! Free!
Could you use another =TRINKET= in the =FAMILY= household?
Yes =SUCKER=, I bet you could! Just respond by.....

, 
:
$given = "Tom";
$family = "Cruise";
$wunderprize = "100% genuine faux diamond";

:
$letter =~ s/=FIRST=/$given/g;
$letter =~ s/=FAMILY=/$family/g;
$letter =~ s/=SUCKER=/$given $family/g;
$letter =~ s/=TRINKET=/fabulous $wunderprize/g;

, , 
, .
Perl 
, . , 
s/=TRINKET=/fabulous $wunderprize/g 
, "fabulous $wunderprize".
, 
,
(, 
).
/g . 
, s/// 
( ). 
/g , 

, .
:
Dear Tom,
You have been chosen to win a brand new fabulous 100% genuine faux diamond! Free!
Could you use another fabulous 100% genuine faux diamond in the Cruise household?
Yes Tom Cruise, I bet you could! Just respond by.....

:
, 
Perl 
. 9,0500000037272. , 
9,05, 
Perl , 
. 
printf
,
, . ,
, 1/8, .125,
.
:
, ,
. .
12.3750000000392 12.375
12,375, 37.500 37,50. , 
.


. 78.
$var =~ s/\bJeff\b/Jeff/i?
 , , 
.
\bJEFF\b, \bjeff\b \bjEfF\b, ,
.
/i Jeff .
Jeff, 
( /i , 
, 7).
jeff 
Jeff ( ).

? ,
, $price, 

$price =~ s/(\.\d\d[1-9]?)\d*/$1/

(: \d, . 77,
).
\. , 
. \d\d
, . [19]? 
, .
,
,
$1. $1
. ,
. $1
.
. 
, . . \d* 
.
; 4,
. 
.
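
The same substitution can be sketched in Python. The function name `trim_price` is hypothetical; `count=1` mirrors a Perl `s///` without `/g`, and `\1` in the replacement plays the role of `$1`.

```python
import re

def trim_price(price: str) -> str:
    """Keep two decimal places, plus a third only when it is non-zero."""
    # Group 1 holds the part to keep; the trailing \d* is matched and dropped.
    return re.sub(r"(\.\d\d[1-9]?)\d*", r"\1", price, count=1)

if __name__ == "__main__":
    for price in ["12.3750000000392", "37.500", "9.0500000037272"]:
        print(price, "->", trim_price(price))
```

A price with fewer than two decimal digits does not match at all and is left unchanged.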


,
. 
, .


Enter
. ,
. , ,
sysread read. ,

.
, 
:
% perl -p -i -e 's/sysread/read/g' file

Perl s/sysread/read/g. (, 
, e, 
.) p , 
, i
,
.
: ($var =~ ),
p
. , /g
.
,
; Perl
. 

. Perl,
, 
.


. , 
.

,
. , 
, 
.
(, )
, . , 
, mkreply, 
king.in, 
:
% perl -w mkreply king.in > king.out

( : w 
, 64.)



From elvis Thu Feb 29 11:15 2007
Received: from elvis@localhost by tabloid.org (8.11.3) id KA8CMY
Received: from tabloid.org by gateway.net (8.12.5/2) id N8XBK
To: jfriedl@regex.info (Jeffrey Friedl)
From: elvis@tabloid.org (The King)
Date: Thu, Feb 29 2007 11:15
Message-Id: <2007022939939.KA8CMY@tabloid.org>
Subject: Be seein' ya around
Reply-To: elvis@hh.tabloid.org
X-Mailer: Madam Zelda's Psychic Orb [version 3.7 PL92]

Sorry I haven't been around lately. A few years back I checked
into that ole heartbreak hotel in the sky, if-ya-know-what-I-mean.
The Duke says "hi".
        Elvis

, king.out :
To: elvis@hh.tabloid.org (The King)
From: jfriedl@regex.info (Jeffrey Friedl)
Subject: Re: Be seein' ya around

On Thu, Feb 29 2007 11:15 The King wrote:
|> Sorry I haven't been around lately. A few years back I checked
|> into that ole heartbreak hotel in the sky, if-ya-know-what-I-mean.
|> The Duke says "hi".
|>         Elvis

. 
( elvis@hh.tabloid.org, Reply-To ),
(The King), , . 
, 
.
:
1. ;
2. ;
3. |>.
, , 
. , Perl 
<>.
($variable =
<>), . 
,
Perl ( king.in ).


<> > _
Perl >=/<=.
getline() Perl.
, <>
( 
false),
:
while ($line = <>) {
... $line ...
}

, 

. , 
; . 
, :
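# Process the header
while ($line = <>) {
  if ($line =~ m/^\s*$/) {
    last; # stop processing within this while loop, continue below
  }
  ... process header line here ...
}
... processing for the rest of the message follows ...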
.
.
.

, ,
^\s*$. 
( ), 
( , 
),
.1 last while, 
.
, , , 
, .
.
,
:
1

, 
,
. ^ $
( ).
,
, $line 
.

if ($line =~ m/^Subject: (.*)/i) {
$subject = $1;
}

, Sub
ject:. , 
.* .
.* , 
$1.
$subject. ,
(
), if ,
$subject .
Date Reply-To:
if ($line =~ m/^Date: (.*)/i) {
    $date = $1;
}
if ($line =~ m/^Reply-To: (.*)/i) {
    $reply_address = $1;
}

From: . ,
, From:, , 
From. :
From: elvis@tabloid.org (The King)

, , 
. .
, ^From:(\S+).
, \S , 
( 77), \S+ 
( ). 
. 
, . ,
, \(
\) ( ,
).
! 
[^()]* (:
; 

, ).
, , :

^From: (\S+) \(([^()]*)\)

,
. 2.4 .


.*
.*
, (
,
). , 
, .
.
,
, 
.
( 52),

4 ( 210).

, . 2.4, ,
$2, 
$1:
if ($line =~ m/^From: (\S+) \(([^()]*)\)/i) {
$reply_address = $1;
$from_name = $2;
}

Rep
lyTo, $1 
. Reply
To, $reply_address . 
:
while ($line = <>)
{
    if ($line =~ m/^\s*$/ ) { # If we have an empty line ...
        last; # ... this immediately ends the 'while' loop.
    }
    if ($line =~ m/^Subject: (.*)/i) {
        $subject = $1;
    }
    if ($line =~ m/^Date: (.*)/i) {
        $date = $1;
    }
    if ($line =~ m/^Reply-To: (\S+)/i) {
        $reply_address = $1;
    }
    if ($line =~ m/^From: (\S+) \(([^()]*)\)/i) {
        $reply_address = $1;
        $from_name = $2;
    }
}

[Figure 2.4: nested parentheses in ^From: (\S+) \(([^()]*)\); $1 and $2]


, ,
. 
.
while :1
print "To: $reply_address ($from_name)\n";
print "From: jfriedl\@regex.info (Jeffrey Friedl)\n";
print "Subject: Re: $subject\n";
print "\n" ; # blank line to separate the header from message body.

Re:,
. 
:
print "On $date $from_name wrote:\n";

( )
|>:
while ($line = <>) {
print "|> $line";
}

, $line 
.
,
:
$line =~ s/^/|> /;
print $line;
1

, , Perl 
@ (107).


^ , ,
. , ,

|>, . . |> . 

, ( 
) .

,
, , 
. , ,

, a Perl . 
Perl ,
, 
.
, .
From: ; 
.
, $from_name
. 

/, 
(
) :
if (

not defined($reply_address)
or not defined($from_name)
or not defined($subject)
or not defined($date) )

{
die "couldn't glean the required information!";
}

Perl defined ,
, die 
.
, , From: 
ReplyTo:. From:
, $reply_address,
ReplyTo:.
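
The whole reply-building program can be sketched in Python as follows. This is an illustration under stated assumptions, not the program from the text: `build_reply` is a hypothetical name, the message is passed in as one string rather than read with `<>`, missing fields raise `ValueError` instead of calling `die`, and, unlike the Perl version, a Reply-To address is preferred regardless of whether it comes before or after the From: line.

```python
import re

def build_reply(message: str, me: str = "jfriedl@regex.info (Jeffrey Friedl)") -> str:
    """Build a reply header and quote the body with '|> ' prefixes."""
    lines = message.splitlines()
    subject = date = from_address = from_name = reply_to = None
    body_start = len(lines)

    # Header processing: stop at the first blank line.
    for i, line in enumerate(lines):
        if re.match(r"^\s*$", line):
            body_start = i + 1
            break
        if m := re.match(r"^Subject: (.*)", line, re.IGNORECASE):
            subject = m.group(1)
        if m := re.match(r"^Date: (.*)", line, re.IGNORECASE):
            date = m.group(1)
        if m := re.match(r"^Reply-To: (\S+)", line, re.IGNORECASE):
            reply_to = m.group(1)
        if m := re.match(r"^From: (\S+) \(([^()]*)\)", line, re.IGNORECASE):
            from_address, from_name = m.group(1), m.group(2)

    address = reply_to or from_address  # assumption: Reply-To always wins
    if None in (address, from_name, subject, date):
        raise ValueError("couldn't glean the required information!")

    out = [
        f"To: {address} ({from_name})",
        f"From: {me}",
        f"Subject: Re: {subject}",
        "",
        f"On {date} {from_name} wrote:",
    ]
    out += [f"|> {line}" for line in lines[body_start:]]
    return "\n".join(out) + "\n"
```

Keeping the Reply-To value in its own variable is one way to avoid the ordering dependence noted above.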

, ...
,
,

. 


Pascal,
, 
Pascal 
Perl, Pascal! 
,
, .
!



.
print "The US population is $pop\n";

The US population is 298444215, 


298,444,215.
?

, 
. ,
,
.
: , 
, .

, (lookaround).

(\b, ^ $)
, .
, 
.
(
, , 
. ,
.
(?=). , 
(?=\d) , 
. ,
( ). 
(?<=).
, (?<=\d) ,
(. . ).


( )
: 


, 
, . ,
, .
Jeffrey :
by Jeffrey Friedl.


, (?=Jeffrey), :
by Jeffrey Friedl.

,
, , 
. , 
, (,

Jeff), . 
(?=Jeffrey)Jeff, . 2.5,
Jeff , Jeffrey.

by Jeffrey Friedl.

Jeff,
by Thomas Jefferson

Jeff ,  
, (?=Jeffrey),
. 
, 
.
,
.
, (?=Jeffrey)Jeff 
Jeff(?=rey).
Jeff , Jeffrey.

"by Jeffrey Friedl"

(?=Jeffrey)Jeff

. 2.5. (?=Jeffrey)Jeff



. Jeff(?=Jeffrey) 
Jeff,
Jeffrey.
, 
, . ,
(?:) (
. 72), 
. 
, 
(?. ,
, . 
(?:),
(?=) (?<=) .


,
.
Jeffs Jeffs.
, 
s/Jeffs/Jeff's/g ( /g
. 79). 
: s/\bJeffs\b/Jeff's/g.

s/\b(Jeff)(s)\b/$1'$2/g, 
, s/\bJeffs\b/
Jeff's/g. :
s/\bJeff(?=s\b)/Jeff'/g

,
s\b .
. 2.6 , 
. 
s.
Jeff , 
. , 
s\b (. . Jeff s
). s\b , 
, s
. : Jeff
, 
. , 


. 


"see Jeffs book"



\b Jeff (?=s\b)

. 2.6. \b Jeff(?=s\b)

Jeffs,
Jeff.
, 
? ,
 . 

. :
Jeffs,
, 
Jeff, .
s , .
.
, 
, . , 
, 
. .
s

.
Jeff ?
(?<=\bJeff) (?=s\b) (. 2.7),
: , Jeff,
s..
, . , 
, :
s/(?<=\bJeff)(?=s\b)/'/g

. 
,
. 
.
, s/^/|>/
|>.


"see Jeffs book"

(?<=\b Jeff)(?=s\b)

. 2.7. (?<=\b Jeff) (?=s\b)

,
? , 
s/(?=s\b)(?<=\bJeff)/'/g?
, .
. . 2.1
Jeffs Jeff's.
2.1. Jeffs

s/\bJeffs\b/Jeff's/g

, 
; , 

.
Jeffs 
.

s/\b(Jeff)(s)\b/$1'$2/g

.
Jeffs.

s/\bJeff(?=s\b)/Jeff'/g

s , 


.

s/(?<=\bJeff)(?=s\b)/'/g


.
(
, . 

.

s/(?=s\b)(?<=\bJeff)/'/g

,
.
,

.


,
. ,
Jeffs , 
.
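
The purely zero-width variant, `s/(?<=\bJeff)(?=s\b)/'/g`, carries over to Python almost unchanged. In this sketch the function name `add_apostrophe` is hypothetical; the pattern matches a position, not text, so the substitution inserts the apostrophe without removing anything.

```python
import re

# Lookbehind: just after the word-start "Jeff"; lookahead: just before a final "s".
JEFFS = re.compile(r"(?<=\bJeff)(?=s\b)")

def add_apostrophe(text: str) -> str:
    """Turn 'Jeffs' into "Jeff's" by inserting at the matched position."""
    return JEFFS.sub("'", text)

if __name__ == "__main__":
    print(add_apostrophe("see Jeffs book"))
    print(add_apostrophe("by Thomas Jefferson"))
```

Swapping the order of the two lookaround constructs gives the same result, since both test the same position.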

/i? : , . 
, .


, , Jeffs 
:
, .
: 
,
, . 
.
, 
(?<=\d).
.
\d\d\d. 
()+, ,
$, 
. (\d\d\d)+$
, , 
(?=) , 

, 123 456 789.
, 
(?<=\d).
:
$pop =~ s/(?<=\d)(?=(\d\d\d)+$)/,/g;
print "The US population is $pop\n";

The US population is 298,444,215, 


. , \d\d\d
.
,
$1 .
(?:) 
, . 72. 
(?<=\d)(?=(?:\d\d\d)+$). 

, , 
$1, . 
, 


. 92.
?
(?=s\b) (?<=\bJeff) .
, ( ,
, ),
, 
. , Thoma s
Jeff erson (?=s\b) (?<=\bJeff)
( ), 
, 
.

, 
. ,
,
. 4 ,

.
. 
, ()
, (?:) .
. 
, 
. (?:) , ,
, 
( ).


, 
, . :
$text = "The population of 298444215 is growing";
.
.
.
$text =~ s/(?<=\d)(?=(\d\d\d)+$)/,/g;
print "$text\n";

, $ 
, .
, 
, ,
; ...of 2,9,8,4,4,4,215 is...!


$ \b? 
, Perl .
\w ( 77), Perl
, 
 . ,
, 
(, ), (
, ), .
, , 
, ? Jeffs. 
,
. , ,

, (
(
, . 
. 2.2, 

. , 
.
, , 
\w, , \w,
(?<!\w)(?=\w), 
(?<=\w)(?!\w).
, (?<!\w)(?=\w)|(?<=\w)(?!\w)

. 93.

Jeffs /i?
, 
(. . 
Jeff's) 
. . 2.1,
$1 $2 
.
, .
, 
.

. 
/i.
JEFFS Jeff's Jeff'S .


\b. 
\b, 
, 
( 174).
2.2.

, ...
 (?<=)


 (?<!)

 (?=)

 (?!)

, (?!\d), 
. \b $, 
:
$text =~ s/(?<=\d)(?=(\d\d\d)+(?!\d))/,/g;

...tone of
12345Hz, . ,
...the 1970s.... ,
...in 1970 ..., . 
, ,
, (
, ).

, (?!\w)
(?!\d). , \D (, 
. 77) , (?!\d)
. .
, ,
. , \D
( . 36).
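
The lookaround-based commafication, in its `(?!\d)` form, can be sketched in Python like this. The function name `commafy` is hypothetical, and non-capturing parentheses are used since the grouping is only needed for the `+`.

```python
import re

# A position preceded by a digit and followed by one or more groups of
# exactly three digits that are not themselves followed by another digit.
COMMA_SPOT = re.compile(r"(?<=\d)(?=(?:\d\d\d)+(?!\d))")

def commafy(text: str) -> str:
    """Insert commas into every run of digits found in the text."""
    return COMMA_SPOT.sub(",", text)

if __name__ == "__main__":
    print(commafy("The population of 298444215 is growing"))
    print(commafy("tone of 12345Hz"))
```

Because the match is zero-width, every qualifying position is found in a single pass, which is not the case for the version that consumes digits.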


( ) 
, . 

, Perl ,
.
, 
. :



$text =~ s/(\d)(?=(\d\d\d)+(?!\d))/$1,/g;

, 
, 
\d, , 
$1.
?
(?!\d) \b, ,
, 
? , :
$text =~ s/(\d)((\d\d\d)+\b)/$1,$2/g;

, 
.

HTML

HTML. ,
, , 
,
.
, , 
,
. ( 
) , 
. Perl 
:
undef $/;   # Enter "file-slurp" mode.
$text = <>; # Slurp up the first file given on the command line.

, :
This is a sample line.
It has three lines.
That's all

$text :
This is a sample line. NL It has three lines. NL That's all NL


:
This is a sample line. CR NL It has three lines. CR NL That's all CR NL


, ( , Windows) 
+ .
,
.


. 97.
$text =~ s/(\d)((\d\d\d)+\b)/$1,$2/g 
?
, , 
298,444215. , , 
(\d\d\d)+,
, 
,
/g.
,
. ,
,
,
. 
,
. ,
.

. 
(, while), 
.
(
 /g). :
while ( $text =~ s/(\d)((\d\d\d)+\b)/$1,$2/g ) {
   # Nothing to do inside the body of the while -- we merely want to
   # reapply the regex until it fails
}

,
.


&, < > 
, HTML: &amp;, &lt; &gt;.
HTML ,
.
HTML:
$text =~ s/&/&amp;/g; # Make the basic HTML ...
$text =~ s/</&lt;/g;  # ... characters &, <, and > ...
$text =~ s/>/&gt;/g;  # ... HTML safe.

/g, 
(
).


&,
.



HTML <p>. , 
.
. 

$text =~ s/^$/<p>/g;

: 
, . 
, . 33, 
egrep, ,
.
Perl , 
, 
.
, . 83, ^ $
,
.1 ,
, 
.
, 
,
^ $
, . Perl
/m:
$text =~ s/^$/<p>/mg;

/m /g (

). 
.
, chap
ter. NL NL Thus, chapter. NL <p> NL Thus.
,
. ^*$
, ^[\t\r]*$ , 
,
1

$ , 
, . 
. 169.


. 
^$ , ,
^$ .
, , 
( ).
\s (. 74), 
^\s*$, 
. 83. [\t\r] \s, 
\s
: , ,
,
, .
, ^\s*$ . ,
<> 
, .
, $text
with. NL

NL NL TAB

NL Therefore


$text =~ s/^[ \t\r]*$/<p>/mg;

:
with. NL <p> NL <p> NL <p> NL Therefore

$text =~ s/^\s*$/<p>/mg;

:
with. NL <p> NL Therefore

^\s*$.
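
The difference between the two paragraph-marking expressions is easy to reproduce in Python, where `re.MULTILINE` plays the role of `/m`. The function names below are hypothetical; the sample string is the one shown above (text ending in "with.", two empty lines, a line holding only a tab, then "Therefore").

```python
import re

def mark_blank_lines(text: str) -> str:
    """Replace each blank-looking line separately (spaces, tabs, CRs only)."""
    return re.sub(r"^[ \t\r]*$", "<p>", text, flags=re.MULTILINE)

def mark_paragraph_breaks(text: str) -> str:
    """Replace a whole run of blank lines at once; \\s also matches newlines."""
    return re.sub(r"^\s*$", "<p>", text, flags=re.MULTILINE)

if __name__ == "__main__":
    sample = "with.\n\n\n\t\nTherefore"
    print(repr(mark_blank_lines(sample)))
    print(repr(mark_paragraph_breaks(sample)))
```

Since `\s` can match a newline, `^\s*$` swallows consecutive blank lines and yields a single `<p>`.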


HTML 

mailto , jfriedl@oreilly.com 
: <a href="mailto:jfriedl@oreilly.com">jfriedl@oreilly.com</a>.

. 
; 
, 
.
_@_.



, , 
:
$text =~ s/\b(username regex\@hostname regex)\b/<a href="mailto:$1">$1<\/a>/g;

\ 
</>. 
. \@ ( 107), 
, Perl @ 
.
\ /
. ,
Perl s///, 
/. / 
, , Perl 
,
. ,
</>, <\/a>,
.
, , Perl 

, s!!! s{}{}.
/ 
, . 
,
.
. 
\b\b.
, jfriedl@oreil
ly.compiler. ,
, 
. , 
. , 
: <a href="mailto:$1">$1</a>.

.
,
_ _. (
regex.info www.oreilly.com) ,
com, edu,
info, uk . . 
\w+\@\w+(\.\w+)+,
\w+.
, .
( 
, , ),


\w+ \w[.\w]*. 
, \w,
. :
,
, az. 
.\w
, 
, .
. 
, ,
. 
, , 
\w+(\.\w+)+, [\w.]+, 
.....
Artichokes 4@1.00, .

, \w+(\.\w+)*\.(com|edu|info).
com|edu|gov|int|mil|net|org|biz|info|name|museum|coop|aero|[a-z][a-z],
,
. \w+,
\.\w+, 
, .
, \w 
. ASCII, ,
,
ASCII (, , , . .),
.
. [azAZ09]
[az09] /i ( 
). , 
[az09] ( , 
). , 
[-a-z0-9]+(\.[-a-z0-9]+)*\.(com|edu|info).
, , 
. ,
[-a-z0-9]+(\.[-a-z0-9]+)*\.(com|edu|info)
run C:\\startup.command at startup,
, ,
, . ,

$text =~ s{\b(username regex\@hostname regex)\b}{<a href="mailto:$1">$1</a>}gi;

( ,
/i), 


. , Perl 
, ! 
/x, :
$text =~ s{
   \b
   # Capture the address to $1 ...
   (
     username regex
     \@
     hostname regex
   )
   \b
}{<a href="mailto:$1">$1</a>}gix;

! / (
/g /i) , 
. , 
, 
, . ,
, #.
, /x 
, # , : 

( 147).
( ,
/), #
, , 
. ,
\s, m/<a \s+ href=>/x.
, / 
, .
, s{}{}, 
} (
, }), x 
/x.

, 
,
. :
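undef $/;   # Enter "file-slurp" mode.
$text = <>; # Slurp up the first file given on the command line.

$text =~ s/&/&amp;/g;     # Make the basic HTML ...
$text =~ s/</&lt;/g;      # ... characters &, <, and > ...
$text =~ s/>/&gt;/g;      # ... HTML safe.

$text =~ s/^\s*$/<p>/mg;  # Separate paragraphs.

# Turn email addresses into links ...
$text =~ s{
   \b
   # Capture the address to $1 ...
   (
     \w[-.\w]*                                  # username
     \@
     [-a-z0-9]+(\.[-a-z0-9]+)*\.(com|edu|info)  # hostname
   )
   \b
}{<a href="mailto:$1">$1</a>}gix;

print $text; # Finally, display the HTML-ized text.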

,
, : /m
, 
^ $. , 
(
, ).

HTTP URL
HTTP URL
. http://www.yahoo.com
<a href="http://www.yahoo.com/">http://www.yahoo.com</a>.
HTTP URL http://xoc/ny,
/ . ,
:
$text =~ s{
   \b
   # Capture the URL to $1 ...
   (
      http:// hostname
      (
         / path
      )?
   )
}{<a href="$1">$1</a>}gix;


, 
. ; 
[-a-z0-9_:@&?=+,.!/~*'%$]*
( 51), ASCII, ,
(< > ( ) { } . .).
Perl, 
@ $. 


( 107). 
, :
$text =~ s{
   \b
   # Capture the URL to $1 ...
   (
      http:// [-a-z0-9]+(\.[-a-z0-9]+)*\.(com|edu|info) \b  # hostname
      (
         / [-a-z0-9_:\@&?=+,.!/~*'%\$]*  # optional path
      )?
   )
}{<a href="$1">$1</a>}gix;

\b URL 
:
http://www.oreilly.com/catalog/regex3/
\b , 
.
, , 
URL. :
Read "odd" news at http://dailynews.yahoo.com/h/od, and
maybe some tech stuff at http://www.slashdot.com!


, ,
URL. URL

.,?! ( , 
).

[.,?!] (. . (?<![.,?!]))
. , 
URL
, 
. , 
URL,
. ,  
, (
5, 258).
:
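undef $/;   # Enter "file-slurp" mode
$text = <>; # Slurp up the first file given on the command line.

$text =~ s/&/&amp;/g;     # Make the basic HTML ...
$text =~ s/</&lt;/g;      # ... characters &, <, and > ...
$text =~ s/>/&gt;/g;      # ... HTML safe.

$text =~ s/^\s*$/<p>/mg;  # Separate paragraphs.

# Turn email addresses into links ...
$text =~ s{
   \b
   # Capture the address to $1 ...
   (
     \w[-.\w]*                                  # username
     \@
     [-a-z0-9]+(\.[-a-z0-9]+)*\.(com|edu|info)  # hostname
   )
   \b
}{<a href="mailto:$1">$1</a>}gix;

# Turn HTTP URLs into links ...
$text =~ s{
   \b
   # Capture the URL to $1 ...
   (
      http:// [-a-z0-9]+(\.[-a-z0-9]+)*\.(com|edu|info) \b  # hostname
      (
         / [-a-z0-9_:\@&?=+,.!/~*'%\$]*  # Optional path
           (?<![.,?!])                   # Not allowed to end with [.,?!]
      )?
   )
}{<a href="$1">$1</a>}gix;

print $text; # Finally, display the HTML-ized text.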


:
. ,
.

, $HostnameRegex, 
:
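$HostnameRegex = qr/[-a-z0-9]+(\.[-a-z0-9]+)*\.(com|edu|info)/i;

# Turn email addresses into links ...
$text =~ s{
   \b
   # Capture the address to $1 ...
   (
     \w[-.\w]*       # username
     \@
     $HostnameRegex  # hostname
   )
   \b
}{<a href="mailto:$1">$1</a>}gix;

# Turn HTTP URLs into links ...
$text =~ s{
   \b
   # Capture the URL to $1 ...
   (
      http:// $HostnameRegex \b          # hostname
      (
         / [-a-z0-9_:\@&?=+,.!/~*'%\$]*  # Optional path
           (?<![.,?!])                   # not allowed to end with [.,?!]
      )?
   )
}{<a href="$1">$1</a>}gix;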

Perl qr. m s,
(. . qr// m// s///),
, , 
, 
.

( , , 
$HostnameRegex, ). 
,
. : 
,
. 

6 ( 337),
7 ( 366).

; 
, Java .NET
8 9.
$ @
, , $ 
, (. .
) . $ 
, . 
$ , 
Perl 
, . 
$ .
, $
URL .
@. @ Perl 
,
. , @
, ; 
.


( , Java, VB.NET, , #, Emacs awk) 


. (Perl, PHP,
Python, Ruby Tcl) ,
. ( 135).


, 1
.
, :
$/ = ".\n";
while (<>) {
  next if !s/\b([a-z]+)((?:\s|<[^>]+>)+)(\1\b)/\e[7m$1\e[m$2\e[7m$3\e[m/ig;
  s/^(?:[^\e]*\n)+//mg;   # Remove any unmarked lines.
  s/^/$ARGV: /mg;         # Ensure lines begin with filename.
  print;
}

, Perl,
<>, s/// print. 
! Perl (
), 
,  .
, .
, 
, 1, 
:
% perl -w FindDbl ch01.txt
ch01.txt: check for doubled words (such as this this ), a common problem with
ch01.txt: * Find doubled words despite capitalization differences, such as with
ch01.txt: ' The the ', as well as allow differing amounts of whitespace (space,
ch01.txt: tabs, /\<(1,000,000;million| thousand thousand)/. But alternation
ch01.txt: can't be of this chapter. If you knew the the specific doubled word
ch01.txt: to find (such
.
.
.

Perl. 
Java
.
s{}{}(
/x,
( next if ! next unless).
.

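$/ = ".\n";   # Sets a special "chunk mode"; chunks end with a period-newline combination
while (<>)
{
  next unless s{
       ### Need to match one word:
       \b            # Start of word ....
       ( [a-z]+ )    # Grab word, filling $1 (and \1).

       ### Now need to allow any number of spaces and/or <TAGS>
       (             # Save what intervenes to $2.
          (?:        # (Non-capturing parens for grouping the alternation)
             \s      # Whitespace (includes newline, which is good).
             |       # -or-
             <[^>]+> # Item like <TAG>.
          )+         # Need at least one of the above, but allow more.
       )

       ### Now match the first word again:
       (\1\b)        # \b ensures not embedded. This copy saved to $3.
     #(regex ends here)
     }
     # Above is the regex. The replacement string is below, followed by the modifiers /i, /g, and /x
     {\e[7m$1\e[m$2\e[7m$3\e[m}igx;
  s/^(?:[^\e]*\n)+//mg;  # Remove any unmarked lines.
  s/^/$ARGV: /mg;        # Ensure lines begin with filename.
  print;
}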

, 
. , ,
. 
Perl ( 7).
, , :
Perl,
.

,
, 
, .
$/ (,
!) <> 
, , 
, . 
, 
.
, , <>,
? while <> 


.1 , 
s/// print. 

, , 
,
.
next unless Perl
( ),
.
, .
"$1$2$3" 
Escape ANSI, 
(
, ). \[7m 
, \[m (\ 
Perl 
, Escape ANSI).
,
, "$1$2$3" . , 
Escape,
( )
.
, $1 $3 (,
!),
.
, 
.
,

,
Escape \. 
, . 
/m
^([^\]*\n)+ , 
\, . 
, 
\, . . .2
1

$_ (, !), 

.

,
ANSI. 
.


$ARGV . 
/m /g 
, 
.
, print Es
cape ANSI. while
( , , 
), .
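
The steps of the doubled-word program can be approximated in Python. This is a sketch with several assumptions: `report_doubled_words` is a hypothetical name, the text is passed in as a string and processed in one piece rather than in chunks ending with a period and newline, and a replacement function is used instead of a replacement string to insert the ANSI escape sequences.

```python
import re

HIGHLIGHT = "\x1b[7m"   # ANSI escape: start inverse video
RESET = "\x1b[m"        # ANSI escape: back to normal

# A word, then whitespace and/or <TAGS>, then the same word again.
DOUBLED = re.compile(r"\b([a-z]+)((?:\s|<[^>]+>)+)(\1\b)", re.IGNORECASE)

def _highlight(m: re.Match) -> str:
    """Wrap both copies of the word in escape sequences, keep what lies between."""
    return f"{HIGHLIGHT}{m.group(1)}{RESET}{m.group(2)}{HIGHLIGHT}{m.group(3)}{RESET}"

def report_doubled_words(text: str, filename: str) -> list:
    """Return only the lines that contain a marked word, prefixed by the filename."""
    marked = DOUBLED.sub(_highlight, text)
    return [f"{filename}: {line}" for line in marked.splitlines() if "\x1b" in line]

if __name__ == "__main__":
    sample = "This is\na test of the\nthe system.\nAll fine.\n"
    for line in report_doubled_words(sample, "ch01.txt"):
        print(line)
```

As in the Perl version, the presence of an escape character is what identifies a line worth reporting.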

,
, Perl 
,
. , , 
,
.
,
Perl 
, 
. , , 
, 
+  ,
,
.
. , 
3 ( 126),

.
, ,
, ,
, ; true
false , 
. , ( 
)
, 
Java. java.util.regex Java 1.4.
, 
Perl,
Pattern.compile. , Java
\,
Java,
. 
\, , 
, Java 
( 71).


, , 
, ,
. Pattern.compile

, Pattern (regex1 . .)
regex1.matcher(text), 
. 
, :


.
Java
import java.io.*;
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class TwoWord
{
  public static void main(String [] args)
  {
    Pattern regex1 = Pattern.compile(
      "\\b([a-z]+)((?:\\s|\\<[^>]+\\>)+)(\\1\\b)",
      Pattern.CASE_INSENSITIVE);
    String replace1 = "\033[7m$1\033[m$2\033[7m$3\033[m";
    Pattern regex2 = Pattern.compile("^(?:[^\\e]*\\n)+", Pattern.MULTILINE);
    Pattern regex3 = Pattern.compile("^([^\\n]+)", Pattern.MULTILINE);

    // For each command-line argument....
    for (int i = 0; i < args.length; i++)
    {
      try {
        BufferedReader in = new BufferedReader(new FileReader(args[i]));
        String text;

        // For each paragraph of each file.....
        while ((text = getPara(in)) != null)
        {
          // Apply the three substitutions
          text = regex1.matcher(text).replaceAll(replace1);
          text = regex2.matcher(text).replaceAll("");
          text = regex3.matcher(text).replaceAll(args[i] + ": $1");

          // Display results
          System.out.print(text);
        }
      } catch (IOException e) {
        System.err.println("can't read ["+args[i]+"]: " + e.getMessage());
      }
    }
  }

  // Routine to read next "paragraph" and return as a string
  static String getPara(BufferedReader in) throws java.io.IOException
  {
    StringBuffer buf = new StringBuffer();
    String line;

    while ((line = in.readLine()) != null &&
           (buf.length() == 0 || line.length() != 0))
    {
      buf.append(line + "\n");
    }
    return buf.length() == 0 ? null : buf.toString();
  }
}


3
:

,
, . 
, ,
.
egrep Perl Java
, , 

.

.
. 
.

, 
, , 
.
.
, 
,
.


, 
. 

, 
,

 .

; . 58
. ,
, .

. 
(, ,
), 
(,
).
.
(, 
),
, .
: ?
? ? 
?
(, , )?
,
.
,
. ,
, ,
: ,
 . 

, 

.
,
.


, 
.

,
. , 
, . 

.
, 
.


, . 

, .
, , 
, , ( 
) . , 

.



.
( ) 
. . (
) .
, 
,
.
, ,
.


1940 .
,  (Warren McCulloch) 
(Walter Pitts),
.1
, (Stephen Kleene) 
, (regular sets).

, .
1950 60
. (Robert Constable) 2 .
1

A logical calculus of the ideas immanent in nervous activity


Bulletin of Math. Biophysics ( 5,
1943 .) Embodiments of Mind (MIT Press,
1965 .). (
, 
1 150 !), ,
.
Robert L. Constable The Role of Finite Automata in the Development of Modern Computing Theory, The Kleene Symposium,
Barwise, Keisler Kunen (North-Holland Publishing Company, 1980), 61-83.


, 
,
, ,
Regular Expression Search Algorithm 1968 
.1 
, IBM 7094.
qed , 
UNIX ed.
ed 
qed,
. ed 
, 
. , g/Regular Expression/p,
Global Regular Expression Print (
). 
, . 
grep,
egrep.

grep
, , 
egrep. 
* , + ? (
).
grep \(\), 

.2 grep ,
. ^ 
, , 
. 
. , 
$ .

end$|^start. , , 
!

. , grep
, * 
1

Communications of the ACM, Vol. 11, No. 6, June 1968.

: ed ( grep)
, ,

 
, .


, , 
. , grep 
, .
, grep
.

grep
grep ,
,
30 
. , 
. grep 
.
AT&T Bell Labs grep ,
\{min, max\},
lex. y,
, 
.
, y i. , *

.

egrep
(Alfred Aho), AT&T
Bell Labs, egrep,
, 1. ,
(
) . egrep
+ ?,
,
egrep.
,
, . .
. , egrep
,
, 
. .


(
awk, sed lex). ,  
,
. , , .
, grep
,
+, grep 


, . 
\+ ,
.

. ,
. ,
,
, 
, ,  .

, (
).

POSIX
POSIX (Portable Operating System Interface
) 1986 . 
.

, , 
. , 
, ,
POSIX. POSIX
: BRE (basic regular expressions, . . ) ERE (extended regular expressions, . . ). POSIX .
POSIX . 3.1.
3.1.
POSIX

, ^, $, [], [^]

BRE

ERE

+ ?

+ ?

\{min, max\}
\(\)

{min, max}
()

\1 \9

POSIX (
(locale) ,
: ,


, . . 

. ,
. ,
Latin1 (ISO88591), ( 224 160 
) . 
, ,
.
\w,
( [azAZ09_]).
POSIX , .
\w ,
, , 
ASCII.
, 

. . 141.


1986 (Henry Spencer)
, .
(
) . , 
( , ),

.

Perl
(Larry Wall) 
, Perl.
patch, 
, Perl 
.
Perl 1987 . 
;
, 
, .
Perl

sed awk 
. 
rn 
, . , rn
Emacs 


(James Gosling).1 Perl


,
. ,
9 , 9
|, 
| . ,
, \w 
( \s \d ).
{min, max}.
2 1988 .

,
.  
9 , 
|. \s \d, \w
, 
Perl. , 
( , \D, \W \S, ,
,
).
/i, .
3 , 1989 . 
/,
, 
. , , 
, . , 3
8
, 
, ASCII.
4 , 1991 , 

1993 . 
, ( \D 
,
).
, 
1994 .
5 1994 . Perl

. 
1

Java.
, 
. Java 1.4
, 8.


, 
\G, 171,
( 72), 
( 184), ( 90) /1 ( 102).

,
, 
.
( 
)  
. (), [], <> {}
, (?,
. 
, Perl 
, .
;
, (?,
.
Perl , 
, .
,
,
. , 
, Perl Porters,
.

, ( 88), 
( 180) . 
( 182) 
 
. ,
,
( 393). Perl 5.8.8.


Perl 5 
World Wide Web. Perl 
,  ,
, Perl 
1

, / ,

. . ,
Perl ,
/.


. Perl , 
.

Perl.
, Perl
.
Tcl, Python, Microsoft .NET, Ruby, PHP, C/C++ 
Java.
1997 (
), 
(Philip Hazel) PCRE (Perl Compatible Regular
Expressions ,
Perl) , 

Perl. PCRE

. 
PCRE
, PHP, Apache Version 2, Exim, Postfix Nmap.1

,
. 3.2 
, .
(,
).
3.2. ,
GNU awk 3.1

java.util.regex (Java 1.5.0, A.K.A


Java 5.0)

Procmail 3.22

GNU egrep/grep 2.5.1

.NET Framework 2.0

Python 2.3.5

GNU Emacs 21.3.1

PCRE 6.6

Ruby 1.8.4

flex 2.5.31

Perl 5.8.8

GNU sed 4.0.7

MySQL 5.1

PHP ( preg)
5.1.4 / 4.4.3

Tcl 8.4



, , .
. 3.3 
.
1

PCRE : ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/.


3.3.

!

grep

! GNU
Emacs

egrep

Tcl

Perl

.NET

Java
Sun

+, ^, $, []

? + |

\? \+ \|
\(\)
(?:)

? + |
()

? + \|
\(\)

? + |
()

? + |
()

? + |
()

? + |
()

\< \>

\< \> \b, \m, \M, \y \b, \B


\B

\b, \B

\b, \B

\w, \W


. 
, , 
,
.
, . ,
Tl 
, . 
[:<:] [:>:],
, 
\m, \M \y ( , 
).
, grep egrep,

. ,
 
(, GNU 

).
, , 
. 
, Perl, .NET Java 
, .
, . 3.3.
* ,
?


? 
?  (NUL)?


(. . , 
)? 

?

?

?

? ,
( 
)?

,
? 
?


\123? , 
? 
? 

 ?

\w  
 ? ( 
\w, . 3.3, 
.) \w 
,
?
?


, . 3.3, 
. , 
, .
, 
, .
, , , egrep
(Jul|July), GNU Emacs \(Jul\|July\), , .
(
, ) , 
, 
: 
Jul July.
, 


, (July|Jul) \(July\|Jul\),
. , .
, , 
, , 
.
Perl egrep , 
Perl 
. ,
.



, , 
, : 
, , . egrep
, 
. 
(, , 1)
, egrep. 
(, 
) 
, ,
.
, , 
.
egrep, 
( , ),
, 
. 
(
(
) ( 
).
, .

: , 
.
,
Perl. 
. 
, 
.
. 
(


Perl), Java, .NET, Tcl, Python, PHP, Emacs


lisp Ruby.



Perl, . 83:
if ($line =~ m/^Subject: (.*)/i) {
    $subject = $1;
}

; 
,
. Perl 
^Subject:(.*) , $line,

. $1 , 
. 
$subject.

,
UNIX procmail.

, .
, Perl, () 
.
,
.

, ,
. 
, , , 

.
, ,
.
.



 .

, (
) (
). 
, 


, 
.
Java, VB.NET,
PHP Python.

Java
Subject
Java java.util.regex ( Java 
8).

import java.util.regex.*; //
..
.
Pattern r = Pattern.compile("^Subject: (.*)", Pattern.CASE_INSENSITIVE);
Matcher m = r.matcher(line);
if (m.find()) {
subject = m.group(1);
}

, 
;
, . 
, , 
.
 
,
java.util.regex Sun: Pattern Matcher. 
:

,
. Pattern.
, 
. Matcher.
. find 
, , 
.
, 
, .
( ) 
, . Perl 
, Java 
.

Java, Sun,
,


.
,
; 
.
Pattern.matches():
if (! Pattern.matches("\\s*", line))
{
// ... ...
}

^$

. ,
 , 
Sun. 
( 
, ), 
(
 ) 
( 
, 
6).
Java,
Sun 
. ,
, 
:
if (! line.matches("\\s*"))
{
    // ... ...
}


 , 
, , 
.

VB
.NET

,  
,
, . 
Subject VB.NET ( .NET 
9):
Imports System.Text.RegularExpressions ' Allows easy access to regex classes
.
.
.
Dim R as Regex = New Regex("^Subject: (.*)", RegexOptions.IgnoreCase)
Dim M as Match = R.Match(line)
If M.Success
    subject = M.Groups(1).Value
End If

Java,
, .NET 
Value . ?
, 
, , 
( ).
.NET , 
. :
If Not Regex.IsMatch(Line, "^\s*$") Then
    ' ... ...
End If

Pattern.matches Sun, 
^$, 
Microsoft .
, 

( ).

PHP
Subject PHP 
preg, 
. ( PHP 10.)
if (preg_match('/^Subject: (.+)/i', $line, $matches))
$Subject = $matches[1];

Python
Subject 
Python:
import re
.
.
.
R = re.compile("^Subject: (.*)", re.IGNORECASE)
M = R.search(line)
if M:
    subject = M.group(1)


?
,
?
,
, . 

Java; , 
, 
Sun. 
, : 
, 
Sun.

PHP, 
,
.
PHP, ,
PHP . (,
preg
, .)


Subject
.
,
.
( 103)
Perl 
mailto:
$text =~ s{
   \b
   # Capture the address to $1 ...
   (
     \w[-.\w]*                          # username
     @
     [-\w]+(\.[-\w]+)*\.(com|edu|info)  # hostname
   )
   \b
}{<a href="mailto:$1">$1</a>}gix;

Perl 
, , , 
, 
. 
, .
,


, , 
.
, .

Java
, 
java.util.regex Sun:
import java.util.regex.*; //
.
.
.
Pattern r = Pattern.compile(
   "\\b                                                   \n"+
   "# Capture the address to $1 ...                       \n"+
   "(                                                     \n"+
   "  \\w[-.\\w]*                              # username \n"+
   "  @                                                   \n"+
   "  [-\\w]+(\\.[-\\w]+)*\\.(com|edu|info)    # hostname \n"+
   ")                                                     \n"+
   "\\b                                                   \n",
   Pattern.CASE_INSENSITIVE|Pattern.COMMENTS);
Matcher m = r.matcher(text);
text = m.replaceAll("<a href=\"mailto:$1\">$1</a>");

. 
, , \
\\ . 
, \\w \w .
System.out.println(r.pattern()); ,
. ,
,
, .
# ,
. , ,
, 
.
Perl 
/g, /i / ( , , (
176). java.util.regex
(replaceAll replace)
, (, Pattern.CASE_INSENSITIVE
Pattern.COMMENTS).

VB.NET
VB.NET :
Dim R As Regex = New Regex _
("\b                                                   " & _
 "(?# Capture the address to $1 ... )                  " & _
 "(                                                    " & _
 "  \w[-.\w]*                          (?# username )  " & _
 "  @                                                  " & _
 "  [-\w]+(\.[-\w]+)*\.(com|edu|info)  (?# hostname )  " & _
 ")                                                    " & _
 "\b                                                   ", _
 RegexOptions.IgnoreCase Or RegexOptions.IgnorePatternWhitespace)
text = R.Replace(text, "<a href=""mailto:${1}"">${1}</a>")

 VB.NET ( 
, 
) VB.NET
, . , \
VB.NET , 
. , 
VB.NET 
( ,
, ).

PHP
PHP:
$text = preg_replace('{
    \b
    # Capture the address to $1 ...
    (
      \w[-.\w]*                          # username
      @
      [-\w]+(\.[-\w]+)*\.(com|edu|info)  # hostname
    )
    \b
  }ix',
  '<a href="mailto:$1">$1</a>', # replacement string
  $text);

(Java VB.NET), 
$text,
Perl.

Awk
awk
/_/ ,
var ~ .


awk
Perl (, Perl 
sed). awk 
, 
sub():
sub(/mizpel/, "misspell")

mizpel
, misspell. 
Perl s/mizpel/misspell/.
/g
awk : gsub(/mizpel/, "misspell").

Tcl
Tcl ,
, 
Tcl. Tcl
:
regsub mizpel $var misspell newvar

var, 
mizpel misspell 
newvar ( $).
Tcl ,
, 
. Tcl
regsub . , all 
(
):
regsub -all mizpel $var misspell newvar

nocase 
( egrep i 
Perl /i).

GNU Emacs
GNU Emacs (
Emacs) elisp
(Emacs lisp)
. , researchforward, 

,
, .

.


. 3.3 ( 124),
Emacs \. ,
\<\([a-z]+\)\([\n \t]\|<[^>]+>\)+\1\>
(. 1). 
, 
Emacs \n \t. ,
Emacs, , 
,
. 

. , . 
\ elisp ,
 . 
:
(defun FindNextDbl ()
  "move to next doubled word, ignoring <...> tags" (interactive)
  (re-search-forward "\\<\\([a-z]+\\)\\([\n \t]\\|<[^>]+>\\)+\\1\\>")
)

(define-key global-map "\C-x\C-d" 'FindNextDbl),
Control-x Control-d .

,
. 
, , 
. !  ,
.

,

, :
, .
,
. 
, 
,  
, . 
, ,
.


: ,
Perl, awk sed,


, "^From:(.*)".
( )

.

. , 
, 
, .

\t, \\ \2,
. ,
, , 
\ 
\\ . , 
\n \\n.
\ 
"\n", NL , 
, \n. ,

/, NL , \n 
, .
, . . 3.4
\t \2 (2 ASCII *). 

, .
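Table 3.4. A few string-literal examples

String literal   "[\t\x2A]"    "[\\t\\x2A]"   "\t\x2A"             "\\t\\x2A"
String value     '[TAB*]'      '[\t\x2A]'     'TAB*'               '\t\x2A'
As a regex       [TAB*]        [\t\x2A]       TAB*                 \t\x2A
Matches          tab or star   tab or star    any number of tabs   tab followed by star
In /x mode       tab or star   tab or star    error                tab followed by star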

, 
, \ 
. , VB.NET
. 
. 


, 
:
,
?

Java
Java:
, \ . 
\t ( 
), \n ( ), \\ ( \) . .
\ ,
Java, .

VB.NET
VB.NET , 
Java.
VB.NET : 
.
, "he said ""hi""\." he said
"hi"\..

#
Microsoft .NET Framework 
,
, 
. Visual Basic
. .NET, C#,
.
C# , ,
,
\" 
"". C# 
@"",
\ , 
: 
. , \t\x2A
"\\t\\x2A", @"\t\x2A". 

@"".


, 
C#. , ,
\n,
,
Perl ( 108), {} 


, 
.
,

\,
. Java C# 
\ , 
. \t
, ,

\t "\\t". , "\w" 
\w , \w
.
, , 
, ,
, .
, ,
VB.NET @""
C#, . ,
, \
, \\
\. ( \) 
. 
, '\t\x2A' \t\x2A. 
, ,
.
, ,
10 ( 526).

Python
Python 
. 
, , 
. Python
'''''' """""",

. 
\n; 
, ( ) 
( Java #, 
).
#, Python
, r, 
, (
# @""). , r"\t\x2A" 
\t\x2A.


Python \ , ,
(
): , r"he said \"hi\"\." 
he said \"hi\"\..
, 
Python \" ",
: r'he said "hi"\.'
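The difference between ordinary and raw string literals can be checked directly. The following is a small sketch using only the standard re module; the variable names are illustrative and not taken from the text.

```python
import re

# The same regex text written two ways: an ordinary literal needs doubled
# backslashes, a raw literal does not.
plain_literal = "\\t\\x2A"
raw_literal = r"\t\x2A"

# Both literals produce the six-character regex \t\x2A,
# which matches a tab followed by a star.
same_value = plain_literal == raw_literal
tab_then_star = re.fullmatch(raw_literal, "\t*") is not None

# Without the raw prefix, the string processing turns \t and \x2A into a
# real tab and a real star, so the regex becomes TAB* (any number of tabs).
any_number_of_tabs = re.fullmatch("\t\x2A", "\t\t\t") is not None
```

Choosing a quoting style with no double quotes to escape, as in the last example of the text, avoids the leftover backslashes entirely.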

Tcl
Tcl ,
. , 
Tcl , ,
, , 
. 

\n, \ .
, 
.
Tcl , 
Python, r'' 
: {}. , \+
, , 
\t\x2A {\t\x2A}.

( ). 
\, 
\ .

Perl
Perl, ,
. , 
. ,
$str =~ m/(\w+)/;


$regex = '(\w+)';
$str =~ $regex;

$regex = "(\\w+)";
$str =~ $regex;

( 297, 418).
, Perl 
,
, :


( 
).

\Q\E ( 149).

\N{},
. , 
Hola! \N{INVERTED EXCLAMATION
MARK}Hola!.

Perl 
. , 
Perl, .
,
. 
, Perl,
,
\Q\E Perl. 
 ,
( 
. .), , 
.
7, . 350.


,
. 110
ASCII n,
EBCDIC >. ?  
, 
. ,
.
ASCII 
, . ISO88591
( Latin1) 
,
. , Latin1
234 
, ASCII.
: , 

, ? ,
234, 116, 101 115,
Latin1 ( tes),
^\w+$
^\b. , 


\w \b Latin1;
, , .



. 
, :
?

,
?

?


, :
, ,
? . ^x (
?
\w, \d, \s, \b 
? , ,
, \w \b?

? [az]?

? , ?
,
. , \b java.util.regex
, \w
ASCII.
.

, 
.
, , . . 
. ,

49 333. 
, , 
U+. 
49 333 C0B5,
U+C0B5.

3 E ,
e.


,
. 
, UCS2 ( 
), UCS4 ( 
), UTF16 ( , 
) UTF8 ( 
, ). ,

, 
. , ,
(, )
(ASCII, Latin1, UTF8 ) , 
.
/, 
.
, ,
\u, 
( 155). 
,
, \uC0B5 . ,
\uC0B5 U+C0B5
, 
( , 
).
UTF8,
, . 
. (
, 
preg PHP u; 528.)
,



, 
. , ,
, 
: U+0061 () 
U+0300 (`).
,
( ). 
,

. U+0061 + U+0300?

, . 
, 


. , (U+0061 + U+0300) 
^..$ ^.$.
Perl PCRE ( preg PHP) 
\X, ,
. ,
. 
. 155.
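The effect of combining characters on dot can be observed in Python. This is a sketch under the assumption that the standard re module is used, which treats each code point as one character and has no \X; the variable names are illustrative.

```python
import re
import unicodedata

# "a" followed by COMBINING GRAVE ACCENT: one visible character, two code points.
decomposed = "a\u0300"

# Dot matches one code point, so the pair needs two dots.
one_dot = re.fullmatch(r".", decomposed)
two_dots = re.fullmatch(r"..", decomposed)

# Normalizing to NFC yields the single precomposed code point U+00E0,
# which a single dot can match.
composed = unicodedata.normalize("NFC", decomposed)
one_dot_composed = re.fullmatch(r".", composed)
```

Normalizing both the pattern and the subject to the same form before matching is one way to sidestep the mismatch when the engine offers nothing like \X.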


. 
(, A), A+, ,
, A (
). ,

, [ ] 
, 
[A ].
, 
, A +.



,
.
,
U+0061, U+0300.
U+00E0. 
? La
tin1. Latin1, 
, , U+00E0,

U+0061 + U+0300. ,
,
java.util.regex
CANON_EQ, 
, 
( 440).
: 
,
. , I (U+0049)

I ( U+0399). 
..
I , 
(U+00CF; U+03AA; U+0049 U+0308; U+0399 U+0308). ,

, , .


,
. , Unicode 
SQUARE HZ (U+3390), Hz 
Hz (U+0048 U+007A).
Hz
, 
, ,
, . 
,
(U+0020),
(U+00A0) ,
.

3.1+ , U+FFFF
3.1, 2001 
, ,
U+FFFF ( 
,
). , 
Clef
(U+1D121). ,
U+FFFF , .
\u
.
, 
\u,
\{},
. , Clef 
\x{1D121}.


( 
),
(line terminators). . 3.5.
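Table 3.5. Unicode line terminators

Character(s)   Code point(s)     Description
LF             U+000A            ASCII Line Feed
VT             U+000B            ASCII Vertical Tab
FF             U+000C            ASCII Form Feed
CR             U+000D            ASCII Carriage Return
CR/LF          U+000D U+000A     ASCII Carriage Return / Line Feed sequence
NEL            U+0085            Unicode NEXT LINE
LS             U+2028            Unicode LINE SEPARATOR
PS             U+2029            Unicode PARAGRAPH SEPARATOR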


( 
, ). 
, 
. ( 146),

^, $ \Z ( 147).




.
Perl /x
(, 
102) /i ( 74).

, , 
. 
, , , Perl /i,
i PHP Pattern.CASE_INSENSITIVE 
java.util.regex ( 132). , 
, (?i) (
) (?i) (). 
(?i:) (?i:),
.

( 176). 
, .


, (b 
b B),
. 
, , 
.
, 
. , 
. ,

Ruby.


(
) 
. , 
, 
, .

. :
, ; 

( 
Perl Java java.util.regex).
: 
. , 
SS, Perl
.
,


. , J (U+01F0)
U+006A U+030C
( 142). , , 
. 
, 
! , 
.



.
( java.util.regex), # 
. 
Perl ( 102), Java ( 132) VB.NET ( 132).
, 
( java.util.
regex). , 
, . 
\123 
\12,

3, \123, .
, , 
, .
ASCII.


( )
. . 
UNIX ,


sed lex 
.

.* , 
,
.1 
,
( ),
. .
,
, 
.
.
.
.
(, 
Java Sun) 

( 144). Tcl 
, 

, 
.

/s Perl
(singleline).
,
^ $, 
, . 
, 
.


( )

^ $. ^
, ,
. 

, ^
. 
( 99) Perl,
HTML. 
1

, ed,
.*.


, s/^$/<p>/mg
...tags. NL NL It's... ...tags. NL <p> NL It's.... 
.
$ , 
$ ( 169).
, 
,
$.
, , 
\A \Z, ^ $ : 
. , \A
\Z .
$ \Z
, .
\z, 
. 
. 169.
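The substitution s/^$/<p>/mg shown above has a direct counterpart in Python. This is a minimal sketch with illustrative names; re.MULTILINE is the flag that lets ^ and $ match at embedded newlines.

```python
import re

text = "...tags.\n\nIt's..."

# With MULTILINE, ^$ matches the empty line between the two newlines.
with_multiline = re.sub(r"^$", "<p>", text, flags=re.MULTILINE)

# Without it, ^ and $ only consider the ends of the whole string,
# so nothing in this text is replaced.
without_multiline = re.sub(r"^$", "<p>", text)
```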
., . 
GNU Emacs
, 
. , lex 
$ ( ^ 
).
( java.util.regex) 

( 144). Ruby
, Python

\Z \z, $.
. 
, 
, 
.

., ^ $. 
,
. .
,

^ $ .1
1

Tcl ,  
. Tcl

( ., ), 
. 
,
Tcl.




( ) . , 
[az]* [az]*. 
(
),

. 
,
, . , 
PCRE (. . PHP) Perl 
\Q\E, 
(, \).


30 

. ,
, 
, .
, 
, ,
.
, 
. .
, , 

. 
, 
.

, ,
. 
, .
,

151

: \n, \t, \a, \b, \e, \f, \r, \v, ...

153

: \

155

/: \x, \x{}, \u, \U,...

155

: \c


155

: [az] [^z]

156

157

: \C

157

: \X

158

: \w, \d, \s, \W, \D, \S

159

, : \p{},\P{}

164

: [[az]&&[^aeiou]]

166

POSIX: [[:alpha:]]

167

POSIX: [[.spanll.]]

168

POSIX: [[=n=]]

168

Emacs


169

/ : ^, \

169

/ : $, \Z, \z

171

( ): \G

174

: \b, \B, \<, \>,...


(?=), (?!...);
(?<=), (?<!)

175


176

177

(?), (?i) (?i)


(?:),
(?i:)
: (?#) #

177

: \Q\E

177

, ,
178 : (), \1, \2,
178
180

: (?:)
: (?<>)

181

: (?>)
: ||

182

: (? if then | else)

180

183

: *, +, ?, {min,max}

184

: *?, +?, ??, {min,max}?

184

: *+, ++, ?+, {min,max}+




, .



,
.
\

( ). 
ASCII <BEL>, 007 ( ).

\b

. ASCII <BS>, 010 ( 


). : \b

, ( 174).

\e

Escape. ASCII <ESC>,


033 ( ).

\f

. ASCII <FF>,
014 ( ).

\n

. ( Unix DOS/
Windows) ASCII <LF>, 012 ( 
). MacOS ASCII
<CR>, 015 ( ). Java
.NET ASCII <LF> 
.

\r

. ASCII <CR>. 
MacOS ASCII <LF>. Java
.NET ASCII <CR> 
.

\t

() .
ASCII <>, 011 ( ).

\v

. ASCII
<VT>, 013 ( ).

. 3.6 
.
, 
. 
( 135), 
,
.


\r (
)

\t ()

\v (!
)

\n (
)

\f (
)

\e (ASCII!!
Escape)

\a ()

Python

\b ()

\b (
)

3.6. ,

Tcl

\y

Perl

Java

SR

SR

SR

SR

GNU awk
GNU sed

GNU Emacs

.NET

PHP ( preg)

MySQL
GNU grep/egrep

flex
Ruby

. 123.
; C ;
SR ( );
X ( 
);
X ( 
);
S ( ).


,
\n \r 1 , 
1

C C++, \
\ C 
,
.

, , 
. ,
\n \r, .



. , , 
, , 
\n. 
(,
HTTP), \012 
, (\012
). DOS, 
\015\012. \015?\012 
DOS Unix (
: , 
, 
169).

: \
, (. . 
8), 
,
. , \015\012 ASCII
CR/LF. 
, . 
, Perl ASCII Escape

\e, awk . awk 


, Escape 
ASCII: \033.
. 3.7 
.

\0 . 
, ( ,
, \1).

, 
. (, , java.util.regex) 
; 
, 
0.
, , 
\565 (8 
\000 \377). ,
, (
), 
. 
\377.


3.7.

Python

\0, \07,\377

Tcl

\0, \77, \777

Perl

\0, \77, \777

\xFF
\x \uFFFF; \UFFFFFFFF
\xF; \xFF; \x{}

Java

\07, \077, \0377

GNU awk

\7, \77,\377

\xFF; \uFFFF
\x

GNU sed

GNU Emacs

.NET

\0, \77,\377

\xFF, \uFFFF

PHP ( preg)

\0, \77, \377

\xF, \xFF, \x{}

\7, \77,\377

\xF, \xFF

\7, \77,\377

\xF, \xFF

MySQL
GNU egrep

GNU grep

flex
Ruby

\0 \0 , 
;
\7, \77 ;
\07 , 0;
\077 , 0;
\377 \377;
\0377 \0377;
\777 \777;
\ \ ;
\{} \{} ;
\xF, \xFF \ ;
\uFFFF \u
;
\UFFFF \U
;
\UFFFFFFFF \U
.
. 123.


:
\x, \x{}, \u, \U


( 16) \x, \u \U. , 
\x0D\x0A ASCII
CR/LF. . 3.7
.
,
,
( ).
. 3.7.
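The octal and hexadecimal escapes discussed here can be tried out in Python, whose re module accepts both \x0D\x0A and \015\012. This is a small sketch with illustrative names; other tools differ in which forms they accept, as the table indicates.

```python
import re

crlf = "\r\n"

# Hexadecimal and octal escapes naming the same two bytes: CR then LF.
hex_match = re.fullmatch(r"\x0D\x0A", crlf)
octal_match = re.fullmatch(r"\015\012", crlf)

# \015?\012 accepts both DOS (CR/LF) and Unix (LF) line endings.
EOL = re.compile(r"\015?\012")
dos_ok = EOL.fullmatch("\r\n") is not None
unix_ok = EOL.fullmatch("\n") is not None
```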

: \
\,

, 32 ( 
). , \cH Control+H, 
ASCII Backspace, a \cJ 
ASCII ( \n,
\r
151).
, , 
. 
, 
.
, , ,
Java Sun. 

, \c
.
, GNU Emacs,
?\^(
(, ?^H ).



, 
, 
.

: [az] [^az]
,
, 


, 
. , * 
, a 
. (
\b) 
( 152).

,
(. . [09]
[9081726354]). (, 
Java Sun) 
, 
, .
.
, ,
.

, .

(
). 
[^LMNOP] [\x00KQ\xFF]. 
,
, ,
255 (\xFF), [^LMNOP]
, L, M, N, .
,
. , [aZ]

.
[azAZ] ( , 
ASCII). . \{L} 
. 159. 
\x80\xFF.

()

, , 
, . 
, 
. ,
:

(,
Java Sun)
( 144).



( 146).
POSIX , 
(. . , ), 
 ( 
).


,
, , 
, [^"] 
. , ".* "[^"]*" 
.
.
( ) . 146.


Perl PCRE (. . PHP) \C,
, 
, ( 
). 

,
. 
, 
, .


: \X
Perl PHP \X

\P{M}\p{M}*, ..
(, \p{M}),

(, \p{M}).
( 142), 
, (
a U+0061 ` U+0300)
. 

. , 
% , , c, 
% (U+0063 U+0327 U+0306).
francais franais
fran.ais fran[c]ais 
, , 


U+00C7, 
, c 
(U+0063 U+0327). 
fran[c ?|]ais,
fran.ais 
fran\Xais.
, \X ,
\X .
, \X 
( 144), 
( 146).
, 
, \X 
( ).

: \w, \d, \s, \W, \D, \S



, :
\d

. [09]
.

\D

He!. [^\d].

\w

, . [azAZ09_], 
,
 
( 119). 
\w  
( : ja
va.util.regex PCRE, PHP, \w 
[azAZ09_]).

\W

, . [^\w].

\s

. , ASCII,
[\f\n\r\t\v].

U+0085, \p{Z} ( 
).

\S

He! ( [^\s]).

. 119, POSIX
( \w).
\w 
, \p(L}) .


, :
\{}, \{}

( 141),
. 
(, ,
, 
, . .).

, 
\p{(
} (, ) \P{}
(, ). 
\p{L} (L , 
), (
( ). 
,

\p{} \P{}, .
. 3.8. ( ,
, , 
) 
. 
(L , S . .),
(Letter, Symbol . .).
, Perl.
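Table 3.8. Basic Unicode properties

Class    Synonym and description
\p{L}    \p{Letter}: things considered letters.
\p{M}    \p{Mark}: various characters that are not meant to appear by themselves,
         but with other base characters (accent marks, enclosing boxes, and so on).
\p{Z}    \p{Separator}: characters that separate things but have no visual
         representation (various kinds of spaces, and so on).
\p{S}    \p{Symbol}: various types of dingbats and symbols.
\p{N}    \p{Number}: any kind of numeric character.
\p{P}    \p{Punctuation}: punctuation characters.
\p{C}    \p{Other}: catch-all for everything else (rarely used for normal
         characters).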


(, \pL \p{L}).
( ) 


In Is (, \p{IsL}). 
,
In/Is .1
. 3.9, 
,
(. 3.9), 
: , , ,
 .
.
3.9.

\p{Ll}

\p{Lowercase_Letter} .

\p{Lu}

\p{Uppercase_Letter} .

\p{Lt}

\p{Titlecase_Letter} , (
, D d
D).

\p{L&}

,
\p{Ll}, \p{Lu} \p{Lt}.

\p{Lm}

\p{Modifier_Letter} 
, .

\p{Lo}

\p{Other_Letter} ,
, , ,
. .

\p{Mn}

\p{Non_Spacing_Mark} ,
(, , . .).

\p{Mc}

\p{Spacing_Combining_Mark} , 
( ,
, , ,
, , , , ,
).

\p{Me}

\p{Enclosing_Mark} ,
(, , ).

\p{Zs}

\p{Space_Separator} ( ,
).

\p{Zl}

\p{Line_Separator} LINE SEPARATOR (U+2028).

, Is/In 
. ,
. 
Perl 5.8 
Perl. Perl :
Is In, 
( 162); In.




\p{Zp}

\p{Paragraph_Separator} PARAGRAPH SEPARATOR


(U+2029).

\p{Sm}

\p{Math_Symbol} +, , /, < ,

\p{Sc}

\p{Currency_Symbol} $, c| , , ,

\p{Sk}

\p{Modifier_Symbol}
, .

\p{So}

\p{Other_Symbol} , 
, . .

\p{Nd}

\p{Decimal_Digit_Number} 0 9
( , ).

\p{Nl}

\p{Letter_Number} .

\p{No}

\p{Other_Number} , ; 
, , (
, ).

\p{Pd}

\p{Dash_Punctuation} .

\p{Ps}

\p{Open_Punctuation} , (,

\p{Pe}

\p{Close_Punctuation} , ), , ,

\p{Pi}

\p{Initial_Punctuation} , , , <,

\p{Pf}

\p{Final_Punctuation} , , , >,

\p{Pc}

\p{Connector_Punctuation} ,
(, ).

\p{Po}

\p{Other_Punctuation} : !, &, , :, :: ,

\p{Cc}

\p{Control} ASCII Latin1 (TAB, LF, CR...)

\p{Cf}

\p{Format} ,
.

\p{Co}

\p{Private_Use} ,
( . .).

\p{Cn}

\p{Unassigned} , .

, ,

, 
\p{L&},
, , \p{Lu}\p{Ll}\p{Lt}.
. 3.9 (, Lowerca
se_Letter Ll), .
(
, LowercaseLetter, LOWERCASE_LETTER, LowercaseLetter, Lower
caseLetter . .),
, . 3.9.



( ), 
\p(}. , \p{Hebrew} ( 
) , .
,

(, ).
(Gujarati, Thai,
Cherokee...), (Latin, Cyril
lic). , 
Hiragana,
Katakana, Han ( ) Latin. 
.
,
, ( 
) . (, 
) . , 
IsCommon,
\\p{IsCommon}. , In
herited, , 
, .

, .
.
, Tibetan 256
U+0F00 U+0FFF. Perl java.util.regex 
\p{InTibetan}, .NET 
\p{IsTibetan} ( ).
: 
(Hebrew, Tamil, Basic_Latin, Hangul_Jamo, Cyrillic, Katakana)
(Currency, Arrows, Box_Drawing,
Dingbats).
Tibetan , 
, ,
, , 
.
:

, 
. , Tibetan 25% 
.

, ,
. , Currency


, $, , , (
, \p{Sc}).

. ,
( ) Latin_1_Supplement.


. ,
Greek Greek_Extended.

, .
 (,
Tibetan, Tibetan).
, . 3.10, 
. Perl java.util.regex Tibetan 
\p{InTibetan}, a .NET Framework 
\p{IsTibetan} (
, Perl (
Tibetan).

, , 
. . 3.10 ,
.
, 
\p{}. 
( , ),
, . .

. 
.

: [[a-z]-[aeiou]]
.NET
, , 
. , [[az][aeiou]] 
, [az],
, [aeiou], . . 
.
[\p{P}[\p{Ps}\p{Pe}]] 
, \p{P}, 
, [\p{Ps}\p{Pe}], . . 
, 
, ( >>.


3.10. //

Perl

Java

.NET

PHP/
PCRE

\p{L}


\pL


\p{IsL}


\\p{Letter}
\p{L&}


\p{Greek}


\p{IsGreek}
\p{Cyrillic}


\p{InCyrillic}

\p{IsCyrillic}
\{}

\p{^}

\p{Any}

\p{all}

\p{Assigned}

\P{Cn}

\P{Cn} \P{Cn}

\p{Unassigned}

\P{Cn}

\p{Cn} \p{Cn}

, 
. ( . 123.)

: [[a-z]&&[^aeiou]]
Java Sun
(, , 
). 
, ( , Java
,

[[az]&&[^aeiou]]). 
,
(OR) (AND).


.
: [abcxyz]
[[abc][xyz]], [abc[xyz]] [[abc]xyz].
,
. 

, |
or.
,
.

. 
, . , 
[\p{InThai}&&\P{Cn}]
Thai, . 
(. . , 
) \p{InThai} \{n}. ,
\{} ( P) , 
. , \P{Cn} 
,
, , , (
( Sun 
Assigned \P{Cn} , 
\p{Assigned}).
. 
. , 
[[this][that]]
, [this] [that],
[this]
[that]. .
, ,
[\p{InThai}&&\P{Cn}] : , 
\p{InThai} \P{Cn}, 
: ,
\p{InThai} \P{Cn}.
: 

.

[\p{InThai}&&\P{Cn}], ,
\P{Cn} [^\p{Cn}], 
[\p{InThai}&&
[^\p{Cn}]]. , 
Thai Thai


, .
,
, [\p{InThai}&&[^\p{Cn}]]
\p{InThai} \p{Cn}.
, [[az]&&[^aeiou]], 
.
. , [this&&[^that]] :
[this] [that]. && [^]
,

[ &&[^]].


,
( 175),
. [\p{InThai}&&[^\p{Cn}]]
(?!\p{Cn})\p{InThai}.1 

, 
. 
( .NET InThai IsThai, 164):
(?!\p{Cn})\p{InThai}
(?=\P{Cn})\p{InThai}
\p{InThai}(?<!\p{Cn})
\p{InThai}(?<=\P{Cn})
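The lookaround emulation of class set operations also works in engines that lack them entirely. The sketch below assumes Python's re module, which has neither && nor class subtraction and does not support \p{...}, so it uses the simpler [a-z] minus vowels case; the names are illustrative.

```python
import re

# "A lowercase ASCII letter that is not a vowel":
# the negative lookahead vetoes vowels, then [a-z] consumes one character.
CONSONANT = re.compile(r"(?![aeiou])[a-z]")

# The same set written with lookbehind after the character is consumed.
CONSONANT_BEHIND = re.compile(r"[a-z](?<![aeiou])")

consonants = CONSONANT.findall("regex")
consonants_behind = CONSONANT_BEHIND.findall("regex")
```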

POSIX: [[:alpha:]]
, , POSIX
(bracket expression).
POSIX 
, 2, 

.
1

, Perl
\p{Thai}, \p{Thai} , 
. 
Thai .
, 
, .
, .
http://unicode.org.


POSIX ,
, POSIX 
, .


POSIX 
, 
POSIX.
[:lower:],
( 119). 
[:lower:] az. 
,
, [az], [[:lower:]]. ,
,
, . . ( 
).
POSIX
, 
:
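[:alnum:]    alphabetic characters and numeric characters
[:alpha:]    alphabetic characters
[:blank:]    space and tab
[:cntrl:]    control characters
[:digit:]    digits
[:graph:]    non-blank characters (not spaces, control characters, and so on)
[:lower:]    lowercase alphabetic characters
[:print:]    like [:graph:], but includes the space character
[:punct:]    punctuation characters
[:space:]    all whitespace characters ([:blank:], newline, carriage return, and so on)
[:upper:]    uppercase alphabetic characters
[:xdigit:]   digits allowed in a hexadecimal number (i.e. 0-9a-fA-F)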
, ( 159), 

POSIX. ,
, 
POSIX.


POSIX: [[.spanll.]]
(
,
. 
, ll (, , tortilla),
,


l m, s t,
ss. 
, , , 
spanll eszet.
,
POSIX, , 
( spanll),
. ,
[^abc] ll.

[..]: torti
[[.spanll.]]a tortilla. 
, 
. , 
,
!


POSIX: [[=n=]]
(
(character equivalents), ,  

. , 
 n, n ,
, a, a. ,
[::], 
, 
; , [[=n=][=a=]]
.
,
, 
.

([..], [.b.], [..] . .), 
[[=n=][=a=]] 
[na].

Emacs
GNU Emacs 
\w, \s . .; 
:
\s

, 
Emacs,


, 
Emacs.
, \sw , , a \s 
.

\w \s.
\S

Emacs ,

. , , 
, .




, .

: ^, \A
^ , 
, ( 147)
.
^
( 144).
\A ( )
.

: $, \Z, \z
. 3.11, 
. $ 
.

. , s$ (,
s)
s NL , s
.
$ 
. 

( 144). (
, Java $ 
, 442.)
( 147) $, 

( ).
\Z ( ) 
$ ,


,
. \z
. 
. 3.11.
3.11.

Java Perl PHP Python Ruby Tcl .NET

...
^

$ 1
,

$



(147)

^

$


...
^

^ 1

$

\A ^

\Z $ 1

\z

:
1

Java Sun
(143).
2
Ruby $ ^ ,
\A \Z .
3
Python \Z .
4
Ruby \A ^ .
5
Ruby \Z $
, .
.
( . 123.)



( ): \G
\G Perl 
/g ( 79). 
, .
\G , \A.
, \G
. , 
( s///g 
) 
\G .
Perl \G ,
.

, \G, ,
. 
, 
\G , 
.

Perl 
/c ( 380), ,
\G , 
. 

, 
.

\G
,
( pos 378). , ,
. , 

, .


. 
, \G Perl
,
. , 
.
?
,
\G
?
,
. . 268,


: 
x? abcde.
abcde, .

, , , 
. 
, 
\G
( 191). , s/x?/!/g
abcde, !a!b!c!d!e!.

,
. : 
\G? Perl s\Gx?/!/g
abcde !abcde, , Perl \G
.
\G .

\G Perl
, 
HTML $html , 

HTML ( <IMG> <A>,
&gt;).
Yahoo! , 
HTML .

Perl m//gc, 

,
( 380).
, 
, . 

,

,
.
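my $need_close_anchor = 0; # True if we have seen <A>, but not its closing </A>.

while (not $html =~ m/\G\z/gc) # While we have not worked to the end...
{
  if ($html =~ m/\G(\w+)/gc) {
    # ... have a word or number in $1; it can now be checked, for example ...
  } elsif ($html =~ m/\G[^<>&\w]+/gc) {
    # Other non-HTML stuff; simply allow it.
  } elsif ($html =~ m/\G<img\s+([^>]+)>/gci) {
    # ... have an <IMG> tag; can check that it is appropriate ...
  } elsif (not $need_close_anchor and $html =~ m/\G<A\s+([^>]+)>/gci) {
    # ... have a link anchor; can validate it ...
    $need_close_anchor = 1; # Note that we now need </A>
  } elsif ($need_close_anchor and $html =~ m{\G</A>}gci){
    $need_close_anchor = 0; # Got what we needed; do not allow another
  } elsif ($html =~ m/\G&(#\d+|\w+);/gc){
    # Allow entities like &gt; and &#123;
  } else {
    # Nothing matched at this point, so it must be an error.
    # Note the location, and grab a dozen or so characters from
    # the HTML so that we can issue an informative error message.
    my $location = pos($html); # Note where the unexpected HTML starts.
    my ($badstuff) = $html =~ m/\G(.{1,12})/s;
    die "Unexpected HTML at position $location: $badstuff\n";
  }
}
# Make sure there is no dangling <A>
if ($need_close_anchor) {
  die "Missing final </A>"
}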

, 
!a!b!c!d!e, , \G 

.
,
. , Micro
soft .NET java.util.regex 
, ( 
). , PHP Ruby \G 
, Perl. java.util.regex
.NET .


: \b, \B, \<, \>, ...


, 
. . (
( \<
\>), (
\b).
, (
( \B).
. 3.12. ,
, ,
.
.
3.12.

GNU awk

\< \>

\y

\B

GNU egrep

\< \>

\b

\B

GNU Emacs

\< \>

\b

\B

Java

\B

(?<!\pL)(?=\pL) (?<=\pL)(?!\pL)

\b

[[:<:]] [[:>:]]

[[:<:]]|[[:>:]]

.NET

(?<!\w)(?=\w) (?<=\w)(?!\w)

\b

\B

Perl

(?<!\w)(?=\w) (?<=\w)(?!\w)

\b

\B

MySQL

PHP
Python

(?<!\pL)(?=\pL) (?<=\pL)(?!\pL)

\b

(?<!\w)(?=\w) (?<=\w)(?!\w)

\b

Ruby

\b

\B
\B
\B

GNU sed

\< \>

\b

\B

Tcl

\m \M

\y

\Y

, , ASCII ( 
8 ), 
. ( . 123.)

,
, , 
. 
,
. ,
\w, . , 
Java Sun \w 
ASCII , Java, 
\pL
( \p{L} 159).



. 
NE14AD8
, a M.I.T. .

(?=), (?!);
(?<=), (?<!)


( 88).


?
(
!).
Perl Python,
. ,
(?<!\w) (?<!this|that) , a (?<!books?) (?<!^\w+:)
, 
. , (?<!books?),

(?<!book)(?<!books), .

, (?<!books?)
(?<!book|books). PCRE ( reg
) .
, 
, . ,
(?<!books?) , (?<!^\w+:) ,
\w+ .
Java Sun.
, 
, ,
, ( 
).
, .
, , 

, (?<!^\w+:) 
. , .NET
Microsoft, ,

( ,
, 
,
).



,
( 145),
( , ). 
, .

: (?),
(?i) (?i)

( 145)
. (?i)
( ) (?i) (). 
, <B>(?i)very(?i)</B> very
,
. , <B>VERY</B>
<B>Very</B>, <b>Very</b>.

(?i), Perl, java.util.regex, Ruby1 .NET.


Python, Tcl, (?i).
(?i),
, (. .
).
(?i) (?i) 
: <B>(?:(?i)very)</B>.

(i). 
, . 3.13. 
. 
, PHP
( 527), Tcl (. ).
3.13.

(145)

(146)

(146)

(147)

Ruby, (?i) ,
, |, 
( ,
).


: (?:),
(?i:)
, ,
. 
(?i:)
, .
<B>(?:(?i)very)</B> <B>(?i:very)</B>.
, , 
. Tcl
Python (?i),
(?i:).
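A mode-modified span can be tried in Python, which accepts the (?i:...) form in version 3.6 and later (an assumption about the runtime, not something stated in the text). The names below are illustrative.

```python
import re

# Only "very" is case-insensitive; the surrounding <B> and </B> stay case-sensitive.
BOLD_VERY = re.compile(r"<B>(?i:very)</B>")

upper_word = BOLD_VERY.search("<B>VERY</B>")
mixed_word = BOLD_VERY.search("<B>Very</B>")
lower_tags = BOLD_VERY.search("<b>Very</b>")
```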

: (?#) #
(?#).
,
( 146). 
, 

(, VB.NET) 132, 497.

: \Q\E
\Q\E Perl.
, \Q,
(, \; , 

). , , 
,
.
.
, WWW , 
, $query,
m/$query/i. $query 
C:\WINDOWS\, 
, 
, (
\).
Perl m/\Q$query\E/i; C:\WINDOWS\ 
C:\\WINDOWS\\,
C:\WINDOWS\, .

 ( 127),
.
,

. , VB


Regex.Escape ( 511), PHP


preg_quote ( 557), Java quote ( 470).
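Python belongs to the group of languages that offer a quoting function instead of \Q...\E: re.escape. The sketch below reuses the C:\WINDOWS\ query from the text; the variable names are illustrative.

```python
import re

query = "C:\\WINDOWS\\"   # user-supplied literal text: C:\WINDOWS\

# Used directly as a regex, the trailing backslash makes the pattern invalid.
try:
    re.compile(query)
    raw_query_is_valid = True
except re.error:
    raw_query_is_valid = False

# re.escape quotes every metacharacter so the text is matched literally.
literal = re.compile(re.escape(query), re.IGNORECASE)
found = literal.search("dir C:\\windows\\system32")
```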

\Q\E java.
util.regex Sun PCRE (. . preg PHP).
, \Q\E 
Perl ( Perl) Perl
? , Perl \Q\E
( ,
), ,
. 7 ( 352).
java.util.regex \Q\E
. Java 1.6.0 
, 
Java.

, ,

: () \1, \2,
: 
.
(), \(\).
GNU Emacs, sed, vi grep.

,
. 68, 70 85. ,
, ,

\1, \2 . .

. , (
, ), 
, 
(, Perl
$1, $2 . .).

\1 ; sed
vi. . 3.14 ,

, .

: (?:)
(?:) ,

.


$1, $2 . . 
, (1|one)(?:and|or)(2|two)
$1 1 one, a $2 2 two. 
.
3.14.

GNU egrep

( )

GNU Emacs

(matchstring 0)
(\& )

(matchstring 1)
(\1 )

GNU awk

substr($text, RSTART, RLENGTH) \1 


(\& )
gensub

MySQL
Perl

67 $&

$1

PHP

532 $matches[0]

$matches[1]

Python

122 MatchObj.group(0)

MatchObj.group(1)

Ruby

$&

GNU sed

& (  \1 ( 
)

)

Java
Tcl

128 MatcherObj.group()

$1

MatcherObj.group(1)


regexp
129 MatchObj.Groups(0)

MatchObj.Groups(1)

C#

MatchObj.Groups(0)

MatchObj.Groups(1)

vi

&

\1

VB.NET

( . 123.)

.
, 
, 
$1. , 

,
(
6).

. 
. 106, $HostnameRegex
. , 

Perl m/(\s*)$HostnameRegex(\s*)/. ,


$1,
$2, : 
$4, $HostnameRegex 
:
$HostnameRegex = qr/[-a-z0-9]+(\.[-a-z0-9]+)*\.(com|edu|info)/i;

, 
$HostnameRegex . 
, Perl, 
, 
.

: (?<>)
Python .NET , 
, . Python 
(?<>), .NET (?<>) (
). .NET:

\b(?<Area>\d\d\d)-(?<Exch>\d\d\d)-(?<Num>\d\d\d\d)\b

Python/PHP:

\b(?P<Area>\d\d\d)-(?P<Exch>\d\d\d)-(?P<Num>\d\d\d\d)\b


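This form of the expression fills the named groups Area, Exch, and Num with the
parts of a US phone number. The captured text is then available by name, for
example with RegexObj.Groups("Area") in VB.NET and most other .NET languages,
RegexObj.Groups["Area"] in C#, RegexObj.group("Area") in Python, and
$matches["Area"] in PHP. The result is clearer code.

Within the regular expression itself, the captured text is available via
\k<Area> in .NET, and (?P=Area) in Python and PHP.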
Python .NET ( PHP) 
. ,
(###) ###,
(
.NET): (?:\((?<Area>\d\d\d)\)|(?<Area>\d\d\d)). 
, 
Area.
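Named capture in Python can be sketched as follows. The phone number and the names PHONE_RE and DOUBLED are illustrative; the (?P<name>...) and (?P=name) syntax is the one the text attributes to Python.

```python
import re

# Named groups for the three parts of a US phone number.
PHONE_RE = re.compile(r"\b(?P<Area>\d\d\d)-(?P<Exch>\d\d\d)-(?P<Num>\d\d\d\d)\b")

m = PHONE_RE.search("call 800-555-1212 today")
area, exch, num = m.group("Area"), m.group("Exch"), m.group("Num")

# (?P=word) refers back to the text captured by the group named "word".
DOUBLED = re.compile(r"\b(?P<word>\w+) (?P=word)\b")
doubled = DOUBLED.search("it is the the best")
```

Named groups are still numbered, so m.group(1) returns the same text as m.group("Area").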

: (?>)
(?>)

( 216). , ,
, ( 
, . . ) ,


, 
. ,

.
.*! Hola!,
, .*
: (?>.*)!. .*
(Hola!), 
.* (
!), .
.* , 
; !
, .
,
. , 
( 217) 
,
( 329).

: ||

. 
. | ,
. 
| \|.

. , this and|or that
(this and)|(or that), this (and|or) that, 
and|or .

(this|that| ). ,
(this|that)?.1
POSIX , 
lex awk. , 
.
, , 
.

, (this|that|)

((?:this|that)?). (this|that)? ,
. 
,
.


: (?if then | else)


if/
then/else . if
, , a then
else . if 
, then, else.
else , | 
.
if ,

.


if ,
, 
.
HTML <IMG>, 
<></>. 
, 
( else) 
:
( <A\s+[^>]+> \s* )?   # Match a leading <A> tag, if there.
<IMG\s+[^>]+>          # Match the <IMG> tag.
(?(1)\s*</A>)          # Match a closing </A>, if we matched an <A> before.

(1) (?(1)) , 
. 
, 
.
,
<>. (<)?\w+(?(1)>) 
, (<?)\w+(?(1)>) . 
.
() 
, ( ,
) . ()
< ,
,
< . , if (?(1))
.
( 180) 
.
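The (<)?\w+(?(1)>) example can be checked in Python, whose re module supports a conditional that tests a numbered group. A minimal sketch with illustrative names:

```python
import re

# A word optionally wrapped in <...>: the closing > is required
# only if the opening < (group 1) took part in the match.
WORD = re.compile(r"(<)?\w+(?(1)>)")

bracketed = WORD.fullmatch("<word>")
bare = WORD.fullmatch("word")
missing_close = WORD.fullmatch("<word")
stray_close = WORD.fullmatch("word>")
```

With the question mark moved inside the parentheses, (<?)\w+(?(1)>), group 1 always participates (possibly matching nothing), so the > would always be required.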



if 
, (?=) (?<=).
, 
then.
else. :

(?(?<=NUM:)\d+|\w+). 
\d+ NUM:, 
\w+. .

Perl : 
Perl.
, 
(then else). 
7 . 392.

: *, +, ?, {min, max}
(*, +, ? ,
)
. ,
+ ? \+ \?. 
, 
, .
: {min, max} \{min, max\}

.
.
([az]{3} [az]\{3\} 
) , 
. [az]
[az][az],
( 307).
: , X{0,0} 
X. X{0,0} , 
: X ,
. , 
X{0,0} X , 
, 
.1 
1

{0,0} . 
! (
GNU awk, GNU grep Perl) {0,0} 
*, (
sed grep) ?. !



.

: *?, +?, ??, {min, max}?



*?, +?, ?? {min,max}?.
. 

.
,
. 
,
( 204).

: *+, ++, ?+, {min, max}+


(possessive) 
java.util.regex PCRE (. . PHP),
, 
.
, ,
,
. , 

( 
).

,
. .++
(?>.+), 
,
( 307).


, , 
. .
, ,
. 
4 , 
, 
, ,
,
. .
.
5
.


6
. 
, , 
. 6
( 
) ,
.
4, 5 6 .
, 
.
, 
, ,
. ,
, 
.

4



. , 

. , 
, 
, .
, ?
, 
(, Perl, Tcl,
Python, .NET, Ruby, PHP, Java
. .) , 
, ,

. , 
, .

!
,
. , , 
. ,
. 
, , . ?


, .
, 


, .
,
, .
. 
, .
,

.
, .
,
: .  ,
, .



: ,
.1 
, , ,
, 
. 
, , 
. ,
: ( ) 
( ).
,
, 
. , 
. 

, 
.


,
( ,
). , 
, 
.
, (
, ). 
1

,
. 
: 
.


4.

,
. , , 
. ,
.



: () ().
( 201),
( ).
, ,
, . , 
, .NET, PHP, Ruby,
Perl, Python, GNU Emacs, ed, sed, vi, grep
egrep awk. , 
egrep awk, lex flex.
,

( 

). . 4.1

, .
, , 
.
4.1.

awk ( ), egrep ( ),
flex, lex, MySQL, Procmail

GNU Emacs, Java, grep ( ), less, more,


.NET, PCRE, Perl, PHP ( 

), Python, Ruby, sed ( ), vi

POSIX HKA

mawk, Mortice Kern Systems, GNU Emacs ( 


)

GNU awk, GNU grep/egrep, Tcl

3, 20
, 
. POSIX,
, 


. ,
( ) 
, , ,
, 
. 
:

( POSIX
)

( : Perl, .NET,
PHP, Java, Python, ...)

POSIX

POSIX

POSIX (
). POSIX
, , 
( 166). 
, 
.
, , 
egrep, awk lex, 
, 
. 
, 
POSIX, .
, (POSIX
HKA), , 
, 
. 
, . ,
, 
. , 
. .



. 
. 
. ,
? 
?
? POSIX HKA? 
?



, , 
, 
, 
, (
, 
?). , ,
, 
. , , 
. 4.1, 
,
.

?

, 
.
( 184)? , 
. , 
, POSIX .
nfa|nfa
not nfanot; nfa, , 
. nfanot,
POSIX , .

POSIX ?
POSIX
.

, 
.

. X(.+)+X =XX=============
=========, :
echo =XX========================================= | egrep 'X(.+)+X'

,
(
, , POSIX ). 
, , 
. 
,
.



, ,
.
( ),
.



,  
. 
,
, 
. , 

( , ).
Perl,
, 
,
. 
, 3 ( 149).

. ,
, , 
. ,
. 
:
1. .
2. (*, +, ? {m,n}) 
.
,
. .

1:
,
, , 
. 
( ).
, , 
.
: 
, 
( ). 
,


( ) .
, 
, .
.
, 
(
).
, ORA FLORAL 
( ORA FLO). 
, (LOR), . 
, , ,

: FLORAL.
, 
. , cat
The dragging belly indicates your cat is too fat

indicates, cat,
. cat , cat indicates
, . ,
egrep, , (
, ,
(, ) .
: The dragging belly indicates your cat is
too fat fat|cat|belly|your? 
.



, 
. (
) ,
.
, 
, 
, . .
.
, ,
, 
. 
6.


.
, , , 


. 
, (*
), , . .
. 3 ( 149). ( 
) , , 

. 
.
(, , \*, !,

. .)

,  z !

 ? 
,

usa u, s,
a. 
, b b B, 
( , 
146).
, ,
, , 
. . ( 155) : 
.1

, ( 146), 
. 
, \w, \W \d.

,
, .
(, ^, \Z, (?<=\d) )
: (^, $, \G,
\b, ... 169) ( 
175).

(^, \Z, ...),
(\<, \b, ...). ,

.
1

, (167), 
POSIX 
, . ,
(145),
.


. 192.
: 
, fat|cat|belly|your The
dragging belly indicates your cat is too fat , fat, 
fat .
,
fat , 
(. . 
) , 
. , 
,
. 
,

fat, cat, bll your,


.
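The rule that the earliest match wins can be confirmed with the two examples from the text. This sketch uses Python's re module; the variable names are illustrative.

```python
import re

# ORA cannot match at F or L, so the first successful start is index 2.
ora = re.search(r"ORA", "FLORAL")

sentence = "The dragging belly indicates your cat is too fat"

# "cat" is first found inside "indicates", not as the later standalone word.
cat = re.search(r"cat", sentence)

# The order of the alternatives does not matter here:
# "belly" begins earliest in the text, so it is the match.
first = re.search(r"fat|cat|belly|your", sentence)
```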

,


, .
(
$1) . 
(), 
() ,
. 
.
, .1
, 
. , awk, lex egrep 
 $1.
, GNU egrep 
.
! , 
( 
) . 
, 
( 
, ).
1

, ,  
, . 
. 231.


2:

. 
(*, +),
. . , 
.
, ,
(?, *, + {min, max}) .
(, a a?, ()
()* [09] [09]+), 
, 
, , 
. 

. 
, 
. (
, 
.)
,
, 

.


 .
: \b\w+s\b,
, s (, regexes). \w+
, s 
. , \w+
regexes: 
s\b .
, 
, 
( ,
*, ? {min, max})... ,
. ,
.
(
) , , 
.
(
) . , , [09]+ 
March1998. 1
+. 


,
, , 
998. 
[09] , +
.


, .
,
, .
( 83),
^Subject:. 
^Subject:(.*). 
($1 Perl).1
, .* , 
: ^Subject: ,
. ,
^Subject: , 
.* , 
* 
.
.* ? *

, $1.

, .*.
.* ,
, * 

( .* , , 
).
, ,
.


, 
, . , 
, 
.* ^Subject:(.*).*. : . 
.* ( ) ,
1


(
). 
, .


.* .
, * 
. .* 
, $2 .
, .*
, ? 
, . \w+s,
, 
,
.
^.*([09][09]).
, , 
$1. :

.* . ([09]
[09]) ,
: , .*, ! ,
. 
, ,
. ,
, . , 
, 
+.
, ^.*([09][09]) about24cha
racterslong. .* , 
[09] .* 
g ( ).

[09]  , .* 
n long.
15 , .* 4.
, [09].
, , . .* 
, 
. .* 2,
[09]. 4
, about24char. $1 
24.


^.*([09]+).
, ,
. ,
Copyright 2003.? 
.



,  . .* ...
[09] ... .
, 
. , 
, ,
. , .




. , 
, .

: ,
, 
to(nite|knight|night)
tonight.
, t, , 
. 
. ,
;
, .
to(nite|knight|night) 
t. ,
t. , o 
, 
.
(nite|knight|night), 
nite, knight, night.
, .
, , 
, tonight 
. ( 116), 
, , 
.
, nite,
: n, 
i, t e. (
), . .
,
( ).
,
, .


. 197.

^.*([09]+) Copyright 2003.?
, ,
. .* 
,
[09]+.
3,
[09].
+, 
; 
, .

, .* 0 
. [09]+ 
, : ,
. 
:  , 
, .
$1 3.
, : [09]+
[09]*,
.*. + * ^.*([0
9]+) ^.*(.*), 
^Subject:(.*).* . 196, 
.* .
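The greedy behavior described in these examples can be reproduced in Python. The sketch below uses the sample strings from the text; the variable names are illustrative.

```python
import re

# .* takes everything, then gives back only as much as the rest needs:
# two digits are mandatory here, so exactly "24" ends up in group 1.
two_digits = re.match(r"^.*([0-9][0-9])", "about 24 characters long")

# [0-9]+ is satisfied by a single digit, so the greedy .* keeps "200"
# and leaves only the final "3" for the group.
last_digit = re.match(r"^.*([0-9]+)", "Copyright 2003.")

# With two stars, the first .* takes the whole line and the second gets nothing.
subject = re.match(r"^Subject: (.*).*", "Subject: hello world")
```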



. , 

.

(, 
).
, 
( )
. 5 6 ,
,
. 
, .


: ,
, 
. 
tonight 
, t:

t onight

: t o(nite|knight|night)


. 
:

toni ght

: to(ni te|knight|ni ght)

( , knight,
). , g,
. h t 
, ,
.
,
.

( ) . 

. , 

. , to()?
, 

. (to) 
, 
.

, , 
, ( ) , 
.


,
, ,
. , 
,
( 
).


.

( ).
, ,
( ), 
. 
,
, 
. , (

. , , 
( 
, ), ,
, 
. .
, 
, : (
()
(). , , 
. 
.1


,
,
. , 

. , tonight

:
to(ni(ght|te)|knight)
tonite|toknight|tonight
to(k?night|nite)

,  
.
, 
, .

, ...
! , 
,
, .
.


:
, ,
.
,
( 
), 
. ,

abc [aaa](b|b{1}|b)c, .
:



.
,
.
,
. , 
, .
,

(backtracking).

:

, 
,
.
,
, (
) (
, ).
, 
.  
, 
, , 
. 
(
, ).


, . 
.


,
, .
,
. ,
, .
,
( ) .
. 
x?, , x 
. x+ +
, 
. x 
, ,
x. 
, , ... 
... . . , , 
, ,
(
, ).


to(nite|knight|night) 
hottonictonight! (, , , 
). t . 
h,
.
, .
. t , 
o , 
.
, tonic, .
to , 
. 
(
) , 
. , nite. 
n + i + t ...,
toni c, . 
, 
.
(, knight), . 
, night.
. ,
, tonic, 
.


, 
tonight!. night
( , 
).


,
.
, 
? , 
? 
:
,
(, 
?, * ), (

.
. ,

. , 
. :

.
LIFO ( ).

, ,
. 
, . LIFO
:
,
.



(saved states). 
,
. 
, , 
. 
, 
, . , 
, .



ab?c abc. 
c :
a bc a b?c

, b?,
b ?  ? 
. ,
,
:
a bc ab? c

, ,
b?, 
, b (. . ). , 
, b , ?.
,
b. , 
:
ab c ab? c

, c, . ,
. 
, 
.


ac,
b. , 
. , , 
?, .
, , 
.

.
a c

ab? c

b. c c , 
.

bX. 
b ? :

a bX ab? c

b ,
, c X. 
. c 
b, . ,
. 
,
,
.
? . 
.
. 
:
a bX ab?c

, , ,
. ( ab X abX )
,
.


, 
. , 
ab??c abc. a 
:
a bc a b??c

b??
: b
? ??,
,
.
, .
:
a bc a bc

, 
b , b
( , ,
; ,
). , 
, 
:


a bc ab?? c

c b,
:
a bc a bc

, 
c c. 
, ab?c, 
.


, 
, , .
,
. , 
? 
??. , 
* +.

*, +
x* 
x?x?x?x?x?x? ( , (((x(x?)?)?)?)?)1,
, .
, 
( ),
*. 
, * 
.
, [09]+ 1234num [09] 
4. 
, ,
+:

1 234
12 34
123 4
1234

num
num
num
num

, 
[09] 
. [09] ,
( 
) a1234 num [09]+
1

, 
; .


.
. , ,
.
: a 1234num 
, + 
. ,
[09]*? ( !)
, .


^.*([09][09])
. 197. ,
, , .
.
95472,USA.
.* 
13 . 
 , * 13 
, ( ) .

^.* ([09][09]) ,
.

[09]. , .
: ( 
). , 
, , . . , 
.* A. 
A [09] .
.
, 
2.
[09] .
, . ,
[09] 

[09]. 
7, [09] 
.
( 2). , :
95472,USA, $1 72.
. , 
,
. 


. 208.
, 1234num
[09]* 1234
num?
, . ,
. : , *,
. 
, . 
, , 
.
1234num,
.
,
. [09]* 
 , 
:
a 1234

[09]*

1 
:
a 1234

[09]*

^.* ([09][09]) ,
, ^.* [09][09], 
, ...
[09]. 

, $1, .
, , *
( ),
, (
. .* ,
, 
. 
, , ^.*([09]+) [09]+
( 197).


( )
, (


, 
).
, ,
. , 
. ,
. ,
. ,
,
, .

. 
, , 
( 6).
, 
. 
,
, 
. ,
, .


, .*
.1 , .*
, . ,
, 
.
. 
, . 
".*",
, .*, ,
:
The name "McDonald's" is said "makudonarudo" in Japanese

, 
, .
" 
.*, 
. ( 
), , 
. 
, , :
1

, 
, 
, 
.


The name "McDonald's" is said "makudonarudo" in Japanese

, . ,
.*,
,
.
"McDonald's"? ,
, , ,
, . .* 
[^"]*, .
"[^"]*"
. [^"]*
. 
, McDonald's.
, [^"] 
, 
. , 
:
The name "McDonald's" is said "makudonarudo" in Japanese

, 
, [^"]
, a . . 
,
[^"\n].
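The difference between ".*" and "[^"]*" on the sample sentence can be verified directly. A minimal Python sketch with illustrative names:

```python
import re

text = 'The name "McDonald\'s" is said "makudonarudo" in Japanese'

# Greedy dot-star runs to the end of the line and backs off only
# as far as the last quote, spanning both quoted strings.
too_much = re.search(r'".*"', text).group()

# A negated class cannot step over a quote, so the match stops
# at the first closing quote.
just_first = re.search(r'"[^"]*"', text).group()
```

Unlike dot, [^"] also matches a newline in most tools; adding \n to the class, as the text suggests, keeps the match on one line.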


HTML. , 
<B>very</B> very 
, .
<B></B> , 
. , 
<B> </B>. ,

.*:
<B>Billions</B> and <B>Zillions</B> of suns
<B>.*</B>,
.*
, </B>.
, <B>, </>.

, 
, ,
, . , 
<B>[^</B>]*</B> . 
, </>. 


[^</B>] .
, , <, >,
/ B. [^/<>]. 
, , </> . ( ,
</B> , ,
; 
.)


, 
. 
( 184),
*? *.
, <B>.*?</B> :
<B>Billions</B> and <B>Zillions</B> of suns
<B> 
, ,
, 
<:
<B> Billions

<B>.*? </B>

< , 
.*?, 
( , ).
, . 
B <B>Billions *? 
, .
, 
. < , 
.*? . 
.*? Billions,
< ( 
</B>):
<B>Billions</B> and <B>Zillions</B> of suns
, *
, . ,
, 
,
( ). 
, 
. , 
. 
<B>.*?</B> :
<>illions and <B>Zillions</B> of suns


. ,
, 
. .*?
<B> Zillions </B>.
, 

. ".*" [^"]

.

( 175), 
. (?!<B>) 
, <B> . 
<B>.*?</B>, 
a ((?!<B>).) , 
, , .
, ,
( 146)
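:
<B>              # Match the opening <B>
(                # Now, only as many of the following as needed...
  (?! <B> )      #   If not <B>...
  .              #   ...any character is okay
)*?              #
</B>             # ...until the closing delimiter can match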

, 
, 
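:
<B>              # Match the opening <B>
(                # Now, as many of the following as possible...
  (?! </?B> )    #   If not <B>, and not </B>...
  .              #   ...any character is okay
)*               #   (now greedy)
</B>             # Match the closing </B>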


</B> <B>. , 
,
. 
; 
6 ( 329).
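The three approaches to matching a <B>...</B> pair can be compared side by side. This Python sketch uses the sample text from the discussion; the variable names are illustrative.

```python
import re

text = "<B>Billions</B> and <B>Zillions</B> of suns"

# Greedy: runs to the last </B> on the line.
greedy = re.search(r"<B>.*</B>", text).group()

# Lazy: stops at the first </B> it can reach.
lazy = re.search(r"<B>.*?</B>", text).group()

# Lazy still crosses a second <B> when the first one was never closed.
unbalanced = "<B>Billions and <B>Zillions</B> of suns"
lazy_crosses = re.search(r"<B>.*?</B>", unbalanced).group()

# Negative lookahead refuses to step over another <B>.
guarded = re.search(r"<B>((?!<B>).)*?</B>", unbalanced).group()
```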



( 79).


, :  
1.625 3.00 
1.625000000002828 3,00000000028822. 
:
$price =~ s/(\.\d\d[1-9]?)\d*/$1/;

, ,
, $ric. \.\d\d
, a [19]? 
, .
:
, 
,
$1. $1 
. , 
. $1 
. 
, .
, . . \d*
.

, , . , , 
$price .
27.625 (\.\d\d[19]?)
. \d*
, .625 .625 , 
.
, , 
,  
(. . \d* , 
)? ,
! \d* \d+:
$price =~ s/(\.\d\d[1-9]?)\d+/$1/

1.625000000002828
, , 9.43 \d+
, . 
, ? ! , 
(, 27.625)? 
, . 

27.625. , 
5.
. ,
(\.\d\d[19]?)\d+ 27.625, ,
\d+ . 


, [19] 5 (
,
. [19]? ,
5 \d+. 
, , : .625 
.62, .
[19]?
? ,
5 ,
[19]?? 
. , 
.
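The pitfall with \d+ can be reproduced in a few lines. The following Python sketch mirrors the Perl substitutions above; the function names are illustrative.

```python
import re


def trim_star(price: str) -> str:
    """Keep two decimals, or three if the third is nonzero (\\d* version)."""
    return re.sub(r"(\.\d\d[1-9]?)\d*", r"\1", price)


def trim_plus(price: str) -> str:
    """The 'optimized' \\d+ version, which backtracks into [1-9]?."""
    return re.sub(r"(\.\d\d[1-9]?)\d+", r"\1", price)
```

With 27.625, the \d+ version forces [1-9]? to give up the 5 so that \d+ has something to match, and the 5 is then discarded along with the rest.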

,
, : 

, 
. 
( 
),

. , (
,
, .

(, !),

.
, ,
,
, 
. 
,
,
( 6).
, ,

, .
, ".*" 
:
The name "McDonald's" is said "makudonarudo" in Japanese

, ".*"
, ".*?" 
.




.625 
.
. ,
,
, 
, 
. , .
, .625 ,
. 
(\.\d\d[19]?) \d+, 
. , , [19]
, 
. ,
, ,
 [19] (
, ,
, 
[19]).
 ( ,
?
[19])? , , 
[19] .
! 
, , 
.5000? [19]
, [19],
\d+ .
, ,
, 
[19] , 
[19] . [19] , 
[19], 
. (
,
(?>) ( 180) 
[19]?+ ( 184). .

(?>)
(?>) , ,
(. . 
), ,
, . , 


, , 

. ,
, 
,
, ( 
).
\.\d\d(?>[19]?))\d+.
, , [19]
, 
, ?. 

\d+. 
, 
.
[19] ,
,
, ?, . 
, ,
. .625,
, , .625000. 
, \d+ .625000
. .625, 
\d+ , 
[19] .
, , 
.625 , .


,
(. 215) :
, 
, . 
, , 
.
,
.

:
. , 
, 
.
.625000. , 
.


. ,
, 
. .625.
. 
,
.
.

. 
.
: (?>.*?)? ,
? , .
. 
:
, 
, . , 
, , 

,
.
. 
^\w+: Subject.
, , 
,
.
, : \w+ 
.
,
\w+ ( , 
).
: ,
:
Subjec t

^\w+ :

:
( t). 
,
:
S ubject

^\w+ :


.

, .


, 
, +!
, , \w+, 
, 
: ^(?>\w+):. 
; 
\w+
( ).
, , 

( 
307).
6 ( 329), 
. , 
.

?+, *+, ++ {max,min}+



, , 
. , + 
,
^\w+. ++
( , ).
, 
. \w++
(?>\w+), .1
^(?>\w+):
^\w++:, (\.\d\d(?>[19]?))\d+ (\.\d\d[19]
?+)\d+ .
, (?>M)+ (?>M+). 
,
M, , 
M . 
, M+,
.
, (?>M)+ (?>M+) ,
M++, 

(, (\\"|[^"])*+)
?> : (?>\\"|[^"])*. ,
1


, 
(307).


. 218.
(?>.*?)?
! 
! *? 
*, 
,
. 
, 
, 
.  
, ,
.
, 
, 
; M++ (?>M)+. 
, + 
: (?>(\\"|[^"])*).


, (
2, 88) 
. 
: 
. , 
/ ,
( )
( ).

?

. 
. , 
? 
,
. 
,
, ,
.
?
,
, 


. , , 
,
, , . 
, ,
.
( ),
.
, 
. 
, (, 
), . 
? , 
.



, ,
,
: 

( 
, , Tcl), 

. ?>

(?=())\1 , ^(?>\w+): ^(?=(\w+))\1:.


\w+ 
.
, 
( ,
).

( , ,
), .
; \1 ,
,
. \1 ,
.

, 
\1.
, \w+:
:.
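The lookahead-plus-backreference idiom can be tried in Python, where it works on versions that predate native atomic groups. The sketch assumes, as the text describes, that a lookahead gives up its saved states once it succeeds; the variable names are illustrative.

```python
import re

# Ordinary greedy \w+ can give back the final "t" so that the literal t matches.
normal = re.search(r"^\w+t", "Subject")

# The emulated atomic group captures all of "Subject" inside the lookahead;
# \1 must then consume exactly that text, leaving nothing for the literal t.
atomic_like = re.search(r"^(?=(\w+))\1t", "Subject")

# When a colon really follows the word, both forms match the same text.
header = re.search(r"^(?=(\w+))\1:", "Subject: hello")
```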


?
, 

. 
,
? ,
, ? 
, , .
, ,
.
( )?
, Perl, 
Java, .NET ( 188). 
,
.
(Subject|Date):.
, ,

Subject. ,
, :. ,
, 
( Date), 
. , (
. 
,
( ).
, tour|to|tournament, 
threetournamentswon
? ()
, three to
urnamentswon. , tour. 
,
tour 
. .
, , 
, , . 
,


, , ,
, .
.
POSIX , 
,
( tournament). Perl, PHP,
.NET, java.util.regex 


( 188) 
.
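Ordered alternation is easy to observe in Python, which uses the same kind of backtracking engine as Perl, PHP, .NET, and java.util.regex. A small sketch with illustrative names and a spaced-out version of the sample text:

```python
import re

text = "three tournaments won"

# The first alternative that leads to an overall match wins,
# even though a longer alternative would also match here.
ordered = re.search(r"tour|to|tournament", text).group()

# Listing the longest alternative first gives the longest match.
longest_first = re.search(r"tournament|tour|to", text).group()
```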


(\.\d\d[19]?)\d* . 214. ,
\.\d\d[19]? \.\d\d, \.\d\d[19],
(\.\d\d|\.\d\d[19])\d* ( 

). 
?
, .
.
, \.\d\d. , 
\d*, . 
, \d* , , 
,
( , , 
, 
\d* ). , 
, , 
. 
.
: (\.\d\d[19]
|\.\d\d)\d*, , 
,
(\.\d\d[19]?)\d*. 

[19], .
,
, .
[19]? 
, 
?. 
, 
.
,
. , ,

a*((ab)*|b*) 
, . 
, (ab)*, 
, ( b*)
. :

*((ab)*|b*|.*|partridgeinapeartree|[z])

. :



,
.



,
,
. , 
Jan 31.  ,

Jan[0123][09], Jan00
Jan39, Jan7.
, 
. 1 9

0?[19], . [12][09] 
10 29, a 3[01] 
. ,
Jan(0?[19]|[12][09]|3[01]).
, Jan 31 is my
dad's birthday? , , Jan 31, 
Jan 3. ?
, 0?[19], 
0? , , 



, . 
, .
1

01 02 03 04 05 06 07 08 09
10 11 12 13 14 15 16 17 18 19
1

20 21 22 23 24 25 26 27 28 29
30 31

01 02 03 04 05 06 07 08 09
10 11 12 13 14 15 16 17 18 19

31|[123]0|[012]?[19]

20 21 22 23 24 25 26 27 28 29
01 02

01 02 03 04 05 06 07 08 09

30 31

[12][09]|3[01]|0?[19]

10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29
30 31

0[19]|[12][09]?|3[01]?|[49]

, POSIX


[19] 3.
, .
, 
, 
. : Jan([12][09]
|3[01]|0?[19]).
: Jan(31|[123]0|
[012]?[19]). , 
. , ,

Jan(0[19]|[12][09]?|3[01]?|[49]),
. 
( 
, 
).
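The date example shows how the order of alternatives changes the result in a backtracking engine. The Python sketch below writes the character ranges with their hyphens and a space after the month; the names are illustrative.

```python
import re

text = "Jan 31 is my dad's birthday"

# 0?[1-9] comes first and is satisfied by the "3" alone.
short_first = re.search(r"Jan (0?[1-9]|[12][0-9]|3[01])", text).group()

# Moving the single-digit alternative to the end fixes the match.
reordered = re.search(r"Jan ([12][0-9]|3[01]|0?[1-9])", text).group()

# Another ordering that also works.
alternative = re.search(r"Jan (31|[123]0|[012]?[1-9])", text).group()
```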

, POSIX
,
: 
 
, . 
.
, ,
, .



. ,
one(self)?(selfsufficient)? one
selfsufficient. one,
(self)?, (selfsufficient)? 
sufficient. ,
, . ,
oneselfsufficient 
( POSIX ,
).
, oneselfsuf
ficient. , (self)?  
,

(selfsufficient)?. ,
,
. , 

.


,
: . 
, . 

. 
\, .
:
SRC=array.c builtin.c eval.c field.c gawkmisc.c io.c main.c \
missing.c msg.c node.c re.c version.c

= 
^\w+=.*,
(,
). 
, (\\\n.*)*,
^\w+=.*(\\\n.*)*. ,

, \+ . 
, 
. , .* , \
, 
( 197).
, 
, .

, 
^\w+=.*?(\\\n.*?)*. 
, 
, , \\ 
\ . . 

, ,
= , 
 . 
SRC=;
, .
.
( 235).

POSIX ,

POSIX ,
, ,
, .

, POSIX


,
. , 
. , 
? POSIX ,
oneselfsufficient
,
.

.
? ,
, .
, 
. POSIX HKA.

(self)? ,
one(self)? (selfsufficient)?
one selfsufficient. oneselfsufficient,
, POSIX

oneselfsufficient.
7 , 
POSIX Perl
( 402).



, POSIX
, .
POSIX 
. 6 
, 
.



. 
.
?
,

( , ).
,

 , 


.
.

, 
, 
.
.

:

, 
.

.
, 
. 
.

, ( )
. 
, 
. ? 
, ,
, 
.
.
,
, 
.
? 
. , 
,  .
, , , 
.
, 
,
Compilers Principles, Techniques, and Tools (Ad
disonWesley, 1996), (Alfred Aho),
(Ravi Sethi) (Jeffrey Ullman),
 
. ,
, 
Principles of Compiler Design,
.

, POSIX


,
, .

( ) 
. 
.
, 
,
.


, 
.


,

.
. 

POSIX .



.
,
.

, 
, (
6) , 
. , 
( ).
, .
, POSIX 
, 
, 
, . POSIX 
.
,
, .
, ,
^
( 192).
6.


,
( , ), 
, 
, , 
.

,
.

. 
, ,
. ( ,
.) ,
, 
.


( , POSIX)
, . 
,
 . 
,
, 
. 
, 
, , .

, 
, :

,
. 
, 
;


1 ( 175);


. 
( 
), 

lex (trailing context),



,
.


, POSIX

:
?
, 
.
,
, 
. . 228 ,

, .
, . 
, .
GNU grep , .
, ,
. 
GNU awk 

GNU grep, ,
, 
. 
, GNU awk
gensub.
Tcl 
. (,
,
120).
Tcl 
, ,
.
,
POSIX (225), 
, 6.
.

, 
;
( 184) 
( 180).


,
.
( , )
.


, 
ed 7 ( 1979 )
350 (,
grep 478 ).
( 8, 1986 ), 
, 1900 .
rx POSIX , (Tom Lord)
GNU sed, 9700 .
,
egrep 7 400 , 
POSIX (1992 ) 
4500 .
, GNU
egrep 2.0 (
8300 ), /
Tcl 9500 .
, . 

Pascal. Pascal 
, 
. 
,

.

, , 
, . , 
, . ,
, . , ,
. 
, ( ,
, 
). ,
,
 .

. 
( 198),
( 200). . 201.
POSIX ( 226)
:
(
)

233

POSIX ( 
)

, POSIX ( 
, )

, ,
,
. 
, 
. . 4.1 ( 188)
, 
( 190) ,
.
, :
,
. 
( 191).

?
,
. .
( 225). ,
( 227).
,
.
( 202, 207). 
: , , (* ) (
( 195), () 
( 216). 
( 222), POSIX .
POSIX .
, 
( 6).

. , 
, , 
, .
,
,
. 
.

5



, 
, .

: , ,
, . , 
,
.
.

, . 
( 
) .
. 
. 
,
.
, HTML, 
HTML. 
, ;
. , 
! 
,
.
,
, 
.

235



:
, ,



(. . ,
)
.

grep, , ,

. ,
.

:
, , 
. 
.
, . 
, ^(display|geometry|cemap||quick24|random|
raw)$, ,
 . 
, 
 ,
, 100 
. , 
.
, .



( 226). 
, ^\w+=.*(\\\n.*)* 
:
SRC=array.c builtin.c eval.c field.c gawkmisc.c io.c main.c \
        missing.c msg.c node.c re.c version.c

, .* \ 
(\\\n.*)*, 
. : , 
\,


, [^\n\\] (
\n ; ,
,
( 157)):
, :

^\w+=[^\n\\]*(\\\n[^\n\\]*)*

, ,
: \ ( , 
). 
, . ,
, ,
.
: 
,
, . 
, : 
,
. 
( \ \n), \+ 
. \\.
,
\+ .
, ^\w+=([^\n\\]|\\.)*,
. 
^ 
(147);
.
, .
(330).

IP
. 
IP (Internet protocol), . . , 
(, 1.2.3.4). 
001.002.003.004.
IP, [0-9]
*\.[0-9]*\.[0-9]*\.[0-9]*, ,
and then.....?. 
: 
(
, , ).
, ,
. 

237

, IP,
^$. :
^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$

[09] Perl \d,


:1 ^\d+\.\d+\.\d+\.\d+$,  
, IP,
1234.5678.9101112.131415. ( IP 
0255.) ,
, ^\d\d\d\.
\d\d\d\.\d\d\d\.\d\d\d$. . 
, (
, 1.234.5.67). 
{min, max}, 
^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$,
\d\d?\d? \d(\d\d?)?.
,
.
, 
.
, , \d{1,3}
999, 255
IP.
IP 0 255, 
. : 0|1|2|3|253|254|255.
,
0|00|000|1|01|001|,
. , 
, 
, .
.
, 
. 
, 
,
\d|\d\d. ,
0 1,
000199. , 
[01]\d\d; \d|\d\d|[01]\d\d.
1

, , .
\d ,

[09], . ,
\d ,
ASCII (157).


, 1
( 54) ( 224).
, 2, ,
255 . , 
5, , . ,
6. 2[0-4]
\d|25[0-5].
, 
, , . 
\d|\d\d|[01]\d\d|2[0-4]\d|25[0-5].
, ,
[01]?\d\d?|2[0-4]\d|25[0-5]. 
, 
. : \d\d?
\d?\d
. 

. 
, ,
.
, ,
0 255. 
\d{1,3} . 
(
):

^([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.
([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])$
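
The range-checked expression above can be tried out with a short Python sketch. The names OCTET, IPV4, and looks_like_ipv4 are illustrative assumptions, not part of the original text, and fullmatch() stands in for the ^ and $ anchors.

```python
import re

# One octet in the range 0-255; leading zeros such as "001" are allowed.
OCTET = r"(?:[01]?\d\d?|2[0-4]\d|25[0-5])"

# Four octets joined by literal dots.
# re.ASCII keeps \d limited to the ASCII digits 0-9.
IPV4 = re.compile(r"\.".join([OCTET] * 4), re.ASCII)


def looks_like_ipv4(text):
    """Return True if text is exactly four dot-separated numbers, each 0-255."""
    # fullmatch() requires the whole string to match, like ^...$ in the regex.
    return IPV4.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["1.2.3.4", "001.002.003.004", "256.1.1.1", "1234.5.6.7"]:
        print(candidate, looks_like_ipv4(candidate))
```

Note that this sketch, like the expression it is built from, still accepts 0.0.0.0.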

! ? 
. 0.0.0.0,
, 
. ( 175),
, (?!0+\.0+\.0+\.0+$)
^,  
,
.
 
. , ^\d{1,3}
\.\d{1,3}\.\d{1,3}\.\d{1,3}$,
, $1, $2, $3 $4, 
.


^ $. 
.

239

ip=72123.3.21.993 ( 
) i=123.3.21.223.

223, . (
, ( 
), . 
, [01]?\d\d?, ,
$ .
( 224), 
.
,
,
(,
POSIX , 
).
,
. ! . 
. , .
1.2.3.4.5.6.
,
 
. , 
(?<![\w.])(?![\w.])
, 
[\w.] (
). 
(^|)(|$).


( /usr/local/bin/perl UNIX
\Program Files\Yahoo!\Messenger Windows) 
. , 
, ,
, Perl, PHP, Java VB.NET. 
, 

, .


, 
/usr/local/bin/gcc g.
.
/ ( \ Windows) . 
/, ,
. , .* 


, . ^.*/
.* ,
/, .
,
f, .
Unix:

Perl

$f =~ s{^.*/}{};

PHP

$f = preg_replace('{^.*/}', '', $f);

java.util.regex

f = f.replaceFirst("^.*/", "");

VB.NET

f = Regex.Replace(f, "^.*/", "")

( , 
) , 
.

Windows, 
\, /.
^.*\\. ,
\ 
\\, 

\ 
, .

Perl

$f =~ s/^.*\\//;

PHP

$f = preg_replace('/^.*\\\/', '', $f);

java.util.regex

f = f.replaceFirst("^.*\\\\", "");

VB.NET

f = Regex.Replace(f, "^.*\\", "")


. \\\\,
\ Java ( 135).
:
?
/, , 
. .
, 
, 
. , ,
^ ( 

241

) ,
/. ,
. .*
, / \.
, 
.*, .
, (
!
,
.
( )
.
,
,

(, ).

, , 
.* , 
, 
( 301).

, .


: 
, 
, [^/]*$. 
, 
. Perl:
$WholePath =~ m{([^/]*)$}; # Apply the regex to $WholePath.
$FileName = $1;            # Save the text matched by the parentheses.

: , 
, . 
$ ,
. , 

$1,
( /, 
).
[^/]
*$ . ,
, , 


. /usr/local/bin/
perl 40
. , 
local/. [^/]*$ 
l /, $ 
l, a, c, o l (
). ,
l l/,
l l/ . .
,
, 40
40 !..
,
.
, .
, ,


. 
.



: . 
; , . , 
^(.*)/(.*)$, 
$1 $2. 
, , , ,
, .* 
$2 , /. .*
, 
/. 
.* . 
, $1 , $2 .
: ,
(.*)/ (.*) , .
, 
. 
,
[^/]*. ^(.*)/([^/]*)$. 
, , 
.

,
. 
file.txt, , ,

243

.
:
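if ( $WholePath =~ m!^(.*)/([^/]*)$! ) {
    # Match succeeded: $1 and $2 are valid
    $LeadingPath = $1;
    $FileName    = $2;
} else {
    # No match, so there is no '/' in the path
    $LeadingPath = ".";        # so "file.txt" is treated like
                               # "./file.txt" ("." is the current directory)
    $FileName    = $WholePath;
}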
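
The same split can be sketched in Python. The function name split_path is an assumption made for illustration; the regex is the one used in the Perl snippet above.

```python
import re

# Greedy (.*) backs off only as far as the LAST slash, so group 1 is the
# leading path and group 2 (which contains no slash) is the file name.
SPLIT = re.compile(r"^(.*)/([^/]*)$", re.DOTALL)


def split_path(whole_path):
    """Return (leading_path, file_name) for a Unix-style path."""
    m = SPLIT.match(whole_path)
    if m:
        return m.group(1), m.group(2)
    # No slash at all: treat "file.txt" like "./file.txt".
    return ".", whole_path


if __name__ == "__main__":
    print(split_path("/usr/local/bin/perl"))
    print(split_path("file.txt"))
```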


(, ) 
. 
, , ,
. ,
 
, . 
. 
,

. , 
\bfoo\([^)]*\).
.

 foo. , 
. ,
foo(2,4.0) foo(somevar,3.7), , 
. , foo(bar(somevar),3.7) .
 [^)]*.
, ,
:
1. \(.*\)

,

2. \([^)]*\)

3. \([^()]*\)

. 5.1 , .


val = foo(bar(this), 3.7) + 2 * (that - 1);

[Figure 5.1: where each of the three sample expressions matches in the line above]

, 
1, . 
(this),
foo, 
. , .
, ,
(
. 
, Perl, .NET PCRE/
PHP ,
(. 394, 515 563 ). ,
,

, . 
:

\([^()]*(\([^()]*\)[^()]*)*\)

, ,
. 
, . 
Perl $depth 
, .
Perl ,
:
$regex = '\(' . '(?:[^()]|\(' x $depth . '[^()]*' . '\))*' x $depth . '\)';
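
A rough Python equivalent of this construction is sketched below, using string repetition where Perl uses the x operator. The function name nested_parens_regex is hypothetical.

```python
import re


def nested_parens_regex(depth):
    """Build a regex that matches parentheses nested up to `depth` levels deep."""
    return (
        r"\("
        + r"(?:[^()]|\(" * depth   # open one more optional nesting level
        + r"[^()]*"                # innermost level: no parentheses at all
        + r"\))*" * depth          # close each level that was opened
        + r"\)"
    )


if __name__ == "__main__":
    text = "val = foo(bar(this), 3.7) + 2 * (that - 1);"
    # depth 0 cannot span the nested call; depth 1 can.
    print(re.search(nested_parens_regex(0), text).group())
    print(re.search(nested_parens_regex(1), text).group())
```

With depth 0 the result is the simple non-nesting expression; each extra level wraps one more optional parenthesized group around it.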

.
1

.* ,
, *
. ,

.* .

245


, ,
. ,
HTML ,
<HR> ( 
).
s/*/<HR>/, ,
, . ?
: s/*/<HR>/ <HR> 
, , !
: , , 
. * 
.
, .
*.
.
,
, . 
,
,
. 
: -?[0-9]*\.?[0-9]*.
, ,
1, 272.37, 129238843., .191919 .0. .
, ,
thistexthasnonumber, nothinghere 
? 
.
, , 
. 
, 
. , 
, num123, 
, !
, .
,
. 
,
. 
+: -?[0-9]+.

, 
.

246

5.

\.?[09]*, [09]* 
.
, , .
( ,
): (\.[09]*)?. 
, 
. 
: , [09]*
.
, -?[0-9]+(\.[0-9]*)?. 
.007, 
.
, 
, ,
(,
).

: -?[0-9]+(\.[0-9]*)?|-?\.[0-9]+.
, , 
. , ... 
,
? . , ?
,
-?([0-9]+(\.[0-9]*)?|\.[0-9]+).
,
2003.04.12. ,
, 
, , 
, . 
, 
, ^$
num\s*=\s*$.
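
As a quick check of the final number expression, here is a small Python sketch. The helper is_number is an assumed name, and fullmatch() is used so that the expression must account for the entire string rather than matching "nothing at all" somewhere inside it.

```python
import re

# Optional minus sign, then either digits with an optional fractional part,
# or a leading dot followed by at least one digit.
NUMBER = re.compile(r"-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)")


def is_number(text):
    """Return True if the whole of text is a number such as -272.37 or .5"""
    return NUMBER.fullmatch(text) is not None


if __name__ == "__main__":
    for s in ["1", "-272.37", "129238843.", ".191919", "-.0", "", ".", "num123"]:
        print(repr(s), is_number(s))
```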


IP , 
, 
.
, (, , ) 
. :

,
/* */.
HTML, <>,
<CODE>.

247

, HTML, su
per exciting a <I>super exciting</I> offer!.
.mailrc. 
, :
alias _

: alias jeff jfriedl@regex.info. 


, .
, ,
, a passport needs a
"2\"x3\" likeness" of the holder..
CSV (commaseparated values ,
).

:
1. .
2. ( , 
).
3. .
, ,
, 

.


2\"3\". ,
.
,
.
, . 
, (. . 
[^"]), . 
, , 
\. 

( 175). , "([^"]|(?<=\\)")*",
2\"3\".
, 

. , . 
,
:
Darth Symbol: "/|\\" or "[^^]"

248

5.
Darth Symbol: "/|\\" or "[^^]"


\, ,
; , 
.
\, ,
\\ , 
. ,
\, , 
. 
.
,
.
(\\.), ,
([^"]); , 
"(\\.|[^"])*". , !.. , . 
 ,
,
 ( ) :
"You need a 2\"x3\" photo.

? , 

( 213). ,
, ,
,
:
2\"x3 \"

(\\.| [^"])

[^"] \,
.
.

,
, ,
.
, 
, 
. , 
, .
? . 235,

\, [^"] [^\\"].
, \ 

249


. "(\\.|[^\\"])* 
. , 
; 
301.
.
, , 
(,
).
, , 
( 184) 
( 180) "(\\.|[^"])*+
"(?>(\\.|[^"])*) . 
, , 
,
. , , ,
.
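
The difference between the two expressions can be observed with a short Python experiment on a string whose closing quote is missing. The names NAIVE and STRICT are assumptions made for this sketch.

```python
import re

# [^"] may also match a backslash, so backtracking can "unescape" a quote.
NAIVE = re.compile(r'"(\\.|[^"])*"')
# [^\\"] refuses backslashes, so an escaped quote can never end the string.
STRICT = re.compile(r'"(\\.|[^\\"])*"')

# Note: there is no closing quote, so nothing should match here.
unterminated = r'"You need a 2\"x3\" photo.'

if __name__ == "__main__":
    wrong = NAIVE.search(unterminated)
    print("naive :", wrong.group() if wrong else None)
    right = STRICT.search(unterminated)
    print("strict:", right.group() if right else None)
```

The naive form reports a match that ends at an escaped quote, while the stricter class reports no match at all.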

,
, 
. ,
(
, ,
( 
).


,
, 
.
, 
, . 
, a, , 
( 140), .
, 
.
,
, . ,
, 


( 146).
, (?s:.), 
.

250

5.

, ,
, .
, .
, 
, ,
 .
,
, 
. 
, 
.
, .


,
.

:
s/^\s+//;
s/\s+$//;

* +, 

.
 
,
. ,
, 
.
s/\s*(.*?)\s*$/$1/s

, 
 , 
( Perl 5 ). ,

\s*$, 
.
s/^\s*((?:.*\S)?)\s*$/$1/s

, ,
. 
^\s* , .* 
. 
\S 

HTML

251

, , 
\s*$
.
, 
, . 
, .
s/^\s+|\s+$//g
.
( ),
,
,
.
/g 
, .
/g, 
, 
. 4 .
,

. ,
,
,
.
s/^\s+//;
s/\s+$//;

, 
.
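
The two-substitution approach carries over directly to other languages. A minimal Python sketch follows; the function name trim is an assumption, and in everyday Python code str.strip() would normally be used instead.

```python
import re


def trim(text):
    """Remove leading and trailing whitespace using two simple substitutions."""
    text = re.sub(r"^\s+", "", text)    # leading whitespace
    return re.sub(r"\s+$", "", text)    # trailing whitespace


if __name__ == "__main__":
    print(repr(trim("  hello world \n")))
```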

HTML
2 
HTML ( 97), 
URL.
,
HTML.

HTML
HTML <[^>]+>
, Perl
:
$html =~ s/<[^>]+>//g;

,
>, HTML: <input

252

5.

name=dir value=">">. HTML 


< > , 
, .
<[^>]+> , 
.
<> , 
, 
, >, . HTML 
, . 
, 
"[^"]*" [^']*'.

[^'">], :

<("[^"]*"|'[^']*'|[^'">])*>

, 
:
<                    # Opening "<"
  (                  #   Any amount of . . .
     "[^"]*"         #      double-quoted string,
     |               #      or . . .
     '[^']*'         #      single-quoted string,
     |               #      or . . .
     [^'">]          #      "other stuff"
  )*                 #
>                    # Closing ">"

. ,
, , 
,
. 
,
, 
.
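
To see the effect of the quote-aware expression, the following Python sketch strips tags with both versions. The names SIMPLE_TAG, TAG, and strip_tags are illustrative assumptions.

```python
import re

# Simple version: stops at the first ">" even if it sits inside a quoted value.
SIMPLE_TAG = re.compile(r"<[^>]+>")

# Quote-aware version: quoted attribute values are consumed as single units.
TAG = re.compile(r"""<(?:"[^"]*"|'[^']*'|[^'">])*>""")


def strip_tags(html):
    """Remove HTML tags, allowing '>' inside quoted attribute values."""
    return TAG.sub("", html)


if __name__ == "__main__":
    sample = '<input name=dir value=">">text'
    print(SIMPLE_TAG.sub("", sample))   # leaves part of the tag behind
    print(strip_tags(sample))
```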
,

+ *? , , 
, (,
alt=""). +, *, 
[^'">]
()*. (
, ([^'">]+)*) , 
; 
( 279).
,
: ,
, , 

253

HTML

( 178).
, , >
, 
. , 
, .
, ,
.

(?>) ( *, 
).

HTML
, URL .
:
<a href="http://www.oreilly.com">O'Reilly Media</a>

<A> , 
. <A> 
, ,
URL.

* <a\b([^>]+)>(.*?)</a>,

. <A> 
$1, $2. , [^>]+
, .

,
.
<A> , 

. URL href=. HTML
,
/ ( 
).
, 
Perl, $Html:
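# Note: the regex in the while(...) is overly simplistic;
# see the text for discussion.
while ($Html =~ m{<a\b([^>]+)>(.*?)</a>}ig)
{
    my $Guts = $1; # Save the results of the match above . . .
    my $Link = $2; # . . . in named variables, for clarity below.

    if ($Guts =~ m{
                    \b HREF          # "href" attribute
                    \s* = \s*        # "=" may have whitespace on either side
                    (?:              # Value is . . .
                       "([^"]*)"     #   double-quoted string,
                       |             #   or . . .
                       '([^']*)'     #   single-quoted string,
                       |             #   or . . .
                       ([^'">\s]+)   #   "other stuff"
                    )                #
                  }xi)
    {
        my $Url = $+; # The highest-numbered of $1, $2, etc.
                      # that was actually filled.
        print "$Url with link text: $Link\n";
    }
}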

.

.
,

, .
, 
> ,
=.
+, 
href. He
, 
+ . 252? , 
, 
. ,
.
URL
$1, $2 $3. 
. Perl $+, 
$1,
$2..., . 
URL.
Perl $+ , 
URL. 

, 
.
( 180),
VB.NET . 257 (, .NET
, $+
, 502).

HTML

255

HTTP URL
URL,
, , (
). URL
, URL ;
.
, URL, 
. ,

^http://, / ( ),
: ^http://([^/]+)(/.*)?$.
, URL 
, 
: ^http://([^/:]+)(:(\d+))?(/.*)?$.
Perl URL:
if ($url =~ m{^http://([^/:]+)(:(\d+))?(/.+)?$}i)
{
my $host = $1;
my $port = $3 || 80;
# $3,
# ;
# 80
my $path = $4 || "/";
# $4, ;
# "/".
print "Host: $host\n";
print "Port: $port\n";
print "Path: $path\n";
} else {
print "Not an HTTP URL\n";
}



[^/:]+. , 2 ( 106)
[a-z]+(\.[a-z]+)*\.(com|edu|...|info). 
, ?
, 
. 
(, , URL), 

. , http://
, [^/:]+ 
. 
2 
, .
, , 
. ,

256

5.

, 
.
, ,
ASCII, , .
, 
[a-z0-9]|[a-z0-9][-a-z0-9]*[a-z0-9],
. 2,
(com, edu, uk ) 
. ,
:
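^
(?i) # Apply this regex in a case-insensitive manner.
# One or more dot-separated parts . . .
(?: [a-z0-9]\. | [a-z0-9][-a-z0-9]*[a-z0-9]\. )+
# Followed by the final suffix part . . .
(?: com|edu|gov|int|mil|net|org|biz|info|name|museum|coop|aero|[a-z][a-z] )
$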


, 
: 63 
. , [-a-z0-9]*

[-a-z0-9]{0,61}.
, 
. , (com, edu
. .), .
,
. ,
, ai, :
http://ai/ . 
cc, co, dk, mm, ph, tj, tv, tw.
, 
(?:)+ (?:)*.
:
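^
(?i) # Apply this regex in a case-insensitive manner.
# Zero or more dot-separated parts . . .
(?: [a-z0-9]\. | [a-z0-9][-a-z0-9]{0,61}[a-z0-9]\. )*
# Followed by the final suffix part . . .
(?: com|edu|gov|int|mil|net|org|biz|info|name|museum|coop|aero|[a-z][a-z] )
$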

, 
. , 
, 
, :
? ,

257

HTML

? , .
, 
2 
. , , 
.

VB.NET
HTML
, Html:
Imports System.Text.RegularExpressions
 .
 .
 .
' Set up the regular expressions used in the loop below
Dim A_Regex as Regex = New Regex( _
    "<a\b(?<guts>[^>]+)>(?<Link>.*?)</a>", _
    RegexOptions.IgnoreCase)
Dim GutsRegex as Regex = New Regex( _
    "\b HREF                (?# 'href' attribute                  )" & _
    "\s* = \s*              (?# '=' with optional whitespace      )" & _
    "(?:                    (?# Value is ...                      )" & _
    "   ""(?<url>[^""]*)""  (?# double-quoted string,             )" & _
    "   |                   (?# or ...                            )" & _
    "   '(?<url>[^']*)'     (?# single-quoted string,             )" & _
    "   |                   (?# or ...                            )" & _
    "   (?<url>[^'"">\s]+)  (?# 'other stuff'                     )" & _
    ")                      (?#                                   )", _
    RegexOptions.IgnoreCase OR RegexOptions.IgnorePatternWhitespace)

' Now check the 'Html' variable...
Dim CheckA as Match = A_Regex.Match(Html)
' For each match within it...
While CheckA.Success
    ' We matched an <a> tag, so now look for the URL.
    Dim UrlCheck as Match = _
        GutsRegex.Match(CheckA.Groups("guts").Value)
    If UrlCheck.Success
        ' We have a match, so we have a URL/link pair
        Console.WriteLine("Url " & UrlCheck.Groups("url").Value & _
                          " WITH LINK " & CheckA.Groups("Link").Value)
    End If
    CheckA = CheckA.NextMatch
End While

,
.

258

5.

VB.NET, ,
Imports, 
.
(?#), 
VB.NET , 
#
( 
# 
).

# ,
&chr(10) ( 497).

( 137).
,

Groups("url") Groups(1), Groups(2) . .

URL
Yahoo! Finance, 
.
,
HTML, ( 
http://finance.yahoo.com 10 ,
, ).
, 
( ) 
, ,
, URL, , (
. 
, ,
Yahoo! .
URL mailto,
http, https ftp. http://, 
, URL,
http://[-\w]+(\.\w[-\w]*)+.
(
ASCII) [az09] \w.

\w , 
, ,
.
URL http:// mailto::
visit us at www.oreilly.com or mail to orders@oreilly.com

HTML

259

. 

, :
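(?i: [a-z0-9] (?:[-a-z0-9]*[a-z0-9])? \. )+ # sub domains
# Now the ending .com, etc. For these, we require lowercase
(?-i: com\b
    | edu\b
    | biz\b
    | org\b
    | gov\b
    | in(?:t|fo)\b # .int or .info
    | mil\b
    | net\b
    | name\b
    | museum\b
    | coop\b
    | aero\b
    | [a-z][a-z]\b # two-letter country codes
)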

(?i:) (?i:)

( 176). URL
www.OReilly.com, NT.TO (
Nortel Networks ) 
, , 
. URL (
, .com) , 
URL .
( URL,
), ( 
. .) . , 
(?i:) ,
, URL 
, .
URL ,
:
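\b
# Match the leading part of the URL (proto://hostname, or just hostname)
(
    # ftp://, http://, or https:// leading part
    (ftp|https?)://[-\w]+(\.\w[-\w]*)+
  |
    # or, try to find a hostname with the more specific sub-expression
    full-hostname-regex
)
# Now allow an optional port number
( : \d+ )?
# The rest of the URL is optional, and begins with / . . .
(
    / path-part
)?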

,
(. . http://www.oreilly.com/catalog/
regex/). , 
, .
2, , URL ,
URL. :
Read his comments at http://www.oreilly.com/ask_tim/index.html. He...

index.html URL
, index.html URL.
,
, 
, . 
2 ,
, URL 
. Yahoo! Finance 
, 
, (
).
URL
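\b
# Match the leading part of the URL (proto://hostname, or just hostname)
(
    # ftp://, http://, or https:// leading part
    (ftp|https?)://[-\w]+(\.\w[-\w]*)+
  |
    # or, try to find a hostname with the more specific sub-expression
    (?i: [a-z0-9] (?:[-a-z0-9]*[a-z0-9])? \. )+ # sub domains
    # Now the ending .com, etc. For these, we require lowercase
    (?-i: com\b
        | edu\b
        | biz\b
        | gov\b
        | in(?:t|fo)\b # .int or .info
        | mil\b
        | net\b
        | org\b
        | [a-z][a-z]\b # two-letter country codes
    )
)
# Now allow an optional port number
( : \d+ )?
# The rest of the URL is optional, and begins with / . . .
(
    /
    # The rest are heuristics that seem to work well in practice
    [^.!,?;"'<>()\[\]{}\s\x7F-\xFF]*
    (?:
        [.!,?]+ [^.!,?;"'<>()\[\]{}\s\x7F-\xFF]+
    )*
)?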


Java
String SubDomain  = "(?i:[a-z0-9]|[a-z0-9][-a-z0-9]*[a-z0-9])";
String TopDomains = "(?x-i:com\\b        \n" +
                    "     |edu\\b        \n" +
                    "     |biz\\b        \n" +
                    "     |in(?:t|fo)\\b \n" +
                    "     |mil\\b        \n" +
                    "     |net\\b        \n" +
                    "     |org\\b        \n" +
                    "     |[a-z][a-z]\\b \n" + // country codes
                    ")                   \n";
String Hostname = "(?:" + SubDomain + "\\.)+" + TopDomains;

String NOT_IN   = ";\"'<>()\\[\\]{}\\s\\x7F-\\xFF";
String NOT_END  = "!.,?";
String ANYWHERE = "[^" + NOT_IN + NOT_END + "]";
String EMBEDDED = "[" + NOT_END + "]";
String UrlPath  = "/"+ANYWHERE + "*("+EMBEDDED+"+"+ANYWHERE+"+)*";

String Url =
  "(?x:                                                \n"+
  "  \\b                                               \n"+
  "  ## match the hostname part                        \n"+
  "  (                                                 \n"+
  "    (?: ftp | http s? ): // [-\\w]+(\\.\\w[-\\w]*)+ \n"+
  "   |                                                \n"+
  "    " + Hostname + "                                \n"+
  "  )                                                 \n"+
  "  # allow an optional port number                   \n"+
  "  (?: :\\d+ )?                                      \n"+
  "                                                    \n"+
  "  # the rest of the URL is optional, begins with /  \n"+
  "  (?: " + UrlPath + ")?                             \n"+
  ")";

// Convert the string we have built up into a real regex object
Pattern UrlRegex = Pattern.compile(Url);
// Now ready to apply it to raw text to find URLs...
 .
 .
 .

262

5.

,
; 2 (
. 105), . , Java (
) , .
,

. . 106 (
$HostnameRegex) . 261.



.
,

, .

, 
, 
( 
).
, 
, 5 .
, , ,
44. ,
:
03824531449411615213441829503544272752010217443235


\d\d\d\d\d
. Perl @zips = m/
\d\d\d\d\d/g;. , 
( , 
, Perl $_
110).
find . 
,
, Perl.
\d\d\d\d\d. , 
, 

; (
, ;
, ).

263

, \d\d\d\d\d 44\d\d\d , 
44, 
,
, 44 
. 44\d\d\d 
5314494116.
, \A,
, 
. 
,
. 
, ,
.



. , 
((44\d\d\d)). 

(?:), 
$1:

(?:[^4]\d\d\d\d|\d[^4]\d\d\d)*
:
, , 44 (
, [^4] [12359], 
, , 
). , 
(?:[^4][^4]\d\d\d)*, 
43210.

(?:(?!44)\d\d\d\d\d)*
,
, 44.
, ,
.
. (
44) (?!44) ,
.

(?:\d\d\d\d\d)*?
, 
, (. . 
, , 
, ). 
(?:\d\d\d\d\d)
. * 

264

5.

,
; , 
.
(44\d\d\d), :
@zips = m/(?:\d\d\d\d\d)*?(44\d\d\d)/g;

44,
(
@array = m//g Perl , 

; 375).
, , 

,
.


, 
? !
, 
, 
. ,
,
, 
.
:
03824 53144 94116 15213 44182 95035 44272 7 5 2 0 10217 443235

(
). 
, ,
, . 44272 
,
. ? , . 

, 
. 
10217 44323.
,
, 
. 
, 
.
( , 
, ) , 

265

(44\d\d\d) ,
?. ,

(?:(?!44)\d\d\d\d\d)* (?:[^4]\d\d\d\d|\d[^4]\d\d\d)* 

( , 
). (44\d\d\d)? 
, , .
. , 

, . ,
 
.

\G
, 
\G ( 171). 
, 
, 

.  ,
\G ,

( Ruby 
, \G ,
171).
,
@zips = m/\G(?:(?!44)\d\d\d\d\d)*(44\d\d\d)/g;

 .


, ,

. 
, , 
.
\d\d\d\d\d , 
44. Perl :
@zips = ( ); #
while (m/(\d\d\d\d\d)/g) {
$zip = $1;
if (substr($zip, 0, 2) eq "44") {
push @zips, $zip;
}
}
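
The same "match every block, then filter in code" idea can be sketched in Python. The function name zips_starting_with_44 is an assumption; the data string is the sample line used earlier in this section.

```python
import re


def zips_starting_with_44(data):
    """Split data into consecutive 5-digit codes and keep those starting with 44."""
    # findall() resumes each attempt where the previous match ended, so on a
    # string made only of digits the matches stay aligned on 5-digit boundaries.
    return [z for z in re.findall(r"\d{5}", data) if z.startswith("44")]


if __name__ == "__main__":
    line = "03824531449411615213441829503544272752010217443235"
    print(zips_starting_with_44(line))
```

Because the regex itself never looks for 44, there is no risk of it picking up a "44" that straddles two neighboring codes.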

266

5.

\G
. 172, 
Perl.

,
, , 
(CSV Comma Separated Values), ,
. , 
, CSV, 
, .
CSV Microsoft
Excel, .1 Microsoft
CSV. 
(. . ,
), (
" "").
:
Ten Thousand,10000, 2710 ,,"10,000","It's ""10 Grand"", baby",10K

:
Ten Thousand
10000
 2710 
(an empty field)
10,000
It's "10 Grand", baby
10K

,
. 
,
, [^",]+.
,
, . 
"", " .
, 
[^",]|"" "" "(?:[^"]|"")*" ( 
, (?:) 
(?>),
317).
, 
: [^",]+|"(?:[^"]|"")*".
1

CSV Microsoft
6 (330), 
.

267

,
(. 146):
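# Either some non-quote/non-comma text...
[^",]+
# ...or...
|
# ...a double-quoted field (paired double quotes are allowed inside)
" # field's opening quote
(?: [^"] | "" )*
" # field's closing quote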

, 
CSV. 
, , .

, "" ".
. 
, 
. 
(), ""
". , 
,
. , 
, 
, , 
:
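# Either some non-quote/non-comma text...
( [^",]+ )
# ...or...
|
# ...a double-quoted field (paired double quotes are allowed inside)
" # field's opening quote
( (?: [^"] | "" )* )
" # field's closing quote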

, , 
. , 
"" ".
Perl, ( 
 ) Java VB.NET
( , 10
PHP 569). , 
$line,
( , !)
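while ($line =~ m{
            # Either some non-quote/non-comma text...
            ( [^",]+ )
            # ...or...
             |
            # ...a double-quoted field ("" is allowed inside)
            " # field's opening quote
             ( (?: [^"] | "" )* )
            " # field's closing quote
        }gx)
{
    if (defined $1) {
        $field = $1;
    } else {
        $field = $2;
        $field =~ s/""/"/g;
    }
    print "[$field]"; # print the field, for debugging
    # $field can now be used...
}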

, , :
[TenThousand][10000][2710][10,000][It's"10Grand",baby][10K]

, , , 
. $field , 

(10,000). ,
.
, , [^",]+ [^",]*
, ?
. :
[TenThousand][][10000][][2710][][][][10,000][][][It's"10Grand",...

 ! ,
.

( )* ,
. ,

TenThousand ,10000.
( ),
, 
. ,
, 

, 
( 171). 
( 
, ).

269


,
. 
, .
.
1. . 
, 
.
2. , 
,
. , .
, . 
( ),
,
( 
, ).
^|, $|,
, .
:
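(?:^|,)
(?:
    # Either some non-quote/non-comma text...
    ( [^",]* )
    # ...or...
  |
    # ...a double-quoted field (paired double quotes are allowed inside)
    " # field's opening quote
    ( (?: [^"] | "" )* )
    " # field's closing quote
)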

,
:
[TenThousand][10000][2710][][][000][][baby][10K]

:
[TenThousand][10000][2710][][10,000][It's"10Grand",baby][10K]

? , 
, 
... ? , .
. 224: ... (
(
, (
. 
, [^",]* ,

270

5.

,
.
,
.
:
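(?:^|,)
(?: # Either a double-quoted field (paired double quotes are allowed inside)...
    " # field's opening quote
    ( (?: [^"] | "" )* )
    " # field's closing quote
  |
    # ...or some non-quote/non-comma text.
    ( [^",]* )
)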
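
Python's re module has no \G, so the sketch below keeps the matches in sync a different way: each field is matched at an explicit position with pattern.match(line, pos), and the separating comma is checked by hand. This is a swapped-in technique, not the method described in the text, and the names FIELD and parse_csv_line are assumptions.

```python
import re

# Either a double-quoted field (group 1) or plain non-quote/non-comma text (group 2).
FIELD = re.compile(r'"((?:[^"]|"")*)"|([^",]*)')


def parse_csv_line(line):
    """Split one line of Excel-style CSV into a list of field values."""
    fields = []
    pos = 0
    while True:
        # match() is anchored at pos, so a field can only start where the
        # previous field (plus its comma) ended.
        m = FIELD.match(line, pos)
        if m.group(1) is not None:
            fields.append(m.group(1).replace('""', '"'))   # undo doubled quotes
        else:
            fields.append(m.group(2))
        pos = m.end()
        if pos == len(line):
            return fields
        if line[pos] != ",":
            raise ValueError("unexpected text at position %d" % pos)
        pos += 1   # step over the comma


if __name__ == "__main__":
    sample = 'Ten Thousand,10000, 2710 ,,"10,000","It\'s ""10 Grand"", baby",10K'
    for field in parse_csv_line(sample):
        print("[" + field + "]")
```

The quoted alternative is listed first so that a field beginning with a double quote is never mistaken for an empty unquoted field.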

!.. , 
. ? , 
. 
,
, 
\G ,
, 
. ,
, 
. \G,
.
,
,
[TenThousand][10000][2710][][][000][][baby][10K]


[TenThousand][10000][2710][][]

,
.


. ,
, 
.

^|,, (?<=^|,).
, 3 ( 175),

, .
, (?<=^|,) 
(?:^|(?<=,)), , 

271

, 
. ,
,  , 
"10, 000".
, .

CSV Java
CSV,
Sun java.util.regex. 

8 ( 476).
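import java.util.regex.*;
 .
 .
 .
String regex = // A quoted field goes into group(1),
               // an unquoted field into group(2).
    "\\G(?:^|,)                                   \n"+
    "(?:                                          \n"+
    "    # Either a double-quoted field...        \n"+
    "    \" # field's opening quote               \n"+
    "     (  (?: [^\"]++ | \"\" )*+  )            \n"+
    "    \" # field's closing quote               \n"+
    " |  # ...or...                               \n"+
    "    # some non-quote/non-comma text...       \n"+
    "    ( [^\",]* )                              \n"+
    ")                                            \n";

// Create a matcher for lines of CSV text, using the regex above.
Matcher mMain = Pattern.compile(regex, Pattern.COMMENTS).matcher("");
// Create a matcher for "", with dummy text for the time being.
Matcher mQuote = Pattern.compile("\"\"").matcher("");
 .
 .
 .
// Above is the preparation; the code below runs once per line
mMain.reset(line); // Use this line of CSV text in the processing below
while (mMain.find())
{
    String field;
    if (mMain.start(2) >= 0)
        field = mMain.group(2); // The field is unquoted, so use it as is
    else
        // The field is quoted, so replace each pair of double quotes with one
        field = mQuote.reset(mMain.group(1)).replaceAll("\"");
    // The field can now be used...
    System.out.println("Field [" + field + "]");
}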

272

5.

, ,
( ).
(?=$|,) 
.
? ,
,
,
, 
.


,
, 

( 180). (?:[^"]
|"")* (?>[^"]+|"")*. VB.NET 
.
( 184),
Java Sun, 
. Java.
. 
, . 330.

CSV
CSV ,
Microsoft, ,
. 
.

, ; (

?).
,
.
\ (. . 
"" \").
, \
( ).
. 

; 
\s*, . .
(?:^|,\s*.

273


( 249) [^"]+|"" [^\\"]+|\\.. ,
s/""/"/g s/\\(.)/$1/g
.

CSV VB.NET
Imports System.Text.RegularExpressions
 .
 .
 .
Dim FieldRegex as Regex = New Regex( _
    "(?:^|,)                                     " & _
    "(?:                                         " & _
    "   (?# Either a double-quoted field...)     " & _
    "   "" (?# field's opening quote )           " & _
    "    ( (?> [^""]+ | """" )* )                " & _
    "   "" (?# field's closing quote )           " & _
    " (?# ...or...)                              " & _
    " |                                          " & _
    "   (?# ...some non-quote/non-comma text...) " & _
    "   ([^"",]*)                                " & _
    " )", RegexOptions.IgnorePatternWhitespace)
Dim QuotesRegex as Regex = New Regex("""""") ' A string with two double quotes
 .
 .
 .
Dim FieldMatch as Match = FieldRegex.Match(Line)
While FieldMatch.Success
    Dim Field as String
    If FieldMatch.Groups(1).Success
        Field = QuotesRegex.Replace(FieldMatch.Groups(1).Value, """")
    Else
        Field = FieldMatch.Groups(2).Value
    End If
    Console.WriteLine("[" & Field & "]")
    ' Field can now be used...
    FieldMatch = FieldMatch.NextMatch
End While

6



Perl, Java, .NET, Python PHP ( 
,
. 188). , 

. ,
, . 
, 
. 
.
. , 
, , 
. 4 5,
,
(, , 
,
). , , 
, . 
, 
, 
.
, , 
. , 
, ,
,
, .

, ,
,

275

. , 
,
.


, 
. 
, 
,
. , marty smarty
: m s (), 
m m, a a . . ( 
). ( 
, 
).
,
, ,
, , , . . ,

; ,
.
: ,
, , .
, , ,
.
, . 

, ,
(, 
, 
). 
, .

POSIX
, 
( POSIX
). ,
. , 
, .
, 
.


, , 
. . 249

276

6.

"(\\.|[^\\"])*" , 
. ,
\. ,
, 
, . 
( ) 
\\., , 
[^\\"]. 
, , ,
.



, , 
, ,
[^\\"] ,
a \\. . [^\\"] , 

(, , *,
, ).
6.1 
. 
, .
.
, 
:


, POSIX ?


, ?

"(\\.|[^"\\])*"

"2\"x3\"

likeness"

"([^"\\]|\\.)*"

"2\"x3\"

likeness"

, ,

. 6.1. ( )

277

,
. 279. ,
, .


,
,
? 
, 
. "(\\.|[^"])*", ( 248)
. 
\\
, 
. 
, ,
. , , 
, .
, [^"] , 
,
:
"You need a 2\"3\" photo."

, , 
, .


. 6.1 , *
, 
( )
. , 
.
, , ,
, ,

[^"\\] (
) , [^\\"]+ ()*
. 
. 

* . 
.
,

. . 6.2 , .
"(\\.|[^\\"])* (
. 6.2) ,

278

6.

"(\\.|[^"\\])*"

"2\"x3\"

likeness"

"(\\.|[^"\\]+)*"

"2\"x3\"

likeness"

"([^"\\]|\\.)*"

"2\"x3\"

likeness"

"([^"\\]+|\\.)*"

"2\"x3\"

likeness"

, ,

. 6.2. ( )

, *.
. 6.2 , 

.
+ , 
, , ,
*. *
,
,
,
( 
).
6.1 , , 

*. 
,
.
.
6.1.

"([^\\"]|\\.)*"
. . .

"([^\\"]+|\\.)*"
. . .

"makudonarudo"

16

13

17

"2\"x3\" likeness"

22

15

25

"very99 long"

111

108

112

279


. 277.
?
POSIX . 

,
. ,
, , 
,
.
?
. 
,
( POSIX 
). , , ,
, .

( , ).

"(\\.|[^"])*"

"2\"x3\" likeness"
"makudonarudo"

"([^"]|\\.)*"

POSIX

. .

. .

32

14

22

48

30

28

14

16

40

26

109

111

325

216

86

124

86

124

86

"very99 long" 218


"No \"match\" here

124

, POSIX 
, 
( ). 
( ) 
, 
.


, .
, 
. :
, POSIX .
, , , "verylong"

280

6.

(324 518 553 658 426 726 783 156 020 576 256,
325 ). , .
50 ...
 .1
! ? 

+, *, 
,
.
. .


+, [^\\"] *,
([^\\"])* 
. , 
. . ,
. 
,

. 
, .
([^\\"]+)* , 
+ * , 
. makudonarudo. 
12 *,

[^\\"]+ (m a k u d o n a r u d )? 
, *, [^\\"]+ 
(makudonarudo)? , *,
[^\\"]+ 5,
3 4 (makud ona rudo)? 2, 2, 5 3 (ma ku do
nar udo)? ...
, (4096 12
). 
, POSIX 
, ,
(
).
, , 
, !2 4096 12 
, 
1

;
.
: ,
n, 2n+1. 2n+1+2n.

281

20 . 30 

, 40 . ,
.
! . POSIX
. , ,
. POSIX 
, 
. , 
,
. "No\"match\"here

8192 .
,
. , 
,  . ,
. 
, ,
:

, , , .

, .
, POSIX .
,

(
306).
, ,
.
, 

( ,
). , 
4 ( 190).
, 
. 
, ,
. 
. ,

, 
, .

282

6.



. . 

, 
.

. ".*"
The name "McDonald's" is said "makudonarudo" in Japanese

. 6.3.

, .
(), 
, A. 
, 
( 192) , ,
.

.* , 
* .
46 , .*, ,
46 ,
. 
.*
".* " anese .
, 
. , 

".*"

The name "McDonald's" is said "makudonarudo" in Japanese


A

B
C
D
E
F
G
H

. 6.3. ".*"

POSIX NFA

283

( ), . 

Japanes e. .
, A
B, 
B C.

".* " arudo "inJapa ( C).


, D
:
The name "McDonald's" is said "makudonarudo" in Japanese

. 
, 
.

POSIX
POSIX
, , 
,
. ,
, 
.


, . 
, DEF FGH BCD, 
, F
D.
I
.
, , 
( ), POSIX 
D.


, . 
".*"!, 
. 
, , , 
.
. 6.4. I
. 6.3. , 
D ( 
). , 
. 6.4

284

6.

".*"!

The name "McDonald's" is said "makudonarudo" in Japanese


A

B
C
D

E
F
G

H
I
J

K
L

M
N
O
P
Q

R
S
T

W
X
Y

. 6.4. ".*"!

, POSIX :
, POSIX , . .
.
, 
I, , 
. , J, Q
V, ,
. , Y 
, . 
. 6.4,
.

[^"]. 
, ,
. 
"[^"]*"! [^"]*
, .

285

. 6.5 , (
. 6.4). , 
. ,
.

"[^"]*"!

The name "McDonald's" is said "makudonarudo" in Japanese


A

G
H
I
J

O
P
Q

T
U
V

W
X
Y

. 6.5. "[^"]*"!


, 
. 
makudonarudo , 
u|v|w|x|y|z [uvwxyz]. 
1, [uvwxyz]
( 34) 
:
The name "McDonald's" is said "makudonarudo" in Japanese

u|v|w|x|y|z
,
204 . ,
 ,
1

, 
, 
.

286

6.

, .
, , 
,
.
,
,
.
.
,
.

, 
,
. 
.
, 
.
, ,
.
^(a|b|c|d|e|f|g)+$ ^[a-g]+$. 
Perl, 
. ( , 
) Perl :
use Time::HiRes 'time'; # So time() gives a high-resolution value.

$StartTime = time();
"abababdedfg" =~ m/^(a|b|c|d|e|f|g)+$/;
$EndTime = time();
printf("Alternation takes %.3f seconds.\n", $EndTime - $StartTime);

$StartTime = time();
"abababdedfg" =~ m/^[a-g]+$/;
$EndTime = time();
printf("Character class takes %.3f seconds.\n", $EndTime - $StartTime);

( 
),
.
.

. ,
,
.
.

287

, !
. 
,
.
Perl
:
Alternation takes 0.000 seconds.
Character class takes 0.000 seconds.

:
, 
. , 
, 10 10 000 000 
, .
. 

1/100 ,
.
. 
,

. (
,
. Perl
; 
.

, :
use Time::HiRes 'time'; # So time() gives a high-resolution value.

$TimesToDo = 1000;                   # Number of iterations
$TestString = "abababdedfg" x 1000;  # Makes a long test string

$Count = $TimesToDo;
$StartTime = time();
while ($Count-- > 0) {
    $TestString =~ m/^(a|b|c|d|e|f|g)+$/;
}
$EndTime = time();
printf("Alternation takes %.3f seconds.\n", $EndTime - $StartTime);

$Count = $TimesToDo;
$StartTime = time();
while ($Count-- > 0) {
    $TestString =~ m/^[a-g]+$/;
}
$EndTime = time();
printf("Character class takes %.3f seconds.\n", $EndTime - $StartTime);

288

6.

: $TestString $Count 
. $TestString 
x,
. Perl 5.8
:
Alternation takes 7.276 seconds
Character class takes 0.333 seconds

, 22 .

, 
, .


, 
:
$TimesToDo = 1000000;
$TestString = "abababdedfg";

1000 , 1000
. , 
, :
Alternation takes 18.167 seconds
Character class takes 5.231 seconds

, . ?

, . . $Count 
1000 , .
11 5 . 
, 
(
,
1000 ).
, , 

.

PHP
PHP, 
preg:
$TimesToDo = 1000;

/* Prepare the test string */
$TestString = "";
for ($i = 0; $i < 1000; $i++)
    $TestString .= "abababdedfg";

/* Do the first test */
$start = gettimeofday();
for ($i = 0; $i < $TimesToDo; $i++)
    preg_match('/^(a|b|c|d|e|f|g)+$/', $TestString);
$final = gettimeofday();
$sec = ($final['sec'] + $final['usec']/1000000) -
       ($start['sec'] + $start['usec']/1000000);
printf("Alternation takes %.3f seconds\n", $sec);

/* And now the second test */
$start = gettimeofday();
for ($i = 0; $i < $TimesToDo; $i++)
    preg_match('/^[a-g]+$/', $TestString);
$final = gettimeofday();
$sec = ($final['sec'] + $final['usec']/1000000) -
       ($start['sec'] + $start['usec']/1000000);
printf("Character class takes %.3f seconds\n", $sec);

:
Alternation takes 27.404 seconds
Character class takes 0.288 seconds

PHP
not being safe to rely on the system's timezone settings (
),
:
if (phpversion() >= 5)
date_default_timezone_set("GMT");

Java
Java 
. , 
, :
import java.util.regex.*;

public class JavaBenchmark {
    public static void main(String [] args)
    {
        Matcher regex1 = Pattern.compile("^(a|b|c|d|e|f|g)+$").matcher("");
        Matcher regex2 = Pattern.compile("^[a-g]+$").matcher("");
        long timesToDo = 1000;

        StringBuffer temp = new StringBuffer();
        for (int i = 1000; i > 0; i--)
            temp.append("abababdedfg");
        String testString = temp.toString();

        // Time the first one...
        long count = timesToDo;
        long startTime = System.currentTimeMillis();
        while (--count > 0)
            regex1.reset(testString).find();
        double seconds = (System.currentTimeMillis() - startTime)/1000.0;
        System.out.println("Alternation takes " + seconds + " seconds");

        // Time the second one...
        count = timesToDo;
        startTime = System.currentTimeMillis();
        while (--count > 0)
            regex2.reset(testString).find();
        seconds = (System.currentTimeMillis() - startTime)/1000.0;
        System.out.println("Character class takes " + seconds + " seconds");
    }
}

:
. ,
.
, (VM)
. JRE Sun 
: , 
, , 
.

:
Alternation takes 19.318 seconds
Character class takes 1.685 seconds

:
Alternation takes 12.106 seconds
Character class takes 0.657 seconds

,
, 
, 
. 
JIT (JustInTime, . . 
, ),
, 
.
Java ,
BLTN (Better Late Than Never ,
), ,
.
BLTN , ,
, . 
, (,

291

), , 
(. . BLTN
).

:
// Time the first one, several times over
for (int i = 4; i > 0; i--)
{
    long count = timesToDo;
    long startTime = System.currentTimeMillis();
    while (--count > 0)
        regex1.reset(testString).find();
    double seconds = (System.currentTimeMillis() - startTime)/1000.0;
    System.out.println("Alternation takes " + seconds + " seconds");
}


(, 10 ), BLTN 
, 
.

8 25%:
Alternation takes 11.151 seconds
Character class takes 0.483 seconds

Java 
. 
, 
.

VB.NET
VB.NET:
Option Explicit On
Option Strict On

Imports System.Text.RegularExpressions

Module Benchmark
Sub Main()
    Dim Regex1 as Regex = New Regex("^(a|b|c|d|e|f|g)+$")
    Dim Regex2 as Regex = New Regex("^[a-g]+$")
    Dim TimesToDo as Integer = 1000
    Dim TestString as String = ""
    Dim I as Integer
    For I = 1 to 1000
        TestString = TestString & "abababdedfg"
    Next

    Dim StartTime as Double = Timer()
    For I = 1 to TimesToDo
        Regex1.Match(TestString)
    Next
    Dim Seconds as Double = Math.Round(Timer() - StartTime, 3)
    Console.WriteLine("Alternation takes " & Seconds & " seconds")

    StartTime = Timer()
    For I = 1 to TimesToDo
        Regex2.Match(TestString)
    Next
    Seconds = Math.Round(Timer() - StartTime, 3)
    Console.WriteLine("Character class takes " & Seconds & " seconds")
End Sub
End Module

:
Alternation takes 13.311 seconds
Character class takes 1.680 seconds

.NET Framework 
, Regex 
RegexOptions.Compiled ( 487). 
:
Alternation takes 5.499 seconds
Character class takes 1.157 seconds

Compiled ,
( 3 , 
1,5 ). , 

, .

Ruby
Ruby:
TimesToDo=1000
testString=""
for i in 1..1000
    testString += "abababdedfg"
end

Regex1 = Regexp::new("^(a|b|c|d|e|f|g)+$");
Regex2 = Regexp::new("^[a-g]+$");

startTime = Time.new.to_f
for i in 1..TimesToDo
    Regex1.match(testString)
end
print "Alternation takes %.3f seconds\n" % (Time.new.to_f - startTime);

startTime = Time.new.to_f
for i in 1..TimesToDo
    Regex2.match(testString)
end
print "Character class takes %.3f seconds\n" % (Time.new.to_f - startTime);

:
Alternation takes 16.311 seconds
Character class takes 3.479 seconds

Python
Python:
import re
import time
import fpformat

Regex1 = re.compile("^(a|b|c|d|e|f|g)+$")
Regex2 = re.compile("^[a-g]+$")

TimesToDo = 1250;
TestString = ""
for i in range(800):
    TestString += "abababdedfg"

StartTime = time.time()
for i in range(TimesToDo):
    Regex1.search(TestString)
Seconds = time.time() - StartTime
print "Alternation takes " + fpformat.fix(Seconds,3) + " seconds"

StartTime = time.time()
for i in range(TimesToDo):
    Regex2.search(TestString)
Seconds = time.time() - StartTime
print "Character class takes " + fpformat.fix(Seconds,3) + " seconds"

 Python 
, 
(
).
(
) .
:
Alternation takes 10.357 seconds
Character class takes 0.769 seconds

Tcl
Tcl:
set TimesToDo 1000
set TestString ""
for {set i 1000} {$i > 0} {incr i -1} {
    append TestString "abababdedfg"
}

set Count $TimesToDo
set StartTime [clock clicks -milliseconds]
for {} {$Count > 0} {incr Count -1} {
    regexp {^(a|b|c|d|e|f|g)+$} $TestString
}
set EndTime [clock clicks -milliseconds]
set Seconds [expr ($EndTime - $StartTime)/1000.0]
puts [format "Alternation takes %.3f seconds" $Seconds]

set Count $TimesToDo
set StartTime [clock clicks -milliseconds]
for {} {$Count > 0} {incr Count -1} {
    regexp {^[a-g]+$} $TestString
}
set EndTime [clock clicks -milliseconds]
set Seconds [expr ($EndTime - $StartTime)/1000.0]
puts [format "Character class takes %.3f seconds" $Seconds]

:
Alternation takes 0.362 seconds
Character class takes 0.352 seconds

! 
, . 188, Tcl 
/, 
.
, , Tcl . 
. 298.




. :
. ( \d+)
, 
, 
;
. ,

, 
,
. , 
, \A ( ), 
, 

.

295

,
. 
( ,
),

, .


, 
.
, 
. 
, , 
, 
. , 
,
, 
.
. \b\B ( , 
)
. ,
\b\B 
,
. 
. .

. ? , , 
, .
\b\B , 
.
, ,
\b\B ,
.
, , 
, 
.1

\b\B, 
. 
,

(this |this other), .



(?!). , Perl,
. 399.

296

6.


, 
, . 
, , 
,
.

.



,

. , 
.

:
1. . 
.
, .
2. . 
.
3. . 
, 
, 4.
,
:

(, S,
u, b, j, e, ... Subject) .
, 
;


( , 
) , 
( );

. ,
, , $1, $2
.
 , 
, ,
.

297

4. . ,

. , POSIX
, 
, 
. 
, 
.
5. . , 
,
( 3).
6. .
, 
, ,
.

,
.


, 
.
, 
, .


 2 ( 85).
, 
, :
while (...) {
    if ($line =~ m/^\s*$/ ) ...
    if ($line =~ m/^Subject: (.*)/) ...
    if ($line =~ m/^Date: (.*)/) ...
    if ($line =~ m/^Reply-To: (\S+)/) ...
    if ($line =~ m/^From: (\S+) \(([^()]*)\)/) ...
     .
     .
     .
}


.
.
, 
. 
() 
.

298

6.


,
. . 126, 
: , (
.

, Perl awk,
. 

;
, 
.

.

, Tcl

, ,
. ,
, ,
, 
, . 4
( 202), (,

this|that th(is|at))
. , 
, .
Tcl
/? Tcl
,
( 120). 
. 
2000 Usenet:
... Tcl 

, . 
, ,
; 
. 

.

Tcl 
. 
, .

299

(. .
) 
. m/^Subject:\Q$DesiredSubject\E\s*$/ 
, 
. ,
, 
.
,
, , 
. 
( , 
)
, . 
, 
.
, 
, 
.


,

.

, .
, , 
,
. 
, 
.


.

 
, . 
, 
(, , 
). 

(. . , 
).
GNU Emacs 20 , Tcl 30 
, PHP 4000 . .NET

300

6.

15 , 
.

: ,
, ,
, , 
.


 
. 
(New Regex, re.compile, and Pattern.compile in .NET, Python, and java.util.regex, respectively).
3 ( . 128) 
, 
(, 
) 
. ,
. 289, 291 292.


. 
.


/
( )
, 
, 

, 
.

, 
.
, ^Subject:(.*) Subject: 
. ,
( (r(Moore), 
( ,
). ,

. 
, 
(, Subject: : 
t).

301

^Subject:(.*), , 
. , ,
this|that|other th,
, . ,
,
th , , 
t h, 
.

. 
, 
th(is|at) this|that.

// . 302.


^Subject:(.*)
, 
. ,
. 
,
, :\d{79}: (
81 ).

. 303.



, 
, 
, 
.

/
, 
, ^, , 
.
, 
/ , 
. , 
, , ^(this|that)
, ^,

302

6.

^this|^that.
^(this|that) , , ^(?:this|that) 
.

\A, 
\G ( ).


, , ,
.* .+
,

^. (
/ (. ), 
.
, 
, .*

.+ , 
, . ,

(.+)X\1 X, 
^ , 
"12342345".1

/
, 
, $
( 169),
. , 
regex(es)?$
2 , 
. 
.

//
(
/. 
( 
),

1

: Perl 
10 , 
Perl (Jeff Pinyan) 
2002 . , (.+)X\1 
, .

, , $ 
(171).

303

. , this|that|other
, [ot];

, 
. , 
.


(
, , 
. 
, \b(perl|java)\.regex\.info\b
.regex.info. 
, 
.regex.info 
.
, 
. 
\b(vb|java)\.regex\.info\b, 
, , 
. ,

\b(\w+)\.regex\.info\b,
.


, 
( . 301),

, 
.



, ,
abc , 
a, b, c. 
, 
.


*, + ,
(,
), ,

. 

304

6.

, 
.
, 
.*
,
. , 
.
, .* (?:.)* , 
.* (?:.)*.
:
Java Sun 10% , Ruby
.NET . Python 
50, PCRE/PHP 150 !
Perl , ,
.* (?:.)* ( 
, ).


, (?:.)* 
.*, .


, ,
, ,
.
[.] \..


"(.*?)" 
() ,
("). 

( (
, ). 
,

, .
: 
, 

, .

,

305

,
.


(, ['"]
['"](.*?)["'], (
, . 303).



. , . 304
, 
Java Sun
10% . .NET Framework 
2,5 
, PCRE 150
. Perl (. . 
).

? . 150 
PCRE
,
, , 
. , 
, 
.
, 
, 10 
Java 150 PCRE. 
(?:.)* 11 , Java,
.* 13 !
Java Ruby
, Ruby 
2,5 Java.
, Ruby
10% Python, 
Python 20
Ruby.
, 
Perl. , 
Perl 10% Py
thon. ,

.

306

6.


, . 279,
(, (.+)*) 
. 

, .
, , 
,
.
, 10 000 , .*?
, 10 000 
, 
. , 
,
.

( 
). , Python 10 000 
. , 
, , 
.

. 

, 
, , "(.*)", "(.)*", "(.)*?" "([^"
])*?". 
, 
. . 293.



. 
,
, .


. , 
,

. , , ,
, .

,

307


. 
, 
, . , 
1/100 ,
.
.
; 
. , 
, , 
.




(
). 
( 184) . 
,

(
,

).


4 ( 219), ^\w+:
Subject. \w+ 
, , 

: , \w+
. ,

^(?>\w+): ^\w++.
.
, , (
, (
, 
.
,
, 
. ,
.

308

6.

,
. .* 
,
.


\d\d\d\d, 
\d{4}. 
? 
, 
? . 
, \d{4}
, , 

. ? .
, Perl, Python, PHP/PCRE .NET 
\d{4} 20% . ,
Ruby Java Sun \d\d\d\d 
( ).
, ,
. !
==== ={4}. , 
, ,
==== 
.
(
/ ( 302), .
Python Java Sun,
==== ={4} 100 !
, Perl, Ruby .NET 
====, ={4},
(
\d\d\d\d \d{4}).


:
,
(, 
), 
.
,
, 
.
, 
, Tcl. Tcl

309

, . 
, .NET Framework ,
.


, 
. 
,
, .

. 

.

. 
, 
( ,
). , x+ xx*

,
( 302) ( 302).

. ,
, 
.
, :
(?=t) this|that 
 ( 302)
, , 
t.

.
,
.

this|that. th,
th, th 
, 
. ,
th(?:is|at).

th , 
, 
. , th(?:is|at) 
th,
.

, 
.
:

310

6.


, 
, ;

, 
, ;

, , 
,
( 
);


, , 
;


,
;

( )
, . 
, ,
.

: (000|999)$ Perl
.

.

( ). !? , 

/ ( 302). 
Perl
,
.
,
,
, 
. 
, 
.

,
,
.



.  ( 127)

311

. ,
,

.
(, GNU Emacs Tcl) 
, , 
, ( 299).
(, Perl) 

, 
, 
. , Perl 
( 418).


, 
, (?:) ( 72). 
 
, ,
.
,
( 304).


, : 
.
, .*, 
(.)*. 
, ,
!


, 
, ^.*[:]. He 
,
,
, . ,
[.]
[*] (\.
\*).
( 146).
:
Perl, , 
^[Ff][Rr][Oo][Mm]: ^from:
. Perl 

312

6.

, 
. 
, 
.


, .*, 
^ \A ( 169). 
,
, . .
( 301) 

.


, ,
: 
, 
. ,
, 
, ,
,
.


xx* x+
x. {5,7}
{0,2}.



th(?:is|at) (?:this|that) , 
th . 
, :

(?:optim|standard)ization. , 
,
.



, ,
( ^, $
\G).
, . , 
, .

313

^ \G
^(?:abc|123) ^abc|^123 , 
/ ( 301)
. , 

. PCRE ( , 
) 
, 
.
(^abc) ^(abc).

,

,
. 
. ( PCRE,
Perl .NET) , 
( Ruby Sun Java)

.
, Python ,
. , 
Tcl ( 298).

$

. abc$|123$ (?:abc|123)$ ,

. Perl, 
(
/ ( 302). 
(|)$, ($|$).



, 
. 
, ^.*: ^.*?:;
, 
. ,
?
( ), 
, .

314

6.

.
,  , 

.  ,
. 
, , 
, 

,
( 304).
. 
, 
, .
,
, (,

^.*?: ^[^:]*:). ?
,

. 
Perl, , (
.



. 
: ,
, , January,

February, March . . , 
January|February|March|. 
, ,
( 303)
. ,
,
.

. Perl,
, , 

HASH(0x80f60ac). ,
. 
, , :

\b(?:SCALAR|ARRAY|...|HASH)\(0x[0-9a-fA-F]+\).
, 
. ? Perl 
, 

315

, ( 431),
. , (
( 302), 
, (0 
. , 
,
. , Perl 
, 

. , .
,
. , 
, 
? , , \(0x(?<=(?:SCALAR|...|HASH)
\(0x)[0-9a-fA-F]+\) ( ). \(0x
, (
)
, .
, \(0x, 
. , 
(
, /
( 302). ,
, Perl 
( 175), .
, Perl
\(0x , :
if ($data =~ m/\(0x/
    and
    $data =~ m/(?:SCALAR|ARRAY|...|HASH)\(0x[0-9a-fA-F]+\)/)
{
    # warn about the suspicious data...
}

\(0x ,

, 
. 
( ) ( ).1

. , ,
DBIx::DWIW, CPAN. 
MySQL. (Jeremy Zawodny)
Yahoo!.

316

6.


( 302) 
,

( 175). 
,
.
, Jan|Feb|...|Dec 
(?=[JFMASOND])(?:Jan|Feb|...|Dec). 
[JFMASOND] .
, 

. ,
,
, 
(Java, Perl, Python, Ruby, .NET). ,
[JFMASOND]
Jan|Feb||Dec.
PHP/PCRE ,
pcre_study PCRE (ghb
S 567). , Tcl
( 298).
, [JFMASOND],
, , , 
. 
,
?

[JFMASOND](?:(?<=J)an|(?<=F)eb|…|(?<=D)ec)
, 
, 
, . ,
, (
, 
[JFMASOND] .
,
Jan|Feb||Dec 
. 


. ,
. 
, .


Tcl!
, 
. . 298
,
Tcl , 
. 
, !
(?=[JFMASOND]) 
Tcl 100 .

PHP!
,
PHP, PHP 
study ( S). 
10 . 567.



( 180)
( 184) ,
. , 
^[^:]+: 
, [^:]+,
, , 
. ^(?>[^:]+): 
^[^:]++: , 
+ ( ).
,
( . 307,

).
, 

, . 
, ^.*: , ^(?>.*):
. .*, 
, :.
,
:, .



. ,


. ,
this|that th(?:is|at). 
, 
, th.
, , 
.
.



,

( 54, 224, 238, 269). 
, 
, , 

.
,
( 256)
(?:aero|biz|com|coop|). ,
,
,
, 
? ,
(?:com|edu|org|net|)
.
,
. POSIX
,
.


. 
: (?:com|edu|…|[a-z][a-z])\b com\b|edu\b|…\b|[a-z][a-z]\b.
\b, ,
.
, , 
 \b
, . 
, .
, .
,
, , , . 
, $OTHER* . 340.


,

, 
. , 
, :, 
(?:this|that): this:|that:, 
,
( 312). 
, 
.

$ , 
( 313).

(?:com|edu|)$ ,

com$|edu$|$ (
Perl).


, 
,

, .
,
. 
. , 
, ( 280), 
. , 
, .
, , * 
, (1\2\)*.
, , 
, "(\\.|[^\\"]+)*", . ,

,  !
.
1. , (\\.|[^\\"]+)*
.
,
. 
: , ()*,
. , (), 
, 
( , ).


2. 
, . 

, , 
, . 
, .
, , . 
, , 
.

, ().
(?:), ,
. 
( 180)
( 184).

1:

"(\\.|[^\\"]+)*"
, 
. , 
"hi" "[^\\"]+". 
, 
", [^\\"] 
".
"he said \"hi there\" and left"

"[^\\"]+ \\.[^\\"]+ \\.[^\\"]+".


, . 6.2 ,
. 
.
, , 
,
.
6.2.

"hi there"

"[^\\"]+"

"just one \" here"

"[^\\"]+ \\.[^\\"]+"

"some \"quoted\" things"

"[^\\"]+ \\.[^\\"]+ \\.[^\\"]+"

"with \"a\" and \"b\"."

"[^\\"]+ \\.[^\\"]+ \\.[^\\"]+ \\.[^\\"]+ \\.[^\\"]+"

"\"ok\"\n"

" \\.[^\\"]+ \\. \\."

"empty \"\" quote"

"[^\\"]+ \\. \\.[^\\"]+"


. 6.2.
,
, . :
[^\\"]+,
\\.[^\\"]+. 
,

[^\\"]+( \\.[^\\"]+)*. , 
.


, ,
.
. 
, . 
, [^\\"] . 
[^\\"]+( \\.[^\\"]+)*, 
, +
( +)*.
, , "[^\\"]+
( \\.[^\\"]+)*". , 
. 6.2. ,

[^\\"]
.
, 
, 
.
: "[^\\"]*( \\.[^\\"]*)*".
? , 
?
, , 
. , "\"\"\"". 
,
, 
. ,
, ? 
? ?
"[^\\"]*( \\.[^\\"]*)*" .
"[^\\"]* 
; 
, . 
. ( \\.[^\\"]*)*"
()*, . ,
. , 
, "[^\\"]*", ,
.


, ( \\.[^\\"]*)* , 
"[^\\"]*( \\.[^\\"]*)*". 
[^\\"]* ( 
"[^\\"]* \\."), 
. (
, ),
,
.
,
:

"[^\\"]*( \\.[^\\"]*)*"


,
: "[^\\"]*(\\.[^\\"]*)*".
, 
, ,
. 
:
, 
.
:

* ( *)*
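
The unrolled form can be compared with the alternation form in a short Perl sketch. The sample strings and the variable names `$alt`, `$unrolled`, `@by_alt` and `@by_unrolled` are illustrative assumptions, not taken from the text. The sketch only checks that both patterns accept and reject the same inputs; it does not measure speed.

```perl
use strict;
use warnings;

# Alternation form: one iteration of (...)* per character or escape.
my $alt      = qr/"(?:\\.|[^\\"])*"/;
# Unrolled form: opening quote, normal*, then ( special normal* )*, closing quote.
my $unrolled = qr/"[^\\"]*(?:\\.[^\\"]*)*"/;

# Hypothetical inputs: plain, with escaped quotes, escapes only, unterminated.
my @tests = (
    q{say "hi there" now},
    q{"he said \"hi there\" and left"},
    q{"\"ok\"\n"},
    q{"unterminated \" string},
);

my (@by_alt, @by_unrolled);
for my $t (@tests) {
    # Record the matched text, or undef when the pattern fails.
    push @by_alt,      ($t =~ m/($alt)/      ? $1 : undef);
    push @by_unrolled, ($t =~ m/($unrolled)/ ? $1 : undef);
}
```

Both arrays should hold the same entries; the unterminated string is rejected by both patterns because a backslash can only ever be consumed by `\\.`.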


"[^\\"]*( \\.[^\\"]*)*" 
, :
1. .
, ,
.
, [^\\"], 
\\., 
\, 
.
, \\. [^"] ,
"Hello \n", . 
 , ,
, 
, .
m a k u d o n a r u d o ( 280) . 
( PO
SIX )
. , 
, .


,
, 
, 
()* .
, 

, . 
, 
; , 
.
2. .
,
( 
). 
, 
( *)*,
(*)*.
, (\\.)*
. "[^\\"]*
( (\\.)* [^\\"]*)*" "Tubby ( ) 

[^\\"]* Tubby 
. ,
.
3. .
, , 
.
, 
Pascal {} . 
\{[^}]*\}, 
: (\{[^}]*\}|+)*.
:

\{[^}]*\}

*(
*)*, : (\{[^}]*\})*( + (\{[^}]*\})*)*. 
:
{comment}{another}


+, + (
) 
+, . 
m a k u d o n a r u d o.


, 
,
,  ()* 
. 
.
, , 
+,
(, 
, ) 
(+)*
. , 
.
, ,
.
: , 
.
( ),

()*.
,
,
:

*( \{[^}]*\}*)*

, Pascal
, ,
.


(,
),
, 
.
( (*)*) , 
:

(Re:*)* Re:
(, 
Subject:Re:Re:Re:hey).

(*\$[09]+)* (, 
).

(.*\n)+ .
: ,
, ,
.


,
, 
. Re:,
\$, (
) \n.

2:
,
. 
, , 
. , 
( \\.|[^\\"]+)* 
. , 
, ,
[^\\"]+. \\.
, 
. 
, 
( )
.
, [^\\"]+
, 
, . ,
( ) 
[^\\"]+. , [^\\"]+ 
, 
, .
, 
, 1:

"[^\\"]+( \\.[^\\"]+)*"

, , , 
, , 
. , ,
.
, ,

. ,
.

3:
,
,
.


( www.yahoo.com), 
, ,
, . 
( 255), 
( 
) [a-z]+.
[a-z]+ 
, , 
 . 

, .
, [a-z]+(\.[a-z]+)*. 
[a-z]+( \.[a-z]+)*,
!
, 
.
[^\\"], (
\\., , "",
. "[^\\"]+
( \\.[^\\"]+)*" , 
1.

, , 
; 
,
. 
, 
.
, . 1
,
,
,
. , ,
,
.
( 
) :
.
, . .
, ( ,
).
, , 
, 
, .
[^\\"]+ [^\\"]* .


,
.

. , "[^\\"]*(\\.
[^\\"]*)*" , .
:

. , 
, "([^\\"]|\\.)*"
. 
.

. "[^\\"]*(\\.[^\\"]*)*" 
, 
[^\\"]. 
.
:

.
( POSIX ). 
,
,
, 
.

. 

( 337). , 
,
. 
, 
.



"(\\.|[^\\"]+)*" 
, 
. , ;
, [^\\"]+ 
( ).
[]+ ( 303),
,
()* 
.
, "(\\.|[^\\"]+)*" ,
, .


, 
( abc foo,
abc abc, , abc, abc abc). 
, , 
.
( ) 
: ( 180)
( 184).
, 
: "(\\.|[^\\"]+)*"
"([^\\"]+|\\.)*". ,
, .
, 
,
, , 
. , , 
( 
, ),

.



"([^\\"]+|\\.)*" .
,
. ? 
 , []+, 
[]+ 
. , 
.
()* 
, []+
,  , 
.
,
. 
? 
.
Java Sun,
, 
, .
,
, 
, Sun
.





"([^\\"]+|\\.)*".
: "(?>[^\\"]
+|\\.)*". ,
(?>|)* 
(|)*+
.
(|)*+
. , 
(?>|)* ,
. * 
, 
. 
,  
. , 
,
. , 
(|)*+ 
(?>(|)*).
, , (|)*+ (?>|)*,
,
(
. 220).


,
, , , 
.


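4 . 213 :
<B>                # Match the opening <B>
(                  # Now, as many of the following as possible...
  (?! </?B> )      #   if not <B>, and not </B>...
  .                #     ...any character is okay
)*                 # (now greedy)
</B>               # ...until the closing delimiter can match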
[^<], (?!</?B>)<, 
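:
<B>                # Match the opening <B>
(?> [^<]* )        # Now match any "normal"...
(?>                # Any amount of...
  (?! </?B> )      #   if not at <B> or </B>,
  <                #   match one "special"
  [^<]*            #   and then any amount of "normal"
)*                 #
</B>               # And finally the closing </B>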

, 
.


, 
( 235), ^\w+=([^\n\\]|
\\.)*. , :
^ \w+ =                    # leading field name and '='
# Now read (and capture) the value...
(
  (?> [^\n\\]* )           # "normal"*
  (?> \\. [^\n\\]* )*      # ( "special" "normal"* )*
)

,
,
.

CSV
5 CSV.
. 270:
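(?:^|,)
(?: # Either a double-quoted field (inside, paired double quotes are allowed)...
    " # (double-quoted field's opening quote)
      ( (?: [^"] | "" )* )
    " # (double-quoted field's closing quote)
  |
    # ...or some non-quote/non-comma text...
      ( [^",]* )
)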

\G
, 
, 
. , 
, , .
, CSV Microsoft, (?:[^"]
|"")* . , 
: [^"] "". ,
Perl 
:


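while ($line =~ m{
          \G(?:^|,)
          (?:
              # Either a double-quoted field (with "" for each ")...
              " # field's opening quote
                ( (?> [^"]* ) (?> "" [^"]* )* )
              " # field's closing quote
            # ...or...
            |
              # ...some non-quote/non-comma text...
              ( [^",]* )
          )
      }gx)
{
    if (defined $2) {
        $field = $2;
    } else {
        $field = $1;
        $field =~ s/""/"/g;
    }
    print "[$field]"; # print the field, for debugging
    # can work with $field now...
}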

, ,
.



. /*, 
*/ (
). ,
, 
, 
. ,
, 
.

...
, ,
90 . 
,
, 
.
Perl ,
: /\*.*?\*/ .
,
, ,


. , 
, Perl

( ,
, 50% 3,6 ).
Perl 
,  
50% 5,5 .
, Perl
/\*.*?\*/.
, 
? 
,
. ,

: 
60 !
, 
, .
, 
, , ,
, .
, ,
*/ .
/\*[^*]*\*/ ,
/** some comment here **/, 
*. .

, , /\*[^*]*\*/
,  
, *, .
\ .
, , 
//, /**/.
/\*[^*]*\*/ /x[^x]*x/. 
,
.


5 ( 246) , 
:
1. .
2. ( , 
).


3. .
, /xx/ .
, , 
. 
,
, 
, .
, 
,
(?:(?!x/).)*. , 
(, x/)*.
, 
/x(?:(?!x/).)*x/. ,
( ,
, ). 
,
, ,
. ,
, /x.*?x/
.
. 
/?
. x 
. ,
x, x , 
/. , , 
:
, x: [^x]
x, /: x[^/]
([^x]|x[^/])*,
/x([^x]|x[^/])*x/. 
, .
, / 
, , x.
, ,
:
, /: [^/]
/, x: [^x]/
([^/]|[^x]/)*,
/x([^/]|[^x]/)*x/.
, .
/x([^x]|x[^/])*x/. /fooxx/
foo x x[^/], 


. xx/, x
.
x/ ( ,
).
/x([^/]|[^x]/)*x/, /x/
foo// ( , 
). 
, / ( 
). , ,
, , /x([^/]|[^x]/)*x/

years = days /x divide x//365; /x assume non-leap year x/

( 
).


.
, x[^/] xx/
, /x([^x]|x+[^/])*x/. 
, + x+[^/] 
x, , /. 
,  , /
x. x+ 
x, ,
, .
,  :
/xx/ foo() /xx/

, , 
: .
x, , 
/, , , /,
x, : x+[^/x].
, xxx/ 
x , .
x, 
, xxx/ 
. , 
, 
x, +: x+/.
: /x([^x]|x+[^/x])*x+/.
! , ?
, x, 
: /\*([^*]|\*+[^/*])*\*+/. ,
.



. 332, 
, ,
:
x, /: x[^/]

/, x: [^x]/

. , ?
, , 
regex. x, 
,
x[^/]. ,
,
 , , regex. 
.
, 
x, / 
x(?!/). ,
x([^/]|$). 
, x, 
. 
, /, x, 
(?<!x)/. 
(^|[^x])/.
,
.


, 
. . 6.3 ,
.
, *
. , 
( ) .

.

, ,
.


6.3.

*( *)*

 [^x]*x+
x

,  [^/x]
( x)

/x

, 
:

/x[^x]*x+( [^/x] [^x]*x+)*/

.
(
. 326): /x[^x]*x+,
()*. ,
, , x 
, . 
, .
(, x), ,

x. ,
.


/x[^x]*x+([^/x][^x]*x+)* 
. , ,
/**/, /xx/.
x \* (
x *):

/\*[^*]*\*+([^/*][^*]*\*+)*/
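
As a quick check of the unrolled comment expression, the following Perl sketch applies it to a few sample inputs. The inputs and the names `$comment`, `@samples` and `@found` are assumptions made for illustration.

```perl
use strict;
use warnings;

# Unrolled C-comment regex: "/*", then non-star text, then one or more stars;
# after each run of stars either more text follows or the closing "/".
my $comment = qr{/\*[^*]*\*+(?:[^/*][^*]*\*+)*/};

# Hypothetical sample inputs.
my @samples = (
    '/* simple */',
    '/** stars ** inside **/',
    '/* a */ x = 1; /* b */',
    '/* never closed',
);

my @found;
for my $s (@samples) {
    # Keep the first comment found in each sample, or undef when none matches.
    push @found, ($s =~ m/($comment)/ ? $1 : undef);
}
```

The third sample shows that the match stops at the first closing `*/` rather than running on to the last one, and the fourth shows that an unterminated comment fails cleanly.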

, 
. 
, . 
 ( egrep) 

. , , ,
, , 
.
.
, 


. , 
:
const char *cstart = "/*", *cend = "*/";



, , 
, 
. , Perl
:
$prog =~ s{/\*[^*]*\*+(?:[^/*][^*]*\*+)*/}{}g; # C
# ( !)

$prog,
, (. . ).
,

, :
char *CommentStart = "/*"; /* start of comment */
char *CommentEnd = "*/"; /* end of comment */

, 
, , .

. 
( ),
.  
, ,
.
, 
, ... 
.


:
$COMMENT = qr{/\*[^*]*\*+(?:[^/*][^*]*\*+)*/}; # regex to match a comment
$DOUBLE = qr{"(?:\\.|[^\\"])*"};                # regex to match a double-quoted string
$text =~ s/$DOUBLE|$COMMENT//g;

. , 
 $DOUBLE|$COMMENT ,
Perl qr//.


3 ( 135), , 
, .
Perl qr//,
, 
. 
,
. 2 ( 127) ,
. 
m// s///, 
( 101); .
, $DOUBLE
. 
, $DOUBLE,
. 

. ,
:

,
; ...
, 
; ...
.

.



. ,
, . 
, :
$COMMENT = qr{/\*[^*]*\*+(?:[^/*][^*]*\*+)*/}; # regex to match a comment
$DOUBLE = qr{"(?:\\.|[^\\"])*"};                # regex to match a double-quoted string
$text =~ s/($DOUBLE)|$COMMENT/$1/g;

$1 , 
. 
, $1 .


$1. ,
,
(
,
). , 


$1 , 
, , .1
, ,
('\t' . .). 
. 
C++/Java/C# ( //), 
//[^\n]*, :
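$COMMENT  = qr{/\*[^*]*\*+(?:[^/*][^*]*\*+)*/}; # regex to match a comment
$COMMENT2 = qr{//[^\n]*};                       # regex to match a C++ // comment
$DOUBLE   = qr{"(?:\\.|[^\\"])*"};              # regex to match a double-quoted string
$SINGLE   = qr{'(?:\\.|[^'\\])*'};              # regex to match a single-quoted constant
$text =~ s/($DOUBLE|$SINGLE)|$COMMENT|$COMMENT2/$1/g;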
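
A small self-contained sketch of this substitution applied to a C fragment follows. The fragment and the variable `$stripped` are hypothetical. The replacement uses `/e` with an explicit `defined` test so that no "uninitialized" warning is raised when a comment, rather than a string, is matched.

```perl
use strict;
use warnings;

my $COMMENT  = qr{/\*[^*]*\*+(?:[^/*][^*]*\*+)*/};   # C block comment
my $COMMENT2 = qr{//[^\n]*};                          # C++ line comment
my $DOUBLE   = qr{"(?:\\.|[^\\"])*"};                 # double-quoted string
my $SINGLE   = qr{'(?:\\.|[^'\\])*'};                 # single-quoted constant

# Hypothetical C source in which comment delimiters also appear inside strings.
my $text = <<'END';
char *start = "/*"; /* opens a comment */
char *end   = "*/"; // closes it
int c = '"';        /* a quote character */
END

# Strings and character constants are captured in $1 and put back unchanged;
# comments match without setting $1 and are replaced by nothing.
(my $stripped = $text) =~
    s{($DOUBLE|$SINGLE)|$COMMENT|$COMMENT2}{defined($1) ? $1 : ''}ge;
```

Because the string alternatives come first and the engine works left to right, a `/*` inside a string is consumed as part of that string before the comment alternative gets a chance to see it.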

: 
( 
) . : 
16 , 
500 000 , Perl 
16.4 . ? , .

=
,
. 

. 

,
. 
,
. , 
, .
, ,  
, 
. 
, ,
1

$1 Perl , 
undef.
undef ,
, . Perl 
, , 
undef . 
, no warnings; 

Perl: $text =~ s/($DOUBLE)|$COMMENT/defined($1) ? $1 : ""/ge;


. , 
, 
[^'"/] . 
, ,
[^'"/]+.
, ,
. , ()*
,
( , 
). 
:
$OTHER = qr{[^"'/]}; # text that cannot begin any of the other alternatives
   .
   .
   .
$text =~ s/($DOUBLE|$SINGLE|$OTHER+)|$COMMENT|$COMMENT2/$1/g;

, , + 
$OTHER ( , $OTHER).
.
75%!

. 
, (,
/ 3.14).
.

.
, $OTHER+, 
. 
POSIX , 
,
. 
, ,
, , 
, ?
, ,
$OTHER,
.

$OTHER* , ,
$OTHER
/g.
. 
, 
,
,


. 
, 
.
, $OTHER,
,
, , 
$OTHER (, 
) . 
, , , $OTHER
, ,
. $OTHER
*, !
:

($OTHER+|$DOUBLE$OTHER*|$SINGLE$OTHER*)|$COMMENT|$COMMENT2

, regsub,
5%.

. $OTHER*
, $OTHER+ (
)
:
1. s///g,
.
2. .
:
2, $OTHER* ?
, ,

, (. . )
().
, $OTHER+ 
, ? ,
,
, .
.
, 
.
,
.

! ,
, ,


,
. SINGLE DOUBLE :
$DOUBLE = qr{"[^\\"]*(?:\\.[^\\"]*)*"};
$SINGLE = qr{'[^'\\]*(?:\\.[^'\\]*)*'};

15% . 
16,4 2,3 
!
, 
.
( $DOUBLE) , 
. ,

(, ),
.
Perl qr//,
, 
. 
, ,
. 

3 . 135.

, . 
:
([^"'/]+|"[^\\"]*(?:\\.[^\\"]*)*"[^"'/]*|'[^'\\]*
(?:\\.[^'\\]*)*'[^"'/]*)|/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//[^\n]*

: !
, ,

. GNU Emacs
,
dont, Im, well . ., .
,
\<\w+ Emacs '([tdm]|re|ll|ve).
, , \<\w+
\w. 
\w, \w+ ; 
, 
( 
). \w 10 .
, ,
. ,
.

7
Perl
Perl , .
, 
, , 
,
Windows, UNIX Mac.
Perl 
, 
. Perl Perl
. Perl 
 
,
. , , !
Perl!
.
100 000 , :
% perl -p -i -e 's{([-+]?\d+(\.\d*)?)F\b}{sprintf "%.0fC",($1-32)*5/9}eg' *.txt

*.txt
(
2).



Perl1,
, 
. , ,
, , , ,
1

Perl 5.8.8.


Perl ( 2, , 
, ).
, 
, ,
. ,
Perl
Programming Perl, OReilly.1
Perl . ,
. 
. Perl 
, ,
Perl, ,
,
.  ,
.
. ,
. 
, , 
.
,
:

Perl ( 347)
, 
Perl,
, .

rl ( 355)
Perl, 
. 
,
, 
.


. 
Perl :
qr// ( 366)
( 370)
( 383)
( 386)

Perl ( 392) 
Perl,

., ., . Perl, 3 
. . . .: , 2002.


Perl
.

Perl ( 416) ,
. Perl 
, 
, 6. , 
, Perl, 
, 
Perl; .

Perl
Perl :

2 Perl 
.

3 Perl ( 120), 
, 
Perl, , ( 140),
( 145) ( 149).

4 ,
Perl, 
Perl.

5 ,
4. Perl,
Perl.

6 Perl,
.

,
Perl,
, 
.
Perl.


Perl 
, . 
Perl 
, 
, Perl.
Perl 
,
, Perl

, . 7.1.


Perl , 

.
7.1. Perl

m// (370)
s/// (381)
qr// (366)
split() (386)

use charnames ':full'; (351)


use overload; (409)
use re 'eval'; (404)
use re 'debug'; (431)

lc lcfirst uc ucfirst (351)
pos (378)

quotemeta (351)

reset (372)

study (429)

/x /o

(354, 417)

/s /m /i

(354)

/g /c /e

(375, 380, 385)

(362)
$1, $2 . .

$^N $+

/ $1, $2...

@ @+

$` $& $'

, 
( .
Perl, 426)


$_

(372)

$^R

(365)

Perl

, , Perl. 
,
, 


, .
Programming Perl , Perl
, .... , m/
/ 
, ,
.

Perl

Perl
. , 
, 
,
, .1 , 

, , .
Programming Perl , 
, : ...
. Perl 
, 
.

Perl
. 7.2 
Perl. Perl , 
, 
Perl.
3, Perl 
, ( . 7.2
).
7.2. Perl

151 (C)

\ \b \e \f \n \r \t \ \x \x{} \c


155

: [] [^] ( 
POSIX [:alpha:] 166)

156

, : ( 
/s )

157

: \

, 
!

157

( 
): \C

158 (C)

: \w \d \s \W \D \S

159 (C)

, : \p{},
\P{}


169

/ : ^ \A

169

/ : $ \z \Z

379

: \G

174

: \b \B

175

: (?=) (?!) (?<=) (?<!)


176

: (?). 
: s m i (354)

177

(?:)

177

: (?#) # ( /x, 
# )

, ,
178

: () \1 \2 ...

178

: (?:)

180

: (?>)

181

: |

182

: (?if then|else) if 
,
()

183

: * + ? {n} {n,} {min,max}

184

: *? +? ?? {n}? {n,}? {min,max}?

393

: (?{})

393

: (??{})


351 (C)

: $ @

351 (C)

: \l \u

352 (C)

: \U \L \

352 (C)

: \Q \E

351 (C)

: \M{}

(C)


.
\b Backspace ()
; ( 174).
, .
\x
( , 
, ). \x{}
.
\w, \d \s .
\s ASCII 
( 151).
Perl 4.1.0.
.
Is,
( 164). In,
, 
.
\p{L&},

\p(Any}, \p{All}, \p{Assigned} \p{Unassigned}.


( \p{Letter}).
, ,
, , 
\p{Lowercase_Letter} \p{Lo
wercaseLetter} \p{LowercaseLetter}.

. 160.
\p{^} , \p{}.
.

.
, 
.
/x ASCII. 
/m , 
.
/i .
. 

, 
.



. 7.2 
.
m//,
. 
, / 
. Perl ,

,
. 

.
, (
. $num 20, 
m/:.{$num}:/ :.{20}:. 

.
; 
, \U\E 
. : m/abc\Uxyz\E/ 
abcXYZ. ,
abcXYZ, ,

: $tag
title, m{</\U$tag\E>} </TITLE>.
? 
.
:
$MatchField = "^Subject:"; # normal string assignment
   .
   .
   .
if ($text =~ $MatchField) {
   .
   .
   .

$MatchField =~,
. 
, 

\Q\E .
:
$text =~ $MatchField

$text =~ m/$MatchField/


. 
,
$MatchField. 
, , 
, \U\E $var
( 
354).

( 
) . 
. 418.



:

. , 
$ @, .
,
, 
(, $", 
).
Perl %,
, 
% .

.
use charnames ':full';,
\N{}. , 
\N{LATIN SMALL LETTER SHARP S} .
, Perl, Uni(
codeData.txt unicore:
use Config;
print "$Config{privlib}/unicore/UnicodeData.txt\n";

use charnames ':full';


full \N{}
. , \N{}
(. ).

. 
\l \u
.

. , $title
mr., m/\u$title/ Mr.. 
Perl lcfirst()
ucfirst().


. 
\L \U 

\. , 
$title m/\U$title\E/ 
MR.. 
Perl lc() uc().

: m/\L\u$title\E/ ,
Mr. 
.
. \Q 
(. .
\) 
\. : 
, (
( , \U , , 
\). ,
\ (
, \F \H). \Q\E
unrecognized escape.
,
\Q\E 
, 
. , $title Mr.,
m/\Q$title\E/ Mr\.,
, $title ,
.

. ,
m/\Q$UserInput\E/i
, $UserInput ( ,
).
, \Q\E, 
Perl quotemeta().
. (overloading) 

.
, 
. , . 409.


( )
Perl ,



.
(m//, s/// qr//), 
,  
. .
m!!
m{}
m,,
m<>

s| | | m[]
qr##
m()
.

, 
(. .

).
, 
m() m[] , . ,
/x :
m{
#
#
};

,
( , 
, ). :
s{}{}
s{}!!
s<>()
s[]//


.
. 384.
( .) 
( 
). ,

( 372).
. 350, 

. 
, . 
m'' , 
(, \Q\E) , 
\N{}. m''
, @, 
.


/ ?,
m. , 
:
$text =~ m//;
$text =~ //;
m.



,
Perl 
. Perl ,
.
:
1. (/i
. .). /x, 
.
2. .
3. ,
. 
; 
.
,
\N{}.
4. . . (
\Q\E).
5. .
;

Perl. 2 
, , ,

this$|that$ .


Perl 
,
(, i m//i, s///i qr//i). 
,
. 7.3.
, 3, 

( 176) ( 177).



, , 

, ( ,

, ).
7.3. ,

/i

145

/x

146

/s

146

/m

147

/o

418

, /o, 
. , . 418.

,
, .1 ,
/ 
m/<title>/i, m|<title>|i, m{<title>}i
m<<title>>i. 
/, /i.

rl
Perl,
.
.
. Perl ,
. , Perl , 
while , 
print . Perl 
,
.
. 
. Perl
1


, ,
. , learn/by/osmosis 
( , learn).
osmosis 
( /e !) ,
.


.

; , 
.
, $1 ,
, .


Perl,
. 
: , (
. , 
. , 
, 
.
. (
.
:
$s = _1;
@a = _2;

$s (. .
, ), _1, , 
. , 
@ , 
_2 .
,

. .
, localtime 
, , , , . .

Mon Jan 20 22:05:15 2003.
:  (, <MYDATA>)
,
() .
Perl 
, .
, m// 
/, .
.


,
Perl , ,
, 


. 
, Perl 
. 
,
, . 
, @ = 42 @ = (42).
,
. , :
$var = ($this, &is, 0xA, 'list');

$var , 'list'.
$var = @array $var .



Perl ( ),

,
. , 
.
.


Perl : (priva(
te). my().
,
. 
, 
() . 
,
Perl, my 
, my.
,
, 
$1, $_ @ARGV.
, 
my, . Perl
, ,
. 
$Debug Acme::Widget 
$Acme::Widget::Debug, 
.
use strict;, ( )

, , 


our ( our , ;
Perl).


,
. ,
, . ,
Perl ,
, 
. 
(
, .
,
,
. , Acme::Widget 
,
$Acme::Widget::Debug.
:
..
.
{
local($Acme::Widget::Debug) = 1; #
# Acme::Widget
..
.
}
# $Acme::Widget::Debug
..
.

local
. 
, local .
, local :
1. .
2. (undef ,
local).
3. 
, local.
, 
, ,
.
.  ,
( 
). 
,

.



, , , local. 
local , 
, . 7.4.
lo
cal($SomeVar);
$SomeVar undef. , 
, .
7.4. local
Perl

Normal Perl:

{
    local($SomeVar);           # save copy
    $SomeVar = 'My Value';
    ...
}                              # automatically restore $SomeVar

Equivalent meaning:

{
    my $TempCopy = $SomeVar;
    $SomeVar = undef;
    $SomeVar = 'My Value';
    ...
    $SomeVar = $TempCopy;
}

, 
.
Use of uninitialized warnings. , 
Perl, w, , ,
. ,
, 
w?
$^W ( ^W
, W, Control+W):
{
local $^W = 0; # .
UnrulyFunction();
}
# $^W.

local 
$^W, . 
$^W . Unru
lyFunction Perl $^W,
. 
 .
, local . 
UnrulyFunction 
$^W.
.


,
(. 7.4), local .
, ,
local my.1 my ,
. 
, 
(. . my ).

,
.
, UnrulyFunction.
$^W ,
, 
( UnrulyFunction 
Perl 
$^W, , ).


local :
, 
. , , 
, .
. ,
, local.
,
. local Perl 
, 
, (. . 
).
, . local
,
, :
, !
, .




? . 

. ,
, , , $&
1

Perl my , 
.


( ) $1 (,
). 
.
, , ,
, ,
. 
, , 
(. . ),
, .
:
if (m/()/)
{
DoSomeOtherStuff();
print "the matched text was $1.\n";
}

$1
, 
, DoSomeOtherStuff $1 .
, $1 , 
, , , ,
. , 
, print .

:
if ($result =~ m/ERROR=(.*)/) {
    warn "Hey, tell $Config{perladmin} about $1!\n";
}

( Config 
%Config, $Config{perladmin}
Perl.) 
$1 , .
, %Config ;
, 
. , 
$Config{}, 
.
$1,
$1, 
. , ,
$Config{}, .



, local
. 


local, 
.
, my() 
. 

,
( , my 
local). : local , my ,
, .

,

, , 
. 
, 
, ,
.
(. . , ) 
( ,
). . 7.5.
, 
:
$&
, .
( $` $',
) 
( . 426). 
$&
, .
$`
,
(. . ). 
/g , $`
, . 
$` 
.
$'
, (. . 
). 
"$`$&$'"
.1 $`
.
1

,
, (
, ), "$`$&$'"
, . 
.


7.5. ,


1 2

2 3

4 31

"Pi is 3.14159, roughly" =~ m/\b((tasty|fattening)|(\d+(\.\d*)?))\b/;


:

$`     "Pi is "
$&     3.14159
$'     ", roughly"
$1     3.14159
$2     undef
$3     3.14159
$4     .14159
$+     .14159
$^N    3.14159
@-     (6, 6, undef, 6, 7)
@+     (13, 13, undef, 13, 13)


$1,$2,$3,...
, , , . . 
( : $0
,
). 
, 
,
.
,
s///. , 

( 393). 
(

\1 ). (
$1 ? . 366.)



(\w+) (w)+. 
, , 
, . ,
tubby. 
$1 tubby, y:
+ ,
.
, (x)? (x?).

, $1
x, . 
(x?) 
. 
,  ,
 x? . 
, (x?) $1 x
. .

Sample match                  Resulting $1
"::" =~ m/:(A?):/             empty string
"::" =~ m/:(\w*):/            empty string
"::" =~ m/:(A)?:/             undef
"::" =~ m/:(\w)*:/            undef
":A:" =~ m/:(A?):/            A
":Word:" =~ m/:(\w*):/        Word
":A:" =~ m/:(A)?:/            A
":Word:" =~ m/:(\w)*:/        d

, , 
, . 

(
),
, $1.
$+

$1, $2, ( ),
. :
$url =~ m{
    href \s* = \s*        # match the "href = " part, then the value...
    (?: "([^"]*)"         #   a double-quoted value, or...
      | '([^']*)'         #   a single-quoted value, or...
      | ([^'"<>]+) )      #   an unquoted value.
}ix;

$+ 
$1, $2 $3 , undef.
(
), 
.


$^N


$1, $2...,
, (. . 
$1, $2...,
). 
( ),
.
. 413.

@- @+
(
) . 
. 
; ,
@ ( $[0])
, . ,

$text = "Version 6 coming soon?";
.
..

$text =~ m/\d+/;

$-[0] 8,
( Perl
).
@+ ( $+[0])
. 
9,
. , substr($text, $-[0],
$+[0] - $-[0]) $&, $text 
, ,
$& ( 426). @-:
1 while $line =~ s/\t/' ' x (8 - $-[0] % 8)/e;


.1

. , $-[1]
$+[1] $1, $-[2]
$+[2] $2 . .
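
The offsets stored in `@-` and `@+` can be tried out with a short sketch. The tab-expansion input and the variable names `$start`, `$end` and `$whole` are assumptions for illustration.

```perl
use strict;
use warnings;

my $text = "Version 6 coming soon?";
my ($start, $end, $whole);
if ($text =~ m/\d+/) {
    ($start, $end) = ($-[0], $+[0]);                 # offsets of the overall match
    $whole = substr($text, $start, $end - $start);   # same text as $&
}

# Expand each tab to the next 8-column tab stop; $-[0] is the column of the tab.
my $line = "a\tbc\td";
1 while $line =~ s/\t/' ' x (8 - $-[0] % 8)/e;
```

Here `$start` is 8 and `$end` is 9, since offsets count from zero and `$+[0]` points just past the match. The first tab sits at column 1 and becomes seven spaces; the second sits at column 10 and becomes six.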
$^R. 

. 
1

:
. 
,
.
, (142).



: if (?if then|else)
( 182) $^R. 
, 
, , 
 , 
. , 
,
.

/g 
. , , $1 
s///g
.

$1 ?
Perl , \1 

( $1). $1
,
. , \1 
,
,
,
\1. , \1, 
, 
.
: $1 
? (

( 393),
. $1 
, :

. , $1
, 
.

qr//

2 ( 106) 6 ( 337)
qr//, 
. 
, 


split, 
.


, 
(
).
. 352,
, qr{} qr!!. , 
/i, /, /s, /m /.



, 
2 ( 106):
my $HostnameRegex = qr/[-a-z0-9]+(?:\.[-a-z0-9]+)*\.(?:com|edu|info)/i;
my $HttpUrl = qr{
    http:// $HostnameRegex \b              # hostname
    (?:
        / [-a-z0-9_:\@&?=+,.!/~*'%\$]*     # optional path
        (?<![.,?!])                        # not allowed to end with [.,?!]
    )?
}ix;



$HostnameRegex. 
HTTP URL, 
$HttpUrl. 
, :
if ($text =~ $HttpUrl) {
print "There is a URL\n";
}

HTTP URL :
while ($text =~ m/($HttpUrl)/g) {
print "Found URL: $1\n";
}

, , $Host
nameRegex 5 ( 256):
my $HostnameRegex = qr{
    # One or more dot-separated parts...
    (?: [a-z0-9]\. | [a-z0-9][-a-z0-9]{0,61}[a-z0-9]\. )*
    # ...followed by the final suffix part
    (?: com|edu|gov|int|mil|net|org|biz|info|…|aero|[a-z][a-z] )
}xi;


( ^ $, 
), , 
. 
$HtppUrl .


qr// , 
. 355. 

, 
m// . ,
:
my $WordRegex = qr/\b \w+ \b/; # oops, missing the required /x
   .
   .
   .
if ($text =~ m/^($WordRegex)/x) {
    print "found word at start of text: $1\n";
}

, /x 
$WordRegex, ,
( ) qr//
$WordRegex. , 
.
:
my $WordRegex = qr/\b \w+ \b/x; # this works!
.
.
.
if ($text =~ m/^($WordRegex)/) {
print "found word at start of text: $1\n";
}

:
my $WordRegex = '\b \w+ \b'; # normal string assignment
   .
   .
   .
if ($text =~ m/^($WordRegex)/x) {
    print "found word at start of text: $1\n";
}

, $WordRegex
. ,
$WordRegex , 
m//. 
,


(, , 
$WordRegex /x).
,

:
my $WordRegex = '(?x:\b \w+ \b)' ; #
.
.
.
if ($text =~ m/^($WordRegex)/) {
print "found word at start of text: $1\n";
}

m//
^((?x:\b\w+\b)), 
, .
, 
, 
(/i, /x, /m /s) ,
qr/\b\w+\b/x (?x-ism:\b\w+\b). 
: (?x-ism:…) /x , 
/i, /s /m . ,
qr// ( ,
).



, 

( (?xism:) ).
, 
, Perl . :
% perl -e 'print qr/\b \w+ \b/x, "\n"'
(?x-ism:\b \w+ \b)
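
How modifiers travel with a regex object, but not with a plain string, can be seen in a short sketch. The sample text and the variables `$plain`, `$obj_ok`, `$str_fail`, `$str_ok` and `$as_text` are illustrative assumptions. The exact text of a stringified regex object depends on the Perl version (newer releases print a form beginning with `(?^x:`), so the sketch only checks that the pattern body is preserved.

```perl
use strict;
use warnings;

my $WordRegex = qr/\b \w+ \b/x;   # /x is locked into the regex object
my $plain     = '\b \w+ \b';      # plain string: no modifiers travel with it

# The object keeps its own /x even though the enclosing match has none.
my $obj_ok   = ("word here" =~ m/^($WordRegex)/) ? $1 : undef;

# The string is interpolated as raw text, so its spaces are literal here...
my $str_fail = ("word here" =~ m/^($plain)/)     ? $1 : undef;

# ...unless the enclosing match supplies /x itself.
my $str_ok   = ("word here" =~ m/^($plain)/x)    ? $1 : undef;

# Stringifying the object shows the pattern wrapped in a mode-modified span.
my $as_text = "$WordRegex";
```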

$HttpUrl . 367:
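(?ix-sm:
  http:// (?ix-sm:
    # One or more dot-separated parts...
    (?: [a-z0-9]\. | [a-z0-9][-a-z0-9]{0,61}[a-z0-9]\. )*
    # ...followed by the final suffix part
    (?: com|edu|gov|int|mil|net|org|biz|info|…|aero|[a-z][a-z] )
  ) \b                                 # hostname
  (?:
    / [-a-z0-9_:\@&?=+,.!/~*'%\$]*     # optional path
    (?<![.,?!])                        # not allowed to end with [.,?!]
  )?
)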



.





. 
6.
, 
/, qr// ( 418).



$text =~ m//


Perl. Perl , 
(
) .

, ( 356), .

, 

. ,
. :


,

,
,
:
=~

,
.
.



( ,


, ).
,
.


 
m// //.
m ,
/ ! (
, ). 
m, 
. , m 
( 352).
,
, . 355.
/g /c,
.


, 
qr//. :
my $regex = qr//;
..
.
if ($text =~ $regex) {
..
.


m//. : 

, ,
. if 
:
if ($text =~ m/$regex/) {
..
.

, 
/g 
( , 
m//, , 
, 
397).


, m// ( m/$Some
Var/, $SomeVar 


), Perl

.
, 
( 366).

??
?? 
. : 
m?? m??
, reset.
Perl 1 , 
, 
,
, Perl.
?? ( //) m 
: ?? m??.


, ,
=~, $text =~ m//. =~
; 
, 
, awk.
=~ m// 
, ,
. , :
$text =~ m// ; #  , .
if ($text =~ m // ) { #
#
..
.
$result = ( $text =~ m // ); # $result $text
$result = $text =~ m// ;
# ; =~
# , =
$copy = $text;
# $text $result...
$copy =~ m // ;
# ... $result
( $copy = $text ) =~ m// ; #


$_, $_
=~ . , $_ 
.


,
$text =~ m//;

$text; 
, . 
~,
$text = m//;

: $_
; $text. 
, :
$text = m/ /;
$text = ($_ =~ m//);


, , ,
( ). 
, :
while (<>)
{
if (m//){
.
.
.
} elsif (m//){
.
.
.



.


=~ !~, 
. 
, ,
!~ 
(true false). :
if ($text !~ m//)
if (not $text =~ m//)
unless ($text =~ m//)

. 

$1. , !~ , 
....





, 
. ( 356) 
/g.


/g
(,
if) :
if ($target =~ m//) {
#
.
.
.
} else {
#
.
.
.
}


:
my $success = $target =~ m//;
.
.
.
if ($success) {
.
.
.
}


/g
/g 
. 
, 
. 
69/8/31:
my ($year, $month, $day) = $date =~ m{^ (\d+) / (\d+) / (\d+) $}x;


( $1, $2 $3.).

; .
, . ,
m/(this)|(that)/ 


. 
undef. 
, 
/g (1).

, :
my @parts = $text =~ m/^(\d+)-(\d+)-(\d+)$/;


, (

). :
my ($word)  = $text =~ m/(\w+)/;
my $success = $text =~ m/(\w+)/;

,
(
).
; , 
$success .
:
if ( my ($year, $month, $day) = $date =~ m{^ (\d+) / (\d+) / (\d+) $}x ) {
# ;
# $year .
} else {
# ...
}
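
The difference between list and scalar context for a match without /g can be tried with a short sketch. The sample strings and the variable `$count` are illustrative assumptions.

```perl
use strict;
use warnings;

my $date = "69/8/31";

# List context: the captured substrings are returned directly.
my ($year, $month, $day) = $date =~ m{^ (\d+) / (\d+) / (\d+) $}x;

# A failed match returns an empty list; counting the list gives 0.
my $count = () = "no date here" =~ m{^ (\d+) / (\d+) / (\d+) $}x;

# Parentheses on the left give list context (captured text);
# without them the match is in scalar context (true or false).
my ($word)  = "hello world" =~ m/(\w+)/;
my $success = "hello world" =~ m/(\w+)/;
```

The same `m/(\w+)/` yields `hello` in `$word` but only a true value in `$success`, which is why the parentheses around `$word` matter.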

(
my() =), 

$1, $2 . . 
( if), Perl 

. 0 0 (. .
) .


/g
, 
( 
, ),
, /g,
.
:
my @nums = $text =~ m/\d+/g;


$text IP 64.156.215.240, @nums


: 64, 156, 215 240.

IP (409cd7f0),
:
my $hex_ip = join '', map { sprintf("%02x", $_) } $ip =~ m/\d+/g;

:
my $ip = join '.', map { hex($_) } $hex_ip =~ m/../g;

:

my @nums = $text =~ m/\d+(?:\.\d+)?|\.\d+/g;

,

. , ,
:
my @Tags = $Html =~ m/<(\w+)/g;

@Tags HTML,
$Html (,
<).

. , Unix
, :
alias Jeff     jfriedl@regex.info
alias Perlbug  perl5-porters@perl.org
alias Prez     president@whitehouse.gov


m/^alias\s+(\S+)\s+(.+)/m ( 
/g).
('Jeff', 'jfriedl@regex.info'). 
, /g. :
( 'Jeff', 'jfriedl@regex.info', 'Perlbug',
'perl5porters@perl.org', 'Prez', 'president@whitehouse.gov' )

/,
,
().
my %alias = $text =~ m/^alias\s+(\S+)\s+(.+)/mg;

Jeff $alias{Jeff}.
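
A runnable version of the alias example, as a minimal sketch: the here-document stands in for the contents of an alias file, and its exact layout is an assumption.

```perl
use strict;
use warnings;

# Hypothetical alias file contents (single-quoted here-doc, so @ is literal).
my $text = <<'END';
alias Jeff     jfriedl@regex.info
alias Perlbug  perl5-porters@perl.org
alias Prez     president@whitehouse.gov
END

# In list context with /g, every (name, address) pair is returned in order,
# which is exactly the flat key/value list a hash assignment expects.
my %alias = $text =~ m/^alias\s+(\S+)\s+(.+)/mg;
```

The `/m` modifier lets `^` match at the start of each line, and since `.` does not match a newline, each address stops at the end of its own line.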



/g
m//g 
, .
m//, ,
m//g 
. m//g 
. 
, .
:
$text = "WOW! This is a SILLY test.";
$text =~ m/\b([a-z]+\b)/g;
print "The first all-lowercase word: $1\n";
$text =~ m/\b([A-Z]+\b)/g;
print "The subsequent all-uppercase word: $1\n";


/g, :
The first all-lowercase word: is
The subsequent all-uppercase word: SILLY

/g
:
, , 
, ,
. /g
, ,
/g,
WOW.
while. 
:
while ($ConfigData =~ m/^(\w+)=(.*)/mg) {
my($key, $value) = ($1, $2);
.
.
.
}

, (
, ) .
,
while .
/g, 
/g .
:
while ($text =~ m/(\d+)/) { # dangerous!
    print "found: $1\n";
}

while ($text =~ m/(\d+)/g) {
    print "found: $1\n";
}

/g,
. , $text IP
, ,
:
found: 64
found: 156
found: 215
found: 240

,
found: 64. /g 
(\d+) $text, 64, 
. /g

(\d+) $text 
.

pos()
Perl ,
.
 
. 
, 
, 
. /g 
.
Perl 
pos(). :
my $ip = "64.156.215.240";
while ($ip =~ m/(\d+)/g) {
printf "found '$1' ending at location %d\n", pos($ip);
}

:
found '64' ending at location 2
found '156' ending at location 6
found '215' ending at location 10
found '240' ending at location 14

: , 
2 .


/g $+[0] ( @+ 365)
, pos .
pos() , 
$_.


pos() , 
. 
, (,
/g). , 
, Yahoo!, 
; 32 , 
.
, 
^.{32} :
if ($logline =~ m/^.{32}(\S+)/) {
$RequestedPage = $1;
}

. ,
32 
. , ,
:
pos($logline) = 32; # 32
# ...
if ($logline =~ m/(\S+)/g) {
$RequestedPage = $1;
}

, . (
,
. 
33 \S, 
, , 
, 
. , \S+, 
. ,
, .

\G
\G ,
.
, :
pos($logline) = 32; # the page starts at the 32nd character,
                    # so start the next match there...
if ($logline =~ m/\G(\S+)/g) {
    $RequestedPage = $1;
}
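
The effect of `\G` after setting `pos` can be seen in a short sketch. The record layout (eight fixed characters, then blanks, then a field) and the variables `$bumped` and `$anchored` are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical record: 8 fixed characters, two blanks, then a field.
my $logline = "ABCDEFGH  /index.html";
my ($bumped, $anchored);

pos($logline) = 8;                        # the next /g match starts at offset 8
if ($logline =~ m/(\S+)/g)   { $bumped = $1 }

pos($logline) = 8;                        # reset: the match above moved pos
if ($logline =~ m/\G(\S+)/g) { $anchored = $1 }
```

Without `\G`, the engine bumps along past the blanks and finds `/index.html`. With `\G`, the attempt must succeed exactly at offset 8, where a blank sits, so it fails, and a failed `m//g` resets `pos` to undef.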

\G :

, .
\G 
3 ( 171), 5
( 265).
Perl \G , 
,
. , 6
CSV ( 329)
\G(?:^|,). 
^ \G
, (?:|\G,).
, Perl , 
.1

/gc
m//g pos 
. /g 
/c,
. /c /g, 
/gc.
m//gc \G 
, . 
HTML $html:
while (not $html =~ m/\G\z/gc) # while we haven't worked to the end...
{
    if    ($html =~ m/\G( <[^>]+>   )/xgc) { print "TAG: $1\n"            }
    elsif ($html =~ m/\G( &\w+;     )/xgc) { print "NAMED ENTITY: $1\n"   }
    elsif ($html =~ m/\G( &\#\d+;   )/xgc) { print "NUMERIC ENTITY: $1\n" }
    elsif ($html =~ m/\G( [^<>&\n]+ )/xgc) { print "TEXT: $1\n"           }
    elsif ($html =~ m/\G  \n         /xgc) { print "NEWLINE\n"            }
    elsif ($html =~ m/\G( .         )/xgc) { print "ILLEGAL CHAR: $1\n"   }
    else {
        die "$0: oops, this shouldn't happen!";
    }
}

,
HTML ( ). 
1

\G, , 
\G
\G (. 302).


, (
/gc), (

\G). ,
. 
pos $html
, .
, m/\G\z/gc,
. . \G (\z).
,
. 
( ), 
,
pos $html.
else; 
, ( 
) , 
else . 
(, <>),
.

. , \G(.) .
, , 
<script>:
$html =~ m/\G ( <script[^>]*>.*?</script> )/xgcsi

(, !) 
, <[^>]+>, 
, <[^>]+> 
<script> .
/gc 
3 ( 172).

pos:

pos .

pos

pos

m//


( pos )

undef

undef

m//g

pos undef

m//gc

pos

pos 
undef ( , ).



,
Perl.



, , . , 
(. .
)
. 
, .
:

, 
, $1 @+ ( 362).


( 371).
m??, (
m??)
( , reset 372).

,
.

:

pos ( 378).

/o 
, 
( 421).

,

. ,
:

, (,
),
, 
.
pos()
pos ,
, , 

/g. , \G.




, ( 371).
study
study ,
( ) . 
study ( 429).
m?? reset
reset /
m?? ( 372).

!
,
.
while, if foreach
. , ?
while ("Larry Curly Moe" =~ m/\w+/g) {
print "WHILE stooge is $&.\n";
}
print "\n";
if ("Larry Curly Moe" =~ m/\w+/g) {
print "IF stooge is $&.\n";
}
print "\n";
foreach ("Larry Curly Moe" =~ m/\w+/g) {
print "FOREACH stooge is $&.\n";
}

. .


Perl, s///,
. :
$text =~ s///

, , 
. /g 
, 
, .
, 
=~ ,
$_. m
, s .


,
,
pos . 
:
( ), 
, , .
, 
. 355, 
: /g /.


s/// 
,

m//. 
(, <>),  
(
). , s{}{}, s[]//
s<>'' . 
, 
. 
/x /e:
$test =~ s{
... ...
} {
... Perl, ...
};


. 
,
( 352), . 
(
/g ), $1
. . .
,  
:
 ,
(. . ).
/e, 
,   Perl,
. 
, 
.


. 383.
:
WHILE stooge is Larry.
WHILE stooge is Curly.
WHILE stooge is Moe.
IF stooge is Larry.
FOREACH stooge is Moe.
FOREACH stooge is Moe.
FOREACH stooge is Moe.

: print foreach
$& $_,
while. ,
m//g, ('Larry', 'Curly', ''), 
.
$&, 
, m//g
.

/e
/e , 
Perl ( eval{}). 
Perl 
, .
 
,
. :
$text =~ s/time/localtime/ge;

time 
Perl localtime (. . 
Mon Sep 25 18:36:51 2006).
,

$1 . . , URL 
%, 
.
, :
$url =~ s/([^a-zA-Z0-9])/sprintf('%%%02x', ord($1))/ge;

:
$url =~ s/%([0-9a-f][0-9a-f])/pack("C", hex($1))/ige;
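
The two substitutions can be checked as a round trip. The sample string and the variables `$encoded` and `$decoded` are assumptions for illustration.

```perl
use strict;
use warnings;

my $url = "a b&c=d/e";   # hypothetical text with characters to encode

# Encode: every non-alphanumeric character becomes %xx (two hex digits).
(my $encoded = $url) =~ s/([^a-zA-Z0-9])/sprintf('%%%02x', ord($1))/ge;

# Decode: every %xx sequence becomes the character with that code.
(my $decoded = $encoded) =~ s/%([0-9a-f][0-9a-f])/pack("C", hex($1))/ige;
```

With `/e`, the replacement is evaluated as Perl code for each match, so `$1` is passed to `ord` or `hex` as a normal function argument.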


sprintf('%%%02x', d()) 
, pack("C", ) 
; Perl.

/e

( ), /e
. , 
. , 
Perl,
.
, 
. ,
(,
). , $var , 
$var $var.
:
$data =~ s/(\$[a-zA-Z_]\w*)/$1/eeg;

/e 
$var, . 
/e $1 ,
$var,
( ). 
/e ,
$var . 
.


, 
/g. 
^ 
, (
) .
(, if) 
,
, .


split ( (
) 
m//g ( 375). ,
, split 
, .


, $text =~ m/:/g $text, 


IO.SYS:225558:951003:ash:optional,
:
(':', ':', ':', ':' )

 . ,
split(/:/, $text) :
('IO.SYS', '225558', '951003', 'ash', 'optional')

: . split

, .
, 

. ,
@Paragraphs = split(m/\s*<p>\s*/i, $html);

HTML $html ,
<p> <P>, 
.
, ; :
@Lines = split(m/^/m, $lines);

.
, 
, split , .
, ,
 . 
, .
// ,
. , split(//, "short test") 
: ("s", "h", "", , "s", "t").
"" (, )
, m/\s+/
. 
, split("", "ashorttest")
a, short test.
,
split.


split :
split (, _, )

. 
( ).


split . 
:
($var1, $var2, $var3, ) = split();
@array = split();
for my $item (split() ) {
..
.
}

()
,
, 
. , //,
m{} ,
, .
, . 355.
,
(?:). , 
split .

( )
split 
. , 
$_.

()

, split . , 
split(/:/, $text, 3) :
('IO.SYS', '225558', '951003:ash:optional')

, split
/:/, 
. 
, 
.
,
,
; 
,
. , split(/:/, $text, 99) 
. , split(/:/,
$text) split(/:/, $text, 99) ,
.


, (
, .
,
:
('IO.SYS', '225558', '951003', 'ash:optional')

.
. , 
:
($filename, $size, $date) = split(/:/, $text);

Perl 
. 
, 
.


, , split
, ,
:




.


split, ,
, 
( , . . ""). 
:
@nums = split(m/:/, "12:34::78");

:
("12", "34", "", "78")

: ,
. , 
.


. ,

@nums = split(m/:/, "12:34::78:::");

@nums :
("12", "34", "", "78")


, , 
.
split
. , Perl 
,
.




(
split ,
).
, , 
,
1,
: split(/:/, $text, -1)
, .
, 
, grep{length} split. grep
, (. . ):
my @NonEmpty = grep { length } split(/:/, $text);
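
How `split` treats empty fields under different limits can be compared side by side in a short sketch. The input string and the array names are assumptions for illustration.

```perl
use strict;
use warnings;

my $text = "12:34::78:::";

my @default  = split(/:/, $text);       # trailing empty fields are dropped
my @all      = split(/:/, $text, -1);   # a negative limit keeps them
my @limited  = split(/:/, $text, 3);    # at most three chunks; the rest stays whole
my @nonempty = grep { length } split(/:/, $text);   # drop every empty field
```

The empty field in the middle survives in `@default`, because only trailing empty fields are removed. In `@limited`, the third element is the unsplit remainder `:78:::`.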



:
@nums = split(m/:/, ":12:34::78");

@nums :
("", "12", "34", "", "78")

,
. :
(. . 
), / 
. : split(/\b/,
"a simple test")
simple test . ,
, : ("", "", "simple", "",
"test"). ,
@Lines= split(m/^/m, $lines) . 387.

split
split
, ,
:


, split, 

, .
split, ,
split(//, "short test") : ("s",
"h", "", , "s", "t").

, ( 
!), , .
/\s+/, 
.
awk,
, ,
.

, m/
\s+/ . , 
1 .

,
, ( 
). , split 
split(' ', $_, 0).

^ 
/m ( 
).  $ . 
m/^/m , 
. split m/^/m 
.


split , 
, . 
split 
.
split $&, $', $1 . .
, split
.1

, , 
, ,
. (void) split
@_ (
,
split ). use warnings
w
split .


split
split
split. 
, , 
, . 
, , split 
, .
, HTML split(/(<[^>]*>)/)

and<B>very<FONT color=red>very</FONT>much</B>effort

( 'and', '<B>', 'very', '<FONT color=red>',
'very', '</FONT>', 'much', '</B>', 'effort' )

, split(/<[^>]*>/)
:
( 'and', 'very', 'very', 'much', 'effort' )
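
Both forms of the call can be run on the same input in a short sketch. The sample HTML is written without the surrounding whitespace, and the array names are assumptions for illustration.

```perl
use strict;
use warnings;

my $html = 'and<B>very<FONT color=red>very</FONT>much</B>effort';

# Without capturing parentheses, only the text between the separators is returned.
my @text_only = split(/<[^>]*>/, $html);

# With capturing parentheses, the text matched by each separator is
# returned as well, interleaved with the fields.
my @with_tags = split(/(<[^>]*>)/, $html);
```

`@text_only` has five elements and `@with_tags` has nine: the same five fields with the four tags between them.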


( , 
, ).

.
,
undef.

Perl
, 
,  Perl.
, (
) , ,
, \A, \Z \z), 
, \G .
Perl .
Perl ,
, Perl. 

. 
Perl,
.
, 
Perl.


(??{ perl })

, Perl. ( 
, 
) .
^(\d+)(??{ "X{$1}" })$ 
. 
,
X .
, 3 12XXXXXXXXXXXX, 3X
7XXXX.
3XXX, ,
(\d+) 3XXX,
$1 3. 
,
"X{$1}" X{3}. 
X{$1}, 
3XXX. 
$ 3XXX 
.
, 

.
(?{ perl })
, ,
Perl ,
,  
. 
( 
$^R 365).
, , ,
: 
if (? if then | else) ( 182).
, 
then, else.
,
. , 

; :
"have a nice day" =~ m{
(?{ print "Starting match.\n" })
\b(?: the | an | a )\b
}x;


,
.
, 

.




. 
Perl. , Perl 
(
\b), 
\< \> 
, Perl.

, .
( 
) .
Perl (
) 
, ,
my ( 405).




( 
, 
).
.
, 
, ,
.
,
, : \(([^()])*\).
, 
( ,
).
:
my $Level0 = qr/ \( ( [^()] )* \) /x; # parenthesized text
   .
   .
   .
if ($text =~ m/\b( \w+$Level0 )/x) {
    print "found function call: $1\n";
}

substr($str, 0, 3),
substr($str, 0, (3+2))  
. ,
, . .
.
, 
, .
, 
( [^()]), 
, . ,
: $Level0. 
, :
my $Level0 = qr/ \( ( [^()]           )* \) /x; # parenthesized text
my $Level1 = qr/ \( ( [^()] | $Level0 )* \) /x; # one level of nesting

$Level0 , 
$Level1, 
$Level0. 
.
,
$Level2,
$Level1 (, , $Level0):
my $Level0 = qr/ \( ( [^()]           )* \) /x; # parenthesized text
my $Level1 = qr/ \( ( [^()] | $Level0 )* \) /x; # one level of nesting
my $Level2 = qr/ \( ( [^()] | $Level1 )* \) /x; # two levels of nesting

:
my $Level3 = qr/ \( ( [^()] | $Level2 )* \) /x; # three levels of nesting
my $Level4 = qr/ \( ( [^()] | $Level3 )* \) /x; # four levels of nesting
my $Level5 = qr/ \( ( [^()] | $Level4 )* \) /x; # five levels of nesting

. 7.1 
.
, 
. $Level3:
\(([^()]|\(([^()]|\(([^()]|\(([^()])*\))*\))*\))*\)

.
, (
). 
Level ,
: ,
$Level. 
( :


\(

( [ ^ ( ) ]|

)* \)

\(

( [ ^ ( ) ]|

)* \)

\(

( [ ^ ( ) ]|

)* \)

\(

( [^()]

)* \)

. 7.1.

X ,
+1 ).

.
, $Level 
:
.
,
. 
, 
,
.

. 
$Level, 
(
Perl , 
; , , 

, ). $Level
$LevelN,
(??{$LevelN}):
my $LevelN; # ,
# .
$LevelN = qr/ \( ( [^()] | (??{ $LevelN }) )* \) /x;


, $Level0 
:


if ($text =~ m/\b( \w+$LevelN )/x) {


print "found function call: $1\n";
}

, ? , 
, , 
.

. 
( , 
), [^()] [^()]+ ( 
, 
279).
, \( \) , 
. 

, . 
:
$LevelN = qr/ (?> [^()]+ | \( (??{ $LevelN }) \) )* /x;

\(\) ,
$LevelN.
: 
. , (
, ,
:
if ($text =~ m/\b( \w+ \( $LevelN \) )/x) {
print "found function call: $1\n";
}
if (not $text =~ m/^ $LevelN $/x) {
print "mismatched parentheses !\n";
}
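
A self-contained sketch of the recursive pattern in use follows. The helper `balanced` and the sample inputs are hypothetical, and the sketch assumes a Perl recent enough that a lexical variable referenced from within `(??{ })` behaves as expected.

```perl
use strict;
use warnings;

my $LevelN;   # declared first so that the pattern can refer to itself
$LevelN = qr/ (?> [^()]+ | \( (??{ $LevelN }) \) )* /x;

# Hypothetical helper: true when the parentheses in $s are properly nested.
sub balanced {
    my ($s) = @_;
    return $s =~ m/^ $LevelN $/x ? 1 : 0;
}

# Pick out a function call whose argument list contains nested parentheses.
my ($call) = 'x = substr($str, 0, (3+2)) + 1;' =~ m/\b( \w+ \( $LevelN \) )/x;
```

Because `$LevelN` is a regex object, interpolating it into another pattern does not require `use re 'eval'`, even though it contains embedded code.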

$LevelN . 411.



. 
,
POSIX. ,
(, ,
POSIX), 
.

.


"abcdefgh" =~ m{
(?{ print "starting match at [$`|$']\n" })
(?:d|e|f)
}x;

:
starting match at [|abcdefgh]
starting match at [a|bcdefgh]
starting match at [ab|cdefgh]
starting match at [abc|defgh]



print "starting match at [$`|$']\n"

, 
. $` $' ( 362)1,
, 
|, , 
. , 
( 191).

(?{ print "matched at [$`<$&>$']\n" })

:
matched at [abc<d>efgh]

,
,

(?:d|e|f) [def]:
"abcdefgh" =~ m{
(?{ print "starting match at [$`|$']\n" })
[def]
}x;

, 
:
starting match at [abc|defgh]

? Perl ,
[def] 
1

$`, $& $',


(426),
.


( 303) ,
. 
, , . 
.

panic: top_env
, 
,
panic: top_env

, .
Perl 
. , 
.



Perl , 
, 
. 
Perl 
. , , 
oneself . 225:
"oneselfsufficient" =~ m{
one(self)?(selfsufficient)?
(?{ print "matched at [$`<$&>$']\n" })
}x;

,
matched at [<oneself>sufficient]

,
oneselfsufficient.
: print 
, .
, 
. ,
, 
.
(?!) ? 
(?!) . 
( 
matched),
. 


,
, :
matched at [<oneself>sufficient]
matched at [<oneselfsufficient>]
matched at [<one>selfsufficient]

,
. (?!) Perl 
, 
.
, ?
"123" =~ m{
\d+
(?{ print "matched at [$`<$&>$']\n" })
(?!)
}x;

:
matched at [<123>]
matched at [<12>3]
matched at [<1>23]
matched at [1<23>]
matched at [1<2>3]
matched at [12<3>]

,
. , 
. (?!) 
.
,
, 
( 
4). 
, , .
, (?!) ,
. 
, ; .



. 
, . 

.
oneself:
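$longest_match = undef; # we'll keep track of the longest match here
"oneselfsufficient" =~ m{
    one(self)?(selfsufficient)?
    (?{
        # Check whether the current match ($&) is the longest so far
        if (not defined($longest_match)
            or
            length($&) > length($longest_match))
        {
            $longest_match = $&;
        }
    })
    (?!) # force failure so that we backtrack to find further "matches"
}x;
# Now report the accumulated result, if any
if (defined($longest_match)) {
    print "longest match=[$longest_match]\n";
} else {
    print "no match\n";
}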

, longest match=
[oneselfsufficient]. ,
, 
(?!) :
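my $RecordPossibleMatch = qr{
    (?{
        # Check whether the current match ($&) is the longest so far
        if (not defined($longest_match)
            or
            length($&) > length($longest_match))
        {
            $longest_match = $&;
        }
    })
    (?!) # force failure so that we backtrack to find further "matches"
}x;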

9938,
:
$longest_match = undef; # we'll keep track of the longest match here
"800-998-9938" =~ m{ \d+ $RecordPossibleMatch }x;
# Now report the accumulated result, if any
if (defined($longest_match)) {
    print "longest match=[$longest_match]\n";
} else {
    print "no match\n";
}

,

,

, .
POSIX ( 225). 

. 
, 
( ),
, 
.
Perl 
, ,

, $longest_match 
. (?{ defined
$longest_match}), ,
.
.


, if

(? if then | else) ( 182). 


, then
(?!) ( else
, ). 
:
my $BailIfAnyMatch = qr/(?(?{ defined $longest_match})(?!))/;

if , then .

$RecordPossibleMatch:
"8009989938" =~ m{ $BailIfAnyMatch \d+ $RecordPossibleMatch }x;

800 ,
( POSIX).

local
local 
. ,


( 357) 
4, 
, 
( 202). (
) . 
, \w+ \s+,
, \w+ \d+\b:
my $Count = 0;
$text =~ m{
^ (?> \d+ (?{ $Count++ }) \b | \w+ | \s+ )* $
}x;

123abc739271xyz, 
$Count 3. 123abc73xyz
2, 1.
, $Count 
73 ( \d+), 
 , 
\b. , 
, 

.

(?>) ( 180) , ,

( 328)
. ,
\b \d+
.
\b
$Count, , 
. 
local,
Perl . :
our $Count = 0;
$text =~ m{
^ (?> \d+ (?{ local ($Count) = $Count + 1 }) \b | \w+ | \s+ )* $
}x;

, $Count 
my (
use strict, , 

our).


Perl
Perl 
(?{}) 
(??{}) (
, $RecordPossib
leMatch . 400). ,
m{ (?{ print "starting\n" }) };
, :
my $ShowStart = '(?{ print "starting\n" })';
..
.
m{ $ShowStart };

,
, 
,
Perl, .
.
, 

use re 'eval';

( use re 
; 431.)



, ,
Perl 
. 
\(\s*\?+[p{]. 
, 
. \s+ ,
/x
( , , 
). \?, 
. , p
(?p{},
(??{}).
, Perl  
, 

, 
.


$Count. 
: (
, , local, (
( (
). , local($Count) = $Count+1
73 \d+, 
$Count 2,
, 
local. \b ,
local,
$Count 1. 
.
, local , $Count

. (?{ print
"Final count is $Count.\n" }), 
. $Count ,

( , 
, ).
:
my $Count = undef;
our $TmpCount = 0;
$text =~ m{
^ (?> \d+ (?{ local($TmpCount) = $TmpCount + 1 }) \b | \w+ | \s+ )* $
(?{ $Count = $TmpCount }) # $Count
#
}x;
if (defined $Count) {
print "Count is $Count.\n";
} else {
print "no match\n";
}


, ,

. 
. 413.

my
my 
, 
, ,
. 
, ,


, 
, .
:
.
:
sub CheckOptimizer
{
my $text = shift; # 
my $start = undef; #
my $match = $text =~ m{
(?{ $start = $-[0] if not defined $start}) #
#
\d #
}x;
if (not defined $start) {
print "The whole match was optimized away.\n";
if ($match) {
# !
print "Whoa, but it matched! How can this happen!?\n";
}
} elsif ($start == 0) {
print "The match start was not optimized.\n";
} else {
print "The optimizer started the match at character $start.\n"
}
}

my,
$start (
). $start 
, ; 
, 
$start , , 
.
$-[0] ( @- 365).
,
CheckOptimizer("test 123");

:
The optimizer started the match at character 5.

, , 
:
The whole match was optimized away.
Whoa, but it matched! How can this happen!?

( 
), . , 


. ? 
$start,
, .
, $start, , 
,
my .
, my 
(, )
my, 
(
. 418). CheckOptimizer (
$start, $start 
, 
. , $start,
, , 
.
(closure).
Programming Perl Object Oriented Perl ,
. ,
Perl .
.
? my 
, ,
, , 
, . ,
my $NestedStuffRegex SimpleConvert,
. 413, , ,
$NestedStuffRegex.
my , , 

.


. 394 ,

. , 
,
, .
: 
, , 
. 
.
, 
:

my $NestedGuts = qr{
(?>
(?:
# ,
[^()]+
#
| \(
#
| \)
)*
)
}x;


([]+|)* ( 280),
$NestedGuts , 
. , $Nested
Guts m/^\( $NestedGuts \)$/x
(thisismissingtheclose, 
,
.
:

:
(?{ local $OpenParens = 0 })


:
(?{ $OpenParens++ })


, 1 (
1 ). 0, 
, 
,

(?!), :
(?(?{ $OpenParens }) (?{ $OpenParens-- }) | (?!) )

(? if then |
else ) ( 182), if 
.
, 
. , 
, 
:
(?(?{ $OpenParens != 0 })(?!))


, :
my $NestedGuts = qr{
(?{ local $OpenParens = 0 }) #
(?> #
(?:
# ,
[^()]+
#
| \( (?{ $OpenParens++ })
#
#
| \) (?(?{ $OpenParens != 0 }) (?{ $OpenParens-- }) | (?!) )
)*
)
(?(?{ $OpenParens != 0 })(?!)) #
# ,
}x;

$LevelN
( 396).
local 
, $OpenParens 
.
local , 

. 
,
, 
, $OpenParens 
.


(overloading) 
.

.


Perl \< \>, 
. , ,

\b . 
. 
\< \> 
(?<!\w)(?=\w) (?<=\w)(?!\w) .
(, MungeRegexLiteral),
:

sub MungeRegexLiteral($)
{
my ($RegexLiteral) = @_; #
$RegexLiteral =~ s/\\</(?<!\\w)(?=\\w)/g; # \<
$RegexLiteral =~ s/\\>/(?<=\\w)(?!\\w)/g; # \>
return $RegexLiteral; # (, )
}

\<,
(?<!\w)(?=\w). ,
s/// ,
, \w, \\w.

, (
, MyRegexStuff.pm) Perl:
package MyRegexStuff; #
use strict; #
use warnings; #
use overload; # Perl
#
sub import { overload::constant qr => \&MungeRegexLiteral }
sub MungeRegexLiteral($)
{
my ($RegexLiteral) = @_; #
$RegexLiteral =~ s/\\</(?<!\\w)(?=\\w)/g; # \<
$RegexLiteral =~ s/\\>/(?<=\\w)(?!\\w)/g; # \>
return $RegexLiteral; # (, )
}
1; # . , use
# true

MyRegexStuff.pm Perl (.
PERLLIB Perl),
Perl, . 
,

use lib '.';
#
use MyRegexStuff; # !
.
.
.
$text =~ s/\s+\</ /g; #
# .
.
.
.

use MyRegexStuff ,

, MyRegexStuff.pm 


(, MyRegexStuff.pm
, 
use MyRegexStuff , ).


MyRegexStuff.pm 
x++ ( 184). 

, , (. .
) . 
,
+
. , *+ (?>(
* ) ( 220).
, 
, \w \x{1234}, 
. , 
?+, *+
++ . $LevelN
(. 396) MungeRegexLiteral :
$RegexLiteral =~ s/( \( $LevelN \)[*+?] )\+/(?>$1)/gx;

! 

:
$text =~ s/"(\\.|[^"])*+"//; #


,
. :
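$RegexLiteral =~ s{
    (
        # Match something that can be quantified...
        (?: \\[\\abCdDefnrsStwWX]   # \n, \w, etc.
          | \\c.                    # \cA
          | \\x[\da-fA-F]{1,2}      # \xFF
          | \\x\{[\da-fA-F]*\}      # \x{1234}
          | \\[pP]\{[^{}]+\}        # \p{Letter}
          | \[\]?[^]]+\]            # a simple character class
          | \\\W                    # \*
          | \( $LevelN \)           # (...)
          | [^()*+?\\]              # almost anything else
        )
        # ...and is quantified...
        (?: [*+?] | \{\d+(?:,\d*)?\} )
    )
    \+   # ...and has an extra '+' after the quantifier.
}{(?>$1)}gx;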


:
, + 
(?>). , 
Perl. 
,
. , 
,
Perl. , 
\(blah\)++, 
, ++  , \).
. ,
,
(
, . 172). 
,
. , 

, 
. 
, Perl
( ),  
, .




( , ), , ,
. , 
,
. , m/
($MyStuff)*+/ MungeRegexLiteral 
(()
()*+). $MyStuff . 
,
.
\< \>, ,
,
. 

, \< \>

. , ,

.
, \>, 


\\>, \, 
>.

,
. ,
/x, 
.
, 
(

\N{} 351).


,
, 
. ( 180) Perl ,

$^N ( 365),
(, 
Perl $^N , 
).
:

href\s*=\s*($HttpUrl)(?{ $url = $^N })

$Http
Url, . 367.
, $HttpUrl, $url.
$^N $1 (
) ;
$1 . , ,

:
my $SaveUrl = qr{
($HttpUrl)
# HTTP URL ...
(?{ $url = $^N }) # ... $url
}x;
$text =~ m{
http \s*=\s* ($SaveUrl)
| src \s*=\s* ($SaveUrl)
}xi;

$url URL.
(,
$+ 364), $SaveUrl
,
, URL
.


, ,
$url, , 
, . ,


, . 406.

. (?<Num>\d+) ,
\d+, %^N $^N{Num}.
Perl 
%^N, , 
.


package MyRegexStuff;
use strict;
use warnings;
use overload;
sub import { overload::constant('qr' => \&MungeRegexLiteral) }
my $NestedStuffRegex; #
# . .
$NestedStuffRegex = qr{
(?>
(?: # , '#' '\' ...
[^()\#\\]+
# ...
| (?s: \\. )
# ...
| \#.*\n
# ,
# ...
| \( (??{ $NestedStuffRegex }) \)
)*
)
}x;
sub SimpleConvert($); # ,
#
sub SimpleConvert($)
{
my $re = shift; #
$re =~ s{
\(\?
# "(?"
< ( (?>\w+) ) >
# <$1 > $1 
( $NestedStuffRegex ) # $2 
\)
# ")"
}{


my $id = $1;
my $guts = SimpleConvert($2);
# We change
#     (?<id>guts)
# to
#     (?: (guts)     # match the guts
#         (?{
#             local($^N{$id}) = $guts   # save to a localized element of %^T
#         })
#     )
"(?:($guts)(?{ local(\$^T{'$id'}) = \$^N }))"
}xeog;
return $re; #
}
sub MungeRegexLiteral($)
{
my ($RegexLiteral) = @_; # 
# print "BEFORE: $RegexLiteral\n"; #
my $new = SimpleConvert($RegexLiteral);
if ($new ne $RegexLiteral)
{
my $before = q/(?{ local(%^T) = () })/; #
#
my $after = q/(?{ %^N = %^T })/; #
# ""
$RegexLiteral = "$before(?:$new)$after";
}
# print "AFTER: $RegexLiteral\n"; #
return $RegexLiteral;
}
1;

%NamedCapture, %^N, 
. , $^N. 
, our 
use strict. , , 
Perl ,
, %^N 
. , %^N 
, 
, ( 362).
, 
.
,
,


( 
. .).

Perl
Perl ,

, 6: 
, . Perl.
, , Perl. 
.
. Perl 
, 
. 
Perl,

. 

; 
.
, qr//, /o !
. 
. 
/o, ,
(qr//) 

, .
$&. $`, $& $'
. ,

. , 
,
.
Study. study() Perl .
,  , study 
, ,
. , .
. , 
. 

, ,
, . Perl
, ,
, .
, , ,



.
. 
Perl ,

. , , 
Perl .



,
,
Perl. 
IP (18.181.0.24) ,
(018.181.000.024). 
:
$ip = sprintf("%03d.%03d.%03d.%03d", split(/\./, $ip));

, . . 7.6

( ). ,
, 
,
. , 
 .
IP ,
,
.
, . ,
,
. ,
, 1 13 ( 
, 
). 3, 4 ( 1) 8 ( 13).
.
? 
? 
( 4),
Perl ( 6) Perl (
, sprintf ). /e 
, 
.
: 34 814.

. $& 


8, , 
. . 425.
Table 7.6. A few ways to pad an IP address

Rank  Time  Approach

 1.   1.0   $ip = sprintf("%03d.%03d.%03d.%03d", split(m/\./, $ip));

 2.   1.3   substr($ip,  0, 0) = '0' if substr($ip,  1, 1) eq '.';
            substr($ip,  0, 0) = '0' if substr($ip,  2, 1) eq '.';
            substr($ip,  4, 0) = '0' if substr($ip,  5, 1) eq '.';
            substr($ip,  4, 0) = '0' if substr($ip,  6, 1) eq '.';
            substr($ip,  8, 0) = '0' if substr($ip,  9, 1) eq '.';
            substr($ip,  8, 0) = '0' if substr($ip, 10, 1) eq '.';
            substr($ip, 12, 0) = '0' while length($ip) < 15;

 3.   1.6   $ip = sprintf("%03d.%03d.%03d.%03d", $ip =~ m/\d+/g);

 4.   1.8   $ip = sprintf("%03d.%03d.%03d.%03d", $ip =~ m/(\d+)/g);

 5.   1.8   $ip = sprintf("%03d.%03d.%03d.%03d",
                          $ip =~ m/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/);

 6.   2.3   $ip =~ s/\b(?=\d\b)/00/g;
            $ip =~ s/\b(?=\d\d\b)/0/g;

 7.   3.0   $ip =~ s/\b(\d(\d?)\b)/$2 eq '' ? "00$1" : "0$1"/eg;

 8.   3.3   $ip =~ s/\d+/sprintf("%03d", $&)/eg;

 9.   3.4   $ip =~ s/(?:(?<=\.)|^)(?=\d\b)/00/g;
            $ip =~ s/(?:(?<=\.)|^)(?=\d\d\b)/0/g;

10.   3.4   $ip =~ s/\b(\d\d?\b)/'0' x (3-length($1)) . $1/eg;

11.   3.4   $ip =~ s/\b(\d\b)/00$1/g;
            $ip =~ s/\b(\d\d\b)/0$1/g;

12.   3.4   $ip =~ s/\b(\d\d?\b)/sprintf("%03d", $1)/eg;

13.   3.5   $ip =~ s/\b(\d{1,2}\b)/sprintf("%03d", $1)/eg;

14.   3.5   $ip =~ s/(\d+)/sprintf("%03d", $1)/eg;

15.   3.6   $ip =~ s/\b(\d\d?(?!\d))/sprintf("%03d", $1)/eg;

16.   4.0   $ip =~ s/(?:(?<=\.)|^)(\d\d?(?!\d))/sprintf("%03d", $1)/eg;

,
/, qr//
,
Perl, 
, Perl , 
.
.


,
m//, s// qr//. Perl
, ,
. , 
, .



6 ( 296), Perl
.
Perl .
1. . 
, 
, 
( 354). , 
.
2. . 
, 
, 
( 
).
Perl
. 
46.

. 

, 
(, ), Perl
. 
, ,
, 
.


Perl

:
.


, Perl ,
, 


() 
, 
.
, 
. , ,
,
.

,
, 
, . 
my ,
, 
. 405.

.

 . 
:
my $today = (qw<Sun Mon Tue Wed Thu Fri Sat>)[(localtime)[6]];
# $today ("Mon", "Tue" ..)
while (<LOGFILE>) {
if (m/^$today:/i) {
.
.
.

m/^$today:/ ,
,
. 
,
Perl 
.
,
, ,
. ,
. ,

.
? 
.
$HttpUrl . 367 (
$HostnameRegex). , 

(, , 


), 
, .
. 
( 
m//) 
.
( ) 25
. ( 
) 
1000 !
,
, ,
1000 
, 0,00026 (
3846 , 
3,7 ).
,
 
. , 
.

/o
/o ,
, 
, 
. /o
, 
. 

, 
. 
, .
/o:
my $today = (qw<Sun Mon Tue Wed Thu Fri Sat>)[(localtime)[6]];
while (<LOGFILE>) {
if (m/^$today:/io) {
.
.
.

, 
$today , . 

, 
Perl 
: $today , Perl 


.
/o,

, ,
, 
.
/o
/o
. ,
:
sub CheckLogfileForToday()
{
my $today = (qw<Sun Mon Tue Wed Thu Fri Sat>)[(localtime)[6]];
while (<LOGFILE>) {
if (m/^$today:/io) { # !
.
.
.
}
}
}

/o ,  
. CheckLogfileForToday() 
, , . 
, $today
, ; 

.
, , 
, 
.


, ,
, 

. (
, 
, 
.
qr// ( 366).

:
sub CheckLogfileForToday()
{


my $today = (qw<Sun Mon Tue Wed Thu Fri Sat>)[(localtime)[6]];


my $RegexObj = qr/^$today:/i; #
while (<LOGFILE>) {
if ($_ =~ $RegexObj) {
.
.
.
}
}
}


,
.
, , 
.
,
.
;
,
, .
; ,

( ), 
, /o 
, 
CheckLogfileForToday().
,
. qr// 
, , . 

=~, .
m//

if ($_ =~ $RegexObj) {


if (m/$RegexObj/) {

,
. , 
, 
. .
, , m//
. , 
$_, , 


, .
, /g 
.
/o qr//
/o qr// ( 
, , ).
, qr//o
, 
, $RegexObj 

$today.
m//o . 422.


( 371) 
,
.
. :
sub CheckLogfileForToday()
{
my $today = (qw<Sun Mon Tue Wed Thu Fri Sat>)[(localtime)[6]];
# ,
# .
"Sun:" =~ m/^$today:/i or
"Mon:" =~ m/^$today:/i or
"Tue:" =~ m/^$today:/i or
"Wed:" =~ m/^$today:/i or
"Thu:" =~ m/^$today:/i or
"Fri:" =~ m/^$today:/i or
"Sat:" =~ m/^$today:/i;
while (<LOGFILE>) {
if (m//) { #
.
.
.
}
}
}

, 
(

$today). ,
, 
.



Perl 
. 
, , 
. , ,

(
).
,
Perl ,

, .

$1, $&, $', $+, ...


Perl , 
, $1, $& 
, ( 362). Perl
,
( ) . Perl 
, , 
, ,
$1 , .
, ,
.
$1 
, Perl
. ? Perl 
? :
$Subject =~ s/^(?:Re:\s*)+//;

$& , 
$Subject. , $Subject
$& .
:
if ($Subject =~ m/^SPAM:(.+)/i) {
$Subject = " spam subject removed ";
$SpamCount{$1}++;
}

$1 $Subject .
, Perl 
.




$1, $2, $3 . . 
?
$1 , 
. , 
?
...
$` $& $'
$`, $& $' .
, 

. Perl 
, 
, 
.
, , 
Perl ,
( ), 
. , !
$`, $& $' !
, !
! 
$`, $& $' ,
.  .


, m/c/ 
130 000 Perl.
,
, , 

. : 
. , 
, .
,

40%. ,
.
, ( ) 
.
, .
,


. ,
130 000 
3,5 .

. 
, c 
. , .
,
, . 
7000 !
,
.


, , Perl 
.
, . 
, Perl
, . 
, Perl , 

.
. 
. ,
$`, $& $'
. $& 

$1. , HTML
s/<\w+>/\L$&\E/g
s/(<\w+>)/\L$1\E/g.
$` $'
.

:

$`    substr(target, 0, $-[0])
$&    substr(target, $-[0], $+[0] - $-[0])
$'    substr(target, $+[0])

@- @+ ( 365)
, , .
$&. 
$1, 


. $& 
,
, 
. $& 
,
.
. ,
$`, $& $',
, . , 
Perl, .
English; 
, :
use English 'no_match_vars';

. 
CPAN ,
, . 

.


$&
,
$`, $& $' 
. ,
 Mre=debug (431)
Enabling $` $& $' support
Omitting $` $& $' support. , 
.
, 

eval, Perl . 
Devel::SawAmper
sand CPAN (http://www.cpan.org):
END {
require Devel::SawAmpersand;
if (Devel::SawAmpersand::sawampersand) {
print "Naughty variable was used!\n";
}
}

Devel::SawAmpersand Devel::FindAmper
sand, , 
. , Perl


. , 
,
(
http://regex.info/).
,
:
use Time::HiRes;
sub CheckNaughtiness()
{
    my $text = 'x' x 10_000;   # Create a fair amount of data.
    # Measure the overhead of a do-nothing loop.
    my $start = Time::HiRes::time();
    for (my $i = 0; $i < 5_000; $i++) { }
    my $overhead = Time::HiRes::time() - $start;
    # Now time the same number of simple matches.
    $start = Time::HiRes::time();
    for (my $i = 0; $i < 5_000; $i++) { $text =~ m/^/ }
    my $delta = Time::HiRes::time() - $start;
    # Flag the code if $delta exceeds $overhead by a factor of 5
    # (the factor is only a heuristic).
    printf "It seems your code is %s (overhead=%.2f, delta=%.2f)\n",
        ($delta > $overhead*5) ? "naughty" : "clean", $overhead, $delta;
}

study
study() ,
.
( ) 
.
study :
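while (<>)
{
    study($_);   # Study the default target $_ before
                 # running several matches against it
    if (m/regex 1/) { ... }
    if (m/regex 2/) { ... }
    if (m/regex 3/) { ... }
    if (m/regex 4/) { ... }
}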
study ; 
, 
. ,
. 


Perl , 
, (
, , ).
study Perl 
,
.
.
study
, .

, study .
study 
, Perl 
. , study 
(m/foo/);
10 000 ! /i ,
/i study (
).

study

study , 
/i,
(?i) (?i:),
study.
study ,

( 303). ?
. 
, , 
, 
study. ,
study .
study 
( , 
study ).
, ,
study, . 
, study,
.
study , 
( 312).
, 
, study . ,
, study index,
.


study
study ,
,
.
, .
, 
SGML (
troff, , , PostScript).

( 475 ). 
,
.

, study.


. Perl Benchmark, 
(perldoc Benchmark). , 
 , 
.
use Time::HiRes 'time';

:
my $start = time;
.
.
.
my $delta = time - $start;
printf "took %.1f seconds\n", $delta;

, 

, 
, .
6 ( 286). , 
,
.



, Perl . 

6 ( 294),
. 
, 


(
).
Perl , 
. 
Perl ,
. 
,
.
, .
use re 'debug'; 
no re 'debug';. 
, 
( use re
404).
, 
Mre=debug.

. ( , 
, ):

% perl -cw -Mre=debug -e 'm/^Subject: (.*)/'


Compiling REx `^Subject: (.*)'
rarest char j at 3
1: BOL(2)
2: EXACT <Subject: >(6)
.
.
.
12: END(0)
anchored `Subject: ' at 0 (checking anchored) anchored(BOL) minlen 9
Omitting $` $& $' support.

Perl c (
), w (
, Perl;
) Mre=debug ( ).
e , , m/^Subject:(.*)/, 
 Perl,
.
( ,
Perl) 
. Perl
(, 
/ 300).

, Perl. , 
. ,
.


. 
, :
anchored '' at
, (
.
'' $, .
floating '' at ..
, (
, .
'' $, .
strclass ''
, .
anchored(MBOL), anchored(BOL), anchored(SBOL)
^. MBOL 
/m, BOL SBOL (
BOL SBOL Perl; SBOL 
$*, ).
anchored(GPOS)
\G.
implicit
Perl anchored(MBOL),
.*.
minlen
.
with eval
(?{}) (??{}).
. 
, Perl 
DDEBUGGING.
Perl , $&
( 426).



( 398), 
Perl . c,
Perl .
Match rejected by optimizer, 
, 
, 
, . :

% perl -w -Mre=debug -e '"this is a test" =~ m/^Subject:/;'
.
.
.
Did not find anchored substr `Subject:'
Match rejected by optimizer


, . :
% perl -w -Mre=debug -e 'use warnings'
... ...
.
.
.

warnings,  

.



use re 'debug'; Mre=debug. de
bug debugcolor, 
ANSI, 
, .
Perl 
, Mre=debug 
Dr.


, , 
Perl, , ,
. , , Perl, 
.
, 
Perl.
, Perl
, . 
, ,
, . ,

( 180), .
, 
; 
. Perl 
( 164),
( 166).


( 184). Perl 
,
, , , 

,
. , , 
. 
(, \v), 
,
( x+\v x++
(?>x+)). 
. : 
, 
. , \V.
\V. ,
, 
. 
. 402.
, . 404, 

.
Perl 
, .

8
Java
Java
java.util.regex 1.4, 2002
.
, 
,
,
,
CharSequence.
java.util.regex 
. , 
,
, .
Java 1.4 Java 1.4.2, 
Java 1.5.0 ( Java 5.0), Java 1.6.0
(Java 6.0, Mustang) .
Java 1.5.0, 
Java 1.4.2 1.6 ,
. ( , 
477).1
, , 
, 16. , 
Java,
1

Java 1.5.0 Update 7. 


Java 1.5 , 
. 
Update 2, Java 1.5,
. Java 1.6 
, 59g.

, 
. 1, 2 3 ,
, 
, 4, 5 6 ,

java.util.regex. 
, ,
,
, , 
.
Table 8.1. Index of methods, sorted alphabetically, with page numbers

454 appendReplacement       446 matcher              452 replaceFirst
454 appendTail              449 matches (Matcher)    465 requireEnd
444 compile                 470 matches (Pattern)    468 reset
450 end                     468 pattern (Matcher)    470 split
448 find                    470 pattern (Pattern)    450 start
470 flags                   470 quote                469 text
450 group                   452 quoteReplacement     450 toMatchResult
450 groupCount              460 region               469 toString (Matcher)
463 hasAnchoringBounds      460 regionEnd            469 toString (Pattern)
462 hasTransparentBounds    460 regionStart          463 useAnchoringBounds
465 hitEnd                  451 replaceAll           468 usePattern
449 lookingAt               455 replaceAllRegion     462 useTransparentBounds

8.1 . API 
. 443.

, , 
, . 439 . 149 160 3,
.
.
java.util.regex
( 112, 128, 131, 271, 289), 
.
,
,
java.util.regex, ,
.



java.util.regex , 
, 4,
5 6. . 8.2 . 
, 
(?) (?:)
. 8.3, . 440.
,
, 
. 8.2.
\b,
(backspace) . 
( 174).
\ 
\\,
Java. ,
\n
"\\n" Java. . 
( 135).
\##
(, \xFCber ber).
\u#### 
(, \u00FCber ber,
a \u20AC ).
\0 
.
\c
0x40
. , 
, ,
\ca \cA .
\01, 
. \ca \21 !.
\w, \d \s ( ) 
ASCII 
. , \d
[09], \w [09azAZ_],
a \s [\t\n\f\r\x0B] (\0 ASCII
).

( 159): \w \{L}, \d
\p{Nd}, \s \p{Z} ( \W, \D \S
\{} ).


8.2.
java.util.regex

151 (C)

\a [\b] \e \f \n \r \t \0 \x## \u#### \c


155 (C)

: [] [^] (
164)

156

: (
)

158 (C)

: \w \d \s \W \D \S

157 (C)

: \{} \{}


442

/ : ^ \A

442

/ : $ \z \Z

171

: \G

174

: \b \

175

: (?=) (?!) (?<=) (?<!)


176

: (?(). 
: d s m i u

177

: (?(:)

177

: # ( )

177

: \Q\E


178

: () \1 \2 ...

178

: (?:)

180

: (?>)

181

: |

183

: * + ? {n} {n,} {min, max}

184

: *? +? ?? {n}? {n,}? {min, max}?

184

: *+ ++ ?+ {n}+ {n,}+ {min, max}+

(C) . .

\p{} \P{}
, Java. 
. .


\b \B
, \w \W. ,
\w \W ASCII.

, 
,
. , , ?
, a * +
. 3 . 175.
# NL
Pattern.COMMENTS ( 440) 
x. ( 
, . 476.) 
ANSI . :
, , 
.
\Q\E ,
Java 1.6.
8.3. java.regex.util

(?)

Pattern.UNIX_LINES


^ (442)

Pattern.DOTALL


(146)

Pattern.MULTILINE

^ $ (442)

Pattern.COMMENTS


(102). ( 
)

Pattern.CASE_INSENSITIVE

ASCII

Pattern.UNICODE_CASE

,
ASCII

Pattern.CANON_EQ


( 

143)

Pattern.LITERAL


,


\p{} \P{} Java


\p{} \P{} 
, Java 
. 4.0.0. (Java 1.4.2 
3.0.0.)


, \p{Lu}. (
. 159.)
\p{Lowercase_Letter} .
\pL 
\p{L}.
Java 1.5 Pf Pi
, ,
\p{P}. ( Java 1.6.)
\p{C} ,
\p{Cn}.
\p{L&} .
\p{all}, (?s:).
\p{assigned} \p{unassigned} : 
\P{Cn} \p{Cn}.


In.
, 
\p{} \P{} , 
. 479.
Java 1.5 
, 3.0.0 4.0.0, 
, : 
4.0.0 Combining Diacritical Marks for Symbols Greek
and Coptic Combining Marks
for Symbols Greek.
Java 1.5 , Java 1.4.2
Arabic Presentation Forms!B Latin Ex!
tended!B ( 480).

Java
Java 1.5.0 \p{} \P{} 
isSomething java.lang.Charac
ter.
\p{} \P{} is
java. , , ja


va.lang.Character.isJavaIdentifierStart,
\p{javaJavaIdentifierStart}. ( 
java.lang.Character.)




(ASCII LF) ., ^, $ \Z. Java
( 144) 
.
Java, , 
.

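Character code    Name     Escape   Description
U+000A            LF       \n       ASCII Line Feed ("newline")
U+000D            CR       \r       ASCII Carriage Return
U+000D U+000A     CR/LF    \r\n     ASCII Carriage Return / Line Feed sequence
U+0085            NEL               Unicode Next Line
U+2028            LS                Unicode Line Separator
U+2029            PS                Unicode Paragraph Separator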


., ^, $ \Z 
( 440):

, !

UNIX_LINES

^.$ \Z


.

MULTILINE

^ $


,
^ 
$.



.

DOTALL

CR/LF 
. (. . UNIX_LI
NES) CR/LF 
, . . 
.
, $ \Z
. ASCII LF


, $ \Z ,
CR/LF (. .
LF CR).
$ ^ MULTILINE 
: ^ CR
, CR LF, $
LF, 
CR.
DOTALL 
CR/LF ( DOTALL
.,
), UNIX_LINES
( LF , 
, ).

java.util.regex
java.util.re
gex . 
,
:
java.util.regex.Pattern
java.util.regex.Matcher
java.util.regex.MatchResult
java.util.regex.PatternSyntaxException

Pattern Matcher.
.
Pattern , 
,
Matcher , 
.
Java 1.5 MatchResult 
. 
Matcher,
,
MatchResult.
PatternSyntaxException 
(
, [oops)). java.lang.IllegalArgument
Exception.
:
public class SimpleRegexTest {
    public static void main(String[] args)
    {
        String myText = "this is my 1st test string";
        String myRegex = "\\d+\\w+"; // the regex \d+\w+
        java.util.regex.Pattern p = java.util.regex.Pattern.compile(myRegex);
        java.util.regex.Matcher m = p.matcher(myText);
        if (m.find()) {
            String matchedText = m.group();
            int matchedFrom = m.start();
            int matchedTo = m.end();
            System.out.println("matched [" + matchedText + "] " +
                               "from " + matchedFrom +
                               " to " + matchedTo + ".");
        } else {
            System.out.println("didn't match");
        }
    }
}

matched [1st] from 11 to 14.


, .
, , , 
( 3 128)
import java.util.regex.*;


. 
import.
java.regex.util 
, . 
m, Matcher, 
Pattern , 
( find), 
( group, start end).
,
.

Pattern.compile()
Pattern Pattern.com
pile. , 
( 135).
, . 8.3 . 440.
Pattern ,
myRegex, :
Pattern pat = Pattern.compile(myRegex,
Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);


, (
, Pattern.CASE_INSENSITIVE), 
1, 
( 145). . 451 (?x) 
(?s) (?i) . 475.
,

. ,
. 458 Pattern.compile

Pattern.UNIX_LINES | Pattern.CASE_INSENSITIVE

(?id) .
, 
. 
6 ( 296), 
,
.
, 
; 
, .
, 
, , 
, (, 
)
Pattern.
Pattern.compile 
: 
PatternSyntaxException, 
IllegalArgumentException.
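
The two exceptions named above can be handled separately. The following is a minimal sketch, not part of the original text; the class name CompileCheck and the method describeProblem are hypothetical. It relies only on the documented facts that PatternSyntaxException is a subclass of IllegalArgumentException and that it exposes getIndex() and getDescription().

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class CompileCheck {
    // Returns null when the regex compiles; otherwise a short description of the problem.
    static String describeProblem(String regex, int flags) {
        try {
            Pattern.compile(regex, flags);
            return null;
        } catch (PatternSyntaxException e) {
            // Malformed regex: the exception reports where and why.
            return "bad regex at index " + e.getIndex() + ": " + e.getDescription();
        } catch (IllegalArgumentException e) {
            // Any other argument problem, such as invalid flag bits.
            return "bad argument: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeProblem("\\d+\\w+", Pattern.CASE_INSENSITIVE));
        System.out.println(describeProblem("[oops", 0));
    }
}
```

The more specific catch clause has to come first, because the compiler rejects a subclass handler placed after its superclass.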

Pattern.matcher()
Pattern , 
( 470), 
matcher. 
, .2 
, 
Pattern .
matcher Matcher.
1

, 
!

, java.util.regex 
, 
CharSequence (, String, StringBuffer CharBuffer).


Matcher
Matcher,
, 
. 
, m Matcher, m.find() 

. , m.group() 
, .

Matcher, ,
. 

. ,
,
.
, :

Matcher 
Pattern.
usePattern() ( 468). Pattern 
pattern().

Matcher 
( , 
CharSequence). 
reset() ( 468).

( 458).
, 
region.
( ) 
.

regionStart regionEnd ( 460). reset
( 468)
, , 
reset ( 468).

.
, 

(\A ^ $ \z \Z).
true, 
useAnchoringBounds ( 463)
hasAnchoringBounds . reset 
.


.
, 
( 
, ) 
.
false, 
useTransparentBounds
( 462) hasTransparentBounds . reset 
.

, :
. 
groupCount ( 450).
, . 

( find 448).

, ,
( 453).
,
,
.
hitEnd ( 465).
. , 
, 
( 450). : 
( group()), 
( start() end()) 
,
( group(), start() end()).

MatchResult, 
toMatchResult. MatchResult
group, start end, 
Matcher ( 450).
, ,
, 
( ).
true,
.
requireEnd ( 465).
, ,

, , 
. 



, ( 437).


Matcher
:
boolean find()
, , 
( 458)
.
.
:
String regex = "\\w+"; // \w+
String text = "Mastering Regular Expressions";
Matcher m = Pattern.compile(regex).matcher(text);
if (m.find ())
System.out.println("match [" + m.group() + "]");

:
match [Mastering]

if
while,
while (m.find())
System.out.println("match [" + m.group() + "]");

:
match [Mastering]
match [Regular]
match [Expressions]

boolean find(int )
find ,
( )
. ,
IndexOutOfBoundsException.
find ,
( reset
468) , 
.
find
, . 478 ( , 
. 476).
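
As a rough illustration of the difference between the two forms of find, the sketch below starts a search at a given offset and then continues with the no-argument form. The class FindFromOffset and the method wordsFrom are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindFromOffset {
    // Collects every \w+ match that begins at or after the given offset.
    static List<String> wordsFrom(String text, int offset) {
        Matcher m = Pattern.compile("\\w+").matcher(text);
        List<String> words = new ArrayList<>();
        // find(int) resets the matcher and searches from the offset;
        // the plain find() calls then continue from the previous match.
        if (m.find(offset)) {
            do {
                words.add(m.group());
            } while (m.find());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(wordsFrom("Mastering Regular Expressions", 10));
    }
}
```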


boolean matches()

( 458).

(
). 
, matches 
, \A(?:)\z, 
.
( 458), 
matches 
, ( 463).
, , 
, 
CharBuffer, , 
. , 
,
m.usePattern(urlPattern).matches() , 
URL ( URL,
).
String matches:
"1234".matches("\\d+"); //
"123!".matches("\\d+"); //

boolean lookingAt()
, ,
, 
. lookingAt
matches, , 

.
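
The three ways of applying a regex (matches, lookingAt, and find) can be compared side by side. This is a small sketch under the assumption that the default region covers the whole text; MatchKinds and probe are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchKinds {
    // Returns { matches(), lookingAt(), find() } for the given regex and text.
    static boolean[] probe(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        boolean whole = m.matches();     // must cover the entire region
        boolean prefix = m.lookingAt();  // must start at the region start
        m.reset();                       // so that find() starts from the beginning
        boolean anywhere = m.find();     // may start anywhere in the region
        return new boolean[] { whole, prefix, anywhere };
    }

    public static void main(String[] args) {
        for (String text : new String[] { "1234", "123!", "!123" }) {
            boolean[] r = probe("\\d+", text);
            System.out.println(text + ": matches=" + r[0]
                    + " lookingAt=" + r[1] + " find=" + r[2]);
        }
    }
}
```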


Matcher
.
, 
IllegalStateException. , 
(
), IndexOutOfBoundsException 
.
: start end 

, 
.


, 
.
String group()
, 
.
int groupCount()

. , 
, group, start end,
.1
String group(int )
, 
, null, . 
, group(0) 
group().
int start(int )
( )
. 
, 1.
int start()
; 
start(0).
int end(int )
( )
. 
, 1.
int end()
;
end(0).
MatchResult toMatchResult()
Java 1.5.0. MatchResult, 
.
group, start, end groupCount,
Matcher.

, toMatchResult 
IllegalStateException.
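
Because a MatchResult is a snapshot, it stays valid after the matcher moves on. The sketch below collects one snapshot per match; Snapshots and allMatches are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Snapshots {
    // Returns an immutable snapshot of every match of p in text.
    static List<MatchResult> allMatches(Pattern p, CharSequence text) {
        List<MatchResult> results = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            results.add(m.toMatchResult()); // unaffected by later find() calls
        }
        return results;
    }

    public static void main(String[] args) {
        for (MatchResult r : allMatches(Pattern.compile("(\\w)(\\w*)"), "one two")) {
            System.out.println(r.group() + " starts at " + r.start()
                    + ", first letter " + r.group(1));
        }
    }
}
```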
1

groupCount ,
, ,
, .



, 
, .
URL :
(http https), ( ):
String url = "http://regex.info/blog";
String regex = "(?x) ^(https?):// ([^/:]+) (?::(\\d+))?";
Matcher m = Pattern.compile(regex).matcher(url);
if (m.find())
{
System.out.print(
"Overall [" + m.group() + "]" +
" (from "
+ m.start() + " to " + m.end() + ")\n" +
"Protocol [" + m.group(1) + "]" +
" (from "
+ m.start(1) + " to " + m.end(1) + ")\n" +
"Hostname [" + m.group(2) + "]" +
" (from "
+ m.start(2) + " to " + m.end(2) + ")\n"
);
// ,
//
if (m.group(3) == null)
System.out.println("No port; default of '80' is assumed");
else {
System.out.print("Port is [" + m.group(3) + "] " +
"(from "
+ m.start(3) + " to " + m.end(3) + ")\n");
}
}

:
Overall [http://regex.info] (from 0 to 17)
Protocol [http] (from 0 to 4)
Hostname [regex.info] (from 7 to 17)
No port; default of '80' is assumed


, 
, Matcher
.
String replaceAll(String )

, ,
,
. 453.
(
reset). . 455 
, 
.


replaceAll
String.
string.replaceAll(, );

Pattern.compile().matcher().replaceAll()

String replaceFirst(String )
replaceAll,
( ).
replaceFirst
String.
static String quoteReplacement(String )
, Java 1.5, 
, 
.
,
,
. (
Matcher.quoteReplacement.)


Java 1.5
Java 5.0:
String text = "Before Java 1.5 was Java 1.4.2. After Java 1.5 is Java 1.6";
String regex = "\\bJava\\s*1\\.5\\b";
Matcher m = Pattern.compile(regex).matcher(text);
String result = m.replaceAll("Java 5.0");
System.out.println(result);

:
Before Java 5.0 was Java 1.4.2. After Java 5.0 is Java 1.6


Matcher , 
, result:
Pattern.compile("\\bJava\\s*1\\.5\\b").matcher(text).replaceAll("Java 5.0")

( 
, 
Pattern 444.)
( ,
) , 
Java 1.6 Java 6.0.
Pattern.compile("\\bJava\\s*1\\.([56])\\b").matcher(text).
replaceAll("Java $1.0")


:
Before Java 5.0 was Java 1.4.2. After Java 5.0 is Java 6.0


replaceAll replaceFirst. 
, replaceFirst 
. 
replaceFirst ,
,
. ( ,

.)


replaceAll replaceFirst ( appendReplacement,
) ,
,
:
$1, $2 . . ,
($0 
).
, $, ASCII, 
IllegalArgumentException.
$ , 
. , 
, $25
$2, 5.
$6 IndexOutOf
BoundsException.
\ ,
$ \$. 
\\ \. ( 
Java, \
"\\\\".) 
, , 12
, ,
2, $1\2.
, 
Matcher.quoteReplacement, 
, . 
, uRegex, 
uRepl,
:
Pattern.compile(uRegex).matcher(text).replaceAll(Matcher.
quoteReplacement(uRepl))
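
A short sketch of why quoting matters: a replacement such as "$25" would otherwise be read as a group reference. LiteralReplace and replaceLiterally are hypothetical names; the only library calls are Pattern.compile, Matcher.replaceAll, and Matcher.quoteReplacement.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralReplace {
    // Replaces every match of regex with the replacement text taken literally,
    // so '$' and '\' in the replacement have no special meaning.
    static String replaceLiterally(String text, String regex, String replacement) {
        return Pattern.compile(regex)
                      .matcher(text)
                      .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceLiterally("cost: X", "X", "$25"));
    }
}
```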



Matcher
.

StringBuffer. ,
, 
.
, 
.
Matcher appendReplacement(StringBuffer , String )

(, find). 
:
, ,
, 
, .
, m Matcher, 
\w+ >one+test<.
while
while (m.find())
m.appendReplacement(sb, "XXX");

find >one+test<.
appendReplacement sb ,
, . . >, 
sb XXX.
find >one+test<.
appendReplacement ,
+, XXX.
sb >+, 
m >one+test <.
appendTail, 
.
StringBuffer appendTail(StringBuffer )
, ( 

). 
.

m.appendTail(sb);

sb <. 
>+<.
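
The same find / appendReplacement / appendTail loop is most useful when the replacement is computed per match. The following sketch uppercases each word; AppendDemo and upcaseWords are hypothetical names, and the sample text in main is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendDemo {
    // Rebuilds the text with every \w+ match replaced by its uppercase form.
    static String upcaseWords(String text) {
        Matcher m = Pattern.compile("\\w+").matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Copies the text between matches, then the computed replacement.
            // quoteReplacement guards against '$' or '\' in the computed text.
            m.appendReplacement(sb, Matcher.quoteReplacement(m.group().toUpperCase()));
        }
        m.appendTail(sb); // copies whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(upcaseWords("-->one+test<--"));
    }
}
```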



replaceAll 
( ,
).
public static String replaceAll(Matcher m, String replacement)
{
m.reset(); // Matcher
StringBuffer result = new StringBuffer(); //
//
while (m.find())
m.appendReplacement(result, replacement);
m.appendTail(result);
return result.toString(); //
}

, replaceAll 
( 458), 
.
replaceAll,

.
:
public static String replaceAllRegion(Matcher m, String replacement)
{
Integer start = m.regionStart();
Integer end = m.regionEnd();
m.reset().region(start, end); // Matcher,
//
StringBuffer result = new StringBuffer(); //
//
while (m.find())
m.appendReplacement(result, replacement);
m.appendTail(result);
return result.toString(); //
}

reset region (
, . 463.
metric, 
, :
// Build a Matcher that finds numbers followed by "C"
// in the variable "metric"
// The regex is: (\d+(?:\.\d*)?)C\b
Matcher m = Pattern.compile("(\\d+(?:\\.\\d*)?)C\\b").matcher(metric);
StringBuffer result = new StringBuffer(); // the updated copy is built here
while (m.find())
{
    float celsius = Float.parseFloat(m.group(1));  // the number, as a number
    int fahrenheit = (int) (celsius * 9/5 + 32);   // convert to Fahrenheit
    m.appendReplacement(result, fahrenheit + "F"); // insert it
}
m.appendTail(result);
System.out.println(result.toString());             // display the result

For example, if metric contains "from 36.3C to 40.1C.", the result is "from 97F to 104F.".

java.util.regex
String, Matcher
, CharSequence,

.
, CharSequence, 
StringBuffer StringBuilder,
, 
. 
String, String 
. 
StringBuilder,
StringBuffer.
,
, ,
, ,
StringBuilder:1
StringBuilder text = new StringBuilder("It's SO very RUDE to shout!");
Matcher m = Pattern.compile("\\b[\\p{Lu}\\p{Lt}]+\\b").matcher(text);
while (m.find())
text.replace(m.start(), m.end(), m.group().toLowerCase());
System.out.println(text);

:
It's so very rude to shout!

text.replace.
,
1

\b[\p{Lu} \p{Lt}]+\b.
3 , \p{Lu} 
, \p{Lt} 
. ASCII :

\b[AZ]+\b.


(
), ( 
, ).

, , 
. 
, ,
.


, 
. ,
, 
Matcher, ( 
, find )
.

, find , 
. 
, <b></b>
:
StringBuilder text = new StringBuilder("It's SO very RUDE to shout!");
Matcher m = Pattern.compile("\\b[\\p{Lu}\\p{Lt}]+\\b").matcher(text);
int matchPointer = 0;//
while (m.find(matchPointer)) {
matchPointer = m.end(); // ,
//
text.replace(m.start(), m.end(), "<b>"+ m.group().toLowerCase() +"</b>");
matchPointer += 7;
// '<b>' '</b>'
}
System.out.println(text);

:
It's <b>so</b> very <b>rude</b> to shout!

Matcher
Java 1.5 Matcher 
, 
. , 
region 
.
HTML,
<img>, ALT.
Matcher,
( HTML), 
: <img>, ALT.


Matcher , 
, ,
<img> 
ALT. , start
end <img>, 
Matcher, ALT, 
find.
, ALT 

, .
// <img>. 'html' HTML
Matcher mImg = Pattern.compile("(?id)<IMG\\s+(.*?)/?>").matcher(html);
// ALT
// ( ALT IMG 'html')
Matcher mAlt = Pattern.compile("(?ix)\\b ALT \\s* =").matcher(html);
// <img>...
while (mImg.find()) {
// ALT
mAlt.region( mImg.start(1), mImg.end(1) );
// , ALT ,
// ,
if (! mAlt.find())
System.out.println("Missing ALT attribute in: " + mImg.group());
}

,
(, mAlt),
( mAlt.region).
: mAlt 
, mAlt.reset(html).region().
reset 
, ,
, , .
,
mAlt , find
, 
ALT= HTML.

HTML, <img> ALT. 
HTML , 
<img>, .
:
// <img>. 'html' HTML
Matcher mImg = Pattern.compile("(?id)<IMG\\s+(.*?)/?>").matcher(html);


// ALT
// ( ALT IMG 'html')
Matcher mAlt = Pattern.compile("(?ix)\\b ALT \\s* =").matcher(html);
// Matcher
Matcher mLine = Pattern.compile("\\n").matcher(html);
// <img>...
while (mImg.find()) {
// ALT
mAlt.region( mImg.start(1), mImg.end(1) );
// , ALT ,
// ,
if (! mAlt.find()) {
//
mLine.region(0, mImg.start());
int lineNum = 1; // 1
while (mLine.find())
lineNum++; //
// 1
System.out.println("Missing ALT attribute on line " + lineNum);
}
}

, mAlt 
start(1), , 
HTML <img>.
start() 
,
<img> ( 
).


, ,
, ,
reset
, .

:
matches
lookingAt
find() ( )

Matcher 
:
find() ( )
replaceAll
replaceFirst
reset ( )


, 
(. . , start end)
,
.



:
Matcher region(int , int )

, 
. ,
Matcher
, find 
.
, 
reset (
, reset 468).
Matcher, 
( 463).

, , IndexOut
OfBoundsException.
int regionStart()

Matcher. 0.
int regionEnd()

Matcher. .
region (
, , 
. . 8.4
.
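Table 8.4. Setting region bounds

Region start        Region end          Java code
set explicitly      leave unchanged     m.region(start, m.regionEnd());
leave unchanged     set explicitly      m.region(m.regionStart(), end);
set explicitly      reset to default    m.reset().region(start, m.regionEnd());
reset to default    set explicitly      m.region(0, end);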
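
A minimal sketch of a region in use: only matches that lie within the half-open range from start to end are counted. RegionDemo and countIn are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegionDemo {
    // Counts the matches of regex that lie within text[start, end).
    static int countIn(String text, String regex, int start, int end) {
        Matcher m = Pattern.compile(regex).matcher(text);
        m.region(start, end);   // resets the matcher and limits the search
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // The digits sit at offsets 1, 4, 7 and 10; the region covers offsets 3..7.
        System.out.println(countIn("a1 b2 c3 d4", "\\d", 3, 8));
    }
}
```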



, 
, 
. , 
, ^ 
, .

.

( ,
) , (
, 
/ ( 
).
, ,
,
.
, , 
, ,
.
,
, CharBuffer. ,
,
, 
. , 
. ,
:
Madagas car is much too large to see on foot, so you'll need a car.

\bcar\b 
automobile. 
( 
) ... , 
, Mada
gascar. ,
( false)
\b , 
. , 
. ,
\b s c
, \b.


Matcher useTransparentBounds(boolean b)
true false,
.
false.
Matcher, 
( 463).
boolean hasTransparentBounds()
true, 
, false.
Matcher 
false, 
, ,
. , , 
,
.1 ,
, \b
, 
, .

false ( ):
String regex = "\\bcar\\b"; // \bcar\b
String text = "Madagascar is best seen by car or bike.";
Matcher m = Pattern.compile(regex).matcher(text);
m.region(7, text.length());
m.find();
System.out.println("Matches starting at character " + m.start());

:
Matches starting at character 7

, 
, Mada
gas car, , 
. .
:
m.useTransparentBounds(true);
1

Java 1.5 Update 7 ,


,
Sun. Pattern.MULTILINE ^ ( 
,
, )
,
 , 
, .


find, :
Matches starting at character 27

, 
s
\b . 
bycarorbike.
,
,
. , reset 
.


:
Matcher useAnchoringBounds(boolean b)
true false
. true.
Matcher, 
( 463).
boolean hasAnchoringBounds()
true, 
, false .
Matcher 
true, (^ \A $ \z \Z) 
, 
. false

,
.

,

, 
.
, 
, 
.
, reset .
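
The effect of anchoring bounds can be seen with a regex that begins with ^ and a region that starts in the middle of the text. This is a sketch only; AnchoringDemo and caretMatchesInRegion are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchoringDemo {
    // Reports whether ^\w+ can match within text[start, end of text)
    // with anchoring bounds switched on or off.
    static boolean caretMatchesInRegion(String text, int start, boolean anchoring) {
        Matcher m = Pattern.compile("^\\w+").matcher(text);
        m.region(start, text.length());
        m.useAnchoringBounds(anchoring);
        return m.find();
    }

    public static void main(String[] args) {
        // With anchoring bounds (the default), ^ matches at the region start.
        System.out.println(caretMatchesInRegion("Madagascar", 7, true));
        // Without them, ^ matches only at the true start of the text.
        System.out.println(caretMatchesInRegion("Madagascar", 7, false));
    }
}
```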


, 
Matcher
:

Pattern p = Pattern.compile(regex); //
Matcher m = p.matcher(text);
//
//
//
m.region(5, text.length());
//
//
m.useAnchoringBounds(false);
//
//
m.useTransparentBounds(true);
//
//

.


Matcher.

.
^
.

.

,
,
Matcher ( ),
:
Matcher m = Pattern.compile(regex).matcher(text);
m.region(5, text.length());
//
// .
m.useAnchoringBounds(false);
// ^
// .
m.useTransparentBounds(true);
//
// .

,
Matcher, ( ,
,
):
Matcher m = Pattern.compile(regex).matcher(text).region(5,text.length())
.useAnchoringBounds(false).useTransparentBounds(true);

,
. 

, 
, , , ,
.
. 475.


Java 1.5 Matcher hitEnd requireEnd,
.
.
, , ,
var<34 
IDENTIFIER LESS_THAN INTEGER.
, 
. 
, true, ,


, 
. , (, , 
)
<. ,
=,
LESS_THAN LESS_THEN_OR_EQUAL.
, 
, , ,
. 
hitEnd Java 1.5,
. , Java 1.6
, , , Java 1.5 
, .
, 

, 
. ( , 
java.util.Scanner.)
boolean hitEnd()
( Java 1.5;
. 467.)


( , 
). 
, \b $.
hitEnd true, 
( 
,
). , false 
,
, 
, 
, , .
, true,
hitEnd , ,
,
.
hitEnd true, ,
,
.
boolean requireEnd()
,
,


,
. , requireEnd true,
, 
, .
, true,
requireEnd, ,
, .
hitEnd requireEnd 
.

,
hitEnd requireEnd
. 8.5 , hitEnd requ
ireEnd lookingAt.

,
.
Table 8.5. Examples of hitEnd and requireEnd after a lookingAt search

     Regex               Text          Match    hitEnd()  requireEnd()
 1   \d+\b | [><]=?      1234          1234     true      true
 2   \d+\b | [><]=?      1234 > 567    1234     false     false
 3   \d+\b | [><]=?      >             >        true      false
 4   \d+\b | [><]=?      > 567         >        false     false
 5   \d+\b | [><]=?      >=            >=       false     false
 6   \d+\b | [><]=?      >= 567        >=       false     false
 7   \d+\b | [><]=?      oops          (none)   false
 8   (set | setup)\b     se            (none)   true
 9   (set | setup)\b     set           set      true      true
10   (set | setup)\b     setu          (none)   true
11   (set | setup)\b     setup         setup    true      true
12   (set | setup)\b     set x=3       set      false     false
13   (set | setup)\b     setup x       setup    false     false
14   (set | setup)\b     self          (none)   false
15   (set | setup)\b     oops          (none)   false
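
A few of the rows in the table can be reproduced with a small probe. This is a sketch, assuming a current Java runtime in which the hitEnd bug described in this chapter no longer applies; EndProbe and probe are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EndProbe {
    // Returns { lookingAt(), hitEnd(), requireEnd() } for the regex and text.
    // requireEnd() is meaningful only when the match succeeded.
    static boolean[] probe(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        boolean matched = m.lookingAt();
        return new boolean[] { matched, m.hitEnd(), m.requireEnd() };
    }

    public static void main(String[] args) {
        String regex = "\\d+\\b|[><]=?";
        for (String text : new String[] { "1234", "1234 > 567", ">" }) {
            boolean[] r = probe(regex, text);
            System.out.println("[" + text + "] matched=" + r[0]
                    + " hitEnd=" + r[1] + " requireEnd=" + r[2]);
        }
    }
}
```

A tokenizer reading from a stream could use these flags to decide whether to fetch more input before accepting a match.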

. 8.5 
: ,


,  . 
, set setup.
, , , 
.
, 5 . 8.5: ,
, 
hitEnd false. , 
, 
( ,
 ).

hitEnd
hitEnd Java 1.5 ( Java 1.6)1 
, hitEnd
: , 

(, 
).
, >=?
(
) 
, = . ,

a|an|the (
) 
, a 
, , 
.
: values?
\r?\n\r?\n.

,
:
( , ), 
 , .
, >=? (?i:>=?),
( 145)

( ).
1

, Sun ,
5.0u9, . . Java 1.5 Update 9. ( 
, . 436 ,
Java 1.5 Update 7.) , Java 1.6 beta 
.


, a|an|the

[aA]|an|the, 
, Pattern.CASE_INSENSITIVE.

Matcher
Matcher ,
:
Matcher reset()

Matcher:
, 

,
( 458). 
( 463).
reset Match
er: replaceAll, replaceFirst find ( ), 
.
Matcher, 
( 463).
Matcher reset(CharSequence )

Matcher, reset(), , ,
String ( ,
CharSequence).

(, 
),
reset ,
Matcher.
Matcher, 
( 463).
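
As an illustration of reset(CharSequence), one Matcher can be reused for many strings instead of creating a new one each time. The sketch below is an assumption-laden example, not the book's code: the class name ResetReuseDemo, the helper countLinesWithNumber and the regex \d+ are chosen only for the demonstration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResetReuseDemo {
    // Counts the lines that contain at least one run of digits,
    // reusing a single Matcher for every line.
    static int countLinesWithNumber(List<String> lines) {
        Matcher m = Pattern.compile("\\d+").matcher(""); // dummy text for now
        int count = 0;
        for (String line : lines) {
            // reset(line) swaps in the new target text and returns the matcher itself
            if (m.reset(line).find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countLinesWithNumber(Arrays.asList("abc", "x 12", "7")));
    }
}
```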
Pattern pattern()
pattern Pattern, 
Matcher. ,
m.pattern().pattern(), 
pattern Pattern ( 
, ).
Matcher usePattern(Pattern p)
, Java 1.5, Pattern, 
Matcher, , p.
reset Matcher, 



Matcher. 
usePattern . 475.
Matcher, 
( 463).
String toString()
, Java 1.5, , 
Matcher, 
. 
, , 
, Java 1.6 :
Matcher m = Pattern.compile("(\\w+)").matcher("ABC 123");
System.out.println(m.toString());
m.find();
System.out.println(m.toString());

:
java.util.regex.Matcher[pattern=(\w+) region=0,7 lastmatch=]
java.util.regex.Matcher[pattern=(\w+) region=0,7 lastmatch=ABC]

Java 1.4.2 Matcher toString,


java.lang.Object,
: java.util.regex.Matcher@480457.

Matcher
Matcher ,
,
, :
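// This pattern, used in the function below,
// is compiled once and saved here for efficiency.
static final Pattern pNeverFail = Pattern.compile("^");
// Returns the target text associated with a Matcher object.
public static String text(Matcher m)
{
    // Remember these items so that they can be restored later.
    Integer regionStart = m.regionStart();
    Integer regionEnd = m.regionEnd();
    Pattern pattern = m.pattern();
    // Fetch the string the only way the class allows.
    String text = m.usePattern(pNeverFail).replaceFirst("");
    // Put back what was changed
    // (or might have been changed).
    m.usePattern(pattern).region(regionStart, regionEnd);
    // Return the text.
    return text;
}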



String, replaceFirst, 
.
Matcher, 
. ( ;
, String, 
); 
, Sun 
.

Pattern
compile, Pattern 
:
split
,
.
String pattern()
, 
Pattern.
String toString()
Java 1.5.
pattern.
int flags()
( ), 
compile Pattern.
static String quote(String )
Java 1.5. 
,
Pattern.compile. , Pat
tern.quote("main()") \Qmain()\E,
\Qmain()\E, 
main().
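
A short sketch of Pattern.quote in use, assuming a hypothetical helper containsLiteral in a class named QuoteDemo (neither appears in the book). It shows why quoting matters: unquoted, the parentheses in main() would be treated as an empty group rather than as literal characters.

```java
import java.util.regex.Pattern;

public class QuoteDemo {
    // True if 'literal' occurs in 'haystack' exactly as written,
    // with every regex metacharacter in it treated as plain text.
    static boolean containsLiteral(String haystack, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(haystack).find();
    }

    public static void main(String[] args) {
        System.out.println(Pattern.quote("main()"));                 // the quoted form
        System.out.println(containsLiteral("int main() {", "main()"));
        System.out.println(containsLiteral("int main {", "main()"));
    }
}
```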
static boolean matches(String , CharSequence )
, 
, (,
matcher, , 
CharSequence, String). 
:
Pattern.compile(regex).matcher(text).matches();


( 


), ,
.
(, 
,
), , 

Pattern, ,
.
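
The equivalence between the static matches method and the longer compile/matcher/matches chain can be checked directly. This is a minimal sketch; the class name MatchesDemo, the two helper methods and the regex \d+ are assumptions used only for illustration.

```java
import java.util.regex.Pattern;

public class MatchesDemo {
    // Convenience form: compiles the regex on every call.
    static boolean isAllDigits(String text) {
        return Pattern.matches("\\d+", text);
    }

    // Long form that the convenience method stands for.
    static boolean isAllDigitsLongForm(String text) {
        return Pattern.compile("\\d+").matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("123"));   // the whole string must match
        System.out.println(isAllDigits("123a"));
    }
}
```

When the same regex is applied many times, compiling it once into a Pattern and keeping that object avoids the repeated compilation.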

split Pattern
String[] split(CharSequence )
(CharSequence) ,
.
split String.
:
String[] result = Pattern.compile("\\.").split("209.204.146.22");

(209, 204, 146 22),


\. .
,
.
, 
, 
:
String[] result = Pattern.compile("\\W+").split(Text);

What's up, Doc (What,


s, up, Doc), 
. \W+ 
\P{L}+ [^\p{L}\p{N}_] ( 438).


, 
, split, (. . 
, ). , 
,
, , 
. :
String[] result = Pattern.compile("\\s*,\\s*").split(", one, two , ,, 3");


. 
: , one, two, 3.
, , 
:

String[] result = Pattern.compile(":").split(":xx:");

: xx. 
split 
, .

split Pattern
String[] split(CharSequence , int )
split 
Pattern,
.
.

split
, 
. 
,
String[] result = Pattern.compile(":").split(":xx:", -1);

( , xx
).

split
,
split (. . ).

split
split ,
. ,
1 (, 
3 , ).
1 , 
,
.
,
Friedl,Jeffrey,Eric Francis,America,Ohio,Rootstown

,
( ).
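String[] NameInfo = Pattern.compile(",").split(Text, 4);
// NameInfo[0] is the family name.
// NameInfo[1] is the given name.
// NameInfo[2] is the middle name (or, here, middle names).
// NameInfo[3] is everything else, which is not needed
// and is simply ignored.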
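
The split behaviors described above (empty elements, dropped trailing empties, and the effect of the limit argument) can be collected into one runnable sketch. The class name SplitDemo and its two helper methods are assumptions for illustration; the inputs are the ones used in the text.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {
    // One-argument form: trailing empty elements are dropped.
    static String[] split(String regex, String text) {
        return Pattern.compile(regex).split(text);
    }

    // Two-argument form: a negative limit keeps trailing empty elements,
    // a positive limit caps the number of elements returned.
    static String[] split(String regex, String text, int limit) {
        return Pattern.compile(regex).split(text, limit);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("\\s*,\\s*", ", one, two , ,, 3")));
        System.out.println(Arrays.toString(split(":", ":xx:")));
        System.out.println(Arrays.toString(split(":", ":xx:", -1)));
        System.out.println(Arrays.toString(
            split(",", "Friedl,Jeffrey,Eric Francis,America,Ohio,Rootstown", 4)));
    }
}
```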




, , . .,
? 
.


WIDTH HEIGHT <img>
, 
<img> HTML
WIDTH HEIGHT. (HTML
StringBuilder, StringBuffer 
, CharSequence.)
, <img>
, .
,
, 
. 

, .1
<img>
SRC, WIDTH HEIGHT .  
(WIDTH HEIGHT) , 
,
.
WIDTH
HEIGHT,
.
, ,
, ,
. (, 
WIDTH , 
, HEIGHT 

; 
.)
, 
.
1

Yahoo! :
<img> !. 
, , 
, <img>
.


( 458)
( 463). :
// Matcher that isolates <img> tags
Matcher mImg = Pattern.compile("(?id)<IMG\\s+(.*?)/?>").matcher(html);
// Matchers that isolate the SRC, WIDTH and HEIGHT attributes within a tag
// (with deliberately naive regexes)
Matcher mSrc    = Pattern.compile("(?ix)\\bSRC   =(\\S+)").matcher(html);
Matcher mWidth  = Pattern.compile("(?ix)\\bWIDTH =(\\S+)").matcher(html);
Matcher mHeight = Pattern.compile("(?ix)\\bHEIGHT=(\\S+)").matcher(html);
int imgMatchPointer = 0; // The first search begins at the start of the string
while (mImg.find(imgMatchPointer))
{
    imgMatchPointer = mImg.end(); // The next <img> search starts where this one ended
    // Look for the attributes within the body
    // of the tag that was just found
    Boolean hasSrc    = mSrc.region(    mImg.start(1), mImg.end(1) ).find();
    Boolean hasHeight = mHeight.region( mImg.start(1), mImg.end(1) ).find();
    Boolean hasWidth  = mWidth.region(  mImg.start(1), mImg.end(1) ).find();
    // If there is a SRC but WIDTH and/or HEIGHT is missing...
    if (hasSrc && (! hasWidth || ! hasHeight))
    {
        java.awt.image.BufferedImage i = // Fetch the image
            javax.imageio.ImageIO.read(new java.net.URL(mSrc.group(1)));
        String size; // Will hold the missing WIDTH and/or HEIGHT attributes
        if (hasWidth)
            // The width is given, so compute the height
            // that keeps the proper aspect ratio
            size = "height='" + (int)(Integer.parseInt(mWidth.group(1)) *
                   i.getHeight() / i.getWidth()) + "' ";
        else if (hasHeight)
            // The height is given, so compute the width
            // that keeps the proper aspect ratio
            size = "width='" + (int)(Integer.parseInt(mHeight.group(1)) *
                   i.getWidth() / i.getHeight()) + "' ";
        else
            // Neither is given, so insert
            // the actual size of the image
            size = "width='" + i.getWidth() + "' " +
                   "height='" + i.getHeight() + "' ";
        html.insert(mImg.start(1), size); // Update the HTML in place
        imgMatchPointer += size.length(); // Account for the inserted text
    }
}

, 
. 
,
, 


. , 
, 
. ( . 253 
Perl
HTML.) 
URL ,
, 
.
,
.

HTML

Matcher
Java Perl, 
HTML ( 172). 
usePattern .

, \G. 
. 172.
Pattern pAtEnd   = Pattern.compile("\\G\\z");
Pattern pWord    = Pattern.compile("\\G\\w+");
Pattern pNonHtml = Pattern.compile("\\G[^\\w<>&]+");
Pattern pImgTag  = Pattern.compile("\\G(?i)<img\\s+([^>]+)>");
Pattern pLink    = Pattern.compile("\\G(?i)<A\\s+([^>]+)>");
Pattern pLinkX   = Pattern.compile("\\G(?i)</A>");
Pattern pEntity  = Pattern.compile("\\G&(#\\d+|\\w+);");

Boolean needClose = false;
Matcher m = pAtEnd.matcher(html); // Any Pattern object can create the Matcher
while (! m.usePattern(pAtEnd).find())
{
    if (m.usePattern(pWord).find()) {
        // ... a word is in m.group();
        // it can now be processed ...
    } else if (m.usePattern(pImgTag).find()) {
        // ... an <img> tag was found ...
    } else if (! needClose && m.usePattern(pLink).find()) {
        // ... a link anchor was found ...
        needClose = true;
    } else if (needClose && m.usePattern(pLinkX).find()) {
        System.out.println("/LINK [" + m.group() + "]");
        needClose = false;
    } else if (m.usePattern(pEntity).find()) {
        // Allow entities such as &gt; and &#123;
    } else if (m.usePattern(pNonHtml).find()) {
        // Other (non-word) text that is not HTML markup;
        // simply allow it
    } else {
        // Nothing matched at this point, so this must be an error.
        // Grab a dozen or so characters at the current location
        // so that an informative message can be issued.
        m.usePattern(Pattern.compile("\\G(?s).{1,12}")).find();
        System.out.println("Bad char before '" + m.group() + "'");
        System.exit(1);
    }
}
if (needClose) {
    System.out.println("Missing Final </A>");
    System.exit(1);
}

java.util.regex , ,
pNonHtml 
, 
, 
. ,
, 
, 
. Sun.

find, ? ,
.

CSV
CSV 6
(330) java.util.regex. 
:
(184) .
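String regex = // Puts a quoted field into group(1),
               // an unquoted field into group(2)
    " \\G(?:^|,)                                      \n"+
    " (?:                                             \n"+
    "    # Either a double-quoted field ...           \n"+
    "    \" # the field's opening quote               \n"+
    "     ( [^\"]*+ (?: \"\" [^\"]*+ )*+ )            \n"+
    "    \" # the field's closing quote               \n"+
    "  | # ... or ...                                 \n"+
    "    # some text without quotes or commas ...     \n"+
    "    ([^\",]*+)                                   \n"+
    " )                                               \n";

// Create a Matcher for the line of CSV text,
// using the regex above.
Matcher mMain = Pattern.compile(regex, Pattern.COMMENTS).matcher(line);

// Create a Matcher for "", with dummy target text for now.
Matcher mQuote = Pattern.compile("\"\"").matcher("");
while (mMain.find())
{
    String field;
    if (mMain.start(2) >= 0)
        field = mMain.group(2); // The field is unquoted; use it as is
    else
        // The field is quoted, so each pair of double quotes
        // must be replaced with a single double quote.
        field = mQuote.reset(mMain.group(1)).replaceAll("\"");
    // The field can now be used ...
    System.out.println("Field [" + field + "]");
}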

, . 273,
; , 
( 6, . 330),
, Matcher
.

Java
,
Java 1.5.0. 
Java 1.4.2  Java 1.6
( , ).
, ,
, 1.4.2 1.5.0
(Update 7) 1.5.0 59g 
Java 1.6.

1.4.2 1.5.0
Java 1.5.0 Java 1.4.2 
.
Matcher. 
, . 
.1

(?1), Ja
va 1.4.2, Java 1.5.0. 
PCRE (564), 
Java.


Java 1.5.0
Matcher,
Java 1.4.2:
region
regionStart
regionEnd
useAnchoringBounds
hasAnchoringBounds
useTransparentBounds
hasTransparentBounds


find,
. 476.
java.util.regex,
. 476, 
,
find .

find , , 
.
, . 476,
:
Pattern pWord    = Pattern.compile("\\G\\w+");
Pattern pNonHtml = Pattern.compile("\\G[^\\w<>&]+");
Pattern pImgTag  = Pattern.compile("\\G(?i)<img\\s+([^>]+)>");
Pattern pLink    = Pattern.compile("\\G(?i)<A\\s+([^>]+)>");
Pattern pLinkX   = Pattern.compile("\\G(?i)</A>");
Pattern pEntity  = Pattern.compile("\\G&(#\\d+|\\w+);");
Boolean needClose = false;
Matcher m = pWord.matcher(html); // Any Pattern object can create the Matcher
Integer currentLoc = 0;          // Start at the beginning of the string
while (currentLoc < html.length())
{
    if (m.usePattern(pWord).find(currentLoc)) {
        // ... a word is in m.group();
        // it can now be processed ...
    } else if (m.usePattern(pImgTag).find(currentLoc)) {
        // ... an <img> tag was found ...
    } else if (! needClose && m.usePattern(pLink).find(currentLoc)) {
        // ... a link anchor was found ...
        needClose = true;
    } else if (needClose && m.usePattern(pLinkX).find(currentLoc)) {
        System.out.println("/LINK [" + m.group() + "]");
        needClose = false;
    } else if (m.usePattern(pEntity).find(currentLoc)) {
        // Allow entities such as &gt; and &#123;
    } else {
        // Nothing matched at this point, so this must be an error.
        // Grab a dozen or so characters at the current location
        // so that an informative message can be issued.
        m.usePattern(Pattern.compile("\\G(?s).{1,12}")).find(currentLoc);
        System.out.println("Bad char at '" + m.group() + "'");
        System.exit(1);
    }
    currentLoc = m.end(); // The "current location" is now where
                          // the previous match ended
}
if (needClose) {
    System.out.println("Missing Final </A>");
    System.exit(1);
}
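
The same technique (one Matcher, several \G-anchored patterns swapped in with usePattern, and an explicit position passed to find) can be tried on a much smaller input. The sketch below is an illustration only: the class name TinyLexer, the token names and the four patterns are assumptions, loosely based on the var<34 tokenizing example mentioned in the hitEnd discussion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TinyLexer {
    static final Pattern P_SPACE = Pattern.compile("\\G\\s+");
    static final Pattern P_IDENT = Pattern.compile("\\G[A-Za-z_]\\w*");
    static final Pattern P_INT   = Pattern.compile("\\G\\d+");
    static final Pattern P_OP    = Pattern.compile("\\G[<>]=?");

    // Splits the text into tokens, trying each pattern at the current position.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = P_IDENT.matcher(text); // any Pattern can create the Matcher
        int pos = 0;
        while (pos < text.length()) {
            if (m.usePattern(P_SPACE).find(pos)) {
                // whitespace separates tokens and is otherwise ignored
            } else if (m.usePattern(P_IDENT).find(pos)) {
                tokens.add("IDENTIFIER(" + m.group() + ")");
            } else if (m.usePattern(P_INT).find(pos)) {
                tokens.add("INTEGER(" + m.group() + ")");
            } else if (m.usePattern(P_OP).find(pos)) {
                tokens.add("OPERATOR(" + m.group() + ")");
            } else {
                throw new IllegalArgumentException("Unexpected character at index " + pos);
            }
            pos = m.end(); // the next attempt starts where this match ended
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("var<34"));
        System.out.println(tokenize("var <= 34"));
    }
}
```

Because every pattern begins with \G, a pattern either matches exactly at the current position or fails; nothing is skipped silently.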


find, 
Matcher, ,
.
,
region
find, :
m.usePattern(pWord).region(start,end).find(currentLoc)

Matcher, Java 1.4.2:

toMatchResult

hitEnd

requireEnd

usePattern

toString

Java 1.4.2:

Pattern.quote

1.4.2 1.5.0
Java 1.4.2 Java 1.5.0, 
:


Java 1.5.0 4.0.0, Ja


va 1.4.2 3.0.0.
, (, 3.0.0
\uFFFF),
.
,
\p{} \P{}. (
Java, Charac
ter.UnicodeBlock.)
Java 1.4.2 : 
In. ,
: \p{InHangulJamo} \p{InAra
bicPresentationFormsA}.
Java 1.5.0 
. , In 
; , \p{InHan
gulJamo} \p{InArabicPresentationFormsA}. ,
In Java (
,
): \p{In
Hangul_Jamo} \p{InArabic_Presentation_Forms_A}.
Java 1.5.0 , 
1.4.2,  Arabic
Presentation Forms!B Latin Extended!B, 
B Bound, . .
\p{InArabicPresentationFormsBound} \p{InLatinExtendedBound}.
Java 1.5.0 Character isSomet
hing ( 441).

1.5.0 1.6.0
Java 1.6, (
), 
1.5.0, :
Java 1.6 Pi Pf, 
.
Java 1.6 \Q\E,
.

9
.NET
Microsoft .NET Framework, Visual Basic,
C# C++ ( ), 
,
. 
, , 
.1

, ,

.NET. Visual Basic.

,
,
16. , ,
.NET, , 
( 
) . 1, 2 3
, ,
, 4, 5 6
, 
.NET.
, ,

, , ,
.
1

.NET 2.0 ( 
Visual Studio 2005).


, , ,
. 483, , . 149 160 3, 
.
.
.NET.
, ,

.NET. 
.NET, 
, 
, .


(shared assembly).

.NET
.NET
, , 
4, 5 6. . 9.1 
.NET.
3.
( 145),
, 
, 

(?) (?:). 
. 9.2 . 485.

\w. 
VB.NET ("\w") (ver
batim) C# (@"\w"). , 
(, C++),
\ \
("\\w"). . 
( 135).
. 9.1:
\b Backspace ;
( 174).
\x##
(, \xFCber ber).
\u#### 
(, \u00FCber ber,
a \u20AC ).


.NET Framework 2.0


, , [a-z-[aeiou]] ASCII
( 163). , 
, ,
, , .
9.1. .NET

151 (C)

\a [\b] \e \f \n \r \t \0 \x## \u#### \c


155
: [] [^]
156

: (
)

158 (C) : \w \d \s \W \D \S
157 (C) : \{} \{}

169

/ : ^ \A

169

/ : $ \Z \z

171

: \G

174

: \b \

175

: (?=) (?!) (?<=) (?<!)


176
177

: (?). 
: s m i n (485)
: (?:)

177

: (?#)


178

: () \1 \2 ...

515

: (?<>)

485
178

: (?<>)
\k<>
: (?:)

180

: (?>)

181

: |

183

: * + ? {n} {n,} {min, max}

184

: *? +? ?? {n}? {n,}? {min, max}?

486

: (?if then | else) if


, () ()

(C) .


\w, \d \s ( )

, RegexOptions.ECMAScript ( 489)
ASCII.
\w \p{Ll}, \p{Lu},
\p{Lt}, \p{Lo}, \p{Nd} \{}. : \p{Lm}
( . 160).
\s [\f\n\r\t\v\x85\p{Z}]. U+0085 
, a \p{Z}
 ( 159).
\{} \P{} 
4.0.1. 
.

Is ( . 164) 

. , \p{Is_Greek_Extended} \p{Is Greek Exten
ded} ; \p{IsGreekEx
tended}.
, \p{Lu};
\p{Lowercase_Letter} .

(. . \pL \{L} 
) ( . 159 160).
\p{L&}, 
\p{All}, \p{Assigned} \p{Unassigned} 
, (?s:), \P{Cn}
\p{Cn} .
\G , 
, 
( 171).

.
.NET 
, 
( 175).
RegexOptions.ExplicitCapture, 
(?n), 
(). (
(?<num>\d+)) ( 180). 

()

(?:).


9.2. .NET
RegexOptions (?)
.Singleline


(146)

.Multiline


^ $ (147)

.IgnorePatternWhitespace


( 103).

.IgnoreCase

ASCII

.ExplicitCapture

(), 

(?<>)

.ECMAScript

\w, \d \s
ASCII; ,
(489).

.RightToLeft


, 
(. .
).
, 
(488)

.Compiled


,
( 486)

.NET

.


.NET ( 180)
(?<>) (?''). 
; , 
<> , 
.

\k<> \k''.
( , Match 
.NET . 494)
Groups()
Match ( # Groups[]).


( 502)
${}.

, , 
. 
, :

1 1 3

3 2

(\w) (?<Num>\d+) (\s+)

, \d+, 
Groups("Num") Groups(3). 
( ).


. 
, 
.
Split
( 504) $+
( 502).


if (? if then | else) ( 182) 
,
( ). 
( ) 
(. .
(?=)). 
: , 
(?<Num>), (Num)
(?(Num) then | else) (?=Num), . . 
Num). ,
if .

. (?=) 
,
, 
if .


, ,

, . .NET
(parsing), 


, 
.
:

. 
,
, 
. 
( 296).

. RegexOptions.Compiled
, 
. , 

,
MSIL (Microsoft Intermediate Language), , ,
JIT
.
,
. 

.

.
( ) Regex ,
DLL. 
; 
. 
( 513).

RegexOptions.Com
piled ,
, :



RegexOptions.Compiled RegexOptions.Compiled

( 60 )

(515
)

10

( , 
RegexOptions.Compiled) .
550 
1500 . Regex
Options.Compiled 25 
, 10 
. , 



.
, RegexOpti
ons.Compiled , 
, 
. ,
,
. 

.

DLL
Regex. ,
,
( 
DLL, 
). , 


,
. 514.


(. . ,
) 
. , 
, .
? 
? ,
, 

?
\d+
123and456. ,
123, ,
456. ,
, 

.
( ),
\d+ 456 .
, 45 6, 
6, \d+. ,
6.


RegexOptions.RightToLeft 
.NET. ?
... , 
. (
123and456)
456.
, ,
.
, RegexOptions.RightToLeft 
, ,
.

\+
\, ,
, .
, Regex
Options.ECMAScript. 
,

\k<>, (, \08).

RegexOptions.ECMAScript.
RegexOptions.ECMAScript , 
\1\9 
,
, , 
(, \012 ASCII
).
, (. .

). , \000
\377, . , \12
, 
12 ,
.
RegexOptions.ECMAScript
.

ECMAScript
ECMAScript JavaScript1,

1

European Computer Manufacturers Associati


on, . . . 
60
.


. .NET, RegexOpti
ons.ECMAScript, . 
, ECMAScript 
, .
RegexOptions?
RegexOptions.ECMAScript
:
RegexOptlons.lgnoreCase;
RegexOptions.Multiline;
RegexOptions.Compiled.

\w, \d \s ( \W, \D \S) 


ASCII.
\+,

, 
. 
, ()\10 \10
, 0.

.NET
.NET ,
, 
. Microsoft 
, . 
, , ,
. ,
. , 
.NET.


.NET, .

, , 

 . , ; 
.
,
( 492),
Imports System.Text.RegularExpressions

, 
.


, 
TestStr String. ,
.


:
If Regex.IsMatch (TestStr, "^\s*$")
Console.WriteLine("line is empty")
Else
Console.WriteLine("line is not empty")
End If

:
If Regex.IsMatch(TestStr, "^subject:", RegexOptions.IgnoreCase)
Console.WriteLine("line is a subject line")
Else
Console.WriteLine("line is not a subject line")
End If


, 
. , TheNum
.
Dim TheNum as String = Regex.Match (TestStr, "\d+").Value
If TheNum <> ""
Console.WriteLine("Number is:" & TheNum)
End If

:
Dim ImgTag as String = Regex.Match(TestStr, "<img\b[^>]*>", _
RegexOptions.IgnoreCase).Value
If ImgTag <> ""
Console.WriteLine("Image tag:" & ImgTag)
End If


(. . $1)
:
Dim Subject as String = _
Regex.Match(TestStr, "^Subject: (.*)").Groups(1).Value
If Subject <> ""
Console.WriteLine("Subject is:" & Subject)
End If

C# Groups(1) Groups[1].
To :

Dim Subject as String = _
Regex.Match(TestStr, "^subject: (.*)", _
RegexOptions.IgnoreCase).Groups(1).Value
If Subject <> ""
Console.WriteLine("Subject is: " & Subject)
End If

To , :
Dim Subject as String = _
Regex.Match(TestStr, "^subject: (?<Subj>.*)", _
RegexOptions.IgnoreCase).Groups("Subj").Value
If Subject <> ""
Console.WriteLine("Subject is: " & Subject)
End If



HTML, HTML 
:
TestStr = Regex.Replace(TestStr, "&", "&amp;")
TestStr = Regex.Replace(TestStr, "<", "&lt;")
TestStr = Regex.Replace(TestStr, ">", "&gt;")
Console.WriteLine("Now safe in HTML: " & TestStr)

( ) ,
. 502. , $& 
, . 
, ,
<B></B>:
TestStr = Regex.Replace(TestStr, "\b[A-Z]\w*", "<B>$&</B>")
Console.WriteLine("Modified string: " & TestStr)

<B></B> ( ) 
<I></I>:
TestStr = Regex.Replace(TestStr, "<b>(.*?)</b>", "<I>$1</I>", _
RegexOptions.IgnoreCase)
Console.WriteLine("Modified string: " & TestStr)


.NET

. ,
,
:
Option Explicit On ' These options are not required for using regexes,
Option Strict On   ' but enabling them is good general practice.
' Make the regex-related classes
' easily available.
Imports System.Text.RegularExpressions
Module SimpleTest
Sub Main()
    Dim SampleText as String = "this is the 1st test string"
    Dim R as Regex = New Regex("\d+\w+") ' Compile the pattern.
    Dim M as Match = R.Match(SampleText) ' Apply it to the string.
    If not M.Success
        Console.WriteLine("no match")
    Else
        Dim MatchedText as String = M.Value ' Query the results ...
        Dim MatchedFrom as Integer = M.Index
        Dim MatchedLen as Integer = M.Length
        Console.WriteLine("matched [" & MatchedText & "]" & _
                          " from char#" & MatchedFrom.ToString() & _
                          " for " & MatchedLen.ToString() & " chars.")
    End If
End Sub
End Module


\d+\w+ :
matched [1st] from char#12 for 3 chars.


Imports System.Text.RegularExpressions
? VB, 
,
.
# :
using System.Text.RegularExpressions; // #

.
:
Dim R as Regex = New Regex("\d+\w+") ' .
Dim M as Match = R.match(SampleText) ' .

:
Dim M as Match = Regex.Match(SampleText, "\d+\w+") '
' no .

, 
,
. 
( 511). 
, 


, 
Regex.Match , .
, 
, :
Option Explicit On
Option Strict On
Imports System.Text.RegularExpressions

VB,
. 129, 132, 257, 273 291.


, 
.NET. 
, 
. .NET

, , 
. 9.1.
\s+(\d+) May 16, 1998.

[Figure 9.1. The .NET regex object model. A Regex object built from "\s+(\d+)" is applied with Match("Mar 16, 1998"). The resulting Match object (Success true, Value " 16", Index 3, Length 3, Groups.Count 2) contains Groups(0) and Groups(1); Groups(1) is a Group object (Success true, Value "16", Index 4, Length 2). NextMatch() returns a second Match object (Value " 1998", Index 7, Length 5) whose Groups(1) has Value "1998", Index 8, Length 4. A further NextMatch() returns Match.Empty (Success false).]


Regex
Regex:
Dim R as Regex = New Regex("\s+(\d+)")

,
\s+(\d+), R.
Regex Match(),
:
Dim M as Match = R.Match("May 16, 1998")

Match
Match() Regex
Match. Match 
, Success ( 
) Value ( ).
Match .
Match , 
. 
Perl , 
, $1. .NET
.
Groups Match, Groups(1).Value 
Perl $1 ( : #
, Groups[1].Value).
Result ( . 508).

Group
Groups(1) 
Group, .Value Value 
(. . , ).
Group;
, 
.
, MatchObj.Value MatchObj.Groups(0)
. ,
,
,
MatchObj.Groups.Count ( , 
Match).

\s+(\d+) MatchObj.Groups.Count ( 
$1).

Capture
Capture .
. 516.




( , . .) 
Match .
Match, Group (
), .



. , Regex, 
Match
Group.

Regex, , , 
.NET
Regex.
, .NET.
, 
Object.

Regex
Regex , 
( ),
( 
). :
Dim StripTrailWS = new Regex("\s+$") '

Regex 
; .
:
Dim GetSubject = new Regex("^subject: (.*)", RegexOptions.IgnoreCase)

RegexOptions,
, 
OR:
Dim GetSubject = new Regex("^subject: (.*)", _
RegexOptions.IgnoreCase OR RegexOptions.Multiline)



ArgumentException. 
, , 
,


(, 
). :
Dim R As Regex
Try
R = New Regex(SearchRegex)
Catch e As ArgumentException
Console.WriteLine("*ERROR* bad regex: " & e.ToString)
Exit Sub
End Try

, 
.

Regex
Regex :
RegexOptions.IgnoreCase
, 
( 145).
RegexOptions.IgnorePatternWhitespace

( 146). #,

, 
.
VB.NET 
chr(10), :
Dim R as Regex = New Regex( _
   "# Match a floating-point number ...          " & chr(10) & _
   " \d+(?:\.\d*)? # with a leading digit ...    " & chr(10) & _
   " |             # or ...                      " & chr(10) & _
   " \.\d+         # with a leading decimal point", _
   RegexOptions.IgnorePatternWhitespace)

; VB.NET 
(?#).
Dim R as Regex = New Regex( _
   "(?# Match a floating-point number ...          )" & _
   " \d+(?:\.\d*)? (?# with a leading digit ...    )" & _
   " |             (?# or ...                      )" & _
   " \.\d+         (?# with a leading decimal point)", _
   RegexOptions.IgnorePatternWhitespace)

RegexOptions.Multiline
, 
. 
^ $ 
, .


RegexOptions.Singleline
, 
( 146).
, 
.
RegexOptions.ExplicitCapture
(), 
, ()
(?:). 
(?<>).

,
RegexOptions.ExplicitCapture
.
RegexOptions.RightToLeft

( 488).
RegexOptions.Compiled

, .
, 

.
( 
), RegexOptions.Compi
led ,
Regex.
, , 
.
, . 291,
. 
. 513.
RegexOptions.ECMAScript

ECMAScript ( 489). ,
ECMAScript , 
.
RegexOptions.None
;
RegexOptions.
,
OR.


Regex
,
.
Regex,
.
RegexObj.IsMatch()
RegexObj.IsMatch(, )


: Boolean

IsMatch , 
, 
. :
Dim R as Regex = New Regex("^\s*$")
...
If R.IsMatch(Line) Then
    ' The line is blank ...
    ...
End If

IsMatch , 

.
RegexObj.Match()
RegexObj.Match(, )
RegexObj.Match(, , _)


: Match

Match , 
, Match.
Match (
, . .)
.
Match . 507.
Match ,

.
_,

, .
, 
, ^ , a $
. ,
.
Match , 

 .


, 
_.

Results when RegexObj is built from each of the following regexes:

Method call                              \d\d    ^\d\d   ^\d\d$
RegexObj.Match("May 16, 1998")           16      fail    fail
RegexObj.Match("May 16, 1998", 9)        99      fail    fail
RegexObj.Match("May 16, 1998", 9, 2)     99      99      99

RegexObj.Matches()
RegexObj.Matches(, )

:
MatchCollection

Matches Match,
Match, ,
Match, . 
MatchCollection.
, :
Dim R as New Regex("\w+")
Dim Target as String = "a few words"


Dim BunchOfMatches as MatchCollection = R.Matches (Target)
Dim I as Integer
For I = 0 to BunchOfMatches.Count - 1
Dim MatchObj as Match = BunchOfMatches.Item(I)
Console.WriteLine("Match: " & MatchObj.Value)
Next

:
Match: a
Match: few
Match: words

, ,
MatchCollection:
Dim MatchObj as Match
For Each MatchObj in R.Matches(Target)
Console.WriteLine("Match: " & MatchObj.Value)
Next

, ,
Match ( Matches):
Dim MatchObj as Match = R.Match(Target)
While MatchObj.Success
Console.WriteLine("Match: " & MatchObj.Value)
MatchObj = MatchObj.NextMatch()
End While


RegexObj.Replace(, )
RegexObj.Replace(, , )
RegexObj.Replace(, , , )


: String

Replace 
(, ).
, Regex,
Match .
. ,
MatchEvaluator. 
,
, . :
Dim R_CapWord as New Regex("\b[A-Z]\w*")
..
.
Text = R_CapWord.Replace(Text, "<B>$0</B>")

, ,
<B></B>.
Replace , 
( 
). , 
, 1.
, 
, Replace
. 1 
( ,
).
Replace , 

. 
.
, (. . 
)
:
Dim AnyWS as New Regex("\s+")
..
.
Target = AnyWS.Replace (Target, " ")

somerandomspacing 
somerandomspacing. 
, :
Dim AnyWS     as New Regex("\s+")
Dim LeadingWS as New Regex("^\s+")
...
Target = AnyWS.Replace(Target, " ", -1, LeadingWS.Match(Target).Length)


somerandomspacing so
merandomspacing. , LeadingWS, 
( , ) 
.
Match, LeadingWS.Match(Target):
Length 
, .
Length ; , AnyWS 
.


Regex.Replace Match.Result ,
. 

:

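Sequence      Replaced by
$&            The text matched by the regex (also available as $0)
$1, $2, ...   The text matched by the corresponding capturing parentheses
${name}       The text matched by the corresponding named capture
$`            The text of the target string before the match
$'            The text of the target string after the match
$$            A single $ character
$_            A copy of the entire original target string
$+            (See the discussion below)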

$+
. 
Perl $+, 
,
( . 253). .NET

, .


(485).
$ 
, , 
.




. (
).
, .
, 

.
MatchEvaluator
. Match,
, 
, 
.
, 
:
Target = R.Replace(Target, "<<$&>>")
Function MatchFunc(ByVal M as Match) as String
return M.Result("<<$&>>")
End Function
Dim Evaluator as MatchEvaluator = New MatchEvaluator(AddressOf MatchFunc)
.
.
.
Target = R.Replace(Target, Evaluator)


<<>>. 
, 
. Match
Func :
Function MatchFunc(ByVal M as Match) as String
' $1
'
Dim Celsius as Double = Double.Parse(M.Groups(1).Value)
Dim Fahrenheit as Double = Celsius * 9/5 + 32
Return Fahrenheit & "F" ' "F"
End Function
Dim Evaluator as MatchEvaluator = New MatchEvaluator(AddressOf MatchFunc)
.
.
.
Dim R_Temp as Regex = New Regex("(\d+)C\b", RegexOptions.IgnoreCase)
Target = R_Temp.Replace(Target, Evaluator)

Target Temp is 37C., 


Temp is 98.6F..
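
For readers who followed the Java chapter, the same "compute the replacement in code" idea can be sketched with an appendReplacement/appendTail loop. This is a rough Java analog written for illustration, not a .NET API and not code from the book; the class name CelsiusToFahrenheit and the method convert are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CelsiusToFahrenheit {
    static final Pattern R_TEMP =
        Pattern.compile("(\\d+)C\\b", Pattern.CASE_INSENSITIVE);

    // Replaces every Celsius value such as "37C" with its Fahrenheit equivalent.
    static String convert(String target) {
        Matcher m = R_TEMP.matcher(target);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // The number captured by the first set of parentheses
            double celsius = Double.parseDouble(m.group(1));
            double fahrenheit = celsius * 9 / 5 + 32;
            // quoteReplacement keeps $ and \ in the computed text literal
            m.appendReplacement(out, Matcher.quoteReplacement(fahrenheit + "F"));
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("Temp is 37C."));
    }
}
```

The loop body plays the role of the MatchEvaluator function: it receives each match and decides what text replaces it.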


RegexObj.Split()
RegexObj.Split(, )
RegexObj.Split(, , )


:
String

Split , 
, , (
. :
Dim R as New Regex("\.")
Dim Parts as String() = R.Split("209.204.146.22")

R.Split
(209, 204, 146 22), \. .
Replace ,
(
,
). , Split
, . 
,
:
Dim R as New Regex("\.")
Dim Parts as String() = R.Split("209.204.146.22", 2)

Parts : 209 204.


146.22.
Split , 

. 
( RegexOptions:RightToLeft
).

Split

, ,
( , ,
). :
20061231 04/12/2007
[/]:
Dim R as New Regex("[-/]")
Dim Parts as String() = R.Split(MyDate)

( ). 

([/,]) Split : MyDate
20061231, 2006, , 12, 
31. 
($1), .



, (
, 
485).
Split 
, 
. .NET : 
 ,
.
, . 
, 
, 
. 
(\s+)?,(\s+)?. Split this,that
: this, , that. 
this,that
, ( 
) ,
, this that. ,
, 
.

(\s*),(\s*) (
).
.
RegexObj.GetGroupNames()
RegexObj.GetGroupNumbers()
RegexObj.GroupNameFromNumber()
RegexObj.GroupNumberFromName()
( 
)
. 
, 
.
.
RegexObj.ToString()
RegexObj.RightToLeft
RegexObj.Options
Regex
.
ToString() , .
RightToLeft , 
, RegexOptions.RightToLeft. Options Regex


Options, . 
,
:
  0  None                 16  Singleline
  1  IgnoreCase           32  IgnorePatternWhitespace
  2  Multiline            64  RightToLeft
  4  ExplicitCapture     256  ECMAScript
  8  Compiled

128 Microsoft 
.
.



Regex,
R:
' Regex, R
Console.WriteLine("Regex is: " & R.ToString())
Console.WriteLine("Options are: " & R.Options)
If R.RightToLeft
Console.WriteLine("Is RightToLeft: True")
Else
Console.WriteLine("Is RightToLeft: False")
End If
Dim S as String
For Each S in R.GetGroupNames()
Console.WriteLine("Name """ & S & """ is Num #" & _
R.GroupNumberFromName(S))
Next
Console.WriteLine("")
Dim I as Integer
For Each I in R.GetGroupNumbers()
Console.WriteLine("Num #" & I & " is Name """ & _
R.GroupNameFromNumber(I) & """")
Next

Regex, 

New Regex("^(\w+)://([^/]+)(/\S*)")
New Regex("^(?<proto>\w+)://(?<host>[^/]+)(?<page>/\S*)",
RegexOptions.Compiled)


:
Regex is: ^(\w+)://([^/]+)(/\S*)
Options are: 0
Is RightToLeft: False
Name "0" is Num #0
Name "1" is Num #1
Name "2" is Num #2
Name "3" is Num #3

Num #0 is Name "0"
Num #1 is Name "1"
Num #2 is Name "2"
Num #3 is Name "3"

Regex is: ^(?<proto>\w+)://(?<host>[^/]+)(?<page>/\S*)
Options are: 8
Is RightToLeft: False
Name "0" is Num #0
Name "proto" is Num #1
Name "host" is Num #2
Name "page" is Num #3

Num #0 is Name "0"
Num #1 is Name "proto"
Num #2 is Name "host"
Num #3 is Name "page"

Match
Match Match Regex,
Regex.Match (. ) NextMatch
Match. ,
.
Match:
MatchObj.Success
.
, Match.Empty
( 489).
MatchObj.Value
MatchObj.ToString()
.
MatchObj.Length
.
MatchObj.Index
, ,
. , 
( )
( ) .
, , 
Match, RegexOptions.RightToLeft.
MatchObj.Groups
GroupCollection 
Group. ,


Count Item, 
Group. ,
M.Groups(3) Group, 
, M.Groups("HostName")
HostName (,
(?<HostName>) ).
# M.Groups[3] M.Groups["HostName"].
, MatchObj.Gro
ups(0).Value MatchObj.Value.
MatchObj.NextMatch()
NextMatch() 
Match.
MatchObj.Result()
, 
, . 502. 
:
Dim M as Match = Regex.Match(SomeString, "\w+")
Console.WriteLine(M.Result("The first word is '$&'"))


:
M.Result("$`") '
M.Result("$'") '


M.Result("[$`<$&>$']")

Match, \d+
May 16, 1998, May <16>, 1998, 
.
MatchObj.Synchronized()
Match, , 
.
MatchObj.Captures
Captures ,
. 516.

Group
Group 
(
). Group.


GroupObj.Success
.
; ,
(this)|(that) 
. . 181.
GroupObj.Value
GroupObj.ToString()
, . 
.
GroupObj.Length
, .
.
GroupObj.Index
, , 
. ,
( )
( ) .
, 
, Match, RegexOpti
ons.RightToLeft.
GroupObj.Captures
Group Captures, . 516.



. 490, 
Regex.
:
Regex.IsMatch(, )
Regex.IsMatch(, , )
Regex.Match(, )
Regex.Match(, , )
Regex.Matches(, )
Regex.Matches(, , )
Regex.Replace(, , )
Regex.Replace(, , , )
Regex.Split(, )
Regex.Split(, , )


Regex , . 


Regex, 
( ,
, ).
, :
If Regex.IsMatch(Line, "^\s*$")
.
.
.

Dim TemporaryRegex = New Regex("^\s*$")


If TemporaryRegex.IsMatch(Line)
.
.
.

, :
If New Regex("^\s*$").IsMatch(Line)
.
.
.

, 

.  
( 127). 

( ).

, 
.
(,
) 
( 296).

Regex 
. ,
, .NET ,
: 
 .


Regex 
, ,
. 
, , .NET,
. 
Regex,
, 
. 


,
.
, .NET 
, 
. 
6 ( 299), ,

, Regex
,
Regex.
15
. 15 , 
, , , 


Regex .
(15 ) 
,
:
Regex.CacheSize = 123

, 
.


, , 
:
Regex.Escape()
Regex.Escape() 
.

.
, SearchTerm ,
. 
:
Dim UserRegex as Regex = New Regex("^" & Regex.Escape(SearchTerm) & "$", _
RegexOptions.IgnoreCase)

, 
, 
. Escape 
SearchTerm, :), 
ArgumentException ( 496).


Regex.Unescape()
, 
, 
\ . , \:\\)
:).
Unescape 
. \n, 

. \u1234, 
. 
, Unescape,
. 483.
Match.Empty
Match, 
. , 
Match, 
( ,
). :
Dim SubMatch as Match = Match.Empty ' ,
'
..
.
Dim Line as String
For Each Line in EmailHeaderLines
' , ...
Dim ThisMatch as Match = Regex.Match(Line, "^Subject:\s*(.*)", _
RegexOptions.IgnoreCase)
If ThisMatch.Success
SubMatch = ThisMatch
End If
..
.
Next
..
.
If SubMatch.Success
Console.WriteLine(SubMatch.Result("The subject is: $1"))
Else
Console.WriteLine("No subject!")
End If

EmailHeaderLines 
( Subject), 
SubMatch , , 
, SubMatch
. 
Match.Empty 
.


Regex.CompileToAssembly()
, Regex, 
.

.NET

.NET: 
, 
, .NET, Capture.


.NET Regex . 

. , , ,
.
bin 
JfriedlsRegexLibrary.DLL.
.
Visual Studio .NET ,
Project > Add Reference.
,

Imports jfriedl

, 
:
Dim FieldRegex as CSV.GetField = New CSV.GetField ' Creates a new Regex object
...
Dim FieldMatch as Match = FieldRegex.Match(Line) ' Apply the regex
                                                 ' to a string
While FieldMatch.Success
    Dim Field as String
    If FieldMatch.Groups(1).Success
        Field = FieldMatch.Groups("QuotedField").Value
        Field = Regex.Replace(Field, """""", """") ' Replace two double quotes with one
    Else
        Field = FieldMatch.Groups("UnquotedField").Value
    End If
    Console.WriteLine("[" & Field & "]" )
    ' Field can now be used
    FieldMatch = FieldMatch.NextMatch
End While



jfriedl, 
jfriedl.CSV. Regex 
:
Dim FieldRegex as GetField = New GetField ' Regex

.

:
Dim FieldRegex as jfriedl.CSV.GetField = New jfriedl.CSV.GetField

, 
. ,
.



. ,
(DLL) Regex:
jfriedl.Mail.Subject, jfriedl.Mail.From jfriedl.CSV.GetField.
. 
, 
.
: RegexOptions.Compiled ,
.
Option Explicit On
Option Strict On
Imports System.Text.RegularExpressions
Imports System.Reflection
Module BuildMyLibrary
Sub Main()
    ' Each call to the RegexCompilationInfo constructor supplies
    ' the pattern, the regex options, the name within the class,
    ' the class name, and a Boolean saying whether the new class
    ' is public. The first class, for example, is available
    ' to programs that use this assembly as "jfriedl.Mail.Subject".
    Dim RCInfo() as RegexCompilationInfo = {                    _
        New RegexCompilationInfo(                               _
            "^Subject:\s*(.*)", RegexOptions.IgnoreCase,        _
            "Subject", "jfriedl.Mail", true),                   _
        New RegexCompilationInfo(                               _
            "^From:\s*(.*)", RegexOptions.IgnoreCase,           _
            "From", "jfriedl.Mail", true),                      _
        New RegexCompilationInfo(                               _
            "\G(?:^|,)                                      " & _
            "(?:                                            " & _
            "  (?# Either a double-quoted field ... )       " & _
            "  "" (?# the field's opening quote )           " & _
            "   (?<QuotedField> (?> [^""]+ | """" )* )      " & _
            "  "" (?# the field's closing quote )           " & _
            "  (?# ... or ... )                             " & _
            " |                                             " & _
            "  (?# ... some text without quotes or commas ... ) " & _
            "  (?<UnquotedField> [^"",]* )                  " & _
            ")",                                                _
            RegexOptions.IgnorePatternWhitespace,               _
            "GetField", "jfriedl.CSV", true)                    _
    }
    ' Build the assembly and write it out
    Dim AN as AssemblyName = new AssemblyName()
    AN.Name = "JfriedlsRegexLibrary"
    ' This becomes the name of the DLL file
    AN.Version = New Version("1.0.0.0")
    Regex.CompileToAssembly(RCInfo, AN) ' Build everything
End Sub
End Module

Microsoft 

( , 
). 
, .
,
:
Dim R As Regex = New Regex(" \(                    " & _
                           "   (?>                 " & _
                           "       [^()]+          " & _
                           "     |                 " & _
                           "       \( (?<DEPTH>)   " & _
                           "     |                 " & _
                           "       \) (?<-DEPTH>)  " & _
                           "   )*                  " & _
                           "   (?(DEPTH)(?!))      " & _
                           " \)                    ", _
       RegexOptions.IgnorePatternWhitespace)


(, before (nope (yes
(here) okay) after. ,
.


:
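1. With each ( matched, (?<DEPTH>) adds one to the regex's count of how
   deeply the parentheses are currently nested (not counting the initial
   \( at the start of the regex).
2. With each ) matched, (?<-DEPTH>) subtracts one from that depth.
3. (?(DEPTH)(?!)) ensures that the depth is zero before the final
   literal \) is allowed to match.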
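
The counting that the three steps describe can be mirrored in ordinary code. The sketch below is not a regex and not a .NET feature; it is a plain Java loop, with the hypothetical names BalancedParens and firstBalanced, that keeps the same depth counter and returns the first properly nested parenthesized span.

```java
public class BalancedParens {
    // Returns the first substring that starts with '(' and ends at the
    // ')' that balances it, or null if there is no such substring.
    static String firstBalanced(String s) {
        for (int start = 0; start < s.length(); start++) {
            if (s.charAt(start) != '(') {
                continue; // a match must begin with an opening parenthesis
            }
            int depth = 0; // nesting depth beyond the initial '('
            for (int i = start + 1; i < s.length(); i++) {
                char c = s.charAt(i);
                if (c == '(') {
                    depth++;                 // step 1: one level deeper
                } else if (c == ')') {
                    if (depth == 0) {        // step 3: depth is zero, so this
                        return s.substring(start, i + 1); // is the closing ')'
                    }
                    depth--;                 // step 2: one level shallower
                }
            }
            // The end was reached with the first '(' still open; try a later start.
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstBalanced("before (nope (yes (here) okay) after"));
    }
}
```

As with the regex, the unclosed "(nope" cannot begin a match, so the result starts at the next opening parenthesis.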

. (?<DEPTH>)

() , .
\(, 
( ) 
.
, DEPTH, 
, . ,
,
. 
.NET (?<DEPTH>), 
DEPTH. 
, (?<DEPTH>) , 
.
, (?(DEPTH)(?!)) , (?!)
DEPTH.
, 
,
(?<DEPTH>).
( ),
(?!) (
,
).
, 
.NET.

Capture
.NET
Capture, . , 
,
Capture .
Capture Group;
,


. Group, Value (
), Length ( ) Index ( 
).
Group Capture ,
Group Captures, 

, , .
^(..)+ abc
defghijk:
Dim M as Match = Regex.Match("abcdefghijk", "^(..)+")

(..), . .
: abcdefghijk.
, +
, 
ij (. . M.Groups(1).Value ij). , M.Groups(1)
Captures 
ab, cd, ef, gh ij, :
M.Groups(1).Captures(0).Value  is 'ab'
M.Groups(1).Captures(1).Value  is 'cd'
M.Groups(1).Captures(2).Value  is 'ef'
M.Groups(1).Captures(3).Value  is 'gh'
M.Groups(1).Captures(4).Value  is 'ij'
M.Groups(1).Captures.Count     is 5

: ij,
, M.Groups(1).Value. , Group.Value

. M.Groups(1).Value 
:
M.Groups(1).Captures( M.Groups(1).Captures.Count - 1 ).Value
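
For comparison, java.util.regex keeps only the last iteration of a repeated group, so the intermediate values have to be collected another way. The sketch below is an assumption-based illustration (the class name CaptureIterationsDemo and both helpers are invented): it shows the "last value only" behavior and then gathers every pair with a \G-anchored find loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureIterationsDemo {
    // What group 1 holds after ^(..)+ matches: only the final iteration.
    static String lastIteration(String text) {
        Matcher m = Pattern.compile("^(..)+").matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Collects every two-character piece by matching one pair per find().
    static List<String> allIterations(String text) {
        List<String> parts = new ArrayList<>();
        Matcher m = Pattern.compile("\\G(..)").matcher(text);
        while (m.find()) {
            parts.add(m.group(1));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(lastIteration("abcdefghijk"));
        System.out.println(allIterations("abcdefghijk"));
    }
}
```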


Capture:
M.Groups(1).Captures CaptureCollection
Items Count.
Items ,
M.Groups(1).Captures(3) (M.Groups[1].Captures[3]
#).
Capture Success;
Success Group.
, Capture 
Group. Match Cap
tures, . .Captures 
Capture (
, .Captures M.Group(0).Captures). 


,
, Cap
ture. M.Captures M.Group(0).Captures
Group, 
.
Capture ,
, ,  
. 
.NET , ,
. , 
. 
Capture, , , ,
.
,
, 
; 
, .
Capture 
, , , Group Capture ( 
GroupCollection CaptureCollection)
Match. ,
, . ,
Cap
ture .

10
PHP
90 , ,
Web boom, PHP 
, .
, 

. , 
, PHP
, , , 
. , PHP
,
,
.
PHP preg, ereg mb_ereg.
preg.
, 
. (
preg .)

,
,
16. , ,
PHP, , 
(
) . 1, 2 3
, ,
, 4, 5 6
, 
preg


PHP. ,
, 
, , ,
.
, , ,
. 522 . 149 160 3, 
.
.

preg, 
, .
preg, 
, ,
.
preg
preg preg,
, ,
: Perl Regular Expressions ( Perl).
(Andrei Zmievski), 
,
ereg. ( ereg extended re
gular expressions, . . PO
SIX , 
, 
.)
preg,
PCRE (Perl Compatible Regular Expressions 
, Perl) 
,

Perl , .
PCRE, 
Perl, , 
PHP. 
, , , ,
, .
Perl, 
,
,
.
, (Philip Hazel) 
, 
Perl, PCRE
( 3 . 123).


, ,
. ,
, 
,
PHP.
Perl 
PCRE, PHP.
PHP 4.4.3 5.1.4.
PCRE 6.6.1
, PHP, 
, 4.x 5.x,
PHP 5.x . 
PHP
, , PHP
5.x PCRE, 
PHP 4.x.

PHP
. 10.1 
preg. . 10.1:
\b backspace ()
; ( 174).
, 8 ,
.

\0, , NUL.
\x
. \x{} 
.
\x{FF}
u ( 528).
\x{FF} .
UTF8 (
u) , \w,
ASCII. 
\w
\pL ( 159), \d 
\pN, \s \pZ.
1

PHP PCRE,
, ,
PHP 4.4.3 5.1.4 (
). 
.


10.1. preg PHP



151 (C) \ [\b] \e \f \n \r \t \ \x \x{} \c

155

: [] [^] (
POSIX [:alpha:] 166)

156

, : ( 
s )

157 (U) : \
158 (C) : \w \d \s \W \D \S ( 8
)
159 (C) ,
(U) \P{}
157

\p{},

( 
): \C


169

/ : ^ \A

169

/ : $ \Z \z

171

: \G

174

: \b \B ( 8 )

175

: (?=) (?!) (?<=) (?<!)


527

: (?(). :
s m i X U

527

: (?(:)

177

: (?#) ( x, #
)

, ,
178

: () \1 \2 ...

180

: (?P<>) (?P=)

178

: (?:)

180

: (?>)

181

: |

563

: (?R) (?) (?P>)

182

: (?if then|else) if 
, (R) ()

183

: * + ? {n} {n,} {min, max}

184

: *? +? ?? {n}? {n,}? {min, max}?

PHP

523


184

: *+ ++ ?+ {n}+ {n,}+ {min, max}+


177 (C) : \Q \E
(C)
(U) u 528

( PCRE, 
preg 123.)

PHP 4.1.0.

Is In, : \p{Cyrillic} ( 159).

, \p{Lu}, \p{L} \pL ( 159). 
\p{Letter} .
preg 
, \C
, (?s:.) s
.. , u, 
preg 
UTF8, . . ,
6 . \C 
.
. 157.
\z \Z
, \Z 
.
$ m D
( 527): \$ \Z (

, ); 
m 
; D \z (
). 
, m D, D .
, 
, 

( 174).
x ( 
) ASCII. 
.


preg
PHP 
( 127), 
, . 10.2.
,
,
.
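Table 10.2. Overview of the PHP preg functions

Page  Function                 Purpose
531   preg_match               Test whether a regex matches in a string, and pluck data from it
536   preg_match_all           Pluck data from a string
542   preg_replace             Replace matched text within a copy of a string
548   preg_replace_callback    Call a function for each match of a regex within a string
551   preg_split               Partition a string into an array of substrings
556   preg_grep                Select the elements of an array that do (or do not) match a regex
557   preg_quote               Escape regex metacharacters in a string

The following functions are developed in this chapter; they are not part of PHP itself.

538   reg_match                A version of preg_match that is aware of non-participating parentheses
558   preg_regex_to_pattern    Build a preg pattern string from a regex string
562   preg_pattern_error       Check a preg pattern string for syntax errors
562   preg_regex_error         Check a regex string for syntax errors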


, , 
. 
, ,
,
PHP:
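/* Check whether an HTML tag is a <table> tag */
if (preg_match('/^<table\b/i', $tag))
    print "tag is a table tag\n";

/* Check whether the text is an integer */
if (preg_match('/^-?\d+$/', $user_input))
    print "user input is an integer\n";

/* Pluck the HTML <title> from a string */
if (preg_match('{<title>(.*?)</title>}si', $html, $matches))
    print "page title: $matches[1]\n";

/* Treat numbers in the string as Fahrenheit values
   and replace them with Celsius values */
$metric = preg_replace('/(-?\d+(?:\.\d+)?)/e',      /* pattern */
                       'floor(($1-32)*5/9 + 0.5)',  /* replacement, run as PHP code */
                       $string);

/* Build an array of values from a string of simple comma-separated values */
$values_array = preg_split('!\s*,\s*!', $comma_separated_values);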

, Lar
ry,Curly,Moe, : Lar
ry, Curly Moe.


preg
, ,
, 
. '/<table\b/i',
<table\b,
(), 
i ( ).

PHP

, 
PHP . 
3 ( 137), 
, 


. PHP
, , 
. \' \\,
' \ .
\\
,
. 


\ \\, ,

\\, \\\\. , 
!
( 
. 560.)
,

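For example, consider a regex that matches the root path of a Windows disk, such as C:\.
The expression is ^[A-Z]:\\$,
and written as a single-quoted pattern string it becomes '/^[A-Z]:\\\\$/'.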
. 240 ( 5) ,
^.*\\ '/^.*\\\/'
. , ,
:
print '/^.*\/';       prints: /^.*\/
print '/^.*\\/';      prints: /^.*\/
print '/^.*\\\/';     prints: /^.*\\/
print '/^.*\\\\/';    prints: /^.*\\/

,
. \/, 
, , 
, 
. 
\\
\. ,
,
\/, 
. , , 
.
, PHP 
, ,

, .

preg , 
, 
Perl ,
. 
,
 ,
, , . (
. 530.)
, 
 ,


ASCII .

! #.

:
{ ( < [


:
} ) > ]


, . .
: '((\d+))'.
, 
, 
.
: '/(\d+)/'.
, , 
. ,
'/<B>(.*?)<\/B>/i' , 
'!<B>(.*?)</B>!i', 
!!, '{<B>(.*?)</B>}i',
{}.
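The effect of the delimiter choice can be checked directly. The following is a minimal sketch, assuming a hypothetical sample string in $html; it runs the same regex with the three delimiter styles mentioned above and collects what each one captures.

```php
<?php
// Hypothetical sample input, used only for illustration.
$html = 'Some <b>bold</b> text';

// The same regex written with three different delimiter choices.
$patterns = array(
    '/<B>(.*?)<\/B>/i',   // slash delimiters: the slash in </B> must be escaped
    '!<B>(.*?)</B>!i',    // "!" delimiters: no escaping needed
    '{<B>(.*?)</B>}i',    // paired delimiters: "{" opens, "}" closes
);

$results = array();
foreach ($patterns as $pattern) {
    preg_match($pattern, $html, $m);
    $results[] = $m[1];   // text captured by the first set of parentheses
}
print_r($results);
```

All three forms describe the same regex, so they capture the same text; only the amount of escaping differs.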


, 
, 

( PHP ).

i, .
:
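Modifier   Inline   Description
i          (?i)     Ignore letter case during the match
m          (?m)     Enhanced line anchor mode
s          (?s)     Dot-matches-all mode
x          (?x)     Free-spacing and comments mode
u                   Treat the regex and subject string as UTF-8
X          (?X)     Enable PCRE "extra stuff"
e                   Execute the replacement as PHP code (preg_replace only)
S                   Invoke PCRE's "study" optimization
U          (?U)     Swap the greediness of * and *?, and so on
A                   Anchor the entire match to the attempt's initial starting position
D                   $ matches only at the end of the string (EOS), rather than at EOS
                    or before a string-ending newline. (Ignored if the m modifier is used.)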




(, (?i)
, (?i) 176).

, , 
.
(
( 177), (?i:)
, (?sm:) 
s m .


, : si

:
if (preg_match('{<title>(.*?)</title>}si', $html, $captures))

, PHP

3 ( 145). e
preg_replace
( 543).
u 
, 
UTF8.


, 
. (. . u), preg 
8
( 119). u,
, 
UTF8,
. UTF8,
ASCII, ,
u , 
.
X
PCRE, :

, .
, \k 
PCRE, k (
, 
). X
unrecognized character follows \ (
\).
, PHP
PCRE,
. 
( ) 
, 
. X 
, 
.
S study PCRE,
,

.
,
, . 566.

:

A ,
,
\G.
4, 
.

D , $
\z ( 147), . . $ 

.


U 
: * *? , + 
+? , . . ,
, , 
.
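The three less common modifiers A, D, and U are easiest to understand by comparison with the unmodified pattern. This is a small sketch with made-up subject strings; the variable names are illustrative only.

```php
<?php
// D: "$" matches only at the very end of the string.
$line        = "abc\n";                         // subject ends with a newline
$plain       = preg_match('/abc$/',  $line);    // 1: "$" may match before a final newline
$dollar_only = preg_match('/abc$/D', $line);    // 0: with D the trailing newline is in the way

// U: swaps greediness, so ".*" behaves like ".*?".
$html = '<b>one</b> <b>two</b>';
preg_match('/<b>.*<\/b>/',  $html, $greedy);    // greedy: runs to the last </b>
preg_match('/<b>.*<\/b>/U', $html, $lazy);      // lazy: stops at the first </b>

// A: the match must begin where the attempt begins.
$unanchored = preg_match('/\d+/',  'abc 123');  // 1: the engine may bump along to "123"
$anchored   = preg_match('/\d+/A', 'abc 123');  // 0: no digits at the starting position
```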


, , 
, un
known modifier ( ). 
,

.
, HTML :
preg_match('<(\w+)([^>]*)>', $html)

, < 
, preg_match 
, 
( , ). 
< (\w+)([^ >]*)>, 
,
, , 
.
(\w+)([^
,
, 
]*)> 
. , 
,
:
Warning: Unknown modifier ']'

, , 
:
preg_match('/<(\w+)(.*?)>/', $html)


PHP,
, ,
, . ,
, ; ,
, .


, PHP 5
:
Warning: preg_match(): Unknown modifier ']'


,
.
,  , 
, (
. :
preg_match('<(\w+)(.*?)>', $html)

,
(\w+)(.*?) 
. , ,
, .
.

preg

, , preg_match,
:
?.

preg_match

preg_match(, [, [, [, ]]])

, 
,
( 525).

, .

,
.

,
. 
PREG_OFFSET_CAPTURE ( 535).

, 
.
( 536).



true, 
false.

,
preg_match($pattern, $subject)

true, $pattern
$subject.
:
if (preg_match('/\.(jpe?g|png|gif|bmp)$/i', $url)) {
    /* URL seems to be of an image */
}
if (preg_match('{^https?://}', $uri)) {
    /* URI is http or https */
}
if (preg_match('/\b MSIE \b/x', $_SERVER['HTTP_USER_AGENT'])) {
    /* Browser is IE */
}


preg_match 
, . 
,
$matches. , $matches 
, 
,
preg_match.
preg_match true, 
$matches :
$matches[0]
$matches[1] ,
$matches[2] ,
.
.
.


( 
).
, 5
( 241):
/* Pluck the filename from a path */
if (preg_match('{ / ([^/]+) $}x', $WholePath, $matches))
$FileName = $matches[1];


$matches ( ,
)
, preg_match true. false 
(, 
). 
$matches , 
, , 
,
$matches preg_match.
, 
:
/* Pluck the protocol, hostname, and port number from a URL */
if (preg_match('{^(https?):// ([^/:]+) (?: :(\d+) )? }x', $url, $matches))
{
$proto = $matches[1];
$host = $matches[2];
$port = $matches[3] ? $matches[3] : ($proto == "http" ? 80 : 443);
print "Protocol: $proto\n";
print "Host : $host\n";
print "Port : $port\n";
}

,

, ,

$matches.1 , (
, ,
$matches. 
, (\d+) ,
$matches[3] , $match
es[3] .
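Because an optional trailing group that does not participate leaves no entry in $matches, it is safer to test for the entry before using it. The sketch below assumes a hypothetical helper named url_port; reading a missing index directly would raise a notice or warning, depending on the PHP version.

```php
<?php
// Hypothetical helper: return the port of a URL, falling back to the
// protocol's default when the optional (\d+) group did not participate.
function url_port($url)
{
    if (!preg_match('{^(https?)://([^/:]+)(?::(\d+))?}', $url, $m)) {
        return null;                      // not an http/https URL
    }
    if (!empty($m[3])) {                  // entry exists only if a port was matched
        return (int) $m[3];
    }
    return $m[1] === 'http' ? 80 : 443;   // default port by protocol
}

preg_match('{^(https?)://([^/:]+)(?::(\d+))?}', 'http://regex.info/', $no_port);
preg_match('{^(https?)://([^/:]+)(?::(\d+))?}', 'http://regex.info:8080/', $with_port);
echo count($no_port), " vs ", count($with_port), "\n";
```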



( 180). 
, :
/* Pluck the protocol, hostname, and port number from a URL */
if (preg_match('{^(?P<proto> https? )://
                  (?P<host> [^/:]+ )
                  (?: : (?P<port> \d+ ) )? }x', $url, $matches))
{
    $proto = $matches['proto'];
    $host  = $matches['host'];
    $port  = $matches['port'] ? $matches['port'] : ($proto == "http" ? 80 : 443);
    print "Protocol: $proto\n";
    print "Host    : $host\n";
    print "Port    : $port\n";
}

1 If you would rather have NULL reported for parentheses that did not participate, see the reg_match function later in this chapter.

, ,
$matches 
. , 
, , $match
es. ,
:
/* Pluck the protocol, hostname, and port number from a URL */
if (preg_match('{^(?P<proto> https? )://
                  (?P<host> [^/:]+ )
                  (?: : (?P<port> \d+ ) )? }x', $url, $UrlInfo))
{
    if (! $UrlInfo['port'])
        $UrlInfo['port'] = ($UrlInfo['proto'] == "http" ? 80 : 443);
    echo "Protocol: ", $UrlInfo['proto'], "\n";
    echo "Host    : ", $UrlInfo['host'], "\n";
    echo "Port    : ", $UrlInfo['port'], "\n";
}

$matches 
, . ,
$url, http://re
gex.info/, $UrlInfo :
array
(
    0       => 'http://regex.info',
    'proto' => 'http',
    1       => 'http',
    'host'  => 'regex.info',
    2       => 'regex.info'
)

,
,
. 

$matches, 
$matches[0], .
: 3 port 
,


(
533).
, 
, : (?P<2>).
PHP 4 PHP 5  
, , .

.

:
PREG_OFFSET_CAPTURE
preg_match ,
PREG_OFFSET_CAPTURE ( 
preg_match), $matches
, 
. 
, ,
( 1, 
).
, .
preg_match $offset 
,  ,
.
, 
u ( 528).
HREF
<a>. HTML ,
 .
,
, :
preg_match('/href \s*=\s* (?: "([^"]*)" | \'([^\']*)\' | ([^\s\'">]+) )/ix',
           $tag,
           $matches,
           PREG_OFFSET_CAPTURE);

, $tag
<a name=bloglink href='http://regex.info/blog/' rel="nofollow">

$matches :
array
(
    /* Data for the overall match */
    0 => array ( 0 => "href='http://regex.info/blog/'",
                 1 => 17 ),
    /* Data for the first set of parentheses */
    1 => array ( 0 => "",
                 1 => -1 ),
    /* Data for the second set of parentheses */
    2 => array ( 0 => "http://regex.info/blog/",
                 1 => 23 )
)

$matches[0][0] 
, $matches[0][1] 
.
,
$matches[0][0], :
substr($tag, $matches[0][1], strlen($matches[0][0]));

The value -1 in $matches[1][1] indicates that the first set of parentheses
did not participate in the match.
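The offsets reported with PREG_OFFSET_CAPTURE can be verified by cutting the matched text back out of the subject with substr. This is a sketch using the <a> tag from the example; the variable names $overall and $recovered are illustrative.

```php
<?php
$tag = "<a name=bloglink href='http://regex.info/blog/' rel=\"nofollow\">";

preg_match('/href \s*=\s* (?: "([^"]*)" | \'([^\']*)\' | ([^\s\'">]+) )/ix',
           $tag, $matches, PREG_OFFSET_CAPTURE);

// Each entry is a two-element array: the text, then its byte offset.
list($overall, $offset) = $matches[0];

// The offset plus the length of the text identifies the same substring.
$recovered = substr($tag, $offset, strlen($overall));

// An offset of -1 marks parentheses that did not participate.
$double_quoted_participated = ($matches[1][1] >= 0);
```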
, , 
( 533), ,
, $matches.


preg_match , 

(, , 
).
(. . ).
: 
, u. 
( 
) , 
.
, ^ 
, 
, 
. , ,
, .
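Starting a match at an offset is not the same as matching against a substring, because the engine still knows about the text before the offset. The following sketch, with a made-up subject string, contrasts ^, \G, the A modifier, and lookbehind.

```php
<?php
$subject = 'abc def';   // "def" begins at byte offset 4

// ^ still means "start of the whole string", so it cannot match at offset 4.
$caret = preg_match('/^def/', $subject, $m, 0, 4);

// \G (or the A modifier) anchors to the position where the attempt starts.
$g_anchor   = preg_match('/\Gdef/', $subject, $m, 0, 4);
$a_modifier = preg_match('/def/A',  $subject, $m, 0, 4);

// Lookbehind can see the text that precedes the offset.
$lookbehind = preg_match('/(?<=c )def/', $subject, $m, 0, 4);

// With a real substring, the earlier text is gone and ^ matches.
$on_substring = preg_match('/^def/', substr($subject, 4));
```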

preg_match_all

preg_match_all(, , [, [, ]])

, 
,
( 525).

, .


,
.

,
:
PREG_OFFSET_CAPTURE ( 540).

/ :
PREG_PATTERN_ORDER ( 538)
PREG_SET_ORDER ( 539)

, 
. .
( , preg_match
536.)


preg_match_all .

preg_match_all preg_match, 
, 
, .
, 

,
.
:
if (preg_match_all('/<title>/i', $html, $all_matches) > 1)
print "whoa, document has more than one <title>!\n";

(, 
) preg_match_all ,
preg_match.
preg_match_all , ,
.


preg_match , 
. preg_match
, 
$matches . preg_match_all 
,
, $matches,
. , 
preg_match_all
$all_matches, $matches,
preg_match.



preg_match $matches
,
( ,
,
$matches). 
, ,
, 
, NULL.
preg_match (
reg_match), PREG_OFFSET_CAPTURE 

, 
NULL
$matches:
function reg_match($regex, $subject, &$matches, $offset = 0)
{
$result = preg_match($regex, $subject, $matches,
PREG_OFFSET_CAPTURE, $offset);
if ($result) {
$f = create_function('&$X', '$X = $X[1] < 0 ? NULL : $X[0];');
array_walk($matches, $f);
}
return $result;
}

, reg_match,
, preg_match ,
, , (
, , 
NULL.
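On PHP versions that support anonymous functions (5.3 and later), the same idea can be written without create_function. This is a sketch under that assumption; the name reg_match_null is hypothetical and chosen so that it does not clash with the reg_match shown above.

```php
<?php
// Like preg_match, but entries for parentheses that did not participate
// are NULL instead of an empty string.
function reg_match_null($regex, $subject, &$matches, $offset = 0)
{
    $result = preg_match($regex, $subject, $matches,
                         PREG_OFFSET_CAPTURE, $offset);
    if ($result) {
        // Each entry is array(text, offset); an offset below zero
        // means the group did not participate in the match.
        $matches = array_map(function ($pair) {
            return $pair[1] < 0 ? null : $pair[0];
        }, $matches);
    }
    return $result;
}

reg_match_null('/(a)?(b)/', 'b', $m);
var_dump($m);
```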

preg_match_all
$all_matches, 
: PREG_PATTERN_ORDER PREG_SET_ORDER.
PREG_PATTERN_ORDER
,
PREG_PATTERN_ORDER (
). , 
,
, :
$subject = "
Jack A. Smith
Mary B. Miller";
/* No order-related flag implies PREG_PATTERN_ORDER */
preg_match_all('/^(\w+) (\w\.) (\w+)$/m', $subject, $all_matches);

$all_matches :
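array
(
    /* $all_matches[0] is an array of full matches */
    0 => array ( 0 => "Jack A. Smith",   /* full text from the first match */
                 1 => "Mary B. Miller"   /* full text from the second match */ ),
    /* $all_matches[1] is an array of strings captured by the 1st set of parentheses */
    1 => array ( 0 => "Jack",            /* from the first match */
                 1 => "Mary"             /* from the second match */ ),
    /* $all_matches[2] is an array of strings captured by the 2nd set of parentheses */
    2 => array ( 0 => "A.",              /* from the first match */
                 1 => "B."               /* from the second match */ ),
    /* $all_matches[3] is an array of strings captured by the 3rd set of parentheses */
    3 => array ( 0 => "Smith",           /* from the first match */
                 1 => "Miller"           /* from the second match */ )
)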

, 

. 
,
($all_matches[0]),

($all_matches[1]) . .

$all_matches, PREG_SET_ORDER
.
PREG_SET_ORDER
PREG_SET_ORDER 
. , 
, $all_matches[0],
, , $all_matches[1],
. . ,
preg_match,
$matches $all_matches.
, 
PREG_SET_ORDER:
$subject = "
Jack A. Smith
Mary B. Miller";
preg_match_all('/^(\w+) (\w\.) (\w+)$/m', $subject,
               $all_matches, PREG_SET_ORDER);

$all_matches :
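array
(
    /* $all_matches[0] is just like the $matches that preg_match would produce */
    0 => array ( 0 => "Jack A. Smith",   /* full text from the first match */
                 1 => "Jack",            /* first match's 1st parentheses */
                 2 => "A.",              /* first match's 2nd parentheses */
                 3 => "Smith"            /* first match's 3rd parentheses */ ),
    /* $all_matches[1] is also just like a preg_match $matches */
    1 => array ( 0 => "Mary B. Miller",  /* full text from the second match */
                 1 => "Mary",            /* second match's 1st parentheses */
                 2 => "B.",              /* second match's 2nd parentheses */
                 3 => "Miller"           /* second match's 3rd parentheses */ )
)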

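PREG_PATTERN_ORDER   Entries are grouped by set of parentheses:
                     $all_matches[$paren_num][$match_num]

PREG_SET_ORDER       Entries are grouped by match:
                     $all_matches[$match_num][$paren_num]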
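Which ordering is more convenient depends on how the data will be consumed. The sketch below, with illustrative variable names, pulls one column directly from the collated form and loops over whole matches in the stacked form.

```php
<?php
$subject = "Jack A. Smith\nMary B. Miller";
$regex   = '/^(\w+) (\w\.) (\w+)$/m';

// Default (PREG_PATTERN_ORDER): one array per set of parentheses.
preg_match_all($regex, $subject, $by_group);
$given_names = $by_group[1];            // every first-parentheses capture at once

// PREG_SET_ORDER: one preg_match-style array per match.
preg_match_all($regex, $subject, $by_match, PREG_SET_ORDER);
$family_names = array();
foreach ($by_match as $match) {
    $family_names[] = $match[3];        // third parentheses of each match
}
```

The same item lives at $by_group[3][1] in one form and at $by_match[1][3] in the other.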

preg_match_all PREG_OFFSET_CAPTURE
preg_match_all, preg_match,
PREG_OFFSET_CAPTURE, 
(leaf element) $all_matches
( ), . . $all_matches
( ).
: PREG_OFFSET_CAPTURE PREG_SET_
ORDER, :
preg_match_all($pattern, $subject, $all_matches,
PREG_OFFSET_CAPTURE | PREG_SET_ORDER);

preg_match_all
$all_matches 
, 
( preg_match
533).

$subject = "
Jack A. Smith
Mary B. Miller";
/* No order-related flag implies PREG_PATTERN_ORDER */
preg_match_all('/^(?P<Given>\w+) (?P<Middle>\w\.) (?P<Family>\w+)$/m',
$subject, $all_matches);

$all_matches :
array
(
    0        => array ( 0 => "Jack A. Smith", 1 => "Mary B. Miller" ),
    "Given"  => array ( 0 => "Jack",          1 => "Mary" ),
    1        => array ( 0 => "Jack",          1 => "Mary" ),
    "Middle" => array ( 0 => "A.",            1 => "B." ),
    2        => array ( 0 => "A.",            1 => "B." ),
    "Family" => array ( 0 => "Smith",         1 => "Miller" ),
    3        => array ( 0 => "Smith",         1 => "Miller" )
)

PREG_SET_ORDER:
$subject = "
Jack A. Smith
Mary B. Miller";
preg_match_all('/^(?P<Given>\w+) (?P<Middle>\w\.) (?P<Family>\w+)$/m',
$subject, $all_matches, PREG_SET_ORDER);

$all_matches :
array
(
    0 => array ( 0        => "Jack A. Smith",
                 "Given"  => "Jack",
                 1        => "Jack",
                 "Middle" => "A.",
                 2        => "A.",
                 "Family" => "Smith",
                 3        => "Smith" ),
    1 => array ( 0        => "Mary B. Miller",
                 "Given"  => "Mary",
                 1        => "Mary",
                 "Middle" => "B.",
                 2        => "B.",
                 "Family" => "Miller",
                 3        => "Miller" )
)


, 

, . . , 
.  
, , .

preg_replace

preg_replace(, , [, [, ]])


, 
, .
,
.

, , 

. e 
( )
PHP (543).

, . 
(
).
, 
( 544).
,
( PHP 5 544).


, 
( , 
). , 
(
, ).

PHP
. ,
, str_rep
lace str_ireplace, 
preg_replace.

HTML. 
:
? (


) , 
,

, 
?1 ,
:
$card_number = preg_replace('/\D+/', '', $card_number);
/* $card_number
*/

. 
, preg_replace
$card_number,
( ), ,
, $card_number.

preg_replace
,
(, ) 
, . ,
, preg_repla
ce ,
, ,
,
.
$0 
, $1
, $2 . . : 
$ ,
, ,

preg_replace. ,
, , : ${0} ${1},
,
.
, ,
, HTML
<b></b>:
$html = preg_replace('/\b[A-Z]{2,}\b/', '<b>$0</b>', $html);

e (
preg_replace),
1 Web sites that insist on "No Dashes Or Spaces" in card numbers are collected at
http://www.unixwiz.net/ndos-shame.html.


PHP,
, 
. 
, , <b></b>, 
:
$html = preg_replace('/\b[A-Z]{2,}\b/e',
                     'strtolower("<b>$0</b>")',
                     $html);

, HEY,
$0. 
: strtolower("<b>HEY</b>"),
PHP,
<b>hey</b> .
e
:
. 

PHP.
e 
, 
, 
.
PHP
htmlspecialchars():
$replacement = array ('&' => '&amp;',
                      '<' => '&lt;',
                      '>' => '&gt;',
                      '"' => '&quot;');

$new_subject = preg_replace('/[&<">]/eS', '$replacement["$0"]', $subject);

,
$replacement PHP
, preg_rep
lace . 
, PHP ,
preg_replace.
S 
( 567).
preg_replace (
, (
, 
). 1, 
.
( 
PHP 4), ,


preg_replace .
, ,
, 
(
.

,
,
,
. 
.
, .
,
,
.
, .

, 


, ,

, , 
,
, .
:
,
. 
, , 
, .
preg_replace, 
.
PHP
htmlspecialchars(), 
HTML:
$cooked = preg_replace(
    /* Match with these...    */ array('/&/',   '/</',  '/>/',  '/"/'   ),
    /* Replace with these...  */ array('&amp;', '&lt;', '&gt;', '&quot;'),
    /* ...in a copy of this   */ $text
);


$text :
AT&T > "baby Bells"

$cooked :
AT&amp;T &gt; &quot;baby Bells&quot;

,  .
( 
):
$patterns     = array('/&/',   '/</',  '/>/',  '/"/'   );
$replacements = array('&amp;', '&lt;', '&gt;', '&quot;');
$cooked = preg_replace($patterns, $replacements, $text);

, preg_replace
(

), 
. , , . 

, PHP,
.
,
:
$result_array = preg_replace($regex_array, $replace_array, $subject_array);

:
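$result_array = array();
foreach ($subject_array as $subject)
{
    reset($regex_array);   // Prepare to walk through these two arrays
    reset($replace_array); // in their internal array orders.
    while (list(,$regex) = each($regex_array))
    {
        list(,$replacement) = each($replace_array);
        // The regex and replacement are ready, so apply them to the subject
        $subject = preg_replace($regex, $replacement, $subject);
    }
    // Having been processed by all the regexes, we are done with this
    // subject, so append it to the result array.
    $result_array[] = $subject;
}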


, , ,



, . . ,
(,
, , (
, . .). , , 
array(), , ,
, :
$subject = "this has 7 words and 31 letters";
$result = preg_replace(array('/[a-z]+/', '/\d+/'),
array('word<$0>', 'num<$0>'),
$subject);
print "result: $result\n";

Matches of [a-z]+ are replaced with word<$0>,
and matches of \d+ with num<$0>, which gives:
result: word<this> word<has> num<7> word<words> word<and> num<31>
word<letters>

, 
, 

(. . , ).
, preg_replace
,
each, 
, .

,
ksort(),
.
,
, ,
.

, , 
. , , 
(
) ? , 
?
$subject = "this has 7 words and 31 letters";
$result = preg_replace(array('/\d+/', '/[a-z]+/'),
array('num<\0>', 'word<\0>'),
$subject);
print "result: $result\n";

, .


preg_replace_callback

preg_replace_callback(, , [, [, ]])

, 
,
( 525). , 
.

,
,
.

, . 
(
).

, 
( 544).
, 
( PHP 5.1.0
544).

, 
( , 
). , 
(
, ).

preg_replace_callback preg_re
place, , ( ) 
preg_replace_callback .
preg_replace, e
( 543), (
, 
).

PHP, , 
, 

. preg_replace_callback 

, 
( $matches).
, .


. 547.
. 547 ( ,
):
result: word<this> word<has> word<num><7> word<words>
word<and> word<num><31> word<letters>

, ,
, , preg_replace
(. ., (
) 
, .
/
num<>,
num , num
word<num>, 
.
,

preg_replace.
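The double processing happens because each pattern in the array is applied to the result of the previous one. One way to avoid it is to make a single pass with one combined regex and decide the replacement in a callback. The sketch below assumes PHP 5.3 or later for the anonymous function; the variable names are illustrative.

```php
<?php
$subject = 'this has 7 words and 31 letters';

// Sequential: the second pattern also sees the "num" inserted by the first.
$sequential = preg_replace(array('/\d+/', '/[a-z]+/'),
                           array('num<\0>', 'word<\0>'),
                           $subject);

// Single pass: every piece of the original text is visited exactly once.
$single_pass = preg_replace_callback('/(\d+)|[a-z]+/', function ($m) {
    // $m[1] is present and non-empty only when the digits alternative matched.
    $is_number = isset($m[1]) && $m[1] !== '';
    return ($is_number ? 'num' : 'word') . '<' . $m[0] . '>';
}, $subject);

print "$sequential\n$single_pass\n";
```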

.
: , 
. :
PHP create_function. 
. 
, 
,  
,
( ).
. 544, 
preg_replace_callback . 
:
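$replacement = array ('&' => '&amp;',
                      '<' => '&lt;',
                      '>' => '&gt;',
                      '"' => '&quot;');
/*
 * Given a $matches array from a successful match, in which $matches[0] is
 * a character that must be converted to HTML, return the appropriate
 * HTML string. Because this function is called only when a conversion
 * is known to be needed, it does not check its argument.
 */
function text2html_callback($matches)
{
    global $replacement;
    return $replacement[$matches[0]];
}
$new_subject = preg_replace_callback('/[&<">]/S',          /* pattern  */
                                     "text2html_callback", /* callback */
                                     $subject);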

$subject
"AT&T" sounds like "ATNT"

$new_subject :
&quot;AT&amp;T&quot; sounds like &quot;ATNT&quot;

text2html_callback PHP,

preg_replace_callback, text2html_
callback $matches (
, , 
$matches).
, 
( PHP
create_function). ,
$replacement , .
,
, 
preg_replace_callback:
$new_subject = preg_replace_callback('/[&<">]/S',
create_function('$matches',
'global $replacement;
return $replacement[$matches[0]];'),
$subject);

e

e preg_replace, preg_replace_callback.
, , 
e ,
, 
PHP. ,
preg_replace_callback (
).
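The e modifier was removed in PHP 7, so on current versions the callback form is the only option. The following sketch rewrites the two e-modifier examples with anonymous functions (PHP 5.3 or later); the sample strings are made up for illustration.

```php
<?php
// Lowercase words written in capitals and wrap them in <b>...</b>.
$html = preg_replace_callback('/\b[A-Z]{2,}\b/', function ($m) {
    return '<b>' . strtolower($m[0]) . '</b>';
}, 'say HEY to NASA');

// Convert special characters to HTML entities; "use" imports the
// lookup table, so no global variable is needed.
$replacement = array('&' => '&amp;', '<' => '&lt;',
                     '>' => '&gt;',  '"' => '&quot;');
$new_subject = preg_replace_callback('/[&<">]/', function ($m) use ($replacement) {
    return $replacement[$m[0]];
}, '"AT&T" sounds like "ATNT"');

print "$html\n$new_subject\n";
```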


preg_split

preg_split(, [, , [ ]])


, 
,
( 525).

, .
, 
, .

,
.
:

PREG_SPLIT_NO_EMPTY
PREG_SPLIT_DELIM_CAPTURE
PREG_SPLIT_OFFSET_CAPTURE

. 554. 

( . 539).

.

preg_split
. (
(
).
, .
preg_split 
preg_match_all, preg_split , 
. ,
preg_split , ,
,
. preg_split
( 
) explode.

, , 
.
explode:
$tickers = explode(' ', $input);


, 

. preg_split
\s+ :
$tickers = preg_split('/\s+/', $input);

, ,
: , 
(
), , YHOO, MSFT, GOOG. 
:
$tickers = preg_split('/[\s,]+/', $input);

$tickers
: YHOO, MSFT GOOG.
, , 
( Web 2.0), 
\s*,\s*,
:
$tags = preg_split('/\s*,\s*/', $input);

\s*,\s*
[\s,]+ .
( ),
, .
$input, 123,,,456,
( ),
: 123,
456.
, [\s,]+ 
, .
123,,,456
, :
123 456.
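The difference between the two separators is easy to see side by side. A short sketch with illustrative variable names:

```php
<?php
$input = '123,,,456';

// Each comma is its own separator, so the empty fields between them are kept.
$each_comma = preg_split('/\s*,\s*/', $input);

// A run of commas and whitespace is one separator.
$comma_run  = preg_split('/[\s,]+/', $input);

print_r($each_comma);
print_r($comma_run);
```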


preg_split,
, .
, 
,
.
HTTP
.

\r\n\r\n, ,
\n\n. , preg_split 


. ,
$response,
$parts = preg_split('/\r? \n \r? \n/x', $response, 2);

The headers end up in $parts[0], and the body in $parts[1].
( S 
567.)
, 2, ,
, .
, ( , 
) . 
(. . ),
,
.
( 1 (
, ) preg_split 
, , 
. 
,
, , 
( 
PREG_SPLIT_DELIM_CAPTURE ,
).
, 
. : 
, 
. (
) (). 
, ,
2.
, , 
, ,
preg_split. , ,
$data , , 
\s*,\s* ( , , . .),
.
3, 
, :
$fields = preg_split('/ \s*,\s*/x', $data, 3);


, ar
ray_pop .
preg_split 
( )


,
1, . C ,
1 
. 0 
, 1, 
.
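The limit argument caps the number of pieces, with the final piece receiving whatever remains unsplit; a limit of -1 or 0 means no limit. The sketch below uses a made-up HTTP response and a made-up comma-separated record.

```php
<?php
// Split headers from body: only the first blank line counts.
$response = "Host: example\r\nAccept: */*\r\n\r\nline one\r\n\r\nline two";
$parts = preg_split('/\r? \n \r? \n/x', $response, 2);

// Take the first two fields; the third piece holds the unsplit rest.
$fields = preg_split('/ \s*,\s* /x', 'a, b, c, d', 3);

// -1 (like 0) means "no limit", which is needed to reach the flags argument.
$all = preg_split('/,/', 'a,b,c', -1);
```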

preg_split
preg_split , 
. 
( 
. 538).
PREG_SPLIT_OFFSET_CAPTURE
PREG_OFFSET_CAPTURE, 
preg_match preg_match_all,
, 
.
PREG_SPLIT_NO_EMPTY
preg_split , 

.
,
.
Web 2.0 ( 552),
$input party,,fun,

$tags = preg_split('/ \s* , \s*/x', $input);

$tags : party,
fun. 
.
PREG_SPLIT_NO_
EMPTY,
$tags = preg_split('/ \s* , \s*/x', $input, -1, PREG_SPLIT_NO_EMPTY);

party
fun.
PREG_SPLIT_DELIM_CAPTURE

,
, . :
, 


, and or , 
:
DLSR camera and Nikon D200 or Canon EOS 30D

PREG_SPLIT_DELIM_CAPTURE
$parts = preg_split('/ \s+ (and|or) \s+ /x', $input);

$parts :
array ('DLSR camera', 'Nikon D200', 'Canon EOS 30D')

, , . 
PREG_SPLIT_DELIM_CAPTURE ( 1 (
):
$parts = preg_split('/ \s+ (and|or) \s+ /x', $input, -1,
                    PREG_SPLIT_DELIM_CAPTURE);

$parts ,
:
array ('DLSR camera', 'and', 'Nikon D200', 'or', 'Canon EOS 30D')


, 
. $parts
and or.
,
(, '/\s+
(?:and|or)\s+/') PREG_SPLIT_DELIM_CAPTURE 
,
.
, 
. 552:
$tickers = preg_split('/[\s,]+/', $input);

PREG_SPLIT_
DELIM_CAPTURE,
$tickers = preg_split('/([\s,]+)/', $input, -1, PREG_SPLIT_DELIM_CAPTURE);

$input , 
$tickers. 
$tickers , 
([\s,]+). 
, , ,
, 

.
, 
, 


PREG_SPLIT_DELIM_CAPTURE. ,

( 
, 
).
, ,

. , ,
( . 533),
( ) 
. , 
, 
, . :
PREG_SPLIT_NO_EMPTY , . . 
preg_split , 
.
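The flags can be combined with a binary "or". This sketch, using the inputs from the examples above plus one made-up ticker string, shows each flag alone and then two together.

```php
<?php
// Keep the and/or separators by capturing them.
$parts = preg_split('/ \s+ (and|or) \s+ /x',
                    'DLSR camera and Nikon D200 or Canon EOS 30D',
                    -1, PREG_SPLIT_DELIM_CAPTURE);

// Drop the empty field produced by the doubled comma.
$tags = preg_split('/ \s* , \s* /x', 'party,,fun', -1, PREG_SPLIT_NO_EMPTY);

// Both at once: separators are kept, empty pieces are not.
$tickers = preg_split('/([\s,]+)/', 'YHOO, MSFT', -1,
                      PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
```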

preg_grep

preg_grep(, _ [, ])


, 
, .
_
, 
, .

PREG_GREP_INVERT .


, _, 
( , , ,
PREG_GREP_INVERT).

preg_grep (
_, ,
( , PREG_
GREP_INVERT) . , 
, .
:
preg_grep('/\s/', $input);


$input, . 
:


preg_grep('/\s/', $input, PREG_GREP_INVERT);

$in
put, . :
:
preg_grep('/^\S+$/', $input);

,
( ).
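Note that preg_grep keeps the original array keys in its result. A short sketch with a made-up input array:

```php
<?php
$input = array('alpha', 'two words', 'beta', "tab\there");

$with_space    = preg_grep('/\s/', $input);                    // elements containing whitespace
$without_space = preg_grep('/\s/', $input, PREG_GREP_INVERT);  // elements that do not match
$same_thing    = preg_grep('/^\S+$/', $input);                 // the inverted test, written directly

print_r($with_space);      // keys 1 and 3 are preserved
print_r($without_space);   // keys 0 and 2 are preserved
```

If consecutive keys are needed, the result can be passed through array_values.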

preg_quote

preg_quote( [, ])


, 
( 525).
, 

.


preg_quote , 

. , 
.

, 
,
preg_quote 
, .
preg_quote ,

, .
preg_quote ,
, 
:
/*
$MailSubject */
$pattern = '/^Subject:\s+(Re:\s*)*' . preg_quote($MailSubject, '/') . '/mi';

, $MailSubject :
**Super Deal** (Act Now!)

$pattern :
/^Subject:\s+(Re:\s*)*\*\*Super Deal\*\* \(Act Now\!\)/mi


preg.
{, 
(. . }),
.
, # , 

x.
preg_quote 

.
, 
,
preg.
.
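The pattern built with preg_quote can be tried against a sample message. This sketch assumes a made-up mail header in $mail; only the matching behavior is checked, since the exact set of characters that preg_quote escapes varies between PHP versions.

```php
<?php
$MailSubject = '**Super Deal** (Act Now!)';

// The second argument makes preg_quote escape the delimiter as well.
$pattern = '/^Subject:\s+(Re:\s*)*' . preg_quote($MailSubject, '/') . '/mi';

// Hypothetical message text, for illustration only.
$mail = "From: someone\nSubject: Re: Re: **Super Deal** (Act Now!)\n";
$found = preg_match($pattern, $mail);

// Without quoting, "**" is a regex syntax error rather than literal text.
$slash_quoted = preg_quote('a/b', '/');
```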

preg
PHP preg
, 
.
preg_match ( 538).
, 
, , 

, 
(,
, ). 
, 

.
, 
. 
.
,
,
, : http://regex.info/.

preg_regex_to_pattern

(, 
),
preg, , 
.




. , ,
For example, the raw regex [a-z]+ becomes the pattern string /[a-z]+/.

, . 
, , 
^http://([^/:]+),
/^http://([^/:]+)/,
Unknown modifier /.
. 530,
, 
, ( 
, ) 
.

,
. 
, 
.
.
{} . 524, 532 532 (
).
( ) 
, ,

, . ,
,
:
,
.
,
, 
. , 
, 
.
, 

, ,
preg. (
PHP
)


; . (
, 
. 525.)
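/*
 * Given a raw regex in a string (and, optionally, a string of pattern
 * modifiers), return a pattern string suitable for the preg functions.
 * The regex is wrapped in delimiters, with the modifiers (if any)
 * appended.
 */
function preg_regex_to_pattern($raw_regex, $modifiers = "")
{
    /*
     * To convert a regex to a pattern, we wrap it in delimiters (a pair
     * of forward slashes) and append the modifiers.
     *
     * We must escape any unescaped delimiter within the regex, and also
     * a regex-ending escape, which would otherwise escape the delimiter
     * we append.
     *
     * We cannot blindly escape embedded delimiters, because that would
     * break a regex containing an already-escaped delimiter. For example,
     * if the regex is '\/', a blind escape gives '\\/',
     * which fails once wrapped in delimiters: '/\\//'.
     *
     * Instead, we break the regex into parts: escaped characters,
     * unescaped forward slashes (which need escaping), and everything else.
     * As a special case, we also look for and escape
     * a regex-ending escape.
     */
    if (! preg_match('{\\\\(?:/|$)}', $raw_regex)) /* a '\' followed by
                                                      '/' or end of string */
    {
        /* There are no already-escaped forward slashes and no escape
         * at the end, so it is safe to blindly escape
         * forward slashes. */
        $cooked = preg_replace('!/!', '\/', $raw_regex);
    }
    else
    {
        /* This is the pattern used to parse $raw_regex.
         * The two parts whose matches need escaping
         * are within capturing parentheses. */
        $pattern = '{ [^\\\\/]+ | \\\\. | ( / | \\\\$ ) }sx';
        /* The callback is called for each match of $pattern in $raw_regex.
         * If $matches[1] is not empty, return an escaped version of it.
         * Otherwise, return what was matched, unmodified. */
        $f = create_function('$matches', '
                 if (empty($matches[1]))
                     return $matches[0];
                 else
                     return "\\\\" . $matches[1];
             ');
        /* Apply $pattern to $raw_regex,
         * yielding $cooked */
        $cooked = preg_replace_callback($pattern, $f, $raw_regex);
    }
    /* $cooked is now safe to wrap: do so, append the modifiers,
     * and return */
    return "/$cooked/$modifiers";
}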


, , 
.
( , 
preg).

, preg_replace_callback, 
, 
, 

.


,
,

.
,
*.txt , , 
( 27),
preg_regex_to_pattern /*.txt/. 
, 
( 
):
Compilation failed: nothing to repeat at offset 0

PHP
,
.


preg_pattern_error 
preg_match .

, 
preg_match.
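/*
 * Return an error message if the given pattern string is invalid
 * (in its delimiters, modifiers, or regex syntax).
 * Return false if the pattern is valid.
 */
function preg_pattern_error($pattern)
{
    /* To tell whether the pattern has errors, we simply try to use it.
     * Detecting and capturing the error takes some care,
     * especially if we want to be sociable and
     * leave global state undisturbed (for example,
     * the value of $php_errormsg). So,
     * if 'track_errors' is on, we preserve the $php_errormsg
     * value and restore it later.
     * If 'track_errors' is off, we turn it on (because we
     * need it), and turn it back off when we are done.
     */
    if ($old_track = ini_get("track_errors"))
        $old_message = isset($php_errormsg) ? $php_errormsg : false;
    else
        ini_set('track_errors', 1);
    /* We are now sure that track_errors is on. */
    unset($php_errormsg);
    @ preg_match($pattern, ""); /* actually give the pattern a try! */
    $return_value = isset($php_errormsg) ? $php_errormsg : false;
    /* We have what we need, so put things back
     * the way they were. */
    if ($old_track)
        $php_errormsg = isset($old_message) ? $old_message : false;
    else
        ini_set('track_errors', 0);
    return $return_value;
}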



, 
( 
). 
,
, false.
/*
 * Return a descriptive error message if the given raw regex is invalid.
 * Return false if the regex is valid.
 */
function preg_regex_error($regex)
{
    return preg_pattern_error(preg_regex_to_pattern($regex));
}
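The track_errors setting and $php_errormsg are no longer available in PHP 8, so the technique above needs a different way to capture the warning there. The following is a sketch of one alternative, using a temporary error handler; the function name pattern_error is hypothetical, and the exact wording of the returned message depends on the PHP and PCRE versions in use.

```php
<?php
// Return the warning text raised while compiling $pattern,
// or false if the pattern compiles cleanly.
function pattern_error($pattern)
{
    $message = false;
    // Temporarily route warnings into $message instead of the normal output.
    set_error_handler(function ($errno, $errstr) use (&$message) {
        $message = $errstr;
        return true;                 // tell PHP the error has been handled
    });
    preg_match($pattern, '');        // actually give the pattern a try
    restore_error_handler();         // put the previous handler back
    return $message;
}

var_dump(pattern_error('/abc/'));            // valid pattern
var_dump(pattern_error('/*.txt/'));          // regex syntax error
var_dump(pattern_error('<(\w+)([^>]*)>'));   // ">" inside ends the pattern early
```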



3,

.
(?R)
, (?) 

. 
(?P>).

. , 

(. 570).


, , ,
.
: (?: [^()]++ | \( (?R) \) )*.
. 
[^()]++ , 
. 
+, 
( 280)  (?:)*.
, \((?R)\), 
.
(
) .
, ,
(?R) 
.

,
, , 
, 
(?R).



. 
^$, 
.
, 

.


(?R) 
, , (?), 
. (?)
, 
(
.1 (?), ,
(?0) (?R).

:
^$,
, (?R) (?1). 
, (?1),
, 
.

^ $ ,
: ^((?:[^()]++|\((?1)\))*)$.

, 
, (?1).

PHP, ,
$text :
if (preg_match('/^ ( (?: [^()]++ | \( (?1) \) )* ) $/x', $text))
echo "text is balanced\n";
else
echo "text is unbalanced\n";
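Wrapping the test in a function makes it easy to try a few inputs. This is a sketch; the function name is_balanced is hypothetical.

```php
<?php
// True if every "(" in $text has a matching ")" and vice versa.
function is_balanced($text)
{
    // (?1) recurses into the first set of capturing parentheses, so the
    // ^ and $ outside that set are not part of the recursion.
    return preg_match('/^ ( (?: [^()]++ | \( (?1) \) )* ) $/x', $text) === 1;
}

foreach (array('a(b(c)d)e', '(()', ')(', '') as $sample) {
    echo var_export($sample, true), ' => ',
         is_balanced($sample) ? "balanced\n" : "unbalanced\n";
}
```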


, 
( 180), 
1

, (?) (
, . . (?)
, 
.
.


(?) 
(?P>).
:

^(?P<stuff> (?: [^()]++|\((?P>stuff)\) )* )$

, 
x
( 527):
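$pattern = '{
   # The regex begins here...
   ^
     (?P<stuff>
         # Everything within this set of parentheses is named "stuff."
         (?:
             [^()]++            # anything that is not a parenthesis
           |
             \( (?P>stuff) \)   # an open paren, more "stuff," and a close paren
         )*
     )
   $
   # This is the end of the regex.
}x'; # The 'x' here is the preg pattern modifier.
if (preg_match($pattern, $text))
    echo "text is balanced\n";
else
    echo "text is unbalanced\n";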



.
(?:)* 
, [^()]++ 
. 
, , ( )
. 

( 317) ,
: (?:[^()]|\((?R)\))*.
, 
. 
, , 
6 ( 319), [^()]*(?:\((?R)\)
[^()]*)*.




preg , , 
,
( 317). ,
, 
( ).
,
, 
,
. , , 
.



. , , ,
( 
): \((?:[^()]++|(?R))*\).
, , 
. ,

, 
(?R)
, (?1) (
).

PHP
preg PHP PCRE
. 

, 46, 
,
, 
. PHP 
6 ( 288).
, ,
, 
e (551),

.
, 
, PHP
4096 ( 297),



.
S: 
(studies) 
. (
study Perl, ,
429).

S: Study
S 

1 ,
. 
, 
.
, 
S , 
: , 6 (
( 303).
, , 

. S
,

.

S
<(\w+). 
, <.
( preg
)
, < 
( 
<,
).
,
, 
.
, . 
1

.

, 
S, .


, 
,
. , <i>|</i>|<b>|</b> 
, <(\w+),

, .
, .


S
preg 
, ,
. S 
, 
, 
.
,
,
S :

Pattern                  Possible starting characters
<(\w+) | &(\w+);         < &
[Rr]e:                   R r
(Jan|Feb|...|Dec)\b      A D F J M N O S
(Re:\s*)? SPAM           R S
\s*,\s*                  \x09 \x0A \x0C \x0D \x20 ,
[&<">]                   & < " >
\r?\n\r?\n               \r \n


, S 
:
, ( ^
\b),
. , 
\b, 
.
, , 
\s*.
, ( )
, (?: [^()]++ | \( (?R) \) )*,


. 566. 
, ), 

, .
, 
, .


S 

, 
. 

.


, .

Parsing CSV with PHP
PHP CSV
(, ) 6 ( 330). 

( 184) ,
.
:
$csv_regex = '{
    \G(?:^|,)
    (?:
       # Either a double-quoted field...
       "                              # field opening quote
        ( [^"]*+ (?: "" [^"]*+ )*+ )
       "                              # field closing quote
     |                                # ...or...
       # ...some non-quote/non-comma text...
       ( [^",]*+ )
    )
}x';

CSV, 
$line:
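/* Apply the regex, filling $all_matches with all kinds of data */
preg_match_all($csv_regex, $line, $all_matches);
/* $Result will hold the array of fields gleaned from $all_matches */
$Result = array ();
/* Run through each successful match... */
for ($i = 0; $i < count($all_matches[0]); $i++)
{
    /* If the 2nd set of capturing parentheses captured,
     * use that directly */
    if (strlen($all_matches[2][$i]) > 0)
        array_push($Result, $all_matches[2][$i]);
    else
    {
        /* It was a quoted value, so convert each embedded pair of
         * double quotes to a single double quote before using it */
        array_push($Result, preg_replace('/""/', '"', $all_matches[1][$i]));
    }
}
/* The array $Result is now populated and ready for use */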
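Packaged as a function, the same logic can be exercised on a sample line. This is a sketch; the function name parse_csv_line and the sample data are illustrative, and the regex comments are omitted for brevity.

```php
<?php
// Split one line of CSV text into an array of field values.
function parse_csv_line($line)
{
    $csv_regex = '{
        \G(?:^|,)
        (?:
            " ( [^"]*+ (?: "" [^"]*+ )*+ ) "
          |
            ( [^",]*+ )
        )
    }x';
    preg_match_all($csv_regex, $line, $all_matches);

    $result = array();
    for ($i = 0; $i < count($all_matches[0]); $i++) {
        if (strlen($all_matches[2][$i]) > 0) {
            // Unquoted field: use it as it is.
            $result[] = $all_matches[2][$i];
        } else {
            // Quoted (or empty) field: turn each "" into a single ".
            $result[] = str_replace('""', '"', $all_matches[1][$i]);
        }
    }
    return $result;
}

$fields = parse_csv_line('Ten Thousand,10000, 2710 ,,"10,000","It\'s ""10 Grand"", baby",10K');
print_r($fields);
```

For production use, PHP's built-in str_getcsv and fgetcsv functions are usually a better fit; the regex version is mainly of interest as an illustration of possessive quantifiers and \G.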




: XML (XHTML 
) 
. 
:
,
(
, <br/> XML),
, .
:

^((?:<(\w++)[^>]*+(?<!/)>(?1)</\2>|[^<>]++|<\w[^>]*+/>)*+)$

, 
( ).
, 

.

^()$, ,
. 
, 
.


( 
),
(?:)*+, 


. ,
.

(. . 
, ).
, 
, 

* ,
. 
, 
, .

, ,
, , ( 317).
...
:
, : [^<>]++.
, . 

, , (?:)*+
.
, ,
. (
,
. ,
317.)
:
, <\w[^>]*+/>, 
, <br/> <img /> (
/,
). , 
, .
:
: <(\w++)
[^>]*+(?<!/)>(?1)</\2>.
( )
: (\w++) 
.
( (\w++)
, .)


(?<!/)
( 175), /
.
>,
, , <hr/> (
,
).
, 
(?1) 
. 
, .
,
, , 
(
). </ </\2>
, ,
\2 .

HTML ,
, 
(?i) i.
!



\w++ <(\w++)
[^>]*+(?<!/)>.
,

( 180), \b (\w+),
: <(\w+)\b[^>]*(?<!/)>.
\b , , 
li <link></li>.
nk 
, 
\2, .
, \w+
.

, ,
\w+ 
,
<link></li>. \b 
.


, preg PHP 
, 
\w++ 
, \b,
.

XML
XML , 
. 
XML, CDATA 
.
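XML comments can be matched with <!--.*?-->, used with (?s) or the
s pattern modifier so that dot also matches newlines.
CDATA sections, of the form <![CDATA[...]]>, can be matched with
<!\[CDATA\[.*?]]>. Processing instructions at the start of an
XML document, such as <?xml version="1.0"?>, can be matched with
<\?.*?\?>.
Entity declarations, of the form <!ENTITY ...>, can be matched with
<!ENTITY\b.*?>. XML has other similar structures, and most of them
can be covered by generalizing
<!ENTITY\b.*?> to <![A-Z].*?>.
A few issues remain, but these additions cover most
XML.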
, :
$xml_regex = '{
  ^(
    (?: <(\w++) [^>]*+ (?<!/)> (?1) </\2>   # matched pair of tags
      | [^<>]++                             # non-tag stuff
      | <\w[^>]*+/>                         # self-closing tag
      | <!--.*?-->                          # comment
      | <!\[CDATA\[.*?]]>                   # cdata block
      | <\?.*?\?>                           # processing instruction
      | <![A-Z].*?>                         # entity declaration, etc.
    )*+
  )$
}sx';
if (preg_match($xml_regex, $xml_string))
    echo "block structure seems valid\n";
else
    echo "block structure seems invalid\n";


HTML?
HTML ,
, :
, 
< >. HTML
,
, <script>.
HTML comments are handled the same way as in XML:
<!--.*?--> with the s pattern modifier.
<script> , 
< >, , 

<script> </script>.
<script\b[^>]*>.*?</script>.
, 
< >, ,
. <script> 
, , 
.
PHP 
HTML:
$html_regex = '{
  ^(
    (?: <(\w++) [^>]*+ (?<!/)> (?1) </\2>   # matched pair of tags
      | [^<>]++                             # non-tag stuff
      | <\w[^>]*+/>                         # self-closing tag
      | <!--.*?-->                          # comment
      | <script\b[^>]*>.*?</script>         # script block
    )*+
  )$
}isx';
if (preg_match($html_regex, $html_string))
    echo "block structure seems valid\n";
else
    echo "block structure seems invalid\n";


\0, 153
\1, 366
Perl, 67
?
, 42
, 206
?!, 329
??, 372

(?i), 528

(?i), 527

(?m), 527

(?s), 527

(?U), 528

(?X), 527

(?x), 527
(?!), 399, 401
(?#), 150, 177
(?1), 564
Java, 477
PCRE, 564
PHP, 564, 572
(?n), 484
(?P<>), 535
(?R)
PCRE, 563
PHP, 563
\(\), 178
//, 387
\ \ \ \, 240
@
, Perl, 351
, 107
@, 363, 365
@+, 363, 365
@"", 137
., Java, 442
=~, , 65

*
, 43
, 207
.*
, 84
, 196
, 302
+
, 43
, 207
++, 572
\+, , 119
\<\>, 47, 78, 174, 193
egrep, 39
Emacs, 135
<>, 83
[:<:], 124
[==], 168
[::], 166
[..], 167
[:>:], 124
$, 148, 529
Java
, 442
, 169
Perl
, 350,
351
, 63
PHP, 523
, 107
$$, .NET, 502
\$, PHP, 523
$/, 61
$&, , 362, 363
.NET, 502
, 398
, 427
, 426

576
, 425
, 428
$', , 362, 363
.NET, 502
, 398
, 427
, 426
, 425
, 428
$`, , 362, 363
.NET, 502
, 398
, 427
, 426
, 425
, 428
$+, , 363, 364, 413
.NET, 254, 502
, 425
, 254
$_, , 110, 388
.NET, 502
$^N, , 363, 365, 413
$^R, , 365, 393
$^W, , 359
^, 148
Java, 442
, 302
^Subject: , 127, 196, 199, 297,
301, 350
Java, 128
.NET, 129
PHP, 130
Python, 130
$0, 363
Java, 453
PHP, 543
${0}, 543
$1, , 178, 363, 366
Java, 453
.NET, 502
, 67
, 425
${}
.NET, 502
{min, max}, 45

A
\A, 148, 169
, 302


\a, 151
$all_matches, 538
$matches, 537

, 539
, 538
anchored(), 433
AND,
, 164
appendReplacement, , 454
appendTail, , 454
$ARGV, 111
ASCII, , 140, 151
AT&T Bell Labs, 118
awk
gensub, 231
, 174
, 179
, 118
, 133

B
<B></B>, 211
\B, 174
\b, 95, 151, 174
Perl, 347
PHP, 521
, 71, 74
Java, 438
\b\B, 295
BLTN, Java, 290
BOL, 433
<br/>, , 570
BRE ( ),
119

C
\C, 157
PHP, 522, 523
\c, 155
/c, , 371, 380
C#
, 179
, 137
Capture, , 516
.NET, 495
CaptureCollection, , 518
CDATA, 573
CharBuffer, , 445
charnames, , 351

577


CharSequence, , 436, 446, 456,
473
CheckNaughtiness(), , 429
chr, , 497
Compilation failed, 561
compile, , 444
Compiled (.NET), ,
292, 485, 487, 498, 506
CompileToAssembly, 515
Config, , 351, 361
CR, 144, 442
create_function, , 549, 550
CR/LF, 442
CSV, ,
, , 330
currentTimeMillis(), , 290

D
\D, 77, 158
\d, 77, 158
Perl, 349
PHP, 521
Darth, 247
date_default_timezone_set(), ,
289
DBIx::DWIW, Perl, 315
debugcolor, 434
definekey, 135
Devel::FindAmpersand, , 428
Devel::SawAmpersand, , 428
Dr, , 434

E
\e, 110, 151
\E,
Java, 440, 470, 480
/e, , 384, 385
ECMAScript (.NET),
, 485, 489, 498, 506
ed, , 117
egrep
, 174
, 179

, 39
, 124
, 194
, ,
47
, 39


, 232
, 58
, 118
Emacs
researchforward, 134
, 174
, 179
, 134

, 135
, 168
, 155
end, , 450
English, , 428
ERE (
), 119
ereg, , 519
Escape ANSI, 111
eval, , 385
Explicit, , 492
ExplicitCapture (.NET),
, 485, 498, 506

F
\f, 151
, 71
FF, 144
find, , 448
flags, , 470
flex, , 123
floating '', 433
foreach, if while, , 385

G
\G, 171, 265, 529
.NET, 484
, ,
475
, 172
, 302
, 379, 380
/g
, 79
, 371, 374, 375, 380, 384

, 424
gensub, , 231
GetGroupNames ( Regex),
505

578
GetGroupNumbers (
Regex), 505
gettimeofday(), , 289
GNU awk
gensub, 231
, 123
GNU egrep
, 123
, 194

, 232
GNU Emacs
, 124
, 123
GNU grep
, 123
, 231
GNU sed, , 123
GPOS, 433
grep
Perl, 390
, 118
, 117
y, 118
, 124
, 118
group, , 450
Group, (.NET), 495
Capture, 517
Captures, 509
Index, 509
Length, 509
Success, 509
ToString, 509
Value, 509
, 508
GroupCollection, , 518
groupCount, , 450
GroupNameFromNumber (
Regex), 505
GroupNumberFromName (
Regex), 505
Groups, ( Match), 507

H
hasAnchoringBounds, , 463
hasTransparentBounds, , 462
height, , Java, 473
hitEnd, , 465


$HostnameRegex, , 106, 179,
367, 420
HTML, , 532, 543, 545, 549, 574
<HR>, , 245
URL, 104, 258, 260, 367, 385
, 492
, 211
, 380
, 492
, , 251
, 97
, 172
, , 253
, 24, 43, 427
htmlspecialchars, , 545
HTTP, , 552
HTTP URL, , 51, 253, 258, 260,
318, 367, 385
http://regex.info/, , 429
$HttpUrl, , 367, 413, 420
http://www.cpan.org/, , 428
Hz, 144

I
/i, , 176
, 74
study, 430
i y, 118
if, while foreach, , 385
IgnoreCase (.NET),
, 130, 133, 485, 497, 506
IgnorePatternWhitespace (.NET),
, 133, 485, 497,
506
IllegalArgumentException, ,
445
IllegalStateException, , 449
<img>, , Java, 473
implicit, 433
Imports, , 490, 493, 513
In Is, , 160
Index,
Group, 509
Match, 507
IndexOutOfBoundsException, 
, 448, 449, 453
IP, , 236, 376, 379
Iraq, 35
Is In, , Perl, 349
IsMatch (.NET), 491

579


IsMatch ( Regex), 499
ISO88591, , 120,
140

J
Java, 436
BLTN, 290
\E, , 480
find, , 448
JIT, 290
\Q, , 480
, 146
split, , 470
, 290
, 174
, 438
, 179
, 442
,
463
, 261, 271, 289
, 457
, 463
, 443
, 436
, 436, 440, 467, 476, 480
, 447
, 447, 457
, 451
, 443,
448, 451, 455, 463
,
461
CSV, , 476
,
, , 271
, 477
, 440
, 442
, 441
, 447, 478
, 289
, 441
java.lang.Character, , 442
java.util.regex
, 123
java.util.Scanner, , 465
Jeffs, , 90
JfriedlsRegexLibrary, 513
JIT, , 487

Java, 290
JRE, Java, 290

L
\l, 351
\L\E, 352
Latin1, , 120, 140
lc(), , 352
lcfirst(), , 351
Length,
Group, 509
Match, 507
$LevelN, , 396, 411
lex
$, 148
, 118
, 147
LF, 144, 442
LIFO, , 204
local,
, 402
local, , 358
localtime, , 356, 385, 420
lookingAt, , 449
LS, 145, 442

M
/m, 176
Perl, 349
m//
, 64
makudonarudo, , 210, 282
Match, 130
Match (.NET) Success, 130
Match (.NET), , 495
Captures, 508, 517
Empty, 512
Groups, 507
Index, 507
Length, 507
NextMatch, 508
Result, 508
Success, 507
Synchronized, 508
ToString, 507
Value, 507
, 507
, 499, 508
Match ( Regex), 499
MatchCollection, , 500

580
matcher, , 445
Matcher, , 446
appendReplacement, , 454
appendTail, , 454
end, , 450
find, , 448
group, , 450
groupCount, , 450
hasAnchoringBounds, , 463
hasTransparentBounds, , 462
hitEnd, , 465
lookingAt, , 449
matches, , 449
pattern, , 468
quoteReplacement, , 452
region, , 460
regionEnd, , 460
regionStart, , 460
replaceAll, , 451
replaceFirst, , 452
requireEnd, , 465
reset, , 468
start, , 450
toMatchResult, , 450
toString, , 469
useAnchoringBounds, , 463
usePattern, , 468, 475
useTransparentBounds, , 462
, 453
, 457
$matches $all_matches, 537
Matches ( Regex), 500
$matches, , 532
matches, , 449, 470
MatchEvaluator, 501
mb_ereg, , 519
MBOL, 433
{min, max}, 45
minlen, , 433
MSIL (Microsoft Intermediate Language),
487
Multiline (.NET), ,
485, 497, 506
MungeRegexLiteral, , 409, 415
my, , 405
MySQL, , 123

N
\n, 77, 151
, 71



, 152
\N{}, 351
NEL, 144
$NestedStuffRegex, , 407,
414
.NET, 481
$+, 254
JIT, 487
MSIL (Microsoft Intermediate Lan
guage), 487
URL, , 257
, 174
, 485
, 124
, 494

, 483
, 481
, 492, 501
,
129
, 484
, 291
.NET Framework
, 123
New Regex, 130, 132, 493, 499
NextMatch ( Match), 508
no re 'debug', , 432
nomatchvars, , 428
None (.NET), 498, 506

O
/o, , 421
,
424
oneself, , 399
Options ( Regex), 505
OR,
, 164
osmosis, 355
overload, , 410

P
\p{^}, 164, 349
\P{}, 159, 164
Java, 439, 441, 480
\p{}, 159, 349
Java, 439, 441, 480
Perl, 164

581


\p{All}
Perl, 349
\p{all}, 441
\p{Any}, 164
Perl, 349
\p{Assigned}, 164
Perl, 349
\p{C}, 159
Java, 441
\p{Cc}, 161
\p{Cf}, 161
\p{Close_Punctuation}, 161
\P{Cn}, 165
\p{Cn}, 161, 441, 484
Java, 441
\p{Co}, 161
\p{Connector_Punctuatlon}, 161
\p{Control}, 161
\p{Currency_Symbol}, 161
\p{Cyrillic}, 162, 164
\p{Dash_Punctuation}, 161
\p{Decimal_Digit_Number}, 161
\p{Enclosing_Mark}, 160
\p{Final_Punctuation}, 161
\p{Format}, 161
\p{Greek}, 164
\p{Han}, 162
\p{Hebrew}, 162
\p{Hiragana}, 162
\p{InCyrillic}, 164
\p{Inherited}, 162
\p{Initial_Punctuation}, 161
\p{InTibetan}, 162
\p{IsCommon}, 162
\p{IsCyrillic}, 164
\p{IsGreek}, 164
\p{IsL}, 164
\p{IsTibetan}, 162
\p{javaJavaIdentifierStart}, 442
\p{Katakana}, 162
\p{L&}, 160, 164
Perl, 349
\p{L}, 159, 174, 438
\p{Latin}, 162
\p{Letter}, 159, 164
Perl, 349
\p{Letter_Number}, 161
\p{Line_Separator}, 160
\p{Ll}, 160, 484
\p{Lm}, 160, 484
\p{Lo}, 160, 484

\p{Lowercase_Letter}, 160
\p{Lt}, 160, 484
\p{Lu}, 160, 484
\p{M}, 157, 159
\p{Mark}, 159
\p{Math_Symbol}, 161
\p{Mc}, 160
\p{Me}, 160
\p{Mn}, 160
\p{Modifier_Letter}, 160
\p{Modlfier_Symbol}, 161
\p{N}, 159
\p{Nd}, 161, 438, 484
\p{Nl}, 161
\p{No}, 161
\p{Non_Spacing_Mark}, 160
\p{Number}, 159
\p{Open_Punctuatlon}, 161
\p{Other}, 159
\p{Other_Letter}, 160
\p{Other_Number}, 161
\p{Other_Punctuation}, 161
\p{Other_Symbol}, 161
\p{P}, 159
\p{Paragraph_Separator}, 161
\p{Pc}, 161
\p{Pd}, 161
\p{Pe}, 161
\p{Pf}, 161
Java, 441
\p{Pi}, 161
Java, 441
\p{Po}, 161
\p{Private_Use}, 161
\p{Ps}, 161
\p{Punctuation}, 159
\p{S}, 159
\p{Sc}, 161
\p{Separator}, 159
\p{Sk}, 161
\p{Sm}, 161
\p{So}, 161
\p{Space_Separator}, 160
\p{Spacing_Combining_Mark}, 160
\p{Symbol}, 159
\p{Titlecase_Letter}, 160
\p{Unassigned}, 161, 164
Perl, 349
\p{Uppercase_Letter}, 160
\p{Z}, 159, 438, 484
\p{Zl}, 160

582
\p{Zp}, 161
\p{Zs}, 160
\p{}, 484
panic: top_env, 399
Pascal, 62, 232
, 323
Pattern
CANON_EQ, 143
CASE_INSENSITIVE,
, 128, 132, 145, 444
, 468
COMMENTS, , 132,
440
compile, , 444
DOTALL, , 440, 442
flags, , 470
matcher, , 445
matches, , 470
MULTILINE, , 440,
442
pattern, , 470
quote, , 470
split, , 470
toString, , 470
UNICODE_CASE, ,
444
UNIX_LINES, ,
440, 442
pattern, , 468, 470
Pattern.CANON_EQ, ,
440
Pattern.CASE_INSENSITIVE,
, 440
Pattern.LITERAL, ,
440
PatternSyntaxException, ,
443, 445
Pattern.UNICODE_CASE,
, 440
PCRE
\w, 158
study, 529
X, 529

, 521
, 123
, 564
, 563
, 123
PCRE, , 520
, 521


Perl, , 61
$/, 61
$^W,
, 359
,
357
, 174
, 179
, 356
, 63
, 419
, 354
, 124

, 388
,
346
, 123, 343

0, 62
c, 432
Dr, 434
e, 62, 81, 432
i, 81
M, 432
Mre=debug, 434
n, 62
p, 81
w, 64, 391, 432
, 383
, 64
$^W, , 359
use warnings, 391
, 416
,
, , 266
, 175
, 346
, 120
Perl Porters, 122
PHP, 519
\w, 158
, 525
, 174
, 179
, 532, 564
study, 529
, 558

, 521
, 123, 521

583


, 542
CVS,
, 569
,
526, 530
, 564
, 563
, 523
, 525
, 548, 550
, 288
, 566
\pL, PHP, 521
\pN, PHP, 521
pos(), , 171, 378
POSIX
[==], 168
[::], 166
[..], 167
BRE (
), 119
ERE (
), 119
, 166
, 167
, 119
,
167
,
, 226
, 166
, 167
, 168
POSIX
, 283
, 190
preg, , 519
preg_grep, , 556
PREG_GREP_INVERT, , 556
preg_match, , 531
, 536
preg_match_all, , 536
PREG_OFFSET_CAPTURE, , 535,
538, 540
preg_pattern_error, , 562
PREG_PATTERN_ORDER, , 538
preg_quote, , 178, 557
preg_regex_error, , 563
preg_regex_to_pattern, , 558
preg_replace, , 542
preg_replace_callback, , 548

preg_split, , 551
PREG_SPLIT_DELIM_CAPTURE, ,
553, 554
PREG_SPLIT_NO_EMPTY, , 554
PREG_SPLIT_OFFSET_CAPTURE,
, 554
Procmail, , 123, 127
PS, 145, 442
Python
\Z, 148
, 174
, 179
, 176
, 123
,
130
, 175
, 138
, 293
\pZ, PHP, 521

Q
\Q, Java, 440, 470, 480
\Q\E, 352
Qantas, 35
qed, , 117
qr//, 107
quote, , 178, 470
quoteReplacement, , 452

R
\r, 77, 151

, 152
r"", 138
re 'debug', , 432
Regex (.NET),
CompileToAssembly, 513, 515
Escape, 511
GetGroupNames, 505
GetGroupNumbers, 505
GroupNameFromNumber, 505
GroupNumberFromName, 505
IsMatch, 491, 509
Match, 491, 495, 509
Matches, 500, 509
Options, 505
Replace, 492, 501, 509
RightToLeft, 505
Split, 504, 509

ToString, 505
Unescape, 512
, 496
, 499
, 495, 497
, 497
Regex.Escape, 178
RegexOptions
Compiled (.NET),
, 292, 485, 487, 498, 506
ECMAScript, ,
485, 489, 498, 506
ExplicitCapture, 
, 485, 498, 506
IgnoreCase, ,
130, 133, 485, 497
IgnorePatternWhitespace, 
, 133, 485, 497, 506
Multiline, , 485,
497, 506
None, 498, 506
RightToLeft, ,
485, 489, 498, 506
Singleline, , 485,
498
RegexOptions.RightToLeft, 504
region, , 460
regionEnd, , 460
regionStart, , 460
reg_match, , 538
regsub, 134
Replace ( Regex), 501
replaceAll, , 451
replaceFirst, , 452
requireEnd, , 465
re-search-forward, 134
reset, , 468
Result ( Match), 508
RightToLeft (.NET),
, 485, 489, 498, 504, 506
RightToLeft ( Regex), 505
Ruby
, 174
, 179
, 176
, 123
, 292
rx, 232

S
\S, 77, 84, 158
Emacs, 169
/s, 176
\s, 77, 158
Emacs, 168
Perl, 349
PHP, 521
, 74
s///, , 78, 383
SBOL, 433
sed
, 174
, 179
, 147
Singleline (.NET), ,
485, 498, 506
split
.NET, 504
Split ( Regex), 504
split,
Java, 470
,
472
split, , 387
start, , 450
Strict (), 492
strict, , 357, 403, 415
String, , 445
StringBuffer, , 445, 456, 473
StringBuilder, , 456, 473
str_ireplace, (PHP), 542
str_replace, (PHP), 542
study, , 429
, 430
Success,
Group, 509
Match, 507
System.currentTimeMillis(), ,
290
System.Text.RegularExpressions, 490,
493

T
\t, 77, 151
, 71
Time::HiRes, , 286



Tcl
[:<:], 124
[:>:], 124
regsub, 134

, 298
, 174
, 179
, 176
, 124
, 123
, 134

, 232
, 139
, 147
, 293
this|that, , 298, 312, 319
time(), , 286
Time::HiRes, , 429, 431
Timer(), , 292
toMatchResult, , 450
ToString,
Group, 509
Match, 507
Regex, 505
toString, , 469, 470

U
\U, 155
\u, 155, 482
U+C0B5, 142
\U\E, 352
uc(), , 352
ucfirst(), , 351
UCS-2, , 142
UCS-4, , 142
UnicodeData.txt, , 351
unicore, , 351
URL, , 104, 318, 532
, 258, 260
use charnames, , 351
use Config, 351
use strict, 357, 403
useAnchoringBounds, , 463
usePattern, , 468, 475
useTransparentBounds, , 462
UTF-16, , 142
UTF-8, , 142, 528

V
\v, 151
, 435
\V, , 435
Value
Group, 509
VB.NET
, 179
,
129
, 137
vi, , 179
Visual Studio .NET, 513
VT, 144

W
\W, 77, 158
\w, 77, 95, 158
Perl, 349
PHP, 158, 521
Java, 438
,
125
-w, , 359
warnings, , 391
while, foreach if, , 385
width, , Java, 473
with eval, 433

X
\X, 143, 158
/x, 103, 176
Perl, 349
, 122
\x, 144, 155, 482
Perl, 347
XML, 573
CDATA, 573
, 570, 573

Y
y,

grep, 118
Yahoo!, 258

Z
\Z, 148, 169
Java
, 442


PHP, 523
\z, 169, 529
, 381

 (Boyer-Moore), 300
,
191

, ,
307
, 162, 349
PHP, 523
, , 571

, 204
, 202
, 197
, 228
, 57, 114
, 187
, 186
, , 319
( local
Perl), 360
, 192
, 186


, 353, 384

Java, 453
, 525
, 317
, 180
, 218,
327
, 216
, 249, 253, 266, 397, 408
, 217
(Alfred Aho), 118, 228

, ,
123
, 144
\s, Perl, 349

, 70
,
, 361
, 290
, 291
, 570
.NET, 515
Perl, 393, 407
PHP, 563, 570
, 209
LIFO, 204
, 202
, , 206
, 205
, 220
, 204
, 215
, 227
, 144, 442

POSIX , , 283
, 280
, 285
, 280
, 306
, 282
, 275, 277
, 280
, 279
, 153

, 489
, 426
, 52
, 397
local, , 402
my, 405
, 393
, 181
, 37

, 142
,
235
, 419
, , 322
, 279,
397, 408
, 280
, 280
, 349, 441, 480, 484



, 37
, 399
, 312
, , 312
, 165


, 231
,
Perl, 357
(James Gosling), 121

\<\>
egrep, 39
Perl, 349
, 39
, 147
, 45
, 166

, , 186
, 501
, , 50

, 124
, 53

Java, 438
.NET, 485
PCRE, 521
Perl, 347
PHP, 521
, 143
, Perl, 355, 357
, 361
,
393

charnames, 351
Imports, 490, 513
no re 'debug', 432
nomatchvars, 428
overload, 410
re 'debug', 432
strict, 357, 403, 415
warnings, 391

, 188, 200

, 194, 231
, 230
, 190
, 231
, 201
,
, 225
, 200, 229, 279, 281
, 229
No Dashes Or Spaces, 543

\, 352

, 230
, 144
Java, 442
,
152
(Jeremy Zawodny),
315
, 463
Java, 463
, , 545
PHP, 542
s///, 78
, 546,
549
, 407
, 184,
219, 317, 565, 572
, 411

, 317, 327, 328, 571
, 307
, 249, 253
(Andrei Zmievski), 520

, , 49
, , 239
, 351
, 180
.NET, 484
PHP, 532, 540, 564
, 413

, 221

,
411
, 413
, 386
,
, 316

Perl, 409

, 166
$&, 427
$', 427
$`, 427
, , 131
URL, , 104
, , 526

VB.NET, , 257
URL, , 104
, 258, 260
, 51, 131, 180, 318, 325, 532
, , 255

,
213
, 373

, 34
, 573
, 127
, 298
, , 45
,
352
,
527
, 350
PHP, 137
, 107
Perl, 404
, 386
, 420
, 305
,
, 304

, , 304
, 302
, 316
, 337

IllegalArgumentException, 445
IllegalStateException, 449
IndexOutOfBoundsException, 448,
449, 453
PatternSyntaxException, 443, 445

\+, 119
awk, 118
egrep, 118
grep, 118
lex, 118
Perl, 120
PHP, 520
sed, 118
/, 122

, 116
\w, 120

, 211
, 43
*, , 43
?, , 42
, 206
(), 46
+, , 43
, 219, 565, 572
,
317, 327, 328, 571
, 207
, , 303

, 302
, 290
, 228
Java, 261, 271, 289

ASCII, 140
Latin-1, 120, 140
UCS-2, 142
UCS-4, 142
UTF-16, 142
UTF-8, 142, 521, 528
, 140

, 141
U+FFFF, 144
, 142, 158



, 150, 177
Java, 132
.NET, 497
XML, 573
Pascal, 323
, 331

, 297
, /o, 421
, 487
(Robert Constable),
116

, 285
, 275, 285
, 222
,
239, 276, 318

, 279
, 223
, 374

, 238
, 71
Perl, 356
,
, 302
, 193
split,
.NET, 504
split
Perl, 392
, 515, 563
, 37
egrep, 45
, 180, 413,
532, 540
\(\), 117
, 533,
556
, 72
, 304
, 515
, 243
, 564
, 563
, Perl, 68
, 364
, 194

, 297
.NET, 510
PHP, 566
Perl, 419
Tcl, 299
, 419
, 420
, 297
,
298

, 300
, 299
,
419

, 361

, 193
, 139, 350,
354, 371
, 419
, 177
, 149

, 312

/, 432
, 358
, 167
, 119
(Tom Lord), 232

, 195
, 207
, 313
, 530
, 222
, 277

, 213
, 204
, 210
, 215
, 196
, 183


, 34
, 53

, 409
,
, 198
, 366
, , 200

, 231, 294, 298
, 190

, 281
, 200, 229
, POSIX
Perl, 402

, 296
, 212
, 313

, 213
, 304
, 215
, 184
, 71
, 211
, 147

/g, 79
/i, 74
/osmosis, 355
, Perl, 354
,
368
, 530
, 381
, 368
, 145, 176, 485

A, 528, 529
D, 523, 528, 529
e, 528, 544, 550
I, 527
m, 523, 527
PHP, 527
S, 317, 528, 544, 567
s, 527
U, 528, 530
u, 521, 527, 535


X, 527, 529
x, 523, 527
,
530

, , 88
, PHP, 558
, 116
, 356
, 116
, 228
, 71
, 205
, 218
, 72,
179
, , 399

egrep, 39
, 39

, 198
, 222
, 207
, 188
, 199
, 190
, 231
, 201
, 200, 229
, 228
, 229
, 280
, , 321
, 153
 , 157


hitEnd, , 465
Java, 457
requireEnd, , 465
, 473
, 463
, 
, 459
, 461


,
, 306
,
, 79
, , 81
, 178
egrep, 45
, 194, 231
, 489
, 46
, 297
, 321,
322
, 463
Java, 463
, 167

Java, 443
.NET, 494
 ,
127
, 300
, 367
/g, , 424
/o, , 424
, 370,
422
, 369
, preg_split, 551

Perl, 388
PHP, 551
, Java,
472
,
293
, ,
213, 223

, 217

, 216
,
Perl, 346
, 164
, 175
<B></B>, 213
Java, 440
, 88

,
221
,
166
, 316
, 96
, 90

Java, 436
.NET, 481
PCRE, 521
Perl, 343
PHP, 521
, 294
BLTN, 290
JIT, 290
, 487
, 307
, 312
,
304

, 304
, 302,
316

, 241
, 303
,
302
, 304
, 306
, 230
, 307

, 300

, 306
, 301
, 302
, 301
, 303
,
314
, 294
, 303
, 301, 303
, 308

, 308
, 193

, , 218

$&, , 398
$', , 398
$`, , 398
, 433
,
369
, 431
, 398
, 230
, 538

Java, 436, 440, 467, 476, 480


hitEnd, 467

-c, 432
-Dr, 434
-e, 432
-M, 432
-Mre=debug, 434
-w, 359, 432

, 335
, 144, 442
, 144
,
352, 409
, 410
, ,
420

, 426

Perl, 363
, 412
, 357
, 425
(Jeff Pinyan), 302
,
, 78

PHP, 523
, ,
307
, 264

s///, 383
, 111

, 302
, , 303
, 88
Perl, 349
, 220
, 88
, 183
, 447
, 447
Java, 457
,

, 329
,
328

IP, , 236
URL , 258
, ,
131
Java, 132
.NET, 132
, 407
, 49
, ,
101
, , 47
Emacs, 135
Perl, 61, 108

awk, 133
Emacs, 134
Java, 451, 456
.NET, 492, 501
Perl, 383
PHP, 542
Tcl, 134
, 400
,
, 402
, 370
, 279, 397, 408
, 280
, 280
, , 203

, 398


, 281
, 373

, 245
, 356
, 356
, 356, 374
, 356, 374

.*, 196
, 193
, , 195
, 201
, 201
, 193
, 193
, 193
, 193
, , 206
, 538
, 382
, 205
, 205
, 538
, 145
, 400
,
, 225, 402
, 377
, 229
HTML, 253
HTML, 251
, 246
, 247
, 227
, 488
, 357

, 546
,
,
egrep, 50
, , 262

,
191

, 195
, 277

, , 300

, 425
, 294

, , 306
, 151

.*, 85
Perl, 64
use warnings, Perl, 391
, 359

, , 100
,
.NET, 503
Perl, 63, 343
PHP, 525
, 356

, 351
, ,
302
, ,
301

$+ (.NET), 254
^Subject, 301, 350
gr[ea]y, 32
<img>, , 473
Jeffs, 90
HTML, 524, 543, 545, 549, 574
<HR>, 245
URL, 258, 260, 367, 532
, 492
URL, 385
, 211
, 380
, 492
, 251
, 172
, 475
, 253
, 51, 427
HTML URL, 255
HTTP URL, 255
HTTP, 552
IP, 376, 379
.NET
$+, 254
URL, 257
oneself, 399
this|that, 298, 303, 312, 319


URL, 51, 104, 253, 367, 385, 532


VB.NET
, 257
XML, 570, 573
, 249, 253,
266, 397, 408
, 50
,
249, 253
, 49
, 239, 526
, 131
URL, 104
, 131, 180, 318, 325, 532
Java, 261
URL, 104
, 88
, 79
, 81
, 213,
216, 223

, 223
, 90
,
410
, 78
IP, 236
URL, 260
, 258
,
131
Java, 132
.NET, 132
, 101

Emacs, 135
Perl, 61, 108
, 262

, 100
HTML, 97

Java, 455
.NET, 503
Perl, 63, 343
PHP, 525
, 255


, 62
, 452
, 239
, 381
CSV
Java, 476
CSV
PHP, 569
,
, 266
Java, 271

, 96
, 88

, 329
,
CSV, 330
, 329, 565

makudonarudo, 210, 282

, 322

, 247
, 327
, 275
, 330
, 245
Java, 443,
448, 451, 455, 463
, 140

, ,
303
, , 255
,
428

, 404
, 190
, Java, 461
, , 250
, 127
, 299
, 349
, 538
, , 239
, , 381


Perl, 386
PHP, 551
, 389
, 389

Perl, 388
, 387

, Perl, 392
CSV,
PHP, 569
, 486

, , 96
,
, 88
,
, 314
, 384

PHP, 526, 530
, 145, 442
, 145, 442
, Java, 477
, 291

, 335

, , 329
, 319, 565
, 322
, 565
, 99

, 231

.NET
, , 130
, 186
, 106
, 564
, Perl, 355
, 393
, 116
, 419
, 356
, 297, 419, 510, 566
, 296

, 354
, 235
, 352
, 350
, 431
, 352, 394
, 412
, 371, 424
, 354
, 229
, 561
, 116
, 368
, 563
,
, 225
, 356
, 116

, 145
/i, 74
Ruby, 145
study, 430
Java, 447

PCRE, 564
PHP, 564

Java, 477
PCRE, 563
PHP, 563
, 175
Java, 440
.NET, 484
Perl, 349
PHP, 523
, 96

, 484

, 175
, 137


, 331
, 335
Perl,
386

,
, 225
, 570
, 515,
563, 570
PHP, 570
,
, 291

, , 146
, 159
(Ravi Sethi), 228
, 290


, 152
, 142
, 302,
308
, 316
, 142, 158
HTTP, 153
, 55
\w, , 120
, 151
, 151
, 155
, 155
, 32
, 156
, 165

POSIX, 166
, 37
,
213

, 157
, 156
, , 304
, 34
, 193
, 164
, 483
, 165
, 165
, 156
, 163
, 168
Emacs, 168
, 263


, 356, 374, 377
, 464, 475
, ,
533
, 229
, 144
, 99

preg_match, 536
, 297

, 269
,
, 146
, ,
, Perl, 67
, 380
, 419
,
151
, , 53
(Henry Spencer), 120, 232
, 321
, 356, 374

, 460
, , 253
, 294

#, 137
Emacs, 135
Java, 137
, 137
Python, 138
Tcl, 139
VB.NET, 137
, 135, 369
, PHP, 525

makudonarudo, , 210, 282

, 322
, 327
, 275

, 247
, 97
, 226, 235
, 330

, 302




, , 300

, 217
,
, 215

HTML, , 51
XML, 570
, 570
, 246
, 447
Java, 478
, 228
, 146
(Ken Thompson), 147
, 156

, 156
, 35
, 193

, 190

,
, 308

, 
Perl, 346

preg, 524
, 548, 550

(Philip Hazel), 123


, , 202
, 291
, 286
.NET, 487
Java, 289
Perl, 431
PHP, 288
Python, 293
Ruby, 292
Tcl, 293
VB.NET, 291
, 426


, 417
, 250
(Jeffrey Ullman), 228
(Larry Wall), 120, 434
, 223
, 224
, 337
, 155
,
, 303
, 294
, 182
, 402

, 393

, 183

.NET, 486
, , 301,
303

, PHP, 289
, , 245

,
, 546,
549
, 155


, , 308
, , 47
, 280, 397, 408
, 280
, 280
, 306, 322

, 329

, 328



, 141
/x, 349
, Perl
\p{Any}, 349
\p{Assigned}, 349
\p{^}, 349
\p{}, 349
\p{Unassigned}, 349
, 159
Java, 441
, 142

PHP, 566
Perl, 416
, 416
, 227
,
422
, 277

Java, 441, 480


.NET, 484
Perl, 349
PHP, 521, 523
, Perl, 349

Java, 441, 480


.NET, 484
3.1, 144
, 442
Java, 442

, 141
U+FFFF,
144

. : .NET, C#, Java, MySQL,
Perl, procmail, Python, Ruby, Tcl,
VB.NET
, 34

$, 169
^, 169
, 193
, 193
, 169