Regular Expressions
Third Edition
2008
, 3
. .
.
.
.
.
.
.
.
.
, 3 . . . .: ,
2008. 608 ., .
ISBN-13: 978-5-93286-121-9
ISBN-10: 5-93286-121-5
.
15 .
, Perl, PHP, Java,
Python, Ruby, MySQL, VB.NET, C# ( .NET),
.
PHP
. ,
,
java.util.regex Sun,
Java 1.4.2 Java 1.5/1.6.
,
,
, !
. ,
, .
ISBN 0596528124 ()
, 2008
Authorized translation of the English edition © 2006 O'Reilly Media, Inc. This translation
is published and sold by permission of O'Reilly Media, Inc., the owner of all
rights to publish and sell the same.
,
.
, , .
. 199034, , 16 , 7,
. (812) 3245353, www.symbol.ru. N 000054 25.12.98.
25.07.2008. 701001/16. .
38 . . 2000 .
199034, , 9 , 12.
.
, ,
.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1. . . . . . . . . . . . . . . . . . . . . 24
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
. . . . . . . . . . . . . . . . . . . . . . . . . . . 27
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
. . . . . . . . . . . . . . . . . . . . 29
,
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
: egrep . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
egrep . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
. . . . . . . . . . . . . . . . . . 39
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
: . . . . . . . . . . . . . . . . . . . . . . . . . . 43
. . . . . . . . . . . . . . . . . . . . . . . . . . . 45
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
. . . . . . . . . . . . . . . . . . . . . . . . 52
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Perl. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
. . . . . . . . . . . . . . . . . . . . . 67
. . . 70
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
. . . . 77
: . . . . . . . . . . . . . . . . . . . . . . . . 78
: . . . . . . . . . . . . . . . . . . . . . . . 79
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
. . . . . . . . . . . . . . . . . . . . . . . . . 88
HTML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
. . . . . . . . . . . . . . . . . . . . . . . . . . 108
3. : . . . . . . . . . . . . 114
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
. . . . . . . . . . . . . . . . . . . . . . 116
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
. . . . . . . . . . . . . . . 126
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
. . . . . . . . . 127
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
, . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
. . . . . . . . . . . . . . . . . . . . . . . . . . 135
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
. . . . . . . . . . . . . . . . . . . . . . 149
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
. . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
. . . . . . . . . . . . . . . . . . . . . . 176
, ,
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
4. . . . . . . . . . . . . . . . . 186
! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
. . . . . . . . . . . . . . . . . . . . 188
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
1: . . . . . . . . . . . . . . . 191
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
2: . . . . . . . . . . . 195
. . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
: , . . . . . . . 198
: , . . . . . . . . . . . . . . . . . . . . . . . 200
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
. . . . . . . . . . . . . . . . . . . . . . . . . 209
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
, . . . . . . . . . . . 215
. . . . 216
?+, *+, ++ {max,min}+ . . . . . . 219
. . . . . . . . . . . . . . . . . . . . . . . . . . 220
? . . . . . . . . . . . . . . . . . . . . . . . 222
. . . . . . . . . . . . . . . . . . . . . . . 223
, POSIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
, . . . . . . . . . . 225
POSIX ,
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
5. . . . 234
. . . . . . . . . . . . . . . . . . . . . . . . . . 235
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
IP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
. . . . . . . . . . . . . . . . . . . . . . 245
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
. . . . . . . . . . . . . . . . . . 250
HTML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
HTML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
HTML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
HTTP URL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
URL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
, . . . . . . . . . . . . . . . . . . . . . . . 266
6. . . . . . . . . . . . 274
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
. . . . . . . 276
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
. . . . . . . . . 277
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
POSIX . . . . . . . . . . . . . . . . . . . . . . . . . . 283
. . . . . . . . . . . . . . . . . 283
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
. . . . . . . . . . . . . . . . . . 285
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
. . . . . . . . . . . . . 288
PHP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
Java . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
VB.NET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
Ruby . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
Tcl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
. . . . . . . . . . . . . . . . 296
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
. . . . . . . . . . . . . . . . 301
. . . . . . . . . . . . . . . 303
. . . . . . . . . . . . . . . . . . . . . . . 309
, . . . . . . . . . . . . . . . . . . . . . . 310
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
. . . . . . . . . . . . . . . . . . . . . . . . . . 314
. . . . . . . . . . . . . . . . . . . 316
. . . . . . . . . . . . . . . . . . . . . . . . . . 317
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
1:
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
. . . . . . . . . . . . . . . . . . . . . . . . . . 322
2: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
3: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
. . . . . . . . . . . . . . . . . . . . . . . . . . 327
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
. . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
= . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
: ! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
7. Perl. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
. . . . . . . . . . . . . . . . . . . 345
Perl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346
Perl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
Perl . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
. . . . . . . . . . . . . . . 350
. . . . . . . . . 354
. . . . . . . . . . . . . . . . . . . . . . 354
rl . . . . . . . . . . . . . . . . . . . 355
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
. . . . . . . . . . . . . . . . . . . . . . . . . . 357
, . . . . . . . . . . . . 362
qr// . . . . . . . . . . . . . . . 366
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
. . . . . . . 369
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
. . . . . . . . . . . . . . . . . . . 374
/g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
. . . . . . . . . . . . . . . . . . . . . . . . . . . . 382
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384
/e . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
. . . . . . . . . . . . . . . . . . . . . . . . . . . 386
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
split . . . . . . . . . . . . . . . . . 390
split . . . . . . . . . . 392
Perl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
. . . . . . . . . . . . . . . . . . . . . . . . . 394
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
local . . . . . . . . . . . . . . . . . . . . . . 402
my . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
. . . . . . . . . . . . . . . . 409
. . . . 412
. . . . . . . . . . . . . . . . . . . . . . . . . 413
Perl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 416
. . . . . . . . . . . . . . . . . . . . . 417
, /, qr//
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
. . . . . . . . . . . . . . 431
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 434
8. Java . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438
\p{} \P{} Java . . . . . . . . . . . . . . . . . 441
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442
java.util.regex . . . . . . . . . . . . . . . . . . . . . . . . . . . 443
Pattern.compile() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444
Pattern.matcher() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445
Matcher . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446
. . . . . . . . . . . . . . . . . . . . . . . . . 448
. . . . . . . . . . . . . . . . . . . . . . . . 449
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 456
Matcher . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463
. . . . . . . . . . . . . . . . . . . . . . . . . . . . 464
Matcher . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
Pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470
split Pattern . . . . . . . . . . . . . . . . 471
split Pattern . . . . . . . . . . . . . . . 472
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473
WIDTH HEIGHT <img> . . . . . . . . . 473
HTML
Matcher . . . . 475
CSV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 476
Java . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477
1.4.2 1.5.0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477
1.5.0 1.6.0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480
9. .NET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
.NET . . . . . . . . . . . . . . . . . . . . . . . . . . 482
.NET . . . . . . . . . . . . . . . . . . . . . . . . . . 485
.NET . . . . . . . . . . . . . . . . . . 490
. . . . . . 490
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492
. . . . . . . . . . . . . . . . . . . . . . . . . . . 494
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496
Regex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496
Regex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
Match . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507
Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 508
. . . . . . . . . . . . . . . . . . . . . . . . . 509
. . . . . . . . . . . . . . . . . . . . . . . . 510
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511
.NET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 515
Capture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 516
preg . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531
preg_match_all . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536
preg_replace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 542
preg_replace_callback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 548
preg_split . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551
preg_grep . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 556
preg_quote . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 557
preg . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 558
preg_regex_to_pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 558
. . . . . . . . . . . . . . . . . . . 561
. . . . . . . . . . . . . . . . . . . . . . . . 562
. . . . . . . . . . . . . . . . . . . . . . . . . . . 563
. . . . . . . . . 563
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566
. . . . . . . . . . . . . . . . . . . . . . . 566
PHP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566
S: Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . 567
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 569
CSV PHP . . . . . . . . . . . . . . . . . . . . . . . . . 569
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 570
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 575
.
, .
,
, ,
.
(
, , . .),
, Java Jscript, Visual Basic VBScript, JavaScript
ECMAScript, C, C++, C#, elisp, Perl, Python, Tcl, Ruby, PHP, sed
awk.
, .
,
.
. ,
, , ,
.
.
.
1996 .
, .
,
.
, ,
.
, , .
,
,
( )
.
. Perl, Python, Tcl, Java Visual Basic
.
( Ruby, PHP C#).
.
.
,
,
. 2002 ,
java.util.regex, .NET Framework
Microsoft Perl 5.8.
. , , ,
PHP.
PHP
, .
PHP
; , ,
PHP
. ,
, Java,
Java 1.5 Java 1.6.
,
. ,
,
. ,
.
,
, . .
, (,
),
, , ,
, .
.
,
( ,
).
,
. ,
, .
,
, . ,
,
,
. .
,
. ,
,
, .
, ,
,
Perl, Java,
.NET PHP.
.
. ,
(, . 124) ,
.
,
.
,
, .
:
1
.
2
.
3 ,
.
4
.
5
4.
6 .
7 Perl.
8 java.util.regex
Java.
9
.NET.
10 preg,
PHP.
. ,
3
.
1
.
egrep. ,
,
.
, ,
.
2
,
.
,
.
,
,
.
3 :
,
.
, ,
, .
,
, .
.
,
.
,
? ?. ,
,
.
4
.
. ,
,
.
5
.
( )
.
6
, .
, 4 5,
, .
4, 5 6,
.
,
.
7 Perl Perl ,
. Perl
,
, . .
,
. ,
,
. ,
.
8 Java
java.util.regex,
Java 1.4.
Java 1.5,
1.4.2 1.6.
9 .NET
.NET (
Microsoft ).
VB.NET, #, C++,
JScript, VBScript, ECMAScript .NET.
10 PHP
,
PHP,
preg, PCRE.
(, , ).
,
.
: this.
,
. (, ,
) : this. ,
,
.
,
.
. , []
, [. . .] .
,
b.
, (
), .
, b .
, :
TAB
NL
CR
. :
... cat It indicates your cat
is, cat, ...
.
,
:
... , (Subject|Date .
: (Subject|Date): .
, 1200
, .
: 123,
123.
: ... . 8.2 (439).
,
. , ;
, ,
. , .
, . ,
.
, ,
,
.
. ,
, . ,
, .
, ,
.
, ,
,
, ,
URL :
http://regex.info/
,
, .
(
:) ).
, jfriedl@regex.info.
:
O'Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
(800) 998-9938 (in the United States or Canada)
(707) 829-0515 (international/local)
(707) 829-0104 (fax)
bookquestions@oreilly.com
, ,
O'Reilly Network :
http://www.oreilly.com
Safari Enabled
Safari Enabled,
, O'Reilly Network
Safari Bookshelf.
Safari , .
,
, ,
,
. http://safari.oreilly.com.
,
.
,
.
,
. , ;
,
, .
.
.
(Stephen Friedl)
,
. ( , ,
, Tech Tips
http://www.unixwiz.net/.)
(Zak Greant), (Ian Morse),
(Philip Hazel), (Stuart Gill),
. (William F. Maton) (Andy
Oram).
Java
madbot (Mike "madbot" McCloskey) (
Sun Microsystems, Google), (Mark
Reinhold) (Dr. Cliff Click), Sun Microsystems.
.NET
(David Gutierrez), (Kit George)
(Ryan Byington). PHP
(Andrei Zmievski) Yahoo!
(Ken Lunde) Adobe Systems,
.1 Heisei Mincho W3,
Munhwa
.
: ,
.
http://regex.info
(Jeffrey Papen) Peak Web Hosting
(http://www.PeakWebhosting.com/).
. . .
:
(, this this).
,
. :
;
,
; ( Escape
ANSI)
.
,
, ,
.
,
(, The the) (,
, . .) .
, HTML.
HTML ,
: it is <B>very</B> very
important.
! ,
.
.
, ,
,
.
,
.
,
.
, ,
. ,
, ,
. ,
.
,
.
.
/
. ,
( ,
). ,
, .
Perl Java.
(Perl, Java, VB.NET . .)
, .
,
, ,
.
,
( , ,
. .).
,
, , .
,
( ,
).
, ,
,
. ,
,
,
.
.
( 70 )
, SetSize
, ResetSize.
, (. .
setSIZE SetSize ). ,
32 000 , ,
.
,
.
!
,
. 15
2 . ! (
, , . 62.)
:
, ,
.
,
, .
, .
, .
!
( egrep,
) From: Subject: .
egrep, ( )
,
^(From|Subject):.
,
, 5000 !
. (sed)
,
.
.
. ,
,
.
, , .
,
, ,
.
,
, .1
. , ,
. , ,
.
, ,
^(From|Subject):
.
.
,
. ,
,
.
,
.
, ,
, .
, .
, (
, report.txt). UNIX DOS/Windows
,
*.txt. ( (
, (wildcards))
, . (*)
, (?)
. , *.txt *
.txt.
,
.txt.
,
. , ,
.
,
. , , , HTML,
, ... .
1
TiVo, !
, (,
) .
,
.
,
.
.
( * ) .
, . . , (
.
.
, ,
.
, ,
.
,
. ,
^(From|Subject):
, From: Subject:.
, .
(
) .
,
. ,
!1
,
s!<emphasis>([0-9]+(\.[0-9]+){3})</emphasis>!<inet>$1</inet>!
1
!
: 3 , (
. ,
, , ,
.
,
, .
,
,
. ,
, !
.
Perl,
.
<emphasis> IP (
, , 209.204.146.22).
Perl, /
,
<emphasis>([0-9]+(\.[0-9]+){3})</emphasis>
<inet>, <emphasis>
. ,
,
.
<emphasis>
<inet>, ( (
. ,
.
,
.
,
. ,
.
, ,
. ,
, ,
.
(
), ,
, .
, .
. .
,
, ,
, , ,
.
,
.
.
, , ,
, ,
.
: egrep
. egrep.
egrep
.
,
. egrep
, DOS, MacOS, Windows, UNIX . .
. ,
,
. 1.1. egrep
,
. : , . 1.1,
,
.1 egrep
.
, (
)
. ,
.
1
,
. , ,
. ,
. ,
, *
,
, egrep. , ,
. COMMAND.COM CMD.EXE
Windows .
,
egrep
. 1.1. egrep
, , ,
,
. , ^ |
.
,
, egrep,
.
, cat
, cat.
, ,
vacation.
cat, cat
vacation .
,
, egrep .
, egrep
,
, ,
.
egrep
, egrep
.
, .
,
.
, ,
.
,
.
, ^ (,
) $ (),
. ,
cat cat
, ^cat
, cat ^
( )
. , cat$ cat
, ,
scat.
. , :
^cat , cat.
:
^cat , ,
, ,
t.
,
,
. ^cat$, ^$
^?
, .
^ $ ,
, . ,
.
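The behavior of `^` and `$` can be checked with a few lines of shell. This is only a sketch: the sample words are made up for illustration, and `grep -E` is assumed here as the current spelling of egrep.

```shell
# Made-up sample lines; grep -E is assumed as the modern spelling of egrep.
text='cat
scat
catalog
concat'

# hits PATTERN: print the lines of $text selected by the pattern (nothing if none).
hits() { printf '%s\n' "$text" | grep -E "$1" || true; }

at_start=$(hits '^cat')   # lines with "cat" at the very beginning
at_end=$(hits 'cat$')     # lines with "cat" at the very end
whole=$(hits '^cat$')     # lines consisting of "cat" and nothing else

printf 'start of line:\n%s\n\nend of line:\n%s\n\nwhole line:\n%s\n' \
  "$at_start" "$at_end" "$whole"
```

Only the line `cat` survives all three patterns, which is the point of combining both anchors.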
, .
, grey,
gray. [],
(character class),
, . e
, a
, ea .
, gr[ea]y : g,
r, e a,
y.
. ,
sep[ea]r[ea]te, ,
[Ss]mith. ,
, smith ( Smith)
, blacksmith.
, .
, .
. ,
[123456] .
<[123456]>,
<1>, <2>, <3> . .
HTML.
((
) ; , <H[1-6]>
. [0-9] [a-z]
.
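A small sketch of character classes in action, assuming `grep -E` in place of egrep; the input lines (including the misspelling `greay`) are invented for the example.

```shell
# Invented sample input for the two classes discussed in the text.
text='grey
gray
greay
<H1>
<H7>'

# hits PATTERN: print the lines of $text selected by the pattern (nothing if none).
hits() { printf '%s\n' "$text" | grep -E "$1" || true; }

spellings=$(hits 'gr[ea]y')   # exactly one of e or a between "gr" and "y"
headers=$(hits '<H[1-6]>')    # a range inside the class: digits 1 through 6

printf 'gr[ea]y:\n%s\n\n<H[1-6]>:\n%s\n' "$spellings" "$headers"
```

`greay` is rejected because the class stands for a single character, and `<H7>` because 7 lies outside the range.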
^cat$, ^$ ^
. 32.
^cat$ : , (
, ),
cat,
.
: cat
, ,
... cat .
^$
: , ,
.
: (
, ).
^
: , .
: !
, !
,
[0123456789abcdefABCDEF] [0-9a-fA-F] (
[A-Fa-f0-9], ).
.
: [0-9A-Z_!.?]
, ,
, , .
:
. ,
.
, ,
. ,
, .
, [0-9A-Z_!.?]
.
. ( )
.
.
[] [^],
, . , [^1-6]
, 1 6.
^ ,
, ,
, .
, ,
^, .
, . ,
; .
.
( ).
^
, ,
, (
).
,
; .
. ,
q u.
, q
q[^u]. .
, ! ,
.
:
% egrep 'q[^u]' word.list
Iraqi
Iraqian
miqra
qasida
qintar
qoph
zaqqum
%
Qantas ( ) Iraq.
word.list,
. ? ,
, .
:
, ,
, . ,
, Iraq
.
,
, .
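The following sketch replays the `q[^u]` experiment on a tiny made-up word list (the book's own `word.list` is not reproduced here), with `grep -E` assumed in place of egrep. It illustrates that a negated class still has to match some character.

```shell
# Tiny stand-in for a word list; the words are chosen for illustration only.
text='Iraqi
Iraq
Qantas
qoph
quit'

# hits PATTERN: print the lines of $text selected by the pattern (nothing if none).
hits() { printf '%s\n' "$text" | grep -E "$1" || true; }

lower=$(hits 'q[^u]')        # a lowercase q followed by some character other than u
either=$(hits '[Qq][^u]')    # the same, but accepting a capital Q as well

printf 'q[^u]:\n%s\n\n[Qq][^u]:\n%s\n' "$lower" "$either"
```

`Iraq` is missing from both results: its q is the last character on the line, so there is nothing left for `[^u]` to match.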
. ()
, .
,
. ,
, 19/03/76, 19-03-76
19.03.76. ,
,
(/, -, .), 19[-./]03[-./]76.
19.03.76.
.
19[-./]03[-./]76 ,
( :
). ,
[ [^.
(, [.-/]),
, .
19.03.76 ,
, /, ..
,
,
, , lottery numbers: 19 219303 7639.
, 19[-./]03[-./]76
, .
19.03.76 , .
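To see the difference between the dot and an explicit class of separators, the sketch below runs both patterns over the three date spellings and the lottery line quoted in the text. `grep -E` is assumed as the egrep equivalent.

```shell
# The three date spellings plus the "lottery numbers" line from the text.
text='19/03/76
19-03-76
19.03.76
lottery numbers: 19 219303 7639'

# hits PATTERN: print the lines of $text selected by the pattern (nothing if none).
hits() { printf '%s\n' "$text" | grep -E "$1" || true; }

loose=$(hits '19.03.76')             # dot: any character may separate the parts
strict=$(hits '19[-./]03[-./]76')    # class: only dash, dot or slash may separate them

printf 'with dots:\n%s\n\nwith a class:\n%s\n' "$loose" "$strict"
```

The dotted pattern also selects the lottery line, because `19303 76` inside it fits "19, any character, 03, any character, 76".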
? ,
. 35.
q[^u] Qantas Iraq?
Qantas ,
q , Qantas
. Q[^u],
, .
[Qq][^u] .
Iraq .
q, ,
u, .
, egrep
(, , !), q
. ,
u, .
,
.1 : egrep
( )
Iraq ,
, .
, : (
, , (
( .
1
, miss. : miss.
, , :
Miss M
.
.
.
.
. , ,
19.03.76
,
. ,
.
| .
,
. , Bob Robert
, a Bob|Robert ,
. , ,
(alternatives).
gr[ea]y. :
grey|gray gr(a|e)y.
(, , ).
gr[a|e]y
| , a e.
gr(a|e)y ,
gra|ey gra ey ,
. .
: (First|1st) [Ss]treet.1 ,
First 1st st,
(Fir|1)st [Ss]treet. ,
, , (first|1st) (fir|1)st
.
. ,
:
Jeffrey|Jeffery
Jeff(rey|ery)
Jeff(re|er)y
,
, :
(Geoff|Jeff)(rey|ery)
(Geo|Je)ff(rey|ery)
(Geo|Je)ff(re|er)y
, .
,
( ) Jeffrey|Geoffery|Jeffery|Geoffrey.
.
, gr[ea]y gr(a|e)y
.
.
.
,
. , ,
(, ,
),
.
, .
, ^ $
. :
^From|Subject|Date: ^(From|Subject|Date):.
,
( , ).
; ^From, Subject,
Date: . ,
^ : .
:
^(From|Subject|Date):
,
: ,
From, Subject Date :.
:
1. , From, :
2. , Subject, :
3. , Date, :.
, ,
From:, Subject:, Date:, ,
.
:
% egrep '^(From|Subject|Date): ' mailbox
From: elvis@tabloid.org (The King)
Subject: be seein' ya around
Date: Mon, 23 Oct 2006 11:04:13
From: The Prez <president@whitehouse.gov>
Date: Wed, 25 Oct 2006 8:36:24
Subject: now, about your vote
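The effect of the parentheses on the scope of `|` can be measured by counting matching lines. The header lines below are invented for this sketch (they are not the book's mailbox file), and `grep -E -c` is assumed as the egrep equivalent for counting.

```shell
# Invented header lines; two of them mention Subject or start differently on purpose.
text='From: elvis@tabloid.org
Subject: see you around
Date: Mon, 23 Oct 2006
X-Info: Subject to change
Reply-To: nobody@example.com'

# Without parentheses the alternatives are "^From", "Subject" and "Date:".
loose=$(printf '%s\n' "$text" | grep -c -E '^From|Subject|Date:' || true)

# With parentheses, ^ and : apply to every alternative.
strict=$(printf '%s\n' "$text" | grep -c -E '^(From|Subject|Date):' || true)

printf 'unparenthesized: %s lines, parenthesized: %s lines\n' "$loose" "$strict"
```

The extra line picked up by the unparenthesized form is `X-Info: Subject to change`, where `Subject` appears in the middle of the line.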
.
.
.
,
.
(, Subject From),
,
DATE from . ,
.
From [Ff][Rr][Oo]
[Mm], from,
. ,
egrep
, . . .
;
, .
egrep
-i.
:
% egrep -i '^(From|Subject|Date): ' mailbox
, ,
:
SUBJECT: MAKE MONEY FAST
-i
.
.
, ,
,
, .
cat, gray Smith. ,
egrep ,
egrep ,
( ).
\<
\>, egrep.
^ $ ,
, , . ^ $,
. \<cat\>
,
cat, . ,
[Figure 1.2. Start-of-word (\<) and end-of-word (\>) positions marked in a sample line containing: That, varmint's, cost, me $199.95!]
cat.
\<cat cat\> ,
cat.
: < >
+
. .
,
,
.
, egrep
,
.
,
;
, . . 1.2
.
( egrep) ,
; , .
,
,
.
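A short sketch of the word-boundary metasequences. It assumes a grep that supports `\<` and `\>` (GNU grep does; as the text notes, not every egrep does), and the sample lines are made up.

```shell
# Made-up lines in which "cat" appears as a word, a prefix, a suffix and an infix.
text='my cat sleeps
concatenate
catalog
scat'

# hits PATTERN: print the lines of $text selected by the pattern (nothing if none).
hits() { printf '%s\n' "$text" | grep -E "$1" || true; }

word=$(hits '\<cat\>')    # "cat" as a complete word
prefix=$(hits '\<cat')    # a word that starts with "cat"
suffix=$(hits 'cat\>')    # a word that ends with "cat"

printf 'word:\n%s\n\nprefix:\n%s\n\nsuffix:\n%s\n' "$word" "$prefix" "$suffix"
```

`concatenate` is rejected by all three patterns, since its "cat" neither begins nor ends a word.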
. 1.1 ,
.
.
,
,
( ). ,
, . ,
( ) ,
. , ^
,
[ .
1.1.
,
[]
[^]
^
$
\<
\>
+
+
*
*
*
egrep
|
()
|,
. [abc]
(a|b|c) ,
.
,
.
,
,
: \<(1,000,000|million|thousand thou)\>.
, .
,
, . ,
[^x] , x,
,
x.
, .
,
[^x] .
-i
( 39).1
, ,
,
, .
,
, , 39 .
color colour.
u,
colou?r. ? ( ) ,
.
,
,
.
, ,
. , colou?r c,
o, l, o, u? r.
u? :
u,
. ,
?, .
, , ?,
. , semicolon
colo u?
( colo u, ).
r ,
semicolon colou?r.
, 4 (July),
July July Jul,
fourth, 4th 4. ,
(July|Jul) (fourth|4th|4),
.
, (July|Jul) (July?).
, .
| , .
,
, July?
. July? (fourth|4th|4).
. 4th|4
4(th)?. , ?
.
, .
? ( ,
) .
, July? (fourth|4(th)?).
, .
,
, (, ,
) .
, ,
. (
)
.
, .
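The optional-item examples from this section can be tried out as follows. The sketch assumes `grep -E` for egrep and a literal space between the month and the day; the test lines, including the deliberate misspelling `colouur`, are invented.

```shell
# Invented test lines for the two patterns built up in the text.
text='color
colour
colouur
July 4th
Jul fourth
July 4
June 4'

# hits PATTERN: print the lines of $text selected by the pattern (nothing if none).
hits() { printf '%s\n' "$text" | grep -E "$1" || true; }

spelling=$(hits 'colou?r')                # the u may be present or absent
holiday=$(hits 'July? (fourth|4(th)?)')   # optional y, and an optional (th) group

printf 'colou?r:\n%s\n\nJuly? (fourth|4(th)?):\n%s\n' "$spelling" "$holiday"
```

The question mark applies to the single item before it: the `y` in one case, the whole parenthesized `th` in the other.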
:
+ () * ((
). +
, a *
( ). , *
, ,
. +
(
),
. ,
, ?,
+ *, ,
, .
*, ?, .
, ( )
. +,
.
, ?
, * .
. 33
<H[1-6]> . HTML1 ,
>
, <H3 > <H4   >. *
, ( )
, <H[1-6] *>.
<H1>, ,
.
HTML
<HR SIZE=14>, ,
14 . <H3>,
.
, =. ,
1
HTML, . ,
,
. , HTML,
, .
HR SIZE,
. *,
+. ,
.
*, .
<HR +SIZE *= *14 *>.
, .
(, 14)
. 14
.
[0-9],
+, 14
[0-9]+.
, +, ? . .
.
<HR +SIZE *= *[0-9]+ *>
,
. , egrep
-i ( 39),
[Hh][Rr] HR.
<HR +SIZE *= *[0-9]+ *>
. ,
,
.
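As a rough check of how `+` and `*` combine in the `<HR ... SIZE=...>` pattern, the sketch below tries it on a handful of invented tag spellings, using `grep -E` as the assumed egrep equivalent.

```shell
# Invented tag spellings: valid ones, plus variants that should be rejected.
text='<HR SIZE=14>
<HR   SIZE = 3 >
<HRSIZE=14>
<HR SIZE=>
<HR>'

# hits PATTERN: print the lines of $text selected by the pattern (nothing if none).
hits() { printf '%s\n' "$text" | grep -E "$1" || true; }

# " +" demands at least one space after HR; " *" allows any number, including none;
# "[0-9]+" demands at least one digit for the size.
sized=$(hits '<HR +SIZE *= *[0-9]+ *>')

printf 'accepted:\n%s\n' "$sized"
```

The plain `<HR>` is rejected here because nothing in this version of the pattern makes the SIZE part optional.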
,
, , , j 4 ( ,
,
, egrep ).
. ,
,
<HR> ( , >
).
, ?
,
( ). , .
(
), , ?, * +
.
. 1.2.
:
,
( ).
.
1.2.
!
; (
)
;
(
)
;
(
)
egrep
: {min, max}.
. , {3,12} 12 ,
, 3 .
[a-zA-Z]{1,5}
( ).
{0,1} ?.
egrep. ,
3
, .
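Where interval quantifiers are available, they can be tried as in the sketch below. It assumes a `grep -E` that supports `{min,max}` (current GNU and POSIX greps do); the `^` and `$` anchors and the sample words are additions made for this example so that whole lines are tested.

```shell
# Sample words of various lengths (made up for this sketch).
text='a
hello
abcdef
color
colour'

# hits PATTERN: print the lines of $text selected by the pattern (nothing if none).
hits() { printf '%s\n' "$text" | grep -E "$1" || true; }

short=$(hits '^[a-zA-Z]{1,5}$')   # lines made of one to five letters
spelling=$(hits 'colou{0,1}r')    # {0,1} behaves like the question mark

printf 'one to five letters:\n%s\n\ncolou{0,1}r:\n%s\n' "$short" "$spelling"
```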
: |
(
, ? *).
,
egrep ( GNU),
.
,
.
, .
,
, the the. ,
the theory, ,
. 44.
,
, .
?.
,
: ()?
:
<HR( +SIZE *= *[0-9]+)? *>
: * .
,
<HR>. ,
SIZE.
, + SIZE
. ,
HR
SIZE. <HR> .
egrep
, . 39: \<the the\>.
+,
.
. , :
. egrep
(backreferencing),
. ,
, ,
.
\<the +the\> the
[A-Za-z]+.
, ,
. , the
\1. \<([A-Za-z]+) +\1\>.
, ,
\1 (
) .
,
\1, \2, \3
. .
, ([a-z])([0-9])\1\2
\1 , [a-z],
a \2 , [0-9].
the the [A-Za-z]+
the. ,
the \1
+ , \1
the. , \> ,
(
the theft).
, . ,
(,
that ),
.
,
( egrep \<\> ).
The the, -i,
. 39.1
:
% egrep -i '\<([a-z]+) +\1\>'
,
!
,
.
. egrep
,
,
. ,
.
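The doubled-word search can be tried on a few lines of text as sketched below. Backreferences and `\<`, `\>` in `grep -E` are assumed to be available (they are GNU extensions, not guaranteed by every egrep), and the sentences are invented for the example.

```shell
# Invented sentences: two contain a doubled word, two only look similar.
text='the the
the theory
it is is fine
this is it'

# hits PATTERN: print the lines of $text selected by the pattern, ignoring case.
hits() { printf '%s\n' "$text" | grep -E -i "$1" || true; }

# ([a-z]+) captures a word, " +" skips the spaces, \1 demands the same text again,
# and \< ... \> keep the match from starting or ending in the middle of a word.
doubled=$(hits '\<([a-z]+) +\1\>')

printf 'doubled words found in:\n%s\n' "$doubled"
```

As the text points out, each line is examined in isolation, so a word repeated across a line break would go unnoticed.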
,
? , ega.att.com
ega.att.com
megawatt computing. ,
: . .
1
, GNU egrep -i
, .
, egrep the the, The the.
, ,
\: ega\.att\.com.
\. (escaped) .
.1
.
+
, .
: (, (very))
\([a-zA-Z]+\).
\ \( \)
() ,
.
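The need for escaping can be demonstrated with the host name from the text. The sketch assumes `grep -E` for egrep and uses just two input lines.

```shell
# The host name from the text and the phrase it accidentally matches.
text='ega.att.com
megawatt computing'

# hits PATTERN: print the lines of $text selected by the pattern (nothing if none).
hits() { printf '%s\n' "$text" | grep -E "$1" || true; }

unescaped=$(hits 'ega.att.com')     # each dot matches any character
escaped=$(hits 'ega\.att\.com')     # each \. matches only a literal period

printf 'unescaped:\n%s\n\nescaped:\n%s\n' "$unescaped" "$escaped"
```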
\ ,
,
. , ,
egrep \<, \>, \1 . .
. .
,
. , , ,
.
,
egrep.
, .
.
, ,
, .
.
,
(flavors) .
.
,
egrep , \
, .
( egrep ),
.
, :
, ;
, .
, egrep ,
,
. , ,
:
zip Is 44272. If you write, send $4.95 to cover postage and
[0-9]+, ,
. (
( , ,
. .
), , ,
.
, ,
,
.
, . ,
.
,
, . . .
,
.
( . .),
,
.
[a-zA-Z_][a-zA-Z_0-9]*.
, (
*) .
, , 32 ,
{0,31},
{min, max} (
. 45).
,
, , :
"[^"]*".
, ,
.
, ... !
[^"] , ",
, .
( )
",
\ , "nail the 2\"x4\" plank".
,
, .
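A quick way to watch `"[^"]*"` at work is to print only the matched text. The sketch below relies on the `-o` option, which is an assumption about the grep in use (GNU and BSD greps provide it), and on a made-up input line.

```shell
# A made-up line containing two simple double-quoted strings.
text='say "hello" and "bye" now'

# -o prints each match on its own line instead of the whole input line.
quoted=$(printf '%s\n' "$text" | grep -o -E '"[^"]*"' || true)

printf 'quoted strings:\n%s\n' "$quoted"
```

This simple pattern has no notion of escaped quotes, so strings containing `\"` need the more careful treatment the text alludes to.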
( )
: \$[0-9]+(\.[0-9][0-9])?.
: \$, + ()?.
,
, .
(
), ,
.
. ,
,
$1000, $1,000.
,
egrep. egrep , .
.
, ,
,
^$.
, ( )
,
.
, $.49.
+ *, .
?
5 ( 245).
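The dollar-amount pattern can be exercised as in this sketch, which again assumes a grep with the `-o` option and uses invented input lines.

```shell
# Invented lines; single quotes keep the shell from expanding the $ signs.
text='costs $4.95 today
$1000 even
$.49 sale
no price here'

# Print only the matched amounts: a literal $, digits, and an optional .NN part.
amounts=$(printf '%s\n' "$text" | grep -o -E '\$[0-9]+(\.[0-9][0-9])?' || true)

printf 'amounts found:\n%s\n' "$amounts"
```

`$.49` is not reported because `[0-9]+` insists on at least one digit before the decimal point, which is the limitation discussed in the text.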
(URL) HTTP/HTML
URL ,
URL
.
URL
. ?
, URL,
, .
URL HTML/HTTP
:
http:///.html
.htm.
, , ( )
(. . , www.yahoo.com),
, ,
http:// ,
[-a-z0-9_.]+.
,
[-a-z0-9_:@&?=+,.!/~*%$]*. :
,
, ( 34).
,
:
% egrep -i '\<http://[-a-z0-9_.:]+/[-a-z0-9_:@&?=+,.!/~*%$]*\.html?\>'
,
http://..../foo.html,
, , URL.
? , .
. ,
:
% egrep -i '\<http://[^ ]*\.html?\>'
,
. URL
.
HTML
, ,
HTML, egrep.
HTML
,
.
<TITLE> <HR>
<.*>.
. , .
<.*> , : <,
,
>. ,
, this <I>short</I> example.
,
,
. ,
,
.
, .
9:17 am 12:30 pm
. ,
[0-9]?[0-9]:[0-9][0-9] (am|pm)
9:17 am 12:30 pm,
99:99 pm.
, ,
. 1?[0-9]
19 ( 0 ),
: 1[012] [1-9] .
(1[012]|[1-9]).
.
[0-5], [0-9].
, (1[012]|[1-9]):[0-5][0-9] (am|pm).
24 ,
0 23. ,
, , 09:59.
,
.
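The tightened 12-hour pattern can be checked against a few candidate times. In this sketch the `^` and `$` anchors are an addition (so that each line is tested as a whole), `grep -E` stands in for egrep, and the candidate times are made up.

```shell
# Candidate times, some valid on a 12-hour clock and some not.
text='9:17 am
12:30 pm
99:99 pm
0:05 am
13:00 pm'

# hits PATTERN: print the lines of $text selected by the pattern (nothing if none).
hits() { printf '%s\n' "$text" | grep -E "$1" || true; }

# Hours 10-12 or 1-9, minutes 00-59, then a space and am or pm.
valid=$(hits '^(1[012]|[1-9]):[0-5][0-9] (am|pm)$')

printf 'accepted times:\n%s\n' "$valid"
```

Without the anchors, `13:00 pm` would still be selected, because `3:00 pm` inside it is a valid time on its own.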
(regex)
,
(regular expression) ,
, .
regex ().
( FedEx, g (),
regular (), , Regina)
,
..., .
,
, .
, ,
, .
a cat,
.
, .
(
) ,
. , *
,
. ,
. , . ,
\* \\* (
\ ),
.
,
. 3 .
,
,
, , .
. egrep
\<\>, .
,
\b (
).
. ,
, .
.
.
\<\>, ,
. (
, .
. ,
,
24
. 52.
,
.
: ( 00 09 , ),
( 10 19 ) ( 20 23 ).
: 0?[0-9]|1[0-9]|2[0-3].
, ,
: [01]?[0-9]|2[0-3].
,
. ,
, ,
.
[01]?[0-9]|2[0-3]
00 01 02 03 04 05 06 07 08 09
[01]?[4-9]|[012]?[0-3]
0
00 01 02 03 04 05 06 07 08 09
00 01 02 03 04 05 06 07 08 09
10 11 12 13 14 15 16 17 18 19
10 11 12 13 14 15 16 17 18 19
20 21 22 23
20 21 22 23
. , (
)
. , ,
egrep, .
1990 Perl
,
Perl (
, Perl).
PHP, Python,
Java, Microsoft .NET Framework, Tcl,
. .
. Perl
(
, ).
.
(subexpression)
, ,
,
. , ^(Subject|Date): Subject|Date
. Subject
Date . S
, u, b j ...
1-6 H[1-6] *,
. , H, [1-6] *
H[1-6] *.
, (*, + ?)
. mis+pell
s, mis is. ,
,
( ) .
. ,
, .
, (
. ,
64 53 @ 5
ASCII, , , EBCDIC
(
).
.
.
, Latin1
` , 1
A
. : ,
,
, .
ASCII,
.
( 3, 140).
,
. , 3
.
1
(Ken Lunde) CJKV Information Processing.
CJKV , , (
, .
, .
, ,
,
, .
,
.
a*((ab)*|b*) cid.
, ,
,
,
.
. ,
.
,
egrep. ,
. , ,
.
,
,
.
, ,
,
.
.
, .
,
. ,
,
.
, .
, ,
.
, ,
. ,
.
.
:
.
, egrep.
, ,
.
.
.
,
. (
)
.
, .
3.
.
( ) ,
, .
,
. ,
,
. 4, 5 6.
.
,
.
.
, : ,
, .
. ,
,
.
,
. , ,
.
2
. 3
(
). 4 . 5
. 6
, ,
.
, ( 4, 5 6),
.
. 1.3 egrep,
.
1.3. egrep
,
[]
[^]
\,
(
)
( )
,
{min, max}
min ,
max
\<
\>
()
,
\1,\2,...
, ,
. .
egrep.
,
:
egrep .
.
( 38),
( 45) ( 37).
,
( 34).
. ,
( 37).
, .
,
, ( 34).
-i
( 39).
:
1. \ + ,
(, \*
).
2. \ + ,
(, \<
).
3. \ + (
, \ ).
, egrep
.
, ? *,
. ,
( 43).
, ,
,
, egrep, .
, ,
. ,
, ,
, . ., :
?
(schaff(
kopf) ,
. ,
, .
, ,
: ,
?
. ,
. , ,
,
.
, ,
. ,
,
. ,
. ,
,
.
2
?
, Perl
. , :
$/ = ".\n";
while (<>) {
  next if !s/\b([a-z]+)((?:\s|<[^>]+>)+)(\1\b)/\e[7m$1\e[m$2\e[7m$3\e[m/ig;
  s/^(?:[^\e]*\n)+//mg;   # Remove any unmarked lines.
  s/^/$ARGV: /mg;         # Ensure lines begin with filename.
  print;
}
, .
Perl, ,
(!). , , ,
egrep
.
:
\b([a-z]+)((?:\s|<[^>]+>)+)(\1\b)
^(?:[^\e]*\n)+
^
Perl,
(
) , PHP, Python, Java, Visual
Basic .NET, Tcl . .
, ^ ,
, egrep. , Perl egrep
.
, Perl (
62
2.
) .
.
(
; ;
HTML)
.
,
, .
,
egrep,
.
, PHP, Java VB.NET,
Perl. ( )
egrep,
.
Perl ,
. , Perl
,
.
, . 26,
, ResetSize
SetSize.
Perl, :
% perl -0ne 'print "$ARGV\n" if s/ResetSize//ig != s/SetSize//ig' *
( , , ,
).
Perl, .
: .
, :
,
Pascal1.
Perl,
,
( 7,
1
Pascal ,
. . (William F. Matton)
.
63
Perl, ).
, Perl
;
.
Perl
,
. ,
Perl. (
.
Perl
Perl 1980
, .
, awk sed,
Pascal.
Perl , DOS/Windows, Mac
OS, OS/2, VMS Unix.
WWW. ,
Perl ,
www.perl.com.
Perl 5.8,
5.005.
:
$celsius = 30;
$fahrenheit = ($celsius * 9 / 5) + 32; #
#
print "$celsius C is $fahrenheit F.\n"; #
:
30 C is 86 F.
( $fahrenheit $celsius)
( ).
# .
, #, Java VB.NET,
, Perl
, . "$celsius C is $fahrenheit F.\n" .
( \n
).
Perl
:
$celsius = 20;
while ($celsius <= 45)
{
    $fahrenheit = ($celsius * 9 / 5) + 32; # compute Fahrenheit
    print "$celsius C is $fahrenheit F.\n";
    $celsius = $celsius + 5;
}
, while,
, ( $celsius <= 45) .
(, temps),
.
:
% perl -w temps
20 C is 68 F.
25 C is 77 F.
30 C is 86 F.
35 C is 95 F.
40 C is 104 F.
45 C is 113 F.
-w
. Perl
,
(,
. . Perl ).
, .
, Perl
.
Perl
.
, .
$reply ,
:
if ($reply =~ m/^[0-9]+$/) {
print "only digits\n";
} else {
print "not only digits\n";
}
.
^[0-9]+$, m//
Perl, . m (
, /
65
.1 =~
m// , (
$reply).
=~ = ==. == ,
( ,
eq). = ,
$celsius = 20. , =~
, (
m/^[0-9]+$/,
$reply).
,
.
=~ , .
, :
if ($reply =~ m/^[0-9]+$/)
:
^[0-9]+$ ,
$reply, ...
$reply =~ m/^[0-9]+$/ true,
^[0-9]+$ $reply, false
. if
.
: $reply =~ m/[0-9]+/ ( ,
, ^ $) true,
$reply . ^$ ,
$reply .
.
, , .
,
.
:
print "Enter a temperature in Celsius:\n";
$celsius = <STDIN>; #
chomp($celsius);
# $celsius
if ($celsius =~ m/^[0-9]+$/) {
$fahrenheit = ($celsius * 9 / 5) + 32; #
1
m
. : $reply =~ /^[0-9]+$/.
, Perl,
. , m
,
.
#
print "$celsius C is $fahrenheit F\n";
} else {
print "Expecting a number, so I don't understand \"$celsius\".\n";
}
\
print.
, .
,
. Perl
,
Java, Python .
( 71)
. VB.NET,
(\"),
("").
, c2f.
:
% perl -w c2f
Enter a temperature in Celsius:
22
22 C is 71.599999999999994316 F
. , print
.
Perl,
,
printf:
printf "%.2f C is %.2f F\n", $celsius, $fahrenheit;
, ,
.
Perl
.
,
. ?
67
[-+]?, .
, (\.[0-9]*)?. \.
, \.[0-9]*
,
. \.[0-9]*
( )?, (
\.?[0-9]*,
,
\.).
, :
if ($celsius =~ m/^[-+]?[0-9]+(\.[0-9]*)?$/) {
,
, .
F. ,
[CF].
,
.
, egrep
\1, \2, \3,
,
( 45). Perl
,
,
.
, ,
( 178), Perl
$1, $2, $3 . ., ,
, . . .
,
. . Perl
.
68
2.
: \1
,
, $1
.
,
,
. , $1
, :
$celsius =~ m/^[-+]?[0-9]+[CF]$/
$celsius =~ m/^([-+]?[0-9]+)([CF])$/
?
, :
*
|
,
.
,
. . 2.1, $1 , $2
(C F). . 2.2 ,
.
print "Enter a temperature (e.g., 32F, 100C):\n";
$input = <STDIN>; # .
chomp($input);
# $input .
if ($input =~ m/^([-+]?[0-9]+)([CF])$/)
{
# . $1 , $2 "C" "F".
$InputNum = $1; # ,
$celsius =~ m/^([-+]?[0-9]+)([CF])$/
$1
. 2.1.
$2
69
$type = $2;
# .
convert,
:
% perl -w convert
Enter a temperature (e.g., 32F, 100C):
39F
3.89 C is 39.00 F
% perl -w convert
Enter a temperature (e.g., 32F, 100C):
39C
39.00 C is 102.20 F
(
)
. 2.2. (
% perl -w convert
Enter a temperature (e.g., 32F, 100C):
oops
Expecting a number followed by "C" or "F",
so I don't understand "oops".
, Perl,
.
: ,
, f c
, .
98.6f.
,
(\.[0-9]*)?:
if ($input =~ m/^([-+]?[0-9]+(\.[0-9]*)?)([CF])$/)
:
.
, .
,
,
.
,
$2. . 2.3.
,
[CF]
, . ,
, ,
, , $type $3
$2 (,
).
$1
$2
$3
$input =~ m/^([-+]?[0-9]+(\.[0-9]*)?)([CF])$/
1#
2#
. 2.3.
3#
71
.
,
, *,
(
):
if ($input =~ m/^([-+]?[0-9]+(\.[0-9]*)?) *([CF])$/)
,
,
,
(whitespace).
. ,
TAB * ,
: [ TAB ]*.
, (*| TAB *)?
,
.
TAB . ,
. [
]*,
, ,
.
Perl \t,
.
,
. , [ TAB ]* [ \t]*.
: \n (
), \f ( ) \b (). , \b
, .
?
.
\n ,
, . Perl
,
.
(, , VB.NET
).
. , \t
,
\t ,
.
72
2.
: (?:)
. 2.3 (\.[0-9]*)?
,
? \.[0-9]*
. ,
,
$2, . ,
,
( ),
.
Perl,
.
() ,
(?:),
.
(?:,
.
? (
. 122 , ).
, :
if ($input =~ m/^([-+]?[0-9]+(?:\.[0-9]*)?)([CF])$/)
, [CF]
,
$2, (?:)
.
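To see how `(?:...)` changes group numbering, the following Python sketch compares the capturing and non-capturing forms of the temperature expression. It is an illustration only; Python's `re` module happens to use the same `(?:...)` notation, and the helper name `groups` is hypothetical.

```python
import re

# Same expression twice: the inner parentheses capture in the first
# version and merely group in the second.
CAPTURING     = re.compile(r"^([-+]?[0-9]+(\.[0-9]*)?)([CF])$")
NON_CAPTURING = re.compile(r"^([-+]?[0-9]+(?:\.[0-9]*)?)([CF])$")

def groups(pattern, s):
    """Return the tuple of captured groups, or None when there is no match."""
    m = pattern.search(s)
    return m.groups() if m else None
```

With the capturing form the unit letter ends up in the third group; with `(?:...)` it moves to the second, which is the renumbering the text describes.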
.
(
6).
, .
, (?:)
.
?
, . ,
,
(
).
()
() ,
, .
73
. 71.
[ TAB ]* *| TAB *?
(*| TAB *) *, TAB *.
, ( ),
( ).
.
C , [ TAB ]* [ TAB ],
. TAB
, .
[ TAB ]* (| TAB )*,
, 4,
.
, ,
. \t
, ,
, ,
, .
. 1 egrep ,
, . egrep
,
. ,
, ,
.
,
,
( DOS ).
.
, egrep
(.
$, *, ? . .,
.
,
Perl
,
.
,
.
74
2.
\b? Perl
,
. ,
Perl
( ).
( \s)
[\t]*. ,
: \s.
\t, ,
\s
, .
, , .
, \s* , [\t]*.
\s* .
:
$input =~ m/^([-+]?[0-9]+(\.[0-9]*)?)\s*([CF])$/
, ,
.
: [CFcf]. ,
:
$input =~ m/^([-+]?[0-9]+(\.[0-9]*)?)\s*([CF])$/i
i . m//
Perl ,
. i ,
m// Perl,
.
, i
egrep ( 39).
, i ,
/i (
/ ). /i
Perl;
, ,
.
, /g (
) /x (
), .
:
% perl w convert
Enter a temperature (e.g., 32F, 100C):
75
32 f
0.00 C is 32.00 F
% perl w convert
Enter a temperature (e.g., 32F, 100C):
50 c
10.00 C is 50.00 F
! 50 ,
50 !
. , ?
:
if ($input =~ m/^([-+]?[0-9]+(\.[0-9]*)?)\s*([CF])$/i)
{
.
.
.
$type = $3; # $type $3
if ($type eq "C") { # 'eq'
.
.
.
} else {
.
.
.
, f
,
. , $type
C, ,
.
c, $type:
if ($type eq "C" or $type eq "c") {
, ,
:
if ($type =~ m/c/i) {
, .
.
,
.
print "Enter a temperature (e.g., 32F, 100C):\n";
$input = <STDIN>; # .
chomp($input);
# $input .
if ($input =~ m/^([-+]?[0-9]+(\.[0-9]*)?)\s*([CF])$/i)
{
# . $1 , $3 "C" "F".
$InputNum = $1; # ,
$type = $3;
# .
if ($type =~ m/c/i) { # Is it "c" or "C"?
# ,
$celsius = $InputNum;
$fahrenheit = ($celsius * 9 / 5) + 32;
} else {
# , . .
$fahrenheit = $InputNum;
$celsius = ($fahrenheit - 32) * 5 / 9;
}
# , :
printf "%.2f C is %.2f F\n", $celsius, $fahrenheit;
} else {
# , .
print "Expecting a number followed by \"C\" or \"F\",\n";
print "so I don't understand \"$input\".\n";
}
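For comparison, the same conversion logic can be sketched in Python. This is not the book's program, just a hedged port under the assumption that the input arrives as a string without its trailing newline; the function name `convert` is made up for the example.

```python
import re

# Number (optional sign, optional fraction), optional whitespace, then C or F.
# re.IGNORECASE plays the role of Perl's /i modifier.
TEMP = re.compile(r"^([-+]?[0-9]+(?:\.[0-9]*)?)\s*([CF])$", re.IGNORECASE)

def convert(text):
    """Return a line like '3.89 C is 39.00 F' or raise ValueError."""
    m = TEMP.match(text)
    if m is None:
        raise ValueError('Expecting a number followed by "C" or "F", got %r' % text)
    number = float(m.group(1))
    if m.group(2).upper() == "C":      # accept both "c" and "C"
        celsius = number
        fahrenheit = celsius * 9 / 5 + 32
    else:
        fahrenheit = number
        celsius = (fahrenheit - 32) * 5 / 9
    return "%.2f C is %.2f F" % (celsius, fahrenheit)
```

Checking the unit with `.upper()` avoids the bug discussed in the text, where a lowercase `c` was silently treated as Fahrenheit.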
Perl,
,
:
1. Perl
egrep;
. Perl
,
. (Java, Python, .NET Tcl)
, Perl.
2.
$variable =~ m/regex/. m
(match), /
, .
, true false.
3. ( )
.
, ,
.
( , ,
. .),
Perl, PHP, Java, Tcl, GNU Emacs, awk, Python
. ,
.
4. ,
Perl, (
).
77
(,
, , . .)
\S , \s
\w [a-zA-Z0-9_] ( \w+
)
\W , \w, . . [^a-zA-Z0-9_]
\d [0-9], . .
\D , \d, . . [^0-9]
5. /i
.
/i,
i.
6.
(?:).
7. $1, $2, $3
,
.
(
;
).
, 1.
, (Washington(DC)?). (
) ()
,
.
\t
\n
\r
\s
, ,
, , .
, ,
Perl .
, $var =~ m//
true
false . $var =
=~ s///
: $var,
.
78
2.
, m//,
( /)
, . ,
( ,
$1, $2 . .,
).
, $var =~ s///
$var (, ,
). ,
$var JeffFriedl,
$var =~ s/Jeff/Jeffrey/;
$var JeffreyFriedl.
$var JeffreyFriedl,
JeffreyreyFriedl.
.
, egrep
. s///,
m//, (, /i . . 74).
.
?
$var =~ s/\bJeff\b/Jeff/i;
:
,
. ,
,
:
Dear =FIRST=,
You have been chosen to win a brand new =TRINKET=! Free!
Could you use another =TRINKET= in the =FAMILY= household?
Yes =SUCKER=, I bet you could! Just respond by.....
,
:
$given = "Tom";
$family = "Cruise";
$wunderprize = "100% genuine faux diamond";
:
$letter =~ s/=FIRST=/$given/g;
$letter =~ s/=FAMILY=/$family/g;
$letter =~ s/=SUCKER=/$given $family/g;
$letter =~ s/=TRINKET=/fabulous $wunderprize/g;
, ,
, .
Perl
, . ,
s/=TRINKET=/fabulous $wunderprize/g
, "fabulous $wunderprize".
,
,
(,
).
/g .
, s///
( ).
/g ,
, .
:
Dear Tom,
You have been chosen to win a brand new fabulous 100% genuine faux di
amond! Free!
Could you use another fabulous 100% genuine faux diamond in the Cruise
household?
Yes Tom Cruise, I bet you could! Just respond by.....
:
,
Perl
. 9,0500000037272. ,
9,05,
Perl ,
.
printf
,
, . ,
, 1/8, .125,
.
:
, ,
. .
12.3750000000392 12.375
12,375, 37.500 37,50. ,
.
80
2.
. 78.
$var =~ s/\bJeff\b/Jeff/i?
, ,
.
\bJEFF\b, \bjeff\b \bjEfF\b, ,
.
/i Jeff .
Jeff,
( /i ,
, 7).
jeff
Jeff ( ).
? ,
, $price,
$price =~ s/(\.\d\d[1-9]?)\d*/$1/
(: \d, . 77,
).
\. ,
. \d\d
, . [1-9]?
, .
,
,
$1. $1
. ,
. $1
.
.
, . . \d*
.
; 4,
.
.
,
.
, .
81
Enter
. ,
. , ,
sysread read. ,
.
,
:
% perl -p -i -e 's/sysread/read/g' file
Perl s/sysread/read/g. (,
, e,
.) p ,
, i
,
.
: ($var =~ ),
p
. , /g
.
,
; Perl
.
. Perl,
,
.
. ,
.
,
. ,
,
.
(, )
, . ,
, mkreply,
king.in,
:
% perl -w mkreply king.in > king.out
( : w
, 64.)
82
2.
From elvis Thu Feb 29 11:15 2007
Received: from elvis@localhost by tabloid.org (8.11.3) id KA8CMY
Received: from tabloid.org by gateway.net (8.12.5/2) id N8XBK
To: jfriedl@regex.info (Jeffrey Friedl)
From: elvis@tabloid.org (The King)
Date: Thu, Feb 29 2007 11:15
Message-Id: <2007022939939.KA8CMY@tabloid.org>
Subject: Be seein' ya around
Reply-To: elvis@hh.tabloid.org
X-Mailer: Madam Zelda's Psychic Orb [version 3.7 PL92]
Sorry I haven't been around lately. A few years back I checked
into that ole heartbreak hotel in the sky, ifyaknowwhatImean.
The Duke says "hi".
Elvis
, king.out :
To: elvis@hh.tabloid.org (The King)
From: jfriedl@regex.info (Jeffrey Friedl)
Subject: Re: Be seein' ya around
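On Thu, Feb 29 2007 11:15 The King wrote:
|> Sorry I haven't been around lately. A few years back I checked
|> into that ole heartbreak hotel in the sky, ifyaknowwhatImean.
|> The Duke says "hi".
|> Elvis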
.
( elvis@hh.tabloid.org, Reply-To ),
(The King), , .
,
.
:
1. ;
2. ;
3. |>.
, ,
. , Perl
<>.
($variable =
<>), .
,
Perl ( king.in ).
83
<> > _
Perl >=/<=.
getline() Perl.
, <>
(
false),
:
while ($line = <>) {
... $line ...
}
,
. ,
; .
, :
#
while ($line = <>) {
if ($line =~ m/^\s*$/) {
last; # while,
}
... ...
}
... ...
.
.
.
, ,
^\s*$.
( ),
( ,
),
.1 last while,
.
, , ,
, .
.
,
:
1
,
,
. ^ $
( ).
,
, $line
.
84
2.
if ($line =~ m/^Subject: (.*)/i) {
$subject = $1;
}
, Sub
ject:. ,
.* .
.* ,
$1.
$subject. ,
(
), if ,
$subject .
Date Reply-To:
if ($line =~ m/^Date: (.*)/i) {
    $date = $1;
}
if ($line =~ m/^Reply-To: (.*)/i) {
    $reply_address = $1;
}
From: . ,
, From:, ,
From. :
From: elvis@tabloid.org (The King)
, ,
. .
, ^From: (\S+).
, \S ,
( 77), \S+
( ).
.
, . ,
, \(
\) ( ,
).
!
[^()]* (:
;
, ).
, , :
^From: (\S+) \(([^()]*)\)
,
. 2.4 .
85
.*
.*
, (
,
). ,
, .
.
,
,
.
( 52),
4 ( 210).
, . 2.4, ,
$2,
$1:
if ($line =~ m/^From: (\S+) \(([^()]*)\)/i) {
$reply_address = $1;
$from_name = $2;
}
Reply-To, $1
. Reply-To, $reply_address .
:
while ($line = <>)
{
if ($line =~ m/^\s*$/ ) { # ...
last; # 'while'.
}
,
#
$2
. 2.4. ; $1 $2
if ($line =~ m/^Subject: (.*)/i) {
$subject = $1;
}
if ($line =~ m/^Date: (.*)/i) {
$date = $1;
}
if ($line =~ m/^Reply-To: (\S+)/i) {
$reply_address = $1;
}
if ($line =~ m/^From: (\S+) \(([^()]*)\)/i) {
$reply_address = $1;
$from_name = $2;
}
}
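The header-processing loop above translates fairly directly to other languages. Here is a hedged Python sketch of the same idea; the function name `parse_header` and the dictionary keys are assumptions made for this example, and the lines are assumed to be passed in as a list of strings.

```python
import re

def parse_header(lines):
    """Collect subject, date, reply address and sender name from header lines."""
    info = {}
    for line in lines:
        if re.match(r"^\s*$", line):       # blank line: end of the header
            break
        m = re.match(r"^Subject: (.*)", line, re.IGNORECASE)
        if m:
            info["subject"] = m.group(1)
        m = re.match(r"^Date: (.*)", line, re.IGNORECASE)
        if m:
            info["date"] = m.group(1)
        m = re.match(r"^Reply-To: (\S+)", line, re.IGNORECASE)
        if m:
            info["reply_address"] = m.group(1)
        m = re.match(r"^From: (\S+) \(([^()]*)\)", line, re.IGNORECASE)
        if m:
            # As in the Perl version, a later Reply-To: line overrides this.
            info["reply_address"] = m.group(1)
            info["from_name"] = m.group(2)
    return info
```

Like the Perl original, this relies on `From:` appearing before `Reply-To:` for the reply address to come out right.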
, ,
.
.
while :1
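print "To: $reply_address ($from_name)\n";
print "From: jfriedl\@regex.info (Jeffrey Friedl)\n";
print "Subject: Re: $subject\n";
print "\n"; # blank line to separate the header from the message body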
Re:,
.
:
print "On $date $from_name wrote:\n";
( )
|>:
while ($line = <>) {
print "|> $line";
}
, $line
.
,
:
$line =~ s/^/|> /;
print $line;
1
, , Perl
@ (107).
87
^ , ,
. , ,
|>, . . |> .
, (
) .
,
, ,
. , ,
, a Perl .
Perl ,
,
.
, .
From: ;
.
, $from_name
.
/,
(
) :
if (
not defined($reply_address)
or not defined($from_name)
or not defined($subject)
or not defined($date) )
{
die "couldn't glean the required information!";
}
Perl defined ,
, die
.
, , From:
Reply-To:. From:
, $reply_address,
Reply-To:.
, ...
,
,
.
88
2.
Pascal,
,
Pascal
Perl, Pascal!
,
, .
!
.
print "The US population is $pop\n";
( )
:
89
,
, . ,
, .
Jeffrey :
by Jeffrey Friedl.
, (?=Jeffrey), :
by Jeffrey Friedl.
,
, ,
. ,
, (,
Jeff), .
(?=Jeffrey)Jeff, . 2.5,
Jeff , Jeffrey.
by Jeffrey Friedl.
Jeff,
by Thomas Jefferson
Jeff ,
, (?=Jeffrey),
.
,
.
,
.
, (?=Jeffrey)Jeff
Jeff(?=rey).
Jeff , Jeffrey.
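A short Python sketch can confirm that the two lookahead forms above select the same text. It is illustrative only; the helper name `find` is hypothetical.

```python
import re

def find(pattern, s):
    """Return the (start, end) span of the first match, or None."""
    m = re.search(pattern, s)
    return m.span() if m else None
```

`find(r"(?=Jeffrey)Jeff", s)` and `find(r"Jeff(?=rey)", s)` report the same four-character span when `Jeffrey` is present, and both return `None` for `Thomas Jefferson`, because the lookahead inspects text without consuming it.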
(?=Jeffrey)Jeff
. 2.5. (?=Jeffrey)Jeff
90
2.
. Jeff(?=Jeffrey)
Jeff,
Jeffrey.
,
, . ,
(?:) (
. 72),
.
,
(?. ,
, .
(?:),
(?=) (?<=) .
,
.
Jeffs Jeffs.
,
s/Jeffs/Jeff's/g ( /g
. 79).
: s/\bJeffs\b/Jeff's/g.
s/\b(Jeff)(s)\b/$1'$2/g,
, s/\bJeffs\b/
Jeff's/g. :
s/\bJeff(?=s\b)/Jeff'/g
,
s\b .
. 2.6 ,
.
s.
Jeff ,
. ,
s\b (. . Jeff s
). s\b ,
, s
. : Jeff
,
. ,
.
91
\b Jeff (?=s\b)
. 2.6. \b Jeff(?=s\b)
Jeffs,
Jeff.
,
? ,
.
. :
Jeffs,
,
Jeff, .
s , .
.
,
, . ,
,
. .
s
.
Jeff ?
(?<=\bJeff) (?=s\b) (. 2.7),
: , Jeff,
s..
, . ,
, :
s/(?<=\bJeff)(?=s\b)/'/g
.
,
.
.
, s/^/|>/
|>.
92
2.
(?<=\b Jeff)(?=s\b)
,
? ,
s/(?=s\b)(?<=\bJeff)/'/g?
, .
. . 2.1
Jeffs Jeff's.
2.1. Jeffs
s/\b Jeffs\b/Jeff's/g
,
; ,
.
Jeffs
.
s/\b(Jeff)(s)\b/$1'$2/g
.
Jeffs.
s/\bJeff(?=s\b)/Jeff'/g
s ,
.
s/(?<=\bJeff)(?=s\b)/'/g
.
(
, .
.
s/(?=s\b)(?<=\bJeff)/'/g
,
.
,
.
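The five substitutions compared in the table can be tried side by side. The Python sketch below is an approximation of the Perl statements (Python writes `\1` where Perl writes `$1`); the list name `FIXES` and the function `apply_all` are made up for the example.

```python
import re

# (pattern, replacement) pairs corresponding to the five rows of the table.
FIXES = [
    (r"\bJeffs\b",             "Jeff's"),
    (r"\b(Jeff)(s)\b",         r"\1'\2"),
    (r"\bJeff(?=s\b)",         "Jeff'"),
    (r"(?<=\bJeff)(?=s\b)",    "'"),
    (r"(?=s\b)(?<=\bJeff)",    "'"),
]

def apply_all(text):
    """Return the result of each substitution applied to the same input."""
    return [re.sub(pattern, repl, text) for pattern, repl in FIXES]
```

All five produce the same output; the last two match no characters at all and only insert the apostrophe at the position both lookarounds agree on.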
93
,
. ,
Jeffs ,
.
/i? : , .
, .
, , Jeffs
:
, .
:
,
, .
.
,
(?<=\d).
.
\d\d\d.
()+, ,
$,
. (\d\d\d)+$
, ,
(?=) ,
, 123 456 789.
,
(?<=\d).
:
$pop =~ s/(?<=\d)(?=(\d\d\d)+$)/,/g;
print "The US population is $pop\n";
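The same lookaround-based substitution works in Python's `re` module. The sketch below is an illustration under the assumption that the input consists only of digits; the function name `commafy` is hypothetical.

```python
import re

def commafy(digits):
    """Insert commas at positions that have a digit on the left and
    an exact multiple of three digits between there and the end."""
    return re.sub(r"(?<=\d)(?=(\d\d\d)+$)", ",", digits)
```

The pattern matches only positions, never characters, so the replacement inserts a comma without removing anything.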
94
2.
. 92.
?
(?=s\b) (?<=\bJeff) .
, ( ,
, ),
,
. , Thoma s
Jeff erson (?=s\b) (?<=\bJeff)
( ),
,
.
,
. ,
,
. 4 ,
.
.
, ()
, (?:) .
.
,
. (?:) , ,
,
( ).
,
, . :
$text = "The population of 298444215 is growing";
.
.
.
$text =~ s/(?<=\d)(?=(\d\d\d)+$)/,/g;
print "$text\n";
, $
, .
,
, ,
; ...of 2,9,8,4,4,4,215 is...!
95
$ \b?
, Perl .
\w ( 77), Perl
,
. ,
,
(, ), (
, ), .
, ,
, ? Jeffs.
,
. , ,
, (
(
, .
. 2.2,
. ,
.
, ,
\w, , \w,
(?<!\w)(?=\w),
(?<=\w)(?!\w).
, (?<!\w)(?=\w)|(?<=\w)(?!\w)
. 93.
Jeffs /i?
,
(. .
Jeff's)
. . 2.1,
$1 $2
.
, .
,
.
.
/i.
JEFFS Jeff's Jeff'S .
96
2.
\b.
\b,
,
( 174).
2.2.
, ...
(?<=)
(?<!)
(?=)
(?!)
, (?!\d),
. \b $,
:
$text =~ s/(?<=\d)(?=(\d\d\d)+(?!\d))/,/g;
...tone of
12345Hz, . ,
...the 1970s.... ,
...in 1970 ..., .
, ,
, (
, ).
, (?!\w)
(?!\d). , \D (,
. 77) , (?!\d)
. .
, ,
. , \D
( . 36).
( )
, .
, Perl ,
.
,
. :
97
$text =~ s/(\d)(?=(\d\d\d)+(?!\d))/$1,/g;
,
,
\d, ,
$1.
?
(?!\d) \b, ,
,
? , :
$text =~ s/(\d)((\d\d\d)+\b)/$1,$2/g;
,
.
HTML
HTML. ,
, ,
,
.
, ,
,
. (
) ,
. Perl
:
undef $/; #
$text = <>; # ,
, :
This is a sample line.
It has three lines.
That's all
$text :
This is a sample line. NL It has three lines. NL That's all NL
:
This is a sample line. CR NL It has three lines. CR NL That's all CR NL
, ( , Windows)
+ .
,
.
98
2.
. 97.
$text =~ s/(\d)((\d\d\d)+\b)/$1,$2/g
?
, ,
298,444215. , ,
(\d\d\d)+,
,
,
/g.
,
. ,
,
,
.
,
. ,
.
.
(, while),
.
(
/g). :
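while ( $text =~ s/(\d)((\d\d\d)+\b)/$1,$2/g ) {
    # Nothing to do in the loop body: we merely reapply the
    # substitution until it no longer finds anything to change.
}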
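The repeat-until-nothing-changes idea can be sketched in Python with `subn`, which reports how many replacements were made. This is an illustration, not the book's code; the function name `commafy_loop` is made up here.

```python
import re

NO_LOOKAROUND = re.compile(r"(\d)((\d\d\d)+\b)")

def commafy_loop(text):
    """Reapply the lookaround-free substitution until it stops matching."""
    while True:
        text, count = NO_LOOKAROUND.subn(r"\1,\2", text)
        if count == 0:
            return text
```

A single pass adds only one comma per number, because the digits consumed by the match are not available to later matches in the same pass; the loop makes up for that.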
,
.
&, < >
, HTML: &amp;amp;, &amp;lt; &amp;gt;.
HTML ,
.
HTML:
$text =~ s/&/&amp;/g; # Make the basic HTML ...
$text =~ s/</&lt;/g;  # ... characters &, < and > ...
$text =~ s/>/&gt;/g;  # ... HTML safe.
/g,
(
).
99
&,
.
HTML <p>. ,
.
.
$text =~ s/^$/<p>/g;
:
, .
, . 33,
egrep, ,
.
Perl ,
,
.
, . 83, ^ $
,
.1 ,
,
.
,
,
^ $
, . Perl
/m:
$text =~ s/^$/<p>/mg;
/m /g (
).
.
, chap
ter. NL NL Thus, chapter. NL <p> NL Thus.
,
. ^ *$
, ^[ \t\r]*$ ,
,
1
$ ,
, .
. 169.
100
2.
.
^$ , ,
^$ .
, ,
( ).
\s (. 74),
^\s*$,
. 83. [\t\r] \s,
\s
: , ,
,
, .
, ^\s*$ . ,
<>
, .
, $text
with. NL
NL NL TAB
NL Therefore
$text =~ s/^[ \t\r]*$/<p>/mg;
:
with. NL <p> NL <p> NL <p> NL Therefore
$text =~ s/^\s*$/<p>/mg;
:
with. NL <p> NL Therefore
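The difference between the two paragraph-marking substitutions can be reproduced in Python, where `re.MULTILINE` corresponds to Perl's `/m`. This is a sketch under the assumption of Python 3.7 or later; `SAMPLE`, `mark_narrow` and `mark_wide` are names invented for the example.

```python
import re

# "with." followed by two empty lines and a line holding only a TAB.
SAMPLE = "with.\n\n\n\t\nTherefore"

def mark_narrow(text):
    """[ \\t\\r] cannot cross a newline, so each blank line gets its own <p>."""
    return re.sub(r"^[ \t\r]*$", "<p>", text, flags=re.MULTILINE)

def mark_wide(text):
    """\\s also matches newlines, so a run of blank lines becomes one <p>."""
    return re.sub(r"^\s*$", "<p>", text, flags=re.MULTILINE)
```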
^\s*$.
HTML
mailto , jfriedl@oreilly.com
: <a href="mailto:jfriedl@oreilly.com">jfriedl@oreilly.com</a>.
.
;
,
.
_@_.
101
, ,
:
$text =~ s/\b(_\@_)\b/<a href="mailto:$1">$1<\/a>/g;
\
</>.
. \@ ( 107),
, Perl @
.
\ /
. ,
Perl s///,
/. /
, , Perl
,
. ,
</>, <\/a>,
.
, , Perl
, s!!! s{}{}.
/
, .
,
.
.
\b\b.
, jfriedl@oreilly.compiler. ,
,
. ,
. ,
: <a href="mailto:$1">$1</a>.
.
,
_ _. (
regex.info www.oreilly.com) ,
com, edu,
info, uk . .
\w+\@\w+(\.\w+)+,
\w+.
, .
(
, , ),
102
2.
\w+ \w[-.\w]*.
, \w,
. :
,
, az.
.\w
,
, .
.
, ,
.
, ,
\w+(\.\w+)+, [\w.]+,
.....
Artichokes 4@1.00, .
, \w+(\.\w+)*\.(com|edu|info).
com|edu|gov|int|mil|net|org|biz|info|name|museum|coop|aero|[a-z][a-z],
,
. \w+,
\.\w+,
, .
, \w
. ASCII, ,
,
ASCII (, , , . .),
.
. [-a-zA-Z0-9]
[-a-z0-9] /i (
). ,
[-a-z0-9] ( ,
). ,
[-a-z0-9]+(\.[-a-z0-9]+)*\.(com|edu|info).
, ,
. ,
[-a-z0-9]+(\.[-a-z0-9]+)*\.(com|edu|info)
run C:\\startup.command at startup,
, ,
, . ,
$text =~ s{\b(_\@_)\b}{<a href="mailto:$1">$1</a>}gi;
( ,
/i),
103
. , Perl
, !
/x, :
$text =~ s{
   \b
   # Capture the address to $1...
   (
     _
     \@
     _
   )
   \b
}{<a href="mailto:$1">$1</a>}gix;
! / (
/g /i) ,
. ,
,
, . ,
, #.
, /x
, # , :
( 147).
( ,
/), #
, ,
. ,
\s, m/<a \s+ href=>/x.
, /
, .
, s{}{},
} (
, }), x
/x.
,
,
. :
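undef $/;   # Enter "file-slurp" mode.
$text = <>; # Slurp up the first file given on the command line.
$text =~ s/&/&amp;/g; # Make the basic HTML ...
$text =~ s/</&lt;/g;  # ... characters &, < and > ...
$text =~ s/>/&gt;/g;  # ... HTML safe.
$text =~ s/^\s*$/<p>/mg; # Separate paragraphs.
# Turn email addresses into links ...
$text =~ s{
   \b
   # Capture the address to $1...
   (
     \w[-.\w]*                                  # username
     \@
     [-a-z0-9]+(\.[-a-z0-9]+)*\.(com|edu|info)  # hostname
   )
   \b
}{<a href="mailto:$1">$1</a>}gix;
print $text; # Finally, display the HTML-ized text.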
,
, : /m
,
^ $. ,
(
, ).
HTTP URL
HTTP URL
. http://www.yahoo.com
<a href="http://www.yahoo.com/">http://www.yahoo.com</a>.
HTTP URL http://hostname/path,
/ . ,
:
$text =~ s{
\b
# URL $1 ...
(
http://
(
/
)?
)
}{<a href="$1">$1</a>}gix;
,
. ;
[-a-z0-9_:@&?=+,.!/~*'%$]*
( 51), ASCII, ,
(< > ( ) { } . .).
Perl,
@ $.
105
( 107).
, :
$text =~ s{
   \b
   # Capture the URL to $1 ...
   (
     http:// [-a-z0-9]+(\.[-a-z0-9]+)*\.(com|edu|info) \b  # hostname
     (
       / [-a-z0-9_:\@&?=+,.!/~*'%\$]*  # optional path
     )?
   )
}{<a href="$1">$1</a>}gix;
\b URL
:
http://www.oreilly.com/catalog/regex3/
\b ,
.
, ,
URL. :
Read "odd" news at http://dailynews.yahoo.com/h/od, and
maybe some tech stuff at http://www.slashdot.com!
, ,
URL. URL
.,?! ( ,
).
[.,?!] (. . (?<![.,?!]))
. ,
URL
,
. ,
URL,
. ,
, (
5, 258).
:
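undef $/;   # Enter "file-slurp" mode.
$text = <>; # Slurp up the first file given on the command line.
$text =~ s/&/&amp;/g; # Make the basic HTML ...
$text =~ s/</&lt;/g;  # ... characters &, < and > ...
$text =~ s/>/&gt;/g;  # ... HTML safe.
$text =~ s/^\s*$/<p>/mg; # Separate paragraphs.
# Turn email addresses into links ...
$text =~ s{
   \b
   # Capture the address to $1 ...
   (
     \w[-.\w]*                                  # username
     \@
     [-a-z0-9]+(\.[-a-z0-9]+)*\.(com|edu|info)  # hostname
   )
   \b
}{<a href="mailto:$1">$1</a>}gix;
# Turn HTTP URLs into links ...
$text =~ s{
   \b
   # Capture the URL to $1 ...
   (
     http:// [-a-z0-9]+(\.[-a-z0-9]+)*\.(com|edu|info) \b  # hostname
     (
       / [-a-z0-9_:\@&?=+,.!/~*'%\$]*  # optional path
         (?<![.,?!])                   # not allowed to end with [.,?!]
     )?
   )
}{<a href="$1">$1</a>}gix;
print $text; # Finally, display the HTML-ized text.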
:
. ,
.
, $HostnameRegex,
:
$HostnameRegex = qr/[-a-z0-9]+(\.[-a-z0-9]+)*\.(com|edu|info)/i;
# Turn email addresses into links ...
$text =~ s{
   \b
   # Capture the address to $1...
   (
     \w[-.\w]*       # username
     \@
     $HostnameRegex  # hostname
   )
   \b
}{<a href="mailto:$1">$1</a>}gix;
# Turn HTTP URLs into links ...
$text =~ s{
   \b
   # Capture the URL to $1...
   (
     http:// $HostnameRegex \b         # hostname
     (
       / [-a-z0-9_:\@&?=+,.!/~*'%\$]*  # optional path
         (?<![.,?!])                   # not allowed to end with [.,?!]
     )?
   )
}{<a href="$1">$1</a>}gix;
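The building-block approach shown with `qr//` has a rough counterpart in Python: keep the hostname expression in a string and interpolate it into larger patterns compiled with `re.VERBOSE` (the analogue of `/x`). The sketch below is an approximation, not a port of the book's program; `HOSTNAME`, `EMAIL`, `URL` and `linkify` are names invented for the example, and it assumes Python 3.6 or later for the `rf"""` string.

```python
import re

# Shared sub-pattern, written once and reused below.
HOSTNAME = r"[-a-z0-9]+(?:\.[-a-z0-9]+)*\.(?:com|edu|info)"

EMAIL = re.compile(rf"""
    \b
    (                  # capture the address to group 1
      \w[-.\w]*        # username
      @
      {HOSTNAME}       # hostname
    )
    \b
""", re.IGNORECASE | re.VERBOSE)

URL = re.compile(rf"""
    \b
    (                  # capture the URL to group 1
      http:// {HOSTNAME} \b
      (?:
        / [-a-z0-9_:@&?=+,.!/~*'%$]*   # optional path
        (?<![.,?!])                    # must not end with . , ? or !
      )?
    )
""", re.IGNORECASE | re.VERBOSE)

def linkify(text):
    """Wrap email addresses and HTTP URLs in HTML links."""
    text = EMAIL.sub(r'<a href="mailto:\1">\1</a>', text)
    return URL.sub(r'<a href="\1">\1</a>', text)
```

Because the hostname pattern lives in one place, a change to the list of accepted top-level domains applies to both expressions at once.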
Perl qr. m s,
(. . qr// m// s///),
, ,
,
.
( , ,
$HostnameRegex, ).
,
. :
,
.
6 ( 337),
7 ( 366).
;
, Java .NET
8 9.
$ @
, , $
, (. .
) . $
, .
$ ,
Perl
, .
$ .
, $
URL .
@. @ Perl
,
. , @
, ;
.
108
2.
, 1
.
, :
$/ = ".\n";
while (<>) {
next if !s/\b([a-z]+)((?:\s|<[^>]+>)+)(\1\b)/\e[7m$1\e[m$2\e[7m$3\e[m/ig;
s/^(?:[^\e]*\n)+//mg; # .
s/^/$ARGV: /mg;
# .
print;
}
, Perl,
<>, s/// print.
! Perl (
),
, .
, .
,
, 1,
:
% perl -w FindDbl ch01.txt
ch01.txt: check for doubled words (such as this this ), a common problem with
ch01.txt: * Find doubled words despite capitalization differences, such as with
ch01.txt: ' The the ', as well as allow differing amounts of whitespace (space,
ch01.txt: tabs, /\<(1,000,000|million|thousand thousand)/. But alternation
ch01.txt: can't be of this chapter. If you knew the the specific doubled word
ch01.txt: to find (such
.
.
.
Perl.
Java
.
s{}{}(
/x,
( next if ! next unless).
.
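$/ = ".\n"; # Chunk mode: each chunk ends with a period followed by a newline.
while (<>)
{
  next unless s{
       ### Need to match one word:
       \b            # Start of word ....
       ( [a-z]+ )    # Grab word, filling $1 (and \1).
       ### Now need to allow any number of spaces and/or <TAGS>
       (             # Save what intervenes to $2.
          (?:        # (Non-capturing parens for grouping the alternation)
             \s      # Whitespace (includes newline, which is good).
            |        # -or-
             <[^>]+> # Item like <TAG>.
          )+         # Need at least one of the above, but allow more.
       )
       ### Now match the first word again:
       (\1\b)        # \b ensures not embedded. This copy saved to $3.
     }
     # Above is the regex. The replacement string is below,
     # followed by the modifiers /i, /g and /x.
     {\e[7m$1\e[m$2\e[7m$3\e[m}igx;
  s/^(?:[^\e]*\n)+//mg; # Remove any unmarked lines.
  s/^/$ARGV: /mg;       # Ensure lines begin with filename.
  print;
}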
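The core of the doubled-word program is the single substitution that wraps both copies of the word in highlighting sequences. A hedged Python sketch of just that step follows; it leaves out the chunked file reading and the filename prefix, and the names `DOUBLED`, `ON`, `OFF` and `highlight` are assumptions made for the example.

```python
import re

ON, OFF = "\x1b[7m", "\x1b[m"   # ANSI escape sequences: inverse video on/off

# A word, then whitespace and/or <TAG> items, then the same word again.
# With re.IGNORECASE the backreference \1 also matches case-insensitively.
DOUBLED = re.compile(r"\b([a-z]+)((?:\s|<[^>]+>)+)(\1\b)", re.IGNORECASE)

def highlight(text):
    """Return text with each doubled word pair wrapped in ON/OFF markers."""
    def mark(m):
        return f"{ON}{m.group(1)}{OFF}{m.group(2)}{ON}{m.group(3)}{OFF}"
    return DOUBLED.sub(mark, text)
```

The trailing `\b` inside the third group keeps `this thistle` from being reported, just as in the Perl version.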
,
. , ,
.
Perl ( 7).
, , :
Perl,
.
,
,
, .
$/ (,
!) <>
, ,
, .
,
.
, , <>,
? while <>
110
2.
.1 ,
s/// print.
, ,
,
.
next unless Perl
( ),
.
, .
"$1$2$3"
Escape ANSI,
(
, ). \[7m
, \[m (\
Perl
, Escape ANSI).
,
, "$1$2$3" . ,
Escape,
( )
.
, $1 $3 (,
!),
.
,
.
,
,
Escape \.
, .
/m
^([^\]*\n)+ ,
\, .
,
\, . . .2
1
$_ (, !),
.
,
ANSI.
.
111
$ARGV .
/m /g
,
.
, print Es
cape ANSI. while
( , ,
), .
,
, Perl
,
. , ,
,
.
,
Perl
,
. , ,
,
+ ,
,
.
. ,
3 ( 126),
.
, ,
, ,
, ; true
false ,
. , (
)
,
Java. java.util.regex Java 1.4.
,
Perl,
Pattern.compile. , Java
\,
Java,
.
\, ,
, Java
( 71).
, ,
, ,
. Pattern.compile
, Pattern (regex1 . .)
regex1.matcher(text),
.
, :
.
Java
import java.io.*;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class TwoWord
{
public static void main(String [] args)
{
Pattern regex1 = Pattern.compile(
"\\b([a-z]+)((?:\\s|\\<[^>]+\\>)+)(\\1\\b)",
Pattern.CASE_INSENSITIVE);
String replace1 = "\033[7m$1\033[m$2\033[7m$3\033[m";
Pattern regex2 = Pattern.compile("^(?:[^\\e]*\\n)+", Pattern.MULTILINE);
Pattern regex3 = Pattern.compile("^([^\\n]+)", Pattern.MULTILINE);
// ....
for (int i = 0; i < args.length; i++)
{
try {
BufferedReader in = new BufferedReader(new FileReader(args[i]));
String text;
// .....
while ((text = getPara(in)) != null)
{
//
text = regex1.matcher(text).replaceAll(replace1);
text = regex2.matcher(text).replaceAll("");
text = regex3.matcher(text).replaceAll(args[i] + ": $1");
//
System.out.print(text);
}
} catch (IOException e) {
System.err.println("can't read ["+args[i]+"]: " + e.getMessage());
}
}
}
//
static String getPara(BufferedReader in) throws java.io.IOException
{
StringBuffer buf = new StringBuffer();
String line;
while ((line = in.readLine()) != null &&
(buf.length() == 0 || line.length() != 0))
{
buf.append(line + "\n");
}
return buf.length() == 0 ? null : buf.toString();
}
}
113
3
:
,
, .
, ,
.
egrep Perl Java
, ,
.
.
.
.
,
, ,
.
.
,
,
.
,
.
,
,
115
.
; . 58
. ,
, .
.
(, ,
),
(,
).
.
(,
),
, .
: ?
? ?
?
(, , )?
,
.
,
. ,
, ,
: ,
.
,
.
,
.
,
.
,
. ,
, .
.
,
.
116
3. :
, .
, .
, ,
, , (
) . ,
.
.
( )
. . (
) .
,
,
.
, ,
.
1940 .
, (Warren McCulloch)
(Walter Pitts),
.1
, (Step
hen Kleene)
, (regular sets).
, .
1950 60
. (Robert Cons
table) 2 .
1
117
,
,
, ,
Regular Expression Search Algorithm 1968
.1
, IBM 7094.
qed ,
UNIX ed.
ed
qed,
. ed
,
. , g/Regular Expression/p,
Global Regular Expression Print (
).
, .
grep,
egrep.
grep
, ,
egrep.
* , + ? (
).
grep \(\),
.2 grep ,
. ^
, ,
.
. ,
$ .
end$|^start. , ,
!
. , grep
, *
1
: ed ( grep)
, ,
, .
118
3. :
, ,
. , grep
, .
, grep
.
grep
grep ,
,
30
. ,
. grep
.
AT&T Bell Labs grep ,
\{min, max\},
lex. y,
,
.
, y i. , *
.
egrep
(Alfred Aho), AT&T
Bell Labs, egrep,
, 1. ,
(
) . egrep
+ ?,
,
egrep.
,
, . .
. , egrep
,
,
. .
(
awk, sed lex). ,
,
. , , .
, grep
,
+, grep
119
, .
\+ ,
.
. ,
. ,
,
,
, , .
, (
).
POSIX
POSIX (Portable Operating System Interface
) 1986 .
.
, ,
. ,
, ,
POSIX. POSIX
: BRE (basic regular exp
ressions, . . ) ERE (extended re
gular expressions, . . ). PO
SIX .
POSIX . 3.1.
3.1.
POSIX
, ^, $, [], [^]
BRE
ERE
+ ?
+ ?
\{min, max\}
\(\)
{min, max}
()
\1 \9
POSIX (
(locale) ,
: ,
120
3. :
, . .
. ,
. ,
Latin1 (ISO88591), ( 224 160
) .
, ,
.
\w,
( [a-zA-Z0-9_]).
POSIX , .
\w ,
, ,
ASCII.
,
. . 141.
1986 (Henry Spencer)
, .
(
) . ,
( , ),
.
Perl
(Larry Wall)
, Perl.
patch,
, Perl
.
Perl 1987 .
;
,
, .
Perl
sed awk
.
rn
, . , rn
Emacs
121
Java.
,
. Java 1.4
, 8.
122
3. :
,
\G, 171,
( 72),
( 184), ( 90) /1 ( 102).
,
,
.
(
)
. (), [], <> {}
, (?,
.
, Perl
, .
;
, (?,
.
Perl ,
, .
,
,
. ,
, Perl Porters,
.
, ( 88),
( 180) .
( 182)
. ,
,
( 393). Perl 5.8.8.
Perl 5
World Wide Web. Perl
, ,
, Perl
1
, / ,
. . ,
Perl ,
/.
123
. Perl ,
.
Perl.
, Perl
.
Tcl, Python, Microsoft .NET, Ruby, PHP, C/C++
Java.
1997 (
),
(Philip Hazel) PCRE (Perl Compatible Regular
Expressions ,
Perl) ,
Perl. PCRE
.
PCRE
, PHP, Apache Version 2, Exim, Postfix Nmap.1
,
. 3.2
, .
(,
).
3.2. ,
GNU awk 3.1
Procmail 3.22
Python 2.3.5
PCRE 6.6
Ruby 1.8.4
flex 2.5.31
Perl 5.8.8
MySQL 5.1
PHP ( preg)
5.1.4 / 4.4.3
Tcl 8.4
, , .
. 3.3
.
1
PCRE : ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/.
124
3. :
3.3.
!
grep
! GNU
Emacs
egrep
Tcl
Perl
.NET
Java
Sun
+, ^, $, []
? + |
\? \+ \|
\(\)
(?:)
? + |
()
? + \|
\(\)
? + |
()
? + |
()
? + |
()
? + |
()
\< \>
\b, \B
\b, \B
\w, \W
.
, ,
,
.
, . ,
Tl
, .
[:<:] [:>:],
,
\m, \M \y ( ,
).
, grep egrep,
. ,
(, GNU
).
, ,
.
, Perl, .NET Java
, .
, . 3.3.
* ,
?
125
?
? (NUL)?
(. . ,
)?
?
?
?
? ,
(
)?
,
?
?
\123? ,
?
?
?
\w
? (
\w, . 3.3,
.) \w
,
?
?
, . 3.3,
. ,
, .
,
, .
, , , egrep
(Jul|July), GNU Emacs \(Jul\|July\), , .
(
, ) ,
,
:
Jul July.
,
126
3. :
, (July|Jul) \(July|Jul\),
. , .
, ,
, ,
.
Perl egrep ,
Perl
. ,
.
, ,
, :
, , . egrep
,
.
(, , 1)
, egrep.
(,
)
, ,
.
, ,
.
egrep,
( , ),
,
.
(
(
) (
).
, .
: ,
.
,
Perl.
.
,
.
.
(
127
Perl, . 83:
if ($line =~ m/^Subject: (.*)/i) {
$subject = $1;
}
;
,
. Perl
^Subject:(.*) , $line,
. $1 ,
.
$subject.
,
UNIX procmail.
, .
, Perl, ()
.
,
.
, ,
.
, , ,
.
, ,
.
.
.
, (
) (
).
,
128
3. :
,
.
Java, VB.NET,
PHP Python.
Java
Subject
Java java.util.regex ( Java
8).
import java.util.regex.*; //
..
.
Pattern r = Pattern.compile("^Subject: (.*)", Pattern.CASE_INSENSITIVE);
Matcher m = r.matcher(line);
if (m.find()) {
subject = m.group(1);
}
,
;
, .
, ,
.
,
java.util.regex Sun: Pattern Matcher.
:
,
. Pattern.
,
. Matcher.
. find
, ,
.
,
, .
( )
, . Perl
, Java
.
Java, Sun,
,
129
.
,
;
.
Pattern.matches():
if (! Pattern.matches("\\s*", line))
{
// ... ...
}
^$
. ,
,
Sun.
(
, ),
(
)
(
,
6).
Java,
Sun
. ,
,
:
if (! line.matches("\\s*"))
{
// ... ...
}
,
, ,
.
VB
.NET
,
,
, .
Subject VB.NET ( .NET
9):
Imports System.Text.RegularExpressions '
'
130
3. :
.
.
.
Dim R as Regex = New Regex("^Subject: (.*)", RegexOptions.IgnoreCase)
Dim M as Match = R.Match(line)
If M.Success
subject = M.Groups(1).Value
End If
Java,
, .NET
Value . ?
,
, ,
( ).
.NET ,
. :
If Not Regex.IsMatch(Line, "^\s*$") Then
' ... ...
End If
Pattern.matches Sun,
^$,
Microsoft .
,
( ).
PHP
Subject PHP
preg,
. ( PHP 10.)
if (preg_match('/^Subject: (.+)/i', $line, $matches))
$Subject = $matches[1];
Python
Subject
Python:
import re
.
.
.
R = re.compile("^Subject: (.*)", re.IGNORECASE)
M = R.search(line)
if M:
    subject = M.group(1)
131
?
,
?
,
, .
Java; ,
,
Sun.
, :
,
Sun.
PHP,
,
.
PHP, ,
PHP . (,
preg
, .)
Subject
.
,
.
( 103)
Perl
mailto:
$text =~ s{
   \b
   # Capture the address to $1
   (
     \w[-.\w]*                          # username
     @
     [-\w]+(\.[-\w]+)*\.(com|edu|info)  # hostname
   )
   \b
}{<a href="mailto:$1">$1</a>}gix;
Perl
, , ,
,
.
, .
,
132
3. :
, ,
.
, .
Java
,
java.util.regex Sun:
import java.util.regex.*; //
.
.
.
Pattern r = Pattern.compile(
   "\\b                                                      \n"+
   "# Capture the address to $1...                           \n"+
   "(                                                        \n"+
   "  \\w[-.\\w]*                              # username    \n"+
   "  @                                                      \n"+
   "  [-\\w]+(\\.[-\\w]+)*\\.(com|edu|info)    # hostname    \n"+
   ")                                                        \n"+
   "\\b                                                      \n",
   Pattern.CASE_INSENSITIVE|Pattern.COMMENTS);
Matcher m = r.matcher(text);
text = m.replaceAll("<a href=\"mailto:$1\">$1</a>");
.
, , \
\\ .
, \\w \w .
System.out.println(r.pattern()); ,
. ,
,
, .
# ,
. , ,
,
.
Perl
/g, /i / ( , , (
176). java.util.regex
(replaceAll replace)
, (, Pattern.CASE_INSENSITIVE
Pattern.COMMENTS).
VB.NET
VB.NET :
Dim R As Regex = New Regex _
("\b                                                  " & _
 "(?# Capture the address to $1... )                  " & _
 "(                                                   " & _
 "  \w[-.\w]*                         (?# username )  " & _
 "  @                                                 " & _
 "  [-\w]+(\.[-\w]+)*\.(com|edu|info) (?# hostname )  " & _
 ")                                                   " & _
 "\b                                                  ", _
 RegexOptions.IgnoreCase Or RegexOptions.IgnorePatternWhitespace)
text = R.Replace(text, "<a href=""mailto:${1}"">${1}</a>")
VB.NET (
,
) VB.NET
, . , \
VB.NET ,
. ,
VB.NET
( ,
, ).
PHP
PHP:
$text = preg_replace('{
   \b
   # Capture the address to $1...
   (
     \w[-.\w]*                          # username
     @
     [-\w]+(\.[-\w]+)*\.(com|edu|info)  # hostname
   )
   \b
}ix',
   '<a href="mailto:$1">$1</a>', # replacement string
   $text);
(Java VB.NET),
$text,
Perl.
Awk
awk
/_/ ,
var ~ .
134
3. :
awk
Perl (, Perl
sed). awk
,
sub():
sub(/mizpel/, "misspell")
mizpel
, misspell.
Perl s/mizpel/misspell/.
/g
awk : gsub(/mizpel/, "misspell").
Tcl
Tcl ,
,
Tcl. Tcl
:
regsub mizpel $var misspell newvar
var,
mizpel misspell
newvar ( $).
Tcl ,
,
. Tcl
regsub . , all
(
):
regsub -all mizpel $var misspell newvar
-nocase
( egrep i
Perl /i).
GNU Emacs
GNU Emacs (
Emacs) elisp
(Emacs lisp)
. , researchforward,
,
, .
.
135
. 3.3 ( 124),
Emacs \. ,
\<\([a-z]+\)\([\n \t]\|<[^>]+>\)+\1\>
(. 1).
,
Emacs \n \t. ,
Emacs, ,
,
.
. , .
\ elisp ,
.
:
(defun FindNextDbl ()
  "move to next doubled word. Ignoring <> tags" (interactive)
  (re-search-forward "\\<\\([a-z]+\\)\\([\n \t]\\|<[^>]+>\\)+\\1\\>")
)
,
.
, ,
. ! ,
.
,
, :
, .
,
.
,
,
, .
, ,
.
: ,
Perl, awk sed,
136
3. :
, "^From:(.*)".
( )
.
. ,
,
, .
\t, \\ \2,
. ,
, ,
\
\\ . ,
\n \\n.
\
"\n", NL ,
, \n. ,
/, NL , \n
, .
, . . 3.4
\t \2 (2 ASCII *).
, .
3.4.
"[\t\x2A]"
"[\\t\\x2A]"
"\t\x2A"
"\\t\\x2A"
'[\t\x2A]'
'\t\x2A'
[ TAB *]
[\t\x2A]
AB *
\t\x2A
* * ,
*
/x
* *
,
*
,
, \
. , VB.NET
.
.
137
,
:
,
?
Java
Java:
, \ .
\t (
), \n ( ), \\ ( \) . .
\ ,
Java, .
VB.NET
VB.NET ,
Java.
VB.NET :
.
, "he said ""hi""\." he said
"hi"\..
#
Microsoft .NET Framework
,
,
. Visual Basic
. .NET, C#,
.
C# , ,
,
\"
"". C#
@"",
\ ,
:
. , \t\x2A
"\\t\\x2A", @"\t\x2A".
@"".
,
C#. , ,
\n,
,
Perl ( 108), {}
138
3. :
,
.
,
\,
. Java C#
\ ,
. \t
, ,
\t "\\t". , "\w"
\w , \w
.
, ,
, ,
, .
, ,
VB.NET @""
C#, . ,
, \
, \\
\. ( \)
.
, '\t\x2A' \t\x2A.
, ,
.
, ,
10 ( 526).
Python
Python
.
, ,
. Python
'''''' """""",
.
\n;
, ( )
( Java #,
).
#, Python
, r,
, (
# @""). , r"\t\x2A"
\t\x2A.
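The difference between an ordinary string literal and a raw string can be checked directly in Python. In this hedged sketch the names `PLAIN` and `RAW` are made up; the point is what the regex engine actually receives in each case.

```python
import re

PLAIN = "\t\x2A"    # Python interprets the escapes: a TAB followed by '*'
RAW   = r"\t\x2A"   # six characters reach the regex engine untouched

# As a regex, PLAIN means "TAB, quantified by star" (any number of tabs);
# RAW means "a TAB followed by a literal asterisk".
tabs_only     = re.fullmatch(PLAIN, "\t\t\t")
tab_then_star = re.fullmatch(RAW, "\t*")
```

This is the same effect summarized in Table 3.4: when the string layer converts `\x2A` first, the resulting `*` is seen by the regex engine as a metacharacter.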
139
Python \ , ,
(
): , r"he said \"hi\"\."
he said \"hi\"\..
,
Python \" ",
: r'he said "hi"\.'
Tcl
Tcl ,
. ,
Tcl , ,
, ,
.
\n, \ .
,
.
Tcl ,
Python, r''
: {}. , \+
, ,
\t\x2A {\t\x2A}.
( ).
\,
\ .
Perl
Perl, ,
. ,
. ,
$str =~ m/(\w+)/;
$regex = '(\w+)';
$str =~ $regex;
$regex = "(\\w+)";
$str =~ $regex;
( 297, 418).
, Perl
,
, :
140
3. :
(
).
\Q\E ( 149).
\N{},
. ,
Hola! \N{INVERTED EXCLAMATION
MARK}Hola!.
Perl
. ,
Perl, .
,
.
, Perl,
,
\Q\E Perl.
,
(
. .), ,
.
7, . 350.
,
. 110
ASCII n,
EBCDIC >. ?
,
. ,
.
ASCII
, . ISO88591
( Latin1)
,
. , Latin1
234
, ASCII.
: ,
, ? ,
234, 116, 101 115,
Latin1 ( tes),
^\w+$
^\b. ,
141
\w \b Latin1;
, , .
.
, :
?
,
?
?
, :
, ,
? . ^x (
?
\w, \d, \s, \b
? , ,
, \w \b?
? [az]?
? , ?
,
. , \b java.util.regex
, \w
ASCII.
.
,
.
, , . .
. ,
49 333.
, ,
U+.
49 333 C0B5,
U+C0B5.
3 E ,
e.
142
3. :
,
.
, UCS2 (
), UCS4 (
), UTF16 ( ,
) UTF8 (
, ). ,
,
. , ,
(, )
(ASCII, Latin1, UTF8 ) ,
.
/,
.
, ,
\u,
( 155).
,
, \uC0B5 . ,
\uC0B5 U+C0B5
,
( ,
).
UTF8,
, .
. (
,
preg PHP u; 528.)
,
,
. , ,
,
: U+0061 ()
U+0300 (`).
,
( ).
,
. U+0061 + U+0300?
, .
,
143
. , (U+0061 + U+0300)
^..$ ^.$.
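Whether a base character plus a combining mark counts as one character or two can be observed in Python, whose `re` module works on code points. The sketch below is illustrative; `PRECOMPOSED`, `DECOMPOSED` and `is_single_char` are names chosen for the example, and normalizing with `unicodedata` is one possible workaround rather than something the text prescribes.

```python
import re
import unicodedata

PRECOMPOSED = "\u00E0"     # a with grave accent as one code point
DECOMPOSED  = "a\u0300"    # U+0061 followed by combining grave U+0300

def is_single_char(s):
    """True if the regex dot matches the entire string."""
    return re.fullmatch(r".", s) is not None

# Normalizing to NFC composes the pair into the single code point.
composed = unicodedata.normalize("NFC", DECOMPOSED)
```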
Perl PCRE ( preg PHP)
\X, ,
. ,
.
. 155.
.
(, A), A+, ,
, A (
). ,
, [ ]
,
[A ].
,
, A +.
,
.
,
U+0061, U+0300.
U+00E0.
? La
tin1. Latin1,
, , U+00E0,
U+0061 + U+0300. ,
,
java.util.regex
CANON_EQ,
,
( 440).
:
,
. , I (U+0049)
I ( U+0399).
..
I ,
(U+00CF; U+03AA; U+0049 U+0308; U+0399 U+0308). ,
, , .
144
3. :
,
. , Unicode
SQUARE HZ (U+3390), Hz
Hz (U+0048 U+007A).
Hz
,
, ,
, .
,
(U+0020),
(U+00A0) ,
.
3.1+ , U+FFFF
3.1, 2001
, ,
U+FFFF (
,
). ,
Clef
(U+1D121). ,
U+FFFF , .
\u
.
,
\u,
\{},
. , Clef
\x{1D121}.
(
),
(line terminators). . 3.5.
3.5.
LF
U+000A
(ASCII)
VT
U+000B
(ASCII)
FF
U+000C
(ASCII)
CR
U+000D
(ASCII)
/ (ASCII
)
NEL
()
U+0085
145
LS
U+2028
()
PS
U+2029
()
(
, ).
,
. ( 146),
^, $ \Z ( 147).
.
Perl /x
(,
102) /i ( 74).
, ,
.
, , , Perl /i,
i PHP Pattern.CASE_INSENSITIVE
java.util.regex ( 132). ,
, (?i) (
) (?i) ().
(?i:) (?i:),
.
( 176).
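Python supports both the whole-pattern `(?i)` form and the scoped `(?i:...)` form (the latter from Python 3.6). A small sketch, with the names `SCOPED` and `GLOBAL` invented for the example:

```python
import re

# Case-insensitivity applies only inside the parentheses ...
SCOPED = re.compile(r"(?i:jeff)s")
# ... or, with (?i) at the start, to the whole expression.
GLOBAL = re.compile(r"(?i)jeffs")
```

`SCOPED` accepts `JEFFs` but not `JEFFS`, because the final `s` lies outside the modified span; `GLOBAL` accepts both.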
, .
, (b
b B),
.
, ,
.
,
. ,
. ,
Ruby.
146
3. :
(
)
. ,
,
, .
. :
, ;
(
Perl Java java.util.regex).
:
. ,
SS, Perl
.
,
. , J (U+01F0)
U+006A U+030C
( 142). , ,
.
,
! ,
.
.
( java.util.regex), #
.
Perl ( 102), Java ( 132) VB.NET ( 132).
,
( java.util.
regex). ,
, .
\123
\12,
3, \123, .
, ,
, .
ASCII.
( )
. .
UNIX ,
147
sed lex
.
.* ,
,
.1
,
( ),
. .
,
,
.
.
.
.
(,
Java Sun)
( 144). Tcl
,
,
.
/s Perl
(singleline).
,
^ $,
, .
,
.
( )
^ $. ^
, ,
.
, ^
.
( 99) Perl,
HTML.
1
, ed,
.*.
148
3. :
, s/^$/<p>/mg
...tags. NL NL It's... ...tags. NL <p> NL It's....
.
$ ,
$ ( 169).
,
,
$.
, ,
\A \Z, ^ $ :
. , \A
\Z .
$ \Z
, .
\z,
.
. 169.
., .
GNU Emacs
,
. , lex
$ ( ^
).
( java.util.regex)
( 144). Ruby
, Python
\Z \z, $.
.
,
,
.
., ^ $.
,
. .
,
^ $ .1
1
Tcl ,
. Tcl
( ., ),
.
,
Tcl.
149
( ) . ,
[az]* [az]*.
(
),
.
,
, . ,
PCRE (. . PHP) Perl
\Q\E,
(, \).
30
. ,
,
, .
,
, ,
.
,
. .
, ,
.
,
.
, ,
.
, .
,
151
153
: \
155
155
: \c
150
3. :
155
: [az] [^z]
156
157
: \C
157
: \X
158
159
, : \p{},\P{}
164
: [[az]&&[^aeiou]]
166
POSIX: [[:alpha:]]
167
POSIX: [[.spanll.]]
168
POSIX: [[=n=]]
168
Emacs
169
/ : ^, \
169
/ : $, \Z, \z
171
( ): \G
174
175
176
177
177
: \Q\E
177
, ,
178 : (), \1, \2,
178
180
: (?:)
: (?<>)
181
: (?>)
: ||
182
: (? if then | else)
180
183
: *, +, ?, {min,max}
184
184
151
, .
,
.
\
( ).
ASCII <BEL>, 007 ( ).
\b
\e
\f
. ASCII <FF>,
014 ( ).
\n
. ( Unix DOS/
Windows) ASCII <LF>, 012 (
). MacOS ASCII
<CR>, 015 ( ). Java
.NET ASCII <LF>
.
\r
. ASCII <CR>.
MacOS ASCII <LF>. Java
.NET ASCII <CR>
.
\t
() .
ASCII <>, 011 ( ).
\v
. ASCII
<VT>, 013 ( ).
. 3.6
.
,
.
( 135),
,
.
152
3. :
\r (
)
\t ()
\v (!
)
\n (
)
\f (
)
\e (ASCII!!
Escape)
\a ()
Python
\b ()
\b (
)
3.6. ,
Tcl
\y
Perl
Java
SR
SR
SR
SR
GNU awk
GNU sed
GNU Emacs
.NET
PHP ( preg)
MySQL
GNU grep/egrep
flex
Ruby
. 123.
; C ;
SR ( );
X (
);
X (
);
S ( ).
,
\n \r 1 ,
1
C C++, \
\ C
,
.
, ,
. ,
\n \r, .
153
. , ,
, ,
\n.
(,
HTTP), \012
, (\012
). DOS,
\015\012. \015?\012
DOS Unix (
: ,
,
169).
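The `\015?\012` idiom for accepting both DOS and Unix line endings can be tried in Python, whose `re` module also understands these octal escapes. A minimal sketch, with the function name `split_lines` being hypothetical:

```python
import re

def split_lines(data):
    """Split on LF optionally preceded by CR, using explicit octal codes
    rather than the platform-dependent \\n and \\r."""
    return re.split(r"\015?\012", data)
```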
: \
, (. .
8),
,
. , \015\012 ASCII
CR/LF.
, .
, Perl ASCII Escape
\0 .
, ( ,
, \1).
,
. (, , java.util.regex)
;
,
0.
, ,
\565 (8
\000 \377). ,
, (
),
.
\377.
154
3. :
3.7.
Python
\0, \07,\377
Tcl
Perl
\xFF
\x \uFFFF; \UFFFFFFFF
\xF; \xFF; \x{}
Java
GNU awk
\7, \77,\377
\xFF; \uFFFF
\x
GNU sed
GNU Emacs
.NET
\0, \77,\377
\xFF, \uFFFF
PHP ( preg)
\7, \77,\377
\xF, \xFF
\7, \77,\377
\xF, \xFF
MySQL
GNU egrep
GNU grep
flex
Ruby
\0 \0 ,
;
\7, \77 ;
\07 , 0;
\077 , 0;
\377 \377;
\0377 \0377;
\777 \777;
\ \ ;
\{} \{} ;
\xF, \xFF \ ;
\uFFFF \u
;
\UFFFF \U
;
\UFFFFFFFF \U
.
. 123.
155
:
\x, \x{}, \u, \U
( 16) \x, \u \U. ,
\x0D\x0A ASCII
CR/LF. . 3.7
.
,
,
( ).
. 3.7.
: \
\,
, 32 (
). , \cH Control+H,
ASCII Backspace, a \cJ
ASCII ( \n,
\r
151).
, ,
.
,
.
, , ,
Java Sun.
, \c
.
, GNU Emacs,
?\^(
(, ?^H ).
,
,
.
: [az] [^az]
,
,
,
. , *
, a
. (
\b)
( 152).
,
(. . [09]
[9081726354]). (,
Java Sun)
,
, .
.
, ,
.
, .
(
).
[^LMNOP] [\x00KQ\xFF].
,
, ,
255 (\xFF), [^LMNOP]
, L, M, N, .
,
. , [aZ]
.
[azAZ] ( ,
ASCII). . \{L}
. 159.
\x80\xFF.
()
, ,
, .
,
. ,
:
(,
Java Sun)
( 144).
157
( 146).
POSIX ,
(. . , ),
(
).
,
, ,
, [^"]
. , ".* "[^"]*"
.
.
( ) . 146.
Perl PCRE (. . PHP) \C,
,
, (
).
,
.
,
, .
: \X
Perl PHP \X
\P{M}\p{M}*, ..
(, \p{M}),
(, \p{M}).
( 142),
, (
a U+0061 ` U+0300)
.
. ,
% , , c,
% (U+0063 U+0327 U+0306).
francais franais
fran.ais fran[c]ais
, ,
U+00C7,
, c
(U+0063 U+0327).
fran[c ?|]ais,
fran.ais
fran\Xais.
, \X ,
\X .
, \X
( 144),
( 146).
,
, \X
( ).
. [09]
.
\D
He!. [^\d].
\w
, . [azAZ09_],
,
( 119).
\w
( : ja
va.util.regex PCRE, PHP, \w
[azAZ09_]).
\W
, . [^\w].
\s
. , ASCII,
[\f\n\r\t\v].
U+0085, \p{Z} (
).
\S
He! ( [^\s]).
. 119, POSIX
( \w).
\w
, \p(L}) .
159
, :
\{}, \{}
( 141),
.
(, ,
,
, . .).
,
\p{(
} (, ) \P{}
(, ).
\p{L} (L ,
), (
( ).
,
\p{} \P{}, .
. 3.8. ( ,
, ,
)
.
(L , S . .),
(Letter, Symbol . .).
, Perl.
3.8.
\p{L}
\p{Letter} ,
\p{M}
\p{Mark} , ,
(
, . .)
\p{Z}
\p{Separator} , ,
(
. .)
\p{S}
\p{Symbol}
\p{N}
\p{Number}
\p{P}
\p{Punctuation}
\p{C}
\p{Other} (
)
(, \pL \p{L}).
( )
In Is (, \p{IsL}).
,
In/Is .1
. 3.9,
,
(. 3.9),
: , , ,
.
.
3.9.
\p{Ll}
\p{Lowercase_Letter} .
\p{Lu}
\p{Uppercase_Letter} .
\p{Lt}
\p{Titlecase_Letter} , (
, D d
D).
\p{L&}
,
\p{Ll}, \p{Lu} \p{Lt}.
\p{Lm}
\p{Modifier_Letter}
, .
\p{Lo}
\p{Other_Letter} ,
, , ,
. .
\p{Mn}
\p{Non_Spacing_Mark} ,
(, , . .).
\p{Mc}
\p{Spacing_Combining_Mark} ,
( ,
, , ,
, , , , ,
).
\p{Me}
\p{Enclosing_Mark} ,
(, , ).
\p{Zs}
\p{Space_Separator} ( ,
).
\p{Zl}
, Is/In
. ,
.
Perl 5.8
Perl. Perl :
Is In,
( 162); In.
161
\p{Zp}
\p{Sm}
\p{Math_Symbol} +, , /, < ,
\p{Sc}
\p{Currency_Symbol} $, c| , , ,
\p{Sk}
\p{Modlfier_Symbol}
, .
\p{So}
\p{Other_Symbol} ,
, . .
\p{Nd}
\p{Decimal_Digit_Number} 0 9
( , ).
\p{Nl}
\p{Letter_Number} .
\p{No}
\p{Other_Number} , ;
, , (
, ).
\p{Pd}
\p{Dash_Punctuation} .
\p{Ps}
\p{Open_Punctuatlon} , (,
\p{Pe}
\p{Close_Punctuation} , ), , ,
\p{Pi}
\p{Initial_Punctuation} , , , <,
\p{Pf}
\p{Final_Punctuation} , , , >,
\p{Pc}
\p{Connector_Punctuatlon} ,
(, ).
\p{Po}
\p{Other_Punctuation} : !, &, , :, :: ,
\p{Cc}
\p{Cf}
\p{Format} ,
.
\p{Co}
\p{Private_Use} ,
( . .).
\p{Cn}
\p{Unassigned} , .
, ,
,
\p{L&},
, , \p{Lu}\p{Ll}\p{Lt}.
. 3.9 (, Lowerca
se_Letter Ll), .
(
, LowercaseLetter, LOWERCASE_LETTER, LowercaseLetter, Lower
caseLetter . .),
, . 3.9.
( ),
\p(}. , \p{Hebrew} (
) , .
,
(, ).
(Gujarati, Thai,
Cherokee...), (Latin, Cyril
lic). ,
Hiragana,
Katakana, Han ( ) Latin.
.
,
, (
) . (,
) . ,
IsCommon,
\\p{IsCommon}. , In
herited, ,
, .
, .
.
, Tibetan 256
U+0F00 U+0FFF. Perl java.util.regex
\p{InTibetan}, .NET
\p{IsTibetan} ( ).
:
(Hebrew, Tamil, Basic_Latin, Hangul_Jamo, Cyrillic, Katakana)
(Currency, Arrows, Box_Drawing,
Dingbats).
Tibetan ,
, ,
, ,
.
:
,
. , Tibetan 25%
.
, ,
. , Currency
163
, $, , , (
, \p{Sc}).
. ,
( ) Latin_1_Supplement.
. ,
Greek Greek_Extended.
, .
(,
Tibetan, Tibetan).
, . 3.10,
. Perl java.util.regex Tibetan
\p{InTibetan}, a .NET Framework
\p{IsTibetan} (
, Perl (
Tibetan).
, ,
. . 3.10 ,
.
,
\p{}.
( , ),
, . .
.
.
: [[az][aeiou]]
.NET
, ,
. , [[az][aeiou]]
, [az],
, [aeiou], . .
.
[\p{P}[\p{Ps}\p{Pe}]]
, \p{P},
, [\p{Ps}\p{Pe}], . .
,
, ( >>.
3.10. //
Perl
Java
.NET
PHP/
PCRE
\p{L}
\pL
\p{IsL}
\\p{Letter}
\p{L&}
\p{Greek}
\p{IsGreek}
\p{Cyrillic}
\p{InCyrillic}
\p{IsCyrillic}
\{}
\p{^}
\p{Any}
\p{all}
\p{Assigned}
\P{Cn}
\P{Cn} \P{Cn}
\p{Unassigned}
\P{Cn}
\p{Cn} \p{Cn}
,
. ( . 123.)
: [[az]&&[^aeiou]]
Java Sun
(, ,
).
, ( , Java
,
[[az]&&[^aeiou]]).
,
(OR) (AND).
165
.
: [abcxyz]
[[abc][xyz]], [abc[xyz]] [[abc]xyz].
,
.
, |
or.
,
.
.
, . ,
[\p{InThai}&&\P{Cn}]
Thai, .
(. . ,
) \p{InThai} \{n}. ,
\{} ( P) ,
. , \P{Cn}
,
, , , (
( Sun
Assigned \P{Cn} ,
\p{Assigned}).
.
. ,
[[this][that]]
, [this] [that],
[this]
[that]. .
, ,
[\p{InThai}&&\P{Cn}] : ,
\p{InThai} \P{Cn},
: ,
\p{InThai} \P{Cn}.
:
.
[\p{InThai}&&\P{Cn}], ,
\P{Cn} [^\p(Cn}],
[\p{InThai}&&
[|^\p{Cn}]]. ,
Thai Thai
, .
,
, [\p(InThai}&&[^\p(Cn}]]
\p{InThai} \p{Cn}.
, [[az]&&[^aeiou]],
.
. , [this&&[^that]] :
[this] [that]. && [^]
,
[ &&[^]].
,
( 175),
. [\p{InThai}&&[^\p{Cn}]]
(?!\p{Cn})\p{InThai}.1
,
.
( .NET InThai IsThai, 164):
(?!\p{Cn})\p{InThai}
(?=\P{Cn})\p{InThai}
\p{InThai}(?<!\p{Cn})
\p{InThai}(?<=\P{Cn})
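The same lookaround trick works for plain ASCII classes in flavors that lack class set operations. As a hedged illustration, here is a Python sketch of the [[a-z]&&[^aeiou]] idea ("a lowercase letter that is not a vowel"), written once with lookahead and once with lookbehind; the variable names are made up for the example:

```python
import re

# "A lowercase ASCII letter that is not a vowel", written two equivalent ways.
consonant_ahead = re.compile(r"(?![aeiou])[a-z]")    # lookahead rejects vowels first
consonant_behind = re.compile(r"[a-z](?<![aeiou])")  # lookbehind rejects after matching

word = "regular expression"
ahead = consonant_ahead.findall(word)
behind = consonant_behind.findall(word)
print("".join(ahead))
```

Both forms consume exactly one character, so they can be used wherever the character class itself would have been used.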
POSIX: [[:alpha:]]
, , POSIX
(bracket expression).
POSIX
, 2,
.
1
, Perl
\p{Thai}, \p{Thai} ,
.
Thai .
,
, .
, .
http://unicode.org.
POSIX ,
, POSIX
, .
167
POSIX
,
POSIX.
[:lower:],
( 119).
[:lower:] az.
,
, [az], [[:lower:]]. ,
,
, . . (
).
POSIX
,
:
[:alnum:]
[:alpha:]
[:blank:]
[:cntrl:]
[:digit:]
[:graph:]
( ,
. .)
[:lower:]
[:print:]
[:graph:],
[:punct:]
[:space:]
([:blank:], ,
. .)
[:upper:]
[:xdlgit:] ,
(. . 09afAF)
, ( 159),
POSIX. ,
,
POSIX.
POSIX: [[.spanll.]]
(
,
.
, ll (, , tortilla),
,
l m, s t,
ss.
, , ,
spanll eszet.
,
POSIX, ,
( spanll),
. ,
[^abc] ll.
[..]: torti
[[.spanll.]]a tortilla.
,
. ,
,
!
POSIX: [[=n=]]
(
(character equivalents), ,
. ,
n, n ,
, a, a. ,
[::],
,
; , [[=n=][=a=]]
.
,
,
.
([..], [.b.], [..] . .),
[[=n=][=a=]]
[na].
Emacs
GNU Emacs
\w, \s . .;
:
\s
,
Emacs,
169
,
Emacs.
, \sw , , a \s
.
\w \s.
\S
Emacs ,
. , ,
, .
, .
: ^, \A
^ ,
, ( 147)
.
^
( 144).
\A ( )
.
: $, \Z, \z
. 3.11,
. $
.
. , s$ (,
s)
s NL , s
.
$
.
( 144). (
, Java $
, 442.)
( 147) $,
( ).
\Z ( )
$ ,
,
. \z
.
. 3.11.
3.11.
...
^
$ 1
,
$
(147)
^
$
...
^
^ 1
$
\A ^
\Z $ 1
\z
:
1
Java Sun
(143).
2
Ruby $ ^ ,
\A \Z .
3
Python \Z .
4
Ruby \A ^ .
5
Ruby \Z $
, .
.
( . 123.)
171
( ): \G
\G Perl
/g ( 79).
, .
\G , \A.
, \G
. ,
( s///g
)
\G .
Perl \G ,
.
, \G, ,
.
,
\G ,
.
Perl
/c ( 380), ,
\G ,
.
,
.
\G
,
( pos 378). , ,
. ,
, .
.
, \G Perl
,
. ,
.
?
,
\G
?
,
. . 268,
:
x? abcde.
abcde, .
, , ,
.
,
\G
( 191). , s/x?/!/g
abcde, !a!b!c!d!e!.
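The effect of an expression that can match nothing at all is easy to reproduce. This small Python sketch (Python is used here only as a convenient test bed) applies x? to abcde and shows the empty matches found at every position:

```python
import re

# x? never finds an "x" in "abcde", so it matches the empty string
# before each character and once more at the very end.
result = re.sub(r"x?", "!", "abcde")
print(result)
```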
,
. :
\G? Perl s\Gx?/!/g
abcde !abcde, , Perl \G
.
\G .
\G Perl
,
HTML $html ,
HTML ( <IMG> <A>,
>).
Yahoo! ,
HTML .
Perl m//gc,
,
( 380).
,
, .
,
,
.
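my $need_close_anchor = 0; # True if we've seen <A>, but not its closing </A>.

while (not $html =~ m/\G\z/gc) # While we haven't worked to the end . . .
{
    if ($html =~ m/\G(\w+)/gc) {
        # . . . have a word or number in $1; can now check it, for example, for profanity . . .
    } elsif ($html =~ m/\G[^<>&\w]+/gc) {
        # Other non-HTML stuff; simply allow it.
    } elsif ($html =~ m/\G<img\s+([^>]+)>/gci) {
        # . . . have an image tag; can check that it's appropriate . . .
    } elsif (not $need_close_anchor and $html =~ m/\G<A\s+([^>]+)>/gci) {
        # . . . have a link anchor; can validate it . . .
        $need_close_anchor = 1; # Note that we now need </A>
    } elsif ($need_close_anchor and $html =~ m{\G</A>}gci) {
        $need_close_anchor = 0; # Got what we needed; don't allow again
    } elsif ($html =~ m/\G&(#\d+|\w+);/gc) {
        # Allow entities like &gt; and &#123;
    } else {
        # Nothing matched at this point, so it must be an error.
        # Note the location, and grab a dozen or so characters from the
        # HTML so that we can issue an informative error message.
        my $location = pos($html); # Note where the unexpected HTML starts.
        my ($badstuff) = $html =~ m/\G(.{1,12})/s;
        die "Unexpected HTML at position $location: $badstuff\n";
    }
}

# Make sure there's no dangling <A>
if ($need_close_anchor) {
    die "Missing final </A>"
}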
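Python's re module has no \G, but Pattern.match(string, pos) only succeeds when the match begins exactly at pos, which gives the same "continue from where the last match ended" behavior. The sketch below is a simplified analog of the Perl loop above; the token names and the reduced token set are assumptions made for the example:

```python
import re

# Each pattern is tried with .match(text, pos), which only succeeds if the
# match starts exactly at pos. That plays the role of a leading \G.
TOKENS = [
    ("word",   re.compile(r"\w+")),
    ("text",   re.compile(r"[^<>&\w]+")),
    ("tag",    re.compile(r"<[^>]+>")),
    ("entity", re.compile(r"&(?:#\d+|\w+);")),
]

def tokenize(html):
    """Walk the string left to right, never skipping unrecognized text."""
    pos = 0
    out = []
    while pos < len(html):
        for kind, pattern in TOKENS:
            m = pattern.match(html, pos)
            if m:
                out.append((kind, m.group()))
                pos = m.end()
                break
        else:
            # No alternative matched at this position: report it, as the Perl code does.
            raise ValueError("Unexpected HTML at position %d: %r"
                             % (pos, html[pos:pos + 12]))
    return out

print(tokenize("a &gt; <b>b</b>"))
```

As in the Perl version, the important property is that a failed attempt never lets the engine bump along to find a match further on; anything unrecognized is reported at the exact position where it occurs.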
,
!a!b!c!d!e, , \G
.
,
. , Micro
soft .NET java.util.regex
, (
). , PHP Ruby \G
, Perl. java.util.regex
.NET .
GNU awk
\< \>
\y
\B
GNU egrep
\< \>
\b
\B
GNU Emacs
\< \>
\b
\B
Java
\B
(?<!\pL)(?=\pL) (?<=\pL)(?!\pL)
\b
[[:<:]] [[:>:]]
[[:<:]]|[[:>:]]
.NET
(?<!\w)(?=\w) (?<=\w)(?!\w)
\b
\B
Perl
(?<!\w)(?=\w) (?<=\w)(?!\w)
\b
\B
MySQL
PHP
Python
(?<!\pL)(?=\pL) (?<=\pL)(?!\pL)
\b
(?<!\w)(?=\w) (?<=\w)(?!\w)
\b
Ruby
\b
\B
\B
\B
GNU sed
\< \>
\b
\B
Tcl
\m \M
\y
\Y
, , ASCII (
8 ),
. ( . 123.)
,
, ,
.
,
. ,
\w, . ,
Java Sun \w
ASCII , Java,
\pL
( \p{L} 159).
175
.
NE14AD8
, a M.I.T. .
(?=), (?!);
(?<=), (?<!)
( 88).
?
(
!).
Perl Python,
. ,
(?<!\w) (?<!this|that) , a (?<!books?) (?<!^\w+:)
,
. , (?<!books?),
(?<!book)(?<!books), .
, (?<!books?)
(?<!book|books). PCRE ( reg
) .
,
, . ,
(?<!books?) , (?<!^\w+:) ,
\w+ .
Java Sun.
,
, ,
, (
).
, .
, ,
, (?<!^\w+:)
. , .NET
Microsoft, ,
( ,
,
,
).
,
( 145),
( , ).
, .
: (?),
(?i) (?i)
( 145)
. (?i)
( ) (?i) ().
, <B>(?i)very(?i)</B> very
,
. , <B>VERY</B>
<B>Very</B>, <b>Very</b>.
(145)
(146)
(146)
(147)
Ruby, (?i) ,
, |,
( ,
).
177
: (?:),
(?i:)
, ,
.
(?i:)
, .
<B>(?:(?i)very)</B> <B>(?i:very)</B>.
, ,
. Tcl
Python (?i),
(?i:).
: (?#) #
(?#).
,
( 146).
,
(, VB.NET) 132, 497.
: \Q\E
\Q\E Perl.
, \Q,
(, \; ,
). , ,
,
.
.
, WWW ,
, $query,
m/$query/i. $query
C:\WINDOWS\,
,
, (
\).
Perl m/\Q$query\E/i; C:\WINDOWS\
C:\\WINDOWS\\,
C:\WINDOWS\, .
( 127),
.
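Languages without \Q…\E usually offer a function with the same purpose. In Python that function is re.escape. The sketch below assumes the same scenario as the text (a user-supplied $query that happens to be C:\WINDOWS\); the variable names and the sample sentence are illustrative:

```python
import re

query = "C:\\WINDOWS\\"          # the literal text C:\WINDOWS\ typed by a user

# Used as-is, the text is not a valid regex (it ends with a lone backslash).
try:
    re.compile(query)
    raw_is_valid = True
except re.error:
    raw_is_valid = False

# re.escape() plays the role of \Q...\E: every metacharacter becomes literal.
pattern = re.compile(re.escape(query), re.IGNORECASE)
text = r"Look in c:\windows\system32 for the file."
found = pattern.search(text)
print(found.group())
```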
,
. , VB
, ,
: () \1, \2,
:
.
(), \(\).
GNU Emacs, sed, vi grep.
,
. 68, 70 85. ,
, ,
\1, \2 . .
. , (
, ),
,
(, Perl
$1, $2 . .).
\1 ; sed
vi. . 3.14 ,
, .
: (?:)
(?:) ,
.
179
$1, $2 . .
, (1|one)(?:and|or)(2|two)
$1 1 one, a $2 2 two.
.
3.14.
GNU egrep
( )
GNU Emacs
(matchstring 0)
(\& )
(matchstring 1)
(\1 )
GNU awk
MySQL
Perl
67 $&
$1
PHP
532 $matches[0]
$matches[1]
Python
122 MatchObj.group(0)
MatchObj.group(1)
Ruby
$&
GNU sed
& ( \1 (
)
)
Java
Tcl
128 MatcherObj.group()
$1
MatcherObj.group(1)
regexp
129 MatchObj.Groups(0)
MatchObj.Groups(1)
C#
MatchObj.Groups(0)
MatchObj.Groups(1)
vi
&
\1
VB.NET
( . 123.)
.
,
,
$1. ,
,
(
6).
.
. 106, $HostnameRegex
. ,
Perl m/(\s*)$HostnameRegex(\s*)/. ,
$1,
$2, :
$4, $HostnameRegex
:
$HostnameRegex = qr/[-a-z0-9]+(\.[-a-z0-9]+)*\.(com|edu|info)/i;
,
$HostnameRegex .
, Perl,
,
.
: (?<>)
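Python, PHP's preg engine, and the .NET languages support capturing into named locations. Python and PHP use the syntax (?P<name>…), while the .NET languages use (?<name>…). Here is an example for .NET:
\b(?<Area>\d\d\d)-(?<Exch>\d\d\d)-(?<Num>\d\d\d\d)\b
and for Python/PHP:
\b(?P<Area>\d\d\d)-(?P<Exch>\d\d\d)-(?P<Num>\d\d\d\d)\b
This fills the names Area, Exch, and Num with the components of a US phone number. The program can then refer to each matched substring by name, for example RegexObj.Groups("Area") in VB.NET and most other .NET languages, RegexObj.Groups["Area"] in C#, RegexObj.group("Area") in Python, and $matches["Area"] in PHP. The result is clearer code.
Within the regular expression itself, the captured text is available via \k<Area> in .NET and via (?P=Area) in Python and PHP.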
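A short Python sketch of named capture follows. The phone number and the sample sentences are invented for the illustration; the group names are the ones used in the text:

```python
import re

phone = re.compile(r"\b(?P<Area>\d\d\d)-(?P<Exch>\d\d\d)-(?P<Num>\d\d\d\d)\b")

m = phone.search("Call 650-555-1212 today")
area, exch, num = m.group("Area"), m.group("Exch"), m.group("Num")
print(area, exch, num)

# (?P=name) is a backreference to a named group: here it finds a doubled word.
doubled = re.search(r"\b(?P<word>\w+) (?P=word)\b", "this is is a test")
print(doubled.group("word"))
```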
Python .NET ( PHP)
. ,
(###) ###,
(
.NET): (?:\((?<Area>\d\d\d)\)|(?<Area>\d\d\d)).
,
Area.
: (?>)
(?>)
( 216). , ,
, (
, . . ) ,
181
,
. ,
.
.*! Hola!,
, .*
: (?>.*)!. .*
(Hola!),
.* (
!), .
.* ,
; !
, .
,
. ,
( 217)
,
( 329).
: ||
.
. | ,
.
| \|.
. , this and|or that
(this and)|(or that), this (and|or) that,
and|or .
(this|that| ). ,
(this|that)?.1
POSIX ,
lex awk. ,
.
, ,
.
, (this|that|)
((?:this|that)?). (this|that)? ,
.
,
.
(1) (?(1)) ,
.
,
.
,
<>. (<)?\w+(?(1)>)
, (<?)\w+(?(1)>) .
.
()
, ( ,
) . ()
< ,
,
< . , if (?(1))
.
( 180)
.
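The conditional construct can be exercised in Python, which supports the (?(1)…) form. The following sketch contrasts (<)?\w+(?(1)>) with the subtly different (<?)\w+(?(1)>) discussed above; the helper name is an assumption of the example:

```python
import re

# Group 1 participates only if "<" was actually present.
correct = re.compile(r"(<)?\w+(?(1)>)")
# Here group 1 always participates (possibly matching nothing),
# so the conditional always demands a closing ">".
wrong = re.compile(r"(<?)\w+(?(1)>)")

def whole(pattern, text):
    """True if the pattern matches the entire text."""
    return pattern.fullmatch(text) is not None

for sample in ("<word>", "word", "<word", "word>"):
    print(sample, whole(correct, sample), whole(wrong, sample))
```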
183
if
, (?=) (?<=).
,
then.
else. :
(?(?<=NUM:)\d+|\w+).
\d+ NUM:,
\w+. .
Perl :
Perl.
,
(then else).
7 . 392.
: *, +, ?, {min, max}
(*, +, ? ,
)
. ,
+ ? \+ \?.
,
, .
: {min, max} \{min, max\}
.
.
([az]{3} [az]\{3\}
) ,
. [az]
[az][az],
( 307).
: , X{0,0}
X. X{0,0} ,
: X ,
. ,
X{0,0} X ,
,
.1
1
{0,0} .
! (
GNU awk, GNU grep Perl) {0,0}
*, (
sed grep) ?. !
.
, ,
. .
, ,
.
4 ,
,
, ,
,
. .
.
5
.
185
6
.
, ,
. 6
(
) ,
.
4, 5 6 .
,
.
,
, ,
. ,
,
.
4
. ,
. ,
,
, .
, ?
,
(, Perl, Tcl,
Python, .NET, Ruby, PHP, Java
. .) ,
, ,
. ,
, .
!
,
. , ,
. ,
.
, , . ?
, .
,
187
, .
,
, .
.
, .
,
.
, .
,
: . ,
, .
: ,
.1
, , ,
,
.
, ,
. ,
: ( )
( ).
,
,
. ,
.
,
.
,
( ,
). ,
,
.
, (
, ).
1
,
.
:
.
188
4.
,
. , ,
. ,
.
: () ().
( 201),
( ).
, ,
, . ,
, .NET, PHP, Ruby,
Perl, Python, GNU Emacs, ed, sed, vi, grep
egrep awk. ,
egrep awk, lex flex.
,
(
). . 4.1
, .
, ,
.
4.1.
awk ( ), egrep ( ),
flex, lex, MySQL, Procmail
POSIX HKA
3, 20
,
. POSIX,
,
189
. ,
( )
, , ,
,
.
:
( POSIX
)
( : Perl, .NET,
PHP, Java, Python, ...)
POSIX
POSIX
POSIX (
). POSIX
, ,
( 166).
,
.
, ,
egrep, awk lex,
,
.
,
POSIX, .
, (POSIX
HKA), ,
,
.
, . ,
,
. ,
. .
.
.
. ,
?
?
? POSIX HKA?
?
, ,
,
,
, (
,
?). , ,
,
. , ,
. 4.1,
,
.
?
,
.
( 184)? ,
. ,
, POSIX .
nfa|nfa
not nfanot; nfa, ,
. nfanot,
POSIX , .
POSIX ?
POSIX
.
,
.
. X(.+)+X =XX=============
=========, :
echo =XX========================================= | egrep 'X(.+)+X'
,
(
, , POSIX ).
, ,
.
,
.
191
, ,
.
( ),
.
,
.
,
,
. ,
( , ).
Perl,
,
,
.
, 3 ( 149).
. ,
, ,
. ,
.
:
1. .
2. (*, +, ? {m,n})
.
,
. .
1:
,
, ,
.
( ).
, ,
.
:
,
( ).
,
( ) .
,
, .
.
,
(
).
, ORA FLORAL
( ORA FLO).
, (LOR), .
, , ,
: FLORAL.
,
. , cat
The dragging belly indicates your cat is too fat
indicates, cat,
. cat , cat indicates
, . ,
egrep, , (
, ,
(, ) .
: The dragging belly indicates your cat is
too fat fat|cat|belly|your?
.
,
. (
) ,
.
,
,
, . .
.
, ,
,
.
6.
.
, , ,
193
.
, (*
), , . .
. 3 ( 149). (
) , ,
.
.
(, , \*, !,
. .)
, z !
?
,
usa u, s,
a.
, b b B,
( ,
146).
, ,
, ,
. . ( 155) :
.1
, ( 146),
.
, \w, \W \d.
,
, .
(, ^, \Z, (?<=\d) )
: (^, $, \G,
\b, ... 169) (
175).
(^, \Z, ...),
(\<, \b, ...). ,
.
1
, (167),
POSIX
, . ,
(145),
.
. 192.
:
, fat|cat|belly|your The
dragging belly indicates your cat is too fat , fat,
fat .
,
fat ,
(. .
) ,
. ,
,
.
,
,
, .
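The rule that the match beginning earliest wins can be checked with a few lines of Python (any traditional engine behaves the same way here). The variable names are illustrative:

```python
import re

text = "The dragging belly indicates your cat is too fat"

# "fat" is listed first in the alternation, but "belly" starts earlier in the text.
first = re.search(r"fat|cat|belly|your", text)
print(first.group())

# Likewise, "cat" is first found inside "indicates", not as the later word "cat".
cat = re.search(r"cat", text)
print(text[cat.start() - 4:cat.start() + 5])
```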
(
$1) .
(),
() ,
.
.
, .1
,
. , awk, lex egrep
$1.
, GNU egrep
.
! ,
(
) .
,
(
, ).
1
, ,
, .
. 231.
195
2:
.
(*, +),
. . ,
.
, ,
(?, *, + {min, max}) .
(, a a?, ()
()* [09] [09]+),
,
, ,
.
.
,
. (
,
.)
,
,
.
.
: \b\w+s\b,
, s (, regexes). \w+
, s
. , \w+
regexes:
s\b .
,
,
( ,
*, ? {min, max})... ,
. ,
.
(
) , ,
.
(
) . , , [09]+
March1998. 1
+.
,
, ,
998.
[09] , +
.
, .
,
, .
( 83),
^Subject:.
^Subject:(.*).
($1 Perl).1
, .* ,
: ^Subject: ,
. ,
^Subject: ,
.* ,
*
.
.* ? *
, $1.
, .*.
.* ,
, *
( .* , ,
).
, ,
.
,
, . ,
,
.* ^Subject:(.*).*. : .
.* ( ) ,
1
(
).
, .
197
.* .
, *
. .*
, $2 .
, .*
, ?
, . \w+s,
,
,
.
^.*([09][09]).
, ,
$1. :
.* . ([09]
[09]) ,
: , .*, ! ,
.
, ,
. ,
, . ,
,
+.
, ^.*([09][09]) about24cha
racterslong. .* ,
[09] .*
g ( ).
[09] , .*
n long.
15 , .* 4.
, [09].
, , . .*
,
. .* 2,
[09]. 4
, about24char. $1
24.
^.*([09]+).
, ,
. ,
Copyright 2003.?
.
, . .* ...
[09] ... .
,
. ,
, ,
. , .
. ,
, .
: ,
,
to(nite|knight|night)
tonight.
, t, ,
.
. ,
;
, .
to(nite|knight|night)
t. ,
t. , o
,
.
(nite|knight|night),
nite, knight, night.
, .
, ,
, tonight
. ( 116),
, ,
.
, nite,
: n,
i, t e. (
), . .
,
( ).
,
, .
199
. 197.
^.*([09]+) Copyright 2003.?
, ,
. .*
,
[09]+.
3,
[09].
+,
;
, .
, .* 0
. [09]+
, : ,
.
: ,
, .
$1 3.
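Both greedy examples can be verified with a short Python sketch. The expressions are the ones from the text; only the variable names are new:

```python
import re

# .* first takes everything, then gives back just enough for two digits.
m1 = re.search(r"^.*([0-9][0-9])", "about 24 characters long")
print(m1.group(1))

# With [0-9]+ the leading .* gives back only ONE character, because a single
# digit already satisfies the plus. The group does not get "2003".
m2 = re.search(r"^.*([0-9]+)", "Copyright 2003.")
print(m2.group(1))
```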
, : [09]+
[09]*,
.*. + * ^.*([0
9]+) ^.*(.*),
^Subject:(.*).* . 196,
.* .
. ,
.
(,
).
,
( )
. 5 6 ,
,
.
, .
: ,
,
.
tonight
, t:
t onight
: t o(nite|knight|night)
.
:
toni ght
( , knight,
). , g,
. h t
, ,
.
,
.
( ) .
. ,
. , to()?
,
. (to)
,
.
, ,
, ( ) ,
.
,
, ,
. ,
,
(
).
201
.
( ).
, ,
( ),
.
,
,
. , (
. , ,
(
, ), ,
,
. .
,
, : (
()
(). , ,
.
.1
,
,
. ,
. , tonight
:
to(ni(ght|te)|knight)
tonite|toknight|tonight
to(k?night|nite)
,
.
,
, .
, ...
! ,
,
, .
.
:
, ,
.
,
(
),
. ,
abc [aaa](b|b{1}|b)c, .
:
.
,
.
,
. ,
, .
,
(backtracking).
:
,
,
.
,
, (
) (
, ).
,
.
,
, ,
.
(
, ).
, .
.
203
,
, .
,
. ,
, .
,
( ) .
.
x?, , x
. x+ +
,
. x
, ,
x.
, , ...
... . . , ,
, ,
(
, ).
to(nite|knight|night)
hottonictonight! (, , ,
). t .
h,
.
, .
. t ,
o ,
.
, tonic, .
to ,
.
(
) ,
. , nite.
n + i + t ...,
toni c, .
,
.
(, knight), .
, night.
. ,
, tonic,
.
,
tonight!. night
( ,
).
,
.
,
? ,
?
:
,
(,
?, * ), (
.
. ,
. ,
. :
.
LIFO ( ).
, ,
.
, . LIFO
:
,
.
(saved states).
,
.
, ,
.
,
, . ,
, .
205
ab?c abc.
c :
a bc a b?c
, b?,
b ? ?
. ,
,
:
a bc ab? c
, ,
b?,
, b (. . ). ,
, b , ?.
,
b. ,
:
ab c ab? c
, c, . ,
.
,
.
ac,
b. ,
. , ,
?, .
, ,
.
.
a c
ab? c
b. c c ,
.
bX.
b ? :
a bX ab? c
b ,
, c X.
. c
b, . ,
.
,
,
.
? .
.
.
:
a bX ab?c
, , ,
. ( ab X abX )
,
.
,
. ,
ab??c abc. a
:
a bc a b??c
b??
: b
? ??,
,
.
, .
:
a bc a bc
,
b , b
( , ,
; ,
). ,
,
:
207
a bc ab?? c
c b,
:
a bc a bc
,
c c.
, ab?c,
.
,
, , .
,
. ,
?
??. ,
* +.
*, +
x*
x?x?x?x?x?x? ( , (((x(x?)?)?)?)?)1,
, .
,
( ),
*.
, *
.
, [09]+ 1234num [09]
4.
, ,
+:
1 234
12 34
123 4
1234
num
num
num
num
,
[09]
. [09] ,
(
) a1234 num [09]+
1
,
; .
.
. , ,
.
: a 1234num
, +
. ,
[09]*? ( !)
, .
^.*([09][09])
. 197. ,
, , .
.
95472,USA.
.*
13 .
, * 13
, ( ) .
^.* ([09][09]) ,
.
[09]. , .
: (
). ,
, , . . ,
.* A.
A [09] .
.
,
2.
[09] .
, . ,
[09]
[09].
7, [09]
.
( 2). , :
95472,USA, $1 72.
. ,
,
.
209
. 208.
, 1234num
[09]* 1234
num?
, . ,
. : , *,
.
, .
, ,
.
1234num,
.
,
. [09]*
,
:
a 1234
[09]*
1
:
a 1234
[09]*
^.* ([09][09]) ,
, ^.* [09][09],
, ...
[09].
, $1, .
, , *
( ),
, (
. .* ,
,
.
, , ^.*([09]+) [09]+
( 197).
( )
, (
,
).
, ,
. ,
. ,
. ,
. ,
,
, .
.
, ,
( 6).
,
.
,
,
. ,
, .
, .*
.1 , .*
, . ,
,
.
.
, .
".*",
, .*, ,
:
The name "McDonald's" is said "makudonarudo" in Japanese
,
, .
"
.*,
. (
), ,
.
, , :
1
,
,
,
.
211
, . ,
.*,
,
.
"McDonald's"? ,
, , ,
, . .*
[^"]*, .
"[^"]*"
. [^"]*
.
, McDonald's.
, [^"]
,
. ,
:
The name "McDonald's" is said "makudonarudo" in Japanese
,
, [^"]
, a . .
,
[^"\n].
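The difference between ".*" and "[^"]*" on the sample sentence can be seen in a minimal Python sketch (variable names are illustrative):

```python
import re

s = 'The name "McDonald\'s" is said "makudonarudo" in Japanese'

# Greedy dot runs to the end of the line and backs off only to the LAST quote.
greedy = re.search(r'".*"', s).group()
# A negated class cannot cross a quote, so it stops at the first closing quote.
negated = re.search(r'"[^"]*"', s).group()

print(greedy)
print(negated)
```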
HTML. ,
<B>very</B> very
, .
<B></B> ,
. ,
<B> </B>. ,
.*:
<B>Billions</B> and <B>Zillions</B> of suns
<B>.*</B>,
.*
, </B>.
, <B>, </>.
,
, ,
, . ,
<B>[^</B>]*</B> .
, </>.
[^</B>] .
, , <, >,
/ B. [^/<>].
, , </> . ( ,
</B> , ,
;
.)
,
.
( 184),
*? *.
, <B>.*?</B> :
<B>Billions</B> and <B>Zillions</B> of suns
<B>
, ,
,
<:
<B> Billions
<B>.*? </B>
< ,
.*?,
( , ).
, .
B <B>Billions *?
, .
,
. < ,
.*? .
.*? Billions,
< (
</B>):
<B>Billions</B> and <B>Zillions</B> of suns
, *
, . ,
,
,
( ).
,
. ,
.
<B>.*?</B> :
<>illions and <B>Zillions</B> of suns
213
. ,
,
. .*?
<B> Zillions </B>.
,
. ".*" [^"]
.
( 175),
. (?!<B>)
, <B> .
<B>.*?</B>,
a ((?!<B>).) ,
, , .
, ,
( 146)
:
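<B>              # Match the opening <B>
(                # Now, only as many of the following as needed . . .
   (?! <B> )     #    if not <B> . . .
   .             #      . . . any character is okay
)*?              #
</B>             # . . . until the closing delimiter can match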
,
,
:
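<B>              # Match the opening <B>
(                # Now, as many of the following as possible . . .
   (?! </?B> )   #    if not <B>, and not </B> . . .
   .             #      . . . any character is okay
)*               #  (now greedy)
</B>             # . . . until the closing delimiter can match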
</B> <B>. ,
,
.
;
6 ( 329).
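The behavior of the greedy, lazy, and lookahead-guarded versions can be compared with the following Python sketch. The sample strings are the ones discussed above, with the second one missing its first closing tag; the variable names are assumptions of the example:

```python
import re

good = "<B>Billions</B> and <B>Zillions</B> of suns"
bad = "<B>Billions and <B>Zillions</B> of suns"   # first <B> is never closed

greedy = re.search(r"<B>.*</B>", good).group()
lazy = re.search(r"<B>.*?</B>", good).group()

# Laziness alone does not stop the match from running across a second <B>.
lazy_on_bad = re.search(r"<B>.*?</B>", bad).group()
# The negative lookahead refuses to step over any <B> or </B>.
guarded = re.search(r"<B>(?:(?!</?B>).)*</B>", bad).group()

print(greedy, lazy, lazy_on_bad, guarded, sep="\n")
```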
( 79).
, :
1.625 3.00
1.625000000002828 3,00000000028822.
:
$price =~ s/(\.\d\d[1-9]?)\d*/$1/;
, ,
, $ric. \.\d\d
, a [19]?
, .
:
,
,
$1. $1
. ,
. $1
.
, .
, . . \d*
.
, , . , ,
$price .
27.625 (\.\d\d[19]?)
. \d*
, .625 .625 ,
.
, ,
,
(. . \d* ,
)? ,
! \d* \d+:
$price =~ s/(\.\d\d[1-9]?)\d+/$1/
1.625000000002828
, , 9.43 \d+
, .
, ? ! ,
(, 27.625)?
, .
27.625. ,
5.
. ,
(\.\d\d[19]?)\d+ 27.625, ,
\d+ .
215
, [19] 5 (
,
. [19]? ,
5 \d+.
, , : .625
.62, .
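The pitfall is easy to reproduce. This Python sketch applies both substitutions (the \d* original and the \d+ "optimization") to a few prices; the function names are invented for the example:

```python
import re

def trim_star(price):
    """Original version: trailing \\d* may match nothing."""
    return re.sub(r"(\.\d\d[1-9]?)\d*", r"\1", price)

def trim_plus(price):
    """'Optimized' version: \\d+ must match at least one digit."""
    return re.sub(r"(\.\d\d[1-9]?)\d+", r"\1", price)

for p in ("1.625000000002828", "3.00000000028822", "9.43", "27.625"):
    print(p, trim_star(p), trim_plus(p))
```

With \d+, the optional [1-9]? is forced by backtracking to give up the 5 so that \d+ has something to match, and 27.625 silently becomes 27.62.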
[19]?
? ,
5 ,
[19]??
. ,
.
,
, :
,
.
(
),
. , (
,
, .
(, !),
.
, ,
,
,
.
,
,
( 6).
, ,
, .
, ".*"
:
The name "McDonald's" is said "makudonarudo" in Japanese
, ".*"
, ".*?"
.
.625
.
. ,
,
,
,
. , .
, .625 ,
.
(\.\d\d[19]?) \d+,
. , , [19]
,
. ,
, ,
[19] (
, ,
,
[19]).
( ,
?
[19])? , ,
[19] .
!
, ,
.5000? [19]
, [19],
\d+ .
, ,
,
[19] ,
[19] . [19] ,
[19],
. (
,
(?>) ( 180)
[19]?+ ( 184). .
(?>)
(?>) , ,
(. .
), ,
, . ,
217
, ,
. ,
,
,
, (
).
\.\d\d(?>[19]?))\d+.
, , [19]
,
, ?.
\d+.
,
.
[19] ,
,
, ?, .
, ,
. .625,
, , .625000.
, \d+ .625000
. .625,
\d+ ,
[19] .
, ,
.625 , .
,
(. 215) :
,
, .
, ,
.
,
.
:
. ,
,
.
.625000. ,
.
. ,
,
. .625.
.
,
.
.
.
.
: (?>.*?)? ,
? , .
.
:
,
, . ,
, ,
,
.
.
^\w+: Subject.
, ,
,
.
, : \w+
.
,
\w+ ( ,
).
: ,
:
Subjec t
^\w+ :
:
( t).
,
:
S ubject
^\w+ :
.
, .
219
,
, +!
, , \w+,
,
: ^(?>\w+):.
;
\w+
( ).
, ,
(
307).
6 ( 329),
. ,
.
,
(307).
. 218.
(?>.*?)?
!
! *?
*,
,
.
,
,
.
, ,
.
,
,
; M++ (?>M)+.
, +
: (?>(\\"|[^"])*).
, (
2, 88)
.
:
. ,
/ ,
( )
( ).
?
.
. ,
?
,
.
,
, ,
.
?
,
,
221
. , ,
,
, , .
, ,
.
( ),
.
,
.
, (,
), .
? ,
.
, ,
,
:
(
, , Tcl),
. ?>
?
,
.
,
? ,
, ?
, , .
, ,
.
( )?
, Perl,
Java, .NET ( 188).
,
.
(Subject|Date):.
, ,
Subject. ,
, :. ,
,
( Date),
. , (
.
,
( ).
, tour|to|tournament,
threetournamentswon
? ()
, three to
urnamentswon. , tour.
,
tour
. .
, ,
, , .
,
, , ,
, .
.
POSIX ,
,
( tournament). Perl, PHP,
.NET, java.util.regex
223
( 188)
.
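Ordered alternation is what Python's engine implements, so the tour|to|tournament example can be checked directly. A minimal sketch (variable names are illustrative):

```python
import re

text = "three tournaments won"

# The first alternative that leads to an overall match is used,
# even though a later alternative could have matched more text.
ordered = re.search(r"tour|to|tournament", text).group()
# Listing the longest alternative first changes the result.
longest_first = re.search(r"tournament|tour|to", text).group()

print(ordered, longest_first)
```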
(\.\d\d[19]?)\d* . 214. ,
\.\d\d[19]? \.\d\d, \.\d\d[19],
(\.\d\d|\.\d\d[19])\d* (
).
?
, .
.
, \.\d\d. ,
\d*, .
, \d* , ,
,
( , ,
,
\d* ). ,
, ,
.
.
: (\.\d\d[19]
|\.\d\d)\d*, ,
,
(\.\d\d[19]?)\d*.
[19], .
,
, .
[19]?
,
?.
,
.
,
. , ,
a*((ab)*|b*)
, .
, (ab)*,
, ( b*)
. :
*((ab)*|b*|.*|partridgeinapeartree|[z])
. :
,
.
,
,
. ,
Jan 31. ,
Jan[0123][09], Jan00
Jan39, Jan7.
,
. 1 9
0?[19], . [12][09]
10 29, a 3[01]
. ,
Jan(0?[19]|[12][09]|3[01]).
, Jan 31 is my
dad's birthday? , , Jan 31,
Jan 3. ?
, 0?[19],
0? , ,
, .
, .
1
01 02 03 04 05 06 07 08 09
10 11 12 13 14 15 16 17 18 19
1
20 21 22 23 24 25 26 27 28 29
30 31
01 02 03 04 05 06 07 08 09
10 11 12 13 14 15 16 17 18 19
31|[123]0|[012]?[19]
20 21 22 23 24 25 26 27 28 29
01 02
01 02 03 04 05 06 07 08 09
30 31
[12][09]|3[01]|0?[19]
10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29
30 31
0[19]|[12][09]?|3[01]?|[49]
, POSIX
225
[19] 3.
, .
,
,
. : Jan([12][09]
|3[01]|0?[19]).
: Jan(31|[123]0|
[012]?[19]). ,
. , ,
Jan(0[19]|[12][09]?|3[01]?|[49]),
.
(
,
).
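The effect of reordering the alternatives can be confirmed with this Python sketch, which runs three of the day-of-month expressions against the sample sentence (the list of names is an assumption of the example):

```python
import re

s = "Jan 31 is my dad's birthday"

# Short alternative first: 0?[1-9] is satisfied by the lone "3".
short_first = re.search(r"Jan (0?[1-9]|[12][0-9]|3[01])", s).group()
# Two-digit alternatives first: "31" gets a chance before the one-digit form.
long_first = re.search(r"Jan ([12][0-9]|3[01]|0?[1-9])", s).group()
# Another ordering that also works with an ordered-alternation engine.
another = re.search(r"Jan (31|[123]0|[012]?[1-9])", s).group()

print(short_first, long_first, another, sep=" | ")
```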
, POSIX
,
:
, .
.
, ,
, .
. ,
one(self)?(selfsufficient)? one
selfsufficient. one,
(self)?, (selfsufficient)?
sufficient. ,
, . ,
oneselfsufficient
( POSIX ,
).
, oneselfsuf
ficient. , (self)?
,
(selfsufficient)?. ,
,
. ,
.
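A traditional backtracking engine such as Python's stops at the first complete match it finds, which makes the one(self)?(selfsufficient)? example simple to verify:

```python
import re

m = re.search(r"one(self)?(selfsufficient)?", "oneselfsufficient")

# (self)? greedily takes "self"; (selfsufficient)? then cannot match and,
# being optional, is skipped. The engine is done and never looks for a
# longer overall match.
print(m.group(), m.group(1), m.group(2))
```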
,
: .
, .
.
\, .
:
SRC=array.c builtin.c eval.c field. gawkmisc.c i. main. \
missing. msg.c node. re. version.
=
^\w+=.*,
(,
).
, (\\\n.*)*,
^\w+=.*(\\\n.*)*. ,
, \+ .
,
. , .* , \
,
( 197).
,
, .
,
^\w+=.*?(\\\n.*?)*.
,
, , \\
\ . .
, ,
= ,
.
SRC=;
, .
.
( 235).
POSIX ,
POSIX ,
, ,
, .
, POSIX
227
,
. ,
. ,
? POSIX ,
oneselfsufficient
,
.
.
? ,
, .
,
. POSIX HKA.
(self)? ,
one(self)? (selfsufficient)?
one selfsufficient. oneselfsufficient,
, POSIX
oneselfsufficient.
7 ,
POSIX Perl
( 402).
, POSIX
, .
POSIX
. 6
,
.
.
.
?
,
( , ).
,
,
.
.
,
,
.
.
:
,
.
.
,
.
.
, ( )
.
,
. ?
, ,
,
.
.
,
,
.
?
. ,
, .
, , ,
.
,
,
Compilers Principles, Techniques, and Tools (Ad
disonWesley, 1996), (Alfred Aho),
(Ravi Sethi) (Jeffrey Ullman),
. ,
,
Principles of Compiler Design,
.
, POSIX
229
,
, .
( )
.
.
,
,
.
,
.
,
.
.
POSIX .
.
,
.
,
, (
6) ,
. ,
( ).
, .
, POSIX
,
,
, . POSIX
.
,
, .
, ,
^
( 192).
6.
,
( , ),
,
, ,
.
,
.
.
, ,
. ( ,
.) ,
,
.
( , POSIX)
, .
,
.
,
,
.
,
, , .
,
, :
,
.
,
;
1 ( 175);
.
(
),
231
, POSIX
:
?
,
.
,
,
. . 228 ,
, .
, .
, .
GNU grep , .
, ,
.
GNU awk
GNU grep, ,
,
.
, GNU awk
gensub.
Tcl
. (,
,
120).
Tcl
, ,
.
,
POSIX (225),
, 6.
.
,
;
( 184)
( 180).
,
.
( , )
.
,
ed 7 ( 1979 )
350 (,
grep 478 ).
( 8, 1986 ),
, 1900 .
rx POSIX , (Tom Lord)
GNU sed, 9700 .
,
egrep 7 400 ,
POSIX (1992 )
4500 .
, GNU
egrep 2.0 (
8300 ), /
Tcl 9500 .
, .
Pascal. Pascal
,
.
,
.
, ,
, . ,
, . ,
, . , ,
.
, ( ,
,
). ,
,
.
.
( 198),
( 200). . 201.
POSIX ( 226)
:
(
)
233
POSIX (
)
, POSIX (
, )
, ,
,
.
,
. . 4.1 ( 188)
,
( 190) ,
.
, :
,
.
( 191).
?
,
. .
( 225). ,
( 227).
,
.
( 202, 207).
: , , (* ) (
( 195), ()
( 216).
( 222), POSIX .
POSIX .
,
( 6).
. ,
, ,
, .
,
,
.
.
5
,
, .
: , ,
, . ,
,
.
.
, .
(
) .
.
.
,
.
, HTML,
HTML.
, ;
. ,
!
,
.
,
,
.
235
:
, ,
(. . ,
)
.
grep, , ,
. ,
.
:
, ,
.
.
, .
, ^(display|geometry|cemap||quick24|random|
raw)$, ,
.
,
,
, 100
. ,
.
, .
( 226).
, ^\w+=.*(\\\n.*)*
:
SRC=array.c builtin.c eval.c field. gawkmisc.c io. main. \
missing. msg.c node. re. version.
, .* \
(\\\n.*)*,
. : ,
\,
236
5.
, [^\n\\] (
\n ; ,
,
( 157)):
, :
^\w+=[^\n\\]*(\\\n[^\n\\]*)*
, ,
: \ ( ,
).
, . ,
, ,
.
:
,
, .
, :
,
.
( \ \n), \+
. \\.
,
\+ .
, ^\w+=([^\n\\]|\\.)*,
.
^
(147);
.
, .
(330).
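The continuation-line expressions can be compared in Python. The sketch below uses a shortened, made-up version of the SRC= example; re.MULTILINE makes ^ match at the start of each logical line:

```python
import re

makefile = "SRC=array.c builtin.c eval.c \\\n    missing.c msg.c\nOBJ=array.o\n"

# Dot happily consumes the trailing backslash, so the continuation
# sub-expression never gets a chance to match.
naive = re.search(r"^\w+=.*(?:\\\n.*)*", makefile, re.MULTILINE).group()

# Excluding backslash (and newline) from the "normal" characters leaves the
# backslash for the part that expects backslash-newline.
LOGICAL_LINE = re.compile(r"^\w+=[^\n\\]*(?:\\\n[^\n\\]*)*", re.MULTILINE)
lines = LOGICAL_LINE.findall(makefile)

print(repr(naive))
print(lines)
```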
IP
.
IP (Internet protocol), . . ,
(, 1.2.3.4).
001.002.003.004.
IP, [09]
*\.[09]*\.[09]*\.[09]*, ,
and then.....?.
:
(
, , ).
, ,
.
237
, IP,
^$. :
^[09]+\.[09]+\.[09]+\.[09]+$
, , .
\d ,
[09], . ,
\d ,
ASCII (157).
, 1
( 54) ( 224).
, 2, ,
255 . ,
5, , . ,
6. 2[04]
\d|25[05].
,
, , .
\d|\d\d|[01]\d\d|2[04]\d|25[05].
, ,
[01]?\d\d?|2[04]\d|25[05].
,
. : \d\d?
\d?\d
.
.
, ,
.
, ,
0 255.
\d{1,3} .
(
):
^([01]?\d\d?|2[04]\d|25[05])\.([01]?\d\d?|2[04]\d|25[05])\.
([01]?\d\d?|2[04]\d|25[05])\.([01]?\d\d?|2[04]\d|25[05])$
! ?
. 0.0.0.0,
,
. ( 175),
, (?!0+\.0+\.0+\.0+$)
^,
,
.
. , ^\d{1,3}
\.\d{1,3}\.\d{1,3}\.\d{1,3}$,
, $1, $2, $3 $4,
.
^ $.
.
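Building the long expression from a single "number from 0 to 255" piece keeps it readable. The Python sketch below is one way to assemble and test it; the names OCTET, IP and is_ip are assumptions of the example, and re.ASCII is used so that \d means only 0 through 9:

```python
import re

# One number in the range 0-255, with optional leading zeros.
OCTET = r"(?:[01]?\d\d?|2[0-4]\d|25[0-5])"

# Four octets separated by dots; the lookahead rejects the all-zero address.
IP = re.compile(
    r"^(?!0+\.0+\.0+\.0+$)" + r"\.".join([OCTET] * 4) + r"$",
    re.ASCII,
)

def is_ip(text):
    """True if text looks like a dotted-quad address with each part in 0-255."""
    return IP.fullmatch(text) is not None

for candidate in ("1.2.3.4", "001.002.003.004", "256.1.1.1", "0.0.0.0", "1.2.3"):
    print(candidate, is_ip(candidate))
```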
239
ip=72123.3.21.993 (
) i=123.3.21.223.
223, . (
, (
), .
, [01]?\d\d?, ,
$ .
( 224),
.
,
,
(,
POSIX ,
).
,
. ! .
. , .
1.2.3.4.5.6.
,
. ,
(?<![\w.])(?![\w.])
,
[\w.] (
).
(^|)(|$).
( /usr/local/bin/perl UNIX
\Program Files\Yahoo!\Messenger Windows)
. ,
, ,
, Perl, PHP, Java VB.NET.
,
, .
,
/usr/local/bin/gcc g.
.
/ ( \ Windows) .
/, ,
. , .*
, . ^.*/
.* ,
/, .
,
f, .
Unix:
Perl
$f =~ s{^.*/}{};
PHP
$f = preg_replace('{^.*/}', '', $f);
java.util.regex
f = f.replaceFirst("^.*/", "");
VB.NET
f = Regex.Replace(f, "^.*/", "")
( ,
) ,
.
Windows,
\, /.
^.*\\. ,
\
\\,
\
, .
Perl
$f =~ s/^.*\\//;
PHP
$f = preg_replace('/^.*\\\/', '', $f);
java.util.regex
f = f.replaceFirst("^.*\\\\", "");
VB.NET
f = Regex.Replace(f, "^.*\\", "")
. \\\\,
\ Java ( 135).
:
?
/, ,
. .
,
,
. , ,
^ (
241
) ,
/. ,
. .*
, / \.
,
.*, .
, (
!
,
.
( )
.
,
,
(, ).
, ,
.* ,
,
( 301).
, .
:
,
, [^/]*$.
,
. Perl:
$WholePath =~ m{([^/]*)$}; # Check variable $WholePath with regex.
$FileName = $1;            # Note text matched.
: ,
, .
$ ,
. ,
$1,
( /,
).
[^/]
*$ . ,
, ,
. /usr/local/bin/
perl 40
. ,
local/. [^/]*$
l /, $
l, a, c, o l (
). ,
l l/,
l l/ . .
,
, 40
40 !..
,
.
, .
, ,
.
.
: .
; , . ,
^(.*)/(.*)$,
$1 $2.
, , , ,
, .*
$2 , /. .*
,
/.
.* .
, $1 , $2 .
: ,
(.*)/ (.*) , .
,
.
,
[^/]*. ^(.*)/([^/]*)$.
, ,
.
,
.
file.txt, , ,
243
.
:
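if ( $WholePath =~ m!^(.*)/([^/]*)$! ) {
    # Have a match: $1 and $2 are valid
    $LeadingPath = $1;
    $FileName = $2;
} else {
    # No match, so there's no '/' in the filename
    $LeadingPath = ".";      # so "file.txt" looks like "./file.txt" ("." is the current directory)
    $FileName = $WholePath;
}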
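For comparison, here is a Python sketch of the same two techniques: removing the leading path with a substitution, and splitting a path into its directory and file name with a fallback when there is no slash. The function names are assumptions of the example:

```python
import re

def strip_leading_path(path):
    """Remove everything up to and including the last slash."""
    return re.sub(r"^.*/", "", path)

def split_path(whole_path):
    """Return (leading_path, file_name); use "." when there is no slash."""
    m = re.match(r"^(.*)/([^/]*)$", whole_path)
    if m:
        return m.group(1), m.group(2)
    # No match, so there is no "/" in the name at all.
    return ".", whole_path

print(strip_leading_path("/usr/local/bin/gcc"))
print(split_path("/usr/local/bin/perl"))
print(split_path("file.txt"))
```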
(, )
.
, , ,
. ,
, .
.
,
. ,
\bfoo\([^)]*\).
.
foo. ,
. ,
foo(2,4.0) foo(somevar,3.7), ,
. , foo(bar(somevar),3.7) .
[^)]*.
, ,
:
1. \(.*\)
,
2. \([^)]*\)
3. \([^()]*\)
. 5.1 , .
. 5.1.
,
1, .
(this),
foo,
. , .
, ,
(
.
, Perl, .NET PCRE/
PHP ,
(. 394, 515 563 ). ,
,
, .
:
\([^()]*(\([^()]*\)[^()]*)*\)
, ,
.
, .
Perl $depth
, .
Perl ,
:
$regex = '\(' . '(?:[^()]|\(' x $depth . '[^()]*' . '\))*' x $depth . '\)';
.
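The same construction carries over to Python, where string repetition with * plays the role of Perl's x operator. This is a sketch under that assumption; the function name nested_parens is made up for the example:

```python
import re

def nested_parens(depth):
    """Build a regex matching parentheses nested up to `depth` extra levels."""
    return (r"\(" + r"(?:[^()]|\(" * depth
            + r"[^()]*"
            + r"\))*" * depth + r"\)")

call = "x = foo(bar(somevar),3.7);"

flat = re.search(r"\bfoo" + nested_parens(0), call)       # no nesting allowed
one_level = re.search(r"\bfoo" + nested_parens(1), call)  # one nested level

print(nested_parens(1))
print(flat, one_level.group())
```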
1
.* ,
, *
. ,
.* .
245
, ,
. ,
HTML ,
<HR> (
).
s/*/<HR>/, ,
, . ?
: s/*/<HR>/ <HR>
, , !
: , ,
. *
.
, .
*.
.
,
, .
,
,
.
: ?[09]*\.?[09]*.
, ,
1, 272.37, 129238843., .191919 .0. .
, ,
thistexthasnonumber, nothinghere
?
.
, ,
.
,
. ,
, num123,
, !
, .
,
.
,
.
+: ?[09]+.
,
.
\.?[09]*, [09]*
.
, , .
( ,
): (\.[09]*)?.
,
.
: , [09]*
.
, ?[09]+(\.[09]*)?.
.007,
.
,
, ,
(,
).
: ?[09]+(\.[09]*)?|?\.[09]+.
, ,
. , ...
,
? . , ?
,
-?([0-9]+(\.[0-9]*)?|\.[0-9]+).
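The final floating-point expression can be checked against the examples from the text with a short Python sketch (the names NUMBER and is_number are assumptions of the example):

```python
import re

# Optional minus sign, then either digits with an optional fraction,
# or a fraction that starts with the decimal point.
NUMBER = re.compile(r"-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)")

def is_number(text):
    return NUMBER.fullmatch(text) is not None

for sample in ("1", "-272.37", "129238843.", ".191919", "-.0", ".", "-", ""):
    print(repr(sample), is_number(sample))

# Unlike -?[0-9]*\.?[0-9]*, this version requires something to be matched,
# so it finds no "number" in text that has none.
print(NUMBER.findall("this text has no number"))
print(NUMBER.search("num 123").group())
```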
,
2003.04.12. ,
,
, ,
, .
,
, ^$
num\s*=\s*$.
IP ,
,
.
, (, , )
. :
,
/* */.
HTML, <>,
<CODE>.
247
, HTML, su
per exciting a <I>super exciting</I> offer!.
.mailrc.
, :
alias _
:
1. .
2. ( ,
).
3. .
, ,
,
.
2\"3\". ,
.
,
.
, .
, (. .
[^"]), .
, ,
\.
( 175). , "([^"]|(?<=\\)")*",
2\"3\".
,
. , .
,
:
Darth Symbol: "/|\\" or "[^^]"
Darth Symbol: "/|\\" or "[^^]"
\, ,
; ,
.
\, ,
\\ ,
. ,
\, ,
.
.
,
.
(\\.), ,
([^"]); ,
"(\\.|[^"])*". , !.. , .
,
,
( ) :
"You need a 2\"x3\" photo.
? ,
( 213). ,
, ,
,
:
2\"x3 \"
(\\.| [^"])
[^"] \,
.
.
,
, ,
.
,
,
. ,
, .
? . 235,
\, [^"] [^\\"].
, \
249
. "(\\.|[^\\"])*
. ,
;
301.
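The difference between [^"] and [^\\"] in this expression shows up only on malformed input. The Python sketch below tests both variants on a well-formed string and on one that lacks its closing quote; the names LOOSE and STRICT are assumptions of the example:

```python
import re

LOOSE = re.compile(r'"(?:\\.|[^"])*"')     # [^"] can also match a backslash
STRICT = re.compile(r'"(?:\\.|[^\\"])*"')  # backslash must pair with the next char

good = r'"You need a 2\"x3\" photo."'
bad = r'"You need a 2\"x3\" photo.'        # closing quote is missing

print(STRICT.search(good).group())
# Backtracking lets [^"] take the backslash, so an escaped quote
# ends up being treated as the closing quote.
print(LOOSE.search(bad).group())
print(STRICT.search(bad))
```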
.
, ,
(,
).
, ,
( 184)
( 180) "(\\.|[^"])*+
"(?>(\\.|[^"])*) .
, ,
,
. , , ,
.
,
,
. ,
(
, ,
(
).
,
,
.
,
, .
, a, ,
( 140), .
,
.
,
, . ,
,
( 146).
, (?s:.),
.
, ,
, .
, .
,
, ,
.
,
,
.
,
.
, .
,
.
:
s/^\s+//;
s/\s+$//;
* +,
.
,
. ,
,
.
s/\s*(.*?)\s*$/$1/s
,
,
( Perl 5 ). ,
\s*$,
.
s/^\s*((?:.*\S)?)\s*$/$1/s
, ,
.
^\s* , .*
.
\S
HTML
251
, ,
\s*$
.
,
, .
, .
s/^\s+|\s+$//g
.
( ),
,
,
.
/g
, .
/g,
,
. 4 .
,
. ,
,
,
.
s/^\s+//;
s/\s+$//;
,
.
HTML
2
HTML ( 97),
URL.
,
HTML.
HTML
HTML <[^>]+>
, Perl
:
$html =~ s/<[^>]+>//g;
,
>, HTML: <input
<("[^"]*"|'[^']*'|[^'">])*>
,
:
<                # Opening "<"
  (              #   Any amount of . . .
     "[^"]*"     #      double-quoted string,
     |           #      or . . .
     '[^']*'     #      single-quoted string,
     |           #      or . . .
     [^'">]      #      "other stuff"
  )*             #
>                # Closing ">"
. ,
, ,
,
.
,
,
.
,
+ *? , ,
, (,
alt=""). +, *,
[^'">]
()*. (
, ([^'">]+)*) ,
;
( 279).
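A quick Python sketch shows why the quoted-string alternatives matter. The sample HTML is invented for the example; it contains a > inside an attribute value:

```python
import re

html = '<input name=dir value=">"> done <b>bold</b>'

# The simple expression stops at the first ">", even inside a quoted value.
naive = re.sub(r"<[^>]+>", "", html)

# Quoted strings are consumed as units, so a ">" inside them is harmless.
TAG = re.compile(r"""<("[^"]*"|'[^']*'|[^'">])*>""")
careful = TAG.sub("", html)

print(repr(naive))
print(repr(careful))
```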
,
: ,
, ,
253
HTML
( 178).
, , >
,
. ,
, .
, ,
.
(?>) ( *,
).
HTML
, URL .
:
< href="http://www.oreilly.com">O'Reilly Media</a>
<A> ,
. <A>
, ,
URL.
* <a\b([^>]+)>(.*?)</a>,
. <A>
$1, $2. , [^>]+
, .
,
.
<A> ,
. URL href=. HTML
,
/ (
).
,
Perl, $Html:
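# Note: the regex in the while(...) is overly simplistic; see text for discussion
while ($Html =~ m{<a\b([^>]+)>(.*?)</a>}ig)
{
    my $Guts = $1; # Save results from the match above, to their own . . .
    my $Link = $2; # . . . named variables, for clarity below.
    if ($Guts =~ m{
            \b HREF          # "href" attribute
            \s* = \s*        # "=" may have whitespace on either side
            (?:              # Value is . . .
               "([^"]*)"     #    double-quoted string,
               |             #    or . . .
               '([^']*)'     #    single-quoted string,
               |             #    or . . .
               ([^'">\s]+)   #    "other stuff"
            )                #
        }xi)
    {
        my $Url = $+; # Gives the highest-numbered actually-filled $1, $2, etc.
        print "$Url with link text: $Link\n";
    }
}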
.
.
,
, .
,
> ,
=.
+,
href. He
,
+ . 252? ,
,
. ,
.
URL
$1, $2 $3.
. Perl $+,
$1,
$2..., .
URL.
Perl $+ ,
URL.
,
.
( 180),
VB.NET . 257 (, .NET
, $+
, 502).
HTML
255
HTTP URL
URL,
, , (
). URL
, URL ;
.
, URL,
. ,
^http://, / ( ),
: ^http://([^/]+)(/.*)?$.
, URL
,
: ^http://([^/:]+)(:(\d+))?(/.*)?$.
Perl URL:
if ($url =~ m{^http://([^/:]+)(:(\d+))?(/.*)?$}i)
{
    my $host = $1;
    my $port = $3 || 80;  # Use $3 if it exists; otherwise default to 80.
    my $path = $4 || "/"; # Use $4 if it exists; otherwise default to "/".
    print "Host: $host\n";
    print "Port: $port\n";
    print "Path: $path\n";
} else {
    print "Not an HTTP URL\n";
}
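A Python rendering of the same URL split might look like the sketch below. The function name and the example.com URLs are assumptions made for the illustration:

```python
import re

URL = re.compile(r"^http://([^/:]+)(:(\d+))?(/.*)?$", re.IGNORECASE)

def split_url(url):
    """Return (host, port, path) for an HTTP URL, or None if it is not one."""
    m = URL.match(url)
    if not m:
        return None
    host = m.group(1)
    port = int(m.group(3) or 80)   # group 3 is None when no port was given
    path = m.group(4) or "/"       # group 4 is None when no path was given
    return host, port, path

print(split_url("http://www.example.com:8080/a/b.html"))
print(split_url("http://www.example.com"))
print(split_url("ftp://www.example.com/"))
```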
[^/:]+. , 2 ( 106)
[az]+(\.[az]+)*\.(com|edu||info).
, ?
,
.
(, , URL),
. , http://
, [^/:]+
.
2
, .
, ,
. ,
,
.
, ,
ASCII, , .
,
[az09]|[az09][az09]*[az09],
. 2,
(com, edu, uk )
. ,
:
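^
(?i)   # Apply this regex in a case-insensitive manner.
# One or more dot-separated parts . . .
(?: [a-z0-9]\. | [a-z0-9][-a-z0-9]*[a-z0-9]\. )+
# Followed by the final suffix part . . .
(?: com|edu|gov|int|mil|net|org|biz|info|name|museum|coop|aero|[a-z][a-z] )
$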
,
: 63
. , [az09]*
[az09]{0,61}.
,
. , (com, edu
. .), .
,
. ,
, ai, :
http://ai/ .
cc, co, dk, mm, ph, tj, tv, tw.
,
(?:)+ (?:)*.
:
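^
(?i)   # Apply this regex in a case-insensitive manner.
# Zero or more dot-separated parts . . .
(?: [a-z0-9]\. | [a-z0-9][-a-z0-9]{0,61}[a-z0-9]\. )*
# Followed by the final suffix part . . .
(?: com|edu|gov|int|mil|net|org|biz|info|name|museum|coop|aero|[a-z][a-z] )
$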
,
. ,
,
, :
? ,
257
HTML
? , .
,
2
. , ,
.
VB.NET
HTML
, Html:
Imports System.Text.RegularExpressions
   .
   .
   .
' Set up the regular expressions we'll use in the loop
Dim A_Regex as Regex = New Regex( _
    "<a\b(?<guts>[^>]+)>(?<Link>.*?)</a>", _
    RegexOptions.IgnoreCase)
Dim GutsRegex as Regex = New Regex( _
    "\b HREF               (?# 'href' attribute                )" & _
    "\s* = \s*             (?# '=' with optional whitespace    )" & _
    "(?:                   (?# Value is ...                    )" & _
    "  ""(?<url>[^""]*)""  (?#    double-quoted string,        )" & _
    "  |                   (?#    or ...                       )" & _
    "  '(?<url>[^']*)'     (?#    single-quoted string,        )" & _
    "  |                   (?#    or ...                       )" & _
    "  (?<url>[^'"">\s]+)  (?#    'other stuff'                )" & _
    ")                     (?#                                 )", _
    RegexOptions.IgnoreCase OR RegexOptions.IgnorePatternWhitespace)
' Now check the 'Html' variable . . .
Dim CheckA as Match = A_Regex.Match(Html)
' For each match within . . .
While CheckA.Success
    ' We matched an <a> tag, so now check for the URL.
    Dim UrlCheck as Match = _
        GutsRegex.Match(CheckA.Groups("guts").Value)
    If UrlCheck.Success
        ' We've got a match, so have a URL/link pair
        Console.WriteLine("Url " & UrlCheck.Groups("url").Value & _
                          " WITH LINK " & CheckA.Groups("Link").Value)
    End If
    CheckA = CheckA.NextMatch
End While
,
.
VB.NET, ,
Imports,
.
(?#),
VB.NET ,
#
(
#
).
# ,
&chr(10) ( 497).
( 137).
,
Groups("url") Groups(1), Groups(2) . .
URL
Yahoo! Finance,
.
,
HTML, (
http://finance.yahoo.com 10 ,
, ).
,
( )
, ,
, URL, , (
.
, ,
Yahoo! .
URL mailto,
http, https ftp. http://,
, URL,
http://[\w]+(\.\w[\w]*)+.
(
ASCII) [az09] \w.
\w ,
, ,
.
URL http:// mailto::
visit us at www.orei11y.com or mall to orders@orei11y.com
HTML
259
.
, :
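(?i: [a-z0-9] (?:[-a-z0-9]*[a-z0-9])? \. )+ # sub domains
# Now ending .com, etc. For these, we require lowercase
(?-i: com\b
    | edu\b
    | biz\b
    | org\b
    | gov\b
    | in(?:t|fo)\b   # .int or .info
    | mil\b
    | net\b
    | name\b
    | museum\b
    | coop\b
    | aero\b
    | [a-z][a-z]\b   # two-letter country codes
)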
(?i:) (?i:)
( 176). URL
www.OReilly.com, NT.TO (
Nortel Networks )
, ,
. URL (
, .com) ,
URL .
( URL,
), (
. .) . ,
(?i:) ,
, URL
, .
URL ,
:
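\b
# Match the leading part (proto://hostname, or just hostname)
(
    # ftp://, http://, or https:// leading part
    (ftp|https?)://[-\w]+(\.\w[-\w]*)+
  |
    # or, try to find a hostname with our more specific sub-expression
    full-hostname-regex
)
# Now allow an optional port number
( : \d+ )?
# The rest of the URL is optional, and begins with / . . .
(
    /path-part
)?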
,
(. . http://www.oreilly.com/catalog/
regex/). ,
, .
2, , URL ,
URL. :
Read his comments at http://www.oreilly.com/ask_tim/index.html. He...
index.html URL
, index.html URL.
,
,
, .
2 ,
, URL
. Yahoo! Finance
,
, (
).
URL
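\b
# Match the leading part (proto://hostname, or just hostname)
(
    # ftp://, http://, or https:// leading part
    (ftp|https?)://[-\w]+(\.\w[-\w]*)+
  |
    # or, try to find a hostname with our more specific sub-expression
    (?i: [a-z0-9] (?:[-a-z0-9]*[a-z0-9])? \. )+ # sub domains
    # Now ending .com, etc. For these, we require lowercase
    (?-i: com\b
        | edu\b
        | biz\b
        | gov\b
        | in(?:t|fo)\b   # .int or .info
        | mil\b
        | net\b
        | org\b
        | [a-z][a-z]\b   # two-letter country codes
    )
)
# Allow an optional port number
( : \d+ )?
# The rest of the URL is optional, and begins with / . . .
(
    /
    # The rest are heuristics for what seems to work well
    [^.!,?;"'<>()\[\]{}\s\x7F-\xFF]*
    (?:
        [.!,?]+ [^.!,?;"'<>()\[\]{}\s\x7F-\xFF]+
    )*
)?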
Java
String SubDomain  = "(?i:[a-z0-9]|[a-z0-9][-a-z0-9]*[a-z0-9])";
String TopDomains = "(?x-i:com\\b        \n" +
                    "     |edu\\b        \n" +
                    "     |biz\\b        \n" +
                    "     |in(?:t|fo)\\b \n" +
                    "     |mil\\b        \n" +
                    "     |net\\b        \n" +
                    "     |org\\b        \n" +
                    "     |[a-z][a-z]\\b \n" + // country codes
                    ")                   \n";
String Hostname = "(?:" + SubDomain + "\\.)+" + TopDomains;

String NOT_IN   = ";\"'<>()\\[\\]{}\\s\\x7F-\\xFF";
String NOT_END  = "!.,?";
String ANYWHERE = "[^" + NOT_IN + NOT_END + "]";
String EMBEDDED = "[" + NOT_END + "]";
String UrlPath  = "/" + ANYWHERE + "*(" + EMBEDDED + "+" + ANYWHERE + "+)*";
String Url =
    "(?x:                                                \n" +
    "  \\b                                               \n" +
    "  ## match the hostname part                        \n" +
    "  (                                                 \n" +
    "    (?: ftp | http s? ): // [-\\w]+(\\.\\w[-\\w]*)+ \n" +
    "   |                                                \n" +
    "    " + Hostname + "                                \n" +
    "  )                                                 \n" +
    "  # allow optional port                             \n" +
    "  (?: :\\d+ )?                                      \n" +
    "                                                    \n" +
    "  # rest of url is optional, and begins with /      \n" +
    "  (?: " + UrlPath + ")?                             \n" +
    ")";
// Now convert the string we've built up into a real regex object
Pattern UrlRegex = Pattern.compile(Url);
// Now ready to apply to raw text to find urls . . .
   .
   .
   .
,
; 2 (
. 105), . , Java (
) , .
,
. . 106 (
$HostnameRegex) . 261.
.
,
, .
,
,
(
).
,
, 5 .
, , ,
44. ,
:
03824531449411615213441829503544272752010217443235
\d\d\d\d\d
. Perl @zips = m/
\d\d\d\d\d/g;. ,
( ,
, Perl $_
110).
find .
,
, Perl.
\d\d\d\d\d. ,
,
; (
, ;
, ).
263
, \d\d\d\d\d 44\d\d\d ,
44,
,
, 44
. 44\d\d\d
5314494116.
, \A,
,
.
,
.
, ,
.
. ,
((44\d\d\d)).
(?:),
$1:
(?:[^4]\d\d\d\d|\d[^4]\d\d\d)*
:
, , 44 (
, [^4] [12359],
, ,
). ,
(?:[^4][^4]\d\d\d)*,
43210.
(?:(?!44)\d\d\d\d\d)*
,
, 44.
, ,
.
. (
44) (?!44) ,
.
(?:\d\d\d\d\d)*?
,
, (. .
, ,
, ).
(?:\d\d\d\d\d)
. *
,
; ,
.
(44\d\d\d), :
@zips = m/(?:\d\d\d\d\d)*?(44\d\d\d)/g;
44,
(
@array = m//g Perl ,
; 375).
, ,
,
.
,
? !
,
,
. ,
,
,
.
:
03824 53144 94116 15213 44182 95035 44272 7 5 2 0 10217 443235
(
).
, ,
, . 44272
,
. ? , .
,
.
10217 44323.
,
,
.
,
.
( ,
, ) ,
(44\d\d\d) ,
?. ,
(?:(?!44)\d\d\d\d\d)* (?:[^4]\d\d\d\d|\d[^4]\d\d\d)*
( ,
). (44\d\d\d)?
, , .
. ,
, . ,
.
\G
,
\G ( 171).
,
,
. ,
\G ,
( Ruby
, \G ,
171).
,
@zips = m/\G(?:(?!44)\d\d\d\d\d)*(44\d\d\d)/g;
.
, ,
.
, ,
.
\d\d\d\d\d ,
44. Perl :
@zips = ( ); #
while (m/(\d\d\d\d\d)/g) {
$zip = $1;
if (substr($zip, 0, 2) eq "44") {
push @zips, $zip;
}
}
\G
. 172,
Perl.
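The difference between the unanchored attempt and the \G-anchored one can be reproduced on the sample digit string. Python's re module has no \G, so this sketch emulates it by calling match() at the position where the previous match ended; the function names are chosen here for illustration only.

```python
import re

ZIPS = "03824531449411615213441829503544272752010217443235"

def zips_unanchored(text):
    # Lazily skip whole five-digit codes, then capture one starting with 44.
    # Once no real match remains, the engine bumps along one character at a
    # time and can resynchronise in the middle of a code.
    return re.findall(r"(?:\d{5})*?(44\d{3})", text)

def zips_anchored(text):
    # Emulate \G: every attempt must start exactly where the last one ended.
    step = re.compile(r"(?:(?!44)\d{5})*(44\d{3})")
    found, pos = [], 0
    while True:
        m = step.match(text, pos)
        if m is None:
            break
        found.append(m.group(1))
        pos = m.end()
    return found

if __name__ == "__main__":
    print(zips_unanchored(ZIPS))
    print(zips_anchored(ZIPS))
```

The unanchored version reports a spurious 44323 that straddles two codes (10217 and 44323 in the spaced-out sample); the anchored version stops as soon as no code can be matched at the current position.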
,
, ,
(CSV Comma Separated Values), ,
. ,
, CSV,
, .
CSV Microsoft
Excel, .1 Microsoft
CSV.
(. . ,
), (
" "").
:
Ten Thousand,10000, 2710 ,,"10,000","It's ""10 Grand"", baby",10K
:
TenThousand
10000
2710
10,000
It's"10Grand",baby
10
,
.
,
, [^",]+.
,
, .
"", " .
,
[^",]|"" "" "(?:[^"]|"")*" (
, (?:)
(?>),
317).
,
: [^",]+|"(?:[^"]|"")*".
1
CSV Microsoft
6 (330),
.
,
(. 146):
# , ...
[^",]+
# ......
|
# ... ( )
" #
(?: [^"] | "" )*
" #
,
CSV.
, , .
, "" ".
.
,
.
(), ""
". ,
,
. ,
,
, ,
:
# , ...
( [^",]+ )
# ......
|
# ... ( )
" #
( (?: [^"] | "" )* )
" #
, ,
. ,
"" ".
Perl, (
) Java VB.NET
( , 10
PHP 569). ,
$line,
( , !)
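while ($line =~ m{
           # Either some non-quote/non-comma text...
           ( [^",]+ )
           # ...or...
           |
           # ...a double-quoted field (inside, paired double quotes are allowed)
           " # field's opening quote
            ( (?: [^"] | "" )* )
           " # field's closing quote
       }gx)
{
    if (defined $1) {
        $field = $1;
    } else {
        $field = $2;
        $field =~ s/""/"/g;
    }
    print "[$field]"; # print the field, for debugging
    # Can work with $field now...
}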
, , :
[TenThousand][10000][2710][10,000][It's"10Grand",baby][10K]
, , ,
. $field ,
(10,000). ,
.
, , [^",]+ [^",]*
, ?
. :
[TenThousand][][10000][][2710][][][][10,000][][][It's"10Grand",...
! ,
.
( )* ,
. ,
TenThousand ,10000.
( ),
,
. ,
,
,
( 171).
(
, ).
,
.
, .
.
1. .
,
.
2. ,
,
. , .
, .
( ),
,
(
, ).
^|, $|,
, .
:
(?:^|,)
(?:
# ,
# ...
( [^",]* )
# ......
|
# ... ( )
" #
( (?: [^"] | "" )* )
" #
)
,
:
[TenThousand][10000][2710][][][000][][baby][10K]
:
[TenThousand][10000][2710][][10,000][It's"10Grand",baby][10K]
? ,
,
... ? , .
. 224: ... (
(
, (
.
, [^",]* ,
,
.
,
.
:
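(?:^|,)
(?: # Either a double-quoted field (inside, paired double quotes are allowed)...
    " # field's opening quote
     ( (?: [^"] | "" )* )
    " # field's closing quote
  |
    # ...or some non-quote/non-comma text.
    ( [^",]* )
)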
!.. ,
. ? ,
.
,
,
\G ,
,
. ,
,
. \G,
.
,
,
[TenThousand][10000][2710][][][000][][baby][10K]
[TenThousand][10000][2710][][]
,
.
. ,
,
.
^|,, (?<=^|,).
, 3 ( 175),
, .
, (?<=^|,)
(?:^|(?<=,)), ,
,
. ,
, ,
"10, 000".
, .
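The field-by-field approach developed above can be sketched in Python as well. This is an illustration under stated assumptions rather than a port of any listing here: Python's re has no \G, so match() is called at the end of the previous field, and the names FIELD and split_csv_line are made up for the example.

```python
import re

FIELD = re.compile(r'''
    (?:^|,)                     # start of line, or the comma before a field
    (?:
        "((?:[^"]|"")*)"        # group 1: quoted field, "" stands for "
      |
        ([^",]*)                # group 2: unquoted field
    )
''', re.VERBOSE)

def split_csv_line(line):
    """Split one line of Microsoft-style CSV into a list of fields."""
    fields, pos = [], 0
    while True:
        m = FIELD.match(line, pos)   # must match exactly at pos, like \G
        if m is None:
            break
        if m.group(1) is not None:
            fields.append(m.group(1).replace('""', '"'))
        else:
            fields.append(m.group(2))
        pos = m.end()
        if pos >= len(line):
            break
    return fields

if __name__ == "__main__":
    line = 'Ten Thousand,10000, 2710 ,,"10,000","It\'s ""10 Grand"", baby",10K'
    for field in split_csv_line(line):
        print("[" + field + "]")
```

The quoted alternative comes first so that a leading double quote is never consumed as an empty unquoted field, and anchoring every attempt to the previous end keeps the engine from resynchronising inside a quoted field.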
CSV Java
CSV,
Sun java.util.regex.
8 ( 476).
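import java.util.regex.*;
   .
   .
   .
String regex = // Puts a double-quoted field into group(1), an unquoted field into group(2)
    " \\G(?:^|,)                                 \n"+
    " (?:                                        \n"+
    "    # Either a double-quoted field...       \n"+
    "    \" # field's opening quote              \n"+
    "     ( (?: [^\"]++ | \"\" )*+ )             \n"+
    "    \" # field's closing quote              \n"+
    "  | # ... or ...                            \n"+
    "    # some non-quote/non-comma text ...     \n"+
    "    ( [^\",]* )                             \n"+
    " )                                          \n";
// Create a matcher, using the regex above, with dummy text for the time being.
Matcher mMain = Pattern.compile(regex, Pattern.COMMENTS).matcher("");
// Create a matcher for "", with dummy text for the time being
Matcher mQuote = Pattern.compile("\"\"").matcher("");
   .
   .
   .
// Above is the preparation; the code below is executed on a per-line basis
mMain.reset(line); // Use this line of CSV text in the processing below
while (mMain.find())
{
    String field;
    if (mMain.start(2) >= 0)
        field = mMain.group(2); // The field is unquoted, so we can use it as is
    else
        // The field is quoted, so we must replace paired double quotes with one double quote
        field = mQuote.reset(mMain.group(1)).replaceAll("\"");
    // We can now work with field...
    System.out.println("Field [" + field + "]");
}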
, ,
( ).
(?=$|,)
.
? ,
,
,
,
.
,
,
( 180). (?:[^"]
|"")* (?>[^"]+|"")*. VB.NET
.
( 184),
Java Sun,
. Java.
.
, . 330.
CSV
CSV ,
Microsoft, ,
.
.
, ; (
?).
,
.
\ (. .
"" \").
, \
( ).
.
;
\s*, . .
(?:^|,\s*.
( 249) [^"]+|"" [^\\"]+|\\.. ,
s/""/"/g s/\\(.)/$1/g
.
CSV VB.NET
Imports System.Text.RegularExpressions
   .
   .
   .
Dim FieldRegex as Regex = New Regex( _
    "(?:^|,)                                        " & _
    "(?:                                            " & _
    "   (?# Either a double-quoted field ...)       " & _
    "   "" (?# field's opening quote )              " & _
    "    ( (?> [^""]+ | """" )* )                   " & _
    "   "" (?# field's closing quote )              " & _
    " (?# ... or ...)                               " & _
    " |                                             " & _
    "   (?# ... some non-quote/non-comma text ...)  " & _
    "   ( [^"",]* )                                 " & _
    " )", RegexOptions.IgnorePatternWhitespace)
Dim QuotesRegex as Regex = New Regex("""""") 'A string with two double quotes
   .
   .
   .
Dim FieldMatch as Match = FieldRegex.Match(Line)
While FieldMatch.Success
    Dim Field as String
    If FieldMatch.Groups(1).Success Then
        Field = QuotesRegex.Replace(FieldMatch.Groups(1).Value, """")
    Else
        Field = FieldMatch.Groups(2).Value
    End If
    Console.WriteLine("[" & Field & "]")
    ' Can now work with 'Field'...
    FieldMatch = FieldMatch.NextMatch
End While
6
Perl, Java, .NET, Python PHP (
,
. 188). ,
. ,
, .
,
.
.
. ,
, ,
. 4 5,
,
(, ,
,
). , ,
, .
,
,
.
, ,
. ,
, ,
,
, .
, ,
,
. ,
,
.
,
.
,
,
. , marty smarty
: m s (),
m m, a a . . (
). (
,
).
,
, ,
, , , . . ,
; ,
.
: ,
, , .
, , ,
.
, .
, ,
(,
,
).
, .
POSIX
,
( POSIX
). ,
. ,
, .
,
.
, ,
. . 249
"(\\.|[^\\"])*" ,
. ,
\. ,
,
, .
( )
\\., ,
[^\\"].
, , ,
.
, ,
, ,
[^\\"] ,
a \\. . [^\\"] ,
(, , *,
, ).
6.1
.
, .
.
,
:
, POSIX ?
, ?
[Figure 6.1. Effects of alternation order (Traditional NFA): "(\\.|[^"\\])*" and "([^"\\]|\\.)*" applied to "2\"x3\" likeness"]
,
. 279. ,
, .
,
,
?
,
. "(\\.|[^"])*", ( 248)
.
\\
,
.
, ,
. , ,
, .
, [^"] ,
,
:
"You need a 2\"3\" photo."
, ,
, .
. 6.1 , *
,
( )
. ,
.
, , ,
, ,
[^"\\] (
) , [^\\"]+ ()*
.
.
* .
.
,
. . 6.2 , .
"(\\.|[^\\"])* (
. 6.2) ,
[Figure 6.2. Effects of an added plus (Traditional NFA): "(\\.|[^"\\])*", "(\\.|[^"\\]+)*", "([^"\\]|\\.)*" and "([^"\\]+|\\.)*" applied to "2\"x3\" likeness"]
, *.
. 6.2 ,
.
+ ,
, , ,
*. *
,
,
,
(
).
6.1 , ,
*.
,
.
.
6.1.
"([^\\"]|\\.)*"
. . .
"([^\\"]+|\\.)*"
. . .
"makudonarudo"
16
13
17
"2\"x3\" likeness"
22
15
25
"very99 long"
111
108
112
. 277.
?
POSIX .
,
. ,
, ,
,
.
?
.
,
( POSIX
). , , ,
, .
( , ).
"(\\.|[^"])*"
"2\"x3\" likeness"
"makudonarudo"
"([^"]|\\.)*"
POSIX
. .
. .
32
14
22
48
30
28
14
16
40
26
109
111
325
216
86
124
86
124
86
124
, POSIX
,
( ).
( )
,
.
, .
,
. :
, POSIX .
, , , "verylong"
(324 518 553 658 426 726 783 156 020 576 256,
325 ). , .
50 ...
.1
! ?
+, *,
,
.
. .
+, [^\\"] *,
([^\\"])*
. ,
. . ,
.
,
.
, .
([^\\"]+)* ,
+ * ,
. makudonarudo.
12 *,
[^\\"]+ (m a k u d o n a r u d )?
, *, [^\\"]+
(makudonarudo)? , *,
[^\\"]+ 5,
3 4 (makud ona rudo)? 2, 2, 5 3 (ma ku do
nar udo)? ...
, (4096 12
).
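The figure of 4096 for a 12-character run can be reproduced with a small counting sketch. It assumes the figure counts every way the iterations of [^\\"]+ can carve up any prefix of the run, including the empty one; that reading is an assumption made here, and the function names are illustrative.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def splits(n):
    """Ways to cut a run of n characters into consecutive non-empty pieces."""
    if n == 0:
        return 1
    # Choose the length of the first piece, then split whatever is left.
    return sum(splits(n - first) for first in range(1, n + 1))

def states(n):
    """Splits of every prefix (length 0 through n) of an n-character run."""
    return sum(splits(k) for k in range(n + 1))

if __name__ == "__main__":
    for n in (4, 8, 12, 20):
        print(n, states(n))
```

Each extra character doubles the count, which is why a run only a little longer than makudonarudo is already out of reach for an engine that has to try every combination before it can report failure.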
, POSIX
, ,
(
).
, ,
, !2 4096 12
,
1
;
.
: ,
n, 2n+1. 2n+1+2n.
20 . 30
, 40 . ,
.
! . POSIX
. , ,
. POSIX
,
. ,
,
. "No\"match\"here
8192 .
,
. ,
, . ,
.
, ,
:
, , , .
, .
, POSIX .
,
(
306).
, ,
.
,
( ,
). ,
4 ( 190).
,
.
, ,
.
. ,
,
, .
. .
,
.
. ".*"
The name "McDonald's" is said "makudonarudo" in Japanese
. 6.3.
, .
(),
, A.
,
( 192) , ,
.
.* ,
* .
46 , .*, ,
46 ,
.
.*
".* " anese .
,
. ,
[Figure 6.3. Successful match of ".*" (with marked backtracking positions)]
POSIX NFA
283
( ), .
Japanes e. .
, A
B,
B C.
.
,
.
POSIX
POSIX
, ,
,
. ,
,
.
, .
, DEF FGH BCD,
, F
D.
I
.
, ,
( ), POSIX
D.
, .
".*"!,
.
, , ,
.
. 6.4. I
. 6.3. ,
D (
). ,
. 6.4
[Figure 6.4. Failing attempt to match ".*"! (with marked backtracking positions)]
, POSIX :
, POSIX , . .
.
,
I, ,
. , J, Q
V, ,
. , Y
, .
. 6.4,
.
[^"].
, ,
.
"[^"]*"! [^"]*
, .
285
. 6.5 , (
. 6.4). ,
. ,
.
[Figure 6.5. Failing attempt to match "[^"]*"! (with marked backtracking positions)]
,
.
makudonarudo ,
u|v|w|x|y|z [uvwxyz].
1, [uvwxyz]
( 34)
:
The name "McDonald's" is said "makudonarudo" in Japanese
u|v|w|x|y|z
,
204 . ,
,
1
,
,
.
286
6.
, .
, ,
,
.
,
,
.
.
,
.
,
,
.
.
,
.
, ,
.
^(a|b|c|d|e|f|g)+$ ^[a-g]+$.
Perl,
. ( ,
) Perl :
use Time::HiRes 'time'; # So time() gives a high-resolution value.
$StartTime = time();
"abababdedfg" =~ m/^(a|b|c|d|e|f|g)+$/;
$EndTime = time();
printf("Alternation takes %.3f seconds.\n", $EndTime - $StartTime);
$StartTime = time();
"abababdedfg" =~ m/^[a-g]+$/;
$EndTime = time();
printf("Character class takes %.3f seconds.\n", $EndTime - $StartTime);
(
),
.
.
. ,
,
.
.
287
, !
.
,
.
Perl
:
Alternation takes 0.000 seconds.
Character class takes 0.000 seconds.
:
,
. ,
, 10 10 000 000
, .
.
1/100 ,
.
.
,
. (
,
. Perl
;
.
, :
use Time::HiRes 'time'; # So time() gives a high-resolution value.
$TimesToDo = 1000;                    # Simple setup
$TestString = "abababdedfg" x 1000;   # Makes a huge string
$Count = $TimesToDo;
$StartTime = time();
while ($Count-- > 0) {
    $TestString =~ m/^(a|b|c|d|e|f|g)+$/;
}
$EndTime = time();
printf("Alternation takes %.3f seconds.\n", $EndTime - $StartTime);
$Count = $TimesToDo;
$StartTime = time();
while ($Count-- > 0) {
    $TestString =~ m/^[a-g]+$/;
}
$EndTime = time();
printf("Character class takes %.3f seconds.\n", $EndTime - $StartTime);
288
6.
: $TestString $Count
. $TestString
x,
. Perl 5.8
:
Alternation takes 7.276 seconds
Character class takes 0.333 seconds
, 22 .
,
, .
,
:
$TimesToDo = 1000000;
$TestString = "abababdedfg";
1000 , 1000
. ,
, :
Alternation takes 18.167 seconds
Character class takes 5.231 seconds
, . ?
, . . $Count
1000 , .
11 5 .
,
(
,
1000 ).
, ,
.
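The same measurement discipline (do enough work to get a measurable time, and time only the matching itself) can be sketched with Python's standard timeit module. This is an illustrative helper, not one of the benchmark listings in this chapter; the names time_regex and compare are made up here, and no timings are claimed.

```python
import re
import timeit

def time_regex(pattern, text, repeats):
    """Return total seconds for `repeats` searches with a precompiled pattern."""
    compiled = re.compile(pattern)          # compile outside the timed region
    return timeit.timeit(lambda: compiled.search(text), number=repeats)

def compare(text, repeats):
    """Time the alternation and the character class on the same text."""
    return {
        "alternation": time_regex(r"^(a|b|c|d|e|f|g)+$", text, repeats),
        "class": time_regex(r"^[a-g]+$", text, repeats),
    }

if __name__ == "__main__":
    results = compare("abababdedfg" * 1000, 100)
    for name, seconds in results.items():
        print("%s takes %.3f seconds" % (name, seconds))
```

Changing the string length and the repeat count separately shows how much of the total is matching work and how much is per-iteration overhead.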
PHP
PHP,
preg:
$TimesToDo = 1000;
/* Prepare the test string */
$TestString = "";
for ($i = 0; $i < 1000; $i++)
    $TestString .= "abababdedfg";
/* Do the first test */
$start = gettimeofday();
for ($i = 0; $i < $TimesToDo; $i++)
    preg_match('/^(a|b|c|d|e|f|g)+$/', $TestString);
$final = gettimeofday();
$sec = ($final['sec'] + $final['usec']/1000000) -
       ($start['sec'] + $start['usec']/1000000);
printf("Alternation takes %.3f seconds\n", $sec);
/* And now the second test */
$start = gettimeofday();
for ($i = 0; $i < $TimesToDo; $i++)
    preg_match('/^[a-g]+$/', $TestString);
$final = gettimeofday();
$sec = ($final['sec'] + $final['usec']/1000000) -
       ($start['sec'] + $start['usec']/1000000);
printf("Character class takes %.3f seconds\n", $sec);
:
Alternation takes 27.404 seconds
Character class takes 0.288 seconds
PHP
not being safe to rely on the system's timezone settings (
),
:
if (phpversion() >= 5)
date_default_timezone_set("GMT");
Java
Java
. ,
, :
import java.util.regex.*;
public class JavaBenchmark {
    public static void main(String [] args)
    {
        Matcher regex1 = Pattern.compile("^(a|b|c|d|e|f|g)+$").matcher("");
        Matcher regex2 = Pattern.compile("^[a-g]+$").matcher("");
        long timesToDo = 1000;
        StringBuffer temp = new StringBuffer();
        for (int i = 1000; i > 0; i--)
            temp.append("abababdedfg");
        String testString = temp.toString();
        // Time the first one...
        long count = timesToDo;
        long startTime = System.currentTimeMillis();
        while (count-- > 0)
            regex1.reset(testString).find();
        double seconds = (System.currentTimeMillis() - startTime)/1000.0;
        System.out.println("Alternation takes " + seconds + " seconds");
        // Time the second one...
        count = timesToDo;
        startTime = System.currentTimeMillis();
        while (count-- > 0)
            regex2.reset(testString).find();
        seconds = (System.currentTimeMillis() - startTime)/1000.0;
        System.out.println("Character class takes " + seconds + " seconds");
    }
}
:
. ,
.
, (VM)
. JRE Sun
: ,
, ,
.
:
Alternation takes 19.318 seconds
Character class takes 1.685 seconds
:
Alternation takes 12.106 seconds
Character class takes 0.657 seconds
,
,
,
.
JIT (JustInTime, . .
, ),
,
.
Java ,
BLTN (Better Late Than Never ,
), ,
.
BLTN , ,
, .
, (,
291
), ,
(. . BLTN
).
:
// Time the first one, several times over
for (int i = 4; i > 0; i--)
{
    long count = timesToDo;
    long startTime = System.currentTimeMillis();
    while (count-- > 0)
        regex1.reset(testString).find();
    double seconds = (System.currentTimeMillis() - startTime)/1000.0;
    System.out.println("Alternation takes " + seconds + " seconds");
}
(, 10 ), BLTN
,
.
8 25%:
Alternation takes 11.151 seconds
Character class takes 0.483 seconds
Java
.
,
.
VB.NET
VB.NET:
Option Explicit On
Option Strict On
Imports System.Text.RegularExpressions
Module Benchmark
Sub Main()
    Dim Regex1 as Regex = New Regex("^(a|b|c|d|e|f|g)+$")
    Dim Regex2 as Regex = New Regex("^[a-g]+$")
    Dim TimesToDo as Integer = 1000
    Dim TestString as String = ""
    Dim I as Integer
    For I = 1 to 1000
        TestString = TestString & "abababdedfg"
    Next
    Dim StartTime as Double = Timer()
    For I = 1 to TimesToDo
        Regex1.Match(TestString)
    Next
    Dim Seconds as Double = Math.Round(Timer() - StartTime, 3)
    Console.WriteLine("Alternation takes " & Seconds & " seconds")
    StartTime = Timer()
    For I = 1 to TimesToDo
        Regex2.Match(TestString)
    Next
    Seconds = Math.Round(Timer() - StartTime, 3)
    Console.WriteLine("Character class takes " & Seconds & " seconds")
End Sub
End Module
:
Alternation takes 13.311 seconds
Character class takes 1.680 seconds
.NET Framework
, Regex
RegexOptions.Compiled ( 487).
:
Alternation takes 5.499 seconds
Character class takes 1.157 seconds
Compiled ,
( 3 ,
1,5 ). ,
, .
Ruby
Ruby:
TimesToDo=1000
testString=""
for i in 1..1000
    testString += "abababdedfg"
end
Regex1 = Regexp::new("^(a|b|c|d|e|f|g)+$");
Regex2 = Regexp::new("^[a-g]+$");
startTime = Time.new.to_f
for i in 1..TimesToDo
    Regex1.match(testString)
end
print "Alternation takes %.3f seconds\n" % (Time.new.to_f - startTime);
startTime = Time.new.to_f
for i in 1..TimesToDo
    Regex2.match(testString)
end
print "Character class takes %.3f seconds\n" % (Time.new.to_f - startTime);
:
Alternation takes 16.311 seconds
Character class takes 3.479 seconds
Python
Python:
# Python 2 syntax (print statement, fpformat module), as in the original listing
import re
import time
import fpformat
Regex1 = re.compile("^(a|b|c|d|e|f|g)+$")
Regex2 = re.compile("^[a-g]+$")
TimesToDo = 1250;
TestString = ""
for i in range(800):
    TestString += "abababdedfg"
StartTime = time.time()
for i in range(TimesToDo):
    Regex1.search(TestString)
Seconds = time.time() - StartTime
print "Alternation takes " + fpformat.fix(Seconds,3) + " seconds"
StartTime = time.time()
for i in range(TimesToDo):
    Regex2.search(TestString)
Seconds = time.time() - StartTime
print "Character class takes " + fpformat.fix(Seconds,3) + " seconds"
Python
,
(
).
(
) .
:
Alternation takes 10.357 seconds
Character class takes 0.769 seconds
Tcl
Tcl:
set TimesToDo 1000
set TestString ""
for {set i 1000} {$i > 0} {incr i -1} {
    append TestString "abababdedfg"
}
set Count $TimesToDo
set StartTime [clock clicks -milliseconds]
for {} {$Count > 0} {incr Count -1} {
    regexp {^(a|b|c|d|e|f|g)+$} $TestString
}
set EndTime [clock clicks -milliseconds]
set Seconds [expr ($EndTime - $StartTime)/1000.0]
puts [format "Alternation takes %.3f seconds" $Seconds]
set Count $TimesToDo
set StartTime [clock clicks -milliseconds]
for {} {$Count > 0} {incr Count -1} {
    regexp {^[a-g]+$} $TestString
}
set EndTime [clock clicks -milliseconds]
set Seconds [expr ($EndTime - $StartTime)/1000.0]
puts [format "Character class takes %.3f seconds" $Seconds]
:
Alternation takes 0.362 seconds
Character class takes 0.352 seconds
!
, . 188, Tl
/,
.
, , Tl .
. 298.
. :
. ( \d+)
,
,
;
. ,
,
,
. ,
, \A ( ),
,
.
295
,
.
( ,
),
, .
,
.
,
.
, ,
,
. ,
,
,
.
. \b\B ( ,
)
. ,
\b\B
,
.
. .
. ? , ,
, .
\b\B ,
.
, ,
\b\B ,
.
, ,
,
.1
\b\B,
.
,
296
6.
,
, .
, ,
,
.
.
,
. ,
.
:
1. .
.
, .
2. .
.
3. .
,
, 4.
,
:
(, S,
u, b, j, e, ... Subject) .
,
;
( ,
) ,
( );
. ,
, , $1, $2
.
,
, ,
.
297
4. . ,
. , POSIX
,
,
.
,
.
5. . ,
,
( 3).
6. .
,
, ,
.
,
.
,
.
,
, .
2 ( 85).
,
, :
while (...) {
    if ($line =~ m/^\s*$/ ) ...
    if ($line =~ m/^Subject: (.*)/) ...
    if ($line =~ m/^Date: (.*)/) ...
    if ($line =~ m/^Reply-To: (\S+)/) ...
    if ($line =~ m/^From: (\S+) \(([^()]*)\)/) ...
    .
    .
    .
}
.
.
,
.
()
.
298
6.
,
. . 126,
: , (
.
, Perl awk,
.
;
,
.
.
, Tcl
, ,
. ,
, ,
,
, . 4
( 202), (,
this|that th(is|at))
. ,
, .
Tcl
/? Tcl
,
( 120).
.
2000 Usenet:
... Tcl
, .
, ,
;
.
.
Tcl
.
, .
299
(. .
)
. m/^Sub
ject:\Q$DesiredSubject\E\s*$/
,
. ,
,
.
,
, ,
.
( ,
)
, .
,
.
,
,
.
,
.
, .
, ,
,
.
,
.
.
, .
,
(, ,
).
(. . ,
).
GNU Emacs 20 , Tcl 30
, PHP 4000 . .NET
300
6.
15 ,
.
: ,
, ,
, ,
.
.
(New Regex, re.compile Pattern.compi
le .NET, Python java.util.regex).
3 ( . 128)
,
(,
)
. ,
. 289, 291 292.
.
.
/
( )
,
,
,
.
,
.
, ^Subject:(.*) Subject:
. ,
( (r(Moore),
( ,
). ,
.
,
(, Subject: :
t).
301
^Sub
ject:(.*), ,
. , ,
this|that|other th,
, . ,
,
th , ,
t h,
.
.
,
th(is|at) this|that.
// . 302.
^Subject:(.*)
,
. ,
.
,
, :\d{79}: (
81 ).
. 303.
,
,
,
.
/
,
, ^, ,
.
,
/ ,
. ,
, , ^(this|that)
, ^,
302
6.
^this|^that.
^(this|that) , , ^(?:this|that)
.
\A,
\G ( ).
, , ,
.* .+
,
^. (
/ (. ),
.
,
, .*
.+ ,
, . ,
(.+)X\1 X,
^ ,
"12342345".1
/
,
, $
( 169),
. ,
regex(es)?$
2 ,
.
.
//
(
/.
(
),
1
: Perl
10 ,
Perl (Jeff Pinyan)
2002 . , (.+)X\1
, .
, , $
(171).
303
. , this|that|other
, [ot];
,
. ,
.
(
, ,
.
, \b(perl|java)\.regex\.info\b
.regex.info.
,
.regex.info
.
,
.
\b(vb|java)\.regex\.info\b,
, ,
. ,
\b(w+)\.regex\.info\b,
.
,
( . 301),
,
.
, ,
abc ,
a, b, c.
,
.
*, + ,
(,
), ,
.
304
6.
,
.
,
.*
,
. ,
.
, .* (?:.)* ,
.* (?:.)*.
:
Java Sun 10% , Ruby
.NET . Python
50, PCRE/PHP 150 !
Perl , ,
.* (?:.)* (
, ).
, (?:.)*
.*, .
, ,
, ,
.
[.] \..
"(.*?)"
() ,
(").
( (
, ).
,
, .
:
,
, .
,
305
,
.
(, ['"]
['"](.*?)["'], (
, . 303).
. , . 304
,
Java Sun
10% . .NET Framework
2,5
, PCRE 150
. Perl (. .
).
? . 150
PCRE
,
, ,
. ,
,
.
,
, 10
Java 150 PCRE.
(?:.)* 11 , Java,
.* 13 !
Java Ruby
, Ruby
2,5 Java.
, Ruby
10% Python,
Python 20
Ruby.
,
Perl. ,
Perl 10% Py
thon. ,
.
306
6.
, . 279,
(, (.+)*)
.
, .
, ,
,
.
, 10 000 , .*?
, 10 000
,
. ,
,
.
(
). , Python 10 000
. ,
, ,
.
.
,
, , "(.*)", "(.)*", "(.)*?" "([^"
])*?".
,
. . 293.
.
,
, .
. ,
,
. , , ,
, .
,
307
.
,
, . ,
1/100 ,
.
.
;
. ,
, ,
.
(
).
( 184) .
,
(
,
).
4 ( 219), ^\w+:
Subject. \w+
, ,
: , \w+
. ,
^(?>\w+): ^\w++.
.
, , (
, (
,
.
,
,
. ,
.
308
6.
,
. .*
,
.
\d\d\d\d,
\d{4}.
?
,
? .
, \d{4}
, ,
. ? .
, Perl, Python, PHP/PCRE .NET
\d{4} 20% . ,
Ruby Java Sun \d\d\d\d
( ).
, ,
. !
==== ={4}. ,
, ,
====
.
(
/ ( 302), .
Python Java Sun,
==== ={4} 100 !
, Perl, Ruby .NET
====, ={4},
(
\d\d\d\d \d{4}).
:
,
(,
),
.
,
,
.
,
, Tcl. Tcl
309
, .
, .NET Framework ,
.
,
.
,
, .
.
.
.
,
( ,
). , x+ xx*
,
( 302) ( 302).
. ,
,
.
, :
(?=t) this|that
( 302)
, ,
t.
.
,
.
this|that. th,
th, th
,
. ,
th(?:is|at).
th ,
,
. , th(?:is|at)
th,
.
,
.
:
310
6.
,
, ;
,
, ;
, ,
,
(
);
, ,
;
,
;
( )
, .
, ,
.
: (000|999)$ Perl
.
.
( ). !? ,
/ ( 302).
Perl
,
.
,
,
,
.
,
.
,
,
.
. ( 127)
311
. ,
,
.
(, GNU Emacs Tcl)
, ,
, ( 299).
(, Perl)
,
,
. , Perl
( 418).
,
, (?:) ( 72).
, ,
.
,
( 304).
, :
.
, .*,
(.)*.
, ,
!
,
, ^.*[:]. He
,
,
, . ,
[.]
[*] (\.
\*).
( 146).
:
Perl, ,
^[Ff][Rr][Oo][Mm]: ^from:
. Perl
312
6.
,
.
,
.
, .*,
^ \A ( 169).
,
, . .
( 301)
.
, ,
:
,
. ,
,
, ,
,
.
xx* x+
x. {5,7}
{0,2}.
th(?:is|at) (?:this|that) ,
th .
, :
(?:optim|standard)ization. ,
,
.
, ,
( ^, $
\G).
, . ,
, .
313
^ \G
^(?:abc|123) ^abc|^123 ,
/ ( 301)
. ,
. PCRE ( ,
)
,
.
(^abc) ^(abc).
,
,
.
. ( PCRE,
Perl .NET) ,
( Ruby Sun Java)
.
, Python ,
. ,
Tcl ( 298).
$
. abc$|123$ (?:abc|123)$ ,
. Perl,
(
/ ( 302).
(|)$, ($|$).
,
.
, ^.*: ^.*?:;
,
. ,
?
( ),
, .
314
6.
.
, ,
. ,
.
, ,
,
,
( 304).
.
,
, .
,
, (,
^.*?: ^[^:]*:). ?
,
.
Perl, , (
.
.
: ,
, , January,
February, March . . ,
January|February|March|.
, ,
( 303)
. ,
,
.
. Perl,
, ,
HASH(0x80f60ac). ,
.
, , :
\b(?:SCALAR|ARRAY|...|HASH)\(0x[0-9a-fA-F]+\).
,
. ? Perl
,
315
, ( 431),
. , (
( 302),
, (0
. ,
,
. , Perl
,
. , .
,
. ,
,
? , , \(0x(?<=(?:SCALAR||HASH)
\(0x)[09afAF]+\) ( ). \(0x
, (
)
, .
, \(0x,
. ,
(
, /
( 302). ,
, Perl
( 175), .
, Perl
\(0x , :
if ($data =~ m/\(0x/
    and
    $data =~ m/(?:SCALAR|ARRAY|...|HASH)\(0x[0-9a-fA-F]+\)/)
{
    # warn about bogus data...
}
\(0x ,
,
.
( ) ( ).1
. , ,
DBIx::DWIW, CPAN.
MySQL. (Jeremy Zawodny)
Yahoo!.
316
6.
( 302)
,
( 175).
,
.
, Jan|Feb|...|Dec
(?=[JFMASOND])(?:Jan|Feb|...|Dec).
[JFMASOND] .
,
. ,
,
,
(Java, Perl, Python, Ruby, .NET). ,
[JFMASOND]
Jan|Feb||Dec.
PHP/PCRE ,
pcre_study PCRE (ghb
S 567). , Tcl
( 298).
, [JFMASOND],
, , ,
.
,
?
[JFMASOND](?:(?<=J)an|(?<=F)eb||(?<=D)ec)
,
,
, . ,
, (
,
[JFMASOND] .
,
Jan|Feb||Dec
.
. ,
.
, .
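The lookahead guard does not change what is matched, only where the engine bothers to try the alternation. A minimal Python sketch of that equivalence follows; the names PLAIN and GUARDED are made up for the example, and whether the guard is actually faster depends on the engine and is not measured here.

```python
import re

MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Plain alternation: all twelve alternatives are tried at every position.
PLAIN = re.compile(r"(?:" + MONTHS + r")")

# Guarded: a one-character class check rejects most positions up front.
GUARDED = re.compile(r"(?=[JFMASOND])(?:" + MONTHS + r")")

if __name__ == "__main__":
    text = "Due Jan 5, moved from Dec; not Sept-ish JUN"
    print(PLAIN.findall(text))
    print(GUARDED.findall(text))
```

Both patterns report the same matches; any difference shows up only in timing, which is worth benchmarking on the engine in question before adopting the guard.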
317
Tcl!
,
. . 298
,
Tcl ,
.
, !
(?=[JFMASOND])
Tcl 100 .
PHP!
,
PHP, PHP
study ( S).
10 . 567.
( 180)
( 184) ,
. ,
^[^:]+:
, [^:]+,
, ,
. ^(?>[^:]+):
^[^:]++: ,
+ ( ).
,
( . 307,
).
,
, .
, ^.*: , ^(?>.*):
. .*,
, :.
,
:, .
. ,
318
6.
. ,
this|that th(?:is|at).
,
, th.
, ,
.
.
,
( 54, 224, 238, 269).
,
, ,
.
,
( 256)
(?:aero|biz|com|coop|). ,
,
,
,
? ,
(?:com|edu|org|net|)
.
,
. POSIX
,
.
.
: (?:com|edu||[az][az])\b com\b|edu\b|\b|[az][az]\b.
\b, ,
.
, ,
\b
, .
, .
, .
,
, , , .
, $OTHER* . 340.
319
,
,
. ,
, :,
(?:this|that): this:|that:,
,
( 312).
,
.
$ ,
( 313).
(?:com|edu|)$ ,
com$|edu$|$ (
Perl).
,
,
, .
,
.
. ,
, ( 280),
. ,
, .
, , *
, (1\2\)*.
, ,
, "(\\.|[^\\"]+)*", . ,
, !
.
1. , (\\.|[^\\"]+)*
.
,
.
: , ()*,
. , (),
,
( , ).
320
6.
2.
, .
, ,
, .
, .
, , .
, ,
.
, ().
(?:), ,
.
( 180)
( 184).
1:
"(\\.|[^\\"]+)*"
,
. ,
"hi" "[^\\"]+".
,
", [^\\"]
".
"he said \"hi there\" and left"
"hi there"
"[^\\"]+"
"[^\\"]+ \\.[^\\"]+"
"\"ok\"\n"
321
. 6.2.
,
, . :
[^\\"]+,
\\.[^\\"]+.
,
[^\\"]+( \\.[^\\"]+)*. ,
.
, ,
.
.
, .
, [^\\"] .
[^\\"]+( \\.[^\\"]+)*,
, +
( +)*.
, , "[^\\"]+
( \\.[^\\"]+)*". ,
. 6.2. ,
[^\\"]
.
,
,
.
: "[^\\"]*( \\.[^\\"]*)*".
? ,
?
, ,
. , "\"\"\"".
,
,
. ,
, ?
? ?
"[^\\"]*( \\.[^\\"]*)*" .
"[^\\"]*
;
, .
. ( \\.[^\\"]*)*"
()*, . ,
. ,
, "[^\\"]*", ,
.
322
6.
, ( \\.[^\\"]*)* ,
"[^\\"]*( \\.[^\\"]*)*".
[^\\"]* (
"[^\\"]* \\."),
. (
, ),
,
.
,
:
"[^\\"]*( \\.[^\\"]*)*"
,
: "[^\\"]*(\\.[^\\"]*)*".
,
, ,
.
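A quick way to gain confidence in an unrolled expression is to check that it agrees with the straightforward version on a handful of inputs, including one that cannot match. The sketch below does that in Python; the sample strings and the names PLAIN, UNROLLED and first_match are chosen here for illustration.

```python
import re

# Straightforward form: one character (or one escape) per iteration.
PLAIN = re.compile(r'"(?:\\.|[^\\"])*"')

# Unrolled form: normal* ( special normal* )*
UNROLLED = re.compile(r'"[^\\"]*(?:\\.[^\\"]*)*"')

SAMPLES = [
    r'"hi"',
    r'"he said \"hi there\" and left"',
    r'"\"ok\"\n"',
    r'""',
    r'"No \"match\" here',      # no closing quote, so no match at all
]

def first_match(regex, text):
    """Return the text of the first match, or None if there is none."""
    m = regex.search(text)
    return m.group(0) if m else None

if __name__ == "__main__":
    for sample in SAMPLES:
        print(first_match(PLAIN, sample), first_match(UNROLLED, sample))
```

Agreement on samples is not a proof, but a disagreement is an immediate sign that the special and normal parts overlap or that one of them can match nothingness.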
:
,
.
:
* ( *)*
"[^\\"]*( \\.[^\\"]*)*"
, :
1. .
, ,
.
, [^\\"],
\\.,
\,
.
, \\. [^"] ,
"Hello \n", .
, ,
,
, .
m a k u d o n a r u d o ( 280) .
( PO
SIX )
. ,
, .
323
,
,
,
()* .
,
, .
,
; ,
.
2. .
,
(
).
,
( *)*,
(*)*.
, (\\.)*
. "[^\\"]*
( (\\.)* [^\\"]*)*" "Tubby ( )
[^\\"]* Tubby
. ,
.
3. .
, ,
.
,
Pascal {} .
\{[^}]*\},
: (\{[^}]*\}|+)*.
:
\{[^}]*\}
*(
*)*, : (\{[^}]*\})*( + (\{[^}]*\})*)*.
:
{comment}{another}
+, + (
)
+, .
m a k u d o n a r u d o.
324
6.
,
,
, ()*
.
.
, ,
+,
(,
, )
(+)*
. ,
.
, ,
.
: ,
.
( ),
()*.
,
,
:
*( \{[^}]*\}*)*
, Pascal
, ,
.
(,
),
,
.
( (*)*) ,
:
(Re:*)* Re:
(,
Subject:Re:Re:Re:hey).
(*\$[09]+)* (,
).
(.*\n)+ .
: ,
, ,
.
325
,
,
. Re:,
\$, (
) \n.
2:
,
.
, ,
. ,
( \\.|[^\\"]+)*
. ,
, ,
[^\\"]+. \\.
,
.
,
( )
.
, [^\\"]+
,
, . ,
( )
[^\\"]+. , [^\\"]+
,
, .
,
, 1:
"[^\\"]+( \\.[^\\"]+)*"
, , ,
, ,
. , ,
.
, ,
. ,
.
3:
,
,
.
326
6.
( www.yahoo.com),
, ,
, .
( 255),
(
) [az]+.
[az]+
, ,
.
, .
, [az]+(\.[az]+)*.
[az]+( \.[az]+)*,
!
,
.
[^\\"], (
\\., , "",
. "[^\\"]+
( \\.[^\\"]+)*" ,
1.
, ,
;
,
.
,
.
, . 1
,
,
,
. , ,
,
.
(
) :
.
, . .
, ( ,
).
, ,
,
, .
[^\\"]+ [^\\"]* .
327
,
.
. , "[^\\"]*(\\.
[^\\"]*)*" , .
:
. ,
, "([^\\"]|\\.)*"
.
.
. "[^\\"]*(\\.[^\\"]*)*"
,
[^\\"].
.
:
.
( POSIX ).
,
,
,
.
.
( 337). ,
,
.
,
.
"(\\.|[^\\"]+)*"
,
. , ;
, [^\\"]+
( ).
[]+ ( 303),
,
()*
.
, "(\\.|[^\\"]+)*" ,
, .
328
6.
,
( abc foo,
abc abc, , abc, abc abc).
, ,
.
( )
: ( 180)
( 184).
,
: "(\\.|[^\\"]+)*"
"([^\\"]+|\\.)*". ,
, .
,
,
, ,
. , ,
(
, ),
.
"([^\\"]+|\\.)*" .
,
. ?
, []+,
[]+
. ,
.
()*
, []+
, ,
.
,
.
?
.
Java Sun,
,
, .
,
,
, Sun
.
329
"([^\\"]+|\\.)*".
: "(?>[^\\"]
+|\\.)*". ,
(?>|)*
(|)*+
.
(|)*+
. ,
(?>|)* ,
. *
,
.
,
. ,
,
. ,
(|)*+
(?>(|)*).
, , (|)*+ (?>|)*,
,
(
. 220).
,
, , ,
.
4 . 213 :
<>
(
(?! </?> )
.
)*
</>
#
#
#
#
#
#
<>
...
<> </>...
...
( )
...
[^<], (?!</?B>)<,
:
<>
(?> [*<]* )
(?>
# <>
# ""...
# ...
330
6.
(?! </?> ) # <> </>,
<
# ""
[^<]*
# ""
)*
#
</>
# </>
,
.
,
( 235), ^\w+=([^\n\\]|
\\.)*. , :
^ \w+ =
# '='
# ( ) ...
(
(?> [^ \n\\]* )
# ""*
(?> \\. [^\n\\]* )* # ( "" ""* )*
)
,
,
.
CSV
5 CSV.
. 270:
(?:^|,)
(?: # ( )
" # ( )
( (?: [^"] | "" )* )
" # ( )
|
# ...
...
( [^",]* )
)
\G
,
,
. ,
, , .
, CSV Microsoft, (?:[^"]
|"")* . ,
: [^"] "". ,
Perl
:
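while ($line =~ m{
           \G(?:^|,)
           (?:
              # Either a double-quoted field (with "" for each ")...
              " # field's opening quote
               ( (?> [^"]* ) (?> "" [^"]* )* )
              " # field's closing quote
            # ...or...
            |
              # ...some non-quote/non-comma text...
              ( [^",]* )
           )
       }gx)
{
    if (defined $2) {
        $field = $2;
    } else {
        $field = $1;
        $field =~ s/""/"/g;
    }
    print "[$field]"; # print the field, for debugging
    # Can work with $field now...
}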
, ,
.
. /*,
*/ (
). ,
,
,
. ,
,
.
...
, ,
90 .
,
,
.
Perl ,
: /\*.*?\*/ .
,
, ,
332
6.
. ,
, Perl
( ,
, 50% 3,6 ).
Perl
,
50% 5,5 .
, Perl
/\*.*?\*/.
,
?
,
. ,
:
60 !
,
, .
,
, , ,
, .
, ,
*/ .
/\*[^*]*\*/ ,
/** some comment here **/,
*. .
, , /\*[^*]*\*/
,
, *, .
\ .
, ,
//, /**/.
/\*[^*]*\*/ /x[^x]*x/.
,
.
5 ( 246) ,
:
1. .
2. ( ,
).
333
3. .
, /xx/ .
, ,
.
,
,
, .
,
,
(?:(?!x/).)*. ,
(, x/)*.
,
/x(?:(?!x/).)*x/. ,
( ,
, ).
,
, ,
. ,
, /x.*?x/
.
.
/?
. x
. ,
x, x ,
/. , ,
:
, x: [^x]
x, /: x[^/]
([^x]|x[^/])*,
/x([^x]|x[^/])*x/.
, .
, /
, , x.
, ,
:
, /: [^/]
/, x: [^x]/
([^/]|[^x]/)*,
/x([^/]|[^x]/)*x/.
, .
/x([^x]|x[^/])*x/. /fooxx/
foo x x[^/],
334
6.
. xx/, x
.
x/ ( ,
).
/x([^/]|[^x]/)*x/, /x/
foo// ( ,
).
, / (
). , ,
, , /x([^/]|[^x]/)*x/
years = days /x divide //365; / assume nonleap year x/
(
).
.
, x[^/] xx/
, /x([^x]|x+[^/])*x/.
, + x+[^/]
x, , /.
, , /
x. x+
x, ,
, .
, :
/ / foo() / /
, ,
: .
x, ,
/, , , /,
x, : x+[^/x].
, xxx/
x , .
x,
, xxx/
. ,
,
x, +: x+/.
: /x([^x]|x+[^/x])*x+/.
! , ?
, x,
: /\*([^*]|\*+[^/*])*\*+/. ,
.
335
. 332,
, ,
:
x, /: x[^/]
/, x: [^x]/
. , ?
, ,
regex. x,
,
x[^/]. ,
,
, , regex.
.
,
x, /
x(?!/). ,
x([^/]|$).
, x,
.
, /, x,
(?<!x)/.
(^|[^x])/.
,
.
,
. . 6.3 ,
.
, *
. ,
( ) .
.
, ,
.
336
6.
6.3.
*( *)*
[^x]*x+
x
, [^/x]
( x)
/x
,
:
.
(
. 326): /x[^x]*x+,
()*. ,
, , x
, .
, .
(, x), ,
x. ,
.
/x[^x]*x+([^/x][^x]*x+)*
. , ,
/**/, /xx/.
x \* (
x *):
/\*[^*]*\*+([^/*][^*]*\*+)*/
,
.
, .
( egrep)
. , , ,
, ,
.
.
,
337
. ,
:
const char *cstart = "/*", *cend = "*/";
, ,
,
. , Perl
:
$prog =~ s{/\*[^*]*\*+(?:[^/*][^*]*\*+)*/}{}g; # C
# ( !)
$prog,
, (. . ).
,
, :
char *CommentStart = "/*"; /* start of comment */
char *CommentEnd = "*/"; /* end of comment */
,
, , .
.
( ),
.
, ,
.
,
, ...
.
:
$COMMENT = qr{/\*[^*]*\*+(?:[^/*][^*]*\*+)*/}; # Regex to match a comment
$DOUBLE = qr{"(?:\\.|[^\\"])*"};               # Regex to match a double-quoted string
$text =~ s/$DOUBLE|$COMMENT//g;
. ,
$DOUBLE|$COMMENT ,
Perl qr//.
338
6.
3 ( 135), ,
, .
Perl qr//,
,
.
,
. 2 ( 127) ,
.
m// s///,
( 101); .
, $DOUBLE
.
, $DOUBLE,
.
. ,
:
,
; ...
,
; ...
.
.
. ,
, .
, :
$COMMENT = qr{/\*[^*]*\*+(?:[^/*][^*]*\*+)*/}; # Regex to match a comment
$DOUBLE = qr{"(?:\\.|[^\\"])*"};               # Regex to match a double-quoted string
$text =~ s/($DOUBLE)|$COMMENT/$1/g;
$1 ,
.
, $1 .
$1. ,
,
(
,
). ,
339
$1 ,
, , .1
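The effect of matching strings alongside comments can be seen on a one-line C fragment. This Python sketch uses the same two expressions; the replacement is written as a function so that a group that did not participate yields an empty string. The function names and the sample source line are assumptions made for the example.

```python
import re

COMMENT = r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/"
DOUBLE = r'"(?:\\.|[^\\"])*"'

def strip_naive(source):
    """Remove anything that looks like a comment, even inside strings."""
    return re.sub(COMMENT, "", source)

def strip_string_aware(source):
    """Remove comments, but match strings first and put them back."""
    pattern = re.compile("(" + DOUBLE + ")|" + COMMENT)
    # group(1) is None when the comment alternative matched.
    return pattern.sub(lambda m: m.group(1) or "", source)

SOURCE = 'char *s = "/*"; /* start */ char *e = "*/"; /* end */'

if __name__ == "__main__":
    print(strip_naive(SOURCE))
    print(strip_string_aware(SOURCE))
```

The naive version starts a comment inside the first string and swallows code up to the next */, while the string-aware version consumes each string whole, so comment delimiters inside it are never considered.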
, ,
('\t' . .).
.
C++/Java/C# ( //),
//[^\n]*, :
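$COMMENT = qr{/\*[^*]*\*+(?:[^/*][^*]*\*+)*/}; # Regex to match a comment
$COMMENT2 = qr{//[^\n]*};                      # Regex to match a C++ // comment
$DOUBLE = qr{"(?:\\.|[^\\"])*"};               # Regex to match a double-quoted string
$SINGLE = qr{'(?:\\.|[^'\\])*'};               # Regex to match a single-quoted string
$text =~ s/($DOUBLE|$SINGLE)|$COMMENT|$COMMENT2/$1/g;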
:
(
) . :
16 ,
500 000 , Perl
16.4 . ? , .
=
,
.
.
,
.
,
. ,
, .
, ,
,
.
, ,
1
$1 Perl ,
undef.
undef ,
, . Perl
, ,
undef .
, no warnings;
Perl: $text =~ s/($DOUBLE)|$COMMENT/defined($1) ? $1 : ""/ge;
340
6.
. ,
,
[^'"/] .
, ,
[^'"/]+.
, ,
. , ()*
,
( ,
).
:
$OTHER = qr{[^"'/]}; # ,
#
.
.
.
$text =~ s/($DOUBLE|$SINGLE|$OTHER+)|$COMMENT|$COMMENT2/$1/g;
, , +
$OTHER ( , $OTHER).
.
75%!
.
, (,
/ 3.14).
.
.
, $OTHER+,
.
POSIX ,
,
.
, ,
, ,
, ?
, ,
$OTHER,
.
$OTHER* , ,
$OTHER
/g.
.
,
,
,
341
.
,
.
, $OTHER,
,
, ,
$OTHER (,
) .
, , , $OTHER
, ,
. $OTHER
*, !
:
($OTHER+|$DOUBLE$OTHER*|$SINGLE$OTHER*)|$COMMENT|$COMMENT2
, regsub,
5%.
. $OTHER*
, $OTHER+ (
)
:
1. s///g,
.
2. .
:
2, $OTHER* ?
, ,
, (. . )
().
, $OTHER+
, ? ,
,
, .
.
,
.
,
.
! ,
, ,
342
6.
,
. SINGLE DOUBLE :
$DOUBLE = qr{"[^\\"]*(?:\\.[^\\"]*)*"};
$SINGLE = qr{'[^'\\]*(?:\\.[^'\\]*)*'};
15% .
16,4 2,3
!
,
.
( $DOUBLE) ,
. ,
(, ),
.
Perl qr//,
,
.
, ,
.
3 . 135.
, .
:
([^"'/]+|"[^\\"]*(?:\\.[^\\"]*)*"[^"'/]*|'[^'\\]*
(?:\\.[^'\\]*)*'[^"'/]*)|/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//[^\n]*
: !
, ,
. GNU Emacs
,
dont, Im, well . ., .
,
\<\w+ Emacs '([tdm]|re|ll|ve).
, , \<\w+
\w.
\w, \w+ ;
,
(
). \w 10 .
, ,
. ,
.
7
Perl
Perl , .
,
, ,
,
Windows, UNIX Mac.
Perl
,
. Perl Perl
. Perl
,
. , , !
Perl!
.
100 000 , :
% perl -p -i -e 's{([-+]?\d+(\.\d*)?)F\b}{sprintf "%.0fC",($1-32)*5/9}eg' *.txt
*.txt
(
2).
Perl1,
,
. , ,
, , , ,
1
Perl 5.8.8.
344
7. Perl
Perl ( 2, ,
, ).
,
, ,
. ,
Perl
Programming Perl, OReilly.1
Perl . ,
.
. Perl
, ,
Perl, ,
,
. ,
.
. ,
.
, ,
.
,
:
Perl ( 347)
,
Perl,
, .
rl ( 355)
Perl,
.
,
,
.
.
Perl :
qr// ( 366)
( 370)
( 383)
( 386)
Perl ( 392)
Perl,
., ., . Perl, 3
. . . .: , 2002.
345
Perl
.
Perl ( 416) ,
. Perl
,
, 6. ,
, Perl,
,
Perl; .
Perl
Perl :
2 Perl
.
3 Perl ( 120),
,
Perl, , ( 140),
( 145) ( 149).
4 ,
Perl,
Perl.
5 ,
4. Perl,
Perl.
6 Perl,
.
,
Perl,
,
.
Perl.
Perl
, .
Perl
,
, Perl.
Perl
,
, Perl
, . 7.1.
346
7. Perl
Perl ,
.
7.1. Perl
m// (370)
s/// (381)
qr// (366)
split() (386)
quotemeta (351)
reset (372)
study (429)
/x /o
(354, 417)
/s /m /i
(354)
/g /c /e
(362)
$1, $2 . .
$^N $+
/ $1, $2...
@ @+
$` $& $'
,
( .
Perl, 426)
$_
(372)
$^R
(365)
Perl
, , Perl.
,
,
Perl
347
, .
Programming Perl , Perl
, .... , m/
/
, ,
.
Perl
Perl
. ,
,
,
, .1 ,
, , .
Programming Perl ,
, : ...
. Perl
,
.
Perl
. 7.2
Perl. Perl ,
,
Perl.
3, Perl
, ( . 7.2
).
7.2. Perl
151 (C)
\ \b \e \f \n \r \t \ \x \x{} \c
155
: [] [^] (
POSIX [:alpha:] 166)
156
, : (
/s )
157
: \
,
!
348
7. Perl
157
(
): \C
158 (C)
: \w \d \s \W \D \S
159 (C)
, : \p{},
\P{}
169
/ : ^ \A
169
/ : $ \z \Z
379
: \G
174
: \b \B
175
176
: (?).
: s m i (354)
177
(?:)
177
: (?#) # ( /x,
# )
, ,
178
: () \1 \2 ...
178
: (?:)
180
: (?>)
181
: |
182
: (?if then|else) if
,
()
183
184
393
: (?{})
393
: (??{})
351 (C)
: $ @
351 (C)
: \l \u
352 (C)
: \U \L \
352 (C)
: \Q \E
351 (C)
: \M{}
(C)
Perl
349
.
\b Backspace ()
; ( 174).
, .
\x
( ,
, ). \x{}
.
\w, \d \s .
\s ASCII
( 151).
Perl 4.1.0.
.
Is,
( 164). In,
,
.
\p{L&},
350
7. Perl
. 7.2
.
m//,
.
, /
. Perl ,
,
.
.
, (
. $num 20,
m/:.{$num}:/ :.{20}:.
.
;
, \U\E
. : m/abc\Uxyz\E/
abcXYZ. ,
abcXYZ, ,
: $tag
title, m{</\U$tag\E>} </TITLE>.
?
.
:
$MatchField = "^Subject:"; # Normal string assignment
.
.
.
if ($text =~ $MatchField) {
.
.
.
$MatchField =~,
.
,
\Q\E .
:
$text =~ $MatchField
$text =~ m/$MatchField/
Perl
351
.
,
$MatchField.
, ,
, \U\E $var
(
354).
(
) .
. 418.
:
. ,
$ @, .
,
,
(, $",
).
Perl %,
,
% .
.
use charnames ':full';,
\N{}. ,
\N{LATIN SMALL LETTER SHARP S} .
, Perl, Uni(
codeData.txt unicore:
use Config;
print "$Config{privlib}/unicore/UnicodeData.txt\n";
.
\l \u
.
. , $title
mr., m/\u$title/ Mr..
Perl lcfirst()
ucfirst().
352
7. Perl
.
\L \U
\. ,
$title m/\U$title\E/
MR..
Perl lc() uc().
: m/\L\u$title\E/ ,
Mr.
.
. \Q
(. .
\)
\. :
, (
( , \U , ,
\). ,
\ (
, \F \H). \Q\E
unrecognized escape.
,
\Q\E
,
. , $title Mr.,
m/\Q$title\E/ Mr\.,
, $title ,
.
. ,
m/\Q$UserInput\E/i
, $UserInput ( ,
).
, \Q\E,
Perl quotemeta().
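For comparison, Python offers the same protection for interpolated text through re.escape. The sketch below is an illustration outside Perl, under the assumption that the user's input should be searched for literally and case-insensitively; the helper name literal_search is made up here.

```python
import re

def literal_search(user_input, text, ignore_case=True):
    """Search for user_input as plain text, not as a regular expression."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape backslash-escapes characters that are special in a regex,
    # so "Mr." can only match an actual period after "Mr".
    return re.search(re.escape(user_input), text, flags)

if __name__ == "__main__":
    print(re.search("Mr.", "Mrs Smith") is not None)    # unescaped dot matches "s"
    print(literal_search("Mr.", "Mrs Smith") is None)   # escaped dot does not
    print(literal_search("Mr.", "Dear MR. Smith").group(0))
```

As with \Q...\E, the escaping has to be applied to the interpolated value only, never to the surrounding regex.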
. (overloading)
.
,
. , . 409.
( )
Perl ,
Perl
353
.
(m//, s/// qr//),
,
. .
m!!
m{}
m,,
m<>
s| | | m[]
qr##
m()
.
,
(. .
).
,
m() m[] , . ,
/x :
m{
#
#
};
,
( ,
, ). :
s{}{}
s{}!!
s<>()
s[]//
.
. 384.
( .)
(
). ,
( 372).
. 350,
.
, .
m'' ,
(, \Q\E) ,
\N{}. m''
, @,
.
354
7. Perl
/ ?,
m. ,
:
$text =~ m//;
$text =~ //;
m.
,
Perl
. Perl ,
.
:
1. (/i
. .). /x,
.
2. .
3. ,
.
;
.
,
\N{}.
4. . . (
\Q\E).
5. .
;
Perl. 2
, , ,
this$|that$ .
Perl
,
(, i m//i, s///i qr//i).
,
. 7.3.
, 3,
( 176) ( 177).
rl
355
, ,
, ( ,
, ).
7.3. ,
/i
145
/x
146
/s
146
/m
147
/o
418
, /o,
. , . 418.
,
, .1 ,
/
m/<title>/i, m|<title>|i, m{<title>}i
m<<title>>i.
/, /i.
rl
Perl,
.
.
. Perl ,
. , Perl ,
while ,
print . Perl
,
.
.
. Perl
1
, ,
. , learn/by/osmosis
( , learn).
osmosis
( /e !) ,
.
356
7. Perl
.
; ,
.
, $1 ,
, .
Perl,
.
: , (
. ,
. ,
,
.
. (
.
:
$s = _1;
@a = _2;
$s (. .
, ), _1, ,
. ,
@ ,
_2 .
,
. .
, localtime
, , , , . .
Mon Jan 20 22:05:15 2003.
: (, <MYDATA>)
,
() .
Perl
, .
, m//
/, .
.
,
Perl , ,
,
rl
357
.
, Perl
.
,
, .
, @ = 42 @ = (42).
,
. , :
$var = ($this, &is, 0xA, 'list');
$var , 'list'.
$var = @array $var .
Perl ( ),
,
. ,
.
.
Perl : (priva(
te). my().
,
.
,
() .
,
Perl, my
, my.
,
,
$1, $_ @ARGV.
,
my, . Perl
, ,
.
$Debug Acme::Widget
$Acme::Widget::Debug,
.
use strict;, ( )
, ,
358
7. Perl
our ( our , ;
Perl).
,
. ,
, . ,
Perl ,
,
.
(
, .
,
,
. , Acme::Widget
,
$Acme::Widget::Debug.
:
..
.
{
local($Acme::Widget::Debug) = 1; #
# Acme::Widget
..
.
}
# $Acme::Widget::Debug
..
.
local
.
, local .
, local :
1. .
2. (undef ,
local).
3.
, local.
,
, ,
.
. ,
(
).
,
.
359
rl
, , , local.
local ,
, . 7.4.
lo
cal($SomeVar);
$SomeVar undef. ,
, .
7.4. local
Perl
{
my $TempCopy = $SomeVar;
$SomeVar = undef;
local($SomeVar); #
$SomeVar = 'MyValue';
..
.
$SomeVar = $TempCopy;
}
,
.
Use of uninitialized warnings. ,
Perl, w, , ,
. ,
,
w?
$^W ( ^W
, W, Control+W):
{
local $^W = 0; # .
UnrulyFunction();
}
# $^W.
local
$^W, .
$^W . Unru
lyFunction Perl $^W,
.
.
, local .
UnrulyFunction
$^W.
.
360
7. Perl
,
(. 7.4), local .
, ,
local my.1 my ,
.
,
(. . my ).
,
.
, UnrulyFunction.
$^W ,
,
( UnrulyFunction
Perl
$^W, , ).
local :
,
. , ,
, .
. ,
, local.
,
. local Perl
,
, (. .
).
, . local
,
, :
, !
, .
? .
. ,
, , , $&
1
Perl my ,
.
rl
361
( ) $1 (,
).
.
, , ,
, ,
.
, ,
(. . ),
, .
:
if (m/()/)
{
DoSomeOtherStuff();
print "the matched text was $1.\n";
}
$1
,
, DoSomeOtherStuff $1 .
, $1 ,
, , , ,
. ,
, print .
:
if ($result =~ m/ERROR=(.*)/) {
    warn "Hey, tell $Config{perladmin} about $1!\n";
}
( Config
%Config, $Config{perladmin}
Perl.)
$1 , .
, %Config ;
,
. ,
$Config{},
.
$1,
$1,
. , ,
$Config{}, .
, local
.
362
7. Perl
local,
.
, my()
.
,
( , my
local). : local , my ,
, .
,
, ,
.
,
, ,
.
(. . , )
( ,
). . 7.5.
,
:
$&
, .
( $` $',
)
( . 426).
$&
, .
$`
,
(. . ).
/g , $`
, .
$`
.
$'
, (. .
).
"$`$&$'"
.1 $`
.
1
,
, (
, ), "$`$&$'"
, .
.
363
rl
Table 7.5. Example values of the special variables after a match

Variable   Value
$`         "Pi is "
$&         "3.14159"
$'         ", roughly"
$1         "3.14159"
$2         undef
$3         "3.14159"
$4         ".14159"
$+         ".14159"
$^N        "3.14159"
@-         (6, 6, undef, 6, 7)
@+         (13, 13, undef, 13, 13)
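The values in Table 7.5 can be reproduced with a short sketch. The pattern used here is an assumption, chosen so that four sets of parentheses produce the values listed; after_match_values is a hypothetical helper name.

```perl
use strict;
use warnings;

# Returns the capture groups and the related special variables for one match.
# The pattern is an assumption chosen to agree with the values in Table 7.5.
sub after_match_values {
    my ($text) = @_;
    return unless $text =~ m/\b((tasty|fattening)|(\d+(\.\d*)?))\b/;
    return (
        groups      => [ $1, $2, $3, $4 ],
        last_paren  => $+,      # highest-numbered group that took part
        last_closed => $^N,     # most recently closed group
        starts      => [ @- ],  # offsets where the match and each group begin
        ends        => [ @+ ],  # offsets where the match and each group end
    );
}

my %v = after_match_values("Pi is 3.14159, roughly");
print "last paren: $v{last_paren}, last closed: $v{last_closed}\n";
```

A group that took no part in the match (here the second one) leaves undef both in its numbered variable and in its @- and @+ slots.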
$1,$2,$3,...
, , , . .
( : $0
,
).
,
,
.
,
s///. ,
( 393).
(
\1 ). (
$1 ? . 366.)
(\w+) (w)+.
, ,
, . ,
tubby.
$1 tubby, y:
+ ,
.
, (x)? (x?).
, $1
x, .
(x?)
.
, ,
x? .
, (x?) $1 x
. .
Match                         $1
"::" =~ m/:(A?):/             empty string
"::" =~ m/:(\w*):/            empty string
"::" =~ m/:(A)?:/             undef
"::" =~ m/:(\w)*:/            undef
":A:" =~ m/:(A?):/            A
":A:" =~ m/:(A)?:/            A
":Word:" =~ m/:(\w)*:/        d
, ,
, .
(
),
, $1.
$+
$1, $2, ( ),
. :
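$url =~ m{
    href \s* = \s*      # Match the "href = " part, then the value ...
    (?: "([^"]*)"       #   a double-quoted value, or ...
      | '([^']*)'       #   a single-quoted value, or ...
      | ([^'"<>]+) )    #   an unquoted value.
}ix;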
$+
$1, $2 $3 , undef.
(
),
.
$^N
$1, $2...,
, (. .
$1, $2...,
).
( ),
.
. 413.
@- @+
(
) .
.
; ,
@- ( $-[0])
, . ,
$text = "Version 6 coming soon?";
.
..
$text =~ m/\d+/;
$-[0] 8,
( Perl
).
@+ ( $+[0])
.
9,
. , substr($text, $-[0],
$+[0] - $-[0]) $&, $text
, ,
$& ( 426). @-:
1 while $line =~ s/\t/' ' x (8 - $-[0] % 8)/e;
.1
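As a sketch of how $-[0] drives the tab expansion above, the same substitution can be wrapped in a function. The name expand_tabs and the fixed tab width of 8 are assumptions made for illustration.

```perl
use strict;
use warnings;

# Expands tabs to spaces, assuming tab stops every 8 columns.
sub expand_tabs {
    my ($line) = @_;
    # $-[0] is the offset at which the tab was found, so 8 - $-[0] % 8
    # is the number of spaces needed to reach the next tab stop.
    1 while $line =~ s/\t/' ' x (8 - $-[0] % 8)/e;
    return $line;
}

print expand_tabs("ab\tcd\te"), "\n";
```

Each pass replaces only the first remaining tab, which is why the substitution sits in a loop: the offset of every later tab depends on how far the earlier ones were expanded.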
. , $-[1]
$+[1] $1, $-[2]
$+[2] $2 . .
$^R.
.
1
:
.
,
.
, (142).
: if (?if then|else)
( 182) $^R.
,
, ,
,
. ,
,
.
/g
. , , $1
s///g
.
$1 ?
Perl , \1
( $1). $1
,
. , \1
,
,
,
\1. , \1,
,
.
: $1
? (
( 393),
. $1
, :
. , $1
,
.
qr//
2 ( 106) 6 ( 337)
qr//,
.
,
split,
.
,
(
).
. 352,
, qr{} qr!!. ,
/i, /, /s, /m /.
,
2 ( 106):
my $HostnameRegex = qr/[-a-z0-9]+(?:\.[-a-z0-9]+)*\.(?:com|edu|info)/i;
my $HttpUrl = qr{
    http:// $HostnameRegex \b            # hostname
    (?:
        / [-a-z0-9_:\@&?=+,.!/~*'%\$]*   # optional path
          (?<![.,?!])                    # not allowed to end with [.,?!]
    )?
}ix;
$HostnameRegex.
HTTP URL,
$HttpUrl.
, :
if ($text =~ $HttpUrl) {
print "There is a URL\n";
}
HTTP URL :
while ($text =~ m/($HttpUrl)/g) {
print "Found URL: $1\n";
}
, , $HostnameRegex 5 ( 256):
my $HostnameRegex = qr{
    # One or more dot-separated parts ...
    (?: [a-z0-9]\. | [a-z0-9][-a-z0-9]{0,61}[a-z0-9]\. )*
    # ... followed by the final suffix part
    (?: com|edu|gov|int|mil|net|org|biz|info|...|aero|[a-z][a-z] )
}xi;
( ^ $,
), ,
.
$HttpUrl .
qr// ,
. 355.
,
m// . ,
:
my $WordRegex = qr/\b \w+ \b/; # Oops, missing the /x modifier!
.
.
.
if ($text =~ m/^($WordRegex)/x) {
    print "found word at start of text: $1\n";
}
, /x
$WordRegex, ,
( ) qr//
$WordRegex. ,
.
:
my $WordRegex = qr/\b \w+ \b/x; # This works!
.
.
.
if ($text =~ m/^($WordRegex)/) {
print "found word at start of text: $1\n";
}
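The difference between a regex object and a plain string holding the same pattern text can be sketched as follows. The helper names first_word and first_word_from_string are hypothetical.

```perl
use strict;
use warnings;

# The /x modifier is attached to the regex object itself ...
my $WordRegex = qr/\b \w+ \b/x;

# ... so the enclosing match needs no /x of its own.
sub first_word {
    my ($text) = @_;
    return $text =~ m/^($WordRegex)/ ? $1 : undef;
}

# For contrast: the same pattern text kept in a plain string takes its
# modifiers from the match that uses it, so the spaces stay literal here.
my $WordString = '\b \w+ \b';
sub first_word_from_string {
    my ($text) = @_;
    return $text =~ m/^($WordString)/ ? $1 : undef;
}

print "regex object:  ", first_word("hello world") // "no match", "\n";
print "plain string:  ", first_word_from_string("hello world") // "no match", "\n";
```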
:
my $WordRegex = '\b \w+ \b'; # Normal string assignment
.
.
.
if ($text =~ m/^($WordRegex)/x) {
    print "found word at start of text: $1\n";
}
, $WordRegex
. ,
$WordRegex ,
m//.
,
(, ,
$WordRegex /x).
,
:
my $WordRegex = '(?x:\b \w+ \b)'; # Normal string assignment
.
.
.
if ($text =~ m/^($WordRegex)/) {
print "found word at start of text: $1\n";
}
m//
^((?x:\b\w+\b)),
, .
,
,
(/i, /x, /m /s) ,
qr/\b\w+\b/x (?x-ism:\b\w+\b).
: (?x-ism:) /x ,
/i, /s /m . ,
qr// ( ,
).
,
( (?x-ism:) ).
,
, Perl . :
% perl -e 'print qr/\b \w+ \b/x, "\n"'
(?x-ism:\b \w+ \b)
$HttpUrl . 367:
(?ix-sm:
    http:// (?ix-sm:
        # One or more dot-separated parts ...
        (?: [a-z0-9]\. | [a-z0-9][-a-z0-9]{0,61}[a-z0-9]\. )*
        # ... followed by the final suffix part
        (?: com|edu|gov|int|mil|net|org|biz|info|...|aero|[a-z][a-z] )
    ) \b # hostname
    (?:
        / [-a-z0-9_:\@&?=+,.!/~*'%\$]* # optional path
        (?<![.,?!]) # not allowed to end with [.,?!]
    )?
)
.
.
6.
,
/, qr// ( 418).
$text =~ m//
Perl. Perl ,
(
) .
, ( 356), .
,
. ,
. :
,
,
,
:
=~
,
.
.
( ,
, ).
,
.
m// //.
m ,
/ ! (
, ).
m,
. , m
( 352).
,
, . 355.
/g /c,
.
,
qr//. :
my $regex = qr//;
..
.
if ($text =~ $regex) {
..
.
m//. :
, ,
. if
:
if ($text =~ m/$regex/) {
..
.
,
/g
( ,
m//, ,
,
397).
, m// ( m/$SomeVar/, $SomeVar
), Perl
.
,
( 366).
??
??
. :
m?? m??
, reset.
Perl 1 ,
,
,
, Perl.
?? ( //) m
: ?? m??.
, ,
=~, $text =~ m//. =~
;
,
, awk.
=~ m//
, ,
. , :
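$text =~ m//;                  # Just do it, presumably for the side effects.
if ($text =~ m//) {            # Do the code if the match is successful
    ...
$result = ( $text =~ m// );    # Set $result to the result of the match against $text
$result = $text =~ m//;        # Same thing; =~ has higher precedence than =
$copy = $text;                 # Copy $text to $copy ...
$copy =~ m//;                  # ... and perform the match on $copy
( $copy = $text ) =~ m//;      # Same thing in one expression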
$_, $_
=~ . , $_
.
,
$text =~ m//;
$text;
, .
~,
$text = m//;
: $_
; $text.
, :
$text = m/ /;
$text = ($_ =~ m//);
, , ,
( ).
, :
while (<>)
{
if (m//){
.
.
.
} elsif (m//){
.
.
.
.
=~ !~,
.
, ,
!~
(true false). :
if ($text !~ m//)
if (not $text =~ m//)
unless ($text =~ m//)
.
$1. , !~ ,
....
,
. ( 356)
/g.
/g
(,
if) :
if ($target =~ m//) {
#
.
.
.
} else {
#
.
.
.
}
:
my $success = $target =~ m//;
.
.
.
if ($success) {
.
.
.
}
/g
/g
.
,
.
69/8/31:
my ($year, $month, $day) = $date =~ m{^ (\d+) / (\d+) / (\d+) $}x;
( $1, $2 $3.).
; .
, . ,
m/(this)|(that)/
.
undef.
,
/g (1).
, :
my @parts = $text =~ m/^(\d+)-(\d+)-(\d+)$/;
, (
). :
my ($word)   = $text =~ m/(\w+)/;
my $success  = $text =~ m/(\w+)/;
,
(
).
; ,
$success .
:
if ( my ($year, $month, $day) = $date =~ m{^ (\d+) / (\d+) / (\d+) $}x ) {
# ;
# $year .
} else {
# ...
}
(
my() =),
$1, $2 . .
( if), Perl
. 0 0 (. .
) .
/g
,
(
, ),
, /g,
.
:
my @nums = $text =~ m/\d+/g;
:
my $ip = join '.', map { hex($_) } $hex_ip =~ m/../g;
:
my @nums = $text =~ m/\d+(?:\.\d+)?|\.\d+/g;
,
. , ,
:
my @Tags = $Html =~ m/<(\w+)/g;
@Tags HTML,
$Html (,
<).
. , Unix
, :
alias Jeff     jfriedl@regex.info
alias Perlbug  perl5-porters@perl.org
alias Prez     president@whitehouse.gov
m/^alias\s+(\S+)\s+(.+)/m (
/g).
('Jeff', 'jfriedl@regex.info').
, /g. :
( 'Jeff', 'jfriedl@regex.info', 'Perlbug',
'perl5-porters@perl.org', 'Prez', 'president@whitehouse.gov' )
/,
,
().
my %alias = $text =~ m/^alias\s+(\S+)\s+(.+)/mg;
Jeff $alias{Jeff}.
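A minimal sketch of the list-context m//g idiom used to fill the hash; parse_aliases is a hypothetical helper name and the sample data follows the alias lines shown earlier.

```perl
use strict;
use warnings;

# Builds a hash of alias => address from text in "alias NAME ADDRESS" form.
sub parse_aliases {
    my ($text) = @_;
    # In list context with /g, every match contributes its $1 and $2,
    # giving a flat key/value list that initializes the hash directly.
    my %alias = $text =~ m/^alias\s+(\S+)\s+(.+)/mg;
    return %alias;
}

my %alias = parse_aliases(<<'END');
alias Jeff     jfriedl@regex.info
alias Perlbug  perl5-porters@perl.org
alias Prez     president@whitehouse.gov
END
print "Jeff is $alias{Jeff}\n";
```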
/g
m//g
, .
m//, ,
m//g
. m//g
.
, .
:
$text = "WOW! This is a SILLY test.";
$text =~ m/\b([a-z]+\b)/g;
print "The first all-lowercase word: $1\n";
$text =~ m/\b([A-Z]+\b)/g;
print "The subsequent all-uppercase word: $1\n";
/g, :
The first all-lowercase word: is
The subsequent all-uppercase word: SILLY
/g
:
, ,
, ,
. /g
, ,
/g,
WOW.
while.
:
while ($ConfigData =~ m/^(\w+)=(.*)/mg) {
my($key, $value) = ($1, $2);
.
.
.
}
, (
, ) .
,
while .
/g,
/g .
:
while ($text =~ m/(\d+)/) { # dangerous!
    print "found: $1\n";
}
/g,
. , $text IP
, ,
:
found: 64
found: 156
found: 215
found: 240
,
found: 64. /g
(\d+) $text, 64,
. /g
(\d+) $text
.
pos()
Perl ,
.
.
,
,
. /g
.
Perl
pos(). :
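my $ip = "64.156.215.240";
while ($ip =~ m/(\d+)/g) {
    printf "found '$1' ending at location %d\n", pos($ip);
}
:
found '64' ending at location 2
found '156' ending at location 6
found '215' ending at location 10
found '240' ending at location 14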
: ,
2 .
/g $+[0] ( @+ 365)
, pos .
pos() ,
$_.
pos() ,
.
, (,
/g). ,
, Yahoo!,
; 32 ,
.
,
^.{32} :
if ($logline =~ m/^.{32}(\S+)/) {
$RequestedPage = $1;
}
. ,
32
. , ,
:
pos($logline) = 32; # The page starts at character 32,
# so start the next match there ...
if ($logline =~ m/(\S+)/g) {
$RequestedPage = $1;
}
, . (
,
.
33 \S,
, ,
,
. , \S+,
. ,
, .
\G
\G ,
.
, :
pos($logline) = 32; # The page starts at character 32,
# so start the next match there.
if ($logline =~ m/\G(\S+)/g) {
    $RequestedPage = $1;
}
\G :
, .
\G
3 ( 171), 5
( 265).
Perl \G ,
,
. , 6
CSV ( 329)
\G(?:^|,).
^ \G
, (?:^|\G,).
, Perl ,
.1
/gc
m//g pos
. /g
/c,
. /c /g,
/gc.
m//gc \G
, .
HTML $html:
while (not $html =~ m/\G\z/gc) # While we haven't worked to the end ...
{
    if    ($html =~ m/\G( <[^>]+>   )/xgc) { print "TAG: $1\n"            }
    elsif ($html =~ m/\G( &\w+;     )/xgc) { print "NAMED ENTITY: $1\n"   }
    elsif ($html =~ m/\G( &\#\d+;   )/xgc) { print "NUMERIC ENTITY: $1\n" }
    elsif ($html =~ m/\G( [^<>&\n]+ )/xgc) { print "TEXT: $1\n"           }
    elsif ($html =~ m/\G \n          /xgc) { print "NEWLINE\n"            }
    elsif ($html =~ m/\G( .         )/xgc) { print "ILLEGAL CHAR: $1\n"   }
    else {
        die "$0: oops, this shouldn't happen!";
    }
}
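The same \G plus /gc technique can be sketched on a simpler input than HTML. The token classes below (numbers, words, whitespace, anything else) and the name tokenize are assumptions made for illustration, not part of the lexer above.

```perl
use strict;
use warnings;

# Splits a string into [type, text] tokens using \G and /gc, so that each
# failed alternative leaves pos() untouched for the next one to try.
sub tokenize {
    my ($src) = @_;
    my @tokens;
    pos($src) = 0;
    while (pos($src) < length $src) {
        if    ($src =~ m/\G(\d+)/gc) { push @tokens, [NUM   => $1] }
        elsif ($src =~ m/\G(\w+)/gc) { push @tokens, [WORD  => $1] }
        elsif ($src =~ m/\G(\s+)/gc) { push @tokens, [SPACE => $1] }
        elsif ($src =~ m/\G(.)/gcs)  { push @tokens, [OTHER => $1] }
    }
    return @tokens;
}

print join(" ", map { "$_->[0]($_->[1])" } tokenize("x 42!")), "\n";
```

The final catch-all alternative consumes exactly one character, so the loop always advances and cannot spin on input that no other alternative accepts.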
,
HTML ( ).
1
\G, ,
\G
\G (. 302).
, (
/gc), (
\G). ,
.
pos $html
, .
, m/\G\z/gc,
. . \G (\z).
,
.
( ),
,
pos $html.
else;
, (
) ,
else .
(, <>),
.
. , \G(.) .
, ,
<script>:
$html =~ m/\G ( <script[^>]*>.*?<\/script> )/xgcsi
(, !)
, <[^>]+>,
, <[^>]+>
<script> .
/gc
3 ( 172).
pos:
pos .
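Type of match   Where the match starts           pos upon success      pos upon failure
m//             start of string (pos ignored)    reset to undef        reset to undef
m//g            at the target's pos              set to end of match   reset to undef
m//gc           at the target's pos              set to end of match   left unchanged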
undef ( , ).
,
Perl.
, , . ,
(. .
)
.
, .
:
,
, $1 @+ ( 362).
( 371).
m??, (
m??)
( , reset 372).
,
.
:
pos ( 378).
/o
,
( 421).
,
. ,
:
, (,
),
,
.
pos()
pos ,
, ,
/g. , \G.
, ( 371).
study
study ,
( ) .
study ( 429).
m?? reset
reset /
m?? ( 372).
!
,
.
while, if foreach
. , ?
while ("Larry Curly Moe" =~ m/\w+/g) {
print "WHILE stooge is $&.\n";
}
print "\n";
if ("Larry Curly Moe" =~ m/\w+/g) {
print "IF stooge is $&.\n";
}
print "\n";
foreach ("Larry Curly Moe" =~ m/\w+/g) {
print "FOREACH stooge is $&.\n";
}
. .
Perl, s///,
. :
$text =~ s///
, ,
. /g
,
, .
,
=~ ,
$_. m
, s .
,
,
pos .
:
( ),
, , .
,
. 355,
: /g /.
s///
,
m//.
(, <>),
(
). , s{}{}, s[]//
s<>'' .
,
.
/x /e:
$text =~ s{
... ...
} {
... Perl, ...
}ex;
.
,
( 352), .
(
/g ), $1
. . .
,
:
,
(. . ).
/e,
, Perl,
.
,
.
. 383.
:
WHILE stooge is Larry.
WHILE stooge is Curly.
WHILE stooge is Moe.
IF stooge is Larry.
FOREACH stooge is Moe.
FOREACH stooge is Moe.
FOREACH stooge is Moe.
: print foreach
$& $_,
while. ,
m//g, ('Larry', 'Curly', 'Moe'),
.
$&,
, m//g
.
/e
/e ,
Perl ( eval{}).
Perl
, .
,
. :
$text =~ s/-time-/localtime/ge;
-time-
Perl localtime (. .
Mon Sep 25 18:36:51 2006).
,
$1 . . , URL
%,
.
, :
$url =~ s/([^a-zA-Z0-9])/sprintf('%%%02x', ord($1))/ge;
:
$url =~ s/%([0-9a-f][0-9a-f])/pack("C", hex($1))/ige;
sprintf('%%%02x', ord())
, pack("C", )
; Perl.
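The two substitutions can be sketched as a matched pair of functions. The names url_encode and url_decode are hypothetical, and the sketch assumes single-byte characters only.

```perl
use strict;
use warnings;

# Percent-encodes every character that is not an ASCII letter or digit.
sub url_encode {
    my ($s) = @_;
    $s =~ s/([^a-zA-Z0-9])/sprintf('%%%02x', ord($1))/ge;
    return $s;
}

# Reverses url_encode: each %XX becomes the byte with that hex value.
sub url_decode {
    my ($s) = @_;
    $s =~ s/%([0-9a-f][0-9a-f])/pack("C", hex($1))/ige;
    return $s;
}

my $encoded = url_encode("a b&c");
print "$encoded -> ", url_decode($encoded), "\n";
```

With /e the replacement is Perl code run once per match, so $1 inside sprintf or pack always refers to the character or hex pair just matched.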
/e
( ), /e
. ,
. ,
Perl,
.
,
. ,
(,
). , $var ,
$var $var.
:
$data =~ s/(\$[a-zA-Z_]\w*)/$1/eeg;
/e
$var, .
/e $1 ,
$var,
( ).
/e ,
$var .
.
,
/g.
^
, (
) .
(, if)
,
, .
split ( (
)
m//g ( 375). ,
, split
, .
. ,
split(/:/, $text) :
('IO.SYS', '225558', '95-10-03', '-a-sh', 'optional')
: . split
, .
,
. ,
@Paragraphs = split(m/\s*<p>\s*/i, $html);
HTML $html ,
<p> <P>,
.
, ; :
@Lines = split(m/^/m, $lines);
.
,
, split , .
, ,
.
, .
// ,
. , split(//, "short test")
: ("s", "h", "o", ..., "s", "t").
"" (, )
, m/\s+/
.
, split(" ", " a short test ")
a, short test.
,
split.
split :
split (, _, )
.
( ).
split .
:
($var1, $var2, $var3, ) = split();
@array = split();
for my $item (split() ) {
..
.
}
()
,
,
. , //,
m{} ,
, .
, . 355.
,
(?:). ,
split .
( )
split
. ,
$_.
()
, split . ,
split(/:/, $text, 3) :
('IO.SYS', '225558', '95-10-03:-a-sh:optional')
, split
/:/,
.
,
.
,
,
;
,
. , split(/:/, $text, 99)
. , split(/:/,
$text) split(/:/, $text, 99) ,
.
, (
, .
,
:
('IO.SYS', '225558', '95-10-03', '-a-sh:optional')
.
. ,
:
($filename, $size, $date) = split(/:/, $text);
Perl
.
,
.
, , split
, ,
:
.
split, ,
,
( , . . "").
:
@nums = split(m/:/, "12:34::78");
:
("12", "34", "", "78")
: ,
. ,
.
. ,
@nums :
("12", "34", "", "78")
, ,
.
split
. , Perl
,
.
(
split ,
).
, ,
,
-1,
: split(/:/, $text, -1)
, .
,
, grep{length} split. grep
, (. . ):
my @NonEmpty = grep { length } split(/:/, $text);
:
@nums = split(m/:/, ":12:34::78");
@nums :
("", "12", "34", "", "78")
,
. :
(. .
), /
. : split(/\b/,
"a simple test")
simple test . ,
, : ("a", " ", "simple", " ",
"test"). ,
@Lines= split(m/^/m, $lines) . 387.
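The handling of the limit and of empty fields can be checked with a few calls. This is a sketch; the sample strings are assumptions modelled on the examples in this section.

```perl
use strict;
use warnings;

my $text = "IO.SYS:225558:95-10-03:-a-sh:optional";

my @all      = split(/:/, $text);        # all five fields
my @limited  = split(/:/, $text, 3);     # at most three; the rest stays joined

my @trimmed  = split(/:/, "12:34::78:::");       # trailing empty fields are dropped
my @kept     = split(/:/, "12:34::78:::", -1);   # a negative limit keeps them
my @leading  = split(/:/, ":12:34::78");         # a leading empty field is kept
my @nonempty = grep { length } split(/:/, ":12:34::78:::");

printf "%d %d %d %d %d %d\n",
    scalar(@all), scalar(@limited), scalar(@trimmed),
    scalar(@kept), scalar(@leading), scalar(@nonempty);
```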
split
split
, ,
:
, split,
, .
split, ,
split(//, "short test") : ("s",
"h", "o", ..., "s", "t").
, (
!), , .
/\s+/,
.
awk,
, ,
.
, m/
\s+/ . ,
1 .
,
, (
). , split
split(' ', $_, 0).
^
/m (
). $ .
m/^/m ,
. split m/^/m
.
split ,
, .
split
.
split $&, $', $1 . .
, split
.1
, ,
, ,
. (void) split
@_ (
,
split ). use warnings
w
split .
split
split
split.
, ,
, .
, , split
, .
, HTML split(/(<[^>]*>)/)
and <B>very <FONT color=red>very</FONT> much</B> effort
( 'and ', '<B>', 'very ', '<FONT color=red>',
'very', '</FONT>', ' much', '</B>', ' effort' )
, split(/<[^>]*>/)
:
( 'and ', 'very ', 'very', ' much', ' effort' )
( ,
, ).
.
,
undef.
Perl
,
, Perl.
, (
) , ,
, \A, \Z \z),
, \G .
Perl .
Perl ,
, Perl.
.
Perl,
.
,
Perl.
(??{ perl })
, Perl. (
,
) .
^(\d+)(??{ "X{$1}" })$
.
,
X .
, 3 12XXXXXXXXXXXX, 3X
7XXXX.
3XXX, ,
(\d+) 3XXX,
$1 3.
,
"X{$1}" X{3}.
X{$1},
3XXX.
$ 3XXX
.
,
.
(?{ perl })
, ,
Perl ,
,
.
(
$^R 365).
, , ,
:
if (? if then | else) ( 182).
,
then, else.
,
. ,
; :
"have a nice day" =~ m{
(?{ print "Starting match.\n" })
\b(?: the | an | a )\b
}x;
,
.
,
.
.
Perl. , Perl
(
\b),
\< \>
, Perl.
, .
(
) .
Perl (
)
, ,
my ( 405).
(
,
).
.
,
, ,
.
,
, : \(([^()])*\).
,
( ,
).
:
my $Level0 = qr/ \( ( [^()] )* \) /x; #
.
.
.
if ($text =~ m/\b( \w+$Level0 )/x) {
substr($str, 0, 3),
substr($str, 0, (3+2))
. ,
, . .
.
,
, .
,
( [^()]),
, . ,
: $Level0.
, :
my $Level0 = qr/ \( ( [^()]           )* \) /x; # Parenthesized text
my $Level1 = qr/ \( ( [^()] | $Level0 )* \) /x; # One level of nesting
$Level0 ,
$Level1,
$Level0.
.
,
$Level2,
$Level1 (, , $Level0):
my $Level0 = qr/ \( ( [^()]           )* \) /x; # Parenthesized text
my $Level1 = qr/ \( ( [^()] | $Level0 )* \) /x; # One level of nesting
my $Level2 = qr/ \( ( [^()] | $Level1 )* \) /x; # Two levels of nesting
:
my $Level3 = qr/ \( ( [^()] | $Level2 )* \) /x; # Three levels of nesting
my $Level4 = qr/ \( ( [^()] | $Level3 )* \) /x; # Four levels of nesting
my $Level5 = qr/ \( ( [^()] | $Level4 )* \) /x; # Five levels of nesting
. 7.1
.
,
. $Level3:
\(([^()]|\(([^()]|\(([^()]|\(([^()])*\))*\))*\))*\)
.
, (
).
Level ,
: ,
$Level.
( :
[Figure 7.1: nested levels of parentheses; \( ( [^()] | ... )* \) appears three times, each level enclosing the next, around an innermost \( ( [^()] )* \)]
X ,
+1 ).
.
, $Level
:
.
,
.
,
,
.
.
$Level,
(
Perl ,
; , ,
, ). $Level
$LevelN,
(??{$LevelN}):
my $LevelN; # This must be predeclared because it's used in its own definition.
$LevelN = qr/ \(( [^()] | (??{ $LevelN }) )* \) /x;
, $Level0
:
, ? ,
, ,
.
.
( ,
), [^()] [^()]+ (
,
279).
, \( \) ,
.
, .
:
$LevelN = qr/ (?> [^()]+ | \( (??{ $LevelN }) \) )* /x;
\(\) ,
$LevelN.
:
. , (
, ,
:
if ($text =~ m/\b( \w+ \( $LevelN \) )/x) {
print "found function call: $1\n";
}
if (not $text =~ m/^ $LevelN $/x) {
print "mismatched parentheses !\n";
}
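The recursive $LevelN pattern can be exercised in a self-contained sketch. Two assumptions are made here: the helper is named is_balanced, and $LevelN is declared with our rather than my so that the embedded code sees the same variable on every Perl version.

```perl
use strict;
use warnings;

# Declared before use because the pattern refers to itself; a package
# variable is used so the embedded code always sees the final value.
our $LevelN;
$LevelN = qr/ (?> [^()]+ | \( (??{ $LevelN }) \) )* /x;

# True when every parenthesis in $text is properly nested and closed.
sub is_balanced {
    my ($text) = @_;
    return $text =~ m/^ $LevelN $/x ? 1 : 0;
}

for my $sample ("f(a, g(b))", "f(a, g(b)", ")(") {
    printf "%-12s %s\n", $sample, is_balanced($sample) ? "balanced" : "mismatched";
}
```

Because $LevelN is a regex object, it can be interpolated into the outer match without use re 'eval'.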
$LevelN . 411.
.
,
POSIX. ,
(, ,
POSIX),
.
.
"abcdefgh" =~ m{
(?{ print "starting match at [$`|$']\n" })
(?:d|e|f)
}x;
:
starting match at [|abcdefgh]
starting match at [a|bcdefgh]
starting match at [ab|cdefgh]
starting match at [abc|defgh]
print "starting match at [$`|$']\n"
,
. $` $' ( 362)1,
,
|, ,
. ,
( 191).
(?{ print "matched at [$`<$&>$']\n" })
:
matched at [abc<d>efgh]
,
,
(?:d|e|f) [def]:
"abcdefgh" =~ m{
(?{ print "starting match at [$`|$']\n" })
[def]
}x;
,
:
starting match at [abc|defgh]
? Perl ,
[def]
1
( 303) ,
.
, , .
.
panic: top_env
,
,
panic: top_env
, .
Perl
. ,
.
Perl ,
,
.
Perl
. , ,
oneself . 225:
"oneselfsufficient" =~ m{
one(self)?(selfsufficient)?
(?{ print "matched at [$`<$&>$']\n" })
}x;
,
matched at [<oneself>sufficient]
,
oneselfsufficient.
: print
, .
,
. ,
,
.
(?!) ?
(?!) .
(
matched),
.
,
, :
matched at [<oneself>sufficient]
matched at [<oneselfsufficient>]
matched at [<one>selfsufficient]
,
. (?!) Perl
,
.
, ?
"123" =~ m{
\d+
(?{ print "matched at [$`<$&>$']\n" })
(?!)
}x;
:
matched at [<123>]
matched at [<12>3]
matched at [<1>23]
matched at [1<23>]
matched at [1<2>3]
matched at [12<3>]
,
. ,
. (?!)
.
,
,
(
4).
, , .
, (?!) ,
.
, ; .
.
, .
.
oneself:
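$longest_match = undef; # We'll keep track of the longest match here
"oneselfsufficient" =~ m{
    one(self)?(selfsufficient)?
    (?{
        # Check to see if the current match ($&)
        # is the longest so far
        if (not defined($longest_match)
            or
            length($&) > length($longest_match))
        {
            $longest_match = $&;
        }
    })
    (?!) # Force failure so we'll backtrack
         # to find further "matches"
}x;
# Now report the accumulated result, if any
if (defined($longest_match)) {
    print "longest match=[$longest_match]\n";
} else {
    print "no match\n";
}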
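The same idea can be wrapped in a function so it can be applied to more than one string. This is a sketch: the function name longest_match is an assumption, and the variable is a package variable localized per call so the embedded code and the caller share it.

```perl
use strict;
use warnings;

our $longest_match;    # package variable shared with the embedded code

# Returns the longest text the pattern could match in $text,
# or undef if it cannot match anywhere.
sub longest_match {
    my ($text) = @_;
    local $longest_match = undef;
    $text =~ m{
        one(self)?(selfsufficient)?
        (?{
            # Record the current candidate if it is the longest so far.
            $longest_match = $&
                if not defined $longest_match
                or length($&) > length($longest_match);
        })
        (?!)    # always fails, forcing a backtrack to the next candidate
    }x;
    return $longest_match;
}

print "longest match=[", longest_match("oneselfsufficient") // "none", "]\n";
```

The overall match always fails, so the return value of the match operator is of no use here; only the value recorded by the embedded code matters.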
, longest match=
[oneselfsufficient]. ,
,
(?!) :
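my $RecordPossibleMatch = qr{
    (?{
        # Check to see if the current match ($&)
        # is the longest so far
        if (not defined($longest_match)
            or
            length($&) > length($longest_match))
        {
            $longest_match = $&;
        }
    })
    (?!) # Force failure so we'll backtrack
         # to find further "matches"
}x;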
9938,
:
$longest_match = undef; # We'll keep track of the longest match here
"8009989938" =~ m{ \d+ $RecordPossibleMatch }x;
# Now report the accumulated result, if any
if (defined($longest_match)) {
    print "longest match=[$longest_match]\n";
} else {
    print "no match\n";
}
,
,
, .
POSIX ( 225).
.
,
( ),
,
.
Perl
, ,
, $longest_match
. (?{ defined
$longest_match}), ,
.
.
, if
if , then .
$RecordPossibleMatch:
"8009989938" =~ m{ $BailIfAnyMatch \d+ $RecordPossibleMatch }x;
800 ,
( POSIX).
local
local
. ,
( 357)
4,
,
( 202). (
) .
, \w+ \s+,
, \w+ \d+\b:
my $Count = 0;
$text =~ m{
^ (?> \d+ (?{ $Count++ }) \b | \w+ | \s+ )* $
}x;
123abc739271xyz,
$Count 3. 123abc73xyz
2, 1.
, $Count
73 ( \d+),
,
\b. ,
,
.
(?>) ( 180) , ,
( 328)
. ,
\b \d+
.
\b
$Count, ,
.
local,
Perl . :
our $Count = 0;
$text =~ m{
^ (?> \d+ (?{ local ($Count) = $Count + 1 }) \b | \w+ | \s+ )* $
}x;
, $Count
my (
use strict, ,
our).
Perl
Perl
(?{})
(??{}) (
, $RecordPossibleMatch . 400). ,
m{ (?{ print "starting\n" }) };
, :
my $ShowStart = '(?{ print "starting\n" })';
..
.
m{ $ShowStart };
,
,
,
Perl, .
.
,
use re 'eval';
( use re
; 431.)
, ,
Perl
.
\(\s*\?+[p{].
,
. \s+ ,
/x
( , ,
). \?,
. , p
(?p{},
(??{}).
, Perl
,
,
.
$Count.
: (
, , local, (
( (
). , local($Count) = $Count+1
73 \d+,
$Count 2,
,
local. \b ,
local,
$Count 1.
.
, local , $Count
. (?{ print
"Final count is $Count.\n" }),
. $Count ,
( ,
, ).
:
my $Count = undef;
our $TmpCount = 0;
$text =~ m{
^ (?> \d+ (?{ local($TmpCount) = $TmpCount + 1 }) \b | \w+ | \s+ )* $
(?{ $Count = $TmpCount }) # $Count
#
}x;
if (defined $Count) {
print "Count is $Count.\n";
} else {
print "no match\n";
}
, ,
.
. 413.
my
my
,
, ,
.
, ,
,
, .
:
.
:
sub CheckOptimizer
{
my $text = shift; #
my $start = undef; #
my $match = $text =~ m{
(?{ $start = $-[0] if not defined $start}) #
#
\d #
}x;
if (not defined $start) {
print "The whole match was optimized away.\n";
if ($match) {
# !
print "Whoa, but it matched! How can this happen!?\n";
}
} elsif ($start == 0) {
print "The match start was not optimized.\n";
} else {
print "The optimizer started the match at character $start.\n"
}
}
my,
$start (
). $start
, ;
,
$start , ,
.
$-[0] ( @- 365).
,
CheckOptimizer("test 123");
:
The optimizer started the match at character 5.
, ,
:
The whole match was optimized away.
Whoa, but it matched! How can this happen!?
(
), . ,
. ?
$start,
, .
, $start, ,
,
my .
, my
(, )
my,
(
. 418). CheckOptimizer (
$start, $start
,
. , $start,
, ,
.
(closure).
Programming Perl Object Oriented Perl ,
. ,
Perl .
.
? my
, ,
, ,
, . ,
my $NestedStuffRegex SimpleConvert,
. 413, , ,
$NestedStuffRegex.
my , ,
.
. 394 ,
. ,
,
, .
:
, ,
.
.
,
:
my $NestedGuts = qr{
    (?>
        (?:
            # Stuff that is not a parenthesis
            [^()]+
            # An opening parenthesis
          | \(
            # A closing parenthesis
          | \)
        )*
    )
}x;
([]+|)* ( 280),
$NestedGuts ,
. , $Nested
Guts m/^\( $NestedGuts \)$/x
(thisismissingtheclose,
,
.
:
:
(?{ local $OpenParens = 0 })
:
(?{ $OpenParens++ })
, 1 (
1 ). 0,
,
,
(?!), :
(?(?{ $OpenParens }) (?{ $OpenParens-- }) | (?!) )
(? if then |
else ) ( 182), if
.
,
. ,
,
:
(?(?{ $OpenParens != 0 })(?!))
, :
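my $NestedGuts = qr{
    (?{ local $OpenParens = 0 }) # Counts the nested opens still waiting to close
    (?> # Atomic grouping, for efficiency
        (?:
            # Stuff that is not a parenthesis
            [^()]+
            # An opening parenthesis
          | \( (?{ $OpenParens++ })
            # Allow a closing parenthesis,
            # if we are expecting any
          | \) (?(?{ $OpenParens != 0 }) (?{ $OpenParens-- }) | (?!) )
        )*
    )
    (?(?{ $OpenParens != 0 })(?!)) # If any opens are left unclosed,
                                   # do not finish
}x;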
$LevelN
( 396).
local
, $OpenParens
.
local ,
.
,
,
, $OpenParens
.
(overloading)
.
.
Perl \< \>,
. , ,
\b .
.
\< \>
(?<!\w)(?=\w) (?<=\w)(?!\w) .
(, MungeRegexLiteral),
:
sub MungeRegexLiteral($)
{
my ($RegexLiteral) = @_; #
$RegexLiteral =~ s/\\</(?<!\\w)(?=\\w)/g; # \<
$RegexLiteral =~ s/\\>/(?<=\\w)(?!\\w)/g; # \>
return $RegexLiteral; # (, )
}
\<,
(?<!\w)(?=\w). ,
s/// ,
, \w, \\w.
, (
, MyRegexStuff.pm) Perl:
package MyRegexStuff; #
use strict; #
use warnings; #
use overload; # Perl
#
sub import { overload::constant qr => \&MungeRegexLiteral }
sub MungeRegexLiteral($)
{
my ($RegexLiteral) = @_; #
$RegexLiteral =~ s/\\</(?<!\\w)(?=\\w)/g; # \<
$RegexLiteral =~ s/\\>/(?<=\\w)(?!\\w)/g; # \>
return $RegexLiteral; # (, )
}
1; # . , use
# true
MyRegexStuff.pm Perl (.
PERLLIB Perl),
Perl, .
,
use lib '.';
#
use MyRegexStuff; # !
.
.
.
$text =~ s/\s+\</ /g; #
# .
.
.
.
use MyRegexStuff ,
, MyRegexStuff.pm
(, MyRegexStuff.pm
,
use MyRegexStuff , ).
MyRegexStuff.pm
x++ ( 184).
, , (. .
) .
,
+
. , *+ (?>(
* ) ( 220).
,
, \w \x{1234},
. ,
?+, *+
++ . $LevelN
(. 396) MungeRegexLiteral :
$RegexLiteral =~ s/( \( $LevelN \)[*+?] )\+/(?>$1)/gx;
!
:
$text =~ s/"(\\.|[^"])*+"//; #
,
. :
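$RegexLiteral =~ s{
    (
        # Match something that can be quantified ...
        (?: \\[\\abCdDefnrsStwWX]   # \n, \w, etc.
          | \\c.                    # \cA
          | \\x[\da-fA-F]{1,2}      # \xFF
          | \\x\{[\da-fA-F]*\}      # \x{1234}
          | \\[pP]\{[^{}]+\}        # \p{Letter}
          | \[\]?[^]]+\]            # a simple character class
          | \\\W                    # \*
          | \( $LevelN \)           # (...)
          | [^()*+?\\]              # almost anything else
        )
        # ... and is quantified ...
        (?: [*+?] | \{\d+(?:,\d*)?\} )
    )
    \+ # ... and has an extra '+' after the quantifier.
}{(?>$1)}gx;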
:
, +
(?>). ,
Perl.
,
. ,
,
Perl. ,
\(blah\)++,
, ++ , \).
. ,
,
(
, . 172).
,
. ,
,
.
, Perl
( ),
, .
( , ), , ,
. ,
,
. , m/
($MyStuff)*+/ MungeRegexLiteral
(()
()*+). $MyStuff .
,
.
\< \>, ,
,
.
, \< \>
. , ,
.
, \>,
\\>, \,
>.
,
. ,
/x,
.
,
(
\N{} 351).
,
,
. ( 180) Perl ,
$^N ( 365),
(,
Perl $^N ,
).
:
$HttpUrl, . 367.
, $HttpUrl, $url.
$^N $1 (
) ;
$1 . , ,
:
my $SaveUrl = qr{
($HttpUrl)
# HTTP URL ...
(?{ $url = $^N }) # ... $url
}x;
$text =~ m{
http \s*=\s* ($SaveUrl)
| src \s*=\s* ($SaveUrl)
}xi;
$url URL.
(,
$+ 364), $SaveUrl
,
, URL
.
, ,
$url, ,
, . ,
, . 406.
. (?<Num>\d+) ,
\d+, %^N $^N{Num}.
Perl
%^N, ,
.
package MyRegexStuff;
use strict;
use warnings;
use overload;
sub import { overload::constant('qr' => \&MungeRegexLiteral) }
my $NestedStuffRegex; #
# . .
$NestedStuffRegex = qr{
(?>
(?: # , '#' '\' ...
[^()\#\\]+
# ...
| (?s: \\. )
# ...
| \#.*\n
# ,
# ...
| \( (??{ $NestedStuffRegex }) \)
)*
)
}x;
sub SimpleConvert($); # ,
#
sub SimpleConvert($)
{
my $re = shift; #
$re =~ s{
\(\?
# "(?"
< ( (?>\w+) ) >
# <$1 > $1
( $NestedStuffRegex ) # $2
\)
# ")"
}{
    my $id   = $1;
    my $guts = SimpleConvert($2);
    # We change
    #     (?<id>guts)
    # to
    #     (?: (guts)          # match the guts
    #         (?{
    #             local($^N{$id}) = $guts   # Save to a localized
    #                                       # element of %^T
    #         })
    #     )
    "(?:($guts)(?{ local(\$^T{'$id'}) = \$^N }))"
}xeog;
return $re; #
}
sub MungeRegexLiteral($)
{
my ($RegexLiteral) = @_; #
# print "BEFORE: $RegexLiteral\n"; #
my $new = SimpleConvert($RegexLiteral);
if ($new ne $RegexLiteral)
{
my $before = q/(?{ local(%^T) = () })/; #
#
my $after = q/(?{ %^N = %^T })/; #
# ""
$RegexLiteral = "$before(?:$new)$after";
}
# print "AFTER: $RegexLiteral\n"; #
return $RegexLiteral;
}
1;
%NamedCapture, %^N,
. , $^N.
, our
use strict. , ,
Perl ,
, %^N
. , %^N
,
, ( 362).
,
.
,
,
(
. .).
Perl
Perl ,
, 6:
, . Perl.
, , Perl.
.
. Perl
,
.
Perl,
.
;
.
, qr//, /o !
.
.
/o, ,
(qr//)
, .
$&. $`, $& $'
. ,
. ,
,
.
Study. study() Perl .
, , study
, ,
. , .
. ,
.
, ,
, . Perl
, ,
, .
, , ,
.
.
Perl ,
. , ,
Perl .
,
,
Perl.
IP (18.181.0.24) ,
(018.181.000.024).
:
$ip = sprintf("%03d.%03d.%03d.%03d", split(/\./, $ip));
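Two of the approaches compared in this section can be sketched side by side to confirm that they give the same result. The function names pad_ip_sprintf and pad_ip_subst are hypothetical, and no timing is attempted here.

```perl
use strict;
use warnings;

# Pads each of the four fields of a dotted IP address to three digits.
sub pad_ip_sprintf {
    my ($ip) = @_;
    return sprintf("%03d.%03d.%03d.%03d", split(/\./, $ip));
}

# The same result using two substitutions with lookahead.
sub pad_ip_subst {
    my ($ip) = @_;
    $ip =~ s/\b(?=\d\b)/00/g;      # one-digit fields get two zeros
    $ip =~ s/\b(?=\d\d\b)/0/g;     # two-digit fields get one zero
    return $ip;
}

print pad_ip_sprintf("18.181.0.24"), "\n";
print pad_ip_subst("18.181.0.24"), "\n";
```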
, . . 7.6
( ). ,
,
,
. ,
.
IP ,
,
.
, . ,
,
. ,
, 1 13 (
,
). 3, 4 ( 1) 8 ( 13).
.
?
?
( 4),
Perl ( 6) Perl (
, sprintf ). /e
,
.
: 34 814.
. $&
8, ,
. . 425.
7.6. IP(
1.
1.0
2.
1.3
substr($ip,
substr($ip,
substr($ip,
substr($ip,
substr($ip,
substr($ip,
substr($ip,
3.
1.6
4.
1.8
5.
1.8
$ip = sprintf("%03d.%03d.%03d.%03d",
$ip =~ m/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/);
6.
2.3
$ip =~ s/\b(?=\d\b)/00/g;
$ip =~ s/\b(?=\d\d\b)/0/g;
7.
3.0
8.
3.3
9.
3.4
$ip =~ s/(?:(?<=\.)|^)(?=\d\b)/00/g;
$ip =~ s/(?:(?<=\.)|^)(?=\d\d\b)/0/g;
10.
3.4
11.
3.4
$ip =~ s/\b(\d\b)/00$1/g;
$ip =~ s/\b(\d\d\b)/0$1/g;
12.
3.4
13.
3.5
14.
3.5
15.
3.6
16.
4.0
,
/, qr//
,
Perl,
, Perl ,
.
.
,
m//, s// qr//. Perl
, ,
. ,
, .
6 ( 296), Perl
.
Perl .
1. .
,
,
( 354). ,
.
2. .
,
,
(
).
Perl
.
46.
.
,
(, ), Perl
.
, ,
,
.
Perl
:
.
, Perl ,
,
()
,
.
,
. , ,
,
.
,
,
, .
my ,
,
. 405.
.
.
:
my $today = (qw<Sun Mon Tue Wed Thu Fri Sat>)[(localtime)[6]];
# $today ("Mon", "Tue" ..)
while (<LOGFILE>) {
if (m/^$today:/i) {
.
.
.
m/^$today:/ ,
,
.
,
Perl
.
,
, ,
. ,
. ,
.
?
.
$HttpUrl . 367 (
$HostnameRegex). ,
(, ,
),
, .
.
(
m//)
.
( ) 25
. (
)
1000 !
,
, ,
1000
, 0,00026 (
3846 ,
3,7 ).
,
. ,
.
/o
/o ,
,
,
. /o
,
.
,
.
, .
/o:
my $today = (qw<Sun Mon Tue Wed Thu Fri Sat>)[(localtime)[6]];
while (<LOGFILE>) {
if (m/^$today:/io) {
.
.
.
,
$today , .
,
Perl
: $today , Perl
.
/o,
, ,
,
.
/o
/o
. ,
:
sub CheckLogfileForToday()
{
my $today = (qw<Sun Mon Tue Wed Thu Fri Sat>)[(localtime)[6]];
while (<LOGFILE>) {
if (m/^$today:/io) { # !
.
.
.
}
}
}
/o ,
. CheckLogfileForToday()
, , .
, $today
, ;
.
, ,
,
.
, ,
,
. (
,
,
.
qr// ( 366).
:
sub CheckLogfileForToday()
{
,
.
, ,
.
,
.
;
,
, .
; ,
( ),
, /o
,
CheckLogfileForToday().
,
. qr//
, , .
=~, .
m//
if ($_ =~ $RegexObj) {
if (m/$RegexObj/) {
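A sketch of how a regex object avoids the /o trap inside a subroutine. To keep it self-contained, the day name and the lines are passed in as arguments instead of coming from localtime and LOGFILE; that interface, the name count_lines_for_day, and the added \Q...\E quoting are assumptions.

```perl
use strict;
use warnings;

# Counts the lines that start with the given day name followed by a colon.
sub count_lines_for_day {
    my ($today, @lines) = @_;
    # Compiled once per call rather than once per line, and rebuilt on
    # the next call, so a changed $today is honoured (unlike with /o).
    my $RegexObj = qr/^\Q$today\E:/i;
    my $count = 0;
    for my $line (@lines) {
        $count++ if $line =~ $RegexObj;
    }
    return $count;
}

my @log = ("Mon: start", "tue: backup", "mon: stop");
print "Mon: ", count_lines_for_day("Mon", @log), "\n";
print "Tue: ", count_lines_for_day("Tue", @log), "\n";
```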
,
. ,
,
. .
, , m//
. ,
$_, ,
, .
, /g
.
/o qr//
/o qr// (
, , ).
, qr//o
,
, $RegexObj
$today.
m//o . 422.
( 371)
,
.
. :
sub CheckLogfileForToday()
{
my $today = (qw<Sun Mon Tue Wed Thu Fri Sat>)[(localtime)[6]];
# ,
# .
"Sun:" =~ m/^$today:/i or
"Mon:" =~ m/^$today:/i or
"Tue:" =~ m/^$today:/i or
"Wed:" =~ m/^$today:/i or
"Thu:" =~ m/^$today:/i or
"Fri:" =~ m/^$today:/i or
"Sat:" =~ m/^$today:/i;
while (<LOGFILE>) {
if (m//) { #
.
.
.
}
}
}
,
(
$today). ,
,
.
Perl
.
, ,
. , ,
(
).
,
Perl ,
, .
$& ,
$Subject. , $Subject
$& .
:
if ($Subject =~ m/^SPAM:(.+)/i) {
$Subject = " spam subject removed ";
$SpamCount{$1}++;
}
$1 $Subject .
, Perl
.
$1, $2, $3 . .
?
$1 ,
. ,
?
...
$` $& $'
$`, $& $' .
,
. Perl
,
,
.
, ,
Perl ,
( ),
. , !
$`, $& $' !
, !
!
$`, $& $' ,
. .
, m/c/
130 000 Perl.
,
, ,
. :
. ,
, .
,
40%. ,
.
, ( )
.
, .
,
. ,
130 000
3,5 .
.
, c
. , .
,
, .
7000 !
,
.
, , Perl
.
, .
, Perl
, .
, Perl ,
.
.
. ,
$`, $& $'
. $&
$1. , HTML
s/<\w+>/\L$&\E/g
s/(<\w+>)/\L$1\E/g.
$` $'
.
:
$`    substr(target, 0, $-[0])
$&    substr(target, $-[0], $+[0] - $-[0])
$'    substr(target, $+[0])
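The substr equivalents can be sketched as a function that never mentions the three costly variables in code. The name split_around_digits and the \d+ pattern are assumptions made for illustration.

```perl
use strict;
use warnings;

# Returns (prematch, match, postmatch) for the first run of digits in
# $text, computed from @- and @+ only.
sub split_around_digits {
    my ($text) = @_;
    return unless $text =~ m/\d+/;
    return (
        substr($text, 0, $-[0]),                  # text before the match
        substr($text, $-[0], $+[0] - $-[0]),      # the matched text
        substr($text, $+[0]),                     # text after the match
    );
}

my ($pre, $match, $post) = split_around_digits("Version 6 coming soon?");
print "[$pre][$match][$post]\n";
```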
@- @+ ( 365)
, , .
$&.
$1,
. $&
,
,
. $&
,
.
. ,
$`, $& $',
, . ,
Perl, .
English;
, :
use English 'no_match_vars';
.
CPAN ,
, .
.
$&
,
$`, $& $'
. ,
-Mre=debug (431)
Enabling $` $& $' support
Omitting $` $& $' support. ,
.
,
eval, Perl .
Devel::SawAmpersand CPAN (http://www.cpan.org):
END {
require Devel::SawAmpersand;
if (Devel::SawAmpersand::sawampersand) {
print "Naughty variable was used!\n";
}
}
Devel::SawAmpersand Devel::FindAmpersand, ,
. , Perl
. ,
,
(
http://regex.info/).
,
:
use Time::HiRes;
sub CheckNaughtiness()
{
my $text = 'x' x 10_000; # .
# .
my $start = Time::HiRes::time();
for (my $i = 0; $i < 5_000; $i++) { }
my $overhead = Time::HiRes::time() - $start;
# .
$start = Time::HiRes::time();
for (my $i = 0; $i < 5_000; $i++) { $text =~ m/^/ }
my $delta = Time::HiRes::time() - $start;
# $delta $overhead 5 ,
# ( ).
printf "It seems your code is %s (overhead=%.2f, delta=%.2f)\n",
($delta > $overhead+5) ? "naughty" : "clean", $overhead, $delta;
}
study
study() ,
.
( )
.
study :
while (<>)
{
    study($_); # Study the default target $_
               # before doing many matches on it
    if (m/regex 1/) { ... }
    if (m/regex 2/) { ... }
    if (m/regex 3/) { ... }
    if (m/regex 4/) { ... }
}
study ;
,
. ,
.
Perl ,
, (
, , ).
study Perl
,
.
.
study
, .
, study .
study
, Perl
. , study
(m/foo/);
10 000 ! /i ,
/i study (
).
study
study ,
/i,
(?i) (?i:),
study.
study ,
( 303). ?
.
, ,
,
study. ,
study .
study
( ,
study ).
, ,
study, .
, study,
.
study ,
( 312).
,
, study . ,
, study index,
.
study
study ,
,
.
, .
,
SGML (
troff, , , PostScript).
( 475 ).
,
.
, study.
. Perl Benchmark,
(perldoc Benchmark). ,
,
.
use Time::HiRes 'time';
:
my $start = time;
.
.
.
my $delta = time - $start;
printf "took %.1f seconds\n", $delta;
,
,
, .
6 ( 286). ,
,
.
, Perl .
6 ( 294),
.
,
(
).
Perl ,
.
Perl ,
.
,
.
, .
use re 'debug';
no re 'debug';.
,
( use re
404).
,
-Mre=debug.
. ( ,
, ):
Perl -c (
), -w (
, Perl;
) -Mre=debug ( ).
-e , , m/^Subject:(.*)/,
Perl,
.
( ,
Perl)
. Perl
(,
/ 300).
, Perl. ,
. ,
.
.
, :
anchored '' at
, (
.
'' $, .
floating '' at ..
, (
, .
'' $, .
strclass ''
, .
anchored(MBOL), anchored(BOL), anchored(SBOL)
^. MBOL
/m, BOL SBOL (
BOL SBOL Perl; SBOL
$*, ).
anchored(GPOS)
\G.
implicit
Perl anchored(MBOL),
.*.
minlen
.
with eval
(?{}) (??{}).
.
, Perl
-DDEBUGGING.
Perl , $&
( 426).
( 398),
Perl . c,
Perl .
Match rejected by optimizer,
,
,
, . :
% perl -w -Mre=debug -e '"this is a test" =~ m/^Subject:/;'
.
.
.
Did not find anchored substr "Subject:"...
Match rejected by optimizer
, . :
% perl -w -Mre=debug -e 'use warnings'
... ...
.
.
.
warnings,
.
use re 'debug'; -Mre=debug. debug debugcolor,
ANSI,
, .
Perl
, -Mre=debug
-Dr.
, ,
Perl, , ,
. , , Perl,
.
,
Perl.
, Perl
, .
, ,
, . ,
( 180), .
,
;
. Perl
( 164),
( 166).
( 184). Perl
,
, , ,
,
. , ,
.
(, \v),
,
( x+\v x++
(?>x+)).
. :
,
. , \V.
\V. ,
,
.
. 402.
, . 404,
.
Perl
, .
8
Java
Java
java.util.regex 1.4, 2002
.
,
,
,
,
CharSequence.
java.util.regex
. ,
,
, .
Java 1.4 Java 1.4.2,
Java 1.5.0 ( Java 5.0), Java 1.6.0
(Java 6.0, Mustang) .
Java 1.5.0,
Java 1.4.2 1.6 ,
. ( ,
477).1
, ,
, 16. ,
Java,
1
,
. 1, 2 3 ,
,
, 4, 5 6 ,
java.util.regex.
, ,
,
, ,
.
8.1.
454 appendReplacement
446 matcher
452 replaceFirst
454 appendTail
465 requireEnd
444 compile
468 reset
450 end
470 split
448 find
450 start
470 flags
470 quote
469 text
450 group
452 quoteReplacement
450 toMatchResult
450 groupCount
460 region
463 hasAnchoringBounds
460 regionEnd
462 hasTransparentBounds
460 regionStart
463 useAnchoringBounds
465 hitEnd
451 replaceAll
468 usePattern
449 lookingAt
455 replaceAllRegion
462 useTransparentBounds
8.1 . API
. 443.
, ,
, . 439 . 149 160 3,
.
.
java.util.regex
( 112, 128, 131, 271, 289),
.
,
,
java.util.regex, ,
.
8. Java
java.util.regex ,
, 4,
5 6. . 8.2 .
,
(?) (?:)
. 8.3, . 440.
,
,
. 8.2.
\b,
(backspace) .
( 174).
\
\\,
Java. ,
\n
"\\n" Java. .
( 135).
\##
(, \xFCber ber).
\u####
(, \u00FCber ber,
a \u20AC ).
\0
.
\c
0x40
. ,
, ,
\ca \cA .
\01,
. \ca \21 !.
\w, \d \s ( )
ASCII
. , \d
[0-9], \w [0-9a-zA-Z_],
a \s [\t\n\f\r\x0B] (\0 ASCII
).
( 159): \w \p{L}, \d
\p{Nd}, \s \p{Z} ( \W, \D \S
\{} ).
8.2.
java.util.regex
151 (C)
155 (C)
: [] [^] (
164)
156
: (
)
158 (C)
: \w \d \s \W \D \S
157 (C)
: \{} \{}
442
/ : ^ \A
442
/ : $ \z \Z
171
: \G
174
: \b \
175
176
: (?().
: d s m i u
177
: (?(:)
177
: # ( )
177
: \Q\E
178
: () \1 \2 ...
178
: (?:)
180
: (?>)
181
: |
183
184
184
(C) . .
\p{} \P{}
, Java.
. .
\b \B
, \w \W. ,
\w \W ASCII.
,
,
. , , ?
, a * +
. 3 . 175.
# NL
Pattern.COMMENTS ( 440)
x. (
, . 476.)
ANSI . :
, ,
.
\Q\E ,
Java 1.6.
8.3. java.util.regex
(?)
Pattern.UNIX_LINES
^ (442)
Pattern.DOTALL
(146)
Pattern.MULTILINE
^ $ (442)
Pattern.COMMENTS
(102). (
)
Pattern.CASE_INSENSITIVE
ASCII
Pattern.UNICODE_CASE
,
ASCII
Pattern.CANON_EQ
(
143)
Pattern.LITERAL
,
, \p{Lu}. (
. 159.)
\p{Lowercase_Letter} .
\pL
\p{L}.
Java 1.5 Pf Pi
, ,
\p{P}. ( Java 1.6.)
\p{C} ,
\p{Cn}.
\p{L&} .
\p{all}, (?s:).
\p{assigned} \p{unassigned} :
\P{Cn} \p{Cn}.
In.
,
\p{} \P{} ,
. 479.
Java 1.5
, 3.0.0 4.0.0,
, :
4.0.0 Combining Diacritical Marks for Symbols Greek
and Coptic Combining Marks
for Symbols Greek.
Java 1.5 , Java 1.4.2
Arabic Presentation Forms-B Latin Extended-B ( 480).
Java
Java 1.5.0 \p{} \P{}
isSomething java.lang.Character.
\p{} \P{} is
java. , , java.lang.Character.isJavaIdentifierStart,
\p{javaJavaIdentifierStart}. (
java.lang.Character.)
(ASCII LF) ., ^, $ \Z. Java
( 144)
.
Java, ,
.
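Character codes    Nickname   Escape   Description
U+000A             LF         \n       ASCII Line Feed
U+000D             CR         \r       ASCII Carriage Return
U+000D U+000A      CR/LF      \r\n     ASCII Carriage Return / Line Feed
U+0085             NEL                 Unicode NEXT LINE
U+2028             LS                  Unicode LINE SEPARATOR
U+2029             PS                  Unicode PARAGRAPH SEPARATOR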
., ^, $ \Z
( 440):
, !
UNIX_LINES
^.$ \Z
.
MULTILINE
^ $
,
^
$.
.
DOTALL
CR/LF
. (. . UNIX_LINES) CR/LF
, . .
.
, $ \Z
. ASCII LF
, $ \Z ,
CR/LF (. .
LF CR).
$ ^ MULTILINE
: ^ CR
, CR LF, $
LF,
CR.
DOTALL
CR/LF ( DOTALL
.,
), UNIX_LINES
( LF ,
, ).
java.util.regex
java.util.regex .
,
:
java.util.regex.Pattern
java.util.regex.Matcher
java.util.regex.MatchResult
java.util.regex.PatternSyntaxException
Pattern Matcher.
.
Pattern ,
,
Matcher ,
.
Java 1.5 MatchResult
.
Matcher,
,
MatchResult.
PatternSyntaxException
(
, [oops)). java.lang.IllegalArgumentException.
:
public class SimpleRegexTest {
    public static void main(String[] args)
    {
        String myText = "this is my 1st test string";
        String myRegex = "\\d+\\w+"; // \d+\w+
        java.util.regex.Pattern p = java.util.regex.Pattern.compile(myRegex);
        java.util.regex.Matcher m = p.matcher(myText);
        if (m.find()) {
            String matchedText = m.group();
            int matchedFrom = m.start();
            int matchedTo = m.end();
            System.out.println("matched [" + matchedText + "] " +
                               "from " + matchedFrom +
                               " to " + matchedTo + ".");
        } else {
            System.out.println("didn't match");
        }
    }
}
.
import.
java.util.regex
, .
m, Matcher,
Pattern ,
( find),
( group, start end).
,
.
Pattern.compile()
Pattern Pattern.compile. ,
( 135).
, . 8.3 . 440.
Pattern ,
myRegex, :
Pattern pat = Pattern.compile(myRegex,
Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
, (
, Pattern.CASE_INSENSITIVE),
1,
( 145). . 451 (?x)
(?s) (?i) . 475.
,
. ,
. 458 Pattern.compile
Pattern.UNIX_LINES | Pattern.CASE_INSENSITIVE
(?id) .
,
.
6 ( 296),
,
.
,
;
, .
,
, ,
, (,
)
Pattern.
Pattern.compile
:
PatternSyntaxException,
IllegalArgumentException.
Pattern.matcher()
Pattern ,
( 470),
matcher.
, .2
,
Pattern .
matcher Matcher.
1
,
!
, java.util.regex
,
CharSequence (, String, StringBuffer CharBuffer).
Matcher
Matcher,
,
.
, m Matcher, m.find()
. , m.group()
, .
Matcher, ,
.
. ,
,
.
, :
Matcher
Pattern.
usePattern() ( 468). Pattern
pattern().
Matcher
( ,
CharSequence).
reset() ( 468).
( 458).
,
region.
( )
.
regionStart regionEnd ( 460). reset
( 468)
, ,
reset ( 468).
.
,
(\A ^ $ \z \Z).
true,
useAnchoringBounds ( 463)
hasAnchoringBounds . reset
.
.
,
(
, )
.
false,
useTransparentBounds
( 462) hasTransparentBounds . reset
.
, :
.
groupCount ( 450).
, .
( find 448).
, ,
( 453).
,
,
.
hitEnd ( 465).
. ,
,
( 450). :
( group()),
( start() end())
,
( group(), start() end()).
MatchResult,
toMatchResult. MatchResult
group, start end,
Matcher ( 450).
, ,
,
( ).
true,
.
requireEnd ( 465).
, ,
, ,
.
, ( 437).
Matcher
:
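boolean find()
Applies the Matcher's regex to the current region (p. 458) of the target text and returns a Boolean indicating whether a match was found. When called repeatedly, each call finds the next match. This no-argument form of find respects the current region.
Here is a simple example: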
String regex = "\\w+"; // \w+
String text = "Mastering Regular Expressions";
Matcher m = Pattern.compile(regex).matcher(text);
if (m.find ())
System.out.println("match [" + m.group() + "]");
:
match [Mastering]
if
while,
while (m.find())
System.out.println("match [" + m.group() + "]");
:
match [Mastering]
match [Regular]
match [Expressions]
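boolean find(int offset)
When given an integer argument, find starts the match attempt at that offset (in characters) from the beginning of the target text. It throws IndexOutOfBoundsException if the offset is negative or greater than the length of the target text.
This form of find does not respect the current region: it first resets the region to its default, the whole text (by internally invoking the reset method, p. 468).
A good example of this form of find in action appears in the sidebar on p. 478 (the answer to the question on p. 476).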
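boolean matches()
Returns a Boolean indicating whether the Matcher's regex exactly matches the current region of the target text (p. 458). That is, the match must start at the beginning of the region and finish at its end (by default, the region covers the entire target text).
With the default region, matches offers only a small convenience over wrapping the regex in \A(?:...)\z. When the region is set to something other than the default (p. 458), however, matches lets you check for a full-region match regardless of the anchoring-bounds setting (p. 463).
For example, imagine using a CharBuffer to hold text being edited by the user in your application, with the region set to whatever the user has selected with the mouse. When the user clicks on the selection, you might use m.usePattern(urlPattern).matches() to check whether the selected text is a URL (and if so, perform some URL-related action).
String objects also support a matches method:
"1234".matches("\\d+"); // true
"123!".matches("\\d+"); // false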
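boolean lookingAt()
Returns a Boolean indicating whether the regex matches within the current region of the target text, starting at the beginning of the region. lookingAt is similar to matches, except that only the beginning of the region has to match, not the entire region.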
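The difference between matches, lookingAt, and find can be seen by applying all three to the same regex and text. The sketch below is only an illustration; the class ApplyMethodsDemo and its apply helper are hypothetical names, not part of java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ApplyMethodsDemo {
    // Hypothetical helper: returns { matches, lookingAt, find } for one regex/text pair.
    static boolean[] apply(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        boolean whole  = m.matches();   // must match the entire region
        boolean prefix = m.lookingAt(); // must match at the start of the region
        m.reset();                      // restart so find() begins at the start of the text
        boolean anywhere = m.find();    // may match anywhere in the region
        return new boolean[] { whole, prefix, anywhere };
    }

    public static void main(String[] args) {
        for (String text : new String[] { "1234", "1234abc", "abc1234" }) {
            boolean[] r = apply("\\d+", text);
            System.out.println(text + ": matches=" + r[0]
                               + " lookingAt=" + r[1] + " find=" + r[2]);
        }
    }
}
```

With \d+, the text 1234 satisfies all three methods, 1234abc satisfies only lookingAt and find, and abc1234 satisfies only find.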
Matcher
.
,
IllegalStateException. ,
(
), IndexOutOfBoundsException
.
: start end
,
.
,
.
String group()
,
.
int groupCount()
. ,
, group, start end,
.1
String group(int )
,
, null, .
, group(0)
group().
int start(int )
( )
.
, 1.
int start()
;
start(0).
int end(int )
( )
.
, 1.
int end()
;
end(0).
MatchResult toMatchResult()
Java 1.5.0. MatchResult,
.
group, start, end groupCount,
Matcher.
, toMatchResult
IllegalStateException.
1
groupCount ,
, ,
, .
,
, .
URL :
(http https), ( ):
String url = "http://regex.info/blog";
String regex = "(?x) ^(https?):// ([^/:]+) (?::(\\d+))?";
Matcher m = Pattern.compile(regex).matcher(url);
if (m.find())
{
    System.out.print(
        "Overall [" + m.group() + "]" +
        " (from " + m.start() + " to " + m.end() + ")\n" +
        "Protocol [" + m.group(1) + "]" +
        " (from " + m.start(1) + " to " + m.end(1) + ")\n" +
        "Hostname [" + m.group(2) + "]" +
        " (from " + m.start(2) + " to " + m.end(2) + ")\n"
    );
    // Group 3 might not have participated in the match,
    // so check for null before using it
    if (m.group(3) == null)
        System.out.println("No port; default of '80' is assumed");
    else {
        System.out.print("Port is [" + m.group(3) + "] " +
                         "(from " + m.start(3) + " to " + m.end(3) + ")\n");
    }
}
:
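Overall [http://regex.info] (from 0 to 17)
Protocol [http] (from 0 to 4)
Hostname [regex.info] (from 7 to 17)
No port; default of '80' is assumed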
,
, Matcher
.
String replaceAll(String )
, ,
,
. 453.
(
reset). . 455
,
.
replaceAll
String.
string.replaceAll(, );
Pattern.compile().matcher().replaceAll()
String replaceFirst(String )
replaceAll,
( ).
replaceFirst
String.
static String quoteReplacement(String )
, Java 1.5,
,
.
,
,
. (
Matcher.quoteReplacement.)
Java 1.5
Java 5.0:
String text = "Before Java 1.5 was Java 1.4.2. After Java 1.5 is Java 1.6";
String regex = "\\bJava\\s*1\\.5\\b";
Matcher m = Pattern.compile(regex).matcher(text);
String result = m.replaceAll("Java 5.0");
System.out.println(result);
:
Before Java 5.0 was Java 1.4.2. After Java 5.0 is Java 1.6
Matcher ,
, result:
Pattern.compile("\\bJava\\s*1\\.5\\b").matcher(text).replaceAll("Java 5.0")
(
,
Pattern 444.)
( ,
) ,
Java 1.6 Java 6.0.
Pattern.compile("\\bJava\\s*1\\.([56])\\b").matcher(text).
replaceAll("Java $1.0")
:
Before Java 5.0 was Java 1.4.2. After Java 5.0 is Java 6.0
replaceAll replaceFirst.
, replaceFirst
.
replaceFirst ,
,
. ( ,
.)
replaceAll replaceFirst ( appendReplacement,
) ,
,
:
$1, $2 . . ,
($0
).
, $, ASCII,
IllegalArgumentException.
$ ,
. ,
, $25
$2, 5.
$6 IndexOutOf
BoundsException.
\ ,
$ \$.
\\ \. (
Java, \
"\\\\".)
, , 12
, ,
2, $1\2.
,
Matcher.quoteReplacement,
, .
, uRegex,
uRepl,
:
Pattern.compile(uRegex).matcher(text).replaceAll(Matcher.
quoteReplacement(uRepl))
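The two ways of treating a replacement string can be compared side by side. This is a sketch under stated assumptions: the class ReplacementDemo and both helper names are hypothetical, and the sample strings are made up for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplacementDemo {
    // Hypothetical helper: treats 'replacement' as literal text,
    // so $ and \ lose their special meaning.
    static String replaceLiteral(String text, String regex, String replacement) {
        return Pattern.compile(regex).matcher(text)
                      .replaceAll(Matcher.quoteReplacement(replacement));
    }

    // Hypothetical helper: lets $1, $2, ... in 'replacement' refer to captured text.
    static String replaceWithGroups(String text, String regex, String replacement) {
        return Pattern.compile(regex).matcher(text).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // "$1.00" is meant literally here, so it has to be quoted.
        System.out.println(replaceLiteral("Price: TBD", "TBD", "$1.00"));
        // Here $1 is meant as a reference to the captured digit.
        System.out.println(replaceWithGroups("After Java 1.5 is Java 1.6",
                                             "\\bJava\\s*1\\.([56])\\b", "Java $1.0"));
    }
}
```

Quoting matters most when the replacement text comes from outside the program, where a stray $ or \ would otherwise be interpreted.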
Matcher
.
StringBuffer. ,
,
.
,
.
Matcher appendReplacement(StringBuffer , String )
(, find).
:
, ,
,
, .
, m Matcher,
\w+ >one+test<.
while
while (m.find())
m.appendReplacement(sb, "XXX")
find >one+test<.
appendReplacement sb ,
, . . >,
sb XXX.
find >one+test<.
appendReplacement ,
+, XXX.
sb >+,
m >one+test <.
appendTail,
.
StringBuffer appendTail(StringBuffer )
, (
).
.
m.appendTail(sb);
sb <.
>+<.
replaceAll
( ,
).
public static String replaceAll(Matcher m, String replacement)
{
    m.reset(); // Be sure to start with a fresh Matcher
    StringBuffer result = new StringBuffer(); // The updated copy is built here
    while (m.find())
        m.appendReplacement(result, replacement);
    m.appendTail(result);
    return result.toString(); // Convert to a String and return
}
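The appendReplacement and appendTail walk-through given earlier can be reproduced with a few lines. This is a minimal sketch: the class AppendDemo and the helper replaceWords are hypothetical names, and the sample text -->one+test<-- is assumed for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppendDemo {
    // Hypothetical helper: replaces every \w+ match in 'text' with 'replacement',
    // building the result step by step in a StringBuffer.
    static String replaceWords(String text, String replacement) {
        Matcher m = Pattern.compile("\\w+").matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Copies the text between the previous match and this one,
            // then appends the replacement for this match.
            m.appendReplacement(sb, replacement);
        }
        // Copies whatever follows the final match.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceWords("-->one+test<--", "XXX"));
    }
}
```

The text between matches (here the leading -->, the +, and the trailing <--) is carried over unchanged, which is what makes this pair of methods suitable for building custom search-and-replace loops.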
, replaceAll
( 458),
.
replaceAll,
.
:
public static String replaceAllRegion(Matcher m, String replacement)
{
    Integer start = m.regionStart();
    Integer end = m.regionEnd();
    m.reset().region(start, end); // Reset the Matcher, then restore the region
    StringBuffer result = new StringBuffer(); // The updated copy is built here
    while (m.find())
        m.appendReplacement(result, replacement);
    m.appendTail(result);
    return result.toString(); // Convert to a String and return
}
reset region (
, . 463.
metric,
, :
// Build a Matcher to find numbers followed by "C" within the variable "metric"
// The regex is: (\d+(?:\.\d*)?)C\b
Matcher m = Pattern.compile("(\\d+(?:\\.\\d*)?)C\\b").matcher(metric);
StringBuffer result = new StringBuffer(); // The updated copy is built here
while (m.find())
{
    float celsius = Float.parseFloat(m.group(1));  // Get the number, as a number
    int fahrenheit = (int)(celsius * 9/5 + 32);    // Convert to a Fahrenheit value
    m.appendReplacement(result, fahrenheit + "F"); // Insert it
}
m.appendTail(result);
System.out.println(result.toString()); // Display the result
For example, if the variable metric contains "from 36.3C to 40.1C.", the result is "from 97F to 104F.".
java.util.regex
String, Matcher
, CharSequence,
.
, CharSequence,
StringBuffer StringBuilder,
,
.
String, String
.
StringBuilder,
StringBuffer.
,
, ,
, ,
StringBuilder:1
StringBuilder text = new StringBuilder("It's SO very RUDE to shout!");
Matcher m = Pattern.compile("\\b[\\p{Lu}\\p{Lt}]+\\b").matcher(text);
while (m.find())
text.replace(m.start(), m.end(), m.group().toLowerCase());
System.out.println(text);
:
It's so very rude to shout!
text.replace.
,
1
\b[\p{Lu} \p{Lt}]+\b.
3 , \p{Lu}
, \p{Lt}
. ASCII :
\b[A-Z]+\b.
(
), (
, ).
, ,
.
, ,
.
,
. ,
,
Matcher, (
, find )
.
, find ,
.
, <b></b>
:
StringBuilder text = new StringBuilder("It's SO very RUDE to shout!");
Matcher m = Pattern.compile("\\b[\\p{Lu}\\p{Lt}]+\\b").matcher(text);
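int matchPointer = 0; // The first search begins at the start of the text
while (m.find(matchPointer)) {
    matchPointer = m.end(); // The next search starts where this match ended
    text.replace(m.start(), m.end(), "<b>"+ m.group().toLowerCase() +"</b>");
    matchPointer += 7; // Account for the added '<b>' and '</b>'
}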
System.out.println(text);
:
It's <b>so</b> very <b>rude</b> to shout!
Matcher
Java 1.5 Matcher
,
. ,
region
.
HTML,
<img>, ALT.
Matcher,
( HTML),
: <img>, ALT.
Matcher ,
, ,
<img>
ALT. , start
end <img>,
Matcher, ALT,
find.
, ALT
, .
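// Matcher to find an <img> tag. The 'html' variable holds the HTML in question
Matcher mImg = Pattern.compile("(?id)<IMG\\s+(.*?)/?>").matcher(html);
// Matcher to find an ALT attribute
// (to be applied to the body of an IMG tag within the same 'html' variable)
Matcher mAlt = Pattern.compile("(?ix)\\b ALT \\s* =").matcher(html);
// For each <img> tag within the HTML...
while (mImg.find()) {
    // Restrict the next ALT search to the body of the tag just found
    mAlt.region( mImg.start(1), mImg.end(1) );
    // Report an error if no ALT was found,
    // showing the whole image tag found above
    if (! mAlt.find())
        System.out.println("Missing ALT attribute in: " + mImg.group());
}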
,
(, mAlt),
( mAlt.region).
: mAlt
, mAlt.reset(html).region().
reset
, ,
, , .
,
mAlt , find
,
ALT= HTML.
HTML, <img> ALT.
HTML ,
<img>, .
:
// <img>. 'html' HTML
Matcher mImg = Pattern.compile("(?id)<IMG\\s+(.*?)/?>").matcher(html);
// ALT
// ( ALT IMG 'html')
Matcher mAlt = Pattern.compile("(?ix)\\b ALT \\s* =").matcher(html);
// Matcher
Matcher mLine = Pattern.compile("\\n").matcher(html);
// <img>...
while (mImg.find()) {
// ALT
mAlt.region( mImg.start(1), mImg.end(1) );
// , ALT ,
// ,
if (! mAlt.find()) {
//
mLine.region(0, mImg.start());
int lineNum = 1; // 1
while (mLine.find())
lineNum++; //
// 1
System.out.println("Missing ALT attribute on line " + lineNum);
}
}
, mAlt
start(1), ,
HTML <img>.
start()
,
<img> (
).
, ,
, ,
reset
, .
:
matches
lookingAt
find() ( )
Matcher
:
find() ( )
replaceAll
replaceFirst
reset ( )
,
(. . , start end)
,
.
:
Matcher region(int , int )
,
. ,
Matcher
, find
.
,
reset (
, reset 468).
Matcher,
( 463).
, , IndexOut
OfBoundsException.
int regionStart()
Matcher. 0.
int regionEnd()
Matcher. .
region (
, ,
. . 8.4
.
8.4.
Java
m.region(, m.regionEnd());
m.region(m.regionStart(), );
m.reset().region(, m.region
End());
m.region(0, );
,
,
. ,
, ^
, .
.
( ,
) , (
,
/ (
).
, ,
,
.
, ,
, ,
.
,
, CharBuffer. ,
,
,
. ,
. ,
:
Madagas car is much too large to see on foot, so you'll need a car.
\bcar\b
automobile.
(
) ... ,
, Mada
gascar. ,
( false)
\b ,
. ,
. ,
\b s c
, \b.
Matcher useTransparentBounds(boolean b)
true false,
.
false.
Matcher,
( 463).
boolean hasTransparentBounds()
true,
, false.
Matcher
false,
, ,
. , ,
,
.1 ,
, \b
,
, .
false ( ):
String regex = "\\bcar\\b"; // \bcar\b
String text = "Madagascar is best seen by car or bike.";
Matcher m = Pattern.compile(regex).matcher(text);
m.region(7, text.length());
m.find();
System.out.println("Matches starting at character " + m.start());
:
Matches starting at character 7
,
, Mada
gas car, ,
. .
:
m.useTransparentBounds(true);
1
find, :
Matches starting at character 27
,
s
\b .
bycarorbike.
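The effect of the transparent-bounds setting on the Madagascar example can be checked with one small helper. This is a sketch for illustration only; the class TransparentBoundsDemo and the method firstCar are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TransparentBoundsDemo {
    // Hypothetical helper: returns the start of the first \bcar\b match
    // within the region that begins at offset 7, or -1 if there is none.
    static int firstCar(boolean transparent) {
        String text = "Madagascar is best seen by car or bike.";
        Matcher m = Pattern.compile("\\bcar\\b").matcher(text);
        m.region(7, text.length());          // the region starts at the "car" of "Madagascar"
        m.useTransparentBounds(transparent); // may \b look at text outside the region?
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println("opaque bounds:      " + firstCar(false));
        System.out.println("transparent bounds: " + firstCar(true));
    }
}
```

With opaque bounds (the default), \b cannot see the s before the region, so the match starts at offset 7. With transparent bounds it can, so the first match is the standalone word at offset 27.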
,
,
. , reset
.
:
Matcher useAnchoringBounds(boolean b)
true false
. true.
Matcher,
( 463).
boolean hasAnchoringBounds()
true,
, false .
Matcher
true, (^ \A $ \z \Z)
,
. false
,
.
,
,
.
,
,
.
, reset .
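The anchoring-bounds setting can be illustrated the same way, using a line anchor instead of a word boundary. This is only a sketch; the class AnchoringBoundsDemo and the method caretMatchesInRegion are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchoringBoundsDemo {
    // Hypothetical helper: applies ^car to the region that covers
    // only the last three characters of "Madagascar".
    static boolean caretMatchesInRegion(boolean anchoring) {
        String text = "Madagascar";
        Matcher m = Pattern.compile("^car").matcher(text);
        m.region(7, text.length());
        m.useAnchoringBounds(anchoring); // does ^ treat the region start as the text start?
        return m.find();
    }

    public static void main(String[] args) {
        System.out.println("anchoring bounds on:  " + caretMatchesInRegion(true));
        System.out.println("anchoring bounds off: " + caretMatchesInRegion(false));
    }
}
```

With anchoring bounds enabled (the default), ^ matches at the start of the region, so the search succeeds. With them disabled, ^ can match only at the true start of the text, which lies outside the region, so the search fails.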
,
Matcher
:
Pattern p = Pattern.compile(regex); //
Matcher m = p.matcher(text);
//
//
//
m.region(5, text.length());
//
//
m.useAnchoringBounds(false);
//
//
m.useTransparentBounds(true);
//
//
.
Matcher.
.
^
.
.
,
,
Matcher ( ),
:
Matcher m = Pattern.compile(regex).matcher(text);
m.region(5, text.length());
//
// .
m.useAnchoringBounds(false);
// ^
// .
m.useTransparentBounds(true);
//
// .
,
Matcher, ( ,
,
):
Matcher m = Pattern.compile(regex).matcher(text).region(5,text.length())
.useAnchoringBounds(false).useTransparentBounds(true);
,
.
,
, , , ,
.
. 475.
Java 1.5 Matcher hitEnd requireEnd,
.
.
, , ,
var<34
IDENTIFIER LESS_THAN INTEGER.
,
.
, true, ,
,
. , (, ,
)
<. ,
=,
LESS_THAN LESS_THEN_OR_EQUAL.
,
, , ,
.
hitEnd Java 1.5,
. , Java 1.6
, , , Java 1.5
, .
,
,
. ( ,
java.util.Scanner.)
boolean hitEnd()
( Java 1.5;
. 467.)
( ,
).
, \b $.
hitEnd true,
(
,
). , false
,
,
,
, , .
, true,
hitEnd , ,
,
.
hitEnd true, ,
,
.
boolean requireEnd()
,
,
,
. , requireEnd true,
,
, .
, true,
requireEnd, ,
, .
hitEnd requireEnd
.
,
hitEnd requireEnd
. 8.5 , hitEnd requ
ireEnd lookingAt.
,
.
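Table 8.5. Examples of hitEnd and requireEnd after a lookingAt search

  #   Regex             Text          Match      hitEnd()   requireEnd()
  1   \d+\b | [><]=?    1234          1234       true       true
  2   \d+\b | [><]=?    1234 > 567    1234       false      false
  3   \d+\b | [><]=?    >             >          true       false
  4   \d+\b | [><]=?    > 567         >          false      false
  5   \d+\b | [><]=?    >=            >=         false      false
  6   \d+\b | [><]=?    >= 567        >=         false      false
  7   \d+\b | [><]=?    oops          no match   false
  8   (set|setup)\b     se            no match   true
  9   (set|setup)\b     set           set        true       true
 10   (set|setup)\b     setu          no match   true
 11   (set|setup)\b     setup         setup      true       true
 12   (set|setup)\b     set x=3       set        false      false
 13   (set|setup)\b     setup x       setup      false      false
 14   (set|setup)\b     self          no match   false
 15   (set|setup)\b     oops          no match   false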
. 8.5
: ,
, .
, set setup.
, , ,
.
, 5 . 8.5: ,
,
hitEnd false. ,
,
( ,
).
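The values that hitEnd and requireEnd report after a lookingAt attempt can be inspected directly. The sketch below is an illustration only, using the first regex discussed above; the class HitEndDemo and the probe helper are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HitEndDemo {
    // Hypothetical helper: runs lookingAt and reports the match
    // together with the hitEnd and requireEnd flags.
    static String probe(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        boolean matched = m.lookingAt();
        return (matched ? "[" + m.group() + "]" : "no match")
             + " hitEnd=" + m.hitEnd()
             + " requireEnd=" + m.requireEnd();
    }

    public static void main(String[] args) {
        String regex = "\\d+\\b|[><]=?";
        for (String text : new String[] { "1234", "1234 > 567", ">" }) {
            System.out.println(text + " -> " + probe(regex, text));
        }
    }
}
```

A true hitEnd after a successful match is the signal that more input could change the result, for example turning > into >= once the next character arrives.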
hitEnd
hitEnd Java 1.5 ( Java 1.6)1
, hitEnd
: ,
(,
).
, >=?
(
)
, = . ,
a|an|the (
)
, a
, ,
.
: values?
\r?\n\r?\n.
,
:
( , ),
, .
, >=? (?i:>=?),
( 145)
( ).
1
, Sun ,
5.0u9, . . Java 1.5 Update 9. (
, . 436 ,
Java 1.5 Update 7.) , Java 1.6 beta
.
, a|an|the
[aA]|an|the,
, Pattern.CASE_INSENSITIVE.
Matcher
Matcher ,
:
Matcher reset()
Matcher:
,
,
( 458).
( 463).
reset Match
er: replaceAll, replaceFirst find ( ),
.
Matcher,
( 463).
Matcher reset(CharSequence )
Matcher, reset(), , ,
String ( ,
CharSequence).
(,
),
reset ,
Matcher.
Matcher,
( 463).
Pattern pattern()
pattern Pattern,
Matcher. ,
m.pattern().pattern(),
pattern Pattern (
, ).
Matcher usePattern(Pattern p)
, Java 1.5, Pattern,
Matcher, , p.
reset Matcher,
Matcher.
usePattern . 475.
Matcher,
( 463).
String toString()
, Java 1.5, ,
Matcher,
.
, ,
, Java 1.6 :
Matcher m = Pattern.compile("(\\w+)").matcher("ABC 123");
System.out.println(m.toString());
m.find();
System.out.println(m.toString());
:
java.util.regex.Matcher[pattern=(\w+) region=0,7 lastmatch=]
java.util.regex.Matcher[pattern=(\w+) region=0,7 lastmatch=ABC]
Matcher
Matcher ,
,
, :
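// This pattern, used in the function below,
// is compiled once and saved here for efficiency.
static final Pattern pNeverFail = Pattern.compile("^");
// Return the target text associated with a Matcher.
public static String text(Matcher m)
{
    // Remember these items so that they can be restored later.
    Integer regionStart = m.regionStart();
    Integer regionEnd = m.regionEnd();
    Pattern pattern = m.pattern();
    // Fetch the text the only way the class allows.
    String text = m.usePattern(pNeverFail).replaceFirst("");
    // Put back what was changed
    // (or might have been changed).
    m.usePattern(pattern).region(regionStart, regionEnd);
    // Return the text
    return text;
}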
String, replaceFirst,
.
Matcher,
. ( ;
, String,
);
, Sun
.
Pattern
compile, Pattern
:
split
,
.
String pattern()
,
Pattern.
String toString()
Java 1.5.
pattern.
int flags()
( ),
compile Pattern.
static String quote(String )
Java 1.5.
,
Pattern.compile. , Pat
tern.quote("main()") \Qmain()\E,
\Qmain()\E,
main().
static boolean matches(String , CharSequence )
,
, (,
matcher, ,
CharSequence, String).
:
Pattern.compile(regex).matcher(text).matches();
(
), ,
.
(,
,
), ,
Pattern, ,
.
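The static helpers Pattern.quote and Pattern.matches described above can be tried out as follows. This is a minimal sketch; the class QuoteDemo and the helper containsLiteral are hypothetical names, and the sample strings are made up for the illustration.

```java
import java.util.regex.Pattern;

public class QuoteDemo {
    // Hypothetical helper: searches 'text' for 'literal' taken verbatim,
    // with any regex metacharacters in it quoted.
    static boolean containsLiteral(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(Pattern.quote("main()"));       // the \Q...\E form
        System.out.println(containsLiteral("a.c", "a.c")); // literal text is found
        System.out.println(containsLiteral("abc", "a.c")); // the dot is not a wildcard here
        // The one-shot static form compiles the regex on every call.
        System.out.println(Pattern.matches("\\d+", "1234"));
    }
}
```

Pattern.matches is convenient for a single check, but since it recompiles the regex each time, a saved Pattern object is the better choice inside a loop.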
split Pattern
String[] split(CharSequence )
(CharSequence) ,
.
split String.
:
String[] result = Pattern.compile("\\.").split("209.204.146.22");
,
, split, (. .
, ). ,
,
, ,
. :
String[] result = Pattern.compile("\\s*,\\s*").split(", one, two , ,, 3");
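This call splits on each comma together with any surrounding whitespace and returns an array of six strings: an empty string, one, two, two more empty strings, and 3.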
, ,
:
String[] result = Pattern.compile(":").split(":xx:");
returns an array of only two strings: an empty string and xx.
split
, .
split Pattern
String[] split(CharSequence , int )
split
Pattern,
.
.
split
,
.
,
String[] result = Pattern.compile(":").split(":xx:", -1);
returns an array of three strings (an empty string, xx, and another empty string).
split
,
split (. . ).
split
split ,
. ,
1 (,
3 , ).
1 ,
,
.
,
Friedl,Jeffrey,Eric Francis,America,Ohio,Rootstown
,
( ).
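String[] NameInfo = Pattern.compile(",").split(Text, 4);
// NameInfo[0] is the family name
// NameInfo[1] is the given name
// NameInfo[2] is the middle name (or, in this case, the middle names)
// NameInfo[3] is everything else, which is not needed here,
// so it is simply ignored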
, , . .,
?
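The effect of the limit argument on split can be summarized with a few calls. This is a sketch for illustration; the class SplitDemo and its two split helpers are hypothetical wrappers around Pattern.split.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {
    // Hypothetical wrappers, only to keep the calls below short.
    static String[] split(String regex, String text) {
        return Pattern.compile(regex).split(text);
    }

    static String[] split(String regex, String text, int limit) {
        return Pattern.compile(regex).split(text, limit);
    }

    public static void main(String[] args) {
        // No limit: trailing empty strings are dropped.
        System.out.println(Arrays.toString(split(":", ":xx:")));
        // Negative limit: trailing empty strings are kept.
        System.out.println(Arrays.toString(split(":", ":xx:", -1)));
        // Positive limit: at most that many elements; the last holds the remainder.
        System.out.println(Arrays.toString(
            split(",", "Friedl,Jeffrey,Eric Francis,America,Ohio,Rootstown", 4)));
        // Empty fields in the middle are always kept.
        System.out.println(Arrays.toString(split("\\s*,\\s*", ", one, two , ,, 3")));
    }
}
```

A positive limit is also a small efficiency gain: once the requested number of fields has been found, the rest of the text is not searched.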
.
WIDTH HEIGHT <img>
,
<img> HTML
WIDTH HEIGHT. (HTML
StringBuilder, StringBuffer
, CharSequence.)
, <img>
, .
,
,
.
, .1
<img>
SRC, WIDTH HEIGHT .
(WIDTH HEIGHT) ,
,
.
WIDTH
HEIGHT,
.
, ,
, ,
. (,
WIDTH ,
, HEIGHT
;
.)
,
.
1
Yahoo! :
<img> !.
, ,
, <img>
.
( 458)
( 463). :
// Matcher <img>
Matcher mImg = Pattern.compile("(?id)<IMG\\s+(.*?)/?>").matcher(html);
// Matcher SRC, WIDTH HEIGHT
// ( )
Matcher mSrc = Pattern.compile("(?ix)\\bSRC =(\\S+)").matcher(html);
Matcher mWidth = Pattern.compile("(?ix)\\bWIDTH =(\\S+)").matcher(html);
Matcher mHeight = Pattern.compile("(?ix)\\bHEIGHT=(\\S+)").matcher(html);
int imgMatchPointer = 0; //
while (mImg.find(imgMatchPointer))
{
imgMatchPointer = mImg.end(); // <img>
// ,
//
Boolean hasSrc = mSrc.region( mImg.start(1), mImg.end(1) ).find();
Boolean hasHeight = mHeight.region( mImg.start(1), mImg.end(1) ).find();
Boolean hasWidth = mWidth.region( mImg.start(1), mImg.end(1) ).find();
// SRC WIDTH / HEIGHT...
if (hasSrc && (! hasWidth || ! hasHeight))
{
java.awt.image.BufferedImage i = //
javax.imageio.ImageIO.read(new java.net.URL(mSrc.group(1)));
String size; // WIDTH / HEIGHT
if (hasWidth)
// ,
//
size = "height='" + (int)(Integer.parseInt(mWidth.group(1)) *
i.getHeight() / i.getWidth()) + "' ";
else if (hasHeight)
// , ,
//
size = "width='" + (int)(Integer.parseInt(mHeight.group(1)) *
i.getWidth() / i.getHeight()) + "' ";
else
//
//
size = "width='" + i.getWidth() + "' " +
"height='" + i.getHeight() + "' ";
html.insert(mImg.start(1), size); // HTML
imgMatchPointer += size.length(); //
}
}
,
.
,
,
. ,
,
. ( . 253
Perl
HTML.)
URL ,
,
.
,
.
HTML
Matcher
Java Perl,
HTML ( 172).
usePattern .
, \G.
. 172.
Pattern pAtEnd   = Pattern.compile("\\G\\z");
Pattern pWord    = Pattern.compile("\\G\\w+");
Pattern pNonHtml = Pattern.compile("\\G[^\\w<>&]+");
Pattern pImgTag  = Pattern.compile("\\G(?i)<img\\s+([^>]+)>");
Pattern pLink    = Pattern.compile("\\G(?i)<A\\s+([^>]+)>");
Pattern pLinkX   = Pattern.compile("\\G(?i)</A>");
Pattern pEntity  = Pattern.compile("\\G&(#\\d+|\\w+);");
// ( ), HTML
//
} else {
// ,
// . .
// ,
//
m.usePattern(Pattern.compile("\\G(?s).{1,12}")).find();
System.out.println("Bad char before '" + m.group() + "'");
System.exit(1);
}
}
if (needClose) {
System.out.println("Missing Final </A>");
System.exit(1);
}
java.util.regex , ,
pNonHtml
,
,
. ,
,
,
. Sun.
find, ? ,
.
CSV
CSV 6
(330) java.util.regex.
:
(184) .
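String regex = // Puts a double-quoted field into group(1),
               // an unquoted field into group(2)
    " \\G(?:^|,)                                   \n"+
    " (?:                                          \n"+
    "    # Either a double-quoted field...         \n"+
    "    \" # field's opening quote                \n"+
    "     ( [^\"]*+ (?: \"\" [^\"]*+ )*+ )         \n"+
    "    \" # field's closing quote                \n"+
    "  | # ...or...                                \n"+
    "    # some non-quote/non-comma text...        \n"+
    "    ( [^\",]*+ )                              \n"+
    " )                                            \n";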
// Matcher CSV ,
// , .
else
// ,
// .
field = mQuote.reset(mMain.group(1)).replaceAll("\"");
// ...
System.out.println("Field [" + field + "]");
}
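Since only fragments of the CSV loop are visible here, a self-contained version may help. This is a sketch under stated assumptions, not the book's listing: the class CsvFieldsDemo, the method parseLine, and the sample line are hypothetical, and the regex is the free-spacing expression shown above compiled with Pattern.COMMENTS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CsvFieldsDemo {
    // A double-quoted field goes to group(1), an unquoted field to group(2).
    static final String REGEX =
        " \\G(?:^|,)                               \n" +
        " (?:                                      \n" +
        "    # Either a double-quoted field...     \n" +
        "    \" # opening quote                    \n" +
        "     ( [^\"]*+ (?: \"\" [^\"]*+ )*+ )     \n" +
        "    \" # closing quote                    \n" +
        "  | # ...or...                            \n" +
        "    # some non-quote, non-comma text      \n" +
        "    ( [^\",]*+ )                          \n" +
        " )                                        \n";

    // Hypothetical helper: returns the fields of one CSV line.
    static List<String> parseLine(String line) {
        Matcher mMain  = Pattern.compile(REGEX, Pattern.COMMENTS).matcher(line);
        Matcher mQuote = Pattern.compile("\"\"").matcher("");
        List<String> fields = new ArrayList<>();
        while (mMain.find()) {
            String field;
            if (mMain.start(2) >= 0)
                field = mMain.group(2); // unquoted field: use as-is
            else
                // quoted field: turn each pair of double quotes into one
                field = mQuote.reset(mMain.group(1)).replaceAll("\"");
            fields.add(field);
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "Ten Thousand,10000, 2710 ,,\"10,000\",\"It's \"\"10 Grand\"\", baby\",10K";
        for (String field : parseLine(line))
            System.out.println("Field [" + field + "]");
    }
}
```

Reusing the second Matcher through reset avoids creating a new object for every quoted field, which is the efficiency point the surrounding text makes.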
, . 273,
; ,
( 6, . 330),
, Matcher
.
Java
,
Java 1.5.0.
Java 1.4.2 Java 1.6
( , ).
, ,
, 1.4.2 1.5.0
(Update 7) 1.5.0 59g
Java 1.6.
1.4.2 1.5.0
Java 1.5.0 Java 1.4.2
.
Matcher.
, .
.1
(?1), Ja
va 1.4.2, Java 1.5.0.
PCRE (564),
Java.
Java 1.5.0
Matcher,
Java 1.4.2:
region
regionStart
regionEnd
useAnchoringBounds
hasAnchoringBounds
useTransparentBounds
hasTransparentBounds
find,
. 476.
java.util.regex,
. 476,
,
find .
find , ,
.
, . 476,
:
Pattern pWord    = Pattern.compile("\\G\\w+");
Pattern pNonHtml = Pattern.compile("\\G[^\\w<>&]+");
Pattern pImgTag  = Pattern.compile("\\G(?i)<img\\s+([^>]+)>");
Pattern pLink    = Pattern.compile("\\G(?i)<A\\s+([^>]+)>");
Pattern pLinkX   = Pattern.compile("\\G(?i)</A>");
Pattern pEntity  = Pattern.compile("\\G&(#\\d+|\\w+);");
Boolean needClose = false;
Matcher m = pWord.matcher(html); // Any Pattern object can create the Matcher
Integer currentLoc = 0;
//
while (currentLoc < html.length())
{
if (m.usePattern(pWord).find(currentLoc)) {
... m.group(),
...
} else if (m.usePattern(pImgTag).find(currentLoc)) {
... <img> ...
} else if (! needClose && m.usePattern(pLink).find(currentLoc)) {
... ...
needClose = true;
find,
Matcher, ,
.
,
region
find, :
m.usePattern(pWord).region(start,end).find(currentLoc)
toMatchResult
hitEnd
requireEnd
usePattern
toString
Java 1.4.2:
Pattern.quote
1.4.2 1.5.0
Java 1.4.2 Java 1.5.0,
:
1.5.0 1.6.0
Java 1.6, (
),
1.5.0, :
Java 1.6 Pi Pf,
.
Java 1.6 \Q\E,
.
9
.NET
Microsoft .NET Framework, Visual Basic,
C# C++ ( ),
,
.
, ,
.1
, ,
.NET. Visual Basic.
,
,
16. , ,
.NET, ,
(
) . 1, 2 3
, ,
, 4, 5 6
,
.NET.
, ,
, , ,
.
1
.NET 2.0 (
Visual Studio 2005).
, , ,
. 483, , . 149 160 3,
.
.
.NET.
, ,
.NET.
.NET,
,
, .
(shared assembly).
.NET
.NET
, ,
4, 5 6. . 9.1
.NET.
3.
( 145),
,
,
(?) (?:).
. 9.2 . 485.
\w.
VB.NET ("\w") (ver
batim) C# (@"\w"). ,
(, C++),
\ \
("\\w"). .
( 135).
. 9.1:
\b Backspace ;
( 174).
\x##
(, \xFCber ber).
\u####
(, \u00FCber ber,
a \u20AC ).
483
.NET
155
: [] [^]
156
: (
)
158 (C) : \w \d \s \W \D \S
157 (C) : \{} \{}
169
/ : ^ \A
169
/ : $ \Z \z
171
: \G
174
: \b \
175
176
177
: (?).
: s m i n (485)
: (?:)
177
: (?#)
178
: () \1 \2 ...
515
: (?<>)
485
178
: (?<>)
\k<>
: (?:)
180
: (?>)
181
: |
183
184
486
(C) .
\w, \d \s ( )
, RegexOptions.ECMAScript ( 489)
ASCII.
\w \p{Ll}, \p{Lu},
\p{Lt}, \p{Lo}, \p{Nd} \{}. : \p{Lm}
( . 160).
\s [\f\n\r\t\v\x85\p{Z}]. U+0085
, a \p{Z}
( 159).
\{} \P{}
4.0.1.
.
Is ( . 164)
. , \p{Is_Greek_Extended} \p{Is Greek Exten
ded} ; \p{IsGreekEx
tended}.
, \p{Lu};
\p{Lowercase_Letter} .
(. . \pL \{L}
) ( . 159 160).
\p{L&},
\p{All}, \p{Assigned} \p{Unassigned}
, (?s:), \P{Cn}
\p{Cn} .
\G ,
,
( 171).
.
.NET
,
( 175).
RegexOptions.ExplicitCapture,
(?n),
(). (
(?<num>\d+)) ( 180).
()
(?:).
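Table 9.2. .NET match and regex modes

RegexOptions option        (?mode)   Description
.Singleline                s         Dot matches any character (p. 146)
.Multiline                 m         Expands where ^ and $ can match (p. 147)
.IgnorePatternWhitespace   x         Free-spacing and comments mode (p. 103)
.IgnoreCase                i         Case-insensitive matching
.ExplicitCapture           n         Turns capturing off for (...), so that only (?<name>...) groups capture
.ECMAScript                          Restricts \w, \d, and \s to ASCII characters, and more (p. 489)
.RightToLeft                         Applies the regex in the opposite direction, from the end of the text toward its start; unfortunately, buggy (p. 488)
.Compiled                            Spends extra time up front optimizing the regex so that it matches more quickly when applied (p. 486)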
.NET
.
.NET ( 180)
(?<>) (?'').
; ,
<> ,
.
\k<> \k''.
( , Match
.NET . 494)
Groups()
Match ( # Groups[]).
( 502)
${}.
, ,
.
, :
1 1 3
3 2
, \d+,
Groups("Num") Groups(3).
( ).
.
,
.
Split
( 504) $+
( 502).
if (? if then | else) ( 182)
,
( ).
( )
(. .
(?=)).
: ,
(?<Num>), (Num)
(?(Num) then | else) (?=Num), . .
Num). ,
if .
. (?=)
,
,
if .
, ,
, . .NET
(parsing),
,
.
:
.
,
,
.
( 296).
. RegexOptions.Compiled
,
. ,
,
MSIL (Microsoft Intermediate Language), , ,
JIT
.
,
.
.
.
( ) Regex ,
DLL.
;
.
( 513).
RegexOptions.Com
piled ,
, :
RegexOptions.Compiled RegexOptions.Compiled
( 60 )
(515
)
10
( ,
RegexOptions.Compiled) .
550
1500 . Regex
Options.Compiled 25
, 10
. ,
.
, RegexOpti
ons.Compiled ,
,
. ,
,
.
.
DLL
Regex. ,
,
(
DLL,
). ,
,
. 514.
(. . ,
)
. ,
, .
?
? ,
,
?
\d+
123and456. ,
123, ,
456. ,
,
.
( ),
\d+ 456 .
, 45 6,
6, \d+. ,
6.
RegexOptions.RightToLeft
.NET. ?
... ,
. (
123and456)
456.
, ,
.
, RegexOptions.RightToLeft
, ,
.
\+
\, ,
, .
, Regex
Options.ECMAScript.
,
\k<>, (, \08).
RegexOptions.ECMAScript.
RegexOptions.ECMAScript ,
\1\9
,
, ,
(, \012 ASCII
).
, (. .
). , \000
\377, . , \12
,
12 ,
.
RegexOptions.ECMAScript
.
ECMAScript
ECMAScript JavaScript1,
1
. .NET, RegexOpti
ons.ECMAScript, .
, ECMAScript
, .
RegexOptions?
RegexOptions.ECMAScript
:
RegexOptions.IgnoreCase;
RegexOptions.Multiline;
RegexOptions.Compiled.
.NET
.NET ,
,
. Microsoft
, .
, , ,
. ,
. ,
.NET.
.NET, .
, ,
. , ;
.
,
( 492),
Imports System.Text.RegularExpressions
,
.
,
TestStr String. ,
.
:
If Regex.IsMatch (TestStr, "^\s*$")
Console.WriteLine("line is empty")
Else
Console.WriteLine("line is not empty")
End If
:
If Regex.IsMatch(TestStr, "^subject:", RegexOptions.IgnoreCase)
Console.WriteLine("line is a subject line")
Else
Console.WriteLine("line is not a subject line")
End If
,
. , TheNum
.
Dim TheNum as String = Regex.Match (TestStr, "\d+").Value
If TheNum <> ""
Console.WriteLine("Number is:" & TheNum)
End If
:
Dim ImgTag as String = Regex.Match(TestStr, "<img\b[^>]*>", _
RegexOptions.IgnoreCase).Value
If ImgTag <> ""
Console.WriteLine("Image tag:" & ImgTag)
End If
(. . $1)
:
Dim Subject as String = _
Regex.Match(TestStr, "^Subject: (.*)").Groups(1).Value
If Subject <> ""
Console.WriteLine("Subject is:" & Subject)
End If
C# Groups(1) Groups[1].
To :
Dim Subject as String = _
Regex.Match(TestStr, "^subject: (.*)", _
RegexOptions.IgnoreCase).Groups(1).Value
If Subject <> ""
Console.WriteLine("Subject is: " & Subject)
End If
To , :
Dim Subject as String = _
Regex.Match(TestStr, "^subject: (?<Subj>.*)", _
RegexOptions.IgnoreCase).Groups("Subj").Value
If Subject <> ""
Console.WriteLine("Subject is: " & Subject)
End If
HTML, HTML
:
TestStr = Regex.Replace(TestStr, "&", "&amp;")
TestStr = Regex.Replace(TestStr, "<", "&lt;")
TestStr = Regex.Replace(TestStr, ">", "&gt;")
Console.WriteLine("Now safe in HTML: " & TestStr)
( ) ,
. 502. , $&
, .
, ,
<B></B>:
TestStr = Regex.Replace(TestStr, "\b[A-Z]\w*", "<B>$&</B>")
Console.WriteLine("Modified string: " & TestStr)
<B></B> ( )
<I></I>:
TestStr = Regex.Replace(TestStr, "<b>(.*?)</b>", "<I>$1</I>", _
RegexOptions.IgnoreCase)
Console.WriteLine("Modified string: " & TestStr)
.NET
. ,
,
:
Option Explicit On '
Option Strict On ' , .
' ,
' .
Imports System.Text.RegularExpressions
Module SimpleTest
Sub Main()
Dim SampleText as String = "this is the 1st test string"
Dim R as Regex = New Regex("\d+\w+") ' .
Dim M as Match = R.match(SampleText) ' .
If not M.Success
Console.WriteLine("no match")
Else
Dim MatchedText as String = M.Value ' ...
Dim MatchedFrom as Integer = M.Index
Dim MatchedLen as Integer = M.Length
Console.WriteLine("matched [" & MatchedText & "]" & _
" from char#" & MatchedFrom.ToString() & _
" for " & MatchedLen.ToString() & " chars.")
End If
End Sub
End Module
\d+\w+ :
matched [1st] from char#12 for 3 chars.
Imports System.Text.RegularExpressions
? VB,
,
.
# :
using System.Text.RegularExpressions; // This is the C# form
.
:
Dim R as Regex = New Regex("\d+\w+") ' .
Dim M as Match = R.match(SampleText) ' .
:
Dim M as Match = Regex.Match(SampleText, "\d+\w+") '
' no .
,
,
.
( 511).
,
,
Regex.Match , .
,
, :
Option Explicit On
Option Strict On
Imports System.Text.RegularExpressions
VB,
. 129, 132, 257, 273 291.
,
.NET.
,
. .NET
, ,
. 9.1.
\s+(\d+) to "May 16, 1998".
[Figure 9.1. The .NET regex object model. A Regex object built from "\s+(\d+)" is applied to "May 16, 1998" with Match(), and NextMatch() steps from one Match object to the next:
  first Match:   Index 3, Length 3, Value " 16",   Success true, Groups.Count 2; Groups(1): Index 4, Length 2, Value "16"
  second Match:  Index 7, Length 5, Value " 1998", Success true, Groups.Count 2; Groups(1): Index 8, Length 4, Value "1998"
  third Match:   Match.Empty, Success false]
Regex
Regex:
Dim R as Regex = New Regex("\s+(\d+)")
,
\s+(\d+), R.
Regex Match(),
:
Dim M as Match = R.Match("May 16, 1998")
Match
Match() Regex
Match. Match
, Success (
) Value ( ).
Match .
Match ,
.
Perl ,
, $1. .NET
.
Groups Match, Groups(1).Value
Perl $1 ( : #
, Groups[1].Value).
Result ( . 508).
Group
Groups(1)
Group, .Value Value
(. . , ).
Group;
,
.
, MatchObj.Value MatchObj.Groups(0)
. ,
,
,
MatchObj.Groups.Count ( ,
Match).
\s+(\d+) MatchObj.Groups.Count (
$1).
Capture
Capture .
. 516.
( , . .)
Match .
Match, Group (
), .
. , Regex,
Match
Group.
Regex, , ,
.NET
Regex.
, .NET.
,
Object.
Regex
Regex ,
( ),
(
). :
Dim StripTrailWS = new Regex("\s+$") '
Regex
; .
:
Dim GetSubject = new Regex("^subject: (.*)", RegexOptions.IgnoreCase)
RegexOptions,
,
OR:
Dim GetSubject = new Regex("^subject: (.*)", _
RegexOptions.IgnoreCase OR RegexOptions.Multiline)
ArgumentException.
, ,
,
(,
). :
Dim R As Regex
Try
R = New Regex(SearchRegex)
Catch e As ArgumentException
Console.WriteLine("*ERROR* bad regex: " & e.ToString)
Exit Sub
End Try
,
.
Regex
Regex :
RegexOptions.IgnoreCase
,
( 145).
RegexOptions.IgnorePatternWhitespace
( 146). #,
,
.
VB.NET
chr(10), :
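Dim R as Regex = New Regex( _
   "# Match a floating-point number ...          " & chr(10) & _
   "  \d+(?:\.\d*)? # with a leading digit...    " & chr(10) & _
   "  |             # or ...                     " & chr(10) & _
   "  \.\d+         # with a leading decimal point", _
   RegexOptions.IgnorePatternWhitespace)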
; VB.NET
(?#).
Dim R as Regex = New Regex( _
   "(?# Match a floating-point number ...          )" & _
   "  \d+(?:\.\d*)? (?# with a leading digit...     )" & _
   "  |             (?# or ...                      )" & _
   "  \.\d+         (?# with a leading decimal point )", _
   RegexOptions.IgnorePatternWhitespace)
RegexOptions.Multiline
,
.
^ $
, .
RegexOptions.Singleline
,
( 146).
,
.
RegexOptions.ExplicitCapture
(),
, ()
(?:).
(?<>).
,
RegexOptions.ExplicitCapture
.
RegexOptions.RightToLeft
( 488).
RegexOptions.Compiled
, .
,
.
(
), RegexOptions.Compi
led ,
Regex.
, ,
.
, . 291,
.
. 513.
RegexOptions.ECMAScript
ECMAScript ( 489). ,
ECMAScript ,
.
RegexOptions.None
;
RegexOptions.
,
OR.
Regex
,
.
Regex,
.
RegexObj.IsMatch()
RegexObj.IsMatch(, )
: Boolean
IsMatch ,
,
. :
Dim R as Regex = New Regex("^\s*$")
..
.
If R.IsMatch(Line) Then
' ...
..
.
End If
IsMatch ,
.
RegexObj.Match()
RegexObj.Match(, )
RegexObj.Match(, , _)
: Match
Match ,
, Match.
Match (
, . .)
.
Match . 507.
Match ,
.
_,
, .
,
, ^ , a $
. ,
.
Match ,
.
,
_.
RegexObj,
\d\d
^\d\d
^\d\d$
16
99
99 99 99
RegexObj.Matches()
RegexObj.Matches(, )
:
MatchCollection
Matches Match,
Match, ,
Match, .
MatchCollection.
, :
Dim R as New Regex("\w+")
Dim Target as String = "a few words"
Dim BunchOfMatches as MatchCollection = R.Matches (Target)
Dim I as Integer
For I = 0 to BunchOfMatches.Count - 1
Dim MatchObj as Match = BunchOfMatches.Item(I)
Console.WriteLine("Match: " & MatchObj.Value)
Next
:
Match: a
Match: few
Match: words
, ,
MatchCollection:
Dim MatchObj as Match
For Each MatchObj in R.Matches(Target)
Console.WriteLine("Match: " & MatchObj.Value)
Next
, ,
Match ( Matches):
Dim MatchObj as Match = R.Match(Target)
While MatchObj.Success
Console.WriteLine("Match: " & MatchObj.Value)
MatchObj = MatchObj.NextMatch()
End While
RegexObj.Replace(, )
RegexObj.Replace(, , )
RegexObj.Replace(, , , )
: String
Replace
(, ).
, Regex,
Match .
. ,
MatchEvaluator.
,
, . :
Dim R_CapWord as New Regex("\b[A-Z]\w*")
..
.
Text = R_CapWord.Replace(Text, "<B>$0</B>")
, ,
<B></B>.
Replace ,
(
). ,
, 1.
,
, Replace
. 1
( ,
).
Replace ,
.
.
, (. .
)
:
Dim AnyWS as New Regex("\s+")
..
.
Target = AnyWS.Replace (Target, " ")
somerandomspacing
somerandomspacing.
, :
Dim AnyWS     as New Regex("\s+")
Dim LeadingWS as New Regex("^\s+")
..
.
Target = AnyWS.Replace(Target, " ", -1, LeadingWS.Match(Target).Length)
somerandomspacing so
merandomspacing. , LeadingWS,
( , )
.
Match, LeadingWS.Match(Target):
Length
, .
Length ; , AnyWS
.
Regex.Replace Match.Result ,
.
:
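$&            text matched by the regex (also available as $0)
$1, $2, ...   text matched by the corresponding set of capturing parentheses
${name}       text matched by the corresponding named capture
$`            text of the target string before the match
$'            text of the target string after the match
$$            a single '$' character
$_            a copy of the entire original target string
$+            (see the text below)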
$+
.
Perl $+,
,
( . 253). .NET
, .
(485).
$
, ,
.
. (
).
, .
,
.
MatchEvaluator
. Match,
,
,
.
,
:
Target = R.Replace(Target, "<<$&>>")
Function MatchFunc(ByVal M as Match) as String
return M.Result("<<$&>>")
End Function
Dim Evaluator as MatchEvaluator = New MatchEvaluator(AddressOf MatchFunc)
.
.
.
Target = R.Replace(Target, Evaluator)
<<>>.
,
. Match
Func :
Function MatchFunc(ByVal M as Match) as String
' $1
'
Dim Celsius as Double = Double.Parse(M.Groups(1).Value)
Dim Fahrenheit as Double = Celsius * 9/5 + 32
Return Fahrenheit & "F" ' "F"
End Function
Dim Evaluator as MatchEvaluator = New MatchEvaluator(AddressOf MatchFunc)
.
.
.
Dim R_Temp as Regex = New Regex("(\d+)C\b", RegexOptions.IgnoreCase)
Target = R_Temp.Replace(Target, Evaluator)
RegexObj.Split()
RegexObj.Split(, )
RegexObj.Split(, , )
:
String
Split ,
, , (
. :
Dim R as New Regex("\.")
Dim Parts as String() = R.Split("209.204.146.22")
R.Split
(209, 204, 146 22), \. .
Replace ,
(
,
). , Split
, .
,
:
Dim R as New Regex("\.")
Dim Parts as String() = R.Split("209.204.146.22", 2)
Split
, ,
( , ,
). :
2006-12-31 or 04/12/2007
[-/]:
Dim R as New Regex("[-/]")
Dim Parts as String() = R.Split(MyDate)
( ).
With the regex ([-/]) instead, Split returns five strings (if MyDate contains 2006-12-31): 2006, -, 12, -, and 31.
($1), .
, (
,
485).
Split
,
. .NET :
,
.
, .
,
,
.
(\s+)?,(\s+)?. Split this,that
: this, , that.
this,that
, (
) ,
, this that. ,
,
.
(\s*),(\s*) (
).
.
RegexObj.GetGroupNames()
RegexObj.GetGroupNumbers()
RegexObj.GroupNameFromNumber()
RegexObj.GroupNumberFromName()
(
)
.
,
.
.
RegexObj.ToString()
RegexObj.RightToLeft
RegexObj.Options
Regex
.
ToString() , .
RightToLeft ,
, Re
  1  IgnoreCase           16  Singleline
  2  Multiline            32  IgnorePatternWhitespace
  4  ExplicitCapture      64  RightToLeft
  8  Compiled            256  ECMAScript
128 Microsoft
.
.
Regex,
R:
' Regex, R
Console.WriteLine("Regex is: " & R.ToString())
Console.WriteLine("Options are: " & R.Options)
If R.RightToLeft
Console.WriteLine("Is RightToLeft: True")
Else
Console.WriteLine("Is RightToLeft: False")
End If
Dim S as String
For Each S in R.GetGroupNames()
Console.WriteLine("Name """ & S & """ is Num #" & _
R.GroupNumberFromName(S))
Next
Console.WriteLine("")
Dim I as Integer
For Each I in R.GetGroupNumbers()
Console.WriteLine("Num #" & I & " is Name """ & _
R.GroupNameFromNumber(I) & """")
Next
Regex,
New Regex("^(\w+)://([^/]+)(/\S*)")
New Regex("^(?<proto>\w+)://(?<host>[^/]+)(?<page>/\S*)",
RegexOptions.Compiled)
:
Regex is: ^(\w+)://([^/]+)(/\S*)
Options are: 0
Is RightToLeft: False
Name "0" is Num #0
Name "1" is Num #1
Name "2" is Num #2
Name "3" is Num #3
Num #0 is Name "0"
Num #1 is Name "1"
Num #2 is Name "2"
Num #3 is Name "3"
Match
Match Match Regex,
Regex.Match (. ) NextMatch
Match. ,
.
Match:
MatchObj.Success
.
, Match.Empty
( 489).
MatchObj.Value
MatchObj.ToString()
.
MatchObj.Length
.
MatchObj.Index
, ,
. ,
( )
( ) .
, ,
Match, RegexOptions.RightToLeft.
MatchObj.Groups
GroupCollection
Group. ,
Count Item,
Group. ,
M.Groups(3) Group,
, M.Groups("HostName")
HostName (,
(?<HostName>) ).
# M.Groups[3] M.Groups["HostName"].
, MatchObj.Gro
ups(0).Value MatchObj.Value.
MatchObj.NextMatch()
NextMatch()
Match.
MatchObj.Result()
,
, . 502.
:
Dim M as Match = Regex.Match(SomeString, "\w+")
Console.WriteLine(M.Result("The first word is '$&'"))
:
M.Result("$`") '
M.Result("$'") '
M.Result("[$`<$&>$']")
Match, \d+
May 16, 1998, May <16>, 1998,
.
MatchObj.Synchronized()
Match, ,
.
MatchObj.Captures
Captures ,
. 516.
Group
Group
(
). Group.
GroupObj.Success
.
; ,
(this)|(that)
. . 181.
GroupObj.Value
GroupObj.ToString()
, .
.
GroupObj.Length
, .
.
GroupObj.Index
, ,
. ,
( )
( ) .
,
, Match, RegexOpti
ons.RightToLeft.
GroupObj.Captures
Group Captures, . 516.
. 490,
Regex.
:
Regex.IsMatch(, )
Regex.IsMatch(, , )
Regex.Match(, )
Regex.Match(, , )
Regex.Matches(, )
Regex.Matches(, , )
Regex.Replace(, , )
Regex.Replace(, , , )
Regex.Split(, )
Regex.Split(, , )
Regex , .
Regex,
( ,
, ).
, :
If Regex.IsMatch(Line, "^\s*$")
.
.
.
, :
If New Regex("^\s*$").IsMatch(Line)
.
.
.
,
.
( 127).
( ).
,
.
(,
)
( 296).
Regex
. ,
, .NET ,
:
.
Regex
, ,
.
, , .NET,
.
Regex,
,
.
,
.
, .NET
,
.
6 ( 299), ,
, Regex
,
Regex.
15
. 15 ,
, , ,
Regex .
(15 )
,
:
Regex.CacheSize = 123
,
.
, ,
:
Regex.Escape()
Regex.Escape()
.
.
, SearchTerm ,
.
:
Dim UserRegex as Regex = New Regex("^" & Regex.Escape(SearchTerm) & "$", _
RegexOptions.IgnoreCase)
,
,
. Escape
SearchTerm, :),
ArgumentException ( 496).
Regex.Unescape()
,
,
\ . , \:\\)
:).
Unescape
. \n,
. \u1234,
.
, Unescape,
. 483.
Match.Empty
Match,
. ,
Match,
( ,
). :
Dim SubMatch as Match = Match.Empty ' ,
'
..
.
Dim Line as String
For Each Line in EmailHeaderLines
' , ...
Dim ThisMatch as Match = Regex.Match(Line, "^Subject:\s*(.*)", _
RegexOptions.IgnoreCase)
If ThisMatch.Success
SubMatch = ThisMatch
End If
..
.
Next
..
.
If SubMatch.Success
Console.WriteLine(SubMatch.Result("The subject is: $1"))
Else
Console.WriteLine("No subject!")
End If
EmailHeaderLines
( Subject),
SubMatch , ,
, SubMatch
.
Match.Empty
.
Regex.CompileToAssembly()
, Regex,
.
.NET
.NET:
,
, .NET, Capture.
.NET Regex .
. , , ,
.
bin
JfriedlsRegexLibrary.DLL.
.
Visual Studio .NET ,
Project > Add Reference.
,
Imports jfriedl
,
:
Dim FieldRegex as CSV.GetField = New CSV.GetField '
Regex
.
.
.
Dim FieldMatch as Match = FieldRegex.Match(Line) ' Apply the regex to a line...
                                                 ' ...and get a Match object
While FieldMatch.Success
   Dim Field as String
   If FieldMatch.Groups(1).Success
      Field = FieldMatch.Groups("QuotedField").Value
      Field = Regex.Replace(Field, """""", """") ' Replace each doubled quote with a single one
   Else
      Field = FieldMatch.Groups("UnquotedField").Value
   End If
   Console.WriteLine("[" & Field & "]")
   ' Can now work with Field
   FieldMatch = FieldMatch.NextMatch
End While
jfriedl,
jfriedl.CSV. Regex
:
Dim FieldRegex as GetField = New GetField ' Regex
.
:
Dim FieldRegex as jfriedl.CSV.GetField = New jfriedl.CSV.GetField
,
. ,
.
. ,
(DLL) Regex:
jfriedl.Mail.Subject, jfriedl.Mail.From jfriedl.CSV.GetField.
.
,
.
: RegexOptions.Compiled ,
.
Option Explicit On
Option Strict On
Imports System.Text.RegularExpressions
Imports System.Reflection
Module BuildMyLibrary
Sub Main()
' RegexCompilationInfo
' , ,
' . ,
' , ,
' "jfriedl.Mail.Subject".
      Dim RCInfo() as RegexCompilationInfo = { _
        New RegexCompilationInfo( _
            "^Subject:\s*(.*)", RegexOptions.IgnoreCase, _
            "Subject", "jfriedl.Mail", true), _
        New RegexCompilationInfo( _
            "^From:\s*(.*)", RegexOptions.IgnoreCase, _
            "From", "jfriedl.Mail", true), _
        New RegexCompilationInfo( _
            "\G(?:^|,)                                     " & _
            "(?:                                           " & _
            "   (?# Either a double-quoted field... )      " & _
            "   ""  (?# opening quote of the field )       " & _
            "    (?<QuotedField> (?> [^""]+ | """" )* )    " & _
            "   ""  (?# closing quote of the field )       " & _
            " (?# ...or... )                               " & _
            " |                                            " & _
            "   (?# ...some non-quote, non-comma text... ) " & _
            "   (?<UnquotedField> [^"",]* )                " & _
            ")", _
            RegexOptions.IgnorePatternWhitespace, _
            "GetField", "jfriedl.CSV", true) _
      }
      ' Build the assembly and write it out
      Dim AN as AssemblyName = new AssemblyName()
      AN.Name = "JfriedlsRegexLibrary" ' This becomes the name of the DLL
      AN.Version = New Version("1.0.0.0")
      Regex.CompileToAssembly(RCInfo, AN) ' Build everything
   End Sub
End Module
Microsoft
( ,
).
, .
,
:
Dim R As Regex = New Regex(" \(                   " & _
                           "   (?>                " & _
                           "       [^()]+         " & _
                           "     |                " & _
                           "       \( (?<DEPTH>)  " & _
                           "     |                " & _
                           "       \) (?<-DEPTH>) " & _
                           "   )*                 " & _
                           "   (?(DEPTH)(?!))     " & _
                           " \)                   ", _
                 RegexOptions.IgnorePatternWhitespace)
(, before (nope (yes
(here) okay) after. ,
.
:
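1. With each ( matched, (?<DEPTH>) adds one to the regex's notion of the
current nesting depth (beyond the initial \( at the start of the regex).
2. With each ) matched, (?<-DEPTH>) subtracts one from that depth.
3. (?(DEPTH)(?!)) ensures that the depth is zero before the final
literal \) is allowed to match.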
. (?<DEPTH>)
() , .
\(,
( )
.
, DEPTH,
, . ,
,
.
.NET (?<-DEPTH>),
DEPTH.
, (?<-DEPTH>) ,
.
, (?(DEPTH)(?!)) , (?!)
DEPTH.
,
,
(?<DEPTH>).
( ),
(?!) (
,
).
,
.NET.
Capture
.NET
Capture, . ,
,
Capture .
Capture Group;
,
. Group, Value (
), Length ( ) Index (
).
Group Capture ,
Group Captures,
, , .
^(..)+ abc
defghijk:
Dim M as Match = Regex.Match("abcdefghijk", "^(..)+")
(..), . .
: abcdefghijk.
, +
,
ij (. . M.Groups(1).Value ij). , M.Groups(1)
Captures
ab, cd, ef, gh ij, :
M.Groups(1).Captures(0).Value  is  'ab'
M.Groups(1).Captures(1).Value  is  'cd'
M.Groups(1).Captures(2).Value  is  'ef'
M.Groups(1).Captures(3).Value  is  'gh'
M.Groups(1).Captures(4).Value  is  'ij'
M.Groups(1).Captures.Count     is  5
: ij,
, M.Groups(1).Value. , Group.Value
. M.Groups(1).Value
:
M.Groups(1).Captures( M.Groups(1).Captures.Count - 1 ).Value
Capture:
M.Groups(1).Captures CaptureCollection
Items Count.
Items ,
M.Groups(1).Captures(3) (M.Groups[1].Captures[3]
#).
Capture Success;
Success Group.
, Capture
Group. Match Cap
tures, . .Captures
Capture (
, .Captures M.Group(0).Captures).
,
, Cap
ture. M.Captures M.Group(0).Captures
Group,
.
Capture ,
, ,
.
.NET , ,
. ,
.
Capture, , , ,
.
,
,
;
, .
Capture
, , , Group Capture (
GroupCollection CaptureCollection)
Match. ,
, . ,
Cap
ture .
10
PHP
90 , ,
Web boom, PHP
, .
,
. ,
, PHP
, , ,
. , PHP
,
,
.
PHP preg, ereg mb_ereg.
preg.
,
. (
preg .)
,
,
16. , ,
PHP, ,
(
) . 1, 2 3
, ,
, 4, 5 6
,
preg
PHP. ,
,
, , ,
.
, , ,
. 522 . 149 160 3,
.
.
preg,
, .
preg,
, ,
.
preg
preg preg,
, ,
: Perl Regular Expressions ( Perl).
(Andrei Zmievski),
,
ereg. ( ereg extended re
gular expressions, . . PO
SIX ,
,
.)
preg,
PCRE (Perl Compatible Regular Expressions
, Perl)
,
Perl , .
PCRE,
Perl, ,
PHP.
, , , ,
, .
Perl,
,
,
.
, (Philip Hazel)
,
Perl, PCRE
( 3 . 123).
PHP
, ,
. ,
,
,
PHP.
Perl
PCRE, PHP.
PHP 4.4.3 5.1.4.
PCRE 6.6.1
, PHP,
, 4.x 5.x,
PHP 5.x .
PHP
, , PHP
5.x PCRE,
PHP 4.x.
PHP
. 10.1
preg. . 10.1:
\b backspace ()
; ( 174).
, 8 ,
.
\0, , NUL.
\x
. \x{}
.
\x{FF}
u ( 528).
\x{FF} .
UTF8 (
u) , \w,
ASCII.
\w
\pL ( 159), \d
\pN, \s \pZ.
1
PHP PCRE,
, ,
PHP 4.4.3 5.1.4 (
).
.
: [] [^] (
POSIX [:alpha:] 166)
156
, : (
s )
157 (U) : \
158 (C) : \w \d \s \W \D \S ( 8
)
159 (C) ,
(U) \P{}
157
\p{},
(
): \C
169
/ : ^ \A
169
/ : $ \Z \z
171
: \G
174
: \b \B ( 8 )
175
527
: (?(). :
s m i X U
527
: (?(:)
177
: (?#) ( x, #
)
, ,
178
: () \1 \2 ...
180
: (?P<>) (?P=)
178
: (?:)
180
: (?>)
181
: |
563
182
: (?if then|else) if
, (R) ()
183
184
PHP
184
PHP 4.1.0.
Is In, : \p{Cyrillic} ( 159).
, \p{Lu}, \p{L} \pL ( 159).
\p{Letter} .
preg
, \C
, (?s:.) s
.. , u,
preg
UTF8, . . ,
6 . \C
.
. 157.
\z \Z
, \Z
.
$ m D
( 527): \$ \Z (
, );
m
; D \z (
).
, m D, D .
,
,
( 174).
x (
) ASCII.
.
preg
PHP
( 127),
, . 10.2.
,
,
.
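Table 10.2. Overview of PHP's preg functions

Function               Purpose
preg_match             Test whether a regex matches within a string, and pluck data from it
preg_match_all         Pluck all matches and their data from a string
preg_replace           Replace matched text within a copy of a string
preg_replace_callback  Call a function for each match of a regex within a string
preg_split             Partition a string into an array of substrings
preg_grep              Select the elements of an array that do, or do not, match a regex
preg_quote             Escape regex metacharacters in a string

Functions developed in this chapter:
reg_match              Version of preg_match that reports non-participating parentheses as NULL
preg_regex_to_pattern  Build a preg pattern string from a bare regex string
preg_pattern_error     Check a preg pattern string for syntax errors
preg_regex_error       Check a bare regex string for syntax errors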
, ,
,
PHP:
/* , HTML <table> */
if (preg_match('/^<table\b/i', $tag))
, Lar
ry,Curly,Moe, : Lar
ry, Curly Moe.
preg
, ,
,
. '/<table\b/i',
<table\b,
(),
i ( ).
PHP
,
PHP .
3 ( 137),
,
. PHP
, ,
. \' \\,
' \ .
\\
,
.
\ \\, ,
\\, \\\\. ,
!
(
. 560.)
,
Windows, C:\.
: ^[A-Z]:\\$,
: '^[A-Z]:\\\\$'.
. 240 ( 5) ,
^.*\\ '/^.*\\\/'
. , ,
:
print '/^.*\/';       prints: /^.*\/
print '/^.*\\/';      prints: /^.*\/
print '/^.*\\\/';     prints: /^.*\\/
print '/^.*\\\\/';    prints: /^.*\\/
,
. \/,
, ,
,
.
\\
\. ,
,
\/,
. , ,
.
, PHP
, ,
, .
preg ,
,
Perl ,
.
,
,
, , . (
. 530.)
,
,
ASCII .
! #.
:
{ ( < [
:
} ) > ]
, . .
: '((\d+))'.
,
,
.
: '/(\d+)/'.
, ,
. ,
'/<B>(.*?)<\/B>/i' ,
'!<B>(.*?)</B>!i',
!!, '{<B>(.*?)</B>}i',
{}.
,
,
( PHP ).
i, .
:
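Modifier  Inline  Description
i         (?i)    Ignore letter case during the match
m         (?m)    Enhanced line-anchor match mode
s         (?s)    Dot-matches-all match mode
x         (?x)    Free-spacing and comments regex mode
u                 Treat the regex and the subject strings as UTF-8
X         (?X)    Enable PCRE "extra stuff"
e                 Execute the replacement as PHP code (preg_replace only)
S                 Invoke PCRE's "study" optimization attempt
U         (?U)    Swap the greediness of * and *?, and so on
A                 Anchor the entire match to the attempt's initial starting position
D                 $ matches only at the end of the string (EOS), not before a
                  string-ending newline. (Ignored if the m modifier is used.)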
(, (?i)
, (?i) 176).
, ,
.
(
( 177), (?i:)
, (?sm:)
s m .
, : si
:
if (preg_match('{<title>(.*?)</title>}si', $html, $captures))
, PHP
3 ( 145). e
preg_replace
( 543).
u
,
UTF8.
,
. (. . u), preg
8
( 119). u,
,
UTF8,
. UTF8,
ASCII, ,
u ,
.
X
PCRE, :
, .
, \k
PCRE, k (
,
). X
unrecognized character follows \ (
\).
, PHP
PCRE,
.
( )
,
. X
,
.
S study PCRE,
,
.
,
, . 566.
:
A ,
,
\G.
4,
.
D , $
\z ( 147), . . $
.
U
: * *? , +
+? , . . ,
, ,
.
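As a rough illustration of the D and U modifiers described above, the following sketch compares each one with the default behavior. The sample strings and variable names are invented for the example.

```php
<?php
// Without D, $ may also match just before a string-ending newline.
// With D, $ matches only at the very end of the string.
$with_newline   = "abc\n";
$default_dollar = preg_match('/^\w+$/',  $with_newline);
$strict_dollar  = preg_match('/^\w+$/D', $with_newline);

// U swaps greediness: .+ behaves lazily (and .+? would behave greedily).
preg_match('/<.+>/',  '<a><b>', $greedy);
preg_match('/<.+>/U', '<a><b>', $lazy);

echo $default_dollar, ' ', $strict_dollar, "\n";   // prints: 1 0
echo $greedy[0], ' ', $lazy[0], "\n";              // prints: <a><b> <a>
```

Because U silently inverts the meaning of every quantifier, a pattern that uses it is easy to misread; writing the lazy quantifiers explicitly is usually clearer.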
, ,
, un
known modifier ( ).
,
.
, HTML :
preg_match('<(\w+)([^>]*)>', $html)
, <
, preg_match
,
( , ).
< (\w+)([^ >]*)>,
,
, ,
.
(\w+)([^
,
,
]*)>
. ,
,
:
Warning: Unknown modifier ']'
, ,
:
preg_match('/<(\w+)(.*?)>/', $html)
PHP,
, ,
, . ,
, ; ,
, .
, PHP 5
:
Warning: preg_match(): Unknown modifier ']'
,
.
, ,
, (
. :
preg_match('<(\w+)(.*?)>', $html)
,
(\w+)(.*?)
. , ,
, .
.
preg
, , preg_match,
:
?.
preg_match
preg_match(, [, [, [, ]]])
,
,
( 525).
, .
,
.
,
.
PREG_OFFSET_CAPTURE ( 535).
,
.
( 536).
true,
false.
,
preg_match($pattern, $subject)
true, $pattern
$subject.
:
if (preg_match('/\.(jpe?g|png|gif|bmp)$/i', $url)) {
/* URL */
}
if (preg_match('{^https?://}', $uri)) {
/* http https */
}
if (preg_match('/\b MSIE \b/x', $_SERVER['HTTP_USER_AGENT'])) {
/* IE */
}
preg_match
, .
,
$matches. , $matches
,
,
preg_match.
preg_match true,
$matches :
$matches[0]
$matches[1] ,
$matches[2] ,
.
.
.
(
).
, 5
( 241):
/* */
if (preg_match('{ / ([^/]+) $}x', $WholePath, $matches))
$FileName = $matches[1];
$matches ( ,
)
, preg_match true. false
(,
).
$matches ,
, ,
,
$matches preg_match.
,
:
/* , URL */
if (preg_match('{^(https?):// ([^/:]+) (?: :(\d+) )? }x', $url, $matches))
{
$proto = $matches[1];
$host = $matches[2];
$port = $matches[3] ? $matches[3] : ($proto == "http" ? 80 : 443);
print "Protocol: $proto\n";
print "Host : $host\n";
print "Port : $port\n";
}
,
, ,
$matches.1 , (
, ,
$matches.
, (\d+) ,
$matches[3] , $match
es[3] .
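The behavior of trailing optional groups can be checked with a small sketch like the one below. It reuses the URL pattern from the text (written without free-spacing), and the two sample URLs are assumptions made for the illustration. Using isset() avoids the "undefined index" notice that newer PHP versions raise when $matches[3] is read directly.

```php
<?php
$pattern = '{^(https?)://([^/:]+)(?::(\d+))?}';

preg_match($pattern, 'http://regex.info/',      $no_port);
preg_match($pattern, 'http://regex.info:8080/', $with_port);

// The trailing (\d+) group did not participate in the first match,
// so $no_port has no element with index 3 at all.
$port = isset($no_port[3]) ? $no_port[3] : ($no_port[1] == 'http' ? 80 : 443);

echo count($no_port), ' ', count($with_port), ' ', $port, "\n";   // prints: 3 4 80
```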
( 180).
, :
/* , URL */
if (preg_match('{^(?P<proto> https? ) ://
(?P<host> [^/:]+ )
(?: : (?P<port> \d+
) )? }x', $url, $matches))
1
,
NULL,
. 538.
{
$proto = $matches['proto'];
$host = $matches['host'];
$port = $matches['port'] ? $matches['port'] : ($proto== "http" ? 80 : 443);
print "Protocol: $proto\n";
print "Host : $host\n";
print "Port : $port\n";
}
, ,
$matches
. ,
, , $match
es. ,
:
/* , URL */
if (preg_match('{^(?P<proto> https? )://
(?P<host> [^/:]+ )
(?: : (?P<port> \d+
) )? }x', $url, $UrlInfo))
{
if (! $UrlInfo['port'])
$UrlInfo['port'] = ($UrlInfo['proto'] == "http" ? 80 : 443);
echo "Protocol: ", $UrlInfo['proto'], "\n";
echo "Host : ", $UrlInfo['host'], "\n";
echo "Port : ", $UrlInfo['port'], "\n";
}
$matches
, . ,
$url, http://re
gex.info/, $UrlInfo :
array
(
    0       => 'http://regex.info',
    'proto' => 'http',
    1       => 'http',
    'host'  => 'regex.info',
    2       => 'regex.info'
)
,
,
.
$matches,
$matches[0], .
: 3 port
,
(
533).
,
, : (?P<2>).
PHP 4 PHP 5
, , .
.
:
PREG_OFFSET_CAPTURE
preg_match ,
PREG_OFFSET_CAPTURE (
preg_match), $matches
,
.
, ,
( 1,
).
, .
preg_match $offset
, ,
.
,
u ( 528).
HREF
<a>. HTML ,
.
,
, :
preg_match('/href \s*=\s* (?: "([^"]*)" | \'([^\']*)\' | ([^\s\'">]+) )/ix',
$tag,
$matches,
PREG_OFFSET_CAPTURE);
, $tag
<a name=bloglink href='http://regex.info/blog/' rel="nofollow">
$matches :
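array
(
    /* Data for the overall match */
    0 => array ( 0 => "href='http://regex.info/blog/'",
                 1 => 17 ),
    /* Data for the first set of parentheses (did not participate) */
    1 => array ( 0 => "",
                 1 => -1 ),
    /* Data for the second set of parentheses */
    2 => array ( 0 => "http://regex.info/blog/",
                 1 => 23 )
)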
$matches[0][0]
, $matches[0][1]
.
,
$matches[0][0], :
substr($tag, $matches[0][1], strlen($matches[0][0]));
$matches[1][1] is -1, which indicates that the first set of capturing
parentheses did not participate in the match.
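The offsets shown above can be reproduced with a short sketch. It assumes the alternation in the pattern is written with | and that the tag is the sample from the text; the variable names are otherwise arbitrary.

```php
<?php
$tag   = '<a name=bloglink href=\'http://regex.info/blog/\' rel="nofollow">';
$regex = '/href \s*=\s* (?: "([^"]*)" | \'([^\']*)\' | ([^\s\'">]+) )/ix';

preg_match($regex, $tag, $m, PREG_OFFSET_CAPTURE);

// Each element of $m is a two-item array: matched text, then byte offset.
list($text, $offset) = $m[0];

// The offset really does point at the matched text within $tag.
$same = substr($tag, $offset, strlen($text)) === $text;

echo $offset, ' ', $m[1][1], ' ', $m[2][1], "\n";   // prints: 17 -1 23
```

The third set of parentheses is a trailing group that did not participate, so it is absent from $m altogether.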
, ,
( 533), ,
, $matches.
preg_match ,
(, ,
).
(. . ).
:
, u.
(
) ,
.
, ^
,
,
. , ,
, .
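A minimal sketch of the offset behavior described above, using an invented subject string: with a non-zero offset, ^ still refers to the start of the whole subject, whereas \G refers to the position where the attempt begins. Passing a substring instead changes what ^ sees.

```php
<?php
$subject = 'abc123';

// ^ means "start of the whole string", so this cannot match at offset 3.
$caret = preg_match('/^\d+/', $subject, $m1, 0, 3);

// \G means "where this match attempt started", so this does match.
$g = preg_match('/\G\d+/', $subject, $m2, 0, 3);

// Cutting the string first makes offset 3 the new start of the string.
$sliced = preg_match('/^\d+/', substr($subject, 3), $m3);

echo $caret, ' ', $g, ' ', $sliced, "\n";   // prints: 0 1 1
```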
preg_match_all
preg_match_all(, , [, [, ]])
,
,
( 525).
, .
,
.
,
:
PREG_OFFSET_CAPTURE ( 540).
/ :
PREG_PATTERN_ORDER ( 538)
PREG_SET_ORDER ( 539)
,
. .
( , preg_match
536.)
preg_match_all .
preg_match_all preg_match,
,
, .
,
,
.
:
if (preg_match_all('/<title>/i', $html, $all_matches) > 1)
print "whoa, document has more than one <title>!\n";
(,
) preg_match_all ,
preg_match.
preg_match_all , ,
.
preg_match ,
. preg_match
,
$matches . preg_match_all
,
, $matches,
. ,
preg_match_all
$all_matches, $matches,
preg_match.
preg_match $matches
,
( ,
,
$matches).
, ,
,
, NULL.
preg_match (
reg_match), PREG_OFFSET_CAPTURE
,
NULL
$matches:
function reg_match($regex, $subject, &$matches, $offset = 0)
{
$result = preg_match($regex, $subject, $matches,
PREG_OFFSET_CAPTURE, $offset);
if ($result) {
$f = create_function('&$X', '$X = $X[1] < 0 ? NULL : $X[0];');
array_walk($matches, $f);
}
return $result;
}
, reg_match,
, preg_match ,
, , (
, ,
NULL.
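The reg_match listing above relies on create_function, which was deprecated in PHP 7.2 and removed in PHP 8.0. A possible equivalent using a closure is sketched below; the name reg_match_sketch and the test pattern are assumptions made for the illustration, not part of the original.

```php
<?php
// Hypothetical closure-based variant of reg_match (assumes PHP 5.3 or later).
function reg_match_sketch($regex, $subject, &$matches, $offset = 0)
{
    $result = preg_match($regex, $subject, $matches,
                         PREG_OFFSET_CAPTURE, $offset);
    if ($result) {
        // Turn each (text, offset) pair into plain text, or NULL when the
        // offset is negative, meaning the group did not participate.
        $matches = array_map(function ($pair) {
            return $pair[1] < 0 ? null : $pair[0];
        }, $matches);
    }
    return $result;
}

reg_match_sketch('/(a)?(b)/', 'b', $m);
var_dump($m);   // element 1 is NULL: the optional (a) did not participate
```

On PHP 7.2 and later, the built-in PREG_UNMATCHED_AS_NULL flag of preg_match offers similar reporting without a wrapper.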
preg_match_all
$all_matches,
: PREG_PATTERN_ORDER PREG_SET_ORDER.
PREG_PATTERN_ORDER
,
PREG_PATTERN_ORDER (
). ,
,
, :
$subject = "
Jack A. Smith
Mary B. Miller";
/*
PREG_PATTERN_ORDER */
preg_match_all('/^(\w+) (\w\.) (\w+)$/m', $subject, $all_matches);
$all_matches :
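array
(
    /* $all_matches[0] is an array of the full overall matches */
    0 => array ( 0 => "Jack A. Smith",   /* full text of the first match  */
                 1 => "Mary B. Miller"   /* full text of the second match */ ),
    /* $all_matches[1] is an array of the strings captured by the 1st set of parentheses */
    1 => array ( 0 => "Jack",            /* from the first match  */
                 1 => "Mary"             /* from the second match */ ),
    /* $all_matches[2] is an array of the strings captured by the 2nd set of parentheses */
    2 => array ( 0 => "A.",              /* from the first match  */
                 1 => "B."               /* from the second match */ ),
    /* $all_matches[3] is an array of the strings captured by the 3rd set of parentheses */
    3 => array ( 0 => "Smith",           /* from the first match  */
                 1 => "Miller"           /* from the second match */ )
)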
,
.
,
($all_matches[0]),
($all_matches[1]) . .
$all_matches, PREG_SET_ORDER
.
PREG_SET_ORDER
PREG_SET_ORDER
. ,
, $all_matches[0],
, , $all_matches[1],
. . ,
preg_match,
$matches $all_matches.
,
PREG_SET_ORDER:
$subject = "
Jack A. Smith
Mary B. Miller";
preg_match_all('/^(\w+) (\w\.) (\w+)$/m', $subject,
$all_matches, PREG_SET_ORDER);
$all_matches :
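array
(
    /* $all_matches[0] is just like the $matches that preg_match would fill for the first match */
    0 => array ( 0 => "Jack A. Smith",   /* full text of the first match */
                 1 => "Jack",            /* 1st set of parentheses       */
                 2 => "A.",              /* 2nd set of parentheses       */
                 3 => "Smith"            /* 3rd set of parentheses       */ ),
    /* $all_matches[1] is just like the $matches that preg_match would fill for the second match */
    1 => array ( 0 => "Mary B. Miller",  /* full text of the second match */
                 1 => "Mary",            /* 1st set of parentheses        */
                 2 => "B.",              /* 2nd set of parentheses        */
                 3 => "Miller"           /* 3rd set of parentheses        */ )
)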
PREG_PATTERN_ORDER
.
$all_matches[__][_]
PREG_SET_ORDER
.
$all_matches[_][__]
preg_match_all PREG_OFFSET_CAPTURE
preg_match_all, preg_match,
PREG_OFFSET_CAPTURE,
(leaf element) $all_matches
( ), . . $all_matches
( ).
: PREG_OFFSET_CAPTURE PREG_SET_
ORDER, :
preg_match_all($pattern, $subject, $all_matches,
PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
preg_match_all
$all_matches
,
( preg_match
533).
$subject = "
Jack A. Smith
Mary B. Miller";
/*
PREG_PATTERN_ORDER */
preg_match_all('/^(?P<Given>\w+) (?P<Middle>\w\.) (?P<Family>\w+)$/m',
$subject, $all_matches);
$all_matches :
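array
(
    0        => array ( 0 => "Jack A. Smith", 1 => "Mary B. Miller" ),
    "Given"  => array ( 0 => "Jack",  1 => "Mary" ),
    1        => array ( 0 => "Jack",  1 => "Mary" ),
    "Middle" => array ( 0 => "A.",    1 => "B." ),
    2        => array ( 0 => "A.",    1 => "B." ),
    "Family" => array ( 0 => "Smith", 1 => "Miller" ),
    3        => array ( 0 => "Smith", 1 => "Miller" )
)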
PREG_SET_ORDER:
$subject = "
Jack A. Smith
Mary B. Miller";
preg_match_all('/^(?P<Given>\w+) (?P<Middle>\w\.) (?P<Family>\w+)$/m',
$subject, $all_matches, PREG_SET_ORDER);
$all_matches :
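array
(
    0 => array ( 0 => "Jack A. Smith",
                 "Given"  => "Jack",  1 => "Jack",
                 "Middle" => "A.",    2 => "A.",
                 "Family" => "Smith", 3 => "Smith" ),
    1 => array ( 0 => "Mary B. Miller",
                 "Given"  => "Mary",   1 => "Mary",
                 "Middle" => "B.",     2 => "B.",
                 "Family" => "Miller", 3 => "Miller" )
)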
,
, . . ,
.
, , .
preg_replace
preg_replace(, , [, [, ]])
,
, .
,
.
, ,
. e
( )
PHP (543).
, .
(
).
,
( 544).
,
( PHP 5 544).
,
( ,
). ,
(
, ).
PHP
. ,
, str_rep
lace str_ireplace,
preg_replace.
HTML.
:
? (
) ,
,
,
?1 ,
:
$card_number = preg_replace('/\D+/', '', $card_number);
/* $card_number
*/
.
, preg_replace
$card_number,
( ), ,
, $card_number.
preg_replace
,
(, )
, . ,
, preg_repla
ce ,
, ,
,
.
$0
, $1
, $2 . . :
$ ,
, ,
preg_replace. ,
, , : ${0} ${1},
,
.
, ,
, HTML
<b></b>:
$html = preg_replace('/\b[A-Z]{2,}\b/', '<b>$0</b>', $html);
e (
preg_replace),
1
, ,
No Dashes Or Spaces .
: http://www.unixwiz.net/ndos-shame.html.
PHP,
,
.
, , <b></b>,
:
$html = preg_replace('/\b[A-Z]{2,}\b/e',
'strtolower("<b>$0</b>")',
$html);
, HEY,
$0.
: strtolower("<b>HEY</b>"),
PHP,
<b>hey</b> .
e
:
.
PHP.
e
,
,
.
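The e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so the lowercase-and-bold example above no longer runs on current versions. A sketch of the same idea with preg_replace_callback follows; the sample text is invented for the illustration.

```php
<?php
$html = 'say HEY to the CEO';

// The callback receives the match array and returns the replacement text,
// so no PHP code has to be assembled inside a string.
$result = preg_replace_callback(
    '/\b[A-Z]{2,}\b/',
    function (array $m) {
        return '<b>' . strtolower($m[0]) . '</b>';
    },
    $html
);

echo $result, "\n";   // prints: say <b>hey</b> to the <b>ceo</b>
```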
PHP
htmlspecialchars():
$replacement = array ('&' => '&amp;',
                      '<' => '&lt;',
                      '>' => '&gt;',
                      '"' => '&quot;');
,
$replacement PHP
, preg_rep
lace .
, PHP ,
preg_replace.
S
( 567).
preg_replace (
, (
,
). 1,
.
(
PHP 4), ,
preg_replace .
, ,
,
(
.
,
,
,
.
.
, .
,
,
.
, .
,
, ,
, ,
,
, .
:
,
.
, ,
, .
preg_replace,
.
PHP
htmlspecialchars(),
HTML:
$cooked = preg_replace(
/* patterns     */ array('/&/', '/</', '/>/', '/"/' ),
/* replacements */ array('&amp;', '&lt;', '&gt;', '&quot;'),
/* subject      */ $text
);
$text :
AT&T > "baby Bells"
$cooked :
AT&amp;T &gt; &quot;baby Bells&quot;
, .
(
):
$patterns     = array('/&/', '/</', '/>/', '/"/' );
$replacements = array('&amp;', '&lt;', '&gt;', '&quot;');
$cooked = preg_replace($patterns, $replacements, $text);
, preg_replace
(
),
. , , .
, PHP,
.
,
:
$result_array = preg_replace($regex_array, $replace_array, $subject_array);
:
$result_array = array();
foreach ($subject_array as $subject)
{
reset($regex_array); // ,
reset($replace_array); //
while (list(,$regex) = each($regex_array))
{
list(,$replacement) = each($replace_array);
// ,
//
$subject = preg_replace($regex, $replacement, $subject);
}
//
//
$result_array[] = $subject; // ... .
}
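The expansion above uses each(), which was deprecated in PHP 7.2 and removed in PHP 8.0. The same pairing of patterns with replacements in array order can be sketched with foreach, as below. The function name replace_all_sketch is hypothetical, and the sketch only approximates what preg_replace does internally.

```php
<?php
// Hypothetical emulation of preg_replace() with array arguments.
function replace_all_sketch(array $regexes, array $replacements, array $subjects)
{
    // Pair patterns and replacements by their order in the arrays,
    // not by their keys.
    $replacements = array_values($replacements);
    $results = array();
    foreach ($subjects as $key => $subject) {
        $i = 0;
        foreach ($regexes as $regex) {
            // A pattern without a partner is replaced by the empty string.
            $replacement = isset($replacements[$i]) ? $replacements[$i] : '';
            $i++;
            $subject = preg_replace($regex, $replacement, $subject);
        }
        $results[$key] = $subject;
    }
    return $results;
}

$out = replace_all_sketch(array('/[a-z]+/', '/\d+/'),
                          array('word<$0>', 'num<$0>'),
                          array('this has 7 words'));
echo $out[0], "\n";   // prints: word<this> word<has> num<7> word<words>
```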
, , ,
, . . ,
(,
, , (
, . .). , ,
array(), , ,
, :
$subject = "this has 7 words and 31 letters";
$result = preg_replace(array('/[a-z]+/', '/\d+/'),
array('word<$0>', 'num<$0>'),
$subject);
print "result: $result\n";
[a-z]+ word<$0>,
\d+ num<$0>, :
result: word<this> word<has> num<7> word<words> word<and> num<31>
word<letters>
,
,
(. . , ).
, preg_replace
,
each,
, .
,
ksort(),
.
,
, ,
.
, ,
. , ,
(
) ? ,
?
$subject = "this has 7 words and 31 letters";
$result = preg_replace(array('/\d+/', '/[a-z]+/'),
array('num<\0>', 'word<\0>'),
$subject);
print "result: $result\n";
, .
preg_replace_callback
preg_replace_callback(, , [, [, ]])
,
,
( 525). ,
.
,
,
.
, .
(
).
,
( 544).
,
( PHP 5.1.0
544).
,
( ,
). ,
(
, ).
preg_replace_callback preg_re
place, , ( )
preg_replace_callback .
preg_replace, e
( 543), (
,
).
PHP, ,
,
. preg_replace_callback
,
( $matches).
, .
. 547.
. 547 ( ,
):
result: word<this> word<has> word<num><7> word<words>
word<and> word<num><31> word<letters>
, ,
, , preg_replace
(. ., (
)
, .
/
num<>,
num , num
word<num>,
.
,
preg_replace.
.
: ,
. :
PHP create_function.
.
,
,
,
( ).
. 544,
preg_replace_callback .
:
$replacement = array ('&' => '&amp;',
                      '<' => '&lt;',
                      '>' => '&gt;',
                      '"' => '&quot;');
/*
* $matches ,
* $matches[0] ,
* HTML.
* HTML.
* ,
*
* .
*/
function text2html_callback($matches)
{
global $replacement;
return $replacement[$matches[0]];
}
$new_subject = preg_replace_callback('/[&<">]/S', /* */
"text2html_callback", /* */
$subject);
$subject
"AT&T" sounds like "ATNT"
$new_subject :
&quot;AT&amp;T&quot; sounds like &quot;ATNT&quot;
text2html_callback PHP,
preg_replace_callback, text2html_
callback $matches (
, ,
$matches).
,
( PHP
create_function). ,
$replacement , .
,
,
preg_replace_callback:
$new_subject = preg_replace_callback('/[&<">]/S',
create_function('$matches',
'global $replacement;
return $replacement[$matches[0]];'),
$subject);
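Since create_function is no longer available in PHP 8, the anonymous-callback form above can be sketched with a closure instead. The closure imports the lookup table with use, which also avoids the global declaration; the subject string is the sample from the text.

```php
<?php
$replacement = array('&' => '&amp;',
                     '<' => '&lt;',
                     '>' => '&gt;',
                     '"' => '&quot;');

$subject = '"AT&T" sounds like "ATNT"';

// The closure captures $replacement by value; $m[0] is the matched character.
$new_subject = preg_replace_callback(
    '/[&<">]/S',
    function (array $m) use ($replacement) {
        return $replacement[$m[0]];
    },
    $subject
);

echo $new_subject, "\n";
```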
e
e preg_replace, preg_replace_callback.
, ,
e ,
,
PHP. ,
preg_replace_callback (
).
preg_split
preg_split(, [, , [ ]])
,
,
( 525).
, .
,
, .
,
.
:
PREG_SPLIT_NO_EMPTY
PREG_SPLIT_DELIM_CAPTURE
PREG_SPLIT_OFFSET_CAPTURE
. 554.
( . 539).
.
preg_split
. (
(
).
, .
preg_split
preg_match_all, preg_split ,
. ,
preg_split , ,
,
. preg_split
(
) explode.
, ,
.
explode:
$tickers = explode(' ', $input);
,
. preg_split
\s+ :
$tickers = preg_split('/\s+/', $input);
, ,
: ,
(
), , YHOO, MSFT, GOOG.
:
$tickers = preg_split('/[\s,]+/', $input);
$tickers
: YHOO, MSFT GOOG.
, ,
( Web 2.0),
\s*,\s*,
:
$tags = preg_split('/\s*,\s*/', $input);
\s*,\s*
[\s,]+ .
( ),
, .
$input, 123,,,456,
( ),
: 123,
456.
, [\s,]+
, .
123,,,456
, :
123 456.
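The difference between the two separators can be seen directly in a short sketch using the 123,,,456 sample from the text.

```php
<?php
$input = '123,,,456';

// \s*,\s* treats every comma as its own separator, so the two "missing"
// fields between the commas come back as empty strings.
$strict = preg_split('/\s*,\s*/', $input);

// [\s,]+ treats a whole run of commas and whitespace as one separator.
$loose = preg_split('/[\s,]+/', $input);

print_r($strict);   // 123, "", "", 456
print_r($loose);    // 123, 456
```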
preg_split,
, .
,
,
.
HTTP
.
\r\n\r\n, ,
\n\n. , preg_split
. ,
$response,
$parts = preg_split('/\r? \n \r? \n/x', $response, 2);
$parts[0], parts[1].
( S
567.)
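A small sketch of the header/body split with a limit of 2. The response text is made up for the illustration, and its body deliberately contains a blank line to show that the limit keeps the rest of the string intact.

```php
<?php
$response = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>\n\n</html>";

// Split at the first blank line only; the limit of 2 leaves the body whole
// even though it contains another blank line.
$parts = preg_split('/\r? \n \r? \n/x', $response, 2);

$headers = $parts[0];
$body    = $parts[1];

echo count($parts), "\n";   // prints: 2
```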
, 2, ,
, .
, ( ,
) .
(. . ),
,
.
( 1 (
, ) preg_split
, ,
.
,
, ,
(
PREG_SPLIT_DELIM_CAPTURE ,
).
,
. :
,
. (
) ().
, ,
2.
, ,
, ,
preg_split. , ,
$data , ,
\s*,\s* ( , , . .),
.
3,
, :
$fields = preg_split('/ \s*,\s*/x', $data, 3);
, ar
ray_pop .
preg_split
( )
,
1, . C ,
1
. 0
, 1,
.
preg_split
preg_split ,
.
(
. 538).
PREG_SPLIT_OFFSET_CAPTURE
PREG_OFFSET_CAPTURE,
preg_match preg_match_all,
,
.
PREG_SPLIT_NO_EMPTY
preg_split ,
.
,
.
Web 2.0 ( 552),
$input party,,fun,
$tags : party,
fun.
.
PREG_SPLIT_NO_
EMPTY,
$tags = preg_split('/ \s* , \s*/x', $input, -1, PREG_SPLIT_NO_EMPTY);
party
fun.
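The effect of PREG_SPLIT_NO_EMPTY on the party,,fun sample can be sketched as follows. A limit of -1 means "no limit" and is needed only to reach the flags argument.

```php
<?php
$input = 'party,,fun';

$with_empty = preg_split('/ \s* , \s*/x', $input);
$no_empty   = preg_split('/ \s* , \s*/x', $input, -1, PREG_SPLIT_NO_EMPTY);

print_r($with_empty);   // party, "", fun
print_r($no_empty);     // party, fun
```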
PREG_SPLIT_DELIM_CAPTURE
,
, . :
,
, and or ,
:
DLSR camera and Nikon D200 or Canon EOS 30D
PREG_SPLIT_DELIM_CAPTURE
$parts = preg_split('/ \s+ (and|or) \s+ /x', $input);
$parts :
array ('DLSR camera', 'Nikon D200', 'Canon EOS 30D')
, , .
PREG_SPLIT_DELIM_CAPTURE ( 1 (
):
$parts = preg_split('/ \s+ (and|or) \s+ /x', $input, -1,
                    PREG_SPLIT_DELIM_CAPTURE);
$parts ,
:
array ('DLSR camera', 'and', 'Nikon D200', 'or', 'Canon EOS 30D')
,
. $parts
and or.
,
(, '/\s+(?:and|or)\s+/') PREG_SPLIT_DELIM_CAPTURE
,
.
,
. 552:
$tickers = preg_split('/[\s,]+/', $input);
PREG_SPLIT_
DELIM_CAPTURE,
$tickers = preg_split('/([\s,]+)/', $input, -1, PREG_SPLIT_DELIM_CAPTURE);
$input ,
$tickers.
$tickers ,
([\s,]+).
, , ,
,
.
,
,
PREG_SPLIT_DELIM_CAPTURE. ,
(
,
).
, ,
. , ,
( . 533),
( )
. ,
,
, . :
PREG_SPLIT_NO_EMPTY , . .
preg_split ,
.
preg_grep
preg_grep(, _ [, ])
,
, .
_
,
, .
PREG_GREP_INVERT .
, _,
( , , ,
PREG_GREP_INVERT).
preg_grep (
_, ,
( , PREG_
GREP_INVERT) . ,
, .
:
preg_grep('/\s/', $input);
$input, .
:
$in
put, . :
:
preg_grep('/^\S+$/', $input);
,
( ).
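A brief sketch of preg_grep with and without PREG_GREP_INVERT; the input array is invented for the illustration. Note that the keys of the original array are preserved in the result.

```php
<?php
$input = array('YHOO', 'big deal', 'MSFT', 'two  words');

// Elements that contain whitespace.
$with_space = preg_grep('/\s/', $input);

// Elements that do not contain whitespace.
$without_space = preg_grep('/\s/', $input, PREG_GREP_INVERT);

print_r($with_space);      // keys 1 and 3
print_r($without_space);   // keys 0 and 2
```

If consecutive keys are needed afterwards, the result can be passed through array_values().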
preg_quote
preg_quote( [, ])
,
( 525).
,
.
preg_quote ,
. ,
.
,
,
preg_quote
, .
preg_quote ,
, .
preg_quote ,
,
:
/*
$MailSubject */
$pattern = '/^Subject:\s+(Re:\s*)*' . preg_quote($MailSubject, '/') . '/mi';
, $MailSubject :
**Super Deal** (Act Now!)
$pattern :
/^Subject:\s+(Re:\s*)*\*\*Super Deal\*\* \(Act Now\!\)/mi
preg.
{,
(. . }),
.
, # ,
x.
preg_quote
.
,
,
preg.
.
preg
PHP preg
,
.
preg_match ( 538).
,
, ,
,
(,
, ).
,
.
,
.
.
,
,
, : http://regex.info/.
preg_regex_to_pattern
(,
),
preg, ,
.
. , ,
[a-z]+ /[a-z]+/,
.
, .
, ,
^http://([^/:]+),
/^http://([^/:]+)/,
Unknown modifier /.
. 530,
,
, (
, )
.
,
.
,
.
.
{} . 524, 532 532 (
).
( )
, ,
, . ,
,
:
,
.
,
,
. ,
,
.
,
, ,
preg. (
PHP
)
; . (
,
. 525.)
/*
* (, ,
* ), ,
* preg.
* ,
* .
*/
function preg_regex_to_pattern($raw_regex, $modifiers = "")
{
/*
* ,
* (
* ) .
*
* ,
* ,
* .
*
*
* ,
* . ,
* '\/',
* '\\/',
* : '/\\//'.
*
*
* : ,
* ( ) .
* ,
* .
*/
if (! preg_match('{\\\\(?:/|$)}', $raw_regex)) /* '\',
'/' */
{
/* ,
* ,
* */
$cooked = preg_replace('!/!', '\/', $raw_regex);
}
else
{
/* $raw_regex.
* , ,
* . */
$pattern = '{ [^\\\\/]+ | \\\\. | ( / | \\\\$ ) }sx';
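/* Apply $pattern to $raw_regex repeatedly. In each call of the callback,
 * $matches[1] is non-empty only when an unescaped slash (or a trailing
 * backslash) was matched; in that case it is returned with a backslash
 * in front. Otherwise the matched text is returned unchanged. */
$f = create_function('$matches', '
    if (empty($matches[1]))            // not an unescaped slash...
        return $matches[0];            // ...so return the text as is
    else
        return "\\\\" . $matches[1];   // prepend a backslash
');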
/* $pattern $raw_regex,
* $cooked */
$cooked = preg_replace_callback($pattern, $f, $raw_regex);
}
/* $cooked ,
* */
return "/$cooked/$modifiers";
}
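Because create_function is gone in PHP 8, the core of preg_regex_to_pattern can be sketched with a closure, as below. This is a simplified variant (it always takes the general path and skips the quick check for the easy case), and the name regex_to_pattern_sketch is hypothetical.

```php
<?php
// Hypothetical closure-based sketch: wrap a bare regex in /.../ delimiters,
// escaping any slash that is not already escaped.
function regex_to_pattern_sketch($raw_regex, $modifiers = '')
{
    // Alternatives: a run of ordinary text, an escaped character,
    // or (captured) an unescaped slash or a trailing lone backslash.
    $pattern = '{ [^\\\\/]+ | \\\\. | ( / | \\\\$ ) }sx';

    $cooked = preg_replace_callback($pattern, function (array $m) {
        // $m[1] is set only when the capturing alternative matched.
        return empty($m[1]) ? $m[0] : '\\' . $m[1];
    }, $raw_regex);

    return "/$cooked/$modifiers";
}

$p = regex_to_pattern_sketch('^http://([^/:]+)', 'i');
echo $p, "\n";   // prints: /^http:\/\/([^\/:]+)/i
```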
, ,
.
( ,
preg).
, preg_replace_callback,
,
,
.
,
,
.
,
*.txt , ,
( 27),
preg_regex_to_pattern /*.txt/.
,
(
):
Compilation failed: nothing to repeat at offset 0
PHP
,
.
preg_pattern_error
preg_match .
,
preg_match.
/*
* ,
* .
* ( ) false.
*/
function preg_pattern_error($pattern)
{
/* ,
* .
* ,
* (,
* $php_errormsg). ,
* 'track_errors' , $php_errormsg
* .
* 'track_errors' , (
* ), .
*/
if ($old_track = ini_get("track_errors"))
$old_message = isset($php_errormsg) ? $php_errormsg : false;
else
ini_set('track_errors', 1);
/* track_errors . */
unset($php_errormsg);
@ preg_match($pattern, ""); /* ! */
$return_value = isset($php_errormsg) ? $php_errormsg : false;
/* , ,
* . */
if ($old_track)
$php_errormsg = isset($old_message) ? $old_message : false;
else
ini_set('track_errors', 0);
return $return_value;
}
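The track_errors setting and $php_errormsg used above were deprecated in PHP 7.2 and removed in PHP 8.0. On PHP 7 or later, a similar check can be sketched with error_clear_last() and error_get_last(), as shown below. The function name is hypothetical, and the sketch assumes no custom error handler is swallowing the warning.

```php
<?php
// Hypothetical modern variant: return false if the pattern compiles,
// or the text of the resulting warning if it does not.
function pattern_error_sketch($pattern)
{
    error_clear_last();                       // forget any earlier error
    $result = @preg_match($pattern, '');      // @ hides the warning from output
    if ($result !== false)
        return false;                         // pattern is usable
    $error = error_get_last();
    return $error !== null ? $error['message'] : 'unknown pattern error';
}

var_dump(pattern_error_sketch('/^abc$/'));          // bool(false)
var_dump(pattern_error_sketch('/*.txt/'));          // a "Compilation failed" message
var_dump(pattern_error_sketch('<(\w+)([^>]*)>'));   // an "Unknown modifier" message
```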
,
(
).
,
, false.
/*
* ,
* .
* false.
*/
function preg_regex_error($regex)
{
return preg_pattern_error(preg_regex_to_pattern($regex));
}
3,
.
(?R)
, (?)
.
(?P>).
. ,
(. 570).
, , ,
.
: (?: [^()]++ | \( (?R) \) )*.
.
[^()]++ ,
.
+,
( 280) (?:)*.
, \((?R)\),
.
(
) .
, ,
(?R)
.
,
, ,
,
(?R).
.
^$,
.
,
.
(?R)
, , (?),
. (?)
,
(
.1 (?), ,
(?0) (?R).
:
^$,
, (?R) (?1).
, (?1),
,
.
^ $ ,
: ^((?:[^()]++|\((?1)\))*)$.
,
, (?1).
PHP, ,
$text :
if (preg_match('/^ ( (?: [^()]++ | \( (?1) \) )* ) $/x', $text))
echo "text is balanced\n";
else
echo "text is unbalanced\n";
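The recursive pattern above can be exercised on a few inputs with a small wrapper. The function name and the sample strings are assumptions made for the illustration.

```php
<?php
// Hypothetical wrapper around the (?1)-based pattern from the text.
function is_balanced_sketch($text)
{
    return preg_match('/^ ( (?: [^()]++ | \( (?1) \) )* ) $/x', $text) === 1;
}

$samples = array('a(b(c)d)e', '((x)', ')(', '');
foreach ($samples as $s) {
    echo var_export($s, true), ' => ',
         is_balanced_sketch($s) ? 'balanced' : 'unbalanced', "\n";
}
```

The possessive [^()]++ keeps a failing match from retrying every way of dividing the non-parenthesis text, which matters on long unbalanced input.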
,
( 180),
1
, (?) (
, . . (?)
,
.
.
(?)
(?P>).
:
,
x
( 527):
$pattern = '{
# ...
^
(?P<stuff>
# , , "stuff."
(?:
[^()]++ # ,
|
\( (?P>stuff) \) # . , " " .
)*
)
$
# .
}x'; # 'x' .
if (preg_match($pattern, $text))
echo "text is balanced\n";
else
echo "text is unbalanced\n";
.
(?:)*
, [^()]++
.
, , ( )
.
( 317) ,
: (?:[^()]|\((?R)\))*.
,
.
, ,
6 ( 319), [^()]*(?:\((?R)\)
[^()]*)*.
preg , ,
,
( 317). ,
,
( ).
,
,
,
. , ,
.
. , , ,
(
): \((?:[^()]++|(?R))*\).
, ,
. ,
,
(?R)
, (?1) (
).
PHP
preg PHP PCRE
.
, 46,
,
,
. PHP
6 ( 288).
, ,
,
e (551),
.
,
, PHP
4096 ( 297),
PHP
.
S:
(studies)
. (
study Perl, ,
429).
S: Study
S
1 ,
.
,
.
,
S ,
: , 6 (
( 303).
, ,
. S
,
.
S
<(\w+).
, <.
( preg
)
, <
(
<,
).
,
,
.
, .
1
.
,
S, .
,
,
. , <i>|</i>|<b>|</b>
, <(\w+),
, .
, .
S
preg
, ,
. S
,
,
.
,
,
S :
<(\w+) | &(\w+);
< &
[Rr]e:
R r
(Jan|Feb||Dec)\b
A D F J M N O S
(Re:\s*)? SPAM
R S
\s*,\s*
[&<">]
\r?\n\r?\n
\r \n
, S
:
, ( ^
\b),
. ,
\b,
.
, ,
\s*.
, ( )
, (?: [^()]++ | \( (?R) \) )*,
. 566.
, ),
, .
,
, .
S
,
.
.
, .
Parsing CSV with PHP
PHP CSV
(, ) 6 ( 330).
( 184) ,
.
:
$csv_regex = '{
\G(?:^|,)
(?:
# ...
" #
( [^"]*+ (?: "" [^"]*+ )*+ )
" #
| # ......
# ... , ...
( [^",]*+ )
)
}x';
CSV,
$line:
/* , $all_matches
*/
preg_match_all($csv_regex, $line, $all_matches);
/* $Result , $all_matches */
$Result = array ();
/* ... */
for ($i = 0; $i < count($all_matches[0]); $i++)
{
/*
* */
if (strlen($all_matches[2][$i]) > 0)
array_push($Result, $all_matches[2][$i]);
else
{
/* ,
* */
array_push($Result, preg_replace('/""/', '"', $all_matches[1][$i]));
}
}
/* $Result */
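The regex and loop above can be packaged as a function and tried on a sample line. The function name parse_csv_line_sketch and the sample data are assumptions made for the illustration; the comments inside the pattern avoid apostrophes because the pattern sits in a single-quoted PHP string.

```php
<?php
// Hypothetical wrapper around the CSV regex and field loop from the text.
function parse_csv_line_sketch($line)
{
    $csv_regex = '{
        \G(?:^|,)
        (?:
            " ( [^"]*+ (?: "" [^"]*+ )*+ ) "   # a double-quoted field
          |
            ( [^",]*+ )                        # or text without quotes or commas
        )
    }x';

    preg_match_all($csv_regex, $line, $all_matches);

    $fields = array();
    for ($i = 0; $i < count($all_matches[0]); $i++) {
        if (strlen($all_matches[2][$i]) > 0)
            $fields[] = $all_matches[2][$i];   // unquoted field: use as is
        else                                   // quoted (or empty) field:
            $fields[] = str_replace('""', '"', $all_matches[1][$i]);
    }
    return $fields;
}

$line = 'Ten Thousand,10000, 2710 ,,"10,000","It\'s ""10 Grand"", baby",10K';
print_r(parse_csv_line_sketch($line));
```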
: XML (XHTML
)
.
:
,
(
, <br/> XML),
, .
:
^((?:<(\w++)[^>]*+(?<!/)>(?1)</\2>|[^<>]++|<\w[^>]*+(?<=/)>)*+)$
,
( ).
,
.
^()$, ,
.
,
.
(
),
(?:)*+,
. ,
.
(. .
, ).
,
,
* ,
.
,
, .
, ,
, , ( 317).
...
:
, : [^<>]++.
, .
, , (?:)*+
.
, ,
. (
,
. ,
317.)
:
, <\w[^>]*+(?<=/)>,
, <br/> <img /> (
/,
). ,
, .
:
: <(\w++)
[^>]*+(?<!/)>(?1)</\2>.
( )
: (\w++)
.
( (\w++)
, .)
(?<!/)
( 175), /
.
>,
, , <hr/> (
,
).
,
(?1)
.
, .
,
, ,
(
). </ </\2>
, ,
\2 .
HTML ,
,
(?i) i.
!
\w++ <(\w++)
[^>]*+(?<!/)>.
,
( 180), \b (\w+),
: <(\w+)\b[^>]*(?<!/)>.
\b , ,
li <link></li>.
nk
,
\2, .
, \w+
.
, ,
\w+
,
<link></li>. \b
.
, preg PHP
,
\w++
, \b,
.
XML
XML ,
.
XML, CDATA
.
XML
<!--.*?--> (?s)
s,
.
CDATA, <![CDATA[]]>,
<!\[CDATA\[.*?]]>, (
XML, <?xmlversion="1.0"?>,
<\?.*?\?>.
, <!ENTITY>,
<!ENTITY\b.*?>. XML
,
<!ENTITY\b.*?> <![A-Z].*?>.
, , ,
XML.
, :
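$xml_regex = '{
  ^(
    (?: <(\w++) [^>]*+ (?<!/)> (?1) </\2>   # matched pair of tags
      | [^<>]++                             # non-tag text
      | <\w[^>]*+ (?<=/)>                   # self-closing tag
      | <!--.*?-->                          # comment
      | <!\[CDATA\[.*?]]>                   # cdata block
      | <\?.*?\?>                           # processing instruction
      | <![A-Z].*?>                         # entity declaration, etc.
    )*+
  )$
}sx';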
if (preg_match($xml_regex, $xml_string))
echo "block structure seems valid\n";
else
echo "block structure seems invalid\n";
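A sketch that wraps the XML pattern in a function and tries it on a few inputs. The function name and the samples are assumptions. The self-closing alternative is written here as <\w[^>]*+(?<=/)> on the reasoning that a possessive [^>]*+ would otherwise consume the slash and leave nothing for a literal /> to match.

```php
<?php
// Hypothetical wrapper: true if tags appear properly nested and paired.
function xml_blocks_ok_sketch($xml)
{
    $xml_regex = '{
      ^(
        (?: <(\w++) [^>]*+ (?<!/)> (?1) </\2>   # matched pair of tags
          | [^<>]++                             # non-tag text
          | <\w[^>]*+ (?<=/)>                   # self-closing tag
          | <!--.*?-->                          # comment
          | <!\[CDATA\[.*?]]>                   # cdata block
          | <\?.*?\?>                           # processing instruction
          | <![A-Z].*?>                         # entity declaration, etc.
        )*+
      )$
    }sx';
    return preg_match($xml_regex, $xml) === 1;
}

var_dump(xml_blocks_ok_sketch('<a><b>x</b><br/></a>'));   // properly nested
var_dump(xml_blocks_ok_sketch('<a><b>x</a></b>'));        // tags cross each other
var_dump(xml_blocks_ok_sketch('<!-- c --><a/>'));         // comment, then self-closing tag
```

This checks only the block structure; it is not a substitute for a real XML parser.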
HTML?
HTML ,
, :
,
< >. HTML
,
, <script>.
HTML XML
<!--.*?--> s.
<script> ,
< >, ,
<script> </script>.
<script\b[^>]*>.*?</script>.
,
< >, ,
. <script>
, ,
.
PHP
HTML:
$html_regex = '{
  ^(
    (?: <(\w++) [^>]*+ (?<!/)> (?1) </\2>   # matched pair of tags
      | [^<>]++                             # non-tag text
      | <\w[^>]*+ (?<=/)>                   # self-closing tag
      | <!--.*?-->                          # comment
      | <script\b[^>]*>.*?</script>         # script block
    )*+
  )$
}isx';
if (preg_match($html_regex, $html_string))
echo "block structure seems valid\n";
else
echo "block structure seems invalid\n";
\0, 153
\1, 366
Perl, 67
?
, 42
, 206
?!, 329
??, 372
(?i), 528
(?i), 527
(?m), 527
(?s), 527
(?U), 528
(?X), 527
(?x), 527
(?!), 399, 401
(?#), 150, 177
(?1), 564
Java, 477
PCRE, 564
PHP, 564, 572
(?n), 484
(?P<>), 535
(?R)
PCRE, 563
PHP, 563
\(\), 178
//, 387
\ \ \ \, 240
@
, Perl, 351
, 107
@, 363, 365
@+, 363, 365
@"", 137
., Java, 442
=~, , 65
*
, 43
, 207
.*
, 84
, 196
, 302
+
, 43
, 207
++, 572
\+, , 119
\<\>, 47, 78, 174, 193
egrep, 39
Emacs, 135
<>, 83
[:<:], 124
[==], 168
[::], 166
[..], 167
[:>:], 124
$, 148, 529
Java
, 442
, 169
Perl
, 350,
351
, 63
PHP, 523
, 107
$$, .NET, 502
\$, PHP, 523
$/, 61
$&, , 362, 363
.NET, 502
, 398
, 427
, 426
576
, 425
, 428
$', , 362, 363
.NET, 502
, 398
, 427
, 426
, 425
, 428
$`, , 362, 363
.NET, 502
, 398
, 427
, 426
, 425
, 428
$+, , 363, 364, 413
.NET, 254, 502
, 425
, 254
$_, , 110, 388
.NET, 502
$^N, , 363, 365, 413
$^R, , 365, 393
$^W, , 359
^, 148
Java, 442
, 302
^Subject: , 127, 196, 199, 297,
301, 350
Java, 128
.NET, 129
PHP, 130
Python, 130
$0, 363
Java, 453
PHP, 543
${0}, 543
$1, , 178, 363, 366
Java, 453
.NET, 502
, 67
, 425
${}
.NET, 502
{min, max}, 45
A
\A, 148, 169
, 302
\a, 151
$all_matches, 538
$matches, 537
, 539
, 538
anchored(), 433
AND,
, 164
appendReplacement, , 454
appendTail, , 454
$ARGV, 111
ASCII, , 140, 151
AT&T Bell Labs, 118
awk
gensub, 231
, 174
, 179
, 118
, 133
B
<B></B>, 211
\B, 174
\b, 95, 151, 174
Perl, 347
PHP, 521
, 71, 74
Java, 438
\b\B, 295
BLTN, Java, 290
BOL, 433
<br/>, , 570
BRE ( ),
119
C
\C, 157
PHP, 522, 523
\c, 155
/c, , 371, 380
C#
, 179
, 137
Capture, , 516
.NET, 495
CaptureCollection, , 518
CDATA, 573
CharBuffer, , 445
charnames, , 351
577
CharSequence, , 436, 446, 456,
473
CheckNaughtiness(), , 429
chr, , 497
Compilation failed, 561
compile, , 444
Compiled (.NET), ,
292, 485, 487, 498, 506
CompileToAssembly, 515
Config, , 351, 361
CR, 144, 442
create_function, , 549, 550
CR/LF, 442
CSV, ,
, , 330
currentTimeMillis(), , 290
D
\D, 77, 158
\d, 77, 158
Perl, 349
PHP, 521
Darth, 247
date_default_timezone_set(), ,
289
DBIx::DWIW, Perl, 315
debugcolor, 434
definekey, 135
Devel::FindAmpersand, , 428
Devel::SawAmpersand, , 428
Dr, , 434
E
\e, 110, 151
\E,
Java, 440, 470, 480
/e, , 384, 385
ECMAScript (.NET),
, 485, 489, 498, 506
ed, , 117
egrep
, 174
, 179
, 39
, 124
, 194
, ,
47
, 39
, 232
, 58
, 118
Emacs
researchforward, 134
, 174
, 179
, 134
, 135
, 168
, 155
end, , 450
English, , 428
ERE (
), 119
ereg, , 519
Escape ANSI, 111
eval, , 385
Explicit, , 492
ExplicitCapture (.NET),
, 485, 498, 506
F
\f, 151
, 71
FF, 144
find, , 448
flags, , 470
flex, , 123
floating '', 433
foreach, if while, , 385
G
\G, 171, 265, 529
.NET, 484
, ,
475
, 172
, 302
, 379, 380
/g
, 79
, 371, 374, 375, 380, 384
, 424
gensub, , 231
GetGroupNames ( Regex),
505
578
GetGroupNumbers (
Regex), 505
gettimeofday(), , 289
GNU awk
gensub, 231
, 123
GNU egrep
, 123
, 194
, 232
GNU Emacs
, 124
, 123
GNU grep
, 123
, 231
GNU sed, , 123
GPOS, 433
grep
Perl, 390
, 118
, 117
y, 118
, 124
, 118
group, , 450
Group, (.NET), 495
Capture, 517
Captures, 509
Index, 509
Length, 509
Success, 509
ToString, 509
Value, 509
, 508
GroupCollection, , 518
groupCount, , 450
GroupNameFromNumber (
Regex), 505
GroupNumberFromName (
Regex), 505
Groups, ( Match), 507
H
hasAnchoringBounds, , 463
hasTransparentBounds, , 462
height, , Java, 473
hitEnd, , 465
$HostnameRegex, , 106, 179,
367, 420
HTML, , 532, 543, 545, 549, 574
<HR>, , 245
URL, 104, 258, 260, 367, 385
, 492
, 211
, 380
, 492
, , 251
, 97
, 172
, , 253
, 24, 43, 427
htmlspecialchars, , 545
HTTP, , 552
HTTP URL, , 51, 253, 258, 260,
318, 367, 385
http://regex.info/, , 429
$HttpUrl, , 367, 413, 420
http://www.cpan.org/, , 428
Hz, 144
I
/i, , 176
, 74
study, 430
i y, 118
if, while foreach, , 385
IgnoreCase (.NET),
, 130, 133, 485, 497, 506
IgnorePatternWhitespace (.NET),
, 133, 485, 497,
506
IllegalArgumentException, ,
445
IllegalStateException, , 449
<img>, , Java, 473
implicit, 433
Imports, , 490, 493, 513
In Is, , 160
Index,
Group, 509
Match, 507
IndexOutOfBoundsException,
, 448, 449, 453
IP, , 236, 376, 379
Iraq, 35
Is In, , Perl, 349
IsMatch (.NET), 491
579
IsMatch ( Regex), 499
ISO88591, , 120,
140
J
Java, 436
BLTN, 290
\E, , 480
find, , 448
JIT, 290
\Q, , 480
, 146
split, , 470
, 290
, 174
, 438
, 179
, 442
,
463
, 261, 271, 289
, 457
, 463
, 443
, 436
, 436, 440, 467, 476, 480
, 447
, 447, 457
, 451
, 443,
448, 451, 455, 463
,
461
CSV, , 476
,
, , 271
, 477
, 440
, 442
, 441
, 447, 478
, 289
, 441
java.lang.Character, , 442
java.util.regex
, 123
java.util.Scanner, , 465
Jeffs, , 90
JfriedlsRegexLibrary, 513
JIT, , 487
Java, 290
JRE, Java, 290
L
\l, 351
\L\E, 352
Latin1, , 120, 140
lc(), , 352
lcfirst(), , 351
Length,
Group, 509
Match, 507
$LevelN, , 396, 411
lex
$, 148
, 118
, 147
LF, 144, 442
LIFO, , 204
local,
, 402
local, , 358
localtime, , 356, 385, 420
lookingAt, , 449
LS, 145, 442
M
/m, 176
Perl, 349
m//
, 64
makudonarudo, , 210, 282
Match, 130
Match (.NET) Success, 130
Match (.NET), , 495
Captures, 508, 517
Empty, 512
Groups, 507
Index, 507
Length, 507
NextMatch, 508
Result, 508
Success, 507
Synchronized, 508
ToString, 507
Value, 507
, 507
, 499, 508
Match ( Regex), 499
MatchCollection, , 500
580
matcher, , 445
Matcher, , 446
appendReplacement, , 454
appendTail, , 454
end, , 450
find, , 448
group, , 450
groupCount, , 450
hasAnchoringBounds, , 463
hasTransparentBounds, , 462
hitEnd, , 465
lookingAt, , 449
matches, , 449
pattern, , 468
quoteReplacement, , 452
region, , 460
regionEnd, , 460
regionStart, , 460
replaceAll, , 451
replaceFirst, , 452
requireEnd, , 465
reset, , 468
start, , 450
toMatchResult, , 450
toString, , 469
useAnchoringBounds, , 463
usePattern, , 468, 475
useTransparentBounds, , 462
, 453
, 457
$matches $all_matches, 537
Matches ( Regex), 500
$matches, , 532
matches, , 449, 470
MatchEvaluator, 501
mb_ereg, , 519
MBOL, 433
{min, max}, 45
minlen, , 433
MSIL (Microsoft Intermediate Language),
487
Multiline (.NET), ,
485, 497, 506
MungeRegexLiteral, , 409, 415
my, , 405
MySQL, , 123
N
\n, 77, 151
, 71
, 152
\N{}, 351
NEL, 144
$NestedStuffRegex, , 407,
414
.NET, 481
$+, 254
JIT, 487
MSIL (Microsoft Intermediate Language), 487
URL, , 257
, 174
, 485
, 124
, 494
, 483
, 481
, 492, 501
,
129
, 484
, 291
.NET Framework
, 123
New Regex, 130, 132, 493, 499
NextMatch ( Match), 508
no re 'debug', , 432
nomatchvars, , 428
None (.NET), 498, 506
O
/o, , 421
,
424
oneself, , 399
Options ( Regex), 505
OR,
, 164
osmosis, 355
overload, , 410
P
\p{^}, 164, 349
\P{}, 159, 164
Java, 439, 441, 480
\p{}, 159, 349
Java, 439, 441, 480
Perl, 164
\p{All}
Perl, 349
\p{all}, 441
\p{Any}, 164
Perl, 349
\p{Assigned}, 164
Perl, 349
\p{C}, 159
Java, 441
\p{Cc}, 161
\p{Cf}, 161
\p{Close_Punctuation}, 161
\P{Cn}, 165
\p{Cn}, 161, 441, 484
Java, 441
\p{Co}, 161
\p{Connector_Punctuation}, 161
\p{Control}, 161
\p{Currency_Symbol}, 161
\p{Cyrillic}, 162, 164
\p{Dash_Punctuation}, 161
\p{Decimal_Digit_Number}, 161
\p{Enclosing_Mark}, 160
\p{Final_Punctuation}, 161
\p{Format}, 161
\p{Greek}, 164
\p{Han}, 162
\p{Hebrew}, 162
\p{Hiragana}, 162
\p{InCyrillic}, 164
\p{Inherited}, 162
\p{Initial_Punctuation}, 161
\p{InTibetan}, 162
\p{IsCommon}, 162
\p{IsCyrillic}, 164
\p{IsGreek}, 164
\p{IsL}, 164
\p{IsTibetan}, 162
\p{javaJavaIdentifierStart}, 442
\p{Katakana}, 162
\p{L&}, 160, 164
Perl, 349
\p{L}, 159, 174, 438
\p{Latin}, 162
\p{Letter}, 159, 164
Perl, 349
\p{Letter_Number}, 161
\p{Line_Separator}, 160
\p{Ll}, 160, 484
\p{Lm}, 160, 484
\p{Lo}, 160, 484
\p{Lowercase_Letter}, 160
\p{Lt}, 160, 484
\p{Lu}, 160, 484
\p{M}, 157, 159
\p{Mark}, 159
\p{Math_Symbol}, 161
\p{Mc}, 160
\p{Me}, 160
\p{Mn}, 160
\p{Modifier_Letter}, 160
\p{Modifier_Symbol}, 161
\p{N}, 159
\p{Nd}, 161, 438, 484
\p{Nl}, 161
\p{No}, 161
\p{Non_Spacing_Mark}, 160
\p{Number}, 159
\p{Open_Punctuation}, 161
\p{Other}, 159
\p{Other_Letter}, 160
\p{Other_Number}, 161
\p{Other_Punctuation}, 161
\p{Other_Symbol}, 161
\p{P}, 159
\p{Paragraph_Separator}, 161
\p{Pc}, 161
\p{Pd}, 161
\p{Pe}, 161
\p{Pf}, 161
Java, 441
\p{Pi}, 161
Java, 441
\p{Po}, 161
\p{Private_Use}, 161
\p{Ps}, 161
\p{Punctuation}, 159
\p{S}, 159
\p{Sc}, 161
\p{Separator}, 159
\p{Sk}, 161
\p{Sm}, 161
\p{So}, 161
\p{Space_Separator}, 160
\p{Spacing_Combining_Mark}, 160
\p{Symbol}, 159
\p{Titlecase_Letter}, 160
\p{Unassigned}, 161, 164
Perl, 349
\p{Uppercase_Letter}, 160
\p{Z}, 159, 438, 484
\p{Zl}, 160
\p{Zp}, 161
\p{Zs}, 160
\p{}, 484
panic: top_env, 399
Pascal, 62, 232
, 323
Pattern
CANON_EQ, 143
CASE_INSENSITIVE,
, 128, 132, 145, 444
, 468
COMMENTS, , 132,
440
compile, , 444
DOTALL, , 440, 442
flags, , 470
matcher, , 445
matches, , 470
MULTILINE, , 440,
442
pattern, , 470
quote, , 470
split, , 470
toString, , 470
UNICODE_CASE, ,
444
UNIX_LINES, ,
440, 442
pattern, , 468, 470
Pattern.CANON_EQ, ,
440
Pattern.CASE_INSENSITIVE,
, 440
Pattern.LITERAL, ,
440
PatternSyntaxException, ,
443, 445
Pattern.UNICODE_CASE,
, 440
PCRE
\w, 158
study, 529
X, 529
, 521
, 123
, 564
, 563
, 123
PCRE, , 520
, 521
Perl, , 61
$/, 61
$^W,
, 359
,
357
, 174
, 179
, 356
, 63
, 419
, 354
, 124
, 388
,
346
, 123, 343
-0, 62
-c, 432
-Dr, 434
-e, 62, 81, 432
-i, 81
-M, 432
-Mre=debug, 434
-n, 62
-p, 81
-w, 64, 391, 432
, 383
, 64
$^W, , 359
use warnings, 391
, 416
,
, , 266
, 175
, 346
, 120
Perl Porters, 122
PHP, 519
\w, 158
, 525
, 174
, 179
, 532, 564
study, 529
, 558
, 521
, 123, 521
, 542
CSV,
, 569
,
526, 530
, 564
, 563
, 523
, 525
, 548, 550
, 288
, 566
\pL, PHP, 521
\pN, PHP, 521
pos(), , 171, 378
POSIX
[==], 168
[::], 166
[..], 167
BRE (
), 119
ERE (
), 119
, 166
, 167
, 119
,
167
,
, 226
, 166
, 167
, 168
POSIX
, 283
, 190
preg, , 519
preg_grep, , 556
PREG_GREP_INVERT, , 556
preg_match, , 531
, 536
preg_match_all, , 536
PREG_OFFSET_CAPTURE, , 535,
538, 540
preg_pattern_error, , 562
PREG_PATTERN_ORDER, , 538
preg_quote, , 178, 557
preg_regex_error, , 563
preg_regex_to_pattern, , 558
preg_replace, , 542
preg_replace_callback, , 548
preg_split, , 551
PREG_SPLIT_DELIM_CAPTURE, ,
553, 554
PREG_SPLIT_NO_EMPTY, , 554
PREG_SPLIT_OFFSET_CAPTURE,
, 554
Procmail, , 123, 127
PS, 145, 442
Python
\Z, 148
, 174
, 179
, 176
, 123
,
130
, 175
, 138
, 293
\pZ, PHP, 521
Q
\Q, Java, 440, 470, 480
\Q\E, 352
Qantas, 35
qed, , 117
qr//, 107
quote, , 178, 470
quoteReplacement, , 452
R
\r, 77, 151
, 152
r"", 138
re 'debug', , 432
Regex (.NET),
CompileToAssembly, 513, 515
Escape, 511
GetGroupNames, 505
GetGroupNumbers, 505
GroupNameFromNumber, 505
GroupNumberFromName, 505
IsMatch, 491, 509
Match, 491, 495, 509
Matches, 500, 509
Options, 505
Replace, 492, 501, 509
RightToLeft, 505
Split, 504, 509
ToString, 505
Unescape, 512
, 496
, 499
, 495, 497
, 497
Regex.Escape, 178
RegexOptions
Compiled (.NET),
, 292, 485, 487, 498, 506
ECMAScript, ,
485, 489, 498, 506
ExplicitCapture,
, 485, 498, 506
IgnoreCase, ,
130, 133, 485, 497
IgnorePatternWhitespace,
, 133, 485, 497, 506
Multiline, , 485,
497, 506
None, 498, 506
RightToLeft, ,
485, 489, 498, 506
Singleline, , 485,
498
RegexOptions.RightToLeft, 504
region, , 460
regionEnd, , 460
regionStart, , 460
reg_match, , 538
regsub, 134
Replace ( Regex), 501
replaceAll, , 451
replaceFirst, , 452
requireEnd, , 465
re-search-forward, 134
reset, , 468
Result ( Match), 508
RightToLeft (.NET),
, 485, 489, 498, 504, 506
RightToLeft ( Regex), 505
Ruby
, 174
, 179
, 176
, 123
, 292
rx, 232
S
\S, 77, 84, 158
Emacs, 169
/s, 176
\s, 77, 158
Emacs, 168
Perl, 349
PHP, 521
, 74
s///, , 78, 383
SBOL, 433
sed
, 174
, 179
, 147
Singleline (.NET), ,
485, 498, 506
split
.NET, 504
Split ( Regex), 504
split,
Java, 470
,
472
split, , 387
start, , 450
Strict (), 492
strict, , 357, 403, 415
String, , 445
StringBuffer, , 445, 456, 473
StringBuilder, , 456, 473
str_ireplace, (PHP), 542
str_replace, (PHP), 542
study, , 429
, 430
Success,
Group, 509
Match, 507
System.currentTimeMillis(), ,
290
System.Text.RegularExpressions, 490,
493
T
\t, 77, 151
, 71
Time::HiRes, , 286
Tcl
[:<:], 124
[:>:], 124
regsub, 134
, 298
, 174
, 179
, 176
, 124
, 123
, 134
, 232
, 139
, 147
, 293
this|that, , 298, 312, 319
time(), , 286
Time::HiRes, , 429, 431
Timer(), , 292
toMatchResult, , 450
ToString,
Group, 509
Match, 507
Regex, 505
toString, , 469, 470
U
\U, 155
\u, 155, 482
U+C0B5, 142
\U\E, 352
uc(), , 352
ucfirst(), , 351
UCS-2, , 142
UCS-4, , 142
UnicodeData.txt, , 351
unicore, , 351
URL, , 104, 318, 532
, 258, 260
use charnames, , 351
use Config, 351
use strict, 357, 403
useAnchoringBounds, , 463
usePattern, , 468, 475
useTransparentBounds, , 462
UTF-16, , 142
UTF-8, , 142, 528
V
\v, 151
, 435
\V, , 435
Value
Group, 509
VB.NET
, 179
,
129
, 137
vi, , 179
Visual Studio .NET, 513
VT, 144
W
\W, 77, 158
\w, 77, 95, 158
Perl, 349
PHP, 158, 521
Java, 438
,
125
w, , 359
warnings, , 391
while, foreach if, , 385
width, , Java, 473
with eval, 433
X
\X, 143, 158
/x, 103, 176
Perl, 349
, 122
\x, 144, 155, 482
Perl, 347
XML, 573
CDATA, 573
, 570, 573
Y
y,
grep, 118
Yahoo!, 258
Z
\Z, 148, 169
Java
, 442
PHP, 523
\z, 169, 529
, 381
(rMoore), 300
,
191
, ,
307
, 162, 349
PHP, 523
, , 571
, 204
, 202
, 197
, 228
, 57, 114
, 187
, 186
, , 319
( local
Perl), 360
, 192
, 186
, 353, 384
Java, 453
, 525
, 317
, 180
, 218,
327
, 216
, 249, 253, 266, 397, 408
, 217
(Alfred Aho), 118, 228
, ,
123
, 144
\s, Perl, 349
, 70
,
, 361
, 290
, 291
, 570
.NET, 515
Perl, 393, 407
PHP, 563, 570
, 209
LIFO, 204
, 202
, , 206
, 205
, 220
, 204
, 215
, 227
, 144, 442
POSIX , , 283
, 280
, 285
, 280
, 306
, 282
, 275, 277
, 280
, 279
, 153
, 489
, 426
, 52
, 397
local, , 402
my, 405
, 393
, 181
, 37
, 142
,
235
, 419
, , 322
, 279,
397, 408
, 280
, 280
, 349, 441, 480, 484
, 37
, 399
, 312
, , 312
, 165
, 231
,
Perl, 357
(James Gosling), 121
\<\>
egrep, 39
Perl, 349
, 39
, 147
, 45
, 166
, , 186
, 501
, , 50
, 124
, 53
Java, 438
.NET, 485
PCRE, 521
Perl, 347
PHP, 521
, 143
, Perl, 355, 357
, 361
,
393
charnames, 351
Imports, 490, 513
no re 'debug', 432
nomatchvars, 428
overload, 410
re 'debug', 432
strict, 357, 403, 415
warnings, 391
, 188, 200
, 194, 231
, 230
, 190
, 231
, 201
,
, 225
, 200, 229, 279, 281
, 229
No Dashes Or Spaces, 543
\, 352
, 230
, 144
Java, 442
,
152
(Jeremy Zawodny),
315
, 463
Java, 463
, , 545
PHP, 542
s///, 78
, 546,
549
, 407
, 184,
219, 317, 565, 572
, 411
, 317, 327, 328, 571
, 307
, 249, 253
(Andrei Zmievski), 520
, , 49
, , 239
, 351
, 180
.NET, 484
PHP, 532, 540, 564
, 413
, 221
,
411
, 413
, 386
,
, 316
Perl, 409
, 166
$&, 427
$', 427
$`, 427
, , 131
URL, , 104
, , 526
VB.NET, , 257
URL, , 104
, 258, 260
, 51, 131, 180, 318, 325, 532
, , 255
,
213
, 373
, 34
, 573
, 127
, 298
, , 45
,
352
,
527
, 350
PHP, 137
, 107
Perl, 404
, 386
, 420
, 305
,
, 304
, , 304
, 302
, 316
, 337
IllegalArgumentException, 445
IllegalStateException, 449
IndexOutOfBoundsException, 448,
449, 453
PatternSyntaxException, 443, 445
\+, 119
awk, 118
egrep, 118
grep, 118
lex, 118
Perl, 120
PHP, 520
sed, 118
/, 122
, 116
\w, 120
, 211
, 43
*, , 43
?, , 42
, 206
(), 46
+, , 43
, 219, 565, 572
,
317, 327, 328, 571
, 207
, , 303
, 302
, 290
, 228
Java, 261, 271, 289
ASCII, 140
Latin-1, 120, 140
UCS-2, 142
UCS-4, 142
UTF-16, 142
UTF-8, 142, 521, 528
, 140
, 141
U+FFFF, 144
, 142, 158
, 150, 177
Java, 132
.NET, 497
XML, 573
Pascal, 323
, 331
, 297
, /o, 421
, 487
(Robert Constable),
116
, 285
, 275, 285
, 222
,
239, 276, 318
, 279
, 223
, 374
, 238
, 71
Perl, 356
,
, 302
, 193
split,
.NET, 504
split
Perl, 392
, 515, 563
, 37
egrep, 45
, 180, 413,
532, 540
\(\), 117
, 533,
556
, 72
, 304
, 515
, 243
, 564
, 563
, Perl, 68
, 364
, 194
, 297
.NET, 510
PHP, 566
Perl, 419
Tcl, 299
, 419
, 420
, 297
,
298
, 300
, 299
,
419
, 361
, 193
, 139, 350,
354, 371
, 419
, 177
, 149
, 312
/, 432
, 358
, 167
, 119
(Tom Lord), 232
, 195
, 207
, 313
, 530
, 222
, 277
, 213
, 204
, 210
, 215
, 196
, 183
, 34
, 53
, 409
,
, 198
, 366
, , 200
, 231, 294, 298
, 190
, 281
, 200, 229
, POSIX
Perl, 402
, 296
, 212
, 313
, 213
, 304
, 215
, 184
, 71
, 211
, 147
/g, 79
/i, 74
/osmosis, 355
, Perl, 354
,
368
, 530
, 381
, 368
, 145, 176, 485
A, 528, 529
D, 523, 528, 529
e, 528, 544, 550
I, 527
m, 523, 527
PHP, 527
S, 317, 528, 544, 567
s, 527
U, 528, 530
u, 521, 527, 535
X, 527, 529
x, 523, 527
,
530
, , 88
, PHP, 558
, 116
, 356
, 116
, 228
, 71
, 205
, 218
, 72,
179
, , 399
egrep, 39
, 39
, 198
, 222
, 207
, 188
, 199
, 190
, 231
, 201
, 200, 229
, 228
, 229
, 280
, , 321
, 153
, 157
hitEnd, , 465
Java, 457
requireEnd, , 465
, 473
, 463
,
, 459
, 461
,
, 306
,
, 79
, , 81
, 178
egrep, 45
, 194, 231
, 489
, 46
, 297
, 321,
322
, 463
Java, 463
, 167
Java, 443
.NET, 494
,
127
, 300
, 367
/g, , 424
/o, , 424
, 370,
422
, 369
, preg_split, 551
Perl, 388
PHP, 551
, Java,
472
,
293
, ,
213, 223
, 217
, 216
,
Perl, 346
, 164
, 175
<B></B>, 213
Java, 440
, 88
,
221
,
166
, 316
, 96
, 90
Java, 436
.NET, 481
PCRE, 521
Perl, 343
PHP, 521
, 294
BLTN, 290
JIT, 290
, 487
, 307
, 312
,
304
, 304
, 302,
316
, 241
, 303
,
302
, 304
, 306
, 230
, 307
, 300
, 306
, 301
, 302
, 301
, 303
,
314
, 294
, 303
, 301, 303
, 308
, 308
, 193
, , 218
$&, , 398
$', , 398
$`, , 398
, 433
,
369
, 431
, 398
, 230
, 538
-c, 432
-Dr, 434
-e, 432
-M, 432
-Mre=debug, 434
-w, 359, 432
, 335
, 144, 442
, 144
,
352, 409
, 410
, ,
420
, 426
Perl, 363
, 412
, 357
, 425
(Jeff Pinyan), 302
,
, 78
PHP, 523
, ,
307
, 264
s///, 383
, 111
, 302
, , 303
, 88
Perl, 349
, 220
, 88
, 183
, 447
, 447
Java, 457
,
, 329
,
328
IP, , 236
URL , 258
, ,
131
Java, 132
.NET, 132
, 407
, 49
, ,
101
, , 47
Emacs, 135
Perl, 61, 108
awk, 133
Emacs, 134
Java, 451, 456
.NET, 492, 501
Perl, 383
PHP, 542
Tcl, 134
, 400
,
, 402
, 370
, 279, 397, 408
, 280
, 280
, , 203
, 398
, 281
, 373
, 245
, 356
, 356
, 356, 374
, 356, 374
.*, 196
, 193
, , 195
, 201
, 201
, 193
, 193
, 193
, 193
, , 206
, 538
, 382
, 205
, 205
, 538
, 145
, 400
,
, 225, 402
, 377
, 229
HTML, 253
HTML, 251
, 246
, 247
, 227
, 488
, 357
, 546
,
,
egrep, 50
, , 262
,
191
, 195
, 277
, , 300
, 425
, 294
, , 306
, 151
.*, 85
Perl, 64
use warnings, Perl, 391
, 359
, , 100
,
.NET, 503
Perl, 63, 343
PHP, 525
, 356
, 351
, ,
302
, ,
301
$+ (.NET), 254
^Subject, 301, 350
gr[ea]y, 32
<img>, , 473
Jeffs, 90
HTML, 524, 543, 545, 549, 574
<HR>, 245
URL, 258, 260, 367, 532
, 492
URL, 385
, 211
, 380
, 492
, 251
, 172
, 475
, 253
, 51, 427
HTML URL, 255
HTTP URL, 255
HTTP, 552
IP, 376, 379
.NET
$+, 254
URL, 257
oneself, 399
this|that, 298, 303, 312, 319
, 62
, 452
, 239
, 381
CSV
Java, 476
CSV
PHP, 569
,
, 266
Java, 271
, 96
, 88
, 329
,
CSV, 330
, 329, 565
makudonarudo, 210, 282
, 322
, 247
, 327
, 275
, 330
, 245
Java, 443,
448, 451, 455, 463
, 140
, ,
303
, , 255
,
428
, 404
, 190
, Java, 461
, , 250
, 127
, 299
, 349
, 538
, , 239
, , 381
Perl, 386
PHP, 551
, 389
, 389
Perl, 388
, 387
, Perl, 392
CSV,
PHP, 569
, 486
, , 96
,
, 88
,
, 314
, 384
PHP, 526, 530
, 145, 442
, 145, 442
, Java, 477
, 291
, 335
, , 329
, 319, 565
, 322
, 565
, 99
, 231
.NET
, , 130
, 186
, 106
, 564
, Perl, 355
, 393
, 116
, 419
, 356
, 297, 419, 510, 566
, 296
, 354
, 235
, 352
, 350
, 431
, 352, 394
, 412
, 371, 424
, 354
, 229
, 561
, 116
, 368
, 563
,
, 225
, 356
, 116
, 145
/i, 74
Ruby, 145
study, 430
Java, 447
PCRE, 564
PHP, 564
Java, 477
PCRE, 563
PHP, 563
, 175
Java, 440
.NET, 484
Perl, 349
PHP, 523
, 96
, 484
, 175
, 137
, 331
, 335
Perl,
386
,
, 225
, 570
, 515,
563, 570
PHP, 570
,
, 291
, , 146
, 159
(Ravi Sethi), 228
, 290
, 152
, 142
, 302,
308
, 316
, 142, 158
HTTP, 153
, 55
\w, , 120
, 151
, 151
, 155
, 155
, 32
, 156
, 165
POSIX, 166
, 37
,
213
, 157
, 156
, , 304
, 34
, 193
, 164
, 483
, 165
, 165
, 156
, 163
, 168
Emacs, 168
, 263
, 356, 374, 377
, 464, 475
, ,
533
, 229
, 144
, 99
preg_match, 536
, 297
, 269
,
, 146
, ,
, Perl, 67
, 380
, 419
,
151
, , 53
(Henry Spencer), 120, 232
, 321
, 356, 374
, 460
, , 253
, 294
#, 137
Emacs, 135
Java, 137
, 137
Python, 138
Tcl, 139
VB.NET, 137
, 135, 369
, PHP, 525
makudonarudo, , 210, 282
, 322
, 327
, 275
, 247
, 97
, 226, 235
, 330
, 302
, , 300
, 217
,
, 215
HTML, , 51
XML, 570
, 570
, 246
, 447
Java, 478
, 228
, 146
(Ken Thompson), 147
, 156
, 156
, 35
, 193
, 190
,
, 308
,
Perl, 346
preg, 524
, 548, 550
, 417
, 250
(Jeffrey Ullman), 228
(Larry Wall), 120, 434
, 223
, 224
, 337
, 155
,
, 303
, 294
, 182
, 402
, 393
, 183
.NET, 486
, , 301,
303
, PHP, 289
, , 245
,
, 546,
549
, 155
, , 308
, , 47
, 280, 397, 408
, 280
, 280
, 306, 322
, 329
, 328
, 141
/x, 349
, Perl
\p{Any}, 349
\p{Assigned}, 349
\p{^}, 349
\p{}, 349
\p{Unassigned}, 349
, 159
Java, 441
, 142
PHP, 566
Perl, 416
, 416
, 227
,
422
, 277