
,

.........................................................................

...............................................................................

11

1. ...........................

14

1.1.

............................................. 14

1.2. World Wide Web ................................... 17


1.3. ........................................................................ 22
1.4. - .................................. 26

2. ...............................................
2.1.

27

............................................................... 29
2.1.1. ......................................................... 29
2.1.2. ......................................................... 32
2.1.3. ............................................................... 34

2.2. - .......................... 38
2.3. ................................................. 40
2.4. ................................... 46
2.4.1. ......................................... 46
2.4.2. .............................................47
2.4.3. ........................ 48
2.4.4. ..................................... 49
2.4.5. ...... 51
2.4.6. ....................................................... 51
2.5. - ......................................... 52
2.6. ............................ 54

3. TEXT MINING ..................................................


3.1.

5
58

- ........................................................................... 59

3.2. Text Mining ................................................................ 60


3.2.1. ........................................................................ 61
3.2.2. ........................................... 63
3.2.3. .................................................. 65
3.2.4. .................................................... 67
3.2.5. ....................................... 68
3.2.6. .............................................................71
3.3. Text Mining ....................... 74

4. ...........................
4.1.

76

.............................................................. 76
4.1.1. ........................... 77
4.1.2. ...................................... 78
4.1.3. ............................................................... 78

4.2. Rocchio............................................................................. 79
4.3. ........................................................................ 80
4.4. - ................................................................... 81
4.5.
........................................................................ 81
4.5.1. ....................................................................... 82
4.5.2. ..................................................... 83
4.5.3. .................................................... 85
4.5.4. .............................................. 86
4.6. .................................................... 86
4.6.1. ........................................ 86
4.6.2. ....................................................... 87
4.6.3. .................... 88
4.6.4. ......................................... 90
4.7.

.......................................................... 92

4.8. ............................................. 99

5. ..................................... 101
5.1.

- .............................................. 103
5.1.1. - ............................ 103

5.1.2. - ...................... 105

5.2. k-means............................................................................ 107


5.3. -........................ 108
5.4. .................................................... 109
5.5. ..................................................................... 113
5.6. .......................................... 116
5.6.1. HITS ................................................................................ 117
5.6.2. PageRank .......................................................................119
5.6.3. Salsa .............................................................................. 120
5.6.4. .......................................................... 123

6.

...................................... 124
6.1.

.............................................. 124
6.1.1. ................................................................. 124
6.1.2. ................................................................................ 127
6.1.3. ........................................................130
6.1.4. ................................................................................... 131

6.2. ..................... 132


6.3. ............................................. 135
6.4. ................................ 138

7. ............................ 142
7.1.

.................................................................... 145

7.2.

..................................................................... 147

7.3.

..................................................................... 149

7.4.

................. 151

7.5.

........................................................... 152

7.6.

............................................................. 154

8. ...................................... 156
8.1.

....................................................... 157
8.1.1. .................................................................. 157
8.1.2. ................................................................ 157
8.1.3. ...................................................158


8.1.4.
8.1.5.
8.1.6.
8.1.7.
8.1.8.

.........................................................................158
.......................................................... 160
............................................................................. 161
.......................................................................... 161
.................................................................. 161

8.2. ............................................................... 162


8.3. ................................................................. 163
8.4. WWW ............................................................. 166
8.4.1. WWW ............................................................................. 166
8.4.2. .............................................. 169
8.5. ................................................... 172

9. ....................................... 175
9.1.

....................................................... 175

9.2. .................................. 178


9.3. ............ 182
9.4. ............................................... 184
9.5. ............................................... 187
9.6. .............. 189

10. ............................... 191


10.1. ....................................................................... 191
10.2. ....................................................... 192
10.3. ............................................................... 193
10.4. ................................................ 199
10.5. ............................ 207

11. ................................... 216


11.1. ................................... 216
11.2. .............................................................. 219
11.3. ......................... 224
11.4. .................................................. 228
11.4.1. DFA ...................................................................................... 229
11.4.2. ........................................................... 230

11.4.3. ................................................................................. 236


11.4.4. ......................................................................... 237

11.5. ...................... 239

.......................................................................... 250
............................................................. 256
.............................................................................. 259
............................................................................. 268

.
.
, . -,
, , . [116] , ,
, ,
. -, , ,
.
.
, - [35]. , ,
( ). , . (G. Fox) (), ,
, , [91, 92]. , , .
, ,
, .
, ,
, -,
, . -,


. ,
.
, ,
. ,
, , . , , .
,
. , ,
,
, , .
.

.
, , ,
, ,
, .
: , , , ; , ,
. ,
.

,
, ,
.
, ,
9 2008 .

Nothing's gonna change my world


J. Lennon, P. McCartney

.
, , . , , ,
, .
, , WWW, .
.

. , ,
, .

-.
, , . , .
, . , , ,
.
Text Mining, -, . , , , ,
.



,
. , ,
.
. ,
. .

. , , , .
-.
, .
, , .
, , , . , ,

.
(complex networks),
,
,
. , .
, ,
(),
.
, (
), (, ) , , ,
.
, , -


. ,
.

.
, ,

.
.
,

.
, ,
, .
, .
, ,
.

1.

1.1.
( , ),
, ,
, .
TCP/IP (Transmission Control Protocol/Internet Protocol), .
IP ,
, ,
. ,
IP- .
IP ,
, .
IP- .
IP ARPANET. IETF (Internet
Engineering Task Force),
.
, ARPANET , 40 . 1969
,


.
(ARPA) .
, , -.

TCP/IP . (R. Kahn) . (V. Cerf)

ARPANET, , . ARPANET
:
;
;

;
.

ARPANET. , DNS .arpa: ,
, , IP- 1.2.3.4, 4.3.2.1.in-addr.arpa.
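The mapping from the address 1.2.3.4 to the name 4.3.2.1.in-addr.arpa can be illustrated with a minimal sketch. It relies on the standard Python `ipaddress` module; the helper name `reverse_name` is an illustrative assumption, not something from the text.

```python
import ipaddress

def reverse_name(ip: str) -> str:
    """Return the reverse-lookup DNS name for an IP address.

    For IPv4 the octets are written in reverse order and the
    suffix .in-addr.arpa is appended.
    """
    return ipaddress.ip_address(ip).reverse_pointer

print(reverse_name("1.2.3.4"))
```

A resolver queries the PTR record of the resulting name to find the host name that belongs to the address.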
ARPANET , . 1973 ,


. 1984 ARPANET
(NSF),
NSFNet,
(56 /), ARPANET. 1990 ARPANET ,
NSFNet.
,
.
,
[47].
TCP/IP.

(. Request for Comments, RFC),
,
. Request for Comments
. RFC IETF
(. Internet Society, ISOC). RFC
, (., , http://www.ietf.org/rfc.html). , .

- ( W3C, IETF, , 2).
19901994 . ,
. . 1994 (. 1), -.
,
:
, ;
;
;
, ;
;
- .


Fig. 1.
[Source: www.isc.org]

1.2. World Wide Web


World Wide Web ( , ) ,
HTTP.
World Wide Web -,
. World Wide Web - . . - , .

, . , , . 2005 Yahoo , 20 .
Google 2004 10 , . .
. Web Server Survey,
2008 - 166 .
, -, -


, , .
, (), /
.
,
, , .
, ,
. , .
:
1945 (Vannevar Bush) Memex (memory extension),
, . (Ted Nelson) 1965
Xanadu ;
1980 - (T. Berners-Lee), CERN ( ) , ,
;
1990 ,
CERN - (GUI, Graphical User Interface) .
WorldWideWeb. 1992 GUI Erwise Viola;
1993 . (M. Andreessen) NCSA (

,
www.ncsa.uiuc.edu)
Mosaic
Xwindow System UNIX. CERN
HTML HTTP- , CERN HTTPD.
HTML (Hypertext Markup Language) , -.


HTML
, ,
. HTML
SGML ( ), HTML
SGML, . . ISO
8879.
HTML , .
HTML-
,
.
-, , ,
. Internet Explorer, Firefox, Opera Safari.
HTML 4.01,
1999 . 2000 . ISO/IEC
15445:2000 ( ISO HTML, HTML
4.01 Strict).
W3C
HTML. 20 2007.
HTML XHTML ( . eXtensible HTML), , , SGML, XML
2000 W3C.
HTTP (HyperText Transfer Protocol), HTML-.
HTTP ,
, , .
HTTP -.
HTTP- , , . HTTP-
: , (GET,
POST, HEAD, PUT),
(Uniform Resource Locator, URL), , ,
, . URL ,
:
, ,
(, ftp);
, ,
;


,
.
HTTP URL :
http://[user[:passwd]@]host[:port][/path]
host IP ; :port
TCP ,
, ; path
; user ; passwd .
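As an illustration of the URL components listed above (user, passwd, host, port, path), the following sketch splits an HTTP URL with the standard `urllib.parse` module. The example URL and the helper name `describe_url` are assumptions made for the demonstration; port 80 is filled in as the conventional HTTP default when the URL omits it.

```python
from urllib.parse import urlsplit

def describe_url(url: str) -> dict:
    """Split an HTTP URL into the components of the scheme
    http://[user[:passwd]@]host[:port][/path]."""
    parts = urlsplit(url)
    return {
        "user": parts.username,
        "passwd": parts.password,
        "host": parts.hostname,
        # urlsplit returns None when no port is given; 80 is the HTTP default.
        "port": parts.port if parts.port is not None else 80,
        "path": parts.path or "/",
    }

info = describe_url("http://user:passwd@example.org:8080/docs/index.html")
print(info)
```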
HTTP:

()
, , .
, , , , ,
,
, .
HTTP
. ,
. HTTP , , .
HTTP ( HTTP 1.1, HTTP
) TCP- . , (permanent connection), TCP HTTP-, . -


HTTP 1.1, ,
,
, . TCP- ,
.
, , HTTP, :
IP- ( DNS-);
TCP- ;
URL;
(HTML- );
TCP/IP-.
HTML- , HTML
, . ,

, HTML, ,
CSS (Cascading Style Sheets), -
,
, . , WWW
, .
, .
, , . HTML-

, . .
- .

HTML, XML, .


1.3.
WWW -. , , ( ), ,
, . , , , .
(Peer-to-peer, P2P ) , . , (peer) , . peer-to-peer
1984 . (P. Yohnuhuitsman) Advanced Peer to Peer Networking IBM.
P2P [14], , .
P2P , TCP/IP TCP UDP. , ,
(- ). P2P
, , .
, P2P , . , , [51]. , ,
, ,
[14].
/, , (), () .
, , . .
. ,
/
, .
, , , .


, P2P (
),
. , ,
, P2P-
, ,
BitTorrent (tracker), .
/ (, ), P2P , , ,
,
, , .
, , Gnutella,
BFS, , , , . DHT (Distributed Hash Tables),
, Kademlia,
P2P-. ,
, , , . 2003
Gnutella2.
.
, .

, . .

. , Microsoft
P2P- Scribe Pastry. PNRP (Peer
Name Resolution Protocol), P2P-,
Windows Vista.
P2P
Sun Microsystems JXTA [28]. P2P-
.
, , :


. P2P FTP-,
.
. ,
P2P, SETI@HOME, , . ,
.
. , ICQ P2P-.
, ,
login.icq.com.
. - Skype (www.skype.com), 2003 .
. . ,
KaZaA. P2P Skype 10 .
. ,
Groove Network ( )
OpenCola ( ).
P2P-, 2008 150 .

P2P-. Bittorrent, Gnutella2
eDonkey2000.
BitTorrent 2001 .
BitTorrent , , , , ,
- . (www.bittorrent.com) , , ,
. , (torrent
file) , ,
.
2000 . Gnutella
(www.gnutella.com).
Gnutella2 (www.gnutella2.com), 2003 ,
P2P-.


EDonkey2000 2000 .

ed2k-, ID .
EDonkey2000 . 200 .
EDonkey2000 10 .
EDonkey2000 . , .
, .
- , . , ,
, .
, P2P-,
, , , .
, , ,
,
(, 2 ).

. ,
(DNS) , 2 [115].
2
GRID.
distributed.net, , .
,
,
- .
, , , .
, P2P. (, Napster
2001 ). , .
Gnutella, , 70 % . -


, . . - .
P2P-
. . ID . , , .
P2P- , .

1.4. -
, -, ( ):
;
.
, , .
,
, , .
, , , ., .

2.

, , :
, .

, . - , .
(Retrieval Systems, ) . , , , , ,
, , , .
-
:

;
;
, ;
.
1966 16-

MARC (http://www.loc.gov/marc/), , .
.
1972 MARC-2
[67, 32], .


1970-
. : 1965 Dialog (http://www.dialog.com/), Thomson, .
1990-
Z39.50 . 1994

(http://www.usg.edu/galileo/) Site-Search , Z39.50.


Z39.50
WAIS (Wide Area Information
Service) [127], .
-
,
. , ,


. .
, Google, Yahoo, AltaVista, AllTheWeb, MSN, Yandex,
Rambler, -. ,
[68].
, ( ),
(- ),
( ). , , ,
(. Terms), , ( . Bag of Words), , , . , , .
:
, , .


-
.
(. .
), .
, , .
, , , ,
,
, , ,
. ,
, , .

.
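Let $t_i$ be the $i$-th term of the dictionary $T$ ($i = 1, \dots, M$), let $d^{(j)}$ be a document of the collection $D$, and let $w_i^{(j)} \ge 0$ be the weight associated with the pair $(t_i, d^{(j)})$. If the term $t_i$ does not occur in the document $d^{(j)}$, then $w_i^{(j)} = 0$. The document $d^{(j)}$ is represented by the vector

$$ d^{(j)} = (w_1^{(j)}, w_2^{(j)}, \dots, w_M^{(j)}). $$

The function $g_i$ associated with the term $t_i$ returns the corresponding weight:

$$ g_i(d^{(j)}) = w_i^{(j)}. $$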

2.1.
2.1.1.

. , .

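In this model the weights take only two values: 0 (the term is absent from the document) and 1 (the term is present in the document):
$w_i^{(j)} \in \{0, 1\}$.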

,
(AND, ), (OR, ) (NOT, ). , , ( , dnf).
, , , , .
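Consider the example from [68]: $q = a \wedge (b \vee \neg c)$.
For the three terms $a$, $b$ and $c$ (Fig. 2) this query is equivalent to $(a \wedge b \wedge c) \vee (a \wedge b \wedge \neg c) \vee (a \wedge \neg b \wedge \neg c)$, that is, to the disjunctive normal form

$$ q_{dnf} = (1,1,1) \vee (1,1,0) \vee (1,0,0), $$

where each triple lists the binary weights of the terms $a$, $b$ and $c$ in one conjunctive component.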

Fig. 2. The query $q = a \wedge (b \vee \neg c)$

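A query $q$ in disjunctive normal form is a disjunction of its conjunctive components:

$$ q_{dnf} = \bigvee_{i = 1, \dots, N} q_{cc}^{(i)}, $$

where $q_{cc}^{(i)}$ is the $i$-th conjunctive component of $q_{dnf}$.
The similarity $sim(d^{(j)}, q)$ between the document $d^{(j)}$ and the query $q$ is defined as

$$ sim(d^{(j)}, q) = \begin{cases} 1, & \text{if } \exists\, q_{cc}^{(i)} \in q_{dnf} \text{ such that } g_k(d^{(j)}) = g_k(q_{cc}^{(i)}) \text{ for every term } t_k, \\ 0, & \text{otherwise.} \end{cases} $$

If $sim(d^{(j)}, q) = 1$, the document $d^{(j)}$ is considered relevant to the query $q$.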
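The Boolean matching rule can be sketched in a few lines of Python. This is only an illustration under simple assumptions: a query is given as a Python predicate over binary term weights, and the helper names `dnf_components` and `boolean_sim` are hypothetical. The example query is $q = a \wedge (b \vee \neg c)$.

```python
from itertools import product

def dnf_components(query, n_terms):
    """Enumerate the binary weight tuples for which the Boolean query is true.

    Each tuple corresponds to one conjunctive component of the DNF.
    """
    return [bits for bits in product((1, 0), repeat=n_terms) if query(*bits)]

def boolean_sim(doc_bits, components):
    """Return 1 if the document's binary weights equal some component, else 0."""
    return 1 if tuple(doc_bits) in components else 0

# q = a AND (b OR NOT c)
q = lambda a, b, c: bool(a and (b or not c))
q_dnf = dnf_components(q, 3)

print(q_dnf)
print(boolean_sim((1, 0, 0), q_dnf), boolean_sim((0, 1, 1), q_dnf))
```

The result is strictly binary: a document either matches or does not, so the model gives no ranking of the matching documents.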
. ,
,
.
, .
, - . STAIRS
IBM. , ,
:
, ;
,
;
, , , , ;
, .
:
1. , ,
, , .


2. ,
, , .
3. . .
4. ,
.
, . , , ,
. -
.
2.1.2.
,
.
.

, . (G. Salton),
. . (E. Fox) . (H. Wu) 1983 [130].
,
[0, 1]. D ,
d D x y . x, d y, d x y . , , :

$$ x = f_x \frac{idf_x}{\max idf}, \qquad y = f_y \frac{idf_y}{\max idf}, $$

f x x d , idf x ,
( ), x .


G. Salton (1927–1995)


( ) d = ( x , y ) , [0, 1] [0, 1].

.
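For the queries $q_{or} = x \vee y$ and $q_{and} = x \wedge y$ the similarity to the document $d$ is computed as

$$ sim(q_{or}, d) = \sqrt{\frac{x^2 + y^2}{2}}; $$

$$ sim(q_{and}, d) = 1 - \sqrt{\frac{(1 - x)^2 + (1 - y)^2}{2}}. $$

If the query terms carry the weights $a$ and $b$, the formulas become

$$ sim(q_{or}, d) = \sqrt{\frac{a^2 x^2 + b^2 y^2}{a^2 + b^2}}; $$

$$ sim(q_{and}, d) = 1 - \sqrt{\frac{a^2 (1 - x)^2 + b^2 (1 - y)^2}{a^2 + b^2}}. $$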


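The exponent 2 can be generalized to an arbitrary $p$, $1 \le p < \infty$.
For a query of $m$ terms, the generalized disjunction ($\vee^p$) and conjunction ($\wedge^p$) give the queries
$q_{or} = x_1 \vee^p x_2 \vee^p \dots \vee^p x_m$ and $q_{and} = x_1 \wedge^p x_2 \wedge^p \dots \wedge^p x_m$. The similarity is then

$$ sim(q_{or}, d) = \left( \frac{x_1^p + x_2^p + \dots + x_m^p}{m} \right)^{1/p}; $$

$$ sim(q_{and}, d) = 1 - \left( \frac{(1 - x_1)^p + (1 - x_2)^p + \dots + (1 - x_m)^p}{m} \right)^{1/p}. $$

In particular, for $p = 1$:

$$ sim(q_{or}, d) = sim(q_{and}, d) = \frac{x_1 + x_2 + \dots + x_m}{m}. $$

As $p \to \infty$:

$$ sim(q_{or}, d) \to \max(x_i); $$
$$ sim(q_{and}, d) \to \min(x_i). $$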
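The behaviour of the p-norm similarity for different values of $p$ can be checked numerically. The sketch below implements the two formulas, $sim(q_{or}, d) = ((x_1^p + \dots + x_m^p)/m)^{1/p}$ and $sim(q_{and}, d) = 1 - (((1 - x_1)^p + \dots + (1 - x_m)^p)/m)^{1/p}$, for unweighted queries; the function names and sample weights are illustrative assumptions.

```python
def sim_or(weights, p=2.0):
    """p-norm similarity of a document to the disjunction of its terms."""
    m = len(weights)
    return (sum(x ** p for x in weights) / m) ** (1.0 / p)

def sim_and(weights, p=2.0):
    """p-norm similarity of a document to the conjunction of its terms."""
    m = len(weights)
    return 1.0 - (sum((1.0 - x) ** p for x in weights) / m) ** (1.0 / p)

doc = [0.2, 0.8]  # term weights x_1, x_2 in [0, 1]
for p in (1, 2, 200):
    print(p, sim_or(doc, p), sim_and(doc, p))
```

With $p = 1$ both functions return the arithmetic mean of the weights, and for large $p$ they approach the maximum and the minimum weight respectively, which recovers the behaviour of fuzzy OR and AND.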
,
(
).
, , ,
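$q = (x_1 \vee^p x_2) \wedge^p x_3$, the similarity is computed as

$$ sim(q, d) = 1 - \left( \frac{1}{2} \left[ \left( 1 - \left( \frac{x_1^p + x_2^p}{2} \right)^{1/p} \right)^p + (1 - x_3)^p \right] \right)^{1/p}. $$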

2.1.3.

(Fuzzy set theory), . .


(L. A. Zadeh), ,
,
[0, 1], 0 1
[18].


, (fuzzy set) A U ( )
( A (U ),U ) , A (U ) u U A .
[0, 1]. ,
u U . , .
A B U , A A U u U .
, , :

$$ \mu_{\bar{A}}(u) = 1 - \mu_A(u); $$
$$ \mu_{A \cup B}(u) = \max(\mu_A(u), \mu_B(u)); $$
$$ \mu_{A \cap B}(u) = \min(\mu_A(u), \mu_B(u)). $$
. , [18]. A u U , A (u ) .
A sup A (u ) .
uU


A U ,
A 1/2.
, [18], U , [0, 100], u, , . U , ,
(. 3):

$$ \mu_A(u) = \begin{cases} 0, & 0 \le u \le 50, \\ \left( 1 + \left( \dfrac{u - 50}{5} \right)^{-2} \right)^{-1}, & 50 < u \le 100. \end{cases} $$


[50, 100], 1,
u = 55 (. 3). , ,
55 ,
0.5, 70, 0.9.
50 ( A (u ) = 0 ) . ,
, .

, . c , cil
i l :

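$$ c_{il} = \frac{n_{il}}{n_i + n_l - n_{il}}, $$

where $n_i$ is the number of documents containing the term $t_i$, $n_l$ is the number of documents containing the term $t_l$, and $n_{il}$ is the number of documents containing both terms.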


. 3. ,

, d j
ij , :

$$ \mu_{ij} = 1 - \prod_{t_l \in d^{(j)}} (1 - c_{il}). $$

d ( j ) i- ti , ti . tl d ( j ) ti (. . cil ~ 1 ), ij ~ 1 ti
d ( j ) . , cil << 1 , ij cil (
). ,
max , . , ,
cil .
, . , -


,
, :

$$ q_{dnf} = cc_1 \vee cc_2 \vee \dots \vee cc_p, $$


cci i - , p qdnf .
, qj d ( j ) Dq , q (
), :
$$ \mu_{qj} = 1 - \prod_{i=1}^{p} (1 - \mu_{cc_i, j}). $$
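The fuzzy-set quantities of this section can be computed directly from document-term sets. The sketch below evaluates the term correlation $c_{il} = n_{il} / (n_i + n_l - n_{il})$ and the membership degree $\mu_{ij} = 1 - \prod_{t_l \in d^{(j)}} (1 - c_{il})$. The tiny collection and the function names are illustrative assumptions.

```python
from math import prod

def correlation(docs, term_i, term_l):
    """c_il = n_il / (n_i + n_l - n_il), computed from document term sets."""
    n_i = sum(1 for d in docs if term_i in d)
    n_l = sum(1 for d in docs if term_l in d)
    n_il = sum(1 for d in docs if term_i in d and term_l in d)
    denom = n_i + n_l - n_il
    return n_il / denom if denom else 0.0

def membership(docs, term_i, doc):
    """mu_ij = 1 - product over terms t_l of the document of (1 - c_il)."""
    return 1.0 - prod(1.0 - correlation(docs, term_i, t_l) for t_l in doc)

docs = [{"web", "search"}, {"web", "mining"}, {"text", "mining"}]
print(correlation(docs, "web", "search"))
print(membership(docs, "web", docs[2]))
```

A document that does not contain a term can still receive a non-zero membership degree for it, provided it contains terms that are correlated with it.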

2.2. -
-
- (Vector Space
Model), . 1975 .
SMART [131]. .
, , ,
,
,
. , , . .
ti d ( j )
wi( j ) .
q ,
, , wiq .
,
n- , n . , d ( j )
q , -


$d^{(j)} = (w_1^{(j)}, w_2^{(j)}, \dots, w_n^{(j)})$ and $q = (w_1^q, w_2^q, \dots, w_n^q)$.


.
wi( j ) -

freqi( j ) , :
$$ w_i^{(j)} = \frac{freq_i^{(j)}}{\max_{1 \le k \le n} freq_k^{(j)}}. $$

tfi ( j ) TF ( . Term Frequency


).
,
, , . ,
,
:
$$ w_i^{(j)} = tf_i^{(j)} \cdot \log \frac{N}{n_i}, $$

ni , ti , N
. ,
, ,
For example, if a term occurs in every document ($n_i = N$), its weight vanishes: $w_i^{(j)} = tf_i^{(j)} \cdot \log \frac{N}{N} = 0$.
,
. 1988 ti :

$$ w_i^q = \left( 0.5 + \frac{0.5 \, freq_i^q}{\max_{1 \le l \le n} freq_l^q} \right) \cdot \log \frac{N}{n_i}, $$

freqiq ti .
wi( j ) . TF IDF , TF , IDF , ,
( . inverse document frequency).


, sim(d (1) , d (2) ), -

( w1(1) , w2(1) ,..., wn(1) ) ( w1(2) , w2(2) ,..., wn(2) )


d (1) d (2) . , sim(d (1) , d (2) ) [0, 1].
sim(d (1) , d (2) ) d (1) d (2) . d sim( d , d ) = 1.
d ( j ) q :
$$ sim(d^{(j)}, q) = \frac{d^{(j)} \cdot q}{|d^{(j)}| \, |q|} = \frac{\sum_{i=1}^{n} w_i^{(j)} w_i^q}{\sqrt{\sum_{i=1}^{n} (w_i^{(j)})^2} \; \sqrt{\sum_{i=1}^{n} (w_i^q)^2}}. $$
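The weighting and similarity formulas of the vector space model can be combined into a short sketch. It assumes TF normalized by the maximum term frequency in the document, $IDF = \log(N / n_i)$, and the cosine of the angle between weight vectors as the similarity. The function names and the toy collection are illustrative assumptions; documents are expected to be non-empty token lists.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """docs: list of non-empty token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # n_i: number of documents that contain term t_i
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        freq = Counter(doc)
        max_freq = max(freq.values())
        vectors.append({
            t: (f / max_freq) * math.log(n_docs / df[t])
            for t, f in freq.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

docs = [["web", "search", "web"], ["web", "mining"], ["text", "mining"]]
vecs = tf_idf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Storing vectors as dictionaries keeps only non-zero weights, which matters because real document vectors are extremely sparse.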

- , , , :
;
(
);
.
-
, .

2.3.
1977 . . (S. E. Robertson) . -
(K. Sparck Jones) , 1960 . ,
,
-
. , .


. (Microsoft Research Laboratory)

, :
, d q ?
, . ,
, /, ,
.
,
. , .
,
( ).

,
-
. ,
, .
( ,


, , , ).
, ,
,
.
, , . X , Y , X , Y G, G .
X Y :
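$$ P(X \mid Y) = \frac{P(X \cap Y)}{P(Y)}. $$

Bayes' theorem follows directly from this definition:

$$ P(X \mid Y) = \frac{P(X \cap Y)}{P(Y)}; \qquad P(Y \mid X) = \frac{P(Y \cap X)}{P(X)}; $$

$$ P(Y \cap X) = P(X \cap Y) \;\Rightarrow\; P(X \mid Y) = \frac{P(Y \mid X) \, P(X)}{P(Y)}. $$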

, ,
( R ) P( R | q, d ), q , d
, , ( R )
P ( R | q, d ).
O( R) :
$$ O(R) = \frac{P(R)}{P(\bar{R})} = \frac{P(R)}{1 - P(R)}. $$

The odds are less than 1 when $P(R) < 0.5$ and greater than 1 when $P(R) > 0.5$.
, :

$$ O(R \mid q, d) = \frac{P(R \mid q, d)}{P(\bar{R} \mid q, d)}. $$

:
$$ P(R \mid d, q) = \frac{P(R \cap q \cap d)}{P(q \cap d)}. $$


P( R q d )
, , d q , P(q d ) ,
q d .
:
$$ P(R \mid d, q) = \frac{P(d \mid R \cap q) \, P(R \cap q)}{P(d \mid q) \, P(q)} = \frac{P(d \mid R \cap q) \, P(R \mid q)}{P(d \mid q)}. $$

P( R | d , q ) P ( R | d , q ) , :
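$$ O(R \mid d, q) = \frac{\dfrac{p(d \mid R \cap q) \, p(R \mid q)}{p(d \mid q)}}{\dfrac{p(d \mid \bar{R} \cap q) \, p(\bar{R} \mid q)}{p(d \mid q)}} = \frac{p(R \mid q) \, p(d \mid R \cap q)}{p(\bar{R} \mid q) \, p(d \mid \bar{R} \cap q)}. $$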
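Let $T = \{t_1, \dots, t_n\}$ be the set of terms of the collection $D$.
A document is represented by the vector $d = (w_1, \dots, w_n)$, where

$$ w_i = \begin{cases} 1, & t_i \in d; \\ 0, & t_i \notin d. \end{cases} $$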

,
, :
$$ p(d \mid R, q) = p(\vec{d} \mid R, q) = \prod_{i=1}^{n} p(t_i \mid R, q). $$

:
$$ O(R \mid q, d) = \frac{p(R \mid q) \, p(d \mid R \cap q)}{p(\bar{R} \mid q) \, p(d \mid \bar{R} \cap q)} = O(R \mid q) \prod_{i=1}^{n} \frac{p(t_i \mid R \cap q)}{p(t_i \mid \bar{R} \cap q)}. $$

O( R | q ) . , ,
, ( ti T \ q ),
,
. .:
$$ \forall\, t_i \in T \setminus q: \quad p(t_i \mid R, q) = p(t_i \mid \bar{R}, q). $$


O ( R | q, d ) = O ( R | q )

ti q d

p (ti | R q )
p (ti | R q )
p (t | R q )
.

i
p (ti | R q ) tiq \ d p (ti | R q ) tiq p (ti | R q)

q d
, q \ d ,
, , q ,
.

. ,
, ,
:
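$$ r_i = p(w_i = 1 \mid R, q); \qquad n_i = p(w_i = 1 \mid \bar{R}, q). $$

Then

$$ O(R \mid q, d) = O(R \mid q) \prod_{t_i \in q \cap d} \frac{r_i}{n_i} \prod_{t_i \in q \setminus d} \frac{1 - r_i}{1 - n_i}. $$

Multiplying the right-hand side by the identity

$$ \prod_{t_i \in q \cap d} \frac{(1 - r_i)(1 - n_i)}{(1 - n_i)(1 - r_i)} = 1 $$

gives

$$ O(R \mid q, d) = O(R \mid q) \prod_{t_i \in q \cap d} \frac{r_i (1 - n_i)}{n_i (1 - r_i)} \prod_{t_i \in q} \frac{1 - r_i}{1 - n_i}. $$

The last product does not depend on the document, so it is a constant for a given query, and documents can be ranked by the first product alone. Taking its logarithm yields

$$ \sum_{t_i \in q \cap d} \log \frac{r_i (1 - n_i)}{n_i (1 - r_i)} = \sum_{t_i \in q \cap d} \left( \log \frac{r_i}{n_i} + \log \frac{1 - n_i}{1 - r_i} \right). $$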


,
:

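$$ r_i = \frac{rel_i}{rel}; \qquad n_i = \frac{nrel_i}{nrel}, $$

where $rel_i$ is the number of relevant documents containing the term $t_i$, $rel$ is the total number of relevant documents, and $nrel_i$ and $nrel$ are the corresponding counts for non-relevant documents.
Substituting these estimates gives

$$ SV = \sum_{t_i \in q \cap d} \log \frac{r_i (1 - n_i)}{n_i (1 - r_i)} = \sum_{t_i \in q \cap d} \log \frac{\dfrac{rel_i}{rel} \left( 1 - \dfrac{nrel_i}{nrel} \right)}{\dfrac{nrel_i}{nrel} \left( 1 - \dfrac{rel_i}{rel} \right)}, $$

or, after simplification,

$$ SV = \sum_{t_i \in q \cap d} SV_i = \sum_{t_i \in q \cap d} \log \frac{rel_i \, (nrel - nrel_i)}{nrel_i \, (rel - rel_i)}. $$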
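The term weights $SV_i = \log \frac{rel_i (nrel - nrel_i)}{nrel_i (rel - rel_i)}$ are easy to compute from a judged training sample. The following sketch is an illustration only: the function names and the sample counts are assumptions, and no smoothing is applied, so every count must satisfy $0 < rel_i < rel$ and $0 < nrel_i < nrel$.

```python
import math

def term_weight(rel_i, nrel_i, rel, nrel):
    """SV_i = log( rel_i (nrel - nrel_i) / (nrel_i (rel - rel_i)) ).

    No smoothing: a zero count or rel_i == rel raises an error.
    """
    return math.log(rel_i * (nrel - nrel_i) / (nrel_i * (rel - rel_i)))

def document_score(doc_terms, query_terms, counts, rel, nrel):
    """Sum SV_i over the terms that occur both in the query and in the document."""
    return sum(
        term_weight(*counts[t], rel, nrel)
        for t in query_terms
        if t in doc_terms
    )

# counts[t] = (rel_i, nrel_i) for a sample with 3 relevant and 3 non-relevant documents
counts = {"t1": (2, 1), "t2": (1, 2)}
print(document_score({"t1", "t2"}, ["t1", "t2"], counts, rel=3, nrel=3))
```

A term that is more frequent among relevant documents receives a positive weight, and a term that is more frequent among non-relevant documents receives a negative one.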

,
: d (1) ,..., d (6) (. 1) d (7) ,..., d (9) (. 2), . ,
t1 , t2 , t3 , t4 (, , ). R .
1

t1

t2

t3

t4

d (1)

d (2)

d (3)

d (4)

d (5)

d (6)

reli

rel = 3

nreli

nrel = 3

exp( SVi )

1/4


reli nreli ,
exp( SVi ) , . 1.
,
(7)
d , d (8) , d (9) , t1 , t2 , t3 , t4 . 2. . , . 2, , , d (8) ,
.
2
t1

t2

t3

t4

(7)

(SV)
2 = log 4 + log1

d (8)

4 = log 4 + log 4

d (9)

0 = log 4 + log1 4 + log1

2.4.
( P2P)
, . ,
[104].
2.4.1.

, , , (ID): (, ), (Key), . . MN , M , N
. ID , . . 4.
, ID 0 [155, 156].
ID 0 14. ,
14. ID 14 ID 0 , , 14.


. 4.

, .
2.4.2.
(Breadth First Search, BFS) [104]
P2P, , , Gnutella (www.gnutella.com). BFS (. 5) P2P N . q ,
( ). p , . r (Query) , - (QueryHit), . QueryHit
, (. 5).


q QueryHits ,
. QueryHit , .

. 5. BFS

BFS ,
(
).
. ,
(Time-to-live, TTL). TTL , (
, , , ,
, ). TTL
57 , . TTL 0, . BFS .
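The effect of the TTL limit on BFS flooding can be illustrated with a small simulation. This is a simplified sketch, not the Gnutella protocol itself: the overlay graph, the node names and the function `bfs_flood` are assumptions, every forwarded copy of the query counts as one message (including copies sent back to the node the query came from), and a node that has already seen the query drops the duplicate.

```python
from collections import deque

def bfs_flood(graph, source, has_answer, ttl):
    """Flood a query from `source`; return (nodes that produced a hit, messages sent)."""
    seen = {source}
    frontier = deque([(source, ttl)])
    hits, messages = [], 0
    while frontier:
        node, remaining = frontier.popleft()
        if remaining == 0:
            continue  # TTL exhausted: the query is not forwarded further
        for peer in graph[node]:
            messages += 1
            if peer in seen:
                continue  # duplicate query is discarded by the receiver
            seen.add(peer)
            if has_answer(peer):
                hits.append(peer)
            frontier.append((peer, remaining - 1))
    return hits, messages

graph = {
    "q": ["a", "b"],
    "a": ["q", "c"],
    "b": ["q", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
holds_file = lambda node: node == "d"
print(bfs_flood(graph, "q", holds_file, ttl=2))
print(bfs_flood(graph, "q", holds_file, ttl=3))
```

With a TTL that is too small the query never reaches the node that holds the answer, while each extra hop increases the number of messages, which is the trade-off the TTL parameter controls.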
2.4.3.
(Random Breadth First
Search, RBFS) BFS
[104]. RBFS (. 6) q
, .
RBFS.
RBFS ,
;
, . ,
. .


. 6. RBFS

2.4.4.
(Intelligent Search Mechanism,
ISM) P2P (. 7).

,
, , , .
, , .

. 7. ISM


(profile) , . .
.
,
.
, , .
ISM , T N . , ,
(Least Recently Used, LRU) .


(Relevance Rank, RR)


Pl , , q . RR Pi , Pl q
, , RR( Pi , q) :

$$ RR(P_i, q) = \sum_{j \in Q} Sim(q_j, q)^{\alpha} \cdot S(P_i, q_j), $$

, . Q , Pi ; S ( Pi , q j ) , Pi q j ; Sim , -
:
$$ Sim(q_j, q) = \frac{q_j \cdot q}{|q_j| \, |q|}. $$

RR , . ,
, , . , ,
Qsim( q j , q) . , P1 q1 q2 q : Qsim( q1 , q) = 0.5 Qsim( q2 , q) = 0.1 , P2 q3 q4 Qsim( q3 , q) = 0.4
Qsim( q4 , q) = 0.3 . = 10, Qsim( q1 , q)10 ,

$$ 0.5^{10} + 0.1^{10} > 0.4^{10} + 0.3^{10}. $$


= 1 , P2 . = 0 ,
.
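The role of the exponent $\alpha$ in the relevance rank $RR(P_i, q) = \sum_j Sim(q_j, q)^{\alpha} \cdot S(P_i, q_j)$ can be reproduced with the similarity values of the example (0.5 and 0.1 for peer $P_1$, 0.4 and 0.3 for peer $P_2$). In the sketch below the number of results $S(P_i, q_j)$ is assumed to be 1 for every past query, and the function and variable names are illustrative.

```python
def relevance_rank(profile, similarity, alpha=1.0):
    """profile: list of (past_query, n_results) pairs answered by a peer.

    similarity maps each past query to its similarity with the new query.
    """
    return sum(similarity[q_j] ** alpha * n_results for q_j, n_results in profile)

qsim = {"q1": 0.5, "q2": 0.1, "q3": 0.4, "q4": 0.3}
p1 = [("q1", 1), ("q2", 1)]
p2 = [("q3", 1), ("q4", 1)]

for alpha in (10, 1, 0):
    print(alpha, relevance_rank(p1, qsim, alpha), relevance_rank(p2, qsim, alpha))
```

A large $\alpha$ favours the peer with the single most similar past query, $\alpha = 1$ favours the peer with the larger total similarity, and $\alpha = 0$ reduces the rank to a count of answered queries.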
ISM ,
. , Gnutella
, , . ISM ,
,
. [156] , , .


, ,
BGP4 (RFC 1771),
, .
2.4.5.
[151, 152] >RES (. 8), ,
.
>RES , Z
( Z ). >RES
q k ,
m . k 1 10
>RES BFS , (Depth-first-search).

. 8. >RES

>RES ISM, . ISM , . >RES ,


. , >RES ,
( ,
). ,
, ,
.
2.4.6.
[109] (Random
Walkers, Algorithm, RWA). ,
, -


. ,
, k ,
. , k - T
, kT . RBFS, RBFS . , RBFS
,
. RBFS, RWA
.
, RWA, (Adaptive Probabilistic Search, APS) [156]. APS
, ,
. RWA
, APS
( )
. APS
, RWA.

2.5. -
-
- , , , .
,
. .
SQL,
, , :
, ( );

;
, , , , (
. snippet , ), . .


-
, , , .
, -
, , ,
(, , , , ).
, (Google,
Alltheweb, AltaVista, . .), AND, OR NOT,
.
Lycos, : ADJ, NEAR, FAR BEFORE.
Google
(http://www.googleguide.com/),
( , , ), (OR) ().
, URL, , . .
Google, , (site:), (admission site:), , DVD
player $150..250, , , . .
HTML, PDF, RTF, DOC (MsWord), PS.

, . , (Custom Search Folders), ,

() .
, Vivisimo (http://www.vivisimo.com),
Mooter (http://www.mooter.com) Nigma (http://www.nigma.ru) .
iBoogie (http://www.iboogie.com/)


, Windows.
, , ,
- Zoom
[3] InfoStream [31], .

2.6.
, (recall) (precision).
,
. -
, .
,
.
(NIST)
Text REtrieval Conference (TREC,
http://trec.nist.gov/) [125]
(, http://romip.ru/).
:

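                     Relevant    Non-relevant
Retrieved               a             b
Not retrieved           c             d

The following measures are computed from these counts.
Recall:

$$ r = \frac{a}{a + c}. $$

Precision:

$$ p = \frac{a}{a + b}. $$

Accuracy:

$$ acc = \frac{a + d}{a + b + c + d}. $$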


Error:

$$ err = \frac{b + c}{a + b + c + d}. $$

F-measure:

$$ F = \frac{2}{\dfrac{1}{p} + \dfrac{1}{r}}. $$

Average precision:

$$ AvgPrec = \frac{1}{k} \sum_{i=1}^{k} prec\_rel(i), $$
k , , i
, prec _ rel (i ) i - ( ).
i - , prec _ rel (i ) =0.
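The measures of this section can be computed with the following sketch. It assumes the usual reading of the four counts: $a$ relevant documents retrieved, $b$ non-relevant documents retrieved, $c$ relevant documents missed, $d$ non-relevant documents not retrieved. Average precision is taken over all $k$ relevant documents, with a contribution of 0 for each relevant document that was not retrieved. The function names are illustrative assumptions.

```python
def retrieval_metrics(a, b, c, d):
    """a: relevant retrieved, b: non-relevant retrieved,
    c: relevant missed, d: non-relevant not retrieved."""
    total = a + b + c + d
    recall = a / (a + c)
    precision = a / (a + b)
    return {
        "recall": recall,
        "precision": precision,
        "accuracy": (a + d) / total,
        "error": (b + c) / total,
        "f_measure": 2 / (1 / precision + 1 / recall),
    }

def average_precision(ranked_relevance, k):
    """ranked_relevance: booleans in ranking order; k: number of relevant documents."""
    hits, total = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank  # precision at the position of this relevant document
    return total / k

print(retrieval_metrics(3, 1, 2, 4))
print(average_precision([True, False, True, False], k=2))
```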

11- / TREC
(), ,
[142]. , . n , 0, 1/n, 2/n, ..., 1.
/ :
1. 0.0; 0.1; 0.2; ...; 1.0
( 11 ).
2. .
3. .
,
[54] (. 9). 20 , 4 . , , , , .
0.25, 0.5,
0.75 1.0. , 0 0.5 1.0 (


1.0), 0.6 0.7 0.75,


0.8, 0.9 1.0 0.27 (4/15).
Fig. 9. 11-point precision/recall graph.

-
11- ,
,
. , , .
(recall) . , .
, . , :
;
;


, . . ;
, ;
;
. .
,
. , .
, , , . ,
, .
,

.
. , . .
-
, Text Mining.

- , . . 0.9999.
(RBC, Google) .

3. TEXT MINING

,
, ,
,
.

(Text Mining),
. , , Text Mining . Text Mining
[75, 32].
Text Mining
, , , . Text Mining
. , Text Mining ,
,
, . ,

. Text Mining
. Text Mining,
.
, (Data Mining), Text Mining.
,
Data Mining . - GTE Labs:


,
[122].
90-
, Text Mining
Data Mining, . Text Mining ,
. Text Mining ,

, .

3.1. -
Text Mining -.
-, ,
:
- .
(J. J. Jerry), . (J. Jerry).
- ,

. (D. Mangeim), . (R. Rich).
- -
,
(. ).
- ( ),
(. ).
, -
, .
- : . -

(, ). -
.


3.2. Text Mining


, Text Mining : (classification, categorization),
(clustering), , (feature extraction),
(summarization), (question answering), (thematic indexing) (keyword searching).
, , .
, . Text Mining ,
, .
, . ,
,
. Text Mining .
. , ,
, .
Text Mining ,
, (, ) .
, Text Mining,
, , ,
. , ,
[3]. , , . ,
.
( , ) .
, ,


, . ,
, .
3.2.1.
(Feature Extraction)
, . ( , , , ), , , ,
. .
:
) Entity Extraction ,
. , , , , .
) Feature Association Extraction .
) Event and Fact Extraction , .

- ,
.
, ,
. , - , , . .
,

, .
(. 10). ,
( ). ,
, , -.
,
. , ,


, ,
, , , .

. 10.

(, , ), .
, , (, , , , ) , . . .


3.2.2.
,
( ) . , ,
. , , , . ,
:
;

, , .

TVP ' , .
p j ( j = 1, ..., M )

,
D

d ( i ) D (i = 1, ..., N ) , Pj D , p j , e(ji ) :

$$ e_j^{(i)} = \begin{cases} 1, & d^{(i)} \in P_j, \\ 0, & d^{(i)} \notin P_j. \end{cases} $$

p j pk :
$$ v_{j,k} = \sum_{i=1}^{N} e_j^{(i)} e_k^{(i)}. $$

v j ,k
TVP ' .
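The link weights $v_{j,k} = \sum_i e_j^{(i)} e_k^{(i)}$ simply count the documents in which the concepts $p_j$ and $p_k$ occur together. A minimal sketch, in which each document is represented by the set of concepts found in it (the concept names and the function name are illustrative assumptions):

```python
def cooccurrence(doc_concepts, concepts):
    """Return the matrix v[j][k]: number of documents mentioning both concepts."""
    index = {p: j for j, p in enumerate(concepts)}
    m = len(concepts)
    v = [[0] * m for _ in range(m)]
    for present in doc_concepts:
        ids = [index[p] for p in present if p in index]
        for j in ids:
            for k in ids:
                v[j][k] += 1
    return v

doc_concepts = [{"A", "B"}, {"A"}, {"B", "C"}]
v = cooccurrence(doc_concepts, ["A", "B", "C"])
for row in v:
    print(row)
```

The diagonal element $v_{j,j}$ is the number of documents that mention the concept $p_j$ at all, and the matrix is symmetric by construction.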
, ,
TVP ''
. Wi = {w1(i ) , ..., wn( i ) } , d (i ) .
p j ( j = 1, ..., M )
, :


$$ IP(p_j) = \bigcup_{d^{(i)} \in P_j} W_i. $$

S = {s1 , ..., sK }

, D , t ( j ) = (t1( j ) , ..., t K( j ) )
ti( j ) , :
$$ t_i^{(j)} = \begin{cases} 1, & s_i \in IP(p_j), \\ 0, & s_i \notin IP(p_j), \end{cases} \qquad i = 1, \dots, K. $$
p j pk :
$$ v_{j,k} = t^{(j)} \cdot t^{(k)} = \sum_{i=1}^{K} t_i^{(j)} t_i^{(k)}. $$

, TVP " v j ,k .
, , ,
(. 11).
, , v j ,k > 0 , v j ,k > 0 ,
, ( i ), d ( i ) Pj , d ( i ) Pk . ,

:
$IP(p_j) \cap IP(p_k) \ne \emptyset$, and hence $t^{(j)} \cdot t^{(k)} = v_{j,k} > 0$.
.
, .
. , , ,
, ,
, .


. 11. : ) TVP ' , ) TVP "


( , )

(
, . . 11) , k-means (. . 5.2),

.
3.2.3.
(Automatic Text Summarization)
, ,
. . [55].
,
.
.
,
:
, ,
, ;
, ,
;
, ,
, .
( )
, -


. :

Weight = Location + KeyPhrase + StatTerm .


Location , ,
,
, , .
(KeyPhrase) -, , , ,
. . KeyPhrase
, , .
(StatTerm) .
(, , )
, .

. ,
, . ,
,
. ,
.
.

, . - , , , , -, , ,
.
, , . ,
. ,
.
- :


, .
.
, .
.
,
, . , . , ,
. .
.
, .

, . . .

:
.
, , . .
;

, ;
, , , .
, , Microsoft Word
.
3.2.4.
( , ) () [1]. ( ) , -


, . , . ,
,
, .
3.2.5.
, ,
, .
( ),
( ) .
,
, , .
( )
, ,
.

. ( ) : , .

, -, . , ,
.
, . ,
, .
, , , . -


,
.
, , , ,
. .

, , , . ,
, ,
: . , ( ). ( ).
, . . .
,
, , (),
[82], [103] [110].
.

. ,
, ,
.

. ,
. (
, ) .
, , ,
InfoStream, , , 6 12
(, ). ,


.
:
. , , , :

A A, A A.
A .
.
A B , . .:

A B
/ B A.
:

A B, B C
/ A C.
, ,
, ,
. , , .
, , :
A B B A,
A B, B C A C.
, , ,
,
, .
,
, , 3, 4 5 .
d ( i ) d ( j ) :
(i )
( j)
1, d d ,
ai , j =
(i )
( j)
0, d d .


i, j : ai , j = a j ,i ,
:
i, j , k : ai , j = 1, a j ,k = 1 ai ,k = 1.
( ), :
N

i =1

|ai , j a j ,i |

j =1

i =1

j =1

ai , j

:
N

i =1

ai , j a j , k ai , k

j =1 k =1
N
N

ai , j

i =1

j =1

N .
, , .
,
.
, -.
, , ,
( ), .
3.2.6.

,
,

. ,
. , -


, .
, . . ,
( ).
.
. -
. , TF IDF. :
1.
. (
), . -
. ,
, .
2. ( ).
3. , , .
4. ,
, .
5.
. ,
, .
,
.
, . [120], , , . :
1. (
Text Mining ).
2. .


3. , .
4. , .
, ,
, ,
, , , [94]. [31] , :
) , ;
) , , , ( , IDF - );
) , , (
, , . .);
) ( ,
).
:
n ;
D1 ;
Dn ;
Di i- ;
PlusDic ;
sim( Di , D j ) i j ;
sim( Di , PlusDic) i ;
Ranki , i - .

sim( Di , D j )
- .
,
, , ,
w Di , D j , D j :


sim( Di , D j ) = P( w Di | w D j ) P ( w D j ).
Newi Di , ),
:

Newi =

Ranki sim( Di , PlusDic)


N

log(i + 1)sim( Di , D j )

j =1

,
,
, - .

3.3. Text Mining



, , , . , IBM (www.ibm.com)
Intelligent Miner for Text, , Text Mining:
Language Identification Tool ,
.
Categorisation Tool .
Clusterisation Tool
, , .
Feature Extraction Tool
, , , , .
Annotation Tool .
PolyAnalyst (www.megaputer.com)
, , . PolyAnalyst TextAnalyst, Text Mining:
, , , .


SAS (www.sas.com) SAS Text


Miner, , ,
.
Text Mining Oracle (www.oracle.com). , Oracle Text,
. Oracle Text
.
,
, .

4.

, !
, ,
, , !

4.1.
(Text Categorization, TC) (
, ).
(machine learning, ML)
(information retrieval, IR) [33, 134]. :

;
.
,
.
, , .
,
,
.
(
),
. ,

.


, . , .
.
. , ;
. ,
. , ,
.
:

( ) ;
;
;
;
,
;
(. . , );
.
4.1.1.
D = {d (1) , ..., d ( N ) } , C = {c1 , ..., cM }

, , d ( i ) , c j
, d ( i ) c j (1 True)
(0 False). ,
.
, . .
, , :


1. . ' .
2. . .
.
,
, , . , , C = {c1 , ..., cM } M
{ci , ci } .
,
[0, 1].
,
, . . .
4.1.2.

, ci CSVi
( ), D [0; 1], .
, , .
ci () i .
CSVi ( d ) > i , d ci . d k , . . k , CSVi ( d ) .
, .
. ci
,
. ,
, ci ,
.
4.1.3.

Ci c ( i ) = (c1( i ) , ..., cN( i ) ) ,


N . d :


$$ CSV^{(i)}(d) = d \cdot c^{(i)}. $$
,
CSV ( i ) (d )
Ci
d = ( d1 , ..., d N ) , d :

$$ CSV^{(i)}(d) = \frac{d \cdot c^{(i)}}{|d| \, |c^{(i)}|}. $$

c (i ) , .

4.2. Rocchio

(profile, ) .
, ( ) .
, . (J. Rocchio) [129], ,
. Ci c ( i ) = (c1( i ) , ..., cN( i ) ) (N ),
ck(i )
Rocchio :
$$ c_k^{(i)} = \frac{\alpha}{|POS_i|} \sum_{d^{(j)} \in POS_i} w_k^{(j)} - \frac{\beta}{|NEG_i|} \sum_{d^{(j)} \in NEG_i} w_k^{(j)}, $$

wk( j ) tk d ( j ) (, ,
TF IDF), POSi , c (i ) , . . POSi = {d ( j ) | (d ( j ) , ci ) = 1} ,
NEGi , c (i ) : NEGi = {d j | ( d j , ci ) = 0}. ,

, . , = 1 = 0, Ci
,
.


CSV ( i ) (d ) , d , Ci ,
.
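A category profile in the Rocchio method and the resulting score $CSV^{(i)}(d)$ can be sketched as follows. The profile component is taken as $\alpha$ times the mean weight over positive examples minus $\beta$ times the mean weight over negative examples, and the score is the cosine between the document vector and the profile. The function names, the default parameter values and the sample vectors are illustrative assumptions.

```python
import math

def rocchio_profile(pos, neg, alpha=1.0, beta=0.0):
    """Build the profile vector c from positive and negative example vectors."""
    n = len(pos[0])
    profile = []
    for k in range(n):
        mean_pos = sum(d[k] for d in pos) / len(pos)
        mean_neg = sum(d[k] for d in neg) / len(neg) if neg else 0.0
        profile.append(alpha * mean_pos - beta * mean_neg)
    return profile

def csv_score(doc, profile):
    """Normalized scalar product (cosine) of a document vector and a profile."""
    dot = sum(x * c for x, c in zip(doc, profile))
    norm = math.sqrt(sum(x * x for x in doc)) * math.sqrt(sum(c * c for c in profile))
    return dot / norm if norm else 0.0

pos = [[1.0, 0.0], [0.5, 0.5]]
neg = [[0.0, 1.0]]
centroid = rocchio_profile(pos, neg)  # alpha = 1, beta = 0: centroid of the positives
print(centroid, csv_score([1.0, 0.0], centroid), csv_score([0.0, 1.0], centroid))
```

With $\alpha = 1$ and $\beta = 0$ the profile is the centroid of the positive examples; a positive $\beta$ pushes the profile away from the negative examples.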

4.3.
,
, . ,
.
.
( F ) (C ).
:
D ,
, ;
O = oi , j , i

Di (i = 1, ..., N ) , j ( j = 1, ..., M ), oi , j CSV (i ) (d ( i ) ) .



M , MD O F , :
$$ \hat{M} = \arg\min_{M} \| M D - O \|_F. $$

,
,
. . , F

ij

i, j

mi , j M i - j - .


4.4. -
- ,
- (
), , (, )
. ,
, . .
.
Ci , {d1(i ) , ..., d K(i ) ) :

( x = d1( i ) ) ( x = d 2( i ) ) . . . ( x = d K( i ) ) , Ci .
, ,
, , -, , -, . - ,
, . , , . :
(( )
( )
( )
( - ))


,
.

4.5.


, ,


. 1011 . ,
, , 10 000 (. 12).

, , ,
, . .

. 12.

, , () . , .
4.5.1.
, . 13.
,
, ( ), [16].
( NET ), F , , OUT .


. 13.

:
$$ NET = \sum_{i=1}^{n} x_i w_i, $$

n , , xi i -
, wi , NET .
:
OUT = F ( NET ).

:
$$ OUT = K \cdot NET; $$

$$ OUT = \begin{cases} 1, & NET > T, \\ -1, & NET \le T; \end{cases} $$

$$ OUT = \frac{1}{1 + e^{-NET}}; $$

$$ OUT = \tanh(NET). $$

4.5.2.

,
, , . .
:
(FeedForward)
, ;
, .



.
(. 14), 1958 . . (F. Rosenblatt). .
.

,
+1 1.

F. Rosenblatt (1928–1971)

:
1. wi (i = 1, ..., N ) b : .
2. xi (i = 1, ..., N )
d .
3. :

$$ y(t) = \operatorname{sign}\left( \sum_{i=1}^{N} w_i(t) \, x_i(t) - b \right), $$

t , b .
4. :

$$ w_i(t + 1) = w_i(t) + r \, [d(t) - y(t)] \, x_i(t), \quad i = 1, \dots, N, $$
wi (t ) i - t ; r ; d (t ) .


,
.
5. 2.
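The training steps listed above can be sketched in Python. The output is $y = \operatorname{sign}(\sum_i w_i x_i - b)$ and the weights are corrected by $r \, [d - y] \, x_i$. Two details are assumptions of this sketch rather than part of the description: the weights start at zero instead of small random values, and the threshold $b$ is adjusted together with the weights (it is treated as a weight on a constant input of $-1$). The function names and the training set (logical AND with targets $\pm 1$) are illustrative.

```python
def sign(value):
    """Threshold activation with outputs +1 and -1."""
    return 1 if value > 0 else -1

def predict(w, b, x):
    return sign(sum(wi * xi for wi, xi in zip(w, x)) - b)

def train_perceptron(samples, r=0.1, epochs=100):
    """samples: list of (input_vector, desired_output) with desired_output in {+1, -1}."""
    n = len(samples[0][0])
    w = [0.0] * n  # assumption: zero initial weights
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, d in samples:
            y = predict(w, b, x)
            if y != d:
                errors += 1
                for i in range(n):
                    w[i] += r * (d - y) * x[i]
                b -= r * (d - y)  # assumption: the threshold is trained as well
        if errors == 0:
            break  # every sample is classified correctly
    return w, b

and_samples = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w, b = train_perceptron(and_samples)
print(w, b, [predict(w, b, x) for x, _ in and_samples])
```

The procedure stops only when the classes are linearly separable; for a problem such as XOR the loop would run until the epoch limit without reaching zero errors.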

. 14. : X ;
W ; (1), (2), (3) ;
Y

,
. ,
(, ). . , [45].
,
.
.
.
, .
4.5.3.
,
- (
). -


() . - :
1. X = {x1 , ..., x N }

OUT = {OUT1 , ..., OUTM } .


2. j = 1, ..., | X | j = T j OUT j , () T j .
3. j : j = 0, .
4. j , j 0, wij
: wij ( s + 1) = wij ( s ) + ij , ij = j xi , j
, i , .
5. 1.
, Toolbooks , MatLab.
4.5.4.

, , .
d ( j ) , wk( j ) ; , , ,
. (back propagation).
,
, , .

4.6.
4.6.1.

D C : P(C | D).
: D = ( w1 , ..., wN ) ,
wi i , N .
:


$$ P(C \mid D) = \varphi(\beta \cdot D) = \varphi\left( \sum_{i=1}^{N} \beta_i w_i \right), $$

C {0, 1} , = {1 , ..., N } , , :

$$ \varphi(x) = \frac{1}{1 + \exp(-x)}. $$

, , i ,
0.
i , ,
i .
4.6.2.


C , F1 , ..., Fn :
P(C | F1 , ..., Fn ) .
:

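$$ P(C \mid F_1, \dots, F_n) = \frac{P(C) \, P(F_1, \dots, F_n \mid C)}{P(F_1, \dots, F_n)}. $$

The denominator does not depend on $C$, so only the numerator matters. By the chain rule it expands as

$$ P(C) \, P(F_1, \dots, F_n \mid C) = P(C) \, P(F_1 \mid C) \, P(F_2, \dots, F_n \mid C, F_1) = P(C) \, P(F_1 \mid C) \, P(F_2 \mid C, F_1) \, P(F_3, \dots, F_n \mid C, F_1, F_2). $$

The naive assumption is that every feature $F_i$ is conditionally independent of every other feature $F_j$, $i \ne j$, given the class:

$$ P(F_i \mid C, F_j) = P(F_i \mid C). $$

Then

$$ P(C \mid F_1, \dots, F_n) \propto P(C) \, P(F_1 \mid C) \, P(F_2 \mid C) \cdots P(F_n \mid C) = P(C) \prod_{i=1}^{n} P(F_i \mid C). $$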


.
:

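$$ P(D \mid C) = \prod_{i} P(w_i \mid C). $$

By Bayes' theorem,

$$ P(C \mid D) = \frac{P(C)}{P(D)} \, P(D \mid C). $$

Suppose there are only two classes, $C$ and its complement $\bar{C}$. Then

$$ P(C \mid D) = \frac{P(C)}{P(D)} \prod_{i} P(w_i \mid C); $$

$$ P(\bar{C} \mid D) = \frac{P(\bar{C})}{P(D)} \prod_{i} P(w_i \mid \bar{C}). $$

Dividing one by the other gives the likelihood ratio:

$$ \frac{P(C \mid D)}{P(\bar{C} \mid D)} = \frac{P(C)}{P(\bar{C})} \prod_{i} \frac{P(w_i \mid C)}{P(w_i \mid \bar{C})}. $$

Taking logarithms:

$$ \ln \frac{P(C \mid D)}{P(\bar{C} \mid D)} = \ln \frac{P(C)}{P(\bar{C})} + \sum_{i} \ln \frac{P(w_i \mid C)}{P(w_i \mid \bar{C})}. $$

If $\ln \dfrac{P(C \mid D)}{P(\bar{C} \mid D)} > 0$ (that is, $P(C \mid D) > P(\bar{C} \mid D)$), the document $D$ is assigned to the class $C$.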
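The log-ratio decision rule, $\ln \frac{P(C \mid D)}{P(\bar{C} \mid D)} = \ln \frac{P(C)}{P(\bar{C})} + \sum_i \ln \frac{P(w_i \mid C)}{P(w_i \mid \bar{C})}$, translates into a few lines of code. The sketch below estimates word probabilities from two small training sets. Add-one (Laplace) smoothing is an assumption introduced here so that unseen words do not produce zero probabilities; the function names and the sample documents are illustrative as well.

```python
import math
from collections import Counter

def train(docs_c, docs_not_c):
    """Estimate P(C) and the smoothed word probabilities for C and its complement."""
    vocab = {w for d in docs_c + docs_not_c for w in d}

    def word_probs(docs):
        counts = Counter(w for d in docs for w in d)
        total = sum(counts.values())
        # add-one smoothing (an assumption of this sketch)
        return {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}

    prior_c = len(docs_c) / (len(docs_c) + len(docs_not_c))
    return prior_c, word_probs(docs_c), word_probs(docs_not_c)

def log_ratio(doc, model):
    """ln P(C|D) / P(not C|D); a positive value assigns the document to C."""
    prior_c, p_c, p_not_c = model
    score = math.log(prior_c / (1.0 - prior_c))
    for w in doc:
        if w in p_c:  # words outside the training vocabulary are ignored
            score += math.log(p_c[w] / p_not_c[w])
    return score

model = train(
    [["cheap", "pills"], ["cheap", "offer"]],
    [["meeting", "agenda"], ["project", "offer"]],
)
print(log_ratio(["cheap", "pills"], model), log_ratio(["meeting"], model))
```

Working with sums of logarithms instead of products of probabilities also avoids numerical underflow on long documents.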

4.6.3.

(). , , .


, ( 0 1), , . , 0.5, ,
.
, . (P. Graham) [95],
n w1 ,..., wn ,
, ,
:
$$ Spm = \frac{\prod_{i} w_i}{\prod_{i} w_i + \prod_{i} (1 - w_i)}. $$

, S , ,
, , ,
t. , , :
$$ P(S \mid A) = \frac{P(A \mid S) \, P(S)}{P(A \mid S) \, P(S) + P(A \mid \bar{S}) \, P(\bar{S})}. $$

, ,
, - , , P( S ) = P( S ), :

$$ P(S \mid A) = \frac{P(A \mid S)}{P(A \mid S) + P(A \mid \bar{S})}. $$

, A1 A2 , ,
t1 t2 . ,
( ). , , ( t1 t2 ), ,
:
$$ P(S \mid A_1 \,\&\, A_2) = \frac{P(A_1 \mid S) \, P(A_2 \mid S)}{P(A_1 \mid S) \, P(A_2 \mid S) + P(A_1 \mid \bar{S}) \, P(A_2 \mid \bar{S})} = \frac{p(t_1) \, p(t_2)}{p(t_1) \, p(t_2) + (1 - p(t_1))(1 - p(t_2))}. $$
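The combination rule generalizes from two terms to any number of terms: the product of the per-term probabilities is divided by the sum of that product and the product of their complements. A minimal sketch, assuming independent terms and equal prior probabilities of the two classes (the function name and the sample values are illustrative):

```python
from math import prod

def combine(probabilities):
    """Combine per-term spam probabilities p(t_i) into one estimate."""
    spam = prod(probabilities)
    ham = prod(1.0 - p for p in probabilities)
    return spam / (spam + ham)

print(combine([0.9, 0.8]))  # two spam-like terms reinforce each other
print(combine([0.9, 0.1]))  # opposite evidence cancels out
print(combine([0.9, 0.5]))  # a neutral term changes nothing
```

A term with probability 0.5 carries no information: it multiplies the numerator and both summands of the denominator by the same factor.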


= 1 .


,
= 1 . , .
, ,
Spm .
, .
4.6.4.

.


. , , , , , , . .
, , ( ).
,
, .
(, ),
: , , , , (,
).
, ,
,
.

: H 1 , H 0 H1 .
: H1 , H1 . t , , p(t | H1 ) , 1/2. .
- , .


,
, ( ). Spm :
Spm( x ) =

x
,
x + (1 ) x

(
) .
,
( H 1 ) . ,
, H1 H 1 .
() .
, , . ,
, . x ( 0, 1) .
, , . ,
, . .
. .
.
( ). , .
(. 15). xi = 1,
i, xi = 0.
( ), ,
+
+

w1 ,..., wn w1 ,..., wn .
. NET +
NET .


. 15.

, NET +
NET . , OUT + OUT ,
, OUT +
OUT , , . 15.

4.7.
(Support Vector Machine, SVM), . . [146, 84],
. . , . . c c ,
,
. ,
N- .
, ,
.


. .

,
$\{x_1, \dots, x_n\} \subset R^N$ and labels $\{y_1, \dots, y_n\}$, $y_i \in \{-1, 1\}$.
yi 1 xi
c , 1 . ,
. ( N-
), . (. 16),
:
(), c , c .
: w ,
b xi :

w xi w xi :
$$ w \cdot x_i = \sum_{j=1}^{N} w_j x_{i,j}. $$

w xi = b , . ,
w xi b ,
, . w ,


b . ,
?

. 16.

SVM :
,
. SVM ,
w b , > 0 (
) :

$$ w \cdot x_i - b \ge +\varepsilon \;\Rightarrow\; y_i = +1, $$

$$ w \cdot x_i - b \le -\varepsilon \;\Rightarrow\; y_i = -1. $$
1 ,
, . ,
xi :

$$ -1 < w \cdot x_i - b < 1, $$
(. 17).
w . , , .


. 17.

, ,
, SVM , .
, : yi ( w xi b) 1 (

, , yi {1, 1} ).
. xi yi , w
b .
, 2 / w .
w b , ,
w , :
$$ \|w\|^2 = w \cdot w. $$
.
, , i 0, {x1 , ..., xn }. :

$$ y_i (w \cdot x_i - b) \ge 1 - \xi_i. $$


, i = 0, xi .
i > 1, xi . 0 < i < 1,
,
.

:

w + C i
2

yi ( w xi b) 1 i , C ,
. , :
$$ \frac{\|w\|^2}{2} + C \sum_{i} \xi_i \to \min; $$

$$ y_i (w \cdot x_i - b) + \xi_i \ge 1, \quad i = 1, \dots, n. $$
:
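$$ \frac{1}{2} \, w \cdot w + C \sum_{i} \xi_i - \sum_{i} \lambda_i \left( \xi_i + y_i (w \cdot x_i - b) - 1 \right) \to \min_{w, b} \max_{\lambda}; $$

$$ \xi_i \ge 0, \quad \lambda_i \ge 0, \quad i = 1, \dots, n. $$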

w b , :
$$ w = \sum_{i=1}^{n} \lambda_i y_i x_i, $$

. . ,
i 0 . i > 0 ,
.
, :

y x x b = 0 .
i =1

i i

b , :

$$ \sum_{i=1}^{n} \lambda_i y_i = 0. $$

w
, :
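$$ -\sum_{i} \lambda_i + \frac{1}{2} \sum_{i,j} \lambda_i \lambda_j y_i y_j (x_i \cdot x_j) \to \min_{\lambda}; $$

$$ \sum_{i=1}^{n} \lambda_i y_i = 0; $$

$$ 0 \le \lambda_i \le C, \quad i = 1, \dots, n. $$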

,
xi , . ,
,
.
:
;
.
,
:
1. ( x ) x , .
2. ,
, (kernel function):
$$K(x, z) = \varphi(x) \cdot \varphi(z).$$
( x ) , K ( x , z ) , ( x ) . .
3. :
K ( x , z ) . xi x j K ( xi , x j ) , .


4. w b , , ,
w ( x ) b , .
,
. ,

LibSVM
(http://www.csie.ntu.edu.tw/~cjlin/libsvm) [124] :
$$K(x, z) = \exp\bigl(-\gamma \|x - z\|^2\bigr),$$
.
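As a small illustration of the radial kernel, the sketch below evaluates it for a pair of vectors and builds the kernel (Gram) matrix that the dual problem works with. It is not LibSVM code; the names `rbf_kernel` and `gram_matrix` and the parameter value are assumptions made for this example.

```python
from math import exp

def rbf_kernel(x, z, gamma):
    """Radial basis kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return exp(-gamma * sq_dist)

def gram_matrix(points, gamma):
    """Matrix of pairwise kernel values K(x_i, x_j)."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]

if __name__ == "__main__":
    pts = [[0.0, 0.0], [1.0, 1.0], [3.0, 0.0]]
    for row in gram_matrix(pts, gamma=0.5):
        print([round(v, 4) for v in row])
```

The matrix is symmetric with ones on the diagonal, and entries decay quickly as points move apart, which is why the choice of gamma matters in practice.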
, . 18. ,
. , , , . ,
( x ) , .
SVM :
;
. , SVM ;
,
.

. 18.


:
, C ;
;
.

4.8.

. . c , c . , c . :

True positive (TP) False positive (FP)


False negative (FN) True negative (TN)

true positive (TP) , , false positive (FP) , ; false negative (FN) true negative (TN) . false negative , false positive .
. :

$$\text{Recall} = \frac{TP}{TP + FN}.$$


, (
,
):

$$\text{Precision} = \frac{TP}{TP + FP}.$$
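The two ratios can be computed directly from the counts TP, FP and FN. The sketch below is an illustration only; the function name `precision_recall` and the convention of returning 0.0 for an empty denominator are assumptions of this example.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two parallel lists of 0/1 labels.

    predicted[i] is the classifier decision, actual[i] the true class.
    An empty denominator yields 0.0 (a convention chosen for this sketch).
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

if __name__ == "__main__":
    print(precision_recall([1, 1, 0, 0, 1], [1, 0, 1, 0, 1]))
```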


.
,


( ),
.
.
.

5.

, ,

, (,
, ).
, .
.
,
(, Yahoo! Open Directory) : ? , . . . , ,
(, ), .
, , .
( HTML- ,
, , URL-, . .). ,
, , . .
,



(), .
. ,
, ,
,
[25].
, , , .
, , (
)
.


(), :
$$D_p(x, y) = \Bigl(\sum_{k=1} (x_k - y_k)^p\Bigr)^{1/p}$$

p = 2 :

$$D_p(x, y) = \sqrt{\sum_{k=1} (x_k - y_k)^2}.$$

, , :
$$\mathrm{Sim}(x, y) = x \cdot y,$$
x , y , ,
, , ,
.
, , . .

( ) .
[111, 134, 32]. -


, [31].
- , .

5.1. -
5.1.1. -
LSA/LSI ( . Latent Semantic Analysis/Indexing - /)
[106] (SVD) [93].
D = {d ( j ) | j = 1, ..., n} A , ,
( m ). A r m n A = USV T , U V m r
r n , , S , ( si ,i 0 ).
S A . ,
S , A , .
A ,
S k
( Sk ), U V -

(, U k , Vk ),
Ak = U k Sk VkT A , k . , X M N :

$$\|X\|_F = \sqrt{\sum_{i=1}^{M} \sum_{j=1}^{N} x_{ij}^2}.$$

, Ak k ,
A Ak F , , :


$$A_k = \arg\min_{X:\ \mathrm{rank}(X) = k} \|A - X\|_F$$

LSA ,
k A , .
Ak k - , ( V ), ( U ).
(),
, , . . .
k LSA . , k
, , .
U k VkT . , U k k - . , VkT
k - . , k -
.
,
LSA. d (,
) , , . ,
.
d ( A ), : d ' = Sk1U kT d .
q , i- 1,
i , 0 , q
: q ' = qTU k Sk1.
q d q ' VkT {d } ( VkT {d } -

d - VkT ).
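To make the low-rank approximation concrete, the sketch below computes the leading singular triple of a small term-document matrix by power iteration and forms the rank-1 matrix A_1. This covers only the case k = 1 and is a pure-Python illustration, not a production SVD; all function names are hypothetical.

```python
from math import sqrt

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def leading_singular_triple(A, iterations=200):
    """Power iteration on A^T A; returns (s, u, v) with A v ~ s u.

    Assumes A is non-zero and that the all-ones start vector is not
    orthogonal to the leading right singular vector (true for matrices
    with non-negative entries such as term counts).
    """
    At = transpose(A)
    v = [1.0] * len(A[0])
    for _ in range(iterations):
        w = matvec(At, matvec(A, v))
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = matvec(A, v)
    s = sqrt(sum(x * x for x in Av))
    u = [x / s for x in Av]
    return s, u, v

def rank1_approximation(A):
    """A_1 = s * u * v^T, the best rank-1 approximation in Frobenius norm."""
    s, u, v = leading_singular_triple(A)
    return [[s * ui * vj for vj in v] for ui in u]

def frobenius_norm(A):
    return sqrt(sum(x * x for row in A for x in row))

if __name__ == "__main__":
    A = [[2.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, 1.0]]  # terms x documents
    A1 = rank1_approximation(A)
    diff = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, A1)]
    print(frobenius_norm(diff))
```

For larger k one would normally call a library SVD routine; the power iteration is used here only to keep the example self-contained.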
, ,
, , .
-


, .
LSA - , .
HITS (Hyperlink Induced Topic Search)
. LSA , .
.
SVD O ( N 2 k ), N = | D | + | T |, D
, T , k .
LSA ,
. , LSA
.
5.1.2. -
- ( . Probabilistic
Latent Semantic Analysis, PLSA) LSA,
. PLSA
,
.
, , k z1 ,..., zk ( k ). zi P ( zi ) ,
$\bigl(\sum_{i=1}^{k} P(z_i) = 1\bigr)$.

P (d | zi ) ,
zi Z , d D .

$$\sum_{d \in D} P(d \mid z_i) = 1.$$

P ( t | zi ) ,

zi , t
T zi .

$$\sum_{t \in T} P(t \mid z_i) = 1.$$


, d t ,
t d ,
(. 19 a, ), :
$$P(d, t) = P(d)\,P(t \mid d),$$

$$P(t \mid d) = \sum_{i=1}^{k} P(t \mid z_i)\,P(z_i \mid d).$$

, (. 19 ) :
$$P(d, t) = \sum_{i=1}^{k} P(z_i)\,P(d \mid z_i)\,P(t \mid z_i).$$

k , PLSA
:
P( zi ) , zi ;
P(d j | zi ) , d j -

, zi ;
P (t j | zi ) , t j , -

zi .

. 19. :
) ; )


t d , tf ( d , t ).
:

$$L = \sum_{d \in D} \sum_{t \in T} tf(d, t) \log P(d, t),$$


,
.
PLSA EM (Expectation Maximization
),
1) , , 2) , [101].
:

$$P(z \mid d, t) = \frac{P(z)\,P(d \mid z)\,P(t \mid z)}{\sum_{z' \in Z} P(z')\,P(d \mid z')\,P(t \mid z')},$$

L :
$$P(t \mid z) \propto \sum_{d \in D} tf(d, t)\,P(z \mid d, t),$$

$$P(d \mid z) \propto \sum_{t \in T} tf(d, t)\,P(z \mid d, t),$$

$$P(z) \propto \sum_{d \in D} \sum_{t \in T} tf(d, t)\,P(z \mid d, t).$$
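The E-step and M-step above translate almost line by line into code. The following is a minimal sketch for small dense count matrices; the random initialization, the fixed iteration count and the name `plsa` are assumptions of this example rather than details from the source.

```python
import random

def plsa(tf, k, iterations=50, seed=0):
    """EM for PLSA. tf[d][t] holds term counts; returns (P(z), P(d|z), P(t|z))."""
    rng = random.Random(seed)
    n_docs, n_terms = len(tf), len(tf[0])

    def random_distribution(n):
        w = [rng.random() + 0.1 for _ in range(n)]
        s = sum(w)
        return [x / s for x in w]

    p_z = random_distribution(k)
    p_d_z = [random_distribution(n_docs) for _ in range(k)]
    p_t_z = [random_distribution(n_terms) for _ in range(k)]

    for _ in range(iterations):
        acc_z = [0.0] * k
        acc_d = [[0.0] * n_docs for _ in range(k)]
        acc_t = [[0.0] * n_terms for _ in range(k)]
        for d in range(n_docs):
            for t in range(n_terms):
                if tf[d][t] == 0:
                    continue
                # E-step: posterior P(z | d, t) for this (document, term) pair
                joint = [p_z[z] * p_d_z[z][d] * p_t_z[z][t] for z in range(k)]
                total = sum(joint)
                for z in range(k):
                    weight = tf[d][t] * joint[z] / total
                    acc_z[z] += weight
                    acc_d[z][d] += weight
                    acc_t[z][t] += weight
        # M-step: renormalize the accumulated expected counts
        grand_total = sum(acc_z)
        for z in range(k):
            p_d_z[z] = [x / acc_z[z] for x in acc_d[z]]
            p_t_z[z] = [x / acc_z[z] for x in acc_t[z]]
        p_z = [x / grand_total for x in acc_z]
    return p_z, p_d_z, p_t_z

if __name__ == "__main__":
    counts = [[2, 1, 0, 0], [3, 2, 0, 0], [0, 0, 1, 2], [0, 0, 2, 3]]
    p_z, p_d_z, p_t_z = plsa(counts, k=2)
    print([round(p, 3) for p in p_z])
```

EM only finds a local maximum of L, so different seeds can give different factorizations.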

L
. , .
, PLSA .
: 1) U , ui ,k
P(d ( i ) | z ) , 2) V , v
k

j ,k

P(t j | zk ) , 3) S k ,
P( zi ) . T .
P ( z ) USV
SVD, , U V PLSA
. , , k
S PLSA.
PLSA LSA L .

5.2. k-means
k-means (k-) {d (1) , ..., d ( N ) } ( -


,
) : k , ( ) .
C j ( j = 1, ..., k ) C j , . k N k ,
,
.
, , :

$$\mathrm{Sim}(d, C_j) = \frac{d \cdot C_j}{|d|\,|C_j|}$$

,
Sim(d , C j ) .
C j ( j = 1, ..., k ) ,
, , , .
, . .,
(,

).
k-means Q :
$$Q(C_1, \dots, C_k) = \sum_{j=1}^{k} \sum_{d \in C_j} \mathrm{Sim}(d, C_j).$$

LSI, k-means O ( kn ) , n
(). , .
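A compact sketch of k-means with the cosine similarity defined above is given below. Seeding the centroids with the first k documents and running a fixed number of iterations are simplifying assumptions of this example, as is the function name `kmeans`.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity d . C / (|d| |C|); returns 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def kmeans(docs, k, iterations=20):
    """Return (assignment, centroids) for a list of document vectors."""
    centroids = [list(d) for d in docs[:k]]   # assumption: first k docs as seeds
    assignment = [0] * len(docs)
    for _ in range(iterations):
        # assign every document to the most similar centroid
        assignment = [max(range(k), key=lambda j: cosine(d, centroids[j]))
                      for d in docs]
        # recompute each centroid as the mean of its members
        for j in range(k):
            members = [d for d, a in zip(docs, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

if __name__ == "__main__":
    vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
    print(kmeans(vectors, k=2)[0])
```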

5.3. -
- (Hierarchical Agglomerative
Clustering, HAC) ,
, ,


, . ,
. .
HAC . ,

.
, ,
, . . Ci
C j :

$$\mathrm{Sim}(C_i, C_j) = \frac{1}{|C_i \cup C_j|\,(|C_i \cup C_j| - 1)} \sum_{x, y \in C_i \cup C_j,\ x \ne y} \mathrm{Sim}(x, y).$$

| Ci C j |
Ci C j , x y , Ci C j .
HAC O ( n 2 s ) , n
, s .

5.4.
(Suffix Tree Clustering)
.
W S ,
( ) VW S (, )
V . , | V | 0. ,
substring sub ,
ring . V .
, .
, ().
, .
(, (E. Ukkonen)
[143]), O ( n ) ,
n .
. ,


, ,
, .
,
,
. S t1 ,..., tn . , . . t1 , ..., ti S .
, . :
0. t1 .
1. t1t2 .
...
n 1. t1 , ..., tn1

t1 ,..., tn .
n. t1 , ..., tn t1 ,..., tn $
($ ).
abca$
. 20.

-
. , ,
Clusty (http://www.clusty.com) Nigma
(http://www.nigma.ru).
,
, .

. 20.

,
, , () . . ,
( ),


. , , , , . ,
, , , . ,
, . , , .
. . 21 I know you know I know. 6
1 6.

. 21.

. . 22
cat ate cheese, mouse ate cheese too cat ate mouse too.
, a f. , ,
, .
, , . Bm Bn , | Bm |, | Bn | .
Bm Bn .
sim( Bm , Bn ) Bm Bn :


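$$\mathrm{sim}(B_m, B_n) = \begin{cases} 1, & \dfrac{|B_m \cap B_n|}{|B_m|} > \alpha \ \text{and}\ \dfrac{|B_m \cap B_n|}{|B_n|} > \alpha; \\ 0, & \text{otherwise}, \end{cases}$$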

, 0 1,
0.6.

. 22.

. 23. ,
. 22 6- ( a f).
= 0.7, b = 0.6, c 0.6,
ate -.
,
Bag of Words, , . , , , .

.


5.5.

, .
,
InfoStream [31].


. 23.


, ( 1 ),
. -
, . , . 1, ,
,
. .
, . , k-means ,
- .

. T ti (i = 1, ..., N )
() Pj ( j = 1, ..., M )
P , , .
E = P T P ,

.

, , , k-means LSI, , .
P ( ).
, .
- , .
, . , .
M , ,


. A = M T M . A .
, B = M M T ,
.
B
A . ,
A ,
B , .

5.6.
,
. . (, ).
, , ,
.
, . .
, . , , . ,
-
, .
-, , HITS
(hyperlink induced topic search) PageRank, 1996
IBM . (J. M. Kleinberg) [105] . (S. Brin) . (L. Page) [80].
,
,
.


5.6.1. HITS
HITS (Hyperlink Induced Topic Search),
. , - (. . 5.1) -
.
HITS
(, ) (, ). ,
,
, , , .

d j D
a ( d j ) h( d j ) :
$$a(d_j) = \sum_{i=1,\ i \to j}^{|D|} h(d_i), \qquad h(d_j) = \sum_{i=1,\ i \leftarrow j}^{|D|} a(d_i).$$
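The mutual reinforcement of authority and hub scores can be sketched as a simple iteration with normalization after every pass. This is an illustrative sketch only; the edge-set representation, the Euclidean normalization and the name `hits` are assumptions of the example.

```python
from math import sqrt

def hits(links, n, iterations=50):
    """Iterate authority and hub scores.

    links is a set of (i, j) pairs meaning "page i links to page j";
    n is the number of pages. Returns (authority, hub) lists.
    """
    auth = [1.0] * n
    hub = [1.0] * n
    for _ in range(iterations):
        # authority of v: sum of hub scores of pages linking to v
        auth = [sum(hub[i] for i, j in links if j == v) for v in range(n)]
        # hub score of v: sum of authority scores of pages v links to
        hub = [sum(auth[j] for i, j in links if i == v) for v in range(n)]
        a_norm = sqrt(sum(a * a for a in auth)) or 1.0
        h_norm = sqrt(sum(h * h for h in hub)) or 1.0
        auth = [a / a_norm for a in auth]
        hub = [h / h_norm for h in hub]
    return auth, hub

if __name__ == "__main__":
    a, h = hits({(0, 2), (1, 2), (0, 3)}, n=4)
    print([round(x, 3) for x in a], [round(x, 3) for x in h])
```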

, HITS LSA.
A , aij , d i d j , . : A = USV T , S si ,i . AT A,

:
$$A^T A = V S U^T U S V^T = V S^2 V^T,$$
$S^2$


si2,i . , AAT AAT = US 2U T .


, LSA, , AAT ( AT A ), ( ).
HITS . ,
HITS, , , , . .
. , , (topic drift)
(Tightly-Knit Community, TKC).

HITS PHITS. : D , C
, Z ().
, d D P ( d ).
P( c | z ) P( z | d )
c C , z Z
d D .
:
$$L(D, C) = \prod_{c \in C,\ d \in D} P(d, c) = \prod_{c \in C,\ d \in D} P(d)\,P(c \mid d),$$

$$P(c \mid d) = \sum_{z \in Z} P(c \mid z)\,P(z \mid d).$$

PHITS , P( z ) , P( c | z ) ,
P ( d | z ) , L( D, C ) .
: P ( c | z ) ; P ( d | z ) .

Z , P ( c | z )
z . ,
,
L . , - , PHITS HITS.


5.6.2. PageRank
PageRank .
PageRank HITS, ,
.
- PageRank :
-, . -
. .,
, . , -
, URL. , WWW - . , PageRank -
, , .

, PageRank

n {d1 , ..., d n } , (- A ), C ( A) - A
.
, , - D , A , -


URL .
- N -
, (URL)
1 ( ). PageRank
PR( A) A , :
$$PR(A) = (1 - \delta)/N + \delta \sum_{i=1}^{n} \frac{PR(d_i)}{C(d_i)}.$$
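The PageRank formula is usually solved by repeated substitution. The sketch below is an illustration under stated assumptions: a damping value of 0.85 (a common choice, not taken from the source), a fixed iteration count, and uniform redistribution of rank from pages without outgoing links, a case the formula itself does not cover.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """out_links[i] lists the pages that page i links to; returns PR values."""
    n = len(out_links)
    pr = [1.0 / n] * n
    for _ in range(iterations):
        new = [(1.0 - damping) / n] * n
        for i, targets in enumerate(out_links):
            if targets:
                share = damping * pr[i] / len(targets)   # PR(d_i) / C(d_i)
                for j in targets:
                    new[j] += share
            else:
                # dangling page: spread its rank uniformly (assumption)
                for j in range(n):
                    new[j] += damping * pr[i] / n
        pr = new
    return pr

if __name__ == "__main__":
    ranks = pagerank([[1, 2], [2], [0]])
    print([round(r, 4) for r in ranks])
```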

. 30
.
HITS PageRank, ,
() , , ,
.
, , , . .

-
(-) , . , (SEO, Search Engine Optimization), -
.
, ,
- .
5.6.3. Salsa

Salsa (Stochastic Approach for Link-Structure


Analysis ) [108]
. (Sh. Moran) . (R. Lempel) PageRank HITS,
TKC .


PageRank Salsa -, . Salsa:


1. v , u , v . v ,
, v u -.
2. u w ,
(u, w) .
- G (. 24 ) Gbip , (. 24 )
Gbip = (Vh , Va , E ) , h , Vh -

- (, ), a , Va - (, ). , .
s G Gbip
sh sa . Salsa
.
Gbip .

. 24. Salsa:


Gbip ( Gbip ), ( ).
, .
, t
Gbip (
), Va
Vh , t , .
Salsa ,
: Gbip ( ), Gbip .

, : W G . Wr , W
, Wc ,
W . , H , WrWcT , A ,
, WcTWr .
Salsa A H ,
, ,
Gbip . A H , HITS.
[108] ,
v , :

v = c1 InDegree(v) ,
u :

u = c2 OutDegree(u ) ,
c1 c2 , InDegree OutDegree , .
. . , Salsa
, HITS, , ,
. -


,
.
5.6.4.
2005 . . (J. Hirsch) ( h -),
[100].
h ,
h . p h , h
N p h ,

N p h , , h
(. 25).
- , ,
( h ),
h (
[49]).
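The Hirsch index is easy to compute from a list of citation counts: sort them in descending order and find the largest rank h whose item has at least h citations. The sketch below illustrates this; the function name `h_index` is a choice made for the example.

```python
def h_index(citations):
    """Largest h such that at least h items have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

if __name__ == "__main__":
    print(h_index([10, 8, 5, 4, 3]))
```

The same function applies unchanged to incoming-link counts of a set of web pages.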

. 25. h -: 0X
; 0Y

,
.
.

6.

,

.


, . ,
, , . , .

6.1.
, ,
.
6.1.1.
, . (V. Pareto)
, ,
. 1906 , 80 20 .
,

. , -


, N = A X p , X , N
, X , A p
. , : X 1, p > 1 . , . . , , .
, 80/20, . , 20 %
, 80 % ,
. : 80 % -
20 % -.

(18481923)

, - , , ,
80 ,
20 .
. ,
x1 , x2 , ... xn , ...
. -


x(1) , x(2) , ..., x(r) , ... (

x(r) ).
, N , x( r ) , . . N = r .
:

$$r = \frac{A}{x_{(r)}^{\,p}}$$

:
$$x_{(r)} = \Bigl(\frac{A}{r}\Bigr)^{1/p}$$
n ( n = 1, 2, ..., N )
x(r) , m (n ) :
1

n
A p
C
m (n ) = x(r) = = ,
r =1
r =1 r
r =1 r
n

= 11/ p; C = A1/ p .
(,
n >> 1 ), :
n

m (n )
1

C
C 1

dr
n .
1
r

= m (n ) / m ( N ) = n / N
(. . 26):

= 1 .
,
n , (
) .
, . 26, 0.2
20 % 0.8 80 % ( ).


. 26. :
= 1 = 0 ( )
= 0.8, = 0.9

6.1.2.

. (G. Zipf) ,
. , ,
,
. .
- , , ,
: f r = c , f
; r ; c
( ).
, ,
0.06-0.07.
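The rank-frequency relation f r = c can be checked on any tokenized text by sorting words by frequency and multiplying each relative frequency by its rank. The sketch below only illustrates the bookkeeping; the function name `rank_frequency` and the tie-breaking rule (alphabetical) are assumptions of this example.

```python
from collections import Counter

def rank_frequency(words):
    """Return rows (rank, word, relative frequency f, product f * rank)."""
    counts = Counter(words)
    total = len(words)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count / total, rank * count / total)
            for rank, (word, count) in enumerate(ranked, start=1)]

if __name__ == "__main__":
    sample = "the cat and the dog and the bird saw the cat".split()
    for row in rank_frequency(sample):
        print(row)
```

On a toy sample the product is far from constant; the law is a statistical regularity that becomes visible only on large corpora.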


(19021950)

, , .
. , [111]
, 11 000 .
(the, and, .), 1 % .
.


, , . . , .
, , , ,
, , :

N( f ) =

B
,
f

N ( f ) ,
f , B .

. , . p ,
(1 p ) , . ,


, .

,
[73]. ,
.
, , . . (H. A. Simon) [135]. : n ,
, (n + 1) - :
1. N ( f , n) , f n . ,
(n + 1) - , f
f N ( f , n)
, f .
2. (n + 1) - .
, .
1975 . [131]
.
.
,
. , ( , ) , , .
, .
.
,
( ,
, ).
,
,
. , -


, ,
- .
,
,
. :
, , . , . ,
, n
:

$$(1 - p)^n p,$$
p .
,
, .
,
.
6.1.3.

. (S. Bradford), , ,
: , , .
: ,
, ,
, . . 1934 . [79]:
$$\frac{N_3}{N_2} = \frac{N_2}{N_1} = \mathrm{const},$$
N1 , N 2 ,
N3 .




. ,
[2, 31], -, .
6.1.4.
. .
(H. S. Heaps)
, [98]. , , . , ! , (. 27):

$$v(n) = \alpha\, n^{\beta},$$
v , , n , . 10
100, 0.4 0.6.


. 27. , :
,

, , ,
[96], .

6.2.
( ),
, ,
(. 28):
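$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}},$$

$$f(x) = \frac{1}{x \sigma\sqrt{2\pi}}\, e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}, \quad x > 0$$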

,
,
.


. ,
.

. 28. .

. , , , ( ) :
f ( x) =

B
,
x

P ( X x) =

A
,
x

0 < x < ,

= 1,

P ( X x ) , X x , A , .
, . (B. Mandelbrot)
-. -


, ,
. ,

P ( X x) =
x

B
dx,
x

P ( x )
= f ( x ).
x

, . (. 29).

. 29. ,

, , ,
: F ( ) = const . ,
dx f ( x ) dx = F ( ) d . , x = a tg ( ) , :
$$f(x) = \frac{1}{\pi}\,\frac{a}{a^2 + x^2}, \qquad -\infty < x < \infty.$$

x f ( x )dx


1 , , ,
.
, ,
.
, A / x
. , . , .
, , [39].

6.3.
, , , , II . ,
.
, . , , . , x , :
$$f(\lambda x) = g(\lambda)\, f(x)$$

,
$f(x) = x^3$.
:
$$f(\lambda x) = \lambda^3 x^3 = \lambda^3 f(x), \qquad g(\lambda) = \lambda^3.$$
,
g ( ) , ,
x0 , x .
, x = x0 , :

f ( x ) = f ( x0 ) = g ( ) f ( x0 )
, g ( ) :

$$g(\lambda) = \lambda^p$$


p . :
$$g(\lambda\mu) = g(\lambda)\, g(\mu)$$

f ( x1 , x2 ,...) = g ( ) f ( x1 , x2 ,...) .
,
f ( x, y ) = p f ( x, y ) .
, , ( )
f ( a x, b y ) = p f ( x, y ) .
,
a , b ,... .
,
. ,
f ( x, y ) = p f ( x, y )

= 1/ y :
$$f(\lambda x, \lambda y) = f\Bigl(\frac{x}{y}, 1\Bigr) = y^{-p} f(x, y)$$

$$f(x, y) = y^{p} f\Bigl(\frac{x}{y}, 1\Bigr).$$
:
$$F\Bigl(\frac{x}{y}\Bigr) = f\Bigl(\frac{x}{y}, 1\Bigr)$$

:
$$f(x, y) = y^{p} F\Bigl(\frac{x}{y}\Bigr).$$


, f ( x, y ) ,
,
F ( z) .
F ( z ) . , ,
x , , F ( z ) y z = x / y .

, x / y . , f ( a x, b y ) = p f ( x, y ) = 1/ y1/ b , , , :

f ( x, y ) = y F ( z ),

z=

x
y

,
a /b

p
b.

.
u (, T ) ( , T
)
. , u (, T ) . , . , :
u (, T ) = 3 ( / T )

( z ) , , u (, T ) , . , , max , u (, T )
( max T , ).
:

( N , p ) p ( Np ) ,
( N , p ) ,
() ,
, ( pN ) . , :


( z << 1) x ( z >> 1) ln x
,
.
().

6.4.
, ,
. ,
, ( ,
, , ), .
. ,
: ,
() (). . ,
() (). (,
). (
Tc = 360 °C ) , , . .
. Tc ,
. ,
, .
.
[52] . , , p pc ,
. , , .
, ,
. , .
. I
, . , 00 C .
( -


), . II
.
, . , .
II , . . , ( ) .
, , , = 0 , , , 0 .
.
, G ( , T ,...) .
. (,
. .) . , . .
:

$$\frac{\partial G}{\partial \eta} = 0, \qquad \frac{\partial^2 G}{\partial \eta^2} > 0.$$

,
. , .

(19081968)


. .
= 0 . , G ( , T , ...) :

$$G(\eta, T) = G_0(T) + A(T)\,\eta^2 + B\,\eta^4 + \dots,$$
= 0 (, , ) 0 ( ).
G / = 0 :

$$2A\eta + 4B\eta^3 = 0,$$
= 0 = A / 2 B 0 .
T > Tc = 0 , T < Tc
0 . , T > Tc = 0 A > 0 . .
T < Tc , . .
A < 0 . : A > 0 T > Tc , A < 0 T < Tc , A(Tc ) = 0
.
A , ,
A = (T Tc ) .

2 = A / 2 B (T Tc ) ,
Tc T , G ( , T )
G ( , T ) = G0 (T ) + (T Tc ) 2 + B 4 + ... .

. 30 G ( , T ) T > Tc T < Tc .
II
G ( , T ) , (
, ). S C
$$S = -\frac{\partial G}{\partial T}, \qquad C = T\,\frac{\partial S}{\partial T}.$$


. 30. G ( , T ) T > Tc T < Tc


7.

, ! !
, ! !
, , ! !

, 40-
. (C. E. Shannon) [62].
, , .
, , , , , , . . , , ,
, ,
, .
, . , ( . .
. . ). ,
, . ,
. ,
, . , 100.
:
,
( N = 1 ). -


( N = 100 ). ,
. , , . , , 100
50 N 1029 . ,
,
( 1029 ) . , , 1024 ( ), .

, ,
:

S = k ln N ,
k , .

pi = p = 1/ N = const :
$$S = k \ln N = k \sum_{i=1}^{N} p \ln N = -k \sum_{i=1}^{N} p \ln p.$$
, ,
:
$$S = -k \sum_{i=1}^{N} p_i \ln p_i.$$

k k = 1/ ln 2
( !), :
$$S = -\sum_{i=1}^{N} p_i \log_2 p_i,$$

, . .
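The entropy formula in bits is a one-liner in code. The sketch below is an illustration; the function name `entropy` is an assumption, and terms with zero probability are skipped, following the usual convention 0 log 0 = 0.

```python
from math import log2

def entropy(probabilities):
    """Shannon entropy -sum(p_i * log2(p_i)) in bits; zero terms are skipped."""
    return -sum(p * log2(p) for p in probabilities if p > 0)

if __name__ == "__main__":
    print(entropy([0.5, 0.5]))        # fair coin
    print(entropy([0.9, 0.1]))        # biased coin
    print(entropy([0.25] * 4))        # four equally likely outcomes
```

The uniform distribution gives the maximum, log2 of the number of outcomes, and any bias lowers the value.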

. , ,
, . .


,
.
(mutual information), , , [111], -
.
c , , , t .
:

$$I(t, c) = \log \frac{P(t, c)}{P(t)\,P(c)},$$

P(t , c ) t c ; P (t ) t , P ( c ) c .
, t c .
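In practice the probabilities in I(t, c) are estimated from document counts. The sketch below is an illustration under stated assumptions: maximum-likelihood estimates from counts, base-2 logarithm (the source does not fix the base), and a hypothetical function name.

```python
from math import log2

def term_category_information(n_tc, n_t, n_c, n):
    """I(t, c) = log2( P(t, c) / (P(t) P(c)) ) estimated from counts.

    n_tc: documents of category c that contain term t
    n_t:  documents that contain term t
    n_c:  documents of category c
    n:    all documents
    Requires n_tc > 0 (the logarithm of zero is undefined).
    """
    p_tc, p_t, p_c = n_tc / n, n_t / n, n_c / n
    return log2(p_tc / (p_t * p_c))

if __name__ == "__main__":
    print(term_category_information(n_tc=20, n_t=20, n_c=50, n=100))
```

A value of zero means the term and the category are statistically independent; positive values mark terms that are characteristic of the category.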
. , ,
Autonomy IDOL (Intelligent Data Operating
Layer), , , .
IDOL,
, . , , , IDOL .
IDOL, , , .
.


.


7.1.
U = {u1 , ..., uN }, :
$$H(U) = -K \sum_{i=1}^{N} p_i \log_2 p_i,$$

pi ui , K .

(19162001)

, ,
:
$$H(U) = -\sum_{i=1}^{N} p_i \log_2 p_i = -\sum_{i=1}^{N} \frac{1}{N} \log_2 \frac{1}{N} = -\log_2 \frac{1}{N} = \log_2 N,$$

, ,
.

N ( ).
, , n u1 , u2 , ..., un . :

u_10   u_5   u_21   ...   u_3
1      2     3      ...   N

, .
i ui ( i = 1, 2, ..., n ) pi , .
N ui Npi . , p , Np1 u1 , Np2
u2 . . ( ), :

$$p = p_1^{Np_1}\, p_2^{Np_2} \cdots p_n^{Np_n}.$$



:
$$\log_2 p = N \sum_{i=1}^{n} p_i \log_2 p_i.$$

N , , :
$$S = -\sum_{i=1}^{n} p_i \log_2 p_i,$$

N
, :

$$p = 2^{-NS}.$$
( p ),
K :
$$K = \frac{1}{p} = 2^{NS}.$$

, ( )
,
( u1 p1 , u2 p2
. .).
, , ,
. , -

$\{p_1, p_2, \dots, p_n\}$.

$$K(\{p_1, p_2, \dots, p_n\}) = 2^{N S(\{p_1, p_2, \dots, p_n\})}.$$
, .
{1, 0, ..., 0} ( 1 u1 )

, . .
( u1, u1, ..., u1 ) . (
0log 0 = 0 ) :
$$S = -\sum_{i=1}^{n} p_i \log_2 p_i = -1 \cdot \log_2 1 - 0 \cdot \log_2 0 - \dots = 0.$$

, ,
, , () .
, , ,
pi = 1/ n {1/ n, 1/ n, ..., 1/ n}

:
$$S = -\sum_{i=1}^{n} p_i \log_2 p_i = \log_2 n.$$

N ,
N / n ,
$$K(\{1/n, 1/n, \dots, 1/n\}) = 2^{N S(\{1/n, 1/n, \dots, 1/n\})} = 2^{N \log_2 n} = n^N.$$

7.2.
,
, :
1. [0, 1].
2. .
3. , , . . .
, p 1 p,
(. 31).


. 31.
( p , H)

$$H(U) = -\bigl[\,p \log p + (1 - p) \log (1 - p)\,\bigr].$$

(. 32), p, q,1 p q ,
:

$$H(U) = -\bigl[\,p \log p + q \log q + (1 - p - q) \log (1 - p - q)\,\bigr].$$


,
,
.
. , , , , 0.9 0.1
[17].


. 32.
( OX p, OY q, OZ )

7.3.
U V ,
p (U ,V ) = p (ui , v j ) , (i = 1, ..., N ; j = 1, ..., M ) .

:
p (ui , v j ) = p(ui ) p ( v j / ui ) = p ( v j ) p(ui / v j ) , , :

$$H(U, V) = -\sum_{i=1}^{N} \sum_{j=1}^{M} p(u_i v_j) \log p(u_i v_j) = -\sum_{i=1}^{N} \sum_{j=1}^{M} p(u_i)\, p(v_j / u_i) \log [\,p(u_i)\, p(v_j / u_i)\,] =$$

$$= -\sum_{i=1}^{N} p(u_i) \log p(u_i) - \sum_{i=1}^{N} p(u_i) \sum_{j=1}^{M} p(v_j / u_i) \log p(v_j / u_i).$$

H ui (V ) V -

ui U :
$$H_{u_i}(V) = -\sum_{j=1}^{M} p(v_j / u_i) \log p(v_j / u_i).$$

, , V U
V U :
$$H_U(V) = \sum_{i=1}^{N} p(u_i)\, H_{u_i}(V) = -\sum_{i=1}^{N} p(u_i) \sum_{j=1}^{M} p(v_j / u_i) \log p(v_j / u_i).$$


:
1. V U :

H (UV ) = H (U ) + HU (V ),
H (UV ) = H (V ) + HV (U ).
2.
:
$H_U(V) \le H(V)$,
$H_V(U) \le H(U)$.
3.
U V :
HU (V ) = H (V ),
HV (U ) = H (U ).


7.4.
,
.

,
.
.
U
p (u ) n u .
:

$$p(u_i \le u \le u_i + \Delta u) = \int_{u_i}^{u_i + \Delta u} p(u)\, du \approx p(u_i)\, \Delta u.$$

U U n , p (ui u ui + u ), U :
n

i =1

i =1

i =1

H (U ) = p (ui )u log [ p(ui )u ] = p(ui )u log p(ui ) p (ui )u log u.


n

p(u )u 1, :
i =1

H (U ) p(ui )u log p(ui ) log u.


i =1

, u 0
, :

H (U ) = lim H (U ) = p (u )log p (u )du lim log u.


u 0

u 0

- H (U )
(
). , [9]. :


$$h(U) = -\int_{-\infty}^{\infty} p(u) \log p(u)\, du.$$


.
.
1. , U . U k , uk = ku, ,
p (uk ) = p(u ) / k , k :

$$h(U_k) = -\int_{-\infty}^{\infty} p(u_k) \log p(u_k)\, du_k = -\int_{-\infty}^{\infty} \frac{p(u)}{k} \log \frac{p(u)}{k}\, k\, du = h(U) + \log k.$$

, h(U ) . ,
,
.
2. , :

h(UV ) = h(U ) + hU (V ) = h(V ) + hv (U ),


hU (V ), hv (U ) .
3.
:

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - m)^2}{2\sigma^2}}$$

$$h_{\max}(U) = \log_2 \sigma\sqrt{2\pi e}.$$
,
m.

7.5.
Z Z = {z1 , ..., z N }. ,


,
W = {w1 , ..., wN }.
,
w j
:
$$H_{w_j}(Z) = -\sum_{i=1}^{N} p(z_i / w_j) \log p(z_i / w_j).$$

j:
$$H_W(Z) = \sum_{j=1}^{N} p(w_j)\, H_{w_j}(Z).$$

,
, :

$$I(ZW) = H(Z) - H_W(Z).$$
, :
N

I ( ZW ) =
i =1

p( zi w j )

j =1

p( zi ) p( w j )

p( zi w j ) log

:
1. . ,

$H(Z) \ge H_W(Z) \ \Rightarrow\ I(ZW) = H(Z) - H_W(Z) \ge 0.$
2. Z W :

$H(Z) = H_W(Z) \ \Rightarrow\ I(ZW) = 0.$
3. I ( ZW ) = I (WZ ). :

I ( ZW ) = H ( Z ) HW ( Z ) = H ( ZW ),
I (WZ ) = H (W ) H Z (W ) = H (WZ ).
H ( ZW ) = H (WZ ).
4. Z W :
I ( ZW ) = H ( Z ).


7.6.

, , . ,
.
V , N , U , M (). , , ,
.

HU (V ) , , V ,
U . :
$$H_U(V) = \sum_{j=1}^{M} P(u_j)\, H_{u_j}(V).$$


H ui (V ) , :
$$H_U(V) = -\sum_{j=1}^{M} \sum_{k=1}^{N} P(v_k, u_j) \log \bigl(P(v_k, u_j) / P(u_j)\bigr).$$

V U :

$$I(V, U) = H(V) - H_U(V).$$
,
(, ). I (V ,U ) ,
V U .
:

$$I(V, U) = H(V) - H_U(V) =$$

$$= -\sum_{k=1}^{N} P(v_k) \log P(v_k) + \sum_{k=1}^{N} \sum_{j=1}^{M} P(v_k, u_j) \log \bigl(P(v_k, u_j) / P(u_j)\bigr) =$$

$$= -\sum_{k=1}^{N} \sum_{j=1}^{M} P(v_k, u_j) \log P(v_k) + \sum_{k=1}^{N} \sum_{j=1}^{M} P(v_k, u_j) \log \bigl(P(v_k, u_j) / P(u_j)\bigr) =$$

$$= \sum_{k=1}^{N} \sum_{j=1}^{M} P(v_k, u_j) \log \frac{P(v_k, u_j)}{P(v_k)\, P(u_j)}.$$

:
1. I (V ,U ) 0 , ,
V U .
2. I (V ,U ) = I (U ,V ) , . . U V , V U .
I (V ,U ) = H (U ) HV (U ) .
3. I (V ,U ) H (V ), I (V ,U ) H (U ) , ,
U V .
4. I (V ,V ) = H (V ) ,
V .
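The closed form for I(V, U) can be evaluated directly from a joint probability table, which also makes the listed properties easy to check numerically. The sketch below is an illustration; the function name `mutual_information` and the base-2 logarithm are assumptions of the example.

```python
from math import log2

def mutual_information(joint):
    """I(V, U) in bits from a table joint[k][j] = P(v_k, u_j) summing to 1."""
    p_v = [sum(row) for row in joint]            # marginal P(v_k)
    p_u = [sum(col) for col in zip(*joint)]      # marginal P(u_j)
    return sum(p * log2(p / (p_v[k] * p_u[j]))
               for k, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

if __name__ == "__main__":
    independent = [[0.25, 0.25], [0.25, 0.25]]
    identical = [[0.5, 0.0], [0.0, 0.5]]
    print(mutual_information(independent))   # no shared information
    print(mutual_information(identical))     # equals H(V) = 1 bit
```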

8.

,
.

, (complex
networks) [116], ,
, , , , , . . ,
.
,
, , ,
.
, ( ),
, .
1954 . (J. Barnes)
.
XX , , , .
, (Social Network
Analysis, SNA). ,
, , ,
, WWW.
, , [87].


8.1.
: , ;
;
.
, , , . .
; ; .
8.1.1.
:
, ;
,
;
;
;
(eccentricity) ( )
;
(betweenness), , ;

.
8.1.2.
, :
, , , ,
, ,
,
. .
,
:
. , , ;


( ), ;
. ,
;
(
).
8.1.3.
P ( k ), , i
ki = k . , P( k ) , . P( k )
( $P(k) = e^{-m} m^k / k!$, m

),

( $P(k) = e^{-k/m}$ )

( $P(k) \sim 1/k^{\gamma}$, $k \ne 0$, $\gamma > 0$ ).
(scale-free).
.
,
.
8.1.4.

,
,
. , . dij . ,
:

$$l = \frac{2}{n(n + 1)} \sum_{i \ge j} d_{ij},$$

n , dij i
j.
. (P. Erdős) . (A. Rényi) ,
[88, 89].


(19131996)

. ,
, , . , , ,
500. , , : ;
; , ,
; .
,
. , 90 % 8, , .
, . . , . ,
. , :

$$il = \frac{2}{n(n - 1)} \sum_{i > j} \frac{1}{d_{ij}}.$$

,
dij .
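Both path-based characteristics can be computed with breadth-first search from every node. The sketch below is an illustration for small unweighted, undirected graphs. Note one deliberate difference from the formula for l above: the mean path here is averaged over connected pairs of distinct nodes only, which is an assumption of this example; unreachable pairs contribute zero to the efficiency sum.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def path_statistics(adj):
    """Return (mean shortest path over connected pairs, global efficiency).

    adj maps each node to a list of neighbours; requires at least 2 nodes.
    """
    nodes = list(adj)
    n = len(nodes)
    total, inverse_sum, reachable = 0, 0.0, 0
    for idx, u in enumerate(nodes):
        dist = bfs_distances(adj, u)
        for w in nodes[idx + 1:]:
            if w in dist:
                total += dist[w]
                inverse_sum += 1.0 / dist[w]
                reachable += 1
    mean_path = total / reachable if reachable else float("inf")
    efficiency = 2.0 * inverse_sum / (n * (n - 1))
    return mean_path, efficiency

if __name__ == "__main__":
    chain = {0: [1], 1: [0, 2], 2: [1]}
    print(path_statistics(chain))
```

The inverse-distance form stays finite for disconnected networks, which is the reason it is preferred when some node pairs have no path between them.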


8.1.5.
. (D. Watts) . (S. Strogatz) 1998
, [147], .
,
(clique). , ,
.

. k ,
k , . ,
, 1
k ( k 1). ,
2
, . ,
(,
) i C (i ). , .

. .

,
. , . .


8.1.6.
(betweenness) , , .
. . bm m :

$$b_m = \sum_{i \ne j} \frac{B(i, m, j)}{B(i, j)},$$

B(i, j ) i j ,
B(i, m, j ) i j , m .
8.1.7.


.
, . . .
, . , .
. (Réka Albert) ,
- ,
WWW 326 000 [65].
,
, ( ). . , ,
.
8.1.8.

,
, ,
, . . , ,
. . , .


8.2.
. . , , .
, . , , ,
, ,
[77].
4,6
, 20 %
. , ,
.
4,6 7
, . . ,
18 . ,
.
, ( 18 ) . , . ,
(. 33). ,
, . ,
-,
.


. 33. : ) ;
) , ;
) , :

8.3.
,
( WWW, )
. 1967 . .
, , ,
[113].
. . ,
, (Small Worlds) [146].
, . . 34:
, , ( ) ,
.
. 35 . .
(
).


. 34. -

, , ,
.
WWW ,
. , (S. Zhou)
. . (R. J. Mondragon) , ,
, ,
, .
(rich-club phenomenon). , 27 %
5 % , 60 % 95 %
5 % 13 % , 5 %.


. 35.

( 0 )

, WWW
, , . .
. , , . ,
, .
, ,
, ,
, , . , , (entangled
networks). , . , , , ,
.


8.4. WWW
8.4.1. WWW
, WWW,
, , . , , , ,
, . WWW , ,
SNA. ,
,
. , -.
1999 . . (A. Broder) IBM
AltaVista, IBM Compaq - [83], - (Bow Tie, . 36).
AltaVista 200
- , .


. 36. - Bow Tie

- :
(28 % -)
(Strongly Connected Component, SCC), , , ,
;
22 % - - (IN).
, ,
;
22 % - (OUT), , ;
22 % - : , , , , .
, 90 % -,
.
, .
. ,
- .
,
,
-. , ,


- Bow Tie . , , -,
[78].
Bow Tie, , , . ,
( ) ( edu 325729) , . . ,
i , 1 i k (
k 2.1 , k 2.45 ). , , WWW
, 11
, 0.15 (
0.0002).
-
, - ,
, , 16. , .
, ,
- .
. ,
, IBM
(N. Lamour) [32].
- , ,
, , .
-,
, . , , - .

- , -
AltaVista, -,
-, .
.
.


. (L. Björneborn)
, . , . , .
,
WWW.
8.4.2.
-, , ,
. .
-,
, ,
,
( ) , . , . -, -,
:
;
, . . -, ;
,
,
- (
, );
, ;
-, , - .
-
InfoStream [31],
-. 2500 , , :


< >< >.


, ,
-,
( ). ,
484 945 2323 -.
-, . 1459 ( ). , 100 80 % .
,
-:
Web-

Reuters
-

BBC

-
1051
983
882
787
773
675
662
631
623
598
595

, -,
,
Salsa [108].

log( N OUT + 1), log( N IN + 1) , N OUT , N IN
(. 37).


(. 38).

[Plot axes: vertical log(N_OUT + 1), horizontal log(N_IN + 1)]

. 37.


. 38.

-, .
.
, -, .

8.5.
. , ,
.
:
;

;



.
, , TouchGraph Amazon ,
( , , ).

. TouchGraph, , Livejournal
TouchGraph LiveJournal Browser.
WWW TouchGraph Google Browser
(http://www.touchgraph.com/TGGoogleBrowser.html) , , . Google Browser Java-, -,
Google. ,
( Google) , . TouchGraph Google Browser , (. 39).


. 39. - ( TouchGraph)

NetVis (http://www.netvis.org), online- .

/ InFlow ( 3.1 http://www.orgnet.com/ inflow3.html) UCINET (http://www.analytictech.com/ ucinet/ucinet.htm)


NetDraw.

9.

- . .
, : , ?
.
?

9.1.

, , () . .
( . percolation , ) 1957 . . .
(S. R. Broadbent) . . (J. M. Hammersley) [81]. ( ,
, , ...), ,
. , , ,
[132].


. . (19202004)

,
[48].
.
, ( p ) , , . pc ,
.
, .
p = 0 .
, p = pc
, ,
, . ,
, ,
.
.
, . . , , ,
. , , , ,
pc ,
. 39.


R ~ 1/ G . . 40, ,
p
.

. 40.
( )

, , , , . , . , pc , ,
. pc
.
, , ,
, ;
;
. .
,
,
( ), (, ) , , , .
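The percolation threshold can be explored with a small Monte Carlo experiment: occupy each site of a square lattice with probability p and test whether an occupied cluster connects the top row to the bottom row. The sketch below is an illustration only; lattice size, trial count and function names are assumptions of the example. For site percolation on the square lattice the threshold is known to be close to 0.593.

```python
import random
from collections import deque

def has_spanning_cluster(grid):
    """True if occupied sites connect the top row to the bottom row
    through nearest-neighbour (up/down/left/right) steps."""
    n = len(grid)
    seen = {(0, c) for c in range(n) if grid[0][c]}
    queue = deque(seen)
    while queue:
        r, c = queue.popleft()
        if r == n - 1:
            return True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if (0 <= rr < n and 0 <= cc < n
                    and grid[rr][cc] and (rr, cc) not in seen):
                seen.add((rr, cc))
                queue.append((rr, cc))
    return False

def spanning_fraction(n, p, trials=200, seed=0):
    """Fraction of random n x n lattices (occupation probability p) that span."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        grid = [[rng.random() < p for _ in range(n)] for _ in range(n)]
        hits += has_spanning_cluster(grid)
    return hits / trials

if __name__ == "__main__":
    for p in (0.4, 0.5, 0.6, 0.7, 0.8):
        print(p, spanning_fraction(30, p))
```

On a finite lattice the transition is smeared; it sharpens around the threshold as the lattice grows.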


9.2.
,
, , ( ).
:
P ( p ) ,
,
, ;
$S(p) = \dfrac{\sum_s s^2 n_s(p)}{\sum_s s\, n_s(p)}$ ( . mean cluster size), ns s ,
,
;
( p ) ,
. G (r , p pc ) G ( ri rj , p ) = g (ri , rj ) ,
g (ri , rj ) i j ,
, , ,
. r G ( r, p ) , ,
$G(r, p) \propto \exp\bigl(-r / \xi(p)\bigr)$.
, p pc ( p )
.
( p pc ), ,
, , .

, . ,
, , ,
d f ( d = 2) = d / 1.896, d f ( d = 3) 2.54.


(. 41)
( . backbone) , , ( . dead
ends), .

,
( , 1/ h , 0 < h << 1 ).
h
II , . .

. 41. : ) ;
) ( )

, ,
, = (T Tc ) / Tc , T , Tc
, . , Tc ,
( ). Tc (. 41):
~ h | | , > 0, (T > Tc ), > 0;
~ | | , < 0, (T < Tc ), > 0,
, , h
( h << 1). -


~ h | | , > 0, (T > Tc ), > 0;


~| | , < 0, (T < Tc ), > 0.
h . ,
h = 0, Tc
. . 42, Tc . Tc () h .

. 42.

h ,
Tc , , (h << h ) . ,
( h >> h ) .
h
, :
1

= h f ( / h ),

= (T Tc ) / Tc .
1

f ( z ) z = / h
:

z ,

f ( z ) ~ const ,

| z | ,

z +,
z 0,
z .



II .
e , .
e ,
= ( p pc ) / pc , pc .
e
:
t

e = h f ( / h ),

z t ,
z +,

f ( z ) ~ const ,
z 0,

| z |q ,
z .
t q , = t + q.
, p
T p
(. 43).

. 43.

, e
, . . | |<< 1.


9.3.
. ,
, , .
.
- f ( ) = p (1 ) + (1 p ) (2 ) .

(. 44). ,
min / max << 1 . .

. 44.


( r ~ 1/ ) ( , ):
r = r0 e x , >> 1 ,

r , x (0,1) ,
f ( x ) . , x rmin = r0 exp ( )
rmax = r0 .
>> 1 : rmax >> rmin max >> min . , ,
r = r0 e x -.
e
-


, , . , , ,
. , ,
.
, , , ( rmin ). .
rc = r0 exp ( xc )
( r < rc ) .
, , , . , , , rc .
rc : :
$$\int_{x_c}^{1} f(x)\, dx = p_c$$

,
f ( x ) = 1,
x (0,1) xc = 1 pc . , :
(1 pc )

rc = r0 e xc = r0 e

e = 0 e x = 0 e (1 p ) .
c

, ( ) , , , ( ), . ,
.

. . (


) . , , , .

9.4.
( ) , ,
WWW . . 45
( . fully directed) . .
.

. , , , . 46
OX . OX , || , ,
.

. 45.

-
, , . OX


; , .
, , , . 47. p+ p
OX ( , ), p
. . .
: q = 1 p p+ p .

. 46.

. 48. , , p , , q , A
. . q p , q p
pc - .
q p+ fully directed . , q p+ q p . p+ p ,


fully directed , .

. 47.

. 48.


9.5.

, , . .
(giant connected component).
() N , pc 1/ N , . .
k 1 .
. (M. Newman) . [117] . .
[146] , shortcuts,
, ,
.
( ):

( kd )

1/ d

(shortcuts), N
, N , k , d .
(shortcuts).
l (, )
:
l=

f (z

1) const ,

1/ ( k )

1/ d

N N
f
,
k

f (z

1)

log z
.
z

z = N / 1 , N 1/ d 1, . .
(shortcuts) . :

l=

N N
f
N,
k


. . , .
z = N / 1 , . . N 1/ d 1, :
l=

N N
1/ d
f log N ( kd ) log N ,

, , .
.
. ,
q = 1 p , . . (, ,
)? qc = 1 pc , (giant connected component), . . ( )
? [117] , pc ( Npc
) (shortcuts) (. 48)

(1 pc )
.
=
k
2kpc 1 + kpc (1 pc )

. 49 pc
, . ,
, ,
.
,
P ( k ) k ,
. pc
. , , > 3 :
qc = 1 pc = 1

1
,
2
k 1
3 0

k0 k K 0 .


. 49. pc
: , .

k0 = 0 qc qc = 4 .
, () 4 , . . , .
,
[85].

9.6.
, [85],
, . , .
, (backbone) . ,
. , . .
, , . ,
.
, (, ) q .
, ,
, ,
, . () -


. ,
DDoS- [136].
1 % .
( . directed) .
()
,
.
- () WWW , HTTP
, , -,
. , Google, - , . -
(, -) , -
- ( -). , -, -, ,
, .
-, [141].

10.

,
[10]. , .
.

10.1.
, , , , ,
( ), , . . t , ,
:

$$y(t) = y(t_0) + v\,(t - t_0),$$
t0 , y (t ) t , v () .
(t ) , :


$$\sigma(t_n) = \sqrt{\frac{1}{n} \sum_{i=0}^{n} \bigl(y(t_i) - (y(t_0) + v\,(t_i - t_0))\bigr)^2}.$$


,
. .
, : (t ) t ( 1 2 1),
,
.

10.2.
( )
, :
$$y(t) = y(t_0)\, e^{\lambda (t - t_0)},$$

.
,
t0 , ..., tn , . :
$$y(t_i) = y(t_0)\, e^{\lambda (t_i - t_0)} = y(t_0)\, e^{\lambda (t_i - t_{i-1} + t_{i-1} - t_0)} = y(t_{i-1})\, e^{\lambda (t_i - t_{i-1})}.$$
:
$$\frac{y(t_i)}{y(t_{i-1})} = e^{\lambda (t_i - t_{i-1})}.$$
: (ti )
ti :

(ti ) = (ti ti 1 )
:


(ti ) = ln


y (ti )
.
y (ti 1 )

ti :

(ti ) = ln

y (ti )
y (ti ) y (ti 1 )

.
y (ti 1 )
y (ti 1 )

(ti ) :
( tn ) =

1
n

( (t ) )
i =0

, (t )
,
, . : (t ) t , , 1.
, ,
. , .

10.3.
[5, 6, 11]
, y (t )
:
$$\frac{dy(t)}{dt} = k\, y(t),$$

k .
, y (t ) , [4, 5]. :
$$N - r\, y(t),$$


N , y (t ) ,
r , .
, ,
( ) :

$$\frac{dy(t)}{dt} = k\, y(t)\,\bigl(N - r\, y(t)\bigr), \qquad y(t_0) = y_0.$$

, , .
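The logistic equation has a closed-form solution, y(t) = N y0 / (r y0 + (N - r y0) e^{-kNt}) for t0 = 0, which can be compared with a direct numerical integration. The sketch below is an illustration; the parameter values and function names are assumptions chosen for this example.

```python
from math import exp

def logistic_exact(t, y0, k, n, r):
    """Closed-form solution of dy/dt = k*y*(n - r*y) with y(0) = y0."""
    return n * y0 / (r * y0 + (n - r * y0) * exp(-k * n * t))

def logistic_euler(t_end, y0, k, n, r, steps=10000):
    """Explicit Euler integration of the same equation up to t_end."""
    y, dt = y0, t_end / steps
    for _ in range(steps):
        y += dt * k * y * (n - r * y)
    return y

if __name__ == "__main__":
    params = dict(y0=1.0, k=0.01, n=100.0, r=1.0)
    for t in (1.0, 5.0, 10.0, 20.0):
        print(t, round(logistic_exact(t, **params), 3),
              round(logistic_euler(t, **params), 3))
```

Growth is nearly exponential at first and saturates at N / r, which gives the S-shaped curve typical of this model.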
,
t = 0 n0 . , (
) , : 0 < t D > 0 t > D = 0 (
D = const ) , , u(t ) v (t ), :
$$y(t) = \begin{cases} u(t), & 0 < t < \tau, \\ v(t), & t > \tau, \end{cases} \qquad u(t) = v(t), \quad t = \tau.$$

( D > 0 ) , , .
: .
.
,
N,
u (t ) :
$$\frac{du(t)}{dt} = p\, u(t)\,\bigl(1 - q\, u(t)\bigr) + D\, u(t),$$
$$u(0) = n_0.$$


, p
.
(
).
D .
q , u (t ) D = 0.
, v(t ) , , :
$$\frac{dv(t)}{dt} = p\, v(t)\,\bigl(1 - q\, v(t)\bigr).$$

u (t ) v(t )
t = :
$$v(\tau) = u(\tau).$$
:
$$y' = a y^2 + b y,$$

z = 1/ y :
$$z' + b z + a = 0.$$

:
z=

1
[C a ( x)dx]
( x)

:
( x ) = ebx .

C : , . :
u (t ) =

us

u
1 + ( s 1)exp[( p + D )(t )]
n0


us u , :
$$u_s = \frac{p + D}{p q}.$$

, , , S- () , . 50.

. 50. u(t )

, n0 ,
.
,
, , .
, . 50 :

tinf =

1
u
ln( s 1) + .
p + D n0

, S-
, t ~ tinf u(t ) .
u(t )
:

u (t ) =

us exp[( p + D )t ]
,
u
exp[( p + D )t ] + ( s 1) exp[( p + D ) ]
n0


t<

1
u
ln( s 1) + = tinf
p + D n0

u(t ) , . . t,
tinf , .
, , (. 51):
$$v(t) = \frac{v(\tau)}{q\, v(\tau) + \bigl(1 - q\, v(\tau)\bigr) \exp[-p\,(t - \tau)]},$$

:
v( ) = u ( ).
u (t )
t < , , :
v (t ) =

vs ( p + D )
,
p + D (1 exp[ p (t )])

vs = 1/ q v (t ) .

. 51. v (t )

, vs , u (t ) .
, us ( t )
vs , . , , ,
,


, .
, .
y (t ) . 52.
, ,
, ,
.
.
, ,
. , , -, , ,
, . .
.

. 52.

:
$$\int_0^T \sum_{i=1}^{N_m} m_i(t)\, dt = M T,$$

mi (t ) i - , N m , M , .
,
. ,
-


,
.


.
:
$$\frac{dm_i(t)}{dt} = p_i\, m_i(t) - \sum_{j=1}^{N_m} r_{ij}\, m_i(t)\, m_j(t),$$

N m .
, . () , .
-, , pi
rij ,
.
:

$$\frac{dm_i(t)}{dt} = \bigl(p_i + D_i(t, \tau_i)\bigr)\, m_i(t) - \sum_{j} r_{ij}\, m_i(t)\, m_j(t).$$

pi Di ,
, i , Di
.

10.4.
, .
, , ,
, .
, , ,
(). , , , .
, ,
.



. (J. Von Neumann) [41] . (S. Wolfram)
[149].

. (19031957) .


[53]. , ,
,
, .
,
.

.
, (), . : , .
.
, , , .

, , . ,
, ,
.



, , . () . , .
j
. j - t + 1 , , :
y j (t + 1) = F ( y j , O( j ), t ) ,

F , , ,
. ,
, . . y j O ( j ) , : y j (t + 1) = F (O ( j ), t ) . -

:
(
);
, . . ;
;
.
,
.
, , ,
. . , .
, ,
, yi , j ,
,

:
$y_{i-1,j},\ y_{i,j-1},\ y_{i,j},\ y_{i,j+1},\ y_{i+1,j}$ ),

( (E. Moore):
$y_{i-1,j-1},\ y_{i-1,j},\ y_{i-1,j+1},\ y_{i,j-1},\ y_{i,j},\ y_{i,j+1},\ y_{i+1,j-1},\ y_{i+1,j},\ y_{i+1,j+1}$ ).
.
(. 53),


. . ,
. .

. 53. . : wikimedia.org


t + 1 t :
$$y_{i,j}(t + 1) = F\bigl(y_{i-1,j-1}(t),\ y_{i-1,j}(t),\ y_{i-1,j+1}(t),\ y_{i,j-1}(t),\ y_{i,j}(t),\ y_{i,j+1}(t),\ y_{i+1,j-1}(t),\ y_{i+1,j}(t),\ y_{i+1,j+1}(t)\bigr).$$

. , ,
, . , . , .
(J. Conway)
M. (M. Gardner) [12].
[76]. ,
,
[43].
,
t + 1 . t (
).
20 000 . , , .
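As a concrete instance of a rule F over the nine-cell neighbourhood, the sketch below implements one synchronous step of Conway's Life: a cell is alive at time t + 1 if it has exactly three live neighbours, or if it is alive and has exactly two. The periodic (wrap-around) boundary and the function name `life_step` are assumptions of this example.

```python
def life_step(grid):
    """One synchronous update of Conway's Life on a grid of 0/1 cells.

    Uses the eight surrounding cells as neighbours and periodic boundaries.
    """
    rows, cols = len(grid), len(grid[0])

    def live_neighbours(i, j):
        return sum(grid[(i + di) % rows][(j + dj) % cols]
                   for di in (-1, 0, 1) for dj in (-1, 0, 1)
                   if (di, dj) != (0, 0))

    new = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            n = live_neighbours(i, j)
            new[i][j] = 1 if n == 3 or (grid[i][j] and n == 2) else 0
    return new

if __name__ == "__main__":
    blinker = [[0, 0, 0, 0, 0],
               [0, 0, 0, 0, 0],
               [0, 1, 1, 1, 0],
               [0, 0, 0, 0, 0],
               [0, 0, 0, 0, 0]]
    for row in life_step(blinker):
        print(row)
```

All cells are updated from the state at time t, so the new grid is built separately instead of being modified in place.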



[76].
: , , . : 1
; 0 . , ,
, ( 1, ).
, ,
. . ( )
m , p (
) pm > R, ( R ), ( 1).
, , .
,
. , , , ,
, :
1 ( ); 2 , , ( ); 3
, ( , ). :
, , (. 54 );
( );
, ,
: pm > 1 ( m 2 : 1.5 pm > 1 );
, ,
( ,
);
, ,
(
).




.
40 40 (
). ,
, . . .
. 54.
,

http://edu.infostream.ua/newsk.pl ,
80 150 .

. 54.
: ;
;

,
. 55.


: 1 ,

, 2 ,
: 3:1:0; .

. 55.
: (); (); ()

[107].
, :
x g = f ( t , g , g ) ,

t ( ), g ,
, g .
, xw ( t ) x g :

x w = 1 f ( t , w , w ) .


, , , . . , :

x g + xw + xb = 1,
xw t .
, :
xb = 1 xg xw = f ( t , w , w ) f ( t , g , g ) .

. 55 ,
f ( t , , ) ( ):
f ( t , , ) =

C
,
1 + e ( t )

C .
. 56 xg , xw , xb , .


. 56. ,
, :
( xg ); ( xw ); ( xb )

, ,
, - (), .

10.5.
, , ,
. , ,
, .
, ,
,



. - , .

, ,
.
(self-organizing), , . (W. Ashby) 1947 , . (N. Wiener),
. (H. von Foerster) .
. (Per Bak) [7, 69]. 19871988 . ,
. (C. Tang) . (K. Wiesenfeld) [70, 71]
,
,
.
, , , ,
.

(19482002)

, , , . , ,
,

. -


, , , ,
.
,
, , .
, , . , . , ,
. . - [69], . . (H. Yager)
, ,
. (J. Feder) . (T. Joessang)
, .
,
,
, , .
,
, . , ,
(. 57).

. 57.


,
, .
, . h( x ) x ( x = 1, 2, ..., N ) .


(. 58 ) h = h ( x ) z ( x ) = h ( x ) h ( x + 1) , . 58 .
. 58 .
1: x
z ( x ) > zc ,
. zc = 2
:

$$z(x) \to z(x) - 2, \qquad z(x \pm 1) \to z(x \pm 1) + 1 \quad \text{for } z(x) > 2 \qquad (1)$$

, h ( x ) x
, 2 , ( ) .
:
$$z(1) = 0, \qquad z(N) \to z(N) - 1, \qquad z(N - 1) \to z(N - 1) + 1 \quad \text{for } z(N) > 2 \qquad (1)$$

1 ,
, ,
.
. 58 h ( x )
z ( x ) , . , , x = 6

z ( 6 ) = 3 > 2 .
, x = 5 x = 7
. 58 .
1,
, , , x t + 1 /
t . 1
, . 58.
, , 1 1 z ( x ) 2 ,
z ( x ) = 2 x . ,


,
, z ( x ) < 2 . , . , .

. 58. . 6
11

.
( 2 2). 1 x y , zc 3:

$$z(x, y) \to z(x, y) - 4, \qquad z(x, y \pm 1) \to z(x, y \pm 1) + 1, \qquad z(x \pm 1, y) \to z(x \pm 1, y) + 1 \quad \text{for } z(x, y) > 3 \qquad (2)$$

z (0, y ) = z ( x,0) = z ( N + 1, y ) = z ( x, N + 1) = 0

( 2)
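The two-dimensional toppling rule and the open boundary can be sketched as follows: every site with z > 3 loses four units and passes one to each of its four neighbours, and units pushed over the edge are lost. The function names and the use of a work stack are assumptions of this example; the avalanche size is counted as the number of topplings.

```python
def topple(z):
    """Relax an N x N grid in place until every site has z <= 3.

    Returns the avalanche size (number of individual topplings).
    """
    n = len(z)
    size = 0
    unstable = [(x, y) for x in range(n) for y in range(n) if z[x][y] > 3]
    while unstable:
        x, y = unstable.pop()
        if z[x][y] <= 3:
            continue                      # already relaxed by an earlier pass
        z[x][y] -= 4
        size += 1
        if z[x][y] > 3:
            unstable.append((x, y))
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < n and 0 <= ny < n:   # units leaving the grid are lost
                z[nx][ny] += 1
                if z[nx][ny] > 3:
                    unstable.append((nx, ny))
    return size

def add_grain(z, x, y):
    """Add one unit at (x, y) and return the size of the resulting avalanche."""
    z[x][y] += 1
    return topple(z)

if __name__ == "__main__":
    grid = [[3, 3], [3, 3]]
    print(add_grain(grid, 0, 0), grid)
```

Collecting the returned sizes over many random additions gives the avalanche-size distribution D(s).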

2 . . 2 ,
, , . , , ,


,
.
.
z ( x, y ) , 1D, z ( x, y )

x y .
, z ( x, y ) 4 ,
. ,
, , ,
.
.
z ( x, y ) 4 , z ( x, y ) = 0 2 , . , .
,
.
, ,
.
z ( x, y ) 4 , .
. 59 500 500,
z ( x, y ) 0 3.


. 59.

( z ( x, y ) = 3 ), 2 2,
z ( x, y ) = 4 , . . , , . x , y , , , . . 60
, . , . 59. ,
.
D ( s )
. , ,
z ( x, y ) = 3 z ( x, y ) = 4 ,
s ,
, z ( x, y ) 4 . . 61 a
D ( s ) , 500 500.
:

D ( s ) s , 2 D 1,1.


(4354, 100578)

(696, 25340)

. 60. ,


, D(t )
, . s
, t ,
.
:
D(t ) t , 2 D 0.54.

, ,
, .
,
, . ,
(. . 11.3),
.

.

[Plot panels on logarithmic axes: D(s) against s, fitted exponent -1.085 ± 0.015; D(t) against t, fitted exponent 0.54 ± 0.02]

. 61. () D(s) () D(t)

11.

,
.

11.1.
(B. Mandelbrot)
1975
. , , : , , - [37]. ,
, , .
, , , .


.
. (J. Hutchinson) 1981 . , ,
, . . , .
, . , , , , .
,
d , , ,
1 ( d = 1 ),
d = 2 , . . . , . (F. Hausdorff) . . .

, ,

, , . .
, . , , , .

(18681942)


G d .
d ,
, . . i < .
d
:

$$l_d(\varepsilon) = \sum_i \varepsilon_i^{\,d}.$$

$$L_d(\varepsilon) = \inf_{\varepsilon_i < \varepsilon} \sum_i \varepsilon_i^{\,d}$$

, d , , :
lim Ld ( ) 0.
0

d :
lim Ld ( ) .
0

, d x , :
$$\lim_{\varepsilon \to 0} L_d(\varepsilon) = \begin{cases} 0, & d > d_x, \\ \infty, & d < d_x, \end{cases}$$
(
). ( d x =1,
d x = 2, d x = 3 . .)
, ,
. d c . . , , :

Ld ( ) = N ( ) dc ,


N ( ) , G .
( 0 ) :
$$d_c = -\lim_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{\log \varepsilon},$$

. ,
d c , , : d x d c . , [13].

.

N ( ) , N ( ') ' . :

$$N(\varepsilon) \sim \frac{1}{\varepsilon^{d_c}}, \qquad N(\varepsilon') \sim \frac{1}{\varepsilon'^{\,d_c}}$$

dc :

$$d_c = -\frac{\log \bigl(N(\varepsilon) / N(\varepsilon')\bigr)}{\log (\varepsilon / \varepsilon')}.$$

11.2.
, .
,
(H. Von Koch) (. 62), ,
1. . , ,
1/3 ,
1/3
.


. , n-
n, ,
n, , .
, . , 4 , 3 , . . n- n
3 ( 4 3 ) n .

. 62. 5

, , :
$$S = 1 + \frac{1}{3} \sum_{k=0}^{\infty} \Bigl(\frac{4}{9}\Bigr)^k = 1.6.$$
.
. , .
: = 1, N ( ) = 3 . : ' = 1/ 3, N ( ') = 12 . :

$$d_c = -\frac{\log \bigl(N(\varepsilon) / N(\varepsilon')\bigr)}{\log (\varepsilon / \varepsilon')} = -\frac{\log (3/12)}{\log 3} = \frac{\log 4}{\log 3} \approx 1.26.$$
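The two-scale estimate of the dimension is a single expression, which makes it easy to reproduce the numbers for self-similar sets. The sketch below is an illustration; the function name is hypothetical, and the inputs are the covering counts at two scales.

```python
from math import log

def dimension_from_two_scales(eps, n_eps, eps_prime, n_eps_prime):
    """d_c = -log(N(eps) / N(eps')) / log(eps / eps')."""
    return -log(n_eps / n_eps_prime) / log(eps / eps_prime)

if __name__ == "__main__":
    # Koch snowflake: 3 segments of length 1, then 12 segments of length 1/3
    print(dimension_from_two_scales(1.0, 3, 1.0 / 3.0, 12))
    # Sierpinski triangle: 1 triangle of side 1, then 3 triangles of side 1/2
    print(dimension_from_two_scales(1.0, 1, 0.5, 3))
```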

, 1915 . .
(W. Sierpiński), . , , (. 63). .

. ,
,


. , , .
, :

$$d_c = -\frac{\log \bigl(N(\varepsilon) / N(\varepsilon')\bigr)}{\log (\varepsilon / \varepsilon')} = \frac{\log 3}{\log 2} \approx 1.58.$$

. 63.

, .
, .
,
. . ,
:
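$$S = \frac{1}{4} + \frac{1}{4} \cdot \frac{3}{4} + \frac{1}{4} \Bigl(\frac{3}{4}\Bigr)^2 + \dots = \frac{1}{4} \Bigl(1 + \frac{3}{4} + \Bigl(\frac{3}{4}\Bigr)^2 + \Bigl(\frac{3}{4}\Bigr)^3 + \dots\Bigr) = \frac{1}{4} \cdot \frac{1}{1 - 3/4} = 1.$$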

, .
(. 64)
:
$$Z_{i+1} = Z_i^2 + C, \quad i = 0, 1, 2, \dots$$
Z i +1 , Z i C .
C
. , Z i , (0, 0),
.
, Z i , C . Z i
, .


, . ,
, ,
.

. 64.

80- (Iterated Functions System IFS),


. IFS ,
. IFS
:

X ' = AX + BY + C ,
Y ' = DX + EY + F ,
X , Y , X ', Y ' ,
A, B , C , D, E F .
IFS
, , Java-,
http://www.fractals.nsu.ru/fractals.chat.ru/ifs2.htm (. 65).
IFS , ,
( , ).
80- . (M. Barnsley) . (A. Sloane) ,
.


, 500
1000 . .

. 65. : ) ;
) ; )

, ,
, ,
. ,
.


. , , .
, L l ,
,
L = l 1 , = const. , , (. 66) 1.24 .

11.3.

, , , . ,
, , .
, . ,
, , .
-
. [131] [31].
, WWW . IBM Altavista [82].


. 66.
(http://maps.google.com)

. , ,

, (, , , , )
[22, 23].
, [145, 20]. , - ,
, . , ,
,
(,
).

, ?
, , , . -


.

-, , . , ,
, .
, -
News Is Free (http://newsisfree.com).

.
(. 67).
-
,
, .
. [2023],
, ,
().

. 67. (http://newsisfree.com)


. 67.
(http://newsisfree.com)

, ,
, , , , . , ,
, .
, ,
, . .
,
,
[20]:

N publ ( t ) = N k (t ) ,
N publ ( ); N k ( ); ; .

.

11.4.
(, ,
, . 67).
.
, , ,
, ,
, . .
,
. . , , -


- -
InfoStream.
. .
, 14 069 ,
1 2006 . 31 2007 ., , :
OR OR
( AND ( OR OR Windows OR Linux)).

(. 68).

, , , .
11.4.1. DFA
DFA (Detrended Fluctuation Analysis) [121] .

. 68.
( ) ( )


DFA , .
.
( F

Fn , n = 1, ..., N ) y (k ) :
$$y(k) = \sum_{i=1}^{k} \bigl(F_i - \bar F\bigr)$$

y (k ) , k = 1, ..., N n, ,
y (k ) .
yn (k ) ( yn (k ) = ak + b )
.
D(n) n:

$$D(n) = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \bigl[\,y(k) - y_n(k)\,\bigr]^2}.$$

, D (n ) D (n) ~ n ,
. .
ln D ~ ln n , .
. 69, D (n ) n , . . .
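The DFA steps (cumulative profile, windows of length n, a linear fit in each window, root-mean-square residual) can be sketched in plain Python as follows. Non-overlapping windows, dropping an incomplete last window and the function names are assumptions of this example; n must be at least 2.

```python
from math import sqrt

def linear_fit(xs, ys):
    """Least-squares line y = a*x + b through the points (xs, ys)."""
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def dfa_fluctuation(series, n):
    """Detrended fluctuation D(n) for window length n (n >= 2)."""
    mean = sum(series) / len(series)
    profile, running = [], 0.0
    for value in series:                  # y(k): cumulative sum of deviations
        running += value - mean
        profile.append(running)
    residual_sq, used = 0.0, 0
    for start in range(0, len(profile) - n + 1, n):
        xs = list(range(start, start + n))
        ys = profile[start:start + n]
        a, b = linear_fit(xs, ys)         # local trend y_n(k) = a*k + b
        residual_sq += sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
        used += n
    return sqrt(residual_sq / used)

if __name__ == "__main__":
    import random
    rng = random.Random(1)
    noise = [rng.gauss(0.0, 1.0) for _ in range(2048)]
    for n in (8, 16, 32, 64, 128):
        print(n, round(dfa_fluctuation(noise, n), 3))
```

Plotting ln D(n) against ln n and fitting a line gives the scaling exponent; for uncorrelated noise the slope is expected to be near 0.5.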
11.4.2.

X t ( , , ,
t , t = 1, ..., N ), :
$$F(k) = \frac{1}{N - k} \sum_{t=1}^{N-k} (X_{k+t} - m)(X_t - m),$$

m , ,
, 0 ( -


t t m). , X
.

. 69. D(n) ( )
n ( )

, ,
,

.
, . . :

$$X_t = \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n \cos (n t + \varphi_n),$$
:

$$F(k) = \frac{a_0^2}{4} + \frac{1}{2} \sum_{n=1}^{\infty} a_n^2 \cos n k.$$
[59] , ,
, n .


X , N S :

$$X_t = N_t + S_t.$$
(
m = 0 ):

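$$F(k) = \frac{1}{N - k} \sum_{t=1}^{N-k} X_{k+t} X_t = \frac{1}{N - k} \sum_{t=1}^{N-k} (N_{k+t} + S_{k+t})(N_t + S_t) =$$

$$= \frac{1}{N - k} \sum_{t=1}^{N-k} N_{k+t} N_t + \frac{1}{N - k} \sum_{t=1}^{N-k} S_{k+t} S_t + \frac{1}{N - k} \sum_{t=1}^{N-k} N_{k+t} S_t + \frac{1}{N - k} \sum_{t=1}^{N-k} S_{k+t} N_t.$$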

, , . N S
, . , S . X .

, . , ( ), ,
.
, .

:

$$y = a\, e^{0.001 x} + \sin (x / 7 + a),$$


( ), , a
. y . . 70 ( x
, y ).


. 70.

: ( ).
, X
N :

$$R(k) = \frac{F(k)}{\sigma^2},$$

F (k ) ; 2 .
. 71 (
k, R(k).
, -
(. 72).
, , ,
(. 73).


. 71.


. 72. R ( k ) ( )
k ( )


. 73. R ( k ) ( ),
k ( )

11.4.3.

(IDC), (U. Fano) [90].


( ) k :
$$F(k) = \sigma^2(k) / m(k).$$

:
$$F(k) = 1 + C k^{2H - 1},$$

C H . . 73 F (k ) , C 6.8 H 0.65 .


. 74.

11.4.4.
(H. E. Hurst) H R / S , R
, S [102]. . . (18801978) ,
: R / S = ( N / 2) H . [58] ,
D :
$$D = 2 - H.$$

, , . : ,
, ; ,
, ,
, . . . . ,


,
, .
, , , . ,
, . .
, ( ). H > , , ,
. H < , , . H = .
F ( n ) ,
n = 1, ..., N , ,
, :
$$R / S = (N / 2)^H, \quad N \gg 1.$$

S :
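$$S = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \bigl(F(n) - \langle F \rangle_N\bigr)^2}, \qquad \langle F \rangle_N = \frac{1}{N} \sum_{n=1}^{N} F(n),$$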

R :

$$R(N) = \max_{1 \le n \le N} X(n, N) - \min_{1 \le n \le N} X(n, N),$$

$$X(n, N) = \sum_{i=1}^{n} \bigl(F(i) - \langle F \rangle_N\bigr).$$
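The quantities R, S and the relation R/S = (N/2)^H lead to a direct single-window estimate of H. The sketch below is an illustration only: in practice the exponent is usually estimated from the slope of log(R/S) against log N over many window lengths, and the function names are assumptions of this example.

```python
from math import sqrt, log

def rescaled_range(series):
    """R/S for the whole series (requires a non-constant series)."""
    n = len(series)
    mean = sum(series) / n
    s = sqrt(sum((x - mean) ** 2 for x in series) / n)   # standard deviation S
    cumulative, running = [], 0.0
    for x in series:                                     # X(n, N)
        running += x - mean
        cumulative.append(running)
    r = max(cumulative) - min(cumulative)                # range R
    return r / s

def hurst_exponent(series):
    """Single-window estimate from R/S = (N/2)^H; needs more than 2 points."""
    return log(rescaled_range(series)) / log(len(series) / 2)

if __name__ == "__main__":
    import random
    rng = random.Random(7)
    noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
    print(round(hurst_exponent(noise), 3))
```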

,
, , n H 0.65 0.75. , H -


, ( , ). , F (n ) ( ),
D ,
$D = 2 - H \approx 1.35$ to $1.25$.
,
-. , , . . , .

.
, , , .
.

11.5.
, , [58].
[8, 15, 24, 42] .
, (, ) . (Y. Kantor),
. , [ 0, 1] ,
100 % (. 75 ).

)

. 75. ( ):
; ;
;

(
) p -
, (1 p ) - (
), : 1 > p > (1 p ) > 0 . , [ 0, 1] -


0, 0.5) , [ 0.5, 1] (. 74 ).
, , . , , ( ) ,
, ( ) p -, ( )
(1 p ) -. , 2
p 2 - , (1 p ) -
(. 74 ).
(. 75 ) , .
, . 76.

. 76.

. ,
x * ,
x * = 1/ 5 , : 1/ 5 0.00110011...
( L ) ,
. 75, ( R ).


, LLRRLLRR...
x * = 1/ 5 , , , p , 1 p . ,
, , x * = 1/ 5 , 4
p 2 (1 p ) , n - :
nk

x * p k (1 p ) ,
n - k n k .
nk
n 1/ 2 n , p k (1 p )
Cnk = n !/ (n k )! k ! (). , ,
nk
. 75, p k (1 p ) ( k n k ).
,
nk
nk
k
p (1 p ) ( p k (1 p ) ) :
\[ C_n^k \, p^k (1 - p)^{n-k}. \]
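The construction can be checked numerically. The sketch below builds the measure after $n$ subdivision steps and counts the cells of equal measure; it assumes that at every step the left half of an interval receives the fraction $p$ and the right half the fraction $1 - p$ (the opposite convention only mirrors the picture). The function names are hypothetical.

```python
from math import comb

def binomial_cascade(p, n):
    """Measures of the 2**n cells of [0, 1] after n subdivision steps."""
    masses = [1.0]
    for _ in range(n):
        # each cell splits into a left part (weight p) and a right part (weight 1 - p)
        masses = [m * w for m in masses for w in (p, 1.0 - p)]
    return masses

def count_cells(masses, p, k, n):
    """Number of cells whose measure equals p**k * (1 - p)**(n - k)."""
    target = p ** k * (1.0 - p) ** (n - k)
    return sum(1 for m in masses if abs(m - target) < 1e-12)

# Usage: the cell containing x* = 1/5 = 0.0011... (binary) at step n = 4
masses = binomial_cascade(0.7, 4)
cell_index = int(0.2 * 2 ** 4)        # index 3, i.e. the path LLRR
mass_at_x = masses[cell_index]        # equals p**2 * (1 - p)**2
```

Counting the cells with `count_cells(masses, 0.7, k, 4)` for each `k` reproduces the binomial coefficients, and the total measure of all cells remains 1.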
n >> 1 , ,
Cnk :
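\[ C_n^k \approx \left( \frac{k}{n} \right)^{-k} \left( 1 - \frac{k}{n} \right)^{-(n-k)} = 2^{\, n H(k/n)} = \left( \frac{1}{2^n} \right)^{-H(k/n)}, \]

where

\[ H(\xi) = -\xi \log_2 \xi - (1 - \xi) \log_2 (1 - \xi), \qquad \xi = k/n. \]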

H ( ) . 77.
k / n (, n ), D .
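The quality of the approximation $C_n^k \approx 2^{nH(k/n)}$ is easy to check numerically. The sketch below compares $\log_2 C_n^k / n$ with $H(k/n)$; the symbol $\xi = k/n$ and the function names are notational assumptions made here.

```python
import math

def binary_entropy(xi):
    """H(xi) = -xi*log2(xi) - (1 - xi)*log2(1 - xi), with H(0) = H(1) = 0."""
    if xi <= 0.0 or xi >= 1.0:
        return 0.0
    return -xi * math.log2(xi) - (1.0 - xi) * math.log2(1.0 - xi)

def log2_binomial_per_step(n, k):
    """log2(C_n^k) / n, which tends to H(k/n) as n grows with k/n fixed."""
    return math.log2(math.comb(n, k)) / n

# Usage: the difference shrinks as n grows with k/n fixed at 0.3.
gaps = [abs(log2_binomial_per_step(n, 3 * n // 10) - binary_entropy(0.3))
        for n in (10, 100, 1000)]
```

The remaining gap decreases roughly as $\log_2 n / n$, which is why it disappears in the limit $n \to \infty$ used to define the dimension.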


. 77. H ( )

, n 1/ 2 n ,
Cnk , :

\[ D = -\lim_{n \to \infty} \frac{\log_2 C_n^k}{\log_2 (1/2^n)} = H\!\left( \frac{k}{n} \right). \]
, H ( ) (. . 76),
.

k / n , ( ). , n
. , k = 0 k = n
. , , p n ,
, (1 p ) n .
. L L .


L L ,
.

f () ( ), . f ()
L L ,

( , q ).
,
Dq . , Dq :
N

1 ln p
,
D = lim
q 1 ln r
i =1

r 1

pi , (
) r .
,
Dq q :
\[ \tau(q) = (1 - q) D_q. \]

f () ( q) :
\[ \tau(q) = f(\alpha) - q\alpha, \]

q :
\[ \frac{d}{d\alpha} \left( q\alpha - f(\alpha) \right) = 0. \]

, Dq ( ( q) ),
:
\[ f(\alpha(q)) = \tau(q) + q\,\alpha(q), \]

where

\[ \alpha(q) = -\frac{d\tau(q)}{dq}. \]
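For the binomial measure described earlier in this section the spectrum can be written in closed form, which makes it a convenient check of the relations between $\tau(q)$, $\alpha(q)$ and $f(\alpha)$. The sketch assumes the convention $\sum_i p_i^q \sim r^{-\tau(q)}$ with cell size $r = 2^{-n}$, which agrees with $\tau(q) = (1 - q) D_q$ and gives $\tau(q) = \log_2 (p^q + (1 - p)^q)$; the function names are hypothetical.

```python
import math

def tau(q, p):
    """tau(q) = log2(p**q + (1 - p)**q) for the binomial measure."""
    return math.log2(p ** q + (1.0 - p) ** q)

def alpha(q, p):
    """alpha(q) = -d tau / d q, evaluated analytically."""
    a, b = p ** q, (1.0 - p) ** q
    return -(a * math.log2(p) + b * math.log2(1.0 - p)) / (a + b)

def f_alpha(q, p):
    """Legendre transform f(alpha(q)) = tau(q) + q * alpha(q)."""
    return tau(q, p) + q * alpha(q, p)

def generalized_dimension(q, p):
    """D_q = tau(q) / (1 - q) for q != 1."""
    return tau(q, p) / (1.0 - q)

# Usage: sample the spectrum for p = 0.7 over a range of q.
spectrum = [(alpha(q / 2.0, 0.7), f_alpha(q / 2.0, 0.7)) for q in range(-10, 11)]
```

At $q = 0$ the spectrum reaches its maximum $f = D_0 = 1$, the dimension of the support; at $q = 1$ one gets $\tau = 0$ and $f = \alpha$, the entropy of the pair $(p, 1 - p)$.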


f () (
q )
q f .

.
Z i = i

pi

Dq .
0, N
n = N / m () m .
:
\[ S_m^Z(q) = \sum_{k=1}^{n} \left( Z_k^{(m)} \right)^q, \qquad Z_k^{(m)} = \sum_{l=1}^{m} Z_{(k-1)m+l}. \]

, , log SmZ ( q)
log m ,
[128], . , ( q) :
\[ \log S_m^Z(q) \approx \tau^Z(q) \log m + \mathrm{const}. \]
( - , 2007 . 2008 .),
, (
). . 78.
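The scaling estimate based on the sums $S_m^Z(q)$ can be sketched as follows. The code assumes a non-negative series $Z_i$, block sizes $m$ that divide the series into $n = \lfloor N/m \rfloor$ non-overlapping blocks, and an ordinary least-squares fit of $\log S_m^Z(q)$ against $\log m$; the function names are hypothetical and the input below is synthetic, not the publication series discussed in the text.

```python
import numpy as np

def partition_sum(z, m, q):
    """S_m(q): sum over blocks of (block sum)**q for block size m."""
    z = np.asarray(z, dtype=float)
    n = len(z) // m
    blocks = z[:n * m].reshape(n, m).sum(axis=1)   # aggregated values Z_k^(m)
    return np.sum(blocks ** q)

def scaling_exponent(z, q, block_sizes):
    """Slope of log S_m(q) versus log m, an estimate of the scaling exponent."""
    logs = [np.log(partition_sum(z, m, q)) for m in block_sizes]
    slope, _ = np.polyfit(np.log(block_sizes), logs, 1)
    return slope

# Usage: a constant series is the trivial (monofractal) reference case.
flat = np.ones(1024)
slopes = [scaling_exponent(flat, q, [1, 2, 4, 8, 16, 32]) for q in (0.5, 2.0, 3.0)]
```

For the constant series $S_m(q) = N m^{q-1}$, so the slope is exactly $q - 1$, linear in $q$; a nonlinear dependence of the slope on $q$ for real data is what signals multifractal behaviour.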


. 78.
( ) ( ):
,

. 79 (m, q) q m
. :
\[ f(\alpha(q)) = \tau(q) - q\,\tau'(q), \]

(. 79).

f


( ) .
, . 80.
, , , .
(. 81),
. , , , , ,
, , , , .

. 79. (q, m) ( )


. 80. f ( q, m )


. 81.
( ) (*)

, , ,
, , . ,
, ,
,
(
) .

,
,
. 1965 , , ,
. , ,
,
[26].

. ,
, , , . ,
, , (. 82),
y = Ae kt , y


, t , A
, k .
, , . ,
. , , ,
, .
,
, .

. 82. -
( Netcraft 2008 )

, , , , , , , ,
.


WWW HTML, , , , , , , HTML


, .
WWW ,
, , , .
WWW , , ,
, . , , W3C, WWW
- [74].
, , .

WWW . ,
,
[114].
WWW, ,
.
,
(). .



, ,
,
. , , , .
,

(XML). ,
-.
XML ,
XML-, : W3C, DTD, XML Schema, XQuery
( XML-) . . RDF.
. , -
. . 2004 . W3C
OWL (Web Ontology Language).
OWL , . -


,
, , .
.
, :
XML, ;
RDF, ;
OWL,
.
, (. 81),
Universal Resource Identifier (URI), , .
, URI, , . URI- URL-, URI- , .

, . , .


. 81.

, . ,
, , ,
.

,
.

ARPANET (Advanced Research Projects Agency Network)
BFS (Breadth First Search)
DBMS (Database Management System)
DDoS (Distributed Denial of Service)
DFA (Detrended Fluctuation Analysis)
DNS (Domain Name System)
DTD (Document Type Definition)
HAC (Hierarchical Agglomerative Clustering)
HITS (Hyperlink Induced Topic Search)
HTML (HyperText Markup Language)
HTTP (HyperText Transfer Protocol)
IETF (Internet Engineering Task Force)
IRS (Information Retrieval System)
ISM (Intelligent Search Mechanism)
LSA (Latent Semantic Analysis)
OSI (Open Systems Interconnection Reference Model)
OWL (Web Ontology Language)
P2P (Peer-to-peer)
PLSA (Probabilistic Latent Semantic Analysis)
RBFS (Random Breadth First Search)
RDF (Resource Description Framework)
RFC (Request for Comments)
RWA (Random Walkers Algorithm)
SALSA (Stochastic Approach for Link-Structure Analysis)
SNA (Social Network Analysis)
SCC (Strongly Connected Component)
SQL (Structured Query Language)
SVD (Singular Value Decomposition)
SVM (Support Vector Machine)
TCP/IP (Transmission Control Protocol/Internet Protocol)
TREC (Text Retrieval Conference)
URI (Uniform Resource Identifier)
URL (Uniform Resource Locator)
W3C (World Wide Web Consortium)
WWW (World-Wide Web)
WAIS (Wide Area Information Servers)
XML (Extensible Markup Language)

( . Summarization)
, .
, .
(. Weighting) ,
.
, , .
,
, , . -
, .
( . Hypertext) ,
( ).
. . , , . -, ,
HTML.
, (. Hyperlink) . ,
. , ,
, HTML-, , FTP WWW-.
( ) ,


( ,
). ,
.
(. IRS Index) -
, . - , .
, (Internet) ,
,
TCP/IP. .
(. Information space) , , , .
, - (. Information
Retrieval System, IRS) , , . - , . , -,
- -.
(. Keyword):
1. , .
2. , ,

.
( . Content) .

(, -) , , .
- .
, .
- ,
- .
( . cache) , , -


.
, ,
.
- ( . Latent Semantic Analysis,
LSA) -

. - , , ,
, . LSA -, .
( . Lemmatization)
, () . , . .
,
, .

, . , -
, -.
( . Relation )
, :
;
- : , , ;
: , ,
;
, .
, .

, . . , ,
, .


, , a
.
(. Search Engine)
- . , ( ),
.
, (. Recall) -
.
(. Full-text search engine)
- ,
( ) .
( . profile ) ()
, .
( . Ranking)
, , .
( . Relevancy ) , , - ,
. .
. , -
, .
, , :
;

. , ;
.
( . Semantic Web)
W3C,
, , , -


, WWW , ,
. XML.
, ,
.
( . Snippet , ) , -, , .
(SPAM) , ,
,
. .
( . Stemming) ,
. , , : , . .
- (. Stop words) ,
/ . . - ,
, .
(. DBMS)
, , .
, :
;
;
/ ;
.
( . tag):
1. , .
2. . , .


( . Text corpus) ,
, .
, - .
, ()
.
( . Term) .
.

, , ,
, , , . , . . . . 1, 2- . . .
( . Fractus , )
( ) (),
.
, , . :
.
; - .
ARPANET (Advanced Research Projects Agency Network, )
, . 1969
(Defense Department's Advanced Research Projects Agency).
ARPANET , . 1990 .
Data Mining ( ):
1. Data mining ,
(G. Piatetsky-Shapiro, GTE Labs)
2. Data mining (selecting),


(patterns)
(SAS Institute).
Deep Web (, , )
WWW- , . , , ,
. Deep Web
-,
.
DNS (Domain Name System) ( ), IP- TCP/IP.
DNS
IP .
HTML (HyperText Markup Language)
. HTML-
(), , , . HTML ,
. HTML
SGML.
HTTP (HyperText Transfer Protocol)
, WWW.
- .
MARC , 1966 16
. 1972
-2 .
OSI (Open Systems Interconnection Reference Model) . .

.
OWL (Web Ontology Language) - XML/RDF. - OWL
- , .
.


P2P (Peer-to-peer) ,
. ,
(peer) , . -,
.
RDF (Resource Description Framework)
W3C .
, . , .
RFC (Request for Comments) , ,
. RFC 1969 . RFC
.
SQL (Structured Query Language)
,
.
TCP/IP (Transmission Control Protocol/Internet Protocol)
, ( ) . :
TCP (Transmission Control Protocol) , ;
IP (Internet Protocol) , , .
Text Mining . Text Mining ,
, , .

. Text Mining
.
W3C World Wide Web Consortium W3C , 1994 . CERN DARPA .
W3C -


(), INRIA () (). W3C , World Wide Web,


.
WAIS (Wide Area Information Servers)
:
1. WAIS- ,
Z39.50.
2. - ,
WAIS-.
XML (Extensible Markup Language)
, W3C 1998 .

, ,
, ,
,
.

[1] . ., . . //
, 06. 2003. URL: http://www.osp.ru/pcworld/2003/06/165855/
[2] . .
// 1996. URL: http://libconfs.narod.ru/1996/4s/4s_p1.html
[3] . . - // - . . 1. . 6. 2004. . 2027.
[4] . ., . ., . ., . . // . . . : . . 4 / . .: , 2007. . 440-464. URL: http://window.edu.ru/window_catalog/redir?id=45290&file=440-464.pdf
[5] . . : //
- . . 1. . 3. 2003. . 110.
[6] . . . .: ,
1971. 240 .
[7] ., . // , 1991.
3. . 1624.
[8] . ., . . . :
, 2001. 128 .
[9] . . 4. , , . .: , 2005. 216 .
[10] . ., . . :
// - . . 1.
. 11. 2005. . 2133. URL: http://dwl.visti.net/art/nti05/
[11] . . .: , 1976.
[12] . . .: , 1972.
[13] . ., . ., . .
. . . 2. : , 2007. 263 .


[14] . . . . 2: , , // . 11. . 62. 2006. URL: http://www.ccc.ru/magazine/depot/06_11/read.html?0302.htm
[15] . . . 2-. .: , , 2006. 208 .
[16] . . // . 4. 1997. URL: http://www.osp.ru/text/302/179189/
[17] . . : . . . . .: . ., 1989. 320 .
[18] . . .: , 1973. 165 .
[19] . . . : - , 1966.
[20] . . // . . 2. 8. 2002. . 718.
[21] . . // . . 2. . 12. 1985. . 1419.
[22] . . // - . . 2. . 1. 2003. . 1 7.
[23] . ., . . // - . . 2. . 2. 2004. . 1114.
[24] . ., . ., . . // . . . . 389. 2. 2003. . 279282.
[25] . , . . // '2001, URL: http://www.dialog-21.ru/Archive/2001/volume2/2_26.htm
[26] . // . 2003. 11. URL: http://www.silicontaiga.ru/home.asp?artId=2066
[27] . . // , 1965. . 1. . 1. . 25 38
[28] . JXTA P2P // Java World. 10, 2001. URL: http://www.javaworld.com/javaworld/jw-10-2001/jw-1019-jxta.html
[29] . ., . . . .: , 1977. 280 .
[30] ., . // . 28(4). 2002. . 226242.
[31] . . . .: , 2006. 240 . URL: http://dwl.visti.net/art/monogr-osnov/spusk3.pdf

[32] . . Internet. .: -, 2005. URL: http://poiskbook.kiev.ua
[33] . . URL: http://logic.pdmi.ras.ru/~yura/internet.html
[34] . ., . . . M.: , 1990.
[35] . ., . ., . ., . . . WEB-PLAN Group, 2001. URL: http://www.nbuv.gov.ua/texts/libdoc/01nsaopi.htm
[36] . . . .: , 1988. 176 .
[37] . . .: , 2002. 656 .
[38] . , . .: , 2004. 256 .
[39] . ., . ., . . . , , . .; : , 2005. 368 .
[40] . ., . ., . . // . , 2000. . 204210.
[41] . . .: , 1971. 382 .
[42] . ., . ., . . // . , . . 11, 2, 2003. . 3954.
[43] . . . . 2-. .: , 2001. 296 .
[44] . // Internet. 1998. 2. URL: http://www.citforum.ru/pp/search_03.shtml
[45] . . : // EXPonenta Pro. , 2003. 1. URL: http://nature.web.ru/db/msg.html?mid=1193685
[46] . . // Internet. 2002. 10. URL: http://www.dialog-21.ru/direction_fulltext.asp?dir_id=15539
[47] . . . 1. . : ., 2007. 640 . URL: http://book.itep.ru/1/intro1.htm
[48] . ., . ., . . : . .: , - , 2007. 304 .
[49] . ., . ., . ., . ., . . // MegaLing'2006 - . 2027 2006, , , . C. 248249.
[50] . . . .: , 1967.
[51] ., . : . .: , 2003. 876 .
[52] . . : , , . .: , 2002. 112 .
[53] ., . . .: , 1991. 280 .
[54] '2006 (, 19 2006 .) : , 2006, 274 . URL: http://romip.narod.ru/romip2006/index.html
[55] ., . // , 2000. 12. URL: http://www.osp.ru/os/2000/12/067.htm
[56] . . .: , 1992. 184 .
[57] . . // . . . 11. 1, 1956. . 227231.
[58] . . .: , 1991. 254 .
[59] . . / . . . . . .: , 1959.
[60] - . : , , // . 4. 1998. URL: http://www.osp.ru/text/302/179534/
[61] . ., . ., . . . .: , 2003. 480 .
[62] . . .: , 1963.
[63] . . . ., , 1982. 176 .
[64] . ., . . . .: , 1973. 512 .
[65] Albert R., Jeong H., Barabasi A. Attack and error tolerance of complex networks // Nature. 2000. Vol. 406. P. 378-382.
[66] APPN/HPR in IP Networks (APPN Implementers' Workshop Closed Pages Document). IBM. URL: http://www.javvin.com/protocol/rfc2353.pdf
[67] Avram H. D., Knapp J. F., Rather L. J. The MARC II Format: A Communications Format for Bibliographic Data, Library of Congress. Washington, D.C., 1968.
[68] Baeza-Yates R., Ribeiro-Neto B. Modern Information Retrieval. ACM Press Series/Addison Wesley, New York, 1999. 513 p.
[69] Bak P. How nature works: The science of self-organized criticality. Springer-Verlag, New York, Inc., 1996.
[70] Bak P., Tang C., Wiesenfeld K. Self-organized criticality: An explanation of 1/f noise // Phys. Rev. Lett. 1987. Vol. 59. P. 381-384.

[71] Bak P., Tang C., Wiesenfeld K. Self-organized criticality // Phys. Rev. A. 1988. Vol. 38. No. 1. P. 364-374.
[72] Bandini S., Mauri G., Serra R. Cellular automata: From a theoretical parallel computational model to its application to complex systems // Parallel Computing. Vol. 27, Issue 5, April 2001. P. 539-553.
[73] Bell A., Fosler-Lussier E., Girand C., Raymond W. Reduction of English function words in Switchboard // Proceedings of ICSLP-98. Vol. 7. 1998. P. 3111-3114.
[74] Berners-Lee T., Hendler J., Lassila O. The Semantic Web. Scientific American, 2001. URL: http://www.sciam.com/article.cfm?articleID=00048144-10D21C70-84A9809EC588EF21
[75] Berry M. W. Survey of Text Mining. Clustering, Classification, and Retrieval. Springer-Verlag, 2004. 244 p.
[76] Bhargava S. C., Kumar A., Mukherjee A. A stochastic cellular automata model of innovation diffusion // Technological forecasting and social change, 1993. Vol. 44. No. 1. P. 87-97.
[77] Bjorneborn L., Ingwersen P. Toward a basic framework for webometrics. Journal of the American Society for Information Science and Technology, 55(14): 1216-1227. 2004.
[78] Boyle A. Net not as interconnected as you think. URL: http://news.zdnet.com/2100-9595_22-502388.html
[79] Bradford S. C. Sources of Information on Specific Subjects. Engineering: An Illustrated Weekly Journal (London), 137, 1934 (26 January), p. 85-86.
[80] Brin S., Page L. The Anatomy of a Large-Scale Hypertextual Web Search Engine. WWW7, 1998.
[81] Broadbent S. R., Hammersley J. M. Percolation processes // I. Crystals and mazes, Proc. Cambridge Philos. Soc. P. 629-641. 1957.
[82] Broder A. Identifying and Filtering Near-Duplicate Documents, COM00 // Proceedings of the 11th Annual Symposium on Combinatorial Pattern Matching. 2000. P. 1-10.
[83] Broder A., Kumar R., Maghoul F. etc. Graph structure in the Web // Proceedings of the 9th international World Wide Web conference on Computer networks: the international journal of computer and telecommunications networking. Amsterdam, 2000. P. 309-320. URL: http://www.almaden.ibm.com/cs/k53/www9.final/
[84] Burges C. J. C. A Tutorial on Support Vector Machines for Pattern Recognition. URL: http://www.music.mcgill.ca/_rfergu/adamTex/references/Burges98.pdf
[85] Cohen R., Erez K., ben-Avraham D., Havlin S. Resilience of the Internet to Random Breakdown // Phys. Rev. Lett. 85, 4626 (2000).
[86] Donetti L., Hurtado P. I., Munoz M. A. Entangled Networks, Synchronization, and Optimal Network Topology // Physical Review Letters. Vol. 95, No. 18, 2005.
[87] Dorogovtsev S. N., Mendes J. F. F. Evolution of Networks: from biological networks to the Internet and WWW, Oxford University Press, 2003.


[88] Erdős P., Rényi A. On Random Graphs. I. // Publicationes Mathematicae 6. P. 290-297. 1959.
[89] Erdős P., Rényi A. On the evolution of random graphs, Publ. Math. Inst. Hungar. Acad. Sci. 5. P. 17-61. 1960.
[90] Fano U. Ionization yield of radiations. II. The fluctuations of the number of ions. Phys. Rev., 72. P. 26-29. 1947.
[91] Fox G. C. From Computational Science to Internetics: Integration of Science with Computer Science, Mathematics and Computers in Simulation, Elsevier, 54 (2000) 295-306. URL: http://www.npac.syr.edu/users/gcf/internetics2/
[92] Fox G. C. Internetics: Technologies, Applications and Academic Fields // Invited Chapter in Book: Feynman and Computation, edited by A. J. G. Hey, Perseus Books (1999). Technical Report SCCS-813, Syracuse University, NPAC, Syracuse, NY, February 1998. URL: http://www.newnpac.org/users/fox/documents/internetics/
[93] Furnas G. W., Deerwester S., Dumais S. T., etc. Information retrieval using a Singular Value Decomposition Model of Latent Semantic Structure. ACM SIGIR, 1988.
[94] Del Corso G. M., Gulli A., Romani F. Ranking a stream of news. International World Wide Web Conference // Proceedings of the 14th international conference on World Wide Web. Chiba, Japan, 2005. P. 97-106.
[95] Graham P. A Plan for Spam. 2002. URL: http://paulgraham.com/spam.html
[96] Grootjen F. A., Van Leijenhorst D. C., van der Weide T. P. A formal derivation of Heaps' Law // Inf. Sci. Vol. 170(2-4). P. 263-272. 2005. URL: http://citeseer.ist.psu.edu/660402.html
[97] Hei X., Liang Ch., Liu Y., Ross K. W. Insight into PPLive: A Measurement Study of a Large-Scale P2P IPTV System. URL: http://photon.poly.edu/~jliang/pplive.pdf
[98] Heaps H. S. Information Retrieval: Computational and Theoretical Aspects. Academic Press, 1978.
[99] Hinrichsen H. Nonequilibrium Critical Phenomena and Phase Transitions into Absorbing States. Adv. in Phys. 49, 815 (2000). URL: http://arxiv.org/abs/cond-mat/0001070v2
[100] Hirsch J. E. An index to quantify an individual's scientific research output. Proceedings of the National Academy of Sciences of the USA, 102(46), 16569-16572. 2005.
[101] Hofmann T. Probabilistic latent semantic indexing. In Proc. of the SIGIR'99. 1999. P. 50-57.
[102] Hurst H. E. Long-term storage capacity of reservoirs // Trans. Amer. Soc. Civil Engineers 116. P. 770-799. 1951.
[103] Ilyinsky S., Kuzmin M., Melkov A., Segalovich I. An efficient method to detect duplicates of Web documents with the use of inverted index // WWW2002, 2002.
[104] Kalogeraki V., Gunopulos D., Zeinalipour-Yazti D. A Local Search Mechanism for Peer-to-Peer Networks // Proc. of CIKM'02, McLean VA, USA, 2002.

[105] Kleinberg J. M. Authoritative sources in a hyperlink environment // In Proceedings of ACM-SIAM Symposium on Discrete Algorithms, 1998, 46(5):604-632.
[106] Landauer T. K., Foltz P. W., Laham D. An introduction to latent semantic analysis. Discourse Processes. Vol. 25. 1998. P. 259-284.
[107] Lande D. Model of information diffusion // Preprint Arxiv (0806.0283), 2008. 5 p. URL: http://arxiv.org/abs/0806.0283
[108] Lempel R. and Moran S. The stochastic approach for link-structure analysis (SALSA) and the TKC effect // In Proceedings of the 9th International World Wide Web Conference, Amsterdam, The Netherlands, 2000. P. 387-401.
[109] Lu Q., Cao P., Cohen E., Li K., Shenker S. Search and replication in unstructured peer-to-peer networks // Proc. of ICS02, New York, USA, June 2002.
[110] Manber U. Finding similar files in a large file system. Proceedings of the 1994 USENIX Conference, p. 1-10, January 1994.
[111] Manning C. D., Schütze H. Foundations of Statistical Natural Language Processing. Cambridge, Massachusetts: The MIT Press, 1999.
[112] Maymounkov P., Mazières D. Kademlia: A Peer-to-peer Information System Based on the XOR Metric. URL: http://kademlia.scs.cs.nyu.edu
[113] Milgram S. The small world problem, Psychology Today, 1967, Vol. 2. P. 60-67.
[114] Miller E., Swick R., Brickley D., McBride B., Hendler J., Schreiber G., Connolly D. Semantic Web. W3C (MIT, ERCIM, Keio) 2001. URL: http://www.w3.org/2001/sw/
[115] Mockapetris P. Domain Names: Concepts and Facilities // Request for Comments: 1034, 1987. 55 p.
[116] Newman M. E. J. The structure and function of complex networks // SIAM Review. 2003. Vol. 45. P. 167-256.
[117] Newman M. E. J., Watts D. J. Scaling and percolation in the small-world network model, Phys. Rev. E, 7332, 1999.
[118] Onnela J.-P., Saramaki J., Hyvonen J., Szabo G., Lazer D., Kaski K., Kertesz J., Barabasi A.-L. Structure and tie strengths in mobile communication networks. Proceedings of the National Academy of Sciences. May 1, 2007. Vol. 104. No. 18. P. 7332-7336.
[119] Page S. E. Computational models from a to z // Complexity. Vol. 5, Issue 1, 1999. P. 35-41.
[120] Papka R. On-line News Event Detection, Clustering, and Tracking. Ph. D. Thesis, University of Massachusetts at Amherst, September 1999.
[121] Peng C.-K., Havlin S., Stanley H. E., Goldberger A. L. Quantification of scaling exponents and crossover phenomena in nonstationary heartbeat time series // Chaos. Vol. 5. 1995. P. 82.
[122] Piatetsky-Shapiro G., Fayyad U., Smith P. Advances in Knowledge Discovery and Data Mining. Cambridge, Mass: AAAI/MIT Press. p. 135. 1996.
[123] Platt J. Sequential Minimal Optimization. URL: http://research.microsoft.com/users/jplatt/smo.html

[124] Powell A. L., French J. C., Callan J., Connell M., Viles C. L. The Impact of Database Selection on Distributed Searching // Proc. of ACM SIGIR'00, pages 232-239, Athens, Greece, 2000.
[125] Program to evaluate TREC results using SMART evaluation procedures. URL: http://www-nlpir.nist.gov/projects/trecvid/trecvid.tools/trec_eval/README
[126] Redner S. Directed and diode percolation. Phys. Rev. B, 25, 3242, 1982.
[127] RFC1625. WAIS over Z39.50-1988. Network Working Group. Request for Comments: 1625. M. St. Pierre, J. Fullton, K. Gamiel, J. Goldman, B. Kahle, J. Kunze, H. Morris, F. Schiettecatte, 1994. URL: http://www.faqs.org/rfcs/rfc1625.html
[128] Riedi R. H., Vehel J. L. Multifractal Properties of TCP traffic: a numerical study // Technical Report 3128 INRIA Rocquencourt. March 1997.
[129] Rocchio J. Relevance feedback in information retrieval // In G. Salton ed., The SMART Retrieval System: Experiments in Automatic Document Processing, Englewood Cliffs, New Jersey, Prentice-Hall, p. 313-323, 1971.
[130] Salton G., Fox E., Wu H. Extended Boolean information retrieval. Communications of the ACM. 2001. Vol. 26. No. 4. P. 35-43.
[131] Salton G., Wong A., Yang C. A Vector Space Model for Automatic Indexing // Communications of the ACM, 18(11):613-620, 1975.
[132] Sarshar N., Boykin P. O., Roychowdhury V. P. Scalable Percolation Search in Power Law Networks. Preprint. 2004. URL: http://arxiv.org/abs/cond-mat/0406152
[133] Scime A. Web mining: application and techniques. Idea Group Publishing, 2005. 427 p.
[134] Sebastiani F. Machine Learning in Automated Text Categorization. URL: http://nmis.isti.cnr.it/sebastiani/Publications/ACMCS02.pdf
[135] Simon H. A. Biometrika 42, 425 (1955).
[136] Snarskii A. FreeBSD Stack Integrity Patch. 1997. URL: ftp://ftp.lucky.net/pub/unix/local/libc-letter
[137] Soumen C. Mining the web. Discovering knowledge from hypertext data. Publisher: Morgan Kaufmann, 2002. 344 p.
[138] Stanley H. E., Amaral L. A. N., Goldberger A. L., Havlin S., Ivanov P. Ch., Peng C.-K. Statistical physics and physiology: monofractal and multifractal approaches // Physica A. 1999. Vol. 270. P. 309.
[139] Stauffer D., Aharony A. Introduction to percolation theory. Taylor & Francis, London, Washington DC, 1992. 182 p.
[140] Stanley H. E., Amaral L. A. N., Goldberger A. L., Havlin S., Ivanov P. Ch., Peng C.-K. Statistical physics and physiology: monofractal and multifractal approaches // Physica A. 1999. Vol. 270. P. 309.
[141] The Deep Web: Surfacing Hidden Value, 2000 BrightPlanet.com LLC, 35 p. URL: http://www.dad.be/library/pdf/BrightPlanet.pdf
[142] The Twelfth Text Retrieval Conference (TREC 2003). Appendix 1. Common Evaluation Measures. URL: http://trec.nist.gov/pubs/trec12/

[143] Ukkonen E. On-line construction of suffix trees. URL: http://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf
[144] Understanding the Impact of P2P: Architecture and Protocols. URL: http://www.cachelogic.com/home/pages/understanding/architecture.php
[145] Van Raan A. F. J. Fractal geometry of Information Space as Represented by Cocitation Clustering // Scientometrics. 1991. Vol. 20, No. 3. P. 439-449.
[146] Vapnik V. N. Statistical Learning Theory. NY: John Wiley, 1998. 760 p.
[147] Watts D. J., Strogatz S. H. Collective dynamics of small-world networks // Nature. 1998. Vol. 393. P. 440-442.
[148] Wikipedia, Support Vector machine. URL: http://en.wikipedia.org/wiki/Support_vector_machine
[149] Wolfram S. A New Kind of Science. Champaign, IL: Wolfram Media Inc., 2002. 1197 p.
[150] Wolfram S. ed. Theory and Applications of Cellular Automata. Singapore: World Scientific. 1986.
[151] Yang B., Garcia-Molina H. Comparing hybrid peer-to-peer systems // Proc. of VLDB'01, Rome, Italy, 2001.
[152] Yang B., Garcia-Molina H. Efficient Search in Peer-to-Peer Networks // Proc. of ICDCS'02, Vienna, Austria, 2002.
[153] Yeager N., McGrath R. Web Server Technology. Morgan Kaufmann, San Francisco, California, 1996.
[154] Zamir O. E. Clustering Web Documents: A Phrase-Based Method for Grouping Search Engine Results. PhD Thesis, University of Washington, 1999.
[155] Zeinalipour-Yazti D. Information Retrieval in Peer-to-Peer Systems // M. Sc Thesis, Dept. of Computer Science, University of California Riverside, June 2003.
[156] Zeinalipour-Yazti D., Kalogeraki V., Gunopulos D. Information Retrieval in Peer-to-Peer Networks // IEEE CiSE Magazine, Special Issue on Web Engineering, 2004. P. 1-13. URL: www.cs.ucr.edu/~csyiazti/papers/cise2003/cise2003.pdf
[157] Zhou S., Mondragon R. J. Topological Discrepancies Among Internet Measurements Using Different Sampling Methodologies, Lecture Notes in Computer Science (LNCS), Springer-Verlag, 3391, p. 207-217, Feb. 2005.
