.
.
, . -,
, , . [116] , ,
, ,
. -, , ,
.
.
, - [35]. , ,
( ). , . (G. Fox) (), ,
, , [91, 92]. , , .
, ,
, .
, ,
, -,
, . -,
. ,
.
, ,
. ,
, , . , , .
,
. , ,
,
, , .
.
.
, , ,
, ,
, .
: , , , ; , ,
. ,
.
,
, ,
.
, ,
9 2008 .
.
, , . , , ,
, .
, , WWW, .
.
. , ,
, .
-.
, , . , .
, . , , ,
.
Text Mining, -, . , , , ,
.
,
. , ,
.
. ,
. .
. , , , .
-.
, .
, , .
, , , . , ,
.
(complex networks),
,
,
. , .
, ,
(),
.
, (
), (, ) , , ,
.
, , -
. ,
.
.
, ,
.
.
,
.
, ,
, .
, .
, ,
.
1.
1.1.
( , ),
, ,
, .
TCP/IP (Transmission Control Protocol/Internet Protocol), .
IP ,
, ,
. ,
IP- .
IP ,
, .
IP- .
IP ARPANET. IETF (Internet
Engineering Task Force),
.
, ARPANET , 40 . 1969
,
.
(ARPA) .
, , -.
ARPANET, , . ARPANET
:
;
;
;
.
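A trace of ARPANET survives in the DNS domain .arpa, which is used, in particular, for reverse address resolution: for example, the IP address 1.2.3.4 corresponds to the name 4.3.2.1.in-addr.arpa.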
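The reverse-lookup naming rule can be illustrated with a short Python sketch. It relies on the standard `ipaddress` module; the helper name `reverse_dns_name` is introduced here only for illustration.

```python
import ipaddress


def reverse_dns_name(ip: str) -> str:
    """Return the domain name used for a reverse (PTR) lookup of an IP address."""
    # reverse_pointer reverses the octets and appends the .arpa suffix.
    return ipaddress.ip_address(ip).reverse_pointer


print(reverse_dns_name("1.2.3.4"))
```

For the address from the text, the call yields the name in the in-addr.arpa zone with the octets in reverse order.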
ARPANET , . 1973 ,
. 1984 ARPANET
(NSF),
NSFNet,
(56 /), ARPANET. 1990 ARPANET ,
NSFNet.
,
.
,
[47].
TCP/IP.
(. Request for Comments, RFC),
,
. Request for Comments
. RFC IETF
(. Internet Society, ISOC). RFC
, (., , http://www.ietf.org/rfc.html). , .
- ( W3C, IETF, , 2).
1990–1994 . ,
. . 1994 (. 1), -.
,
:
, ;
;
;
, ;
;
- .
. 1.
[: www.isc.org]
, , .
, (), /
.
,
, , .
, ,
. , .
:
1945 (Vannevar Bush) Memex (memory extension),
, . (Ted Nelson) 1965
Xanadu ;
1980 - (T. Berners-Lee), CERN ( ) , ,
;
1990 ,
CERN - (GUI, Graphical User Interface) .
WorldWideWeb. 1992 GUI Erwise Viola;
1993 . (M. Andreessen) NCSA (
,
www.ncsa.uiuc.edu)
Mosaic
Xwindow System UNIX. CERN
HTML HTTP- , CERN HTTPD.
HTML (Hypertext Markup Language) , -.
HTML
, ,
. HTML
SGML ( ), HTML
SGML, . . ISO
8879.
HTML , .
HTML-
,
.
-, , ,
. Internet Explorer, Firefox, Opera Safari.
HTML 4.01,
1999 . 2000 . ISO/IEC
15445:2000 ( ISO HTML, HTML
4.01 Strict).
W3C
HTML. 20 2007.
HTML XHTML ( . eXtensible HTML), , , SGML, XML
2000 W3C.
HTTP (HyperText Transfer Protocol), HTML-.
HTTP ,
, , .
HTTP -.
HTTP- , , . HTTP-
: , (GET,
POST, HEAD, PUT),
(Uniform Resource Locator, URL), , ,
, . URL ,
:
, ,
(, ftp);
, ,
;
,
.
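In HTTP, a URL has the following form:
http://[user[:passwd]@]host[:port][/path]
where host is the domain name or IP address of the server; :port is the TCP port number on which the server accepts connections; path identifies the resource on the server; user is the user name; and passwd is the user's password.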
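As a small illustration of the URL structure, the sketch below splits a URL into the components named above using Python's standard `urllib.parse` module. The URL itself and the dictionary name `components` are made up for the example.

```python
from urllib.parse import urlsplit

# A hypothetical URL that uses every optional part of the HTTP URL form.
parts = urlsplit("http://user:passwd@example.com:8080/docs/index.html")

components = {
    "user": parts.username,    # user name before the ':'
    "passwd": parts.password,  # password between ':' and '@'
    "host": parts.hostname,    # domain name or IP address
    "port": parts.port,        # TCP port as an integer
    "path": parts.path,        # resource path on the server
}
print(components)
```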
HTTP:
()
, , .
, , , , ,
,
, .
HTTP
. ,
. HTTP , , .
HTTP ( HTTP 1.1, HTTP
) TCP- . , (permanent connection), TCP HTTP-, . -
HTTP 1.1, ,
,
, . TCP- ,
.
, , HTTP, :
IP- ( DNS-);
TCP- ;
URL;
(HTML- );
TCP/IP-.
HTML- , HTML
, . ,
, HTML, ,
CSS (Cascading Style Sheets), -
,
, . , WWW
, .
, .
, , . HTML-
, . .
- .
HTML, XML, .
1.3.
WWW -. , , ( ), ,
, . , , , .
(Peer-to-peer, P2P ) , . , (peer) , . peer-to-peer
1984 . (P. Yohnuhuitsman) Advanced Peer to Peer Networking IBM.
P2P [14], , .
P2P , TCP/IP TCP UDP. , ,
(- ). P2P
, , .
, P2P , . , , [51]. , ,
, ,
[14].
/, , (), () .
, , . .
. ,
/
, .
, , , .
, P2P (
),
. , ,
, P2P-
, ,
BitTorrent (tracker), .
/ (, ), P2P , , ,
,
, , .
, , Gnutella,
BFS, , , , . DHT (Distributed Hash Tables),
, Kademlia,
P2P-. ,
, , , . 2003
Gnutella2.
.
, .
, . .
. , Microsoft
P2P- Scribe Pastry. PNRP (Peer
Name Resolution Protocol), P2P-,
Windows Vista.
P2P
Sun Microsystems JXTA [28]. P2P-
.
, , :
. P2P FTP-,
.
. ,
P2P, SETI@HOME, , . ,
.
. , ICQ P2P-.
, ,
login.icq.com.
. - Skype (www.skype.com), 2003 .
. . ,
KaZaA. P2P Skype 10 .
. ,
Groove Network ( )
OpenCola ( ).
P2P-, 2008 150 .
P2P-. Bittorrent, Gnutella2
eDonkey2000.
BitTorrent 2001 .
BitTorrent , , , , ,
- . (www.bittorrent.com) , , ,
. , (torrent
file) , ,
.
2000 . Gnutella
(www.gnutella.com).
Gnutella2 (www.gnutella2.com), 2003 ,
P2P-.
EDonkey2000 2000 .
ed2k-, ID .
EDonkey2000 . 200 .
EDonkey2000 10 .
EDonkey2000 . , .
, .
- , . , ,
, .
, P2P-,
, , , .
, , ,
,
(, 2 ).
. ,
(DNS) , 2 [115].
2
GRID.
distributed.net, , .
,
,
- .
, , , .
, P2P. (, Napster
2001 ). , .
Gnutella, , 70 % . -
, . . - .
P2P-
. . ID . , , .
P2P- , .
1.4. -
, -, ( ):
;
.
, , .
,
, , .
, , , ., .
2.
, , :
, .
, . - , .
(Retrieval Systems, ) . , , , , ,
, , , .
-
:
;
;
, ;
.
1966 16-
MARC (http://www.loc.gov/marc/), , .
.
1972 MARC-2
[67, 32], .
1970-
. : 1965 Dialog (http://www.dialog.com/), Thomson, .
1990-
Z39.50 . 1994
-
.
(. .
), .
, , .
, , , ,
,
, , ,
. ,
, , .
.
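Let $t_i$ be a term from the dictionary $T$ ($i = 1, \dots, M$), $d^{(j)}$ a document from the collection $D$, and $w_i^{(j)} \ge 0$ the weight associated with the pair $(t_i, d^{(j)})$. For a term $t_i$ that does not occur in the document $d^{(j)}$, the weight is zero: $w_i^{(j)} = 0$. The document $d^{(j)}$ is then represented by the vector $d^{(j)} = (w_1^{(j)}, w_2^{(j)}, \dots, w_M^{(j)})$. The function $g_i$, associated with the term $t_i$, returns the corresponding weight: $g_i(d^{(j)}) = w_i^{(j)}$.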
2.1.
2.1.1.
. , .
. : 0 ( -
) 1 ( ).
: $w_i^{(j)} \in \{0, 1\}$.
,
(AND, ), (OR, ) (NOT, ). , , ( , dnf).
, , , , .
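Consider the example from [68]: $q = a \wedge (b \vee \neg c)$.
Written as a disjunction over the terms a, b, and c (Fig. 2), the query becomes $(a \wedge b \wedge c) \vee (a \wedge b \wedge \neg c) \vee (a \wedge \neg b \wedge \neg c)$, that is, its disjunctive normal form. In vector notation: $q_{dnf} = (1,1,1) \vee (1,1,0) \vee (1,0,0)$, where each component is a vector of binary weights associated with the tuple (a, b, c).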
. 2. q = a (b c)
. , :
q qdnf = qcc( i ) ,
i =1,.., N
qcc(i ) i - qdnf .
d ( j ) q sim(d ( j ) , q ) ( . similarity,
) :
2. ,
, , .
3. . .
4. ,
.
, . , , ,
. -
.
2.1.2.
,
.
.
, . (G. Salton),
. . (E. Fox) . (H. Wu) 1983 [130].
,
[0, 1]. D ,
d D x y . x, d y, d x y . , , :
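$x = f_x \dfrac{idf_x}{\max idf}$, $y = f_y \dfrac{idf_y}{\max idf}$,
where $f_x$ is the normalized frequency of the term $x$ in the document $d$, and $idf_x$ is the inverse document frequency of the term $x$ (and similarly for $y$).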
(1927–1995)
( ) d = ( x , y ) , [0, 1] [0, 1].
.
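For the queries $q_{or} = x \vee y$ and $q_{and} = x \wedge y$:
$sim(q_{or}, d) = \sqrt{\dfrac{x^2 + y^2}{2}}$;
$sim(q_{and}, d) = 1 - \sqrt{\dfrac{(1 - x)^2 + (1 - y)^2}{2}}$.
If the terms carry weights $a$ and $b$, then:
$sim(q_{or}, d) = \sqrt{\dfrac{a^2 x^2 + b^2 y^2}{a^2 + b^2}}$;
$sim(q_{and}, d) = 1 - \sqrt{\dfrac{a^2 (1 - x)^2 + b^2 (1 - y)^2}{a^2 + b^2}}$.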
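The exponent 2 can be replaced by an arbitrary $p$, $1 \le p < \infty$. For $m$ terms, the generalized disjunction ($\vee^p$) and conjunction ($\wedge^p$) with parameter $p$ give the queries $q_{or} = x_1 \vee^p x_2 \vee^p \dots \vee^p x_m$ and $q_{and} = x_1 \wedge^p x_2 \wedge^p \dots \wedge^p x_m$, with
$sim(q_{or}, d) = \left(\dfrac{x_1^p + x_2^p + \dots + x_m^p}{m}\right)^{1/p}$;
$sim(q_{and}, d) = 1 - \left(\dfrac{(1 - x_1)^p + (1 - x_2)^p + \dots + (1 - x_m)^p}{m}\right)^{1/p}$.
For $p = 1$:
$sim(q_{or}, d) = sim(q_{and}, d) = \dfrac{x_1 + x_2 + \dots + x_m}{m}$.
As $p \to \infty$:
$sim(q_{or}, d) \to \max(x_i)$;
$sim(q_{and}, d) \to \min(x_i)$.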
,
(
).
, , ,
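$q = (x_1 \vee^p x_2) \wedge^p x_3$, the similarity is:
$sim(q, d) = 1 - \left[\dfrac{1}{2}\left(1 - \left(\dfrac{x_1^p + x_2^p}{2}\right)^{1/p}\right)^p + \dfrac{1}{2}(1 - x_3)^p\right]^{1/p}$.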
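The behaviour of the p-norm similarities for OR and AND queries can be checked numerically with a minimal Python sketch. The function names are chosen for this example, and the term weights are assumed to lie in [0, 1].

```python
def sim_or(weights, p):
    """p-norm similarity of a document to the OR query over its term weights."""
    m = len(weights)
    return (sum(x ** p for x in weights) / m) ** (1.0 / p)


def sim_and(weights, p):
    """p-norm similarity of a document to the AND query over its term weights."""
    m = len(weights)
    return 1.0 - (sum((1.0 - x) ** p for x in weights) / m) ** (1.0 / p)


weights = [0.2, 0.8]
for p in (1, 2, 200):
    print(p, sim_or(weights, p), sim_and(weights, p))
```

With p = 1 both functions return the arithmetic mean of the weights; as p grows, the OR score moves toward the largest weight and the AND score toward the smallest one.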
2.1.3.
, (fuzzy set) A U ( )
( A (U ),U ) , A (U ) u U A .
[0, 1]. ,
u U . , .
A B U , A A U u U .
, , :
$\mu_{\bar{A}}(u) = 1 - \mu_A(u)$;
$\mu_{A \cup B}(u) = \max(\mu_A(u), \mu_B(u))$;
$\mu_{A \cap B}(u) = \min(\mu_A(u), \mu_B(u))$.
. , [18]. A u U , A (u ) .
A sup A (u ) .
uU
A U ,
A 1/2.
, [18], U , [0, 100], u, , . U , ,
(. 3):
$\mu_A(u) = \begin{cases} 0, & 0 \le u \le 50, \\ \left(1 + \left(\dfrac{u - 50}{5}\right)^{-2}\right)^{-1}, & 50 < u \le 100. \end{cases}$
[50, 100], 1,
u = 55 (. 3). , ,
55 ,
0.5, 70, 0.9.
50 ( A (u ) = 0 ) . ,
, .
, . c , cil
i l :
$c_{il} = \dfrac{n_{il}}{n_i + n_l - n_{il}}$,
ni , ti , nl
, tl , nil , .
. 3. ,
, d j
ij , :
$\mu_{ij} = 1 - \prod_{t_l \in d^{(j)}} (1 - c_{il})$.
d ( j ) i- ti , ti . tl d ( j ) ti (. . cil ~ 1 ), ij ~ 1 ti
d ( j ) . , cil << 1 , ij cil (
). ,
max , . , ,
cil .
, . , -
,
, :
qj = 1 (1 cc j ) .
i=1
2.2. -
-
- (Vector Space
Model), . 1975 .
SMART [131]. .
, , ,
,
,
. , , . .
ti d ( j )
wi( j ) .
q ,
, , wiq .
,
n- , n . , d ( j )
q , -
freqi( j ) , :
$w_i^{(j)} = freq_i^{(j)} / \max_{1 \le k \le n} freq_k^{(j)}$.
N
,
ni
ni , ti , N
. ,
, ,
, . , ni = N , N
, wi( j ) = tfi ( j ) log = 0.
N
,
. 1988 ti :
$w_i^q = \left(0.5 + \dfrac{freq_i^q}{\max_{1 \le l \le n} freq_l^q}\right) \log \dfrac{N}{n_i}$,
freqiq ti .
wi( j ) . TF IDF , TF , IDF , ,
( . inverse document frequency).
$sim(d^{(j)}, q) = \dfrac{d^{(j)} \cdot q}{|d^{(j)}| \, |q|} = \dfrac{\sum_{i=1}^{n} w_i^{(j)} w_i^q}{\sqrt{\sum_{i=1}^{n} (w_i^{(j)})^2} \, \sqrt{\sum_{i=1}^{n} (w_i^q)^2}}$
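As a minimal sketch, the cosine measure of the vector space model can be computed as follows. The function name and the plain-list representation of the weight vectors are assumptions made for this example; returning 0 for an all-zero vector is a convention chosen here, not something stated in the text.

```python
import math


def cosine_similarity(doc_weights, query_weights):
    """Cosine of the angle between a document vector and a query vector."""
    dot = sum(w_d * w_q for w_d, w_q in zip(doc_weights, query_weights))
    norm_d = math.sqrt(sum(w * w for w in doc_weights))
    norm_q = math.sqrt(sum(w * w for w in query_weights))
    if norm_d == 0 or norm_q == 0:
        return 0.0  # convention for empty vectors (assumption)
    return dot / (norm_d * norm_q)


print(cosine_similarity([1.0, 1.0, 0.0], [1.0, 0.0, 0.0]))
```

Because the measure depends only on the angle between the vectors, multiplying all weights of a document by a constant does not change its score.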
- , , , :
;
(
);
.
-
, .
2.3.
1977 . . (S. E. Robertson) . -
(K. Sparck Jones) , 1960 . ,
,
-
. , .
, :
, d q ?
, . ,
, /, ,
.
,
. , .
,
( ).
,
-
. ,
, .
( ,
, , , ).
, ,
,
.
, , . X , Y , X , Y G, G .
X Y :
$P(X \mid Y) = \dfrac{P(X \cap Y)}{P(Y)}$.
, :
$P(X \mid Y) = \dfrac{P(X \cap Y)}{P(Y)}$; $P(Y \mid X) = \dfrac{P(Y \cap X)}{P(X)}$;
$P(Y \cap X) = P(X \cap Y) \Rightarrow P(X \mid Y) = \dfrac{P(Y \mid X)\, P(X)}{P(Y)}$.
, ,
( R ) P( R | q, d ), q , d
, , ( R )
P ( R | q, d ).
O( R) :
$O(R) = \dfrac{P(R)}{P(\bar{R})} = \dfrac{P(R)}{1 - P(R)}$.
, , 1 P ( R ) < 0.5
1 P ( R ) > 0.5.
, :
$O(R \mid q, d) = \dfrac{P(R \mid q, d)}{P(\bar{R} \mid q, d)}$.
:
$P(R \mid d, q) = \dfrac{P(R \cap q \cap d)}{P(q \cap d)}$.
P( R q d )
, , d q , P(q d ) ,
q d .
:
$P(R \mid d, q) = \dfrac{P(d \mid R \cap q)\, P(R \cap q)}{P(d \mid q)\, P(q)} = \dfrac{P(d \mid R \cap q)\, P(R \mid q)}{P(d \mid q)}$.
P( R | d , q ) P ( R | d , q ) , :
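$O(R \mid d, q) = \dfrac{p(d \mid R \cap q)\, p(R \mid q) / p(d \mid q)}{p(d \mid \bar{R} \cap q)\, p(\bar{R} \mid q) / p(d \mid q)} = \dfrac{p(R \mid q)\, p(d \mid R \cap q)}{p(\bar{R} \mid q)\, p(d \mid \bar{R} \cap q)}$.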
T = {t1 , ..., Tn } , D.
d = ( w1 ,..., wn ), :
1, ti d ;
wi =
0, ti d .
,
, :
n
p (d | R, q) = p( d | R, q) = p(ti | R, q).
i =1
:
n
p( R | q) p(d | R q)
p (t | R q)
O ( R | q, d ) =
.
= O( R | q) i
p( R | q) p(d | R q)
i =1 p (ti | R q )
O( R | q ) . , ,
, ( ti T \ q ),
,
. .:
ti T \ q : p (ti | R, q ) = p (ti | R , q ).
O ( R | q, d ) = O ( R | q )
ti q d
p (ti | R q )
p (ti | R q )
p (t | R q )
.
i
p (ti | R q ) tiq \ d p (ti | R q ) tiq p (ti | R q)
q d
, q \ d ,
, , q ,
.
. ,
, ,
:
$r_i = p(w_i = 1 \mid R, q)$;
$n_i = p(w_i = 1 \mid \bar{R}, q)$.
:
$O(R \mid q, d) = O(R \mid q) \prod_{t_i \in q \cap d} \dfrac{r_i}{n_i} \prod_{t_i \in q \setminus d} \dfrac{1 - r_i}{1 - n_i}$.
, :
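$\prod_{t_i \in q \cap d} \dfrac{(1 - r_i)(1 - n_i)}{(1 - n_i)(1 - r_i)} = 1$,
$O(R \mid q, d) = O(R \mid q) \prod_{t_i \in q \cap d} \dfrac{r_i (1 - n_i)}{n_i (1 - r_i)} \prod_{t_i \in q} \dfrac{1 - r_i}{1 - n_i}$.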
, , . ( , ). :
$\sum_{t_i \in q \cap d} \log \dfrac{r_i (1 - n_i)}{n_i (1 - r_i)} = \sum_{t_i \in q \cap d} \left( \log \dfrac{r_i}{n_i} + \log \dfrac{1 - n_i}{1 - r_i} \right)$.
,
:
$r_i = \dfrac{rel_i}{rel}$; $n_i = \dfrac{nrel_i}{nrel}$,
reli ,
i; nreli , .
, :
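$SV = \sum_{t_i \in q \cap d} \log \dfrac{r_i (1 - n_i)}{n_i (1 - r_i)} = \sum_{t_i \in q \cap d} \log \dfrac{\dfrac{rel_i}{rel} \left(1 - \dfrac{nrel_i}{nrel}\right)}{\dfrac{nrel_i}{nrel} \left(1 - \dfrac{rel_i}{rel}\right)}$.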
, :
SV =
ti q d
SVi =
log
ti q d
,
: d (1) ,..., d (6) (. 1) d (7) ,..., d (9) (. 2), . ,
t1 , t2 , t3 , t4 (, , ). R .
1
t1
t2
t3
t4
d (1)
d (2)
d (3)
d (4)
d (5)
d (6)
reli
rel = 3
nreli
nrel = 3
exp( SVi )
1/4
reli nreli ,
exp( SVi ) , . 1.
,
(7)
d , d (8) , d (9) , t1 , t2 , t3 , t4 . 2. . , . 2, , , d (8) ,
.
2
t1
t2
t3
t4
(7)
(SV)
2 = log 4 + log1
d (8)
4 = log 4 + log 4
d (9)
2.4.
( P2P)
, . ,
[104].
2.4.1.
, , , (ID): (, ), (Key), . . MN , M , N
. ID , . . 4.
, ID 0 [155, 156].
ID 0 14. ,
14. ID 14 ID 0 , , 14.
. 4.
, .
2.4.2.
(Breadth First Search, BFS) [104]
P2P, , , Gnutella (www.gnutella.com). BFS (. 5) P2P N . q ,
( ). p , . r (Query) , - (QueryHit), . QueryHit
, (. 5).
q QueryHits ,
. QueryHit , .
. 5. BFS
BFS ,
(
).
. ,
(Time-to-live, TTL). TTL , (
, , , ,
, ). TTL
57 , . TTL 0, . BFS .
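The BFS flooding scheme with a TTL limit can be sketched in Python as follows. This is a simplified single-process model, not Gnutella's actual protocol: the network is an adjacency dictionary, `has_item` is a hypothetical predicate saying whether a node can answer the query, and a shared `visited` set stands in for the message identifiers that real nodes use to drop duplicate queries.

```python
from collections import deque


def bfs_search(graph, start, has_item, ttl):
    """Flood a query from `start`; return nodes that would send a QueryHit."""
    hits = []
    visited = {start}
    queue = deque([(start, ttl)])
    while queue:
        node, remaining = queue.popleft()
        if node != start and has_item(node):
            hits.append(node)
        if remaining == 0:
            continue  # TTL exhausted: the query is not forwarded further
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits


# Hypothetical overlay: q - a - c - d, plus q - b.
graph = {"q": ["a", "b"], "a": ["q", "c"], "b": ["q"], "c": ["a", "d"], "d": ["c"]}
print(bfs_search(graph, "q", lambda n: n in {"c", "d"}, ttl=2))
```

Raising the TTL widens the search horizon at the cost of more messages, which is the trade-off the method description points to.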
2.4.3.
(Random Breadth First
Search, RBFS) BFS
[104]. RBFS (. 6) q
, .
RBFS.
RBFS ,
;
, . ,
. .
. 6. RBFS
2.4.4.
(Intelligent Search Mechanism,
ISM) P2P (. 7).
,
, , , .
, , .
. 7. ISM
(profile) , . .
.
,
.
, , .
ISM , T N . , ,
(Least Recently Used, LRU) .
$RR(P_i, q) = \sum_{j \in Q} Sim(q_j, q)^{\alpha} \cdot S(P_i, q_j)$,
, . Q , Pi ; S ( Pi , q j ) , Pi q j ; Sim , -
:
$Sim(q_j, q) = \dfrac{q_j \cdot q}{|q_j| \, |q|}$
RR , . ,
, , . , ,
Qsim( q j , q) . , P1 q1 q2 q : Qsim( q1 , q) = 0.5 Qsim( q2 , q) = 0.1 , P2 q3 q4 Qsim( q3 , q) = 0.4
Qsim( q4 , q) = 0.3 . = 10, Qsim( q1 , q)10 ,
, ,
BGP4 (RFC 1771),
, .
2.4.5.
[151, 152] >RES (. 8), ,
.
>RES , Z
( Z ). >RES
q k ,
m . k 1 10
>RES BFS , (Depth-first-search).
. 8. >RES
. ,
, k ,
. , k - T
, kT . RBFS, RBFS . , RBFS
,
. RBFS, RWA
.
, RWA, (Adaptive Probabilistic Search, APS) [156]. APS
, ,
. RWA
, APS
( )
. APS
, RWA.
2.5. -
-
- , , , .
,
. .
SQL,
, , :
, ( );
;
, , , , (
. snippet , ), . .
-
, , , .
, -
, , ,
(, , , , ).
, (Google,
Alltheweb, AltaVista, . .), AND, OR NOT,
.
Lycos, : ADJ, NEAR, FAR BEFORE.
Google
(http://www.googleguide.com/),
( , , ), (OR) ().
, URL, , . .
Google, , (site:), (admission site:), , DVD
player $150..250, , , . .
HTML, PDF, RTF, DOC (MsWord), PS.
, . , (Custom Search Folders), ,
() .
, Vivisimo (http://www.vivisimo.com),
Mooter (http://www.mooter.com) Nigma (http://www.nigma.ru) .
iBoogie (http://www.iboogie.com/)
, Windows.
, , ,
- Zoom
[3] InfoStream [31], .
2.6.
, (recall) (precision).
,
. -
, .
,
.
(NIST)
Text REtrieval Conference (TREC,
http://trec.nist.gov/) [125]
(, http://romip.ru/).
:
                  Relevant    Not relevant
Retrieved            a             b
Not retrieved        c             d
:
Recall:
$r = \dfrac{a}{a + c}$.
Precision:
$p = \dfrac{a}{a + b}$.
Accuracy:
$acc = \dfrac{a + d}{a + b + c + d}$.
Error:
$err = \dfrac{b + c}{a + b + c + d}$.
F-measure:
$F = \dfrac{2}{\dfrac{1}{p} + \dfrac{1}{r}}$
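The measures above follow directly from the four counts a, b, c, and d. The sketch below computes them in Python; the function name and the sample counts are invented for the illustration, and the code assumes that a is non-zero so that no denominator vanishes.

```python
def retrieval_metrics(a, b, c, d):
    """Compute quality measures from the counts a, b, c, d.

    a: relevant and retrieved      b: not relevant but retrieved
    c: relevant but not retrieved  d: not relevant and not retrieved
    """
    total = a + b + c + d
    recall = a / (a + c)
    precision = a / (a + b)
    return {
        "recall": recall,
        "precision": precision,
        "accuracy": (a + d) / total,
        "error": (b + c) / total,
        "f_measure": 2 / (1 / precision + 1 / recall),
    }


print(retrieval_metrics(a=30, b=10, c=20, d=40))
```

Accuracy and error always sum to one, while the F-measure is the harmonic mean of precision and recall and therefore stays close to the smaller of the two.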
(average precision):
$AvgPrec = \dfrac{1}{k} \sum_{i=1}^{k} prec\_rel(i)$,
k , , i
, prec _ rel (i ) i - ( ).
i - , prec _ rel (i ) =0.
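Average precision can be sketched in Python as follows. The ranked result list is represented as a list of booleans (True for a relevant document), and `total_relevant` plays the role of k; both names are assumptions of this example. Relevant documents that never appear in the list contribute zero, as in the definition above.

```python
def average_precision(ranked_relevance, total_relevant):
    """Mean of the precision values taken at the rank of each relevant document."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    # Relevant documents that were not retrieved add 0 to the sum.
    return precision_sum / total_relevant


print(average_precision([True, False, True, False], total_relevant=2))
```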
11- / TREC
(), ,
[142]. , . n , 0, 1/n, 2/n, ..., 1.
/ :
1. 0.0; 0.1; 0.2; ...; 1.0
( 11 ).
2. .
3. .
,
[54] (. 9). 20 , 4 . , , , , .
0.25, 0.5,
0.75 1.0. , 0 0.5 1.0 (
Fig. 9.
-
11- ,
,
. , , .
(recall) . , .
, . , :
;
;
, . . ;
, ;
;
. .
,
. , .
, , , . ,
, .
,
.
. , . .
-
, Text Mining.
- , . . 0.9999.
(RBC, Google) .
3. TEXT MINING
,
, ,
,
.
(Text Mining),
. , , Text Mining . Text Mining
[75, 32].
Text Mining
, , , . Text Mining
. , Text Mining ,
,
, . ,
. Text Mining
. Text Mining,
.
, (Data Mining), Text Mining.
,
Data Mining . - GTE Labs:
,
[122].
90-
, Text Mining
Data Mining, . Text Mining ,
. Text Mining ,
, .
3.1. -
Text Mining -.
-, ,
:
- .
(J. J. Jerry), . (J. Jerry).
- ,
. (D. Mangeim), . (R. Rich).
- -
,
(. ).
- ( ),
(. ).
, -
, .
- : . -
(, ). -
.
, . ,
, .
3.2.1.
(Feature Extraction)
, . ( , , , ), , , ,
. .
:
) Entity Extraction ,
. , , , , .
) Feature Association Extraction .
) Event and Fact Extraction , .
- ,
.
, ,
. , - , , . .
,
, .
(. 10). ,
( ). ,
, , -.
,
. , ,
, ,
, , , .
. 10.
(, , ), .
, , (, , , , ) , . . .
3.2.2.
,
( ) . , ,
. , , , . ,
:
;
, , .
TVP ' , .
p j ( j = 1, ..., M )
,
D
d ( i ) D (i = 1, ..., N ) , Pj D , p j , e(ji ) :
$e_j^{(i)} = \begin{cases} 1, & d^{(i)} \in P_j, \\ 0, & d^{(i)} \notin P_j. \end{cases}$
p j pk :
$v_{j,k} = \sum_{i=1}^{N} e_j^{(i)} e_k^{(i)}$.
v j ,k
TVP ' .
, ,
TVP ''
. Wi = {w1(i ) , ..., wn( i ) } , d (i ) .
p j ( j = 1, ..., M )
, :
$IP(p_j) = \bigcup_{d^{(i)} \in P_j} W_i$.
S = {s1 , ..., sK }
, D , t ( j ) = (t1( j ) , ..., t K( j ) )
ti( j ) , :
$t_i^{(j)} = \begin{cases} 1, & s_i \in IP(p_j), \\ 0, & s_i \notin IP(p_j), \end{cases} \quad i = 1, \dots, K$.
p j pk :
$v_{j,k} = t^{(j)} \cdot t^{(k)} = \sum_{i=1}^{K} t_i^{(j)} t_i^{(k)}$.
, TVP " v j ,k .
, , ,
(. 11).
, , v j ,k > 0 , v j ,k > 0 ,
, ( i ), d ( i ) Pj , d ( i ) Pk . ,
:
IP ( p j ) IP( pk ) , , t j tk = v j ,k > 0.
.
, .
. , , ,
, ,
, .
(
, . . 11) , k-means (. . 5.2),
.
3.2.3.
(Automatic Text Summarization)
, ,
. . [55].
,
.
.
,
:
, ,
, ;
, ,
;
, ,
, .
( )
, -
. :
, .
.
, .
.
,
, . , . , ,
. .
.
, .
, . . .
:
.
, , . .
;
, ;
, , , .
, , Microsoft Word
.
3.2.4.
( , ) () [1]. ( ) , -
, . , . ,
,
, .
3.2.5.
, ,
, .
( ),
( ) .
,
, , .
( )
, ,
.
. ( ) : , .
, -, . , ,
.
, . ,
, .
, , , . -
,
.
, , , ,
. .
, , , . ,
, ,
: . , ( ). ( ).
, . . .
,
, , (),
[82], [103] [110].
.
. ,
, ,
.
. ,
. (
, ) .
, , ,
InfoStream, , , 6 12
(, ). ,
.
:
. , , , :
A A, A A.
A .
.
A B , . .:
A B
/ B A.
:
A B, B C
/ A C.
, ,
, ,
. , , .
, , :
A B B A,
A B, B C A C.
, , ,
,
, .
,
, , 3, 4 5 .
d ( i ) d ( j ) :
(i )
( j)
1, d d ,
ai , j =
(i )
( j)
0, d d .
i, j : ai , j = a j ,i ,
:
i, j , k : ai , j = 1, a j ,k = 1 ai ,k = 1.
( ), :
N
i =1
|ai , j a j ,i |
j =1
i =1
j =1
ai , j
:
N
i =1
ai , j a j , k ai , k
j =1 k =1
N
N
ai , j
i =1
j =1
N .
, , .
,
.
, -.
, , ,
( ), .
3.2.6.
,
,
. ,
. , -
, .
, . . ,
( ).
.
. -
. , TF IDF. :
1.
. (
), . -
. ,
, .
2. ( ).
3. , , .
4. ,
, .
5.
. ,
, .
,
.
, . [120], , , . :
1. (
Text Mining ).
2. .
3. , .
4. , .
, ,
, ,
, , , [94]. [31] , :
) , ;
) , , , ( , IDF - );
) , , (
, , . .);
) ( ,
).
:
n ;
D1 ;
Dn ;
Di i- ;
PlusDic ;
sim( Di , D j ) i j ;
sim( Di , PlusDic) i ;
Ranki , i - .
sim( Di , D j )
- .
,
, , ,
w Di , D j , D j :
$sim(D_i, D_j) = P(w \in D_i \mid w \in D_j)\, P(w \in D_j)$.
Newi Di , ),
:
Newi =
log(i + 1)sim( Di , D j )
j =1
,
,
, - .
4.
, !
, ,
, , !
4.1.
(Text Categorization, TC) (
, ).
(machine learning, ML)
(information retrieval, IR) [33, 134]. :
;
.
,
.
, , .
,
,
.
(
),
. ,
.
, . , .
.
. , ;
. ,
. , ,
.
:
( ) ;
;
;
;
,
;
(. . , );
.
4.1.1.
D = {d (1) , ..., d ( N ) } , C = {c1 , ..., cM }
, , d ( i ) , c j
, d ( i ) c j (1 True)
(0 False). ,
.
, . .
, , :
1. . ' .
2. . .
.
,
, , . , , C = {c1 , ..., cM } M
{ci , ci } .
,
[0, 1].
,
, . . .
4.1.2.
, ci CSVi
( ), D [0; 1], .
, , .
ci () i .
CSVi ( d ) > i , d ci . d k , . . k , CSVi ( d ) .
, .
. ci
,
. ,
, ci ,
.
4.1.3.
$CSV^{(i)}(d) = d \cdot c^{(i)}$.
,
CSV ( i ) (d )
Ci
d = ( d1 , ..., d N ) , d :
$CSV^{(i)}(d) = \dfrac{d \cdot c^{(i)}}{|d| \, |c^{(i)}|}$.
c (i ) , .
4.2. Rocchio
(profile, ) .
, ( ) .
, . (J. Rocchio) [129], ,
. Ci c ( i ) = (c1( i ) , ..., cN( i ) ) (N ),
ck(i )
Rocchio :
$c_k^{(i)} = \dfrac{\alpha}{|POS_i|} \sum_{d^{(j)} \in POS_i} w_k^{(j)} - \dfrac{\beta}{|NEG_i|} \sum_{d^{(j)} \in NEG_i} w_k^{(j)}$,
wk( j ) tk d ( j ) (, ,
TF IDF), POSi , c (i ) , . . POSi = {d ( j ) | (d ( j ) , ci ) = 1} ,
NEGi , c (i ) : NEGi = {d j | ( d j , ci ) = 0}. ,
, . , = 1 = 0, Ci
,
.
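A minimal Python sketch of building a Rocchio category profile is given below. The two control parameters are called `alpha` and `beta` here as an assumption (their original symbols are not legible in this text); documents are plain lists of term weights, and the function name is illustrative.

```python
def rocchio_profile(positive, negative, alpha=1.0, beta=0.0):
    """Build a category profile from positive and negative training examples.

    positive, negative: lists of document weight vectors of equal length.
    alpha, beta: weights of the positive and negative parts (assumed names).
    """
    n_terms = len(positive[0])
    profile = []
    for k in range(n_terms):
        pos_mean = sum(doc[k] for doc in positive) / len(positive)
        neg_mean = sum(doc[k] for doc in negative) / len(negative) if negative else 0.0
        profile.append(alpha * pos_mean - beta * neg_mean)
    return profile


print(rocchio_profile([[1, 0], [3, 2]], [[0, 4]], alpha=1.0, beta=0.5))
```

With `alpha = 1` and `beta = 0` the profile reduces to the centroid of the positive examples.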
CSV ( i ) (d ) , d , Ci ,
.
4.3.
,
, . ,
.
.
( F ) (C ).
:
D ,
, ;
O = oi , j , i
,
,
. . , F
ij
i, j
mi , j M i - j - .
4.4. -
- ,
- (
), , (, )
. ,
, . .
.
Ci , {d1(i ) , ..., d K(i ) ) :
( x = d1( i ) ) ( x = d 2( i ) ) . . . ( x = d K( i ) ) , Ci .
, ,
, , -, , -, . - ,
, . , , . :
(( )
( )
( )
( - ))
,
.
4.5.
, ,
. 1011 . ,
, , 10 000 (. 12).
, , ,
, . .
. 12.
, , () . , .
4.5.1.
, . 13.
,
, ( ), [16].
( NET ), F , , OUT .
. 13.
:
$NET = \sum_{i=1}^{n} x_i w_i$,
n , , xi i -
, wi , NET .
:
OUT = F ( NET ).
:
$OUT = K \cdot NET$;
$OUT = \begin{cases} 1, & NET > T, \\ -1, & NET \le T; \end{cases}$
$OUT = \dfrac{1}{1 + e^{-NET}}$.
4.5.2.
,
, , . .
:
(FeedForward)
, ;
, .
.
(. 14), 1958 . . (F. Rosenblatt). .
.
,
+1 1.
. (1928–1971)
:
1. wi (i = 1, ..., N ) b : .
2. xi (i = 1, ..., N )
d .
3. :
$y(t) = \mathrm{sign}\left(\sum_{i=1}^{N} w_i(t)\, x_i(t) - b\right)$,
t , b .
4. :
$w_i(t + 1) = w_i(t) + r\,[d(t) - y(t)]\, x_i(t), \quad i = 1, \dots, N$,
wi (t ) i - t ; r ; d (t ) .
,
.
5. 2.
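The training steps above can be sketched in Python as follows. Two points are assumptions of this sketch rather than part of the listed procedure: the weights and threshold start at zero (to keep the run reproducible), and the threshold `b` is adjusted together with the weights, as if it were a weight on a constant input of -1.

```python
def sign(value):
    """Threshold activation with outputs +1 and -1."""
    return 1 if value >= 0 else -1


def predict(weights, b, x):
    return sign(sum(w * xi for w, xi in zip(weights, x)) - b)


def train_perceptron(samples, targets, rate=0.5, epochs=20):
    """Single-neuron perceptron training with the error-correction rule."""
    weights = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, d in zip(samples, targets):
            y = predict(weights, b, x)
            for i, xi in enumerate(x):
                weights[i] += rate * (d - y) * xi  # step 4 of the procedure
            b -= rate * (d - y)  # threshold update (assumption of this sketch)
    return weights, b


# Logical AND with targets coded as +1 / -1.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [-1, -1, -1, 1]
weights, b = train_perceptron(samples, targets)
print([predict(weights, b, x) for x in samples])
```

Because logical AND is linearly separable, the corrections stop after a few passes and the remaining epochs leave the weights unchanged.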
. 14. : X ;
W ; (1), (2), (3) ;
Y
,
. ,
(, ). . , [45].
,
.
.
.
, .
4.5.3.
,
- (
). -
() . - :
1. X = {x1 , ..., x N }
, , .
d ( j ) , wk( j ) ; , , ,
. (back propagation).
,
, , .
4.6.
4.6.1.
D C : P(C | D).
: D = ( w1 , ..., wN ) ,
wi i , N .
:
P(C | D) = ( D) = ( i wi ) ,
i =1
C {0, 1} , = {1 , ..., N } , , :
( x) =
1
.
1 + exp( x )
, , i ,
0.
i , ,
i .
4.6.2.
C , F1 , ..., Fn :
P(C | F1 , ..., Fn ) .
:
$P(C \mid F_1, \dots, F_n) = \dfrac{P(C)\, P(F_1, \dots, F_n \mid C)}{P(F_1, \dots, F_n)}$.
:
$P(C \mid F_1, \dots, F_n) \propto P(C)\, P(F_1, \dots, F_n \mid C) = P(C)\, P(F_1 \mid C)\, P(F_2, \dots, F_n \mid C, F_1) =$
$= P(C)\, P(F_1 \mid C)\, P(F_2 \mid C, F_1)\, P(F_3, \dots, F_n \mid C, F_1, F_2)$.
,
F i , Fj i j :
P ( Fi | C , Fj ) = P ( Fi | C ).
:
n
.
:
$P(D \mid C) = \prod_i P(w_i \mid C)$.
:
$P(C \mid D) = \dfrac{P(C)}{P(D)}\, P(D \mid C)$.
, C
C . :
$P(C \mid D) = \dfrac{P(C)}{P(D)} \prod_i P(w_i \mid C)$;
$P(\bar{C} \mid D) = \dfrac{P(\bar{C})}{P(D)} \prod_i P(w_i \mid \bar{C})$.
C ( ):
$\dfrac{P(C \mid D)}{P(\bar{C} \mid D)} = \dfrac{P(C)}{P(\bar{C})} \prod_i \dfrac{P(w_i \mid C)}{P(w_i \mid \bar{C})}$.
:
$\ln \dfrac{P(C \mid D)}{P(\bar{C} \mid D)} = \ln \dfrac{P(C)}{P(\bar{C})} + \sum_i \ln \dfrac{P(w_i \mid C)}{P(w_i \mid \bar{C})}$.
P(C | D)
> 0, (. ., ,
P (C | D)
p (C | D) > p (C | D) ), , D
C.
ln
4.6.3.
(). , , .
, ( 0 1), , . , 0.5, ,
.
, . (P. Graham) [95],
n w1 ,..., wn ,
, ,
:
$Spm = \dfrac{\prod_i w_i}{\prod_i w_i + \prod_i (1 - w_i)}$.
, S , ,
, , ,
t. , , :
$P(S \mid A) = \dfrac{P(A \mid S)\, P(S)}{P(A \mid S)\, P(S) + P(A \mid \bar{S})\, P(\bar{S})}$.
, ,
, - , , P( S ) = P( S ), :
$P(S \mid A) = \dfrac{P(A \mid S)}{P(A \mid S) + P(A \mid \bar{S})}$.
, A1 A2 , ,
t1 t2 . ,
( ). , , ( t1 t2 ), ,
:
$P(S \mid A_1 \,\&\, A_2) = \dfrac{P(A_1 \mid S)\, P(A_2 \mid S)}{P(A_1 \mid S)\, P(A_2 \mid S) + P(A_1 \mid \bar{S})\, P(A_2 \mid \bar{S})} = \dfrac{p(t_1)\, p(t_2)}{p(t_1)\, p(t_2) + (1 - p(t_1))(1 - p(t_2))}$.
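The combination rule for per-term spam probabilities generalizes from two terms to any number of them. A minimal Python sketch, with an illustrative function name and made-up probabilities, is shown below; it assumes the list does not contain both an exact 0 and an exact 1, since the denominator would then vanish.

```python
from math import prod


def combine_spam_probabilities(probabilities):
    """Combine per-term spam probabilities under the independence assumption."""
    spam = prod(probabilities)
    ham = prod(1.0 - p for p in probabilities)
    return spam / (spam + ham)


print(combine_spam_probabilities([0.9, 0.8]))
```

Terms with probability 0.5 are neutral: adding them to the list does not move the combined estimate.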
= 1 .
,
= 1 . , .
, ,
Spm .
, .
4.6.4.
.
. , , , , , , . .
, , ( ).
,
, .
(, ),
: , , , , (,
).
, ,
,
.
: H 1 , H 0 H1 .
: H1 , H1 . t , , p(t | H1 ) , 1/2. .
- , .
,
, ( ). Spm :
Spm( x ) =
x
,
x + (1 ) x
(
) .
,
( H 1 ) . ,
, H1 H 1 .
() .
, , . ,
, . x ( 0, 1) .
, , . ,
, . .
. .
.
( ). , .
(. 15). xi = 1,
i, xi = 0.
( ), ,
+
+
w1 ,..., wn w1 ,..., wn .
. NET +
NET .
. 15.
, NET +
NET . , OUT + OUT ,
, OUT +
OUT , , . 15.
4.7.
(Support Vector Machine, SVM), . . [146, 84],
. . , . . c c ,
,
. ,
N- .
, ,
.
. .
,
$\{x_1, \dots, x_n\} \subset R^N$ and $\{y_1, \dots, y_n\} \subset \{-1, 1\}$.
yi 1 xi
c , 1 . ,
. ( N-
), . (. 16),
:
(), c , c .
: w ,
b xi :
w xi w xi :
$w \cdot x_i = \sum_{j=1}^{N} w_j x_{i,j}$.
w xi = b , . ,
w xi b ,
, . w ,
b . ,
?
. 16.
SVM :
,
. SVM ,
w b , > 0 (
) :
w xi b + yi = +1,
w xi b yi = 1.
1 ,
, . ,
xi :
$-1 < w \cdot x_i - b < 1$,
(. 17).
w . , , .
. 17.
, ,
, SVM , .
, : yi ( w xi b) 1 (
, , yi {1, 1} ).
. xi yi , w
b .
, 2 / w .
w b , ,
w , :
2
w = w w.
.
, , i 0, {x1 , ..., xn }. :
yi ( w xi b) 1 i .
, i = 0, xi .
i > 1, xi . 0 < i < 1,
,
.
:
w + C i
2
yi ( w xi b) 1 i , C ,
. , :
w2
+ C i min;
2
i
yi ( w xi b) + i 1, i = 1, ..., n.
:
1
max;
2 w w + C i i (i + yi ( w xi b) 1) min
w,b
i
i
0, 0, i = 1, ..., n.
i
i
w b , :
w = i yi xi ,
i =1
. . ,
i 0 . i > 0 ,
.
, :
y x x b = 0 .
i =1
i i
b , :
y
i =1
= 0.
w
, :
1
i j yi y j ( xi x j ) min;
2
i, j
i
i yi = 0;
i =1
C 0, i = 1, ..., n.
i
,
xi , . ,
,
.
:
;
.
,
:
1. ( x ) x , .
2. ,
, (kernel function):
K ( x , z ) = ( x ) ( z ).
( x ) , K ( x , z ) , ( x ) . .
3. :
K ( x , z ) . xi x j K ( xi , x j ) , .
4. w b , , ,
w ( x ) b , .
,
. ,
LibSVM
(http://www.csie.ntu.edu.tw/~cjlin/libsvm) [124] :
$K(x, z) = \exp(-\gamma \|x - z\|^2)$,
.
, . 18. ,
. , , , . ,
( x ) , .
SVM :
;
. , SVM ;
,
.
. 18.
:
, C ;
;
.
4.8.
. . c , c . , c . :
true positive (TP) , , false positive (FP) , ; false negative (FN) true negative (TN) . false negative , false positive .
. :
$\dfrac{TP}{TP + FN}$.
, (
,
):
$\dfrac{TP}{TP + FP}$.
.
,
( ),
.
.
.
5.
, ,
, (,
, ).
, .
.
,
(, Yahoo! Open Directory) : ? , . . . , ,
(, ), .
, , .
( HTML- ,
, , URL-, . .). ,
, , . .
,
(), .
. ,
, ,
,
[25].
, , , .
, , (
)
.
(), :
$D_p(x, y) = \left(\sum_{k} |x_k - y_k|^p\right)^{1/p}$.
For $p = 2$ this is the Euclidean distance:
$D_2(x, y) = \sqrt{\sum_{k} (x_k - y_k)^2}$.
, , :
$Sim(x, y) = x \cdot y$,
x , y , ,
, , ,
.
, , . .
( ) .
[111, 134, 32]. -
, [31].
- , .
5.1. -
5.1.1. -
LSA/LSI ( . Latent Semantic Analysis/Indexing - /)
[106] (SVD) [93].
D = {d ( j ) | j = 1, ..., n} A , ,
( m ). A r m n A = USV T , U V m r
r n , , S , ( si ,i 0 ).
S A . ,
S , A , .
A ,
S k
( Sk ), U V -
(, U k , Vk ),
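the matrix $A_k = U_k S_k V_k^T$ is the best approximation of $A$ of rank $k$. Recall that the Frobenius norm of an $M \times N$ matrix $X$ is
$\|X\|_F = \sqrt{\sum_{i=1}^{M} \sum_{j=1}^{N} x_{ij}^2}$.
Among all matrices of rank $k$, $A_k$ minimizes the Frobenius norm of the difference $\|A - A_k\|_F$, that is:
$A_k = \arg\min_{X:\, rank(X) = k} \|A - X\|_F$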
LSA ,
k A , .
Ak k - , ( V ), ( U ).
(),
, , . . .
k LSA . , k
, , .
U k VkT . , U k k - . , VkT
k - . , k -
.
,
LSA. d (,
) , , . ,
.
d ( A ), : d ' = Sk1U kT d .
q , i- 1,
i , 0 , q
: q ' = qTU k Sk1.
q d q ' VkT {d } ( VkT {d } -
d - VkT ).
, ,
, , .
-
, .
LSA - , .
HITS (Hyperlink Induced Topic Search)
. LSA , .
.
SVD O ( N 2 k ), N = | D | + | T |, D
, T , k .
LSA ,
. , LSA
.
5.1.2. -
- ( . Probabilistic
Latent Semantic Analysis, PLSA) LSA,
. PLSA
,
.
, , k z1 ,..., zk ( k ). zi P ( zi ) ,
k
( P ( zi ) = 1) .
i =1
P (d | zi ) ,
zi Z , d D .
P ( d | z ) = 1.
i
d D
P ( t | zi ) ,
zi , t
T zi .
P ( t | z ) = 1.
tT
, d t ,
t d ,
(. 19 a, ), :
$P(d, t) = P(d)\, P(t \mid d)$,
$P(t \mid d) = \sum_{i=1}^{k} P(t \mid z_i)\, P(z_i \mid d)$.
, (. 19 ) :
$P(d, t) = \sum_{i=1}^{k} P(z_i)\, P(d \mid z_i)\, P(t \mid z_i)$.
k , PLSA
:
P( zi ) , zi ;
P(d j | zi ) , d j -
, zi ;
P (t j | zi ) , t j , -
zi .
. 19. :
) ; )
t d , tf ( d , t ).
:
$L = \sum_{d \in D} \sum_{t \in T} tf(d, t) \log P(d, t)$,
,
.
PLSA EM (Expectation Maximization
),
1) , , 2) , [101].
:
$P(z \mid d, t) = \dfrac{P(z)\, P(d \mid z)\, P(t \mid z)}{\sum_{z' \in Z} P(z')\, P(d \mid z')\, P(t \mid z')}$,
L :
$P(t \mid z) \propto \sum_{d \in D} tf(d, t)\, P(z \mid d, t)$,
$P(d \mid z) \propto \sum_{t \in T} tf(d, t)\, P(z \mid d, t)$,
$P(z) \propto \sum_{d \in D} \sum_{t \in T} tf(d, t)\, P(z \mid d, t)$.
L
. , .
, PLSA .
: 1) U , ui ,k
P(d ( i ) | z ) , 2) V , v
k
j ,k
P(t j | zk ) , 3) S k ,
P( zi ) . T .
P ( z ) USV
SVD, , U V PLSA
. , , k
S PLSA.
PLSA LSA L .
5.2. k-means
k-means (k-) {d (1) , ..., d ( N ) } ( -
,
) : k , ( ) .
C j ( j = 1, ..., k ) C j , . k N k ,
,
.
, , :
$Sim(d, C_j) = \dfrac{d \cdot C_j}{|d| \, |C_j|}$,
Sim(d , C j ) .
C j ( j = 1, ..., k ) ,
, , , .
, . .,
(,
).
k-means Q :
k
LSI, k-means O ( kn ) , n
(). , .
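A compact Python sketch of the k-means loop is given below. It uses cosine similarity to centroids as the proximity measure, in line with the similarity formula of this section, and takes the initial centroids as an argument; how they are chosen is outside the sketch. All names are illustrative.

```python
import math


def cosine(u, v):
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def k_means(docs, centroids, iterations=10):
    """Assign documents to the most similar centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for doc in docs:
            best = max(range(len(centroids)), key=lambda j: cosine(doc, centroids[j]))
            clusters[best].append(doc)
        new_centroids = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return clusters, centroids


docs = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]
clusters, centroids = k_means(docs, centroids=[[1, 0], [0, 1]])
print(clusters)
```

The result depends on the initial centroids, which is why practical runs usually try several starting points.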
5.3. -
- (Hierarchical Agglomerative
Clustering, HAC) ,
, ,
, . ,
. .
HAC . ,
.
, ,
, . . Ci
C j :
$Sim(C_i, C_j) = \dfrac{1}{|C_i \cup C_j| \, (|C_i \cup C_j| - 1)} \sum_{x, y \in C_i \cup C_j,\ x \ne y} Sim(x, y)$.
| Ci C j |
Ci C j , x y , Ci C j .
HAC O ( n 2 s ) , n
, s .
5.4.
(Suffix Tree Clustering)
.
W S ,
( ) VW S (, )
V . , | V | 0. ,
substring sub ,
ring . V .
, .
, ().
, .
(, (E. Ukkonen)
[143]), O ( n ) ,
n .
. ,
, ,
, .
,
,
. S t1 ,..., tn . , . . t1 , ..., ti S .
, . :
0. t1 .
1. t1t2 .
...
n 1. t1 , ..., tn1
t1 ,..., tn .
n. t1 , ..., tn t1 ,..., tn $
($ ).
abca$
. 20.
-
. , ,
Clusty (http://www.clusty.com) Nigma
(http://www.nigma.ru).
,
, .
. 20.
,
, , () . . ,
( ),
. , , , , . ,
, , , . ,
, . , , .
. . 21 I know you know I know. 6
1 6.
. 21.
. . 22
cat ate cheese, mouse ate cheese too cat ate mouse too.
, a f. , ,
, .
, , . Bm Bn , | Bm |, | Bn | .
Bm Bn .
sim( Bm , Bn ) Bm Bn :
| Bm Bn |
| Bm Bn |
> ,
> ;
1,
| Bm |
| Bn |
sim ( Bm , Bn ) =
0, ,
, 0 1,
0.6.
. 22.
. 23. ,
. 22 6- ( a f).
= 0.7, b = 0.6, c 0.6,
ate -.
,
Bag of Words, , . , , , .
.
5.5.
, .
,
InfoStream [31].
. 23.
, ( 1 ),
. -
, . , . 1, ,
,
. .
, . , k-means ,
- .
. T ti (i = 1, ..., N )
() Pj ( j = 1, ..., M )
P , , .
E = P T P ,
.
, , , k-means LSI, , .
P ( ).
, .
- , .
, . , .
M , ,
. A = M T M . A .
, B = M M T ,
.
B
A . ,
A ,
B , .
5.6.
,
. . (, ).
, , ,
.
, . .
, . , , . ,
-
, .
-, , HITS
(hyperlink induced topic search) PageRank, 1996
IBM . (J. M. Kleinberg) [105] . (S. Brin) . (L. Page) [80].
,
,
.
5.6.1. HITS
HITS (Hyperlink Induced Topic Search),
. , - (. . 5.1) -
.
HITS
(, ) (, ). ,
,
, , , .
d j D
a ( d j ) h( d j ) :
$a(d_j) = \sum_{i=1,\ i \ne j,\ d_i \to d_j}^{|D|} h(d_i)$, $\quad h(d_j) = \sum_{i=1,\ i \ne j,\ d_j \to d_i}^{|D|} a(d_i)$.
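The mutual reinforcement of authority and hub scores can be sketched in Python as an iterative computation. The link structure is a hypothetical dictionary mapping each page to the pages it cites; the normalization after every step is an assumption of this sketch (it keeps the values bounded), as are the function name and the iteration count.

```python
import math


def hits(links, iterations=50):
    """Iteratively compute authority and hub scores for a small link graph."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the pages linking to the node.
        auth = {n: sum(hub[s] for s in nodes if n in links.get(s, ())) for n in nodes}
        # Hub: sum of authority scores of the pages the node links to.
        hub = {n: sum(auth[t] for t in links.get(n, ())) for n in nodes}
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {n: v / a_norm for n, v in auth.items()}
        hub = {n: v / h_norm for n, v in hub.items()}
    return auth, hub


auth, hub = hits({"a": ["c"], "b": ["c"], "c": []})
print(auth, hub)
```

In this toy graph page c is the only authority, while a and b are equally good hubs.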
, HITS LSA.
A , aij , d i d j , . : A = USV T , S si ,i . AT A,
:
T
T
T
2 T
2
A A = VSU USV = VS V , S
P(d , c) =
cC ,d D
P( d ) P ( c | d ),
cC ,d D
P( c | d ) = P( c | z )P( z | d ).
zZ
PHITS , P( z ) , P( c | z ) ,
P ( d | z ) , L( D, C ) .
: P ( c | z ) ; P ( d | z ) .
Z , P ( c | z )
z . ,
,
L . , - , PHITS HITS.
5.6.2. PageRank
PageRank .
PageRank HITS, ,
.
- PageRank :
-, . -
. .,
, . , -
, URL. , WWW - . , PageRank -
, , .
, PageRank
n {d1 , ..., d n } , (- A ), C ( A) - A
.
, , - D , A , -
URL .
- N -
, (URL)
1 ( ). PageRank
PR( A) A , :
$PR(A) = (1 - \delta) / N + \delta \sum_{i=1}^{n} \dfrac{PR(d_i)}{C(d_i)}$.
. 30
.
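The PageRank formula can be evaluated by simple iteration, as in the Python sketch below. The damping value 0.85, the number of iterations, and the function name are assumptions of the example, and the sketch requires every page to have at least one outgoing link.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iterate the PageRank formula on a graph given as {page: [linked pages]}."""
    nodes = sorted(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            # Each page linking to `node` passes on an equal share of its rank.
            incoming = sum(rank[src] / len(links[src]) for src in nodes if node in links[src])
            new_rank[node] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


print(pagerank({"a": ["c"], "b": ["c"], "c": ["a"]}))
```

Page c, which is cited by two pages, ends up with the highest rank; page b, which nobody cites, keeps only the baseline share.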
HITS PageRank, ,
() , , ,
.
, , , . .
-
(-) , . , (SEO, Search Engine Optimization), -
.
, ,
- .
5.6.3. Salsa
- (, ), a , Va - (, ). , .
s G Gbip
sh sa . Salsa
.
Gbip .
. 24. Salsa:
Gbip ( Gbip ), ( ).
, .
, t
Gbip (
), Va
Vh , t , .
Salsa ,
: Gbip ( ), Gbip .
, : W G . Wr , W
, Wc ,
W . , H , WrWcT , A ,
, WcTWr .
Salsa A H ,
, ,
Gbip . A H , HITS.
[108] ,
v , :
v = c1 InDegree(v) ,
u :
u = c2 OutDegree(u ) ,
c1 c2 , InDegree OutDegree , .
. . , Salsa
, HITS, , ,
. -
,
.
5.6.4.
2005 . . (J. Hirsch) ( h -),
[100].
h ,
h . p h , h
N p h ,
N p h , , h
(. 25).
- , ,
( h ),
h (
[49]).
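Hirsch's index is easy to compute from a list of citation counts. The Python sketch below is a straightforward illustration; the function name and the sample data are made up.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position
        else:
            break
    return h


print(h_index([10, 8, 5, 4, 3]))
```

A single very highly cited paper barely changes the index, which rewards a steady body of cited work rather than one outlier.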
. 25. h -: 0X
; 0Y
,
.
.
6.
,
.
, . ,
, , . , .
6.1.
, ,
.
6.1.1.
, . (V. Pareto)
, ,
. 1906 , 80 20 .
,
. , -
, N = A X p , X , N
, X , A p
. , : X 1, p > 1 . , . . , , .
, 80/20, . , 20 %
, 80 % ,
. : 80 % -
20 % -.
(1848–1923)
, - , , ,
80 ,
20 .
. ,
x1 , x2 , ... xn , ...
. -
x(r) ).
, N , x( r ) , . . N = r .
:
$r = \dfrac{A}{x(r)^p}$,
so that
$x(r) = \left(\dfrac{A}{r}\right)^{1/p}$
n ( n = 1, 2, ..., N )
x(r) , m (n ) :
1
n
A p
C
m (n ) = x(r) = = ,
r =1
r =1 r
r =1 r
n
= 11/ p; C = A1/ p .
(,
n >> 1 ), :
n
m (n )
1
C
C 1
dr
n .
1
r
= m (n ) / m ( N ) = n / N
(. . 26):
= 1 .
,
n , (
) .
, . 26, 0.2
20 % 0.8 80 % ( ).
. 26. :
= 1 = 0 ( )
= 0.8, = 0.9
6.1.2.
. (G. Zipf) ,
. , ,
,
. .
- , , ,
: f r = c , f
; r ; c
( ).
, ,
0.06–0.07.
(1902–1950)
, , .
. , [111]
, 11 000 .
(the, and, .), 1 % .
.
, , . . , .
, , , ,
, , :
N( f ) =
B
,
f
N ( f ) ,
f , B .
. , . p ,
(1 p ) , . ,
, .
,
[73]. ,
.
, , . . (H. A. Simon) [135]. : n ,
, (n + 1) - :
1. N ( f , n) , f n . ,
(n + 1) - , f
f N ( f , n)
, f .
2. (n + 1) - .
, .
1975 . [131]
.
.
,
. , ( , ) , , .
, .
.
,
( ,
, ).
,
,
. , -
, ,
- .
,
,
. :
, , . , . ,
, n
:
(1 p ) n p,
p .
,
, .
,
.
6.1.3.
. (S. Bradford), , ,
: , , .
: ,
, ,
, . . 1934 . [79]:
$\dfrac{N_3}{N_2} = \dfrac{N_2}{N_1} = \text{const},$
N1 , N 2 ,
N3 .
. ,
[2, 31], -, .
6.1.4.
. .
(H. S. Heaps)
, [98]. , , . , ! , (. 27):
$v(n) = \alpha n^{\beta},$
v , , n , . 10
100, 0.4 0.6.
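Heaps' law can be checked empirically by counting distinct words while reading a text and then fitting a power law on a log-log scale. The sketch below assumes a pre-tokenized text; the names `vocabulary_growth` and `fit_power_law` are hypothetical, and the least-squares fit is one of several possible estimators.

```python
import math


def vocabulary_growth(tokens):
    """Return v(n), the number of distinct tokens among the first n tokens."""
    seen = set()
    sizes = []
    for token in tokens:
        seen.add(token)
        sizes.append(len(seen))
    return sizes


def fit_power_law(ns, vs):
    """Least-squares fit of v = prefactor * n ** exponent in log-log coordinates."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(v) for v in vs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope


growth = vocabulary_growth("a b a c b d".split())
prefactor, exponent = fit_power_law(
    [1, 10, 100, 1000], [10 * n ** 0.5 for n in [1, 10, 100, 1000]]
)
```

On a real corpus one would pass sampled points of `vocabulary_growth` to `fit_power_law` and compare the fitted exponent with the 0.4 to 0.6 range quoted above.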
Heaps (k)
. 27. , :
,
, , ,
[96], .
6.2.
( ),
, ,
(. 28):
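$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}},$
$f(x) = \dfrac{1}{x\sigma\sqrt{2\pi}}\, e^{-\frac{(\ln x-\mu)^{2}}{2\sigma^{2}}}, \quad x > 0$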
,
,
.
. ,
.
. 28. .
. , , , ( ) :
f ( x) =
B
,
x
P ( X x) =
A
,
x
0 < x < ,
= 1,
P ( X x ) , X x , A , .
, . (B. Mandelbrot)
-. -
, ,
. ,
P ( X x) =
x
B
dx,
x
P ( x )
= f ( x ).
x
, . (. 29).
. 29. ,
, , ,
: F ( ) = const . ,
dx f ( x ) dx = F ( ) d . , x = a tg ( ) , :
$f(x) = \dfrac{1}{\pi}\,\dfrac{a}{a^{2} + x^{2}}, \quad -\infty < x < \infty.$
x f ( x )dx
1 , , ,
.
, ,
.
, A / x
. , . , .
, , [39].
6.3.
, , , , II . ,
.
, . , , . , x , :
$f(\lambda x) = g(\lambda) f(x)$
,
$f(x) = x^{3}.$
:
$f(\lambda x) = \lambda^{3} x^{3} = \lambda^{3} f(x), \quad g(\lambda) = \lambda^{3}.$
,
g ( ) , ,
x0 , x .
, x = x0 , :
$f(x) = f(\lambda x_0) = g(\lambda) f(x_0)$
, $g(\lambda)$ :
$g(\lambda) = \lambda^{p}$
p . :
$g(\lambda\mu) = g(\lambda) g(\mu)$
$f(\lambda x_1, \lambda x_2, \ldots) = g(\lambda) f(x_1, x_2, \ldots).$
,
$f(\lambda x, \lambda y) = \lambda^{p} f(x, y).$
, , ( )
$f(\lambda^{a} x, \lambda^{b} y) = \lambda^{p} f(x, y).$
,
a , b ,... .
,
. ,
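$f(\lambda x, \lambda y) = \lambda^{p} f(x, y)$
$\lambda = 1/y$ :
$f(\lambda x, \lambda y) = f\left(\dfrac{x}{y}, 1\right) = y^{-p} f(x, y)$
$f(x, y) = y^{p} f\left(\dfrac{x}{y}, 1\right).$
:
$F\left(\dfrac{x}{y}\right) = f\left(\dfrac{x}{y}, 1\right)$
:
$f(x, y) = y^{p} F\left(\dfrac{x}{y}\right).$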
, f ( x, y ) ,
,
F ( z) .
F ( z ) . , ,
x , , F ( z ) y z = x / y .
, x / y . , f ( a x, b y ) = p f ( x, y ) = 1/ y1/ b , , , :
$f(x, y) = y^{p/b} F(z), \quad z = \dfrac{x}{y^{a/b}}.$
.
u (, T ) ( , T
)
. , u (, T ) . , . , :
u (, T ) = 3 ( / T )
( z ) , , u (, T ) , . , , max , u (, T )
( max T , ).
:
( N , p ) p ( Np ) ,
( N , p ) ,
() ,
, ( pN ) . , :
( z << 1) x ( z >> 1) ln x
,
.
().
6.4.
, ,
. ,
, ( ,
, , ), .
. ,
: ,
() (). . ,
() (). (,
). (
Tc = 3600 C ) , , . .
. Tc ,
. ,
, .
.
[52] . , , p pc ,
. , , .
, ,
. , .
. I
, . , 00 C .
( -
), . II
.
, . , .
II , . . , ( ) .
, , , = 0 , , , 0 .
.
, G ( , T ,...) .
. (,
. .) . , . .
:
$\dfrac{\partial G}{\partial \eta} = 0, \quad \dfrac{\partial^{2} G}{\partial \eta^{2}} > 0.$
,
. , .
(19081968)
. .
= 0 . , G ( , T , ...) :
$G(\eta, T) = G_0(T) + A(T)\eta^{2} + B\eta^{4} + \ldots,$
= 0 (, , ) 0 ( ).
G / = 0 :
$2A\eta + 4B\eta^{3} = 0,$
$\eta = 0$ or $\eta^{2} = -A/(2B) \neq 0$.
T > Tc = 0 , T < Tc
0 . , T > Tc = 0 A > 0 . .
T < Tc , . .
A < 0 . : A > 0 T > Tc , A < 0 T < Tc , A(Tc ) = 0
.
A , ,
$A = \alpha (T - T_c).$
2 = A / 2 B (T Tc ) ,
Tc T , G ( , T )
$G(\eta, T) = G_0(T) + \alpha (T - T_c)\eta^{2} + B\eta^{4} + \ldots.$
. 30 G ( , T ) T > Tc T < Tc .
II
G ( , T ) , (
, ). S C
$S = -\dfrac{\partial G}{\partial T}, \quad C = T \dfrac{\partial S}{\partial T}.$
7.
, ! !
, ! !
, , ! !
, 40-
. (C. E. Shannon) [62].
, , .
, , , , , , . . , , ,
, ,
, .
, . , ( . .
. . ). ,
, . ,
. ,
, . , 100.
:
,
( N = 1 ). -
( N = 100 ). ,
. , , . , , 100
50 N 1029 . ,
,
( 1029 ) . , , 1024 ( ), .
, ,
:
S = k ln N ,
k , .
pi = p = 1/ N = const :
$S = k \ln N = k \sum_{i=1}^{N} p \ln N = -k \sum_{i=1}^{N} p \ln p.$
, ,
:
$S = -k \sum_{i=1}^{N} p_i \ln p_i.$
k k = 1/ ln 2
( !), :
$S = -\sum_{i=1}^{N} p_i \log_2 p_i,$
, . .
. , ,
, . .
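The entropy formula with base-2 logarithms gives the result in bits and is straightforward to compute. A minimal sketch follows; the function name `shannon_entropy` is an assumption, and terms with zero probability are skipped under the usual convention 0 log 0 = 0.

```python
import math


def shannon_entropy(probabilities):
    """Entropy in bits of a discrete distribution given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


uniform_four = shannon_entropy([0.25, 0.25, 0.25, 0.25])
fair_coin = shannon_entropy([0.5, 0.5])
certain = shannon_entropy([1.0, 0.0, 0.0])
```

For N equally likely outcomes the function returns log2 N, which matches the special case S = k ln N with k = 1/ln 2.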
,
.
(mutual information), , , [111], -
.
c , , , t .
:
$I(t, c) = \log \dfrac{P(t, c)}{P(t) P(c)},$
P(t , c ) t c ; P (t ) t , P ( c ) c .
, t c .
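The mutual information of a term t and a category c can be estimated from document counts. The sketch below is illustrative only: the function name, the count-based probability estimates, and the choice of base-2 logarithm are assumptions, since the text does not fix the base.

```python
import math


def pointwise_mutual_information(joint_count, term_count, category_count, total):
    """I(t, c) = log2( P(t, c) / (P(t) P(c)) ) with probabilities estimated from counts."""
    p_joint = joint_count / total
    p_term = term_count / total
    p_category = category_count / total
    return math.log2(p_joint / (p_term * p_category))


independent = pointwise_mutual_information(25, 50, 50, 100)
associated = pointwise_mutual_information(50, 50, 50, 100)
```

A value of zero means the term and the category occur together exactly as often as independence predicts; positive values indicate association.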
. , ,
Autonomy IDOL (Intelligent Data Operating
Layer), , , .
IDOL,
, . , , , IDOL .
IDOL, , , .
.
.
7.1.
U = {u1 , ..., uN }, :
$H(U) = -K \sum_{i=1}^{N} p_i \log_2 p_i,$
pi ui , K .
(19162001)
, ,
:
$H(U) = -\sum_{i=1}^{N} p_i \log_2 p_i = -\sum_{i=1}^{N} \dfrac{1}{N} \log_2 \dfrac{1}{N} = -\log_2 \dfrac{1}{N} = \log_2 N,$
, ,
.
N ( ).
, , n u1 , u2 , ..., un . :
1    2    3    ...  N
u10  u5   u21  ...  u3
, .
i ui ( i = 1, 2, ..., n ) pi , .
N ui Npi . , p , Np1 u1 , Np2
u2 . . ( ), :
$\log_2 p = N \sum_{i=1}^{n} p_i \log_2 p_i.$
N , , :
$S = -\sum_{i=1}^{n} p_i \log_2 p_i,$
N
, :
$p = 2^{-NS}.$
( p ),
K :
$K = \dfrac{1}{p} = 2^{NS}.$
, ( )
,
( u1 p1 , u2 p2
. .).
, , ,
. , -
$\{p_1, p_2, \ldots, p_n\}$ .
$K(\{p_1, p_2, \ldots, p_n\}) = 2^{N S(\{p_1, p_2, \ldots, p_n\})}.$
, .
{1, 0, ..., 0} ( 1 u1 )
, . .
( u1, u1, ..., u1 ) . (
0log 0 = 0 ) :
n
, ,
, , () .
, , ,
pi = 1/ n {1/ n, 1/ n, ..., 1/ n}
:
$S = -\sum_{i=1}^{n} p_i \log_2 p_i = \log_2 n.$
N ,
N / n ,
$K(\{1/n, 1/n, \ldots, 1/n\}) = 2^{N S(\{1/n, 1/n, \ldots, 1/n\})} = 2^{N \log_2 n} = n^{N}.$
7.2.
,
, :
1. [0, 1].
2. .
3. , , . . .
, p 1 p,
(. 31).
. 31.
( p , H)
$H(U) = -[p \log p + (1 - p) \log(1 - p)].$
(. 32), p, q,1 p q ,
:
. 32.
( OX p, OY q, OZ )
7.3.
U V ,
p (U ,V ) = p (ui , v j ) , (i = 1, ..., N ; j = 1, ..., M ) .
:
p (ui , v j ) = p(ui ) p ( v j / ui ) = p ( v j ) p(ui / v j ) , , :
N
H (U ,V ) =
i =1
i =1
i =1
i =1
j =1
p(u ) p(v
i
j =1
H ui (V ) V -
ui U :
$H_{u_i}(V) = -\sum_{j=1}^{M} p(v_j / u_i) \log p(v_j / u_i).$
, , V U
V U :
N
j =1
i =1
j =1
$H(UV) = H(U) + H_U(V),$
$H(UV) = H(V) + H_V(U).$
2.
:
$H_U(V) \le H(V),$
$H_V(U) \le H(U).$
3.
U V :
$H_U(V) = H(V),$
$H_V(U) = H(U).$
7.4.
,
.
,
.
.
U
p (u ) n u .
:
p (ui u ui + u ) =
ui +u
ui
U U n , p (ui u ui + u ), U :
n
i =1
i =1
i =1
p(u )u 1, :
i =1
, u 0
, :
u 0
- H (U )
(
). , [9]. :
.
.
1. , U . U k , uk = ku, ,
p (uk ) = p(u ) / k , k :
p (u )
p(u )
log
kdu = h(u ) + log k .
k
k
, h(U ) . ,
,
.
2. , :
p ( x) =
1
e
2
( xm )
2 2
hmax (U ) = log 2 2 e.
,
m.
7.5.
Z Z = {z1 , ..., z N }. ,
,
W = {w1 , ..., wN }.
,
w j
:
$H_{w_j}(Z) = -\sum_{i=1}^{N} p(z_i / w_j) \log p(z_i / w_j).$
j:
$H_W(Z) = \sum_{j=1}^{N} p(w_j) H_{w_j}(Z).$
,
, :
$I(ZW) = H(Z) - H_W(Z).$
, :
$I(ZW) = \sum_{i=1}^{N} \sum_{j=1}^{N} p(z_i w_j) \log \dfrac{p(z_i w_j)}{p(z_i) p(w_j)}$
:
1. . ,
$H(Z) \ge H_W(Z) \Rightarrow I(ZW) = H(Z) - H_W(Z) \ge 0.$
2. Z W :
$H(Z) = H_W(Z) \Rightarrow I(ZW) = 0.$
3. I ( ZW ) = I (WZ ). :
I ( ZW ) = H ( Z ) HW ( Z ) = H ( ZW ),
I (WZ ) = H (W ) H Z (W ) = H (WZ ).
H ( ZW ) = H (WZ ).
4. Z W :
I ( ZW ) = H ( Z ).
7.6.
, , . ,
.
V , N , U , M (). , , ,
.
HU (V ) , , V ,
U . :
$H_U(V) = \sum_{j=1}^{M} P(u_j) H_{u_j}(V).$
$H_{u_j}(V)$ , :
$H_U(V) = -\sum_{j=1}^{M} \sum_{k=1}^{N} P(v_k, u_j) \log(P(v_k, u_j) / P(u_j)).$
V U :
$I(V, U) = H(V) - H_U(V).$
,
(, ). I (V ,U ) ,
V U .
:
$I(V, U) = H(V) - H_U(V) =$
$= -\sum_{k=1}^{N} P(v_k) \log P(v_k) + \sum_{k=1}^{N} \sum_{j=1}^{M} P(v_k, u_j) \log(P(v_k, u_j) / P(u_j)) =$
$= -\sum_{k=1}^{N} \sum_{j=1}^{M} P(v_k, u_j) \log P(v_k) + \sum_{k=1}^{N} \sum_{j=1}^{M} P(v_k, u_j) \log(P(v_k, u_j) / P(u_j)) =$
$= \sum_{k=1}^{N} \sum_{j=1}^{M} P(v_k, u_j) \log \dfrac{P(v_k, u_j)}{P(v_k) P(u_j)}$
:
1. $I(V, U) \ge 0$ , ,
V U .
2. $I(V, U) = I(U, V)$ , . . U V , V U .
$I(V, U) = H(U) - H_V(U)$ .
3. $I(V, U) \le H(V)$, $I(V, U) \le H(U)$ , ,
U V .
4. $I(V, V) = H(V)$ ,
V .
8.
,
.
, (complex
networks) [116], ,
, , , , , . . ,
.
,
, , ,
.
, ( ),
, .
1954 . (J. Barnes)
.
XX , , , .
, (Social Network
Analysis, SNA). ,
, , ,
, WWW.
, , [87].
8.1.
: , ;
;
.
, , , . .
; ; .
8.1.1.
:
, ;
,
;
;
;
(eccentricity) ( )
;
(betweenness), , ;
.
8.1.2.
, :
, , , ,
, ,
,
. .
,
:
. , , ;
( ), ;
. ,
;
(
).
8.1.3.
P ( k ), , i
ki = k . , P( k ) , . P( k )
( $P(k) = e^{-m} m^{k} / k!$, m
),
( $P(k) = e^{-k/m}$ )
( $P(k) \sim 1/k^{\gamma}$, $k \neq 0$, $\gamma > 0$ ).
(scale-free).
.
,
.
8.1.4.
,
,
. , . dij . ,
:
$l = \dfrac{2}{n(n+1)} \sum_{i \ge j} d_{ij},$
n , dij i
j.
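The average path length can be computed with a breadth-first search from every node. The sketch below is one possible implementation under stated assumptions: the graph is undirected, unweighted, and given as a dictionary of neighbour sets; the average is taken over distinct connected pairs, which differs slightly from the normalisation n(n + 1)/2 used in the formula above (that one also counts the zero distances d_ii).

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest path lengths (in edges) from source to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def average_path_length(graph):
    """Mean shortest path length over all connected pairs of distinct nodes."""
    nodes = list(graph)
    total = 0
    pairs = 0
    for index, source in enumerate(nodes):
        distances = bfs_distances(graph, source)
        for target in nodes[index + 1:]:
            if target in distances:
                total += distances[target]
                pairs += 1
    return total / pairs


path_graph = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
mean_distance = average_path_length(path_graph)
```

For the four-node chain the six pairwise distances are 1, 2, 3, 1, 2, 1, so the mean is 10/6.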
. (P. Erdős) . (A. Rényi) ,
[88, 89].
(19131996)
. ,
, , . , , ,
500. , , : ;
; , ,
; .
,
. , 90 % 8, , .
, . . , . ,
. , :
$il = \dfrac{2}{n(n-1)} \sum_{i > j} \dfrac{1}{d_{ij}}.$
,
dij .
8.1.5.
. (D. Watts) . (S. Strogatz) 1998
, [147], .
,
(clique). , ,
.
. k ,
k , . ,
, $\frac{1}{2} k(k-1)$. ,
, . ,
(,
) i C (i ). , .
. .
,
. , . .
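The clustering coefficient of a node compares the number of links that actually exist between its k neighbours with the maximum possible number k(k - 1)/2. A minimal sketch, assuming an undirected graph stored as a dictionary of neighbour sets; the function name and the convention of returning 0 for nodes with fewer than two neighbours are assumptions.

```python
from itertools import combinations


def clustering_coefficient(graph, node):
    """Fraction of possible links between the neighbours of node that exist."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    existing = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return existing / (k * (k - 1) / 2)


triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
tailed = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
full = clustering_coefficient(triangle, 1)
partial = clustering_coefficient(tailed, 1)
```

Averaging this value over all nodes gives a network-level clustering coefficient.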
8.1.6.
(betweenness) , , .
. . bm m :
$b_m = \sum_{i \ne j} \dfrac{B(i, m, j)}{B(i, j)},$
B(i, j ) i j ,
B(i, m, j ) i j , m .
8.1.7.
.
, . . .
, . , .
. (Réka Albert) ,
- ,
WWW 326 000 [65].
,
, ( ). . , ,
.
8.1.8.
,
, ,
, . . , ,
. . , .
8.2.
. . , , .
, . , , ,
, ,
[77].
4,6
, 20 %
. , ,
.
4,6 7
, . . ,
18 . ,
.
, ( 18 ) . , . ,
(. 33). ,
, . ,
-,
.
. 33. : ) ;
) , ;
) , :
8.3.
,
( WWW, )
. 1967 . .
, , ,
[113].
. . ,
, (Small Worlds) [146].
, . . 34:
, , ( ) ,
.
. 35 . .
(
).
. 34. -
, , ,
.
WWW ,
. , (S. Zhou)
. . (R. J. Mondragon) , ,
, ,
, .
(rich-club phenomenon). , 27 %
5 % , 60 % 95 %
5 % 13 % , 5 %.
. 35.
( 0 )
, WWW
, , . .
. , , . ,
, .
, ,
, ,
, , . , , (entangled
networks). , . , , , ,
.
8.4. WWW
8.4.1. WWW
, WWW,
, , . , , , ,
, . WWW , ,
SNA. ,
,
. , -.
1999 . . (A. Broder) IBM
AltaVista, IBM Compaq - [83], - (Bow Tie, . 36).
AltaVista 200
- , .
- :
(28 % -)
(Strongly Connected Component, SCC), , , ,
;
22 % - - (IN).
, ,
;
22 % - (OUT), , ;
22 % - : , , , , .
, 90 % -,
.
, .
. ,
- .
,
,
-. , ,
- Bow Tie . , , -,
[78].
Bow Tie, , , . ,
( ) ( edu 325729) , . . ,
i , 1 i k (
k 2.1 , k 2.45 ). , , WWW
, 11
, 0.15 (
0.0002).
-
, - ,
, , 16. , .
, ,
- .
. ,
, IBM
(N. Lamour) [32].
- , ,
, , .
-,
, . , , - .
- , -
AltaVista, -,
-, .
.
.
. (L. Björneborn)
, . , . , .
,
WWW.
8.4.2.
-, , ,
. .
-,
, ,
,
( ) , . , . -, -,
:
;
, . . -, ;
,
,
- (
, );
, ;
-, , - .
-
InfoStream [31],
-. 2500 , , :
Reuters
-
BBC
-
1051
983
882
787
773
675
662
631
623
598
595
, -,
,
Salsa [108].
log( N OUT + 1), log( N IN + 1) , N OUT , N IN
(. 37).
(. 38).
. 37.
. 38.
-, .
.
, -, .
8.5.
. , ,
.
:
;
;
.
, , TouchGraph Amazon ,
( , , ).
. TouchGraph, , Livejournal
TouchGraph LiveJournal Browser.
WWW TouchGraph Google Browser
(http://www.touchgraph.com/TGGoogleBrowser.html) , , . Google Browser Java-, -,
Google. ,
( Google) , . TouchGraph Google Browser , (. 39).
. 39. - ( TouchGraph)
9.
- . .
, : , ?
.
?
9.1.
, , () . .
( . percolation , ) 1957 . . .
(S. R. Broadbent) . . (J. M. Hammersley) [81]. ( ,
, , ...), ,
. , , ,
[132].
. . (19202004)
,
[48].
.
, ( p ) , , . pc ,
.
, .
p = 0 .
, p = pc
, ,
, . ,
, ,
.
.
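The appearance of a connecting cluster near the threshold pc can be observed in a small Monte Carlo experiment. The sketch below is an illustration under assumptions not fixed by the text: site percolation on a square lattice, occupation probability p per site, nearest-neighbour connectivity, and "percolation" defined as an occupied path from the top row to the bottom row. All function names are hypothetical.

```python
import random
from collections import deque


def spans(grid):
    """True if occupied sites connect the top row to the bottom row."""
    size = len(grid)
    queue = deque((0, col) for col in range(size) if grid[0][col])
    visited = set(queue)
    while queue:
        row, col = queue.popleft()
        if row == size - 1:
            return True
        for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = row + d_row, col + d_col
            if 0 <= r < size and 0 <= c < size and grid[r][c] and (r, c) not in visited:
                visited.add((r, c))
                queue.append((r, c))
    return False


def spanning_probability(size, p, trials, seed=0):
    """Fraction of random lattices of the given size that percolate."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        grid = [[rng.random() < p for _ in range(size)] for _ in range(size)]
        hits += spans(grid)
    return hits / trials


always = spanning_probability(10, 1.0, 5)
never = spanning_probability(10, 0.0, 5)
```

Scanning p between 0 and 1 for increasing lattice sizes shows the spanning probability sharpening into a step; for site percolation on the square lattice the step sits near p of about 0.59.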
, . . , , ,
. , , , ,
pc ,
. 39.
R ~ 1/ G . . 40, ,
p
.
. 40.
( )
, , , , . , . , pc , ,
. pc
.
, , ,
, ;
;
. .
,
,
( ), (, ) , , , .
9.2.
,
, , ( ).
:
P ( p ) ,
,
, ;
$S(p) = \dfrac{\sum_{s} s^{2} n_s(p)}{\sum_{s} s\, n_s(p)}$ ( . mean cluster size), $n_s$ s ,
,
;
( p ) ,
. G (r , p pc ) G ( ri rj , p ) = g (ri , rj ) ,
g (ri , rj ) i j ,
, , ,
. r G ( r, p ) , ,
$G(r, p) \propto \exp(-r / \xi(p))$ .
, p pc ( p )
.
( p pc ), ,
, , .
, . ,
, , ,
$d_f(d = 2) = d - \beta/\nu \approx 1.896$, $d_f(d = 3) \approx 2.54$.
(. 41)
( . backbone) , , ( . dead
ends), .
,
( , 1/ h , 0 < h << 1 ).
h
II , . .
. 41. : ) ;
) ( )
, ,
, = (T Tc ) / Tc , T , Tc
, . , Tc ,
( ). Tc (. 41):
~ h | | , > 0, (T > Tc ), > 0;
~ | | , < 0, (T < Tc ), > 0,
, , h
( h << 1). -
. 42.
h ,
Tc , , (h << h ) . ,
( h >> h ) .
h
, :
1
= h f ( / h ),
= (T Tc ) / Tc .
1
f ( z ) z = / h
:
z ,
f ( z ) ~ const ,
| z | ,
z +,
z 0,
z .
II .
e , .
e ,
= ( p pc ) / pc , pc .
e
:
t
e = h f ( / h ),
z t ,
z +,
f ( z ) ~ const ,
z 0,
| z |q ,
z .
t q , = t + q.
, p
T p
(. 43).
. 43.
, e
, . . | |<< 1.
9.3.
. ,
, , .
.
- f ( ) = p (1 ) + (1 p ) (2 ) .
(. 44). ,
min / max << 1 . .
. 44.
( r ~ 1/ ) ( , ):
r = r0 e x , >> 1 ,
r , x (0,1) ,
f ( x ) . , x rmin = r0 exp ( )
rmax = r0 .
>> 1 : rmax >> rmin max >> min . , ,
r = r0 e x -.
e
-
, , . , , ,
. , ,
.
, , , ( rmin ). .
rc = r0 exp ( xc )
( r < rc ) .
, , , . , , , rc .
rc : :
1
f ( x ) dx = pc
xc
,
f ( x ) = 1,
x (0,1) xc = 1 pc . , :
(1 pc )
rc = r0 e xc = r0 e
e = 0 e x = 0 e (1 p ) .
c
, ( ) , , , ( ), . ,
.
. . (
) . , , , .
9.4.
( ) , ,
WWW . . 45
( . fully directed) . .
.
. , , , . 46
OX . OX , || , ,
.
. 45.
-
, , . OX
; , .
, , , . 47. p+ p
OX ( , ), p
. . .
: q = 1 p p+ p .
. 46.
. 48. , , p , , q , A
. . q p , q p
pc - .
q p+ fully directed . , q p+ q p . p+ p ,
fully directed , .
. 47.
. 48.
9.5.
, , . .
(giant connected component).
() N , pc 1/ N , . .
k 1 .
. (M. Newman) . [117] . .
[146] , shortcuts,
, ,
.
( ):
( kd )
1/ d
(shortcuts), N
, N , k , d .
(shortcuts).
l (, )
:
l=
f (z
1) const ,
1/ ( k )
1/ d
N N
f
,
k
f (z
1)
log z
.
z
z = N / 1 , N 1/ d 1, . .
(shortcuts) . :
l=
N N
f
N,
k
. . , .
z = N / 1 , . . N 1/ d 1, :
l=
N N
1/ d
f log N ( kd ) log N ,
, , .
.
. ,
q = 1 p , . . (, ,
)? qc = 1 pc , (giant connected component), . . ( )
? [117] , pc ( Npc
) (shortcuts) (. 48)
(1 pc )
.
=
k
2kpc 1 + kpc (1 pc )
. 49 pc
, . ,
, ,
.
,
P ( k ) k ,
. pc
. , , > 3 :
qc = 1 pc = 1
1
,
2
k 1
3 0
k0 k K 0 .
. 49. pc
: , .
k0 = 0 qc qc = 4 .
, () 4 , . . , .
,
[85].
9.6.
, [85],
, . , .
, (backbone) . ,
. , . .
, , . ,
.
, (, ) q .
, ,
, ,
, . () -
. ,
DDoS- [136].
1 % .
( . directed) .
()
,
.
- () WWW , HTTP
, , -,
. , Google, - , . -
(, -) , -
- ( -). , -, -, ,
, .
-, [141].
10.
,
[10]. , .
.
10.1.
, , , , ,
( ), , . . t , ,
:
$y(t) = y(t_0) + v(t - t_0),$
t0 , y (t ) t , v () .
(t ) , :
$\sigma(t_n) = \sqrt{\dfrac{1}{n} \sum_{i=0}^{n} \left( y(t_i) - \left( y(t_0) + v(t_i - t_0) \right) \right)^{2}}.$
,
. .
, : (t ) t ( 1 2 1),
,
.
10.2.
( )
, :
$y(t) = y(t_0) e^{\lambda (t - t_0)},$
.
,
t0 , ..., tn , . :
$y(t_i) = y(t_0) e^{\lambda (t_i - t_0)} = y(t_0) e^{\lambda (t_i - t_{i-1} + t_{i-1} - t_0)} = y(t_{i-1}) e^{\lambda (t_i - t_{i-1})}.$
:
$\dfrac{y(t_i)}{y(t_{i-1})} = e^{\lambda (t_i - t_{i-1})}.$
: (ti )
ti :
(ti ) = (ti ti 1 )
:
(ti ) = ln
y (ti )
.
y (ti 1 )
ti :
(ti ) = ln
y (ti )
y (ti ) y (ti 1 )
.
y (ti 1 )
y (ti 1 )
(ti ) :
( tn ) =
1
n
( (t ) )
i =0
, (t )
,
, . : (t ) t , , 1.
, ,
. , .
10.3.
[5, 6, 11]
, y (t )
:
$\dfrac{dy(t)}{dt} = k y(t),$
k .
, y (t ) , [4, 5]. :
N ry (t ),
N , y (t ) ,
r , .
, ,
( ) :
$\dfrac{dy(t)}{dt} = k y(t)(N - r y(t)), \quad y(t_0) = y_0.$
, , .
,
t = 0 n0 . , (
) , : 0 < t D > 0 t > D = 0 (
D = const ) , , u(t ) v (t ), :
$y(t) = \begin{cases} u(t), & 0 < t < \tau, \\ v(t), & t > \tau, \end{cases} \qquad u(t) = v(t), \; t = \tau.$
( D > 0 ) , , .
: .
.
,
N,
u (t ) :
$\dfrac{du(t)}{dt} = p u(t)(1 - q u(t)) + D u(t), \quad u(0) = n_0.$
, p
.
(
).
D .
q , u (t ) D = 0.
, v(t ) , , :
$\dfrac{dv(t)}{dt} = p v(t)(1 - q v(t)).$
u (t ) v(t )
t = :
v( ) = u ( ).
:
$y' = a y^{2} + b y,$
$z = 1/y$ :
$z' + b z + a = 0.$
:
$z = \dfrac{1}{\varphi(x)} \left[ C - a \int \varphi(x)\, dx \right]$
:
$\varphi(x) = e^{bx}.$
C : , . :
u (t ) =
us
u
1 + ( s 1)exp[( p + D )(t )]
n0
us u , :
$u_s = \dfrac{p + D}{pq}.$
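The saturation level us = (p + D)/(pq) can be checked numerically by integrating the growth equation du/dt = pu(1 - qu) + Du from u(0) = n0. The sketch below uses a plain explicit Euler scheme; the step size, the parameter values, and the function name are illustrative assumptions.

```python
def simulate_growth(p, q, D, n0, t_end, dt=0.001):
    """Explicit Euler integration of du/dt = p*u*(1 - q*u) + D*u with u(0) = n0."""
    u = n0
    for _ in range(int(round(t_end / dt))):
        u += dt * (p * u * (1.0 - q * u) + D * u)
    return u


p, q, D, n0 = 1.0, 0.5, 0.5, 0.01
saturation = (p + D) / (p * q)          # closed-form level u_s
final_value = simulate_growth(p, q, D, n0, t_end=30.0)
```

Starting far below the saturation level, the numerical solution follows an S-shaped curve and settles at us, in line with the behaviour shown in Fig. 50.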
, , , S- () , . 50.
. 50. u(t )
, n0 ,
.
,
, , .
, . 50 :
tinf =
1
u
ln( s 1) + .
p + D n0
, S-
, t ~ tinf u(t ) .
u(t )
:
u (t ) =
us exp[( p + D )t ]
,
u
exp[( p + D )t ] + ( s 1) exp[( p + D ) ]
n0
t<
1
u
ln( s 1) + = tinf
p + D n0
u(t ) , . . t,
tinf , .
, , (. 51):
$v(t) = \dfrac{v(\tau)}{q v(\tau) + (1 - q v(\tau)) \exp[-p(t - \tau)]},$
:
v( ) = u ( ).
u (t )
t < , , :
$v(t) = \dfrac{v_s (p + D)}{p + D(1 - \exp[-p(t - \tau)])},$
vs = 1/ q v (t ) .
. 51. v (t )
, vs , u (t ) .
, us ( t )
vs , . , , ,
,
, .
, .
y (t ) . 52.
, ,
, ,
.
.
, ,
. , , -, , ,
, . .
.
. 52.
:
$\int_{0}^{T} \sum_{i=1}^{N_m} m_i(t)\, dt = MT,$
mi (t ) i - , N m , M , .
,
. ,
-
,
.
.
:
$\dfrac{dm_i(t)}{dt} = p_i m_i(t) - \sum_{j=1}^{N_m} r_{ij} m_i(t) m_j(t),$
N m .
, . () , .
-, , pi
rij ,
.
:
dmi (t )
= ( pi + Di (t , i ) ) mi (t ) rij mi (t ) m j (t ) .
dt
j
pi Di ,
, i , Di
.
10.4.
, .
, , ,
, .
, , ,
(). , , , .
, ,
.
. (J. Von Neumann) [41] . (S. Wolfram)
[149].
. (19031957) .
[53]. , ,
,
, .
,
.
.
, (), . : , .
.
, , , .
, , . ,
, ,
.
, , . () . , .
j
. j - t + 1 , , :
$y_j(t + 1) = F(y_j, O(j), t),$
F , , ,
. ,
, . . y j O ( j ) , : y j (t + 1) = F (O ( j ), t ) . -
:
(
);
, . . ;
;
.
,
.
, , ,
. . , .
, ,
, yi , j ,
,
:
$y_{i-1,j}, y_{i,j-1}, y_{i,j}, y_{i,j+1}, y_{i+1,j}$),
( (E. F. Moore):
$y_{i-1,j-1}, y_{i-1,j}, y_{i-1,j+1}, y_{i,j-1}, y_{i,j}, y_{i,j+1}, y_{i+1,j-1}, y_{i+1,j}, y_{i+1,j+1}$).
.
(. 53),
. . ,
. .
. 53. . : wikimedia.org
t + 1 t :
$y_{i,j}(t + 1) = F\left( y_{i-1,j-1}(t), y_{i-1,j}(t), y_{i-1,j+1}(t), y_{i,j-1}(t), y_{i,j}(t), y_{i,j+1}(t), y_{i+1,j-1}(t), y_{i+1,j}(t), y_{i+1,j+1}(t) \right).$
. , ,
, . , . , .
(J. Conway)
M. (M. Gardner) [12].
[76]. ,
,
[43].
,
t + 1 . t (
).
20 000 . , , .
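A two-state automaton with the nine-cell neighbourhood can be written in a few lines. The sketch below uses the standard rule of Conway's game "Life" as the function F: a dead cell with exactly three live neighbours becomes alive, and a live cell survives with two or three live neighbours. The set-based representation of an unbounded grid and the function name are assumptions made for brevity.

```python
def life_step(alive):
    """One synchronous update of the Game of Life.

    alive is a set of (row, column) coordinates of live cells.
    """
    neighbour_counts = {}
    for row, col in alive:
        for d_row in (-1, 0, 1):
            for d_col in (-1, 0, 1):
                if d_row or d_col:
                    cell = (row + d_row, col + d_col)
                    neighbour_counts[cell] = neighbour_counts.get(cell, 0) + 1
    return {
        cell
        for cell, count in neighbour_counts.items()
        if count == 3 or (count == 2 and cell in alive)
    }


blinker = {(1, 0), (1, 1), (1, 2)}
after_one_step = life_step(blinker)
after_two_steps = life_step(after_one_step)
```

The three-cell "blinker" flips between a horizontal and a vertical bar, which makes it a convenient check that the update is applied to all cells simultaneously.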
[76].
: , , . : 1
; 0 . , ,
, ( 1, ).
, ,
. . ( )
m , p (
) pm > R, ( R ), ( 1).
, , .
,
. , , , ,
, :
1 ( ); 2 , , ( ); 3
, ( , ). :
, , (. 54 );
( );
, ,
: pm > 1 ( m 2 : 1.5 pm > 1 );
, ,
( ,
);
, ,
(
).
.
40 40 (
). ,
, . . .
. 54.
,
http://edu.infostream.ua/newsk.pl ,
80 150 .
. 54.
: ;
;
,
. 55.
: 1 ,
, 2 ,
: 3:1:0; .
. 55.
: (); (); ()
[107].
, :
x g = f ( t , g , g ) ,
t ( ), g ,
, g .
, xw ( t ) x g :
x w = 1 f ( t , w , w ) .
, , , . . , :
x g + xw + xb = 1,
xw t .
, :
xb = 1 xg xw = f ( t , w , w ) f ( t , g , g ) .
. 55 ,
f ( t , , ) ( ):
f ( t , , ) =
C
,
1 + e ( t )
C .
. 56 xg , xw , xb , .
. 56. ,
, :
( xg ); ( xw ); ( xb )
, ,
, - (), .
10.5.
, , ,
. , ,
, .
, ,
,
. - , .
, ,
.
(self-organizing), , . (W. R. Ashby) 1947 , . (N. Wiener),
. (G. Forster) .
. (Per Bak) [7, 69]. 19871988 . ,
. (C. Tang) . (K. Wiesenfeld) [70, 71]
,
,
.
, , , ,
.
(19482002)
, , , . , ,
,
. -
, , , ,
.
,
, , .
, , . , . , ,
. . - [69], . . (H. Yager)
, ,
. (J. Feder) . (T. Joessang)
, .
,
,
, , .
,
, . , ,
(. 57).
. 57.
,
, .
, . h( x ) x ( x = 1, 2, ..., N ) .
(. 58 ) h = h ( x ) z ( x ) = h ( x ) h ( x + 1) , . 58 .
. 58 .
1: x
z ( x ) > zc ,
. zc = 2
:
$z(x) \to z(x) - 2,$
$z(x \pm 1) \to z(x \pm 1) + 1, \quad z(x) > 2$
( 1)
, h ( x ) x
, 2 , ( ) .
:
$z(1) = 0,$
$z(N) \to z(N) - 1,$
$z(N - 1) \to z(N - 1) + 1, \quad z(N) > 2$
( 1)
1 ,
, ,
.
. 58 h ( x )
z ( x ) , . , , x = 6
z ( 6 ) = 3 > 2 .
, x = 5 x = 7
. 58 .
1,
, , , x t + 1 /
t . 1
, . 58.
, , 1 1 z ( x ) 2 ,
z ( x ) = 2 x . ,
,
, z ( x ) < 2 . , . , .
. 58. . 6
11
.
( 2 2). 1 x y , zc 3:
$z(x, y) \to z(x, y) - 4,$
$z(x, y \pm 1) \to z(x, y \pm 1) + 1,$
$z(x \pm 1, y) \to z(x \pm 1, y) + 1, \quad z(x, y) > 3$
( 2)
$z(0, y) = z(x, 0) = z(N + 1, y) = z(x, N + 1) = 0$
( 2)
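The two-dimensional toppling rule can be turned into a short simulation. This is a sketch under stated assumptions: the lattice is a list of lists indexed from 0, grains that leave the lattice are lost (which plays the role of the zero boundary condition), and the function names `relax` and `add_grain` are hypothetical.

```python
def relax(grid, threshold=3):
    """Topple every cell with z > threshold until the lattice is stable.

    Returns the number of topplings, i.e. the avalanche size.
    """
    size = len(grid)
    topplings = 0
    unstable = [(x, y) for x in range(size) for y in range(size) if grid[x][y] > threshold]
    while unstable:
        x, y = unstable.pop()
        if grid[x][y] <= threshold:
            continue
        grid[x][y] -= 4                      # z(x, y) -> z(x, y) - 4
        topplings += 1
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < size and 0 <= ny < size:
                grid[nx][ny] += 1            # each neighbour gains one grain
                if grid[nx][ny] > threshold:
                    unstable.append((nx, ny))
        if grid[x][y] > threshold:
            unstable.append((x, y))
    return topplings


def add_grain(grid, x, y):
    """Drop one grain at (x, y) and return the size of the resulting avalanche."""
    grid[x][y] += 1
    return relax(grid)


lattice = [[0, 0, 0], [0, 3, 0], [0, 0, 0]]
avalanche = add_grain(lattice, 1, 1)
```

Dropping grains at random sites and recording the returned avalanche sizes is one way to build the distribution D(s) discussed in this section.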
2 . . 2 ,
, , . , , ,
,
.
.
z ( x, y ) , 1D, z ( x, y )
x y .
, z ( x, y ) 4 ,
. ,
, , ,
.
.
z ( x, y ) 4 , z ( x, y ) = 0 2 , . , .
,
.
, ,
.
z ( x, y ) 4 , .
. 59 500 500,
z ( x, y ) 0 3.
. 59.
( z ( x, y ) = 3 ), 2 2,
z ( x, y ) = 4 , . . , , . x , y , , , . . 60
, . , . 59. ,
.
D ( s )
. , ,
z ( x, y ) = 3 z ( x, y ) = 4 ,
s ,
, z ( x, y ) 4 . . 61 a
D ( s ) , 500 500.
:
D ( s ) s , 2 D 1,1.
(4354, 100578)
(696, 25340)
. 60. ,
, D(t )
, . s
, t ,
.
:
D(t ) t , 2 D 0.54.
, ,
, .
,
, . ,
(. . 11.3),
.
.
Fig. 61. Log-log plots of the distributions D(s) and D(t); fitted exponents -1.085 ± 0.015 for D(s) and 0.54 ± 0.02 for D(t).
11.
,
.
11.1.
(B. Mandelbrot)
1975
. , , : , , - [37]. ,
, , .
, , , .
.
. (J. Hutchinson) 1981 . , ,
, . . , .
, . , , , , .
,
d , , ,
1 ( d = 1 ),
d = 2 , . . . , . (F. Hausdorff) . . .
, ,
, , . .
, . , , , .
(18681942)
G d .
d ,
, . . i < .
d
:
ld ( ) = id .
i
Ld ( ) = inf
i,
i<
d
i
, d , , :
lim Ld ( ) 0.
0
d :
lim Ld ( ) .
0
, d x , :
0, d > d x ,
lim Ld ( ) =
0
, d < d x ,
(
). ( d x =1,
d x = 2, d x = 3 . .)
, ,
. d c . . , , :
Ld ( ) = N ( ) dc ,
N ( ) , G .
( 0 ) :
$d_c = -\lim_{\varepsilon \to 0} \dfrac{\log N(\varepsilon)}{\log \varepsilon},$
. ,
d c , , : d x d c . , [13].
.
N ( ) , N ( ') ' . :
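$N(\varepsilon) \sim 1/\varepsilon^{d_c}, \quad N(\varepsilon') \sim 1/\varepsilon'^{d_c},$
$d_c$ :
$d_c = -\dfrac{\log\left( N(\varepsilon) / N(\varepsilon') \right)}{\log\left( \varepsilon / \varepsilon' \right)}.$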
11.2.
, .
,
(H. Von Koch) (. 62), ,
1. . , ,
1/3 ,
1/3
.
. , n-
n, ,
n, , .
, . , 4 , 3 , . . n- n
3 ( 4 3 ) n .
. 62. 5
, , :
$S = 1 + \dfrac{1}{3} \sum_{k=0}^{\infty} \left( \dfrac{4}{9} \right)^{k} = 1.6.$
.
. , .
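: $\varepsilon = 1$, $N(\varepsilon) = 3$. : $\varepsilon' = 1/3$, $N(\varepsilon') = 12$. :
$d_c = -\dfrac{\log\left( N(\varepsilon) / N(\varepsilon') \right)}{\log\left( \varepsilon / \varepsilon' \right)} = -\dfrac{\log(3/12)}{\log 3} = \dfrac{\log 4}{\log 3} \approx 1.26.$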
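The two-scale estimate of the dimension is a one-line computation. The sketch below reproduces the Koch snowflake calculation; the function name `dimension_from_scales` is an assumption, and the second example (three covering elements at scale 1/2, nine at scale 1/4) is the standard self-similar count for the Sierpinski triangle.

```python
import math


def dimension_from_scales(eps_1, count_1, eps_2, count_2):
    """d_c = -log(N(eps_1) / N(eps_2)) / log(eps_1 / eps_2)."""
    return -math.log(count_1 / count_2) / math.log(eps_1 / eps_2)


koch = dimension_from_scales(1.0, 3, 1.0 / 3.0, 12)
sierpinski = dimension_from_scales(0.5, 3, 0.25, 9)
```

Because only the ratios of the scales and of the counts enter the formula, any two consecutive construction steps of a strictly self-similar figure give the same value.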
, 1915 . .
(W. Sierpiński), . , , (. 63). .
. ,
,
. , , .
, :
$d_c = \dfrac{\log 3}{\log 2} \approx 1.585.$
. 63.
, .
, .
,
. . ,
:
2
3
1
1 3 1 3 3 1
1 3 3 3
1
S = + + + = 1 + + + + =
= 1.
4 4 4 4 4 4
4 4 4 4
4
1
3/
4
, .
(. 64)
:
$Z_{i+1} = Z_i^{2} + C, \quad i = 0, 1, 2, \ldots$
Z i +1 , Z i C .
C
. , Z i , (0, 0),
.
, Z i , C . Z i
, .
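Whether a point C belongs to the Mandelbrot set is usually tested by iterating the recurrence from Z = 0 and watching for escape. The sketch below is an approximation by construction: the iteration limit of 100 and the function name are assumptions, while the escape radius 2 is the standard bound beyond which the sequence is known to diverge.

```python
def escape_time(c, max_iter=100, radius=2.0):
    """Number of iterations of Z -> Z*Z + C (from Z = 0) before |Z| exceeds radius.

    Returns max_iter if the orbit stays bounded for the whole run.
    """
    z = 0j
    for iteration in range(max_iter):
        if abs(z) > radius:
            return iteration
        z = z * z + c
    return max_iter


inside = escape_time(0j)
periodic = escape_time(-1 + 0j)
outside = escape_time(1 + 1j)
```

Colouring each point of the complex plane by its escape time produces the familiar pictures of the set and its surroundings.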
, . ,
, ,
.
. 64.
X ' = AX + BY + C ,
Y ' = DX + EY + F ,
X , Y , X ', Y ' ,
A, B , C , D, E F .
IFS
, , Java-,
http://www.fractals.nsu.ru/fractals.chat.ru/ifs2.htm (. 65).
IFS , ,
( , ).
80- . (M. Barnsley) . (A. Sloane) ,
.
, 500
1000 . .
. 65. : ) ;
) ; )
, ,
, ,
. ,
.
. , , .
, L l ,
,
L = l 1 , = const. , , (. 66) 1.24 .
11.3.
, , , . ,
, , .
, . ,
, , .
-
. [131] [31].
, WWW . IBM Altavista [82].
. 66.
(http://maps.google.com)
. , ,
, (, , , , )
[22, 23].
, [145, 20]. , - ,
, . , ,
,
(,
).
, ?
, , , . -
.
-, , . , ,
, .
, -
News Is Free (http://newsisfree.com).
.
(. 67).
-
,
, .
. [2023],
, ,
().
. 67. (http://newsisfree.com)
. 67.
(http://newsisfree.com)
, ,
, , , , . , ,
, .
, ,
, . .
,
,
[20]:
N publ ( t ) = N k (t ) ,
N publ ( ); N k ( ); ; .
.
11.4.
(, ,
, . 67).
.
, , ,
, ,
, . .
,
. . , , -
- -
InfoStream.
. .
, 14 069 ,
1 2006 . 31 2007 ., , :
OR OR
( AND ( OR OR Windows OR Linux)).
(. 68).
, , , .
11.4.1. DFA
DFA (Detrended Fluctuation Analysis) [121] .
. 68.
( ) ( )
DFA , .
.
( F
Fn , n = 1, ..., N ) y (k ) :
$y(k) = \sum_{i=1}^{k} \left( F_i - \langle F \rangle \right)$
y (k ) , k = 1, ..., N n, ,
y (k ) .
yn (k ) ( yn (k ) = ak + b )
.
D(n) n:
$D(n) = \sqrt{\dfrac{1}{N} \sum_{k=1}^{N} \left[ y(k) - y_n(k) \right]^{2}}.$
, D (n ) D (n) ~ n ,
. .
ln D ~ ln n , .
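The DFA steps described above (cumulative profile, splitting into windows of length n, a linear fit y_n(k) = ak + b in each window, and the root-mean-square deviation D(n)) can be sketched as follows. The use of non-overlapping windows, the dropping of an incomplete last window, and the function name are assumptions; the window length must be at least 2 and no longer than the series.

```python
import math


def dfa_fluctuation(series, window):
    """Root-mean-square deviation D(n) of the profile from local linear trends."""
    mean = sum(series) / len(series)
    profile = []
    running = 0.0
    for value in series:
        running += value - mean          # y(k) = sum of (F_i - mean)
        profile.append(running)

    squared_error = 0.0
    points = 0
    mean_k = (window - 1) / 2
    var_k = sum((k - mean_k) ** 2 for k in range(window))
    for start in range(0, len(profile) - window + 1, window):
        segment = profile[start:start + window]
        mean_y = sum(segment) / window
        slope = sum((k - mean_k) * (y - mean_y) for k, y in enumerate(segment)) / var_k
        intercept = mean_y - slope * mean_k
        squared_error += sum(
            (y - (slope * k + intercept)) ** 2 for k, y in enumerate(segment)
        )
        points += window
    return math.sqrt(squared_error / points)


flat = dfa_fluctuation([5.0] * 20, 5)
rough = dfa_fluctuation([1.0, -1.0] * 50, 4)
```

Computing `dfa_fluctuation` for a range of window lengths and fitting a straight line to ln D against ln n gives the scaling exponent.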
. 69, D (n ) n , . . .
11.4.2.
X t ( , , ,
t , t = 1, ..., N ), :
$F(k) = \dfrac{1}{N - k} \sum_{t=1}^{N-k} (X_{k+t} - m)(X_t - m),$
m , ,
, 0 ( -
t t m). , X
.
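The correlation function F(k) is easy to compute directly from its definition. The sketch below also normalises by the value at lag zero, which equals the variance of the series; the function names are assumptions and the lag must be smaller than the series length.

```python
def autocovariance(series, lag):
    """F(k) = 1/(N - k) * sum over t of (X[t + k] - m) * (X[t] - m)."""
    n = len(series)
    mean = sum(series) / n
    return sum(
        (series[t + lag] - mean) * (series[t] - mean) for t in range(n - lag)
    ) / (n - lag)


def autocorrelation(series, lag):
    """F(k) divided by F(0), so that the value at lag 0 is 1."""
    return autocovariance(series, lag) / autocovariance(series, 0)


alternating = [1.0, -1.0] * 10
lag_one = autocorrelation(alternating, 1)
lag_two = autocorrelation(alternating, 2)
```

For a strictly periodic series the normalised function oscillates between -1 and 1 with the same period, which is the effect used above to detect weekly rhythms in publication counts.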
. 69. D(n) ( )
n ( )
, ,
,
.
, . . :
$X_t = \dfrac{a_0}{2} + \sum_{n=1}^{\infty} a_n \cos(nt + \theta_n),$
:
$F(k) = \dfrac{a_0^{2}}{4} + \dfrac{1}{2} \sum_{n=1}^{\infty} a_n^{2} \cos nk.$
[59] , ,
, n .
X , N S :
X t = N t + St .
(
m = 0 ):
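$F(k) = \dfrac{1}{N - k} \sum_{t=1}^{N-k} X_{k+t} X_t = \dfrac{1}{N - k} \sum_{t=1}^{N-k} (N_{k+t} + S_{k+t})(N_t + S_t) =$
$= \dfrac{1}{N - k} \sum_{t=1}^{N-k} N_{k+t} N_t + \dfrac{1}{N - k} \sum_{t=1}^{N-k} S_{k+t} S_t + \dfrac{1}{N - k} \sum_{t=1}^{N-k} N_{k+t} S_t + \dfrac{1}{N - k} \sum_{t=1}^{N-k} S_{k+t} N_t.$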
, , . N S
, . , S . X .
, . , ( ), ,
.
, .
:
. 70.
: ( ).
, X
N :
$R(k) = \dfrac{F(k)}{\sigma^{2}},$
F (k ) ; 2 .
. 71 (
k, R(k).
, -
(. 72).
, , ,
(. 73).
. 71.
. 72. R ( k ) ( )
k ( )
. 73. R ( k ) ( ),
k ( )
11.4.3.
:
F ( k ) = 1 + Ck 2 H 1 ,
C H . . 73 F (k ) , C 6.8 H 0.65 .
. 74.
11.4.4.
(H. E. Hurst) H R / S , R
, S [102]. . . (18801978) ,
: R / S = ( N / 2) H . [58] ,
D :
$D = 2 - H.$
, , . : ,
, ; ,
, ,
, . . . . ,
,
, .
, , , . ,
, . .
, ( ). H > , , ,
. H < , , . H = .
F ( n ) ,
n = 1, ..., N , ,
, :
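$R/S = (N/2)^{H}, \quad N \gg 1.$
S :
$S = \sqrt{\dfrac{1}{N} \sum_{n=1}^{N} \left( F(n) - \langle F \rangle \right)^{2}}, \quad \langle F \rangle = \dfrac{1}{N} \sum_{n=1}^{N} F(n),$
R :
$R(N) = \max_{1 \le n \le N} X(n, N) - \min_{1 \le n \le N} X(n, N), \quad X(n, N) = \sum_{i=1}^{n} \left( F(i) - \langle F \rangle \right).$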
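The rescaled-range statistic and the Hurst exponent obtained from the relation R/S = (N/2)^H can be computed directly. The sketch below evaluates H from a single series length, which is a simplification; in practice H is usually estimated from the slope of log(R/S) against log N over several lengths. The function names are assumptions, and the series must not be constant.

```python
import math


def rescaled_range(series):
    """R/S: range of the cumulative deviations divided by the standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((value - mean) ** 2 for value in series) / n)
    cumulative = 0.0
    deviations = []
    for value in series:
        cumulative += value - mean       # X(n, N)
        deviations.append(cumulative)
    return (max(deviations) - min(deviations)) / std


def hurst_exponent(series):
    """Solve R/S = (N/2)**H for H using the whole series."""
    return math.log(rescaled_range(series)) / math.log(len(series) / 2)


anti_persistent = hurst_exponent([1.0, -1.0, 1.0, -1.0])
persistent = hurst_exponent([1.0, 1.0, -1.0, -1.0])
```

The alternating series gives a small range and therefore a low exponent, while the series with runs of equal sign gives a larger one, in line with the interpretation of H below and above 1/2.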
,
, , n H 0.65 0.75. , H -
, ( , ). , F (n ) ( ),
D ,
$D = 2 - H \approx 1.35 \ldots 1.25.$
,
-. , , . . , .
.
, , , .
.
11.5.
, , [58].
[8, 15, 24, 42] .
, (, ) . (Y. Kantor),
. , [ 0, 1] ,
100 % (. 75 ).
)
. 75. ( ):
; ;
;
(
) p -
, (1 p ) - (
), : 1 > p > (1 p ) > 0 . , [ 0, 1] -
0, 0.5) , [ 0.5, 1] (. 74 ).
, , . , , ( ) ,
, ( ) p -, ( )
(1 p ) -. , 2
p 2 - , (1 p ) -
(. 74 ).
(. 75 ) , .
, . 76.
. 76.
. ,
x * ,
x * = 1/ 5 , : 1/ 5 0.00110011...
( L ) ,
. 75, ( R ).
, LLRRLLRR...
x * = 1/ 5 , , , p , 1 p . ,
, , x * = 1/ 5 , 4
p 2 (1 p ) , n - :
nk
x * p k (1 p ) ,
n - k n k .
nk
n 1/ 2 n , p k (1 p )
Cnk = n !/ (n k )! k ! (). , ,
nk
. 75, p k (1 p ) ( k n k ).
,
nk
nk
k
p (1 p ) ( p k (1 p ) ) :
nk
Cnk p k (1 p ) .
n >> 1 , ,
Cnk :
k
k
Cnk
n
k n
k
1
n
=2
k
nH
n
k
H
n
1
= n
2
H ( ) = log 2 (1 ) log 2 (1 ),
=k /n.
H ( ) . 77.
k / n (, n ), D .
. 77. H ( )
, n 1/ 2 n ,
Cnk , :
k
log 2 Cnk
D = lim
= H .
n
n
n
log 2 (1/ 2 )
, H ( ) (. . 76),
.
k / n , ( ). , n
. , k = 0 k = n
. , , p n ,
, (1 p ) n .
. L L .
L L ,
.
f () ( ), . f ()
L L ,
( , q ).
,
Dq . , Dq :
N
1 ln p
,
D = lim
q 1 ln r
i =1
r 1
pi , (
) r .
,
Dq q :
( q) = (1 q) Dq .
f () ( q) :
( q) = f () q,
q :
d
( q f () ) = 0.
d
, Dq ( ( q) ),
:
f ( ( q)) = ( q) + q( q),
( q) =
d ( q )
.
dq
f () (
q )
q f .
.
Z i = i
pi
Dq .
0, N
n = N / m () m .
:
SmZ ( q) = ( Z k( m ) ) ,
n
k =1
(m)
k
= Z ( k 1) m+l .
l =1
, , log SmZ ( q)
log m ,
[128], . , ( q) :
log SmZ ( q) Z ( q) log m + const.
( - , 2007 . 2008 .),
, (
). . 78.
. 78.
( ) ( ):
,
. 79 (m, q) q m
. :
f (( q)) = ( q) q '( q),
(. 79).
f
( ) .
, . 80.
, , , .
(. 81),
. , , , , ,
, , , , .
. 79. (q, m) ( )
. 80. f ( q, m )
. 81.
( ) (*)
, , ,
, , . ,
, ,
,
(
) .
,
,
. 1965 , , ,
. , ,
,
[26].
. ,
, , , . ,
, , (. 82),
$y = A e^{kt}$ , y
, t , A
, k .
, , . ,
. , , ,
, .
,
, .
. 82. -
( Netcraft 2008 )
, , , , , , , ,
.
, ,
,
. , , , .
,
(XML). ,
-.
XML ,
XML-, : W3C, DTD, XML Schema, XQuery
( XML-) . . RDF.
. , -
. . 2004 . W3C
OWL (Web Ontology Language).
OWL , . -
,
, , .
.
, :
XML, ;
RDF, ;
OWL,
.
, (. 81),
Universal Resource Identifier (URI), , .
, URI, , . URI- URL-, URI- , .
, . , .
. 81.
, . ,
, , ,
.
,
.
ARPANET  Advanced Research Projects Agency Network
BFS      Breadth First Search
DBMS     Database Management System
DDoS     Distributed Denial of Service
DFA      Detrended Fluctuation Analysis
DNS      Domain Name System
DTD      Document Type Definition
HAC      Hierarchical Agglomerative Clustering
HITS     Hyperlink Induced Topic Search
HTML     HyperText Markup Language
HTTP     HyperText Transfer Protocol
IETF     Internet Engineering Task Force
IRS      Information Retrieval System
ISM      Intelligent Search Mechanism
LSA      Latent Semantic Analysis
OSI      Open Systems Interconnection Reference Model
OWL      Web Ontology Language
P2P      Peer-to-peer
PLSA     Probabilistic Latent Semantic Analysis
RBFS     Random Breadth First Search
RDF      Resource Description Framework
RFC      Request for Comments
RWA      Random Walkers Algorithm
Salsa    Stochastic Approach for Link-Structure Analysis
SNA      Social Network Analysis
SCC      Strongly Connected Component
SQL      Structured Query Language
SVD      Singular Value Decomposition
SVM      Support Vector Machine
TCP/IP   Transmission Control Protocol/Internet Protocol
TREC     Text Retrieval Conference
URI      Universal Resource Identifier
URL      Universal Resource Locator
W3C      World Wide Web Consortium
WWW      World-Wide Web
WAIS     Wide Area Information Service
XML      Extensible Markup Language
( . Summarization)
, .
, .
(. Weighting) ,
.
, , .
,
, , . -
, .
( . Hypertext) ,
( ).
. . , , . -, ,
HTML.
, (. Hyperlink) . ,
. , ,
, HTML-, , FTP WWW-.
( ) ,
( ,
). ,
.
(. IRS Index) -
, . - , .
, (Internet) ,
,
TCP/IP. .
(. Information space) , , , .
, - (. Information
Retrieval System, IRS) , , . - , . , -,
- -.
(. Keyword):
1. , .
2. , ,
.
( . Content) .
(, -) , , .
- .
, .
- ,
- .
( . cache) , , -
.
, ,
.
- ( . Latent Semantic Analysis,
LSA) -
. - , , ,
, . LSA -, .
( . Lemmatization)
, () . , . .
,
, .
, . , -
, -.
( . Relation )
, :
;
- : , , ;
: , ,
;
, .
, .
, . . , ,
, .
, , a
.
(. Search Engine)
- . , ( ),
.
, (. Recall) -
.
(. Full-text search engine)
- ,
( ) .
( . profile ) ()
, .
( . Ranking)
, , .
( . Relevancy ) , , - ,
. .
. , -
, .
, , :
;
. , ;
.
( . Semantic Web)
W3C,
, , , -
, WWW , ,
. XML.
, ,
.
( . Snippet , ) , -, , .
(SPAM) , ,
,
. .
( . Stemming) ,
. , , : , . .
- (. Stop words) ,
/ . . - ,
, .
(. DBMS)
, , .
, :
;
;
/ ;
.
( . tag):
1. , .
2. . , .
( . Text corpus) ,
, .
, - .
, ()
.
( . Term) .
.
, , ,
, , , . , . . . . 1, 2- . . .
( . Fractus , )
( ) (),
.
, , . :
.
; - .
ARPANET (Advanced Research Projects Agency Network, )
, . 1969
(Defense Department's Advanced Projects Research Agency).
ARPANET , . 1990 .
Data Mining ( ):
1. Data mining ,
(G. Piatetsky-Shapiro, GTE Labs)
2. Data mining (selecting),
(patterns)
(SAS Institute).
Deep Web (, , )
WWW- , . , , ,
. Deep Web
-,
.
DNS (Domain Name System) ( ), IP- TCP/IP.
DNS
IP .
HTML (HyperText Markup Language)
. HTML-
(), , , . HTML ,
. HTML
SGML.
HTTP (HyperText Transport Protocol)
, WWW.
- .
MARC , 1966 16
. 1972
-2 .
OSI (Open Systems Interconnection Reference Model) . .
.
OWL (Web Ontology Language) - XML/RDF. - OWL
- , .
.
P2P (Peer-to-peer) ,
. ,
(peer) , . -,
.
RDF (Resource Description Framework)
W3C .
, . , .
RFC (Request for Comments) , ,
. RFC 1969 . RFC
.
SQL (Structured Query Language)
,
.
TCP/IP (Transmission Control Protocol/Internet Protocol)
, ( ) . :
TCP (Transmission Control Protocol) , ;
IP (Internet Protocol) , , .
Text Mining . Text Mining ,
, , .
. Text Mining
.
W3C World Wide Web Consortium W3C , 1994 . CERN DARPA .
W3C -
[1] . ., . . //
, 06. 2003. URL: http://www.osp.ru/pcworld/2003/06/165855/
[2] . .
// 1996. URL: http://libconfs.narod.ru/1996/4s/4s_p1.html
[3] . . - // - . . 1. . 6. 2004. . 2027.
[4] . ., . ., . ., . .
// . . . : . . 4 / .
.:
,
2007.
. 440464.
URL:
http://window.edu.ru/window_catalog/ redir?id=45290&file=440-464.pdf
[5] . . : //
- . . 1. . 3. 2003. . 110.
[6] . . . .: ,
1971. 240 .
[7] ., . // , 1991.
3. . 1624.
[8] . ., . . . :
, 2001. 128 .
[9] . . 4. , , . .: , 2005. 216 .
[10] . ., . . :
// - . . 1.
. 11. 2005. . 2133. URL: http://dwl.visti.net/art/nti05/
[11] . . .: , 1976.
[12] . . .: , 1972.
[13] . ., . ., . .
. . . 2. : , 2007. 263 .
[14]
[15]
[16]
[17]
[18]
[19]
[20]
[21]
[22]
[23]
[24]
[25]
[26]
[27]
[28]
[29]
[30]
[31]
. . . . 2: , , // . 11. . 62.
2006. URL: http://www.ccc.ru/magazine/depot/06_11/read.html?0302.htm
. . . 2-. .: ,
, 2006. 208 .
. . // . 4. 1997. URL: http://www.osp.ru/text/302/179189/
. . : . .
. . .: . ., 1989. 320 .
. . .: , 1973. 165 .
. . . : - , 1966.
. . // . . 2. 8. 2002. . 718.
. . // . . 2. . 12. 1985. . 1419.
. . // - . . 2. . 1. 2003. . 1
7.
. ., . .
// - . . 2.
. 2. 2004. . 1114.
. ., . ., . .
// . . . . 389.
2. 2003. . 279282.
. , . .
// '2001,
URL: http://www.dialog-21.ru/Archive/2001/volume2/2_26.htm
. // . 2003. 11. URL:
http://www.silicontaiga.ru/home.asp?artId=2066
. . // , 1965. . 1. . 1. . 25
38
. JXTA P2P // Java World. 10, 2001. URL:
http://www.javaworld.com/javaworld/ jw-10-2001/jw-1019-jxta.html
. ., . . .
.: , 1977. 280 .
., . // . 28(4). 2002. . 226242.
. . . .: , 2006. 240 . URL: http://dwl.visti.net/art/monogr-osnov/ spusk3.pdf
[32]
[33]
[34]
[35]
[36]
[37]
[38]
[39]
[40]
[41]
[42]
[43]
[44]
[45]
[46]
[47]
[48]
[49]
. . Internet. .: -, 2005.
URL: htt://poiskbook.kiev.ua
. . URL:
http://logic.pdmi.ras.ru/~yura/internet.html
. ., . . . M.: , 1990.
. ., . ., . ., . . . WEB-PLAN Group, 2001. URL:
http://www.nbuv.gov.ua/texts/libdoc/01nsaopi.htm
. . . .: , 1988. 176 .
. . .: , 2002. 656 .
. , . .: , 2004. 256 .
. ., . ., . . . , , . .; : , 2005. 368 .
. ., . ., . .
//
. , 2000. . 204210.
. . .: , 1971.
382 .
. ., . ., . . // . , . . 11, 2, 2003. . 3954.
. . . . 2-. .: ,
2001. 296 .
. // Intrnet. 1998. 2.
URL: http://www.citforum.ru/pp/search_03.shtml
. . : // EXPonenta Pro. , 2003. 1. URL:
http://nature.web.ru/db/msg.html?mid=1193685
. . // Internet. 2002.
10. URL: http://www.dialog-21.ru/direction_fulltext.asp?dir_id=15539
. . . 1. . : ., 2007. 640 .
URL: htt://book.itep.ru/1/intro1.htm
. ., . ., . . : . .: , - , 2007. 304 .
. ., . ., . ., . ., . . //
MegaLing'2006 -
[50]
[51]
[52]
[53]
[54]
[55]
[56]
[57]
[58]
[59]
[60]
[61]
[62]
[63]
[64]
[65]
[66]
[67]
[68]
[69]
[70]
[71]
[72]
[73]
[74]
[75]
[76]
[77]
[78]
[79]
[80]
[81]
[82]
[83]
[84]
[85]
[86]
[87]
Bak P., Tang C., Wiesenfeld K. Self-organized criticality // Phys. Rev. A., 1988.
Vol. 38. 1. P. 364374.
Bandini S., Mauri G., Serra R. Cellular automata: From a theoretical parallel
computational model to its application to complex systems // Parallel Computing. Vol. 27, Issue 5, April 2001. P. 539553.
Bell A., Fosler-Lussier E., Girand C., Raymond W. Reduction of English function words in Switchboard // Proceedings of ICSLP-98. Vol 7. 1998. P. 3111
3114.
Berners-Lee T., Hendler J., Lassila O. The Semantic Web. Scientific American,
2001. URL: http://www.sciam.com/article.cfm?articleID=00048144-10D21C70-84A9809EC588EF21
Berry M. W. Survey of Text Mining. Clustering, Classification, and Retrieval.
Springer-Verlag, 2004. 244 p.
Bhargava S. C., Kumar A., Mukherjee A. A stochastic cellular automata model
of innovation diffusion // Technological forecasting and social change, 1993.
Vol. 44. 1. P. 8797.
Bjorneborn L., Ingwersen P. Toward a basic framework for webometrics. Journal of the American Society for Information Science and Technology, 55(14):
12161227. 2004.
Boyle A. Net not as interconnected as you think. URL: http://news.zdnet.com/
2100-9595_22-502388.html
Bradford S. C. Sources of Information on Specific Subjects. Engineering: An
Illustrated Weekly Journal (London), 137, 1934 (26 January), p. 8586.
Brin S., Page L. The Anatomy of a Large-Scale Hypertextual Web Search Engine. WWW7, 1998.
Broadbent S. R., Hammersley J. M. Percolation processes // I. Crystals and
mazes, Proc Cambridge Philos. Soc. P. 629641. 1957.
Broder A. Identifying and Filtering Near-Duplicate Documents, COM00 //
Proceedings of the 11th Annual Symposium on Combinatorial Pattern Matching. 2000. P. 110.
Broder A., Kumar R., Maghoul F. et al. Graph structure in the Web // Proceedings of the 9th international World Wide Web conference on Computer networks: the international journal of computer and telecommunications networking.
Amsterdam, 2000. P. 309–320. URL: http://www.almaden.ibm.com/cs/k53/www9.final/
Burges C. J. C. A Tutorial on Support Vector Machines for Pattern Recognition.
URL: http://www.music.mcgill.ca/_rfergu/adamTex/references/Burges98.pdf
Cohen R., Erez K., ben-Avraham D., Havlin S. Resilience of the Internet to
Random Breakdowns // Phys. Rev. Lett. 85, 4626 (2000).
Donetti L., Hurtado P. I., Munoz M. A. Entangled Networks, Synchronization,
and Optimal Network Topology // Physical Review Letters. Vol. 95, No. 18, 2005.
Dorogovtsev S. N., Mendes J. F. F. Evolution of Networks: from biological networks to the Internet and WWW, Oxford University Press, 2003.
[88]
[89]
[90]
[91]
[92]
[93]
[94]
[95]
[96]
[97]
[98]
[99]
[100]
[101]
[102]
[103]
[104]
[105] Kleinberg J. M. Authoritative sources in a hyperlinked environment // In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 1998, 46(5): 604–632.
[106] Landauer T. K., Foltz P. W., Laham D. An introduction to latent semantic analysis // Discourse Processes. Vol. 25. 1998. P. 259–284.
[107] Lande D. Model of information diffusion // Preprint Arxiv (0806.0283), 2008.
5 p. URL: http://arxiv.org/abs/0806.0283
[108] Lempel R. and Moran S. The stochastic approach for link-structure analysis
(SALSA) and the TKC effect // In Proceedings of the 9th International World
Wide Web Conference, Amsterdam, The Netherlands, 2000. P. 387–401.
[109] Lu Q., Cao P., Cohen E., Li K., Shenker S. Search and replication in unstructured peer-to-peer networks // Proc. of ICS'02, New York, USA, June 2002.
[110] Manber U. Finding similar files in a large file system. Proceedings of the
1994 USENIX Conference, p. 1–10, January 1994.
[111] Manning C. D., Schütze H. Foundations of Statistical Natural Language Processing. Cambridge, Massachusetts: The MIT Press, 1999.
[112] Maymounkov P., Mazières D. Kademlia: A Peer-to-peer Information System
Based on the XOR Metric. URL: http://kademlia.scs.cs.nyu.edu
[113] Milgram S. The small world problem // Psychology Today, 1967, Vol. 2. P. 60–67.
[114] Miller E., Swick R., Brickley D., McBride B., Hendler J., Schreiber G., Connolly D. Semantic Web. W3C (MIT, ERCIM, Keio) 2001. URL:
http://www.w3.org/2001/sw/
[115] Mockapetris P. Domain Names - Concepts and Facilities // Request for Comments: 1035, 1987. 55 p.
[116] Newman M. E. J. The structure and function of complex networks // SIAM Review. 2003. Vol. 45. P. 167–256.
[117] Newman M. E. J., Watts D. J. Scaling and percolation in the small-world network model, Phys. Rev. E, 7332, 1999.
[118] Onnela J.-P., Saramaki J., Hyvonen J., Szabo G., Lazer D., Kaski K., Kertesz J.,
Barabasi A.-L. Structure and tie strengths in mobile communication networks.
Proceedings of the National Academy of Sciences. May 1, 2007, vol. 104,
No. 18, 7332–7336.
[119] Page S. E. Computational models from A to Z // Complexity. Vol. 5, Issue 1, 1999. P. 35–41.
[120] Papka R. On-line News Event Detection, Clustering, and Tracking. Ph. D. Thesis, University of Massachusetts at Amherst, September 1999.
[121] Peng C.-K., Havlin S., Stanley H. E., Goldberger A. L. Quantification of scaling
exponents and crossover phenomena in nonstationary heartbeat time series //
Chaos. Vol. 5. 1995. P. 82.
[122] Piatetsky-Shapiro G., Fayyad U., Smyth P. Advances in Knowledge Discovery
and Data Mining. Cambridge, Mass: AAAI/MIT Press. p. 135. 1996.
[123] Platt J. Sequential Minimal Optimization. URL: http://research.microsoft.com/users/jplatt/smo.html
[124] Powell A. L., French J. C., Callan J., Connell M., Viles C. L. The Impact of Database Selection on Distributed Searching // Proc. of ACM SIGIR'00, pages
232–239, Athens, Greece, 2000.
[125] Program to evaluate TREC results using SMART evaluation procedures. URL:
http://www-nlpir.nist.gov/projects/trecvid/trecvid.tools/trec_eval/README
[126] Redner S. Directed and diode percolation // Phys. Rev. B, 25, 3242, 1982.
[127] RFC 1625. WAIS over Z39.50-1988. Network Working Group. Request for
Comments: 1625. M. St. Pierre, J. Fullton, K. Gamiel, J. Goldman, B. Kahle,
J. Kunze, H. Morris, F. Schiettecatte, 1994. URL: http://www.faqs.org/rfcs/rfc1625.html
[128] Riedi R. H., Vehel J. L. Multifractal Properties of TCP traffic: a numerical study
// Technical Report 3128 INRIA Rocquencourt. March 1997.
[129] Rocchio J. Relevance feedback in information retrieval // In G. Salton ed., The
SMART Retrieval System: Experiments in Automatic Document Processing,
Englewood Cliffs, New Jersey, Prentice-Hall, p. 313–323, 1971.
[130] Salton G., Fox E., Wu H. Extended Boolean information retrieval // Communications of the ACM. 2001. Vol. 26, No. 4. P. 35–43.
[131] Salton G., Wong A., Yang C. A Vector Space Model for Automatic Indexing //
Communications of the ACM, 18(11): 613–620, 1975.
[132] Sarshar N., Boykin P. O., Roychowdhury V. P. Scalable Percolation Search in
Power Law Networks. Preprint. 2004. URL: http://arxiv.org/abs/cond-mat/0406152
[133] Scime A. Web mining: application and techniques. Idea Group Publishing,
2005. 427 p.
[134] Sebastiani F. Machine Learning in Automated Text Categorization. URL:
http://nmis.isti.cnr.it/sebastiani/Publications/ACMCS02.pdf
[135] Simon H. A. Biometrika 42, 425 (1955).
[136] Snarskii A. FreeBSD Stack Integrity Patch. 1997. URL: ftp://ftp.lucky.net/pub/unix/local/libc-letter
[137] Chakrabarti S. Mining the Web: Discovering Knowledge from Hypertext Data. Morgan Kaufmann, 2002. 344 p.
[138] Stanley H. E., Amaral L. A. N., Goldberger A. L., Havlin S., Ivanov P. Ch.,
Peng C.-K. Statistical physics and physiology: monofractal and multifractal
approaches // Physica A. 1999. Vol. 270, p. 309.
[139] Stauffer D., Aharony A. Introduction to percolation theory. Taylor & Francis,
London, Washington DC, 1992. 182 p.
[140] Stanley H. E., Amaral L. A. N., Goldberger A. L., Havlin S., Ivanov P. Ch.,
Peng C.-K. Statistical physics and physiology: monofractal and multifractal
approaches // Physica A. 1999. Vol. 270. P. 309.
[141] The Deep Web: Surfacing Hidden Value, 2000 BrightPlanet.com LLC, 35 p.
URL: http://www.dad.be/library/pdf/BrightPlanet.pdf
[142] The Twelfth Text Retrieval Conference (TREC 2003). Appendix 1. Common
Evaluation Measures. URL: http://trec.nist.gov/pubs/trec12/
[143] Ukkonen E. On-line construction of suffix trees. URL:
http://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf
[144] Understanding the Impact of P2P: Architecture and Protocols. URL:
http://www.cachelogic.com/home/pages/understanding/architecture.php
[145] Van Raan A. F. J. Fractal geometry of Information Space as Represented by
Cocitation Clustering // Scientometrics. 1991. Vol. 20, No. 3. P. 439–449.
[146] Vapnik V. N. Statistical Learning Theory. NY: John Wiley, 1998. 760 p.
[147] Watts D. J., Strogatz S. H. Collective dynamics of "small-world" networks //
Nature. 1998. Vol. 393. P. 440–442.
[148] Wikipedia, Support Vector Machine. URL: http://en.wikipedia.org/wiki/Support_vector_machine
[149] Wolfram S. A New Kind of Science. Champaign, IL: Wolfram Media Inc.,
2002. 1197 p.
[150] Wolfram S. ed. Theory and Applications of Cellular Automata. Singapore:
World Scientific. 1986.
[151] Yang B., Garcia-Molina H. Comparing hybrid peer-to-peer systems // Proc. of
VLDB'01, Rome, Italy, 2001.
[152] Yang B., Garcia-Molina H. Efficient Search in Peer-to-Peer Networks // Proc.
of ICDCS'02, Vienna, Austria, 2002.
[153] Yeager N., McGrath R. Web Server Technology. Morgan Kaufmann, San Francisco, California, 1996.
[154] Zamir O. E. Clustering Web Documents: A Phrase-Based Method for Grouping
Search Engine Results. PhD Thesis, University of Washington, 1999.
[155] Zeinalipour-Yazti D. Information Retrieval in Peer-to-Peer Systems //
M. Sc Thesis, Dept. of Computer Science, University of California Riverside, June 2003.
[156] Zeinalipour-Yazti D., Kalogeraki V., Gunopulos D. Information Retrieval in
Peer-to-Peer Networks // IEEE CiSE Magazine, Special Issue on Web Engineering,
2004. p. 113. URL: www.cs.ucr.edu/~csyiazti/papers/cise2003/cise2003.pdf
[157] Zhou S., Mondragon R. J. Topological Discrepancies Among Internet Measurements Using Different Sampling Methodologies, Lecture Notes in Computer
Science (LNCS), Springer-Verlag, 3391, p. 207–217, Feb. 2005.