. .
. .
OLAP and Data Mining
OLAP
Data Mining -
,
A. .
. .
B. .
. .
OLAP and Data Mining
071900
654700
-
-
2004
681.3.06(075.8)
32.973.26-018.273
26
26
. ., . ., . ., . .
: OLAP Data Mining.
.: -, 2004. - 336 .: .
ISBN 5-94157-522-
:
, (OLAP)
(Data Mining) .
: , .
. Data
Mining. -, Data
Mining, Xelopes,
.
681.3.06(075.8)
32 973.26-018.273
:
.
.
ISBN 5-94157-522-
A A., Kynpi
. ., 2004
, 5
.................................................................................................... 9
Data Mining ..................................................................11
1. .................................................. 13
1.1. .......................................................13
1.2. ................................................................................16
1.3. OLTP- .................21
........................................................................................................................... 26
2. ......................................................................................27
2.1. ..............................................................................27
2.2. ...................................................................................................... 34
2.3. ....................................................................................................... 39
2.4. ..................................................................................45
........................................................................................................................... 45
3. OLAP-............................................................................................ 49
3.1. .................................................................................49
3.2. OLAP-.....................................................................................53
3.3. .......................................................54
3.3.1. ................................................................................54
3.3.2. .....................................................................55
3.3.3. FASMI.....................................................................................................57
3.4. OLAP-......................................................................................58
3.4.1. MOLAP............................................................................................................ 59
3.4.2. ROLAP............................................................................................................ 62
3.4.3. HOLAP............................................................................................................ 65
........................................................................................................................... 66
4. .......................................................... 67
4.1. Data Mining...............................................................................67
4.2. Data Mining.................................................................................................. 68
__________________________________________________________________________
6.1.2. ...............................................................................132
6.1.3. ................................. 135
6.2. ................................................................................... 137
6.3. ..............................................................................................................141
6.3.1. Apriori........................................................................................... 141
6.3.2. Apriori..................................................................146
......................................................................................................................... 147
7. ........................................................................................... 149
7.1. ........................................................................149
7.1.1. .................................................................... 152
7.1.2. , ,
..........................................................................154
7.2. ................................................................................... 156
7.3. .......................................................................158
7.3.1. .......................................................... ............... 158
7.3.2. ............................................................................159
.........................................................................159
...............................................................................161
7.3.3. ......................................................................... 162
-means (Hard-c-means)............................................................... 162
Fuzzy C-Means.............................................................................166
-.................................................... 168
7.4. ................................ 174
7.4.1.
...........................................................................................174
............................................................. 174
....................................................................................... 179
-......................................................... ............. 181
7.4.2. -............................................................ 182
-
........................................................ - ........... 190
-
...................................................................................... 191
-...............................................................................192
......................................................................................................................... 204
8. Data Mining............................................................................209
8.1. .............................................................................................. 209
8.2. CWM......................................................................................................209
8.2.1. CWM......................................................................... 209
8.2.2. CWM..............................................................................211
8.2.3. Data Mining.........................................................................................214
8.3. CRISP.....................................................................................................218
8.3.1. CRISP......................................................................... 218
8.3.2. CRISP..........................................................................218
8.3.3. CRISP...................................................................220
8.4. PMML....................................................................................................225
8.5. Data Mining............................................................................. 233
8.5.1. SQL/....................................................................................... 233
8.5.2. OLE DB Data Mining..............................................................235
8.5.3. JDMAPI......................................................................................... 237
.........................................................................................................................237
9. Xelopes..................................................................................241
9.1. .......................................................................................241
9.2. Model.................................................................................................. 244
9.2.1. Xelopes.......................................................................... 244
9.2.2. Model...................................................................................246
9.2.3. .............................................................................. 247
9.3. Settings................................................................................................ 248
9.3.1. Settings..................................................................................248
9.3.2. Settings..................................................................................250
9.4. Attribute...............................................................................................250
9.4.1. Attribute.................................................................................250
9.4.2. .............................................................................. 251
9.5. Algorithms............................................................................................252
9.5.1. ........................................................................................ 252
9.5.2. MiningAlgorithm....................................................................................253
9.5.3. MiningAlgorithm.............................................................. 254
9.5.4. ............................................................................... 256
9.5.5. ..................................................................................................... 256
9.6. DataAccess...........................................................................................256
9.6.1. ........................................................................................ 257
9.6.2. MiningInputStream................................................................................258
9.6.3. Mining-............................................................................... 258
9.6.4. , MiningInputStream.......................................... 258
9.7. Transformation.....................................................................................259
9.8. Xelopes.....................................................261
9.8.1. ........................................................................................ 261
9.8.2. .......................................... 264
9.8.3. .................................................................... 266
9.8.4. .................................................................. 268
.........................................................................................................................271
1. ..............................................................273
1.1. ...................................... 273
1.2. .............................................................................................277
1.3. ............................................................................... 279
1.4. ........................................................................... 280
1.5.
.................................................................................................286
1.6. ...................................................................... 293
___________________________________________________________________________ 7
2. .....299
2.1. ....299
2.2. ................................ 304
2.2.1. ................................................................304
2.2.2. ...... 305
2.2.3. ............. 307
2.3. ........................................................ 310
2.4. , ............311
2.5. .........................................................................314
2.5.1. ....................................................................................... 314
2.5.2. ...................................................................................... 316
................................................................. 316
................................................................. 317
............................................................. 318
2.5.3. ..................................................................318
2.6.
......................................................................................... 319
3. -....................................327
................................................................................................... 331
............................................................................................. 335
,
.
,
.
, , ,
, , , ,
,
.
,
, ,
.
:
, .
.
. ,. 7.4 . . .
,
. ,
,
. ,
, "
". , ,
.
, .
10
,
,
, .
:
- Data Mining
.
. Prudsys
.
Data Mining
2002 , Berkeley,
(5,000,000,000,000,000,000) . ,
23 . , ,
, .
2003 France Telecom (DSS system) 30,000
, a Alexa Internet Archive 500,000 .
, (Knowledge Discovery in Data
workshop), 1989 ,
(1,000,000) .
KDD-2003
(1 = 1,000 , 1 = 1,000 ).
- .
-
Data Mining.
Data Mining ( Knowledge Discovery In Data
) ,
. Data Mining
, ,
.
Data Mining 1989
2003
. Data Mining
. , ,
, , . ,
12
Data Mining
, , . .
Data Mining
, - (Google)
(search engines) .
, ,
, , , , . .
Data Mining
(link analysis), ,
, .
Data Mining.
1970- 2-
,
, .
, OLAP,
Data Mining.
Data Mining .
Data Mining, ,
.
Data Mining.
Xelopes Prudsys,
Data Mining.
, .
,
Data Mining .
-, KDnuggets
, , , 2004
1.1.
.
, (. 1.1),
, , .
.
. ,
, , ,
.
, ,
,
. ,
. ,
, ().
(DSS, Decision Support System).
,
. ,
, :
;
;
.
14
, , ,
, ,
.
,
, -.
,
, .
,
.
,
.
.
.
,
, . .
. ,
. ,
(, ,
15
. .) ,
. ,
, ,
. ,
,
. ,
.
""
:
-
.
;
-
, . -
;
,
, / (
) .
,
(. 1.2).
. 1.2.
16
.
. , OLTP (On-line transaction
processing), ()
.
().
.
.
. :
-
SQL (Structured
Query Language);
.
OLAP
(On-line analytical processing),
;
.
Data Mining (" ").
1.2.
,
. ().
[! : ,
,
.
, , .
,
, .
- ,
.
, , ,
60- .
.
.
.
17
( )
Information Management System (IMS) IBM.
1968 .
,
.
Data Base Task Group (DBTG)
Conference on Data Systems Languages (CODASYL).
DBTG 1971 .
,
,
.
Integrated Database Management System (IDMS) Cullinet Software, Inc.
. "
" relatio .
.
. , 1970 .
12 .
1.
. ,
. ,
, , .
,
, .
,
.
2.
,
( 2, 3).
, .
.
1 , , .
, .
.
,
.
3. NULL
, NULL.
0.
18
4. :
.
, .
: (,
), , .
, . .
.
5.
.
SQL.
6.
, .
(View).
.
( ,
).
7.
,
.
.
SQL.
8.
, ,
( ,
.).
9.
, ,
(
).
, .
10.
.
:
;
, .
19
, ,
, . .
,
, .
: .
,
.
11.
,
.
12.
,
.
,
.
.
, .
, ()
.
.
, .
().
, , ,
. :
1- , , ,
( );
2- , 1- ,
(
);
3- , 2- ,
(
) . .
,
. ,
, , .
,
BLOB
20
21
.
.
.
,
.
.
OLTP-,
.
OLTP-
,
, , .
.
.
, ,
(, Executive Information Systems). ,
, ,
, - ,
. ,
, ,
.
,
.
1.3.
OLTP-
OLTP-
.
, ,
, .
, OLTP-,
.
, OLTP .
. I . I .
22
1.1
OLTP-
()
, 100 %
, OLTP
.
23
OLTP, , ,
. ,
,
(,
. .).
,
, . .
, ,
, ,
,
. .
OLTP-, , ,
( ).
" "
.
.
OLTP-,
, .
- .
(,
, ,
).
.
.
, ,
( ).
,
OLTP-, .
, .
.
, (view),
.
,
. (
) .
24
OLTP-
. ,
, .
,
. ,
,
. OLTP-
. ,
. , ,
(
, ).
,
, OLTP-
- .
, OLTP
,
. OLTP-
(,
). , ,
.
OLTP- -
.
,
(, ,
. .).
,
.
OLTP-, ,
,
. ,
(, )
, . .
. , OLTP,
.
.
.
,
OLTP-, ,
25
.
.
.
, , . .
, . .
, ,
. . ,
. . 1.3 Oracle OLTP,
. 1.4 ,
.
. 1.3.
OLTP
. 1.4.
OLTP-
, . .
.
, ,
.
OLTP- ,
,
.
,
.
OLTP
.
26
, ,
.
: ,
. -,
- .
, -
. ,
- ,
(OLAP).
Data Mining.
: ,
.
.
.
, .
,
-.
, ,
()
,
.
,
(OLTP-). ,
- .
OLTP-
- .
OLTP .
OLTP
.
OLTP-
.
2.1.
OLTP , ,
. 1.1, ,
().
. ,
, 1988 ., . 1992 . . "
".
,
.
,
OLTP- .
( , , . .)
, , , ,
( ).
() .
.
[! : -,
, , ,
.
28
.
. ,
(,
, , . .). ,
,
. ,
.
,
(,
-,
, ).
.
, ,
. ,
,
, -.
, .
.
.
.
, ,
.
. ,
, ,
. , ,
. , ,
.
,
.
.
,
. (. 2.1).
. , ,
29
, 1 % !
.
.
,
.
, , , ,
, . , ,
.
.
, .
(
) .
.
, .
()
. ,
.
(. 2.2).
:
, ;
, .
30
.
. ,
,
, .
,
.
. ,
, .
, .
OLTP-, , . .
.
.
.
,
, ,
. .
.
31
.
,
. OLTP-
,
.
.
,
,
. :
;
;
;
.
.
,
,
: , ,
, , .
,
.
,
, ,
. .
,
.
,
. ,
TPC-D, , 100
4,87 ,
.
32
( )
.
,
. ,
, , ,
, ,
. .
.
,
, ,
. ,
, .
,
,
. .
,
(Data Mart).
[! } ( ) ,
.
,
(,
,
). , ,
. ,
.
(. 2.3)
, .
:
;
;
.
33
[Figure labels: three OLTP sources; two consumers (OLAP, Data Mining)]
. 2.3.
:
,
,
;
,
.
.
(. 2.4).
,
,
.
, ,
.
:
,
;
;
.
34
[Figure labels: three OLTP sources; two consumers (OLAP, Data Mining)]
. 2.4.
:
( , );
.
, :
() (. . 2.1);
(. . 2.2);
(. . 2.3);
(. 2.4).
/
()
.
2.2.
(. 2.5):
;
;
.
35
[Figure labels: three OLTP sources; one consumer (OLAP, Data Mining)]
. 2.5.
, .
, OLTP- (
, , .).
. ,
(, , , . .).
, (,
, . .).
.
.
(,
).
.
(, ).
() .
.
:
,
;
,
;
36
,
.
,
, .
.
,
. ,
.
. , ,
. ,
, .
. ( ).
, , , , :
( )
, . :
, ,
, . .;
( )
, .
, ,
(, , ,
. .);
( )
, , ,
;
( ) ,
.
(, ,
. .), ;
( )
(, , ,
, . .);
( ) ,
.
,
. .
37
,
.
.
, ,
,
(. . 2.5):
(Inflow) ,
;
(Upflow)
;
(Downflow)
, ;
(MetaFlow)
;
(Outflow) ,
;
(Feedback Flow) ,
.
. ,
:
.
, ,
. 60 %
.
, ,
, ETL- ( extraction, transformation,
loading: , , ).
, , ETL-.
ETL-
.
ETL-
.
ETL- (. 2.6).
ETL-,
. :
1.
(,
38
, . .
:
OLTP- (
, );
.
2. OLTP- .
:
OLTP-,
;
OLTP-
;
.
[Figure labels: three OLTP sources; one consumer (OLAP, Data Mining)]
. 2.6. ETL-
,
.
:
(aggregation) .
. ,
,
39
.
, ;
(value translation)
,
. , , ,
. . .
,
;
(field derivation)
. ,
,
.
;
(cleaning)
.
, ,
, ,
"" .
,
.
, .
. ,
.
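The four transformation types named above (aggregation, value translation, field derivation, cleaning) can be sketched as a small pipeline. This is only an illustration under stated assumptions: the row layout, the `PRODUCT_NAMES` lookup table and every function name are hypothetical and do not come from the text.

```python
from collections import defaultdict

# Hypothetical OLTP rows: (store, product_code, quantity, unit_price)
rows = [
    ("S1", "A", 2, 10.0),
    ("S1", "B", 1, 4.0),
    ("S2", "A", 3, 10.0),
    ("S2", "A", None, 10.0),  # dirty row: quantity is missing
]

# Hypothetical lookup table used for value translation
PRODUCT_NAMES = {"A": "apples", "B": "bread"}


def clean(rows):
    """Cleaning: drop rows that contain missing values."""
    return [r for r in rows if None not in r]


def translate(rows):
    """Value translation: replace source codes with warehouse names."""
    return [(s, PRODUCT_NAMES[p], q, price) for s, p, q, price in rows]


def derive(rows):
    """Field derivation: add amount = quantity * unit_price."""
    return [(s, p, q, price, q * price) for s, p, q, price in rows]


def aggregate(rows):
    """Aggregation: total amount per store."""
    totals = defaultdict(float)
    for store, _product, _qty, _price, amount in rows:
        totals[store] += amount
    return dict(totals)


totals = aggregate(derive(translate(clean(rows))))
```

Cleaning runs first here so that later steps never see missing values; a real ETL process may order or combine the steps differently.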
2.3.
, ,
. ,
, " ,
" "
. ,
.
.
:
;
;
40
;
;
.
.
.
, .
.
1. () ,
. ,
. ,
1000 10 000 "
"".
2. -
. OLTP-
,
,
. ,
( NULL).
3. , ,
. ,
,
. :
999-99-9999, 999,
99999. ,
, . ,
888-88-8888
- "" $99,999.99
, .
4. ,
, . ,
"" "" " "
10.
5.
, .
6. ,
.
(, ).
41
,
.
.
,
. :
: age = 22, bdate = 12.02.50.
. ,
,
.
.
. ,
, .
. -
,
:
(
, ):
emp1=(name="John Smith", ...);
emp2=(name="J. Smith", ...);
(
, ):
emp1=(name="John Smith", bdate=12.02.70);
emp2=(name="J. Smith", bdate=12.12.70).
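The two kinds of problems illustrated above (a stored age that contradicts the birth date, and two records that probably describe the same person) can be checked mechanically. The sketch below is an assumption-laden illustration: the dates are read as day.month.year in the 1900s, the reference date is arbitrary, and the matching key (first initial plus surname) is a deliberately crude, hypothetical heuristic.

```python
from datetime import date


def age_conflicts(age, bdate, today):
    """True when the stored age contradicts the age implied by bdate."""
    had_birthday = (today.month, today.day) >= (bdate.month, bdate.day)
    implied = today.year - bdate.year - (0 if had_birthday else 1)
    return implied != age


def name_key(name):
    """Crude matching key: first initial plus surname, lower-cased."""
    parts = name.replace(".", " ").split()
    return (parts[0][0].lower(), parts[-1].lower())


def possible_duplicates(a, b):
    """True when two records share the same matching key."""
    return name_key(a["name"]) == name_key(b["name"])


# Records from the examples in the text (dates assumed to be dd.mm.19yy)
emp1 = {"name": "John Smith", "bdate": date(1970, 2, 12)}
emp2 = {"name": "J. Smith", "bdate": date(1970, 12, 12)}

# age = 22 with bdate = 12.02.50, checked against an assumed reference date
conflict = age_conflicts(22, date(1950, 2, 12), date(2004, 1, 1))
duplicates = possible_duplicates(emp1, emp2)
contradicting = duplicates and emp1["bdate"] != emp2["bdate"]
```

A match on such a key only flags candidates for review; it does not prove that the two records are the same entity.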
. , ,
, .
. ,
,
. :
: , ,
.;
;
-;
;
;
,
, . .
42
, ,
, .
, ,
. ,
,
, , ,
. .
, , ,
,
.
, :
;
;
;
.
.
.
.
: Data
Mining.
. , , ,
, , , ,
, , ,
(, ) .,
.
Data Mining
, .
Data Mining,
, , .
. ,
-,
,
. ,
, . , 99 %
43
" = "
1 % .
.
, ,
.
.
.
,
.
.
, ,
.
.
, , - ,
.
.
.
. ,
, .
.
, ,
.
.
.
( ).
.
. (
, .,
. .)
.
44
,
. ,
,
, . .
(), , . ,
.
,
.
.
.
.
,
, .
.
.
, .
, . ,
.
.
, , .
, , ()
,
.
.
.
.
45
2.4.
.
, ,
, ,
, ,
, , .
, :
, ,
;
.
, , ,
.
,
,
.
, .
:
;
;
.
.
. 7,
:
;
;
.
,
.
, ,
.
.
46
,
.
.
-, , ,
,
.
: . ,
,
,
.
,
.
,
. ,
, .
, .
: , ,
.
: , , , .
: .
, .
, .
.
,
.
, . ,
, ,
, , ,
, .
. ,
, , ETL-.
.
: , ,
, , .
47
:
, , ,
.
.
.
.
.
OLAP-
3.1.
. 2, ,
. . ,
.
.
.
. ,
. ,
, , . .
, . .
,
. ,
.
, , ,
,
. 1993 . . .
,
",
, . .
".
50
. , "
, "" , , .
.
, .
, (multi-dimensional
conceptual view) ,
,
.
.
.
, ""
: " ". ,
. , ""
: " " " ".
(Dimensions) ,
, (Measures).
,
, , . .
()
. 3.1.
,
(. 3.1) (, ,
OLAP-
51
,
). , .
:
(Slice) (. 3.2)
,
, .
, " "
, .
, ,
,
(, "", "", "" . .).
"" ,
.
. 3.2.
(Rotate) (. 3.3) ,
. ,
, .
,
, ,
(
).
,
"" (
), "" (
52
).
: "
, ""
.
"" "", ,
"", , ,
"" , ""
"" . ""
"".
""
"", , ,
"",
"" (: Pivot).
. 3.4.
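The Slice and Rotate operations described above can be illustrated on a tiny in-memory cube. This is a minimal sketch, assuming a cube stored as a dictionary from coordinate tuples to a measure; the dimension names, member names and values are hypothetical.

```python
# Hypothetical cube: (product, city, month) -> sales amount
DIMS = ("product", "city", "month")
cube = {
    ("tea", "Moscow", "Jan"): 10,
    ("tea", "Moscow", "Feb"): 12,
    ("tea", "Kazan", "Jan"): 7,
    ("coffee", "Moscow", "Jan"): 5,
    ("coffee", "Kazan", "Feb"): 9,
}


def slice_cube(cube, dims, dim, value):
    """Slice: fix one dimension to a single value and drop that axis."""
    i = dims.index(dim)
    new_dims = dims[:i] + dims[i + 1:]
    new_cube = {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}
    return new_cube, new_dims


def rotate(cube, dims, new_order):
    """Rotate: reorder the axes; the cell values themselves do not change."""
    idx = [dims.index(d) for d in new_order]
    new_cube = {tuple(k[i] for i in idx): v for k, v in cube.items()}
    return new_cube, tuple(new_order)


january, january_dims = slice_cube(cube, DIMS, "month", "Jan")
rotated, rotated_dims = rotate(january, january_dims, ("city", "product"))
```

Real OLAP engines do not store cubes this way; the dictionary is used only to make the two operations visible.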
3.2. OLAP-
, OLAP-.
Note: OLAP (On-Line Analytical Processing)
,
,
.
OLAP-
, ( ad-hoc) -. OLAP- .
OLAP
. . 1993 . "OLAP -: .
12 , ,
.
54
3.3.
3.3.1.
12 , OLAP.
1. OLAP-
,
.
2. OLAP-
, , ,
.
3. OLAP-
, ,
, .
4.
OLAP-
, .
5. - OLAP-
-", . . ,
,
. ,
OLAP
.
6. OLAP-
, .
,
.
7. OLAP- .
,
.
8. OLAP-
. ,
,
.
9. OLAP-
,
, ,
. ()
,
.
10. OLAP-
, ,
. ,
,
.
11. OLAP-
, . .
.
,
.
, ,
^ , N
. , ,
, ,
(),
, .
12.
,
, ,
19 . ,
15,
20 . ,
.
3.3.2.
, - OLAP,
, , 1, 2, 3,
6 , 10, 11
56
. , 12
OLAP. 1995 .
:
13. OLAP-
,
.
14. OLAP- OLAP-
, :
, , .
15. OLAP-
.
, OLAP,
, .
16. OLAP:
OLAP-, -,
.
, .
17. OLAP-,
, .
,
.
18. OLAP-
.
17- .
, 18 ,
. , S, R D.
() :
( 1);
( 10);
( 3);
( 13);
OLAP- ( 14);
"-" ( 5);
( 2);
( 8).
(5'):
( 15);
OLAP:
( 16);
( 17);
( 18).
(R):
( 11);
( 4);
(
7).
(D):
( 6);
( 12);
( 9).
3.3.3. The FASMI test
.
FASMI (Fast Analysis of Shared Multidimensional Information), 1995 .
(Nigel Pendse) (Richard Creeth)
.
, , ,
, . .
. , OLAP
: Fast (), Analysis (), Shared (),
Multidimensional (), Information ().
.
FAST () OLAP-
5 .
1 , 20 .
,
,
30 . <Alt>+<Ctrl>+<Del>,
,
. ,
, ,
.
58
, "
". ,
.
ANALYSIS () OLAP-
, ,
, .
,
.
.
SHARED () OLAP-
(,
). ,
.
.
MULTIDIMENSIONAL () OLAP-
,
,
.
, ,
.
,
.
INFORMATION () OLAP-
.
,
, .
. OLAP-
1000
OLAP-.
, ,
, ,
, . .
3.4. OLAP-
OLAP- :
OLAP- ,
. OLAP-
;
OLAP-
,
.
OLAP-
. ,
OLAP- ,
. , . .
, .
:
MOLAP
;
ROLAP
;
HOLAP
.
OLAP-
DOLAP JOLAP.
DOLAP (desktop) OLAP.
OLAP-,
,
.
JOLAP , Java, OLAP-API-,
OLAP. Hyperion Solutions.
, API, IBM, Oracle
.
3.4.1. MOLAP
MOLAP-
.
. .
, . .
.
,
.
60
, , ,
"" .
,
(. 3.1).
Table 3.1
07.03.99    690     30
07.03.99    830     40
07.03.99    500     25
07.03.99    700     35
07.03.99    600     15
07.03.99    1500    100
07.03.99    690     30
07.03.99    830     40
07.03.99    500     25
07.03.99    700     35
07.03.99    2000    50
07.03.99    2250    150
07.03.99    230     10
07.03.99    1000
OLAP-:
,
, . .
,
;
,
SQL
, .
, :
, , ( )
2,5... 100 ;
, ,
,
.
. ,
, ,
, .
,
;
.
, .
,
:
(
), . . ;
;
;
62
,
.
3.4.2. ROLAP
ROLAP- . ,
,
. ,
, ,
,
, OLAP.
:
" (. 3.5) " (. 3.6).
Sales_Fact: TimeKey, CustomerKey, ShipperKey, ProductKey, EmployeeKey, RequiredDate, LineItemFreight, LineItemTotal, LineItemQuantity, LineItemDiscount
Employee_Dim: EmployeeKey, EmployeeID, EmployeeName, HireDate
Customer_Dim: CustomerKey, CustomerID, CompanyName, ContactName, ContactTitle, Address, City, Region, PostalCode, Country, Phone, Fax
Product_Dim: ProductKey, ProductID, ProductName, SupplierName, CategoryName, ListUnitPrice
Time_Dim: TimeKey, TheDate, DayOfWeek, [Month], [Year], Quarter, DayOfYear, Holiday, Weekend, YearMonth, WeekOfYear
Shipper_Dim: ShipperKey, ShipperID, ShipperName
. 3.5. ""
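A query against a star schema like the one in the figure joins the fact table to the dimension tables it references and aggregates a measure. The sketch below uses Python's built-in `sqlite3` module with a reduced subset of the tables and columns from the figure; the sample rows are hypothetical, and the column subset is a simplification made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product_Dim (
    ProductKey   INTEGER PRIMARY KEY,
    ProductName  TEXT,
    CategoryName TEXT
);
CREATE TABLE Time_Dim (
    TimeKey INTEGER PRIMARY KEY,
    TheDate TEXT,
    Year    INTEGER
);
CREATE TABLE Sales_Fact (
    TimeKey          INTEGER REFERENCES Time_Dim(TimeKey),
    ProductKey       INTEGER REFERENCES Product_Dim(ProductKey),
    LineItemQuantity INTEGER,
    LineItemTotal    REAL
);
-- Hypothetical sample rows
INSERT INTO Product_Dim VALUES (1, 'Chai', 'Beverages');
INSERT INTO Product_Dim VALUES (2, 'Tofu', 'Produce');
INSERT INTO Time_Dim VALUES (1, '1999-03-07', 1999);
INSERT INTO Time_Dim VALUES (2, '2000-01-15', 2000);
INSERT INTO Sales_Fact VALUES (1, 1, 30, 690.0);
INSERT INTO Sales_Fact VALUES (1, 2, 40, 830.0);
INSERT INTO Sales_Fact VALUES (2, 1, 25, 500.0);
""")

# Total sales by product category and year: one join per dimension used.
rows = conn.execute("""
    SELECT p.CategoryName, t.Year, SUM(f.LineItemTotal)
    FROM Sales_Fact AS f
    JOIN Product_Dim AS p ON p.ProductKey = f.ProductKey
    JOIN Time_Dim    AS t ON t.TimeKey = f.TimeKey
    GROUP BY p.CategoryName, t.Year
    ORDER BY p.CategoryName, t.Year
""").fetchall()
conn.close()
```

Each dimension used in the query costs exactly one join to the fact table, which is the property that distinguishes the star layout from the snowflake layout.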
Sales_Fact: TimeKey, CustomerKey, ShipperKey, ProductKey, EmployeeKey, RequiredDate, LineItemFreight, LineItemTotal, LineItemQuantity, LineItemDiscount
Employee_Dim: EmployeeKey, EmployeeID, EmployeeName, HireDate
Customer_Dim: CustomerKey, CustomerID, CompanyName, ContactName, ContactTitle, Address, City, Region, PostalCode, Country, Phone, Fax
Product2_Dim: ProductKey, ProductID, CategoryID, ListUnitPrice
(category table, name not legible): CategoryName
Time_Dim: TimeKey, TheDate, DayOfWeek, [Month], [Year], Quarter, DayOfYear, Holiday, Weekend, YearMonth, WeekOfYear
. 3.6. ""
(Fact Table) (Dimension Tables).
, , ,
.
. :
, (Transaction facts).
(
);
, (Snapshot facts).
(, )
, .
;
64
, (Line-item facts).
(, )
(,
, , );
, (Event or state
facts). (
,
).
, , ,
. ,
.
,
.
,
, . .
.
, .
, ,
,
. ,
,
.
.
.
(
) , ,
( )
. , , ,
, "
.
"--" .
; ,
, ,
, .
"" (Snowflake Schema).
(. . 3.6).
,
,
.
, ,
.
,
.
,
,
,
,
" .
OLAP-
:
, ROLAP
.
, MOLAP;
,
, ROLAP-
, . .
;
.
ROLAP
. ,
MOLAP,
, . .
. ""
.
3.4.3. HOLAP
HOLAP- ,
ROLAP MOLAP. MOLAP,
, - , ROLAP
, .
66
HOLAP ROLAP
MOLAP . HOLAP
,
, ,
.
, ,
.
,
.
, . . .
.
. ,
.
:
, , .
OLAP-.
OLAP (On-Line Analytical Processing)
. ,
,
.
OLAP- 12 ,
18 :
, ,
.
1995 .
FASMI, OLAP "
".
OLAP- OLAP- OLAP-.
OLAP- (MOLAP),
(ROLAP) (HOLAP).
MOLAP
.
ROLAP
,
.
, .
, , .
, .
68
.
, ,
. ,
(, ),
Data Mining.
.
, ,
. ,
.
.
,
, . ,
.
Data Mining .
.
: , , .
69
. ,
, ,
.
.
(descriptive)
.
. ,
,
.
.
(predictive) .
.
. , , ,
.
.
,
.
supervised learning (
) unsupervised learning ( ).
Machine Learning ( ),
Data
Mining.
supervised learning
. - Data Mining
.
. , ,
, .
,
,
, ,
. .
Unsupervised learning , ,
,
. , ,
.
unsupervised learning.
70
- .
.
4.2.2.
,
, . . . ,
,
:
. ,
( ): ,
, , . .
" "".
.
( )
.
(, ,
, : "", "", "
. .).
. ,
10 ( ).
,
.
.
Data Mining
'
.
, ,
.
:
, , . .;
;
.
:
( "
"");
71
( "spam" "mail");
( 0, 1,..., 9).
,
:
{, }, {spam, mail}, {0, 1,..., 9}.
,
.
, .
.
. ,
, .
:
,
, ;
, ;
.
.
.
:
, ,
. ,
;
,
;
.
( ).
.
,
(. 4.1).
. "+"
. ,
: "+"
.
72
, .
, "+
,
. .
,
, ,
, ,
,
, overfitting underfitting.
,
" " ,
. ,
,
. underfitting ,
. ,
,
.
4.2.3.
Data Mining.
.
.
.
73
,
().
(Basket Analysis).
,
, ,
, ,
. .
, . .
, , ,
, :
{, };
{, }.
, , ,
, , .
, ,
,
.
.
, ,
.
,
(, ).
, ,
, .
,
. ,
,
.
.
, .
.
. .
, ,
,
.
74
,
,
.
. ,
:
{5, 2, , , \9...},
\ /, 2
7. ,
, .
,
, , .
4.2.4.
"" , .
(cluster), , , .
, , , ,
.
.
,
.
,
.
,
. , ,
-: ,
, ,
. -
: , . ,
, .
() :
, - ,
. .
,
, ,
, ,
,
75
, . .
.
,
, ,
. ,
,
. 1869 . 60
. , ,
. ,
,
. 50
.
,
.
unsupervised learning.
, .
,
.
-
, . ,
, .
,
.
Data Mining, ,
, .
, ,
.
, .
-, (
). , , ,
, ,
.
-,
. ,
, /
.
: (/
76
), ( ), (
).
4.3.2.
,
, , .
Data Mining, -
77
.
, .
, ,
. Data Mining
, . .
. Data Mining,
.
. "
,
?"
4.3.3.
. ,
,
, . .
,
25 % .
, 4 5 , ,
50 100 ,
. , ,
20 % .
, . 10 %
-
, $4 . ,
Data Mining, (churn
prevention), (fraud detection),
.
Data Mining
.
.
4.3.4.
Data Mining.
, .
, ,
78
. ,
,
. ,
Data Mining , , ,
.
,
, , ,
(,
,
. ). , , ,
Data Mining .
, Data Mining
, .
Data Mining
.
4.3.5.
,
, ,
Data Mining.
.
.
,
.
, , .
,
/, ,
,
. . Data Mining
, .
,
,
(Drug Design).
,
79
, .
,
,
.
,
Data Mining,
.
,
.
10 12 , $300 500 .
. Data Mining,
.
4.3.6.
Data Mining
.
, ,
.
,
, . ,
(Decision System
Support) Data Mining.
,
,
.
Data Mining ,
.
, ,
.
, . . " .
Data Mining ( , ,
.)
. ,
, , ,
, .
, . . ,
, ,
15 25 % .
80
Data Mining
,
, .
,
. , (,
), Data Mining,
.
, ,
. Data Mining
20 30 %.
4.3.7.
, ,
() . ,
, , . Data
Mining ,
(fraud detection).
4.3.8.
Data Mining ,
.
, ,
.
, , 4Td call .
4.4.1. (predictive)
.
81
. , , ,
,
. .
:
,
.
;
,
.
.
4.4.2. (descriptive)
, , . .
.
. ,
,
.
:
. ,
, ;
(),
, .
(, ) (),
. "
" ,
.
, ;
(
, ), -
( ).
. ,
, ,
82
. : , ,
,
"" . "",
, ,
,
, ""
, , . , ,
, . . ,
. ,
,
;
.
, 30 ,
, , ,
, 5 ,
95 .
; ,
. , Data Summarization
- ,
,
, ,
.
,
, .
, ,
,
;
.
, ,
X Y. .
Data Mining. , Data Mining
, , ,
, , ,
Data Mining
. ,
Data Mining.
83
4.5.2.
.
:
;
(, );
.
( )
(
84
).
:
( ,
);
( ,
;
).
:
(, ,
, ). ,
;
(
).
.
, ,
,
.
.
JI. . ,
.
,
.
.
,
. ,
. JI.
, ,
, .
, ,
, .
.
85
.
,
,
.
, ,
.
JI. :
.
.
,
.
, .
,
;
, .
.
, ,
, , ,
,
. ,
,
. ,
.
..., ....
:
..., ... ;
.
,
, .
, , ,
;
..., ...
,
.
86
, ,
,
.
, .
.
, ,
.
, ,
, ,
.
:
> > > >
.
,
.
4.5.3.
()
, (
, )
.
, ,
.
,
,
. , ,
. ,
,
.
(). , Attar
Software -,
, , .
California Scientific Software
-,
87
. NIBS Inc.
, , -, , ,
.
, -
().
, :
,
;
, -
;
. ,
(, C++
. .), ,
;
- ,
-,
, ,
.
, . . ,
.
,
,
(
).
Data Mining
( ,
).
2
.
88
4.5.4.
,
.
(
).
(, )
.
,
(
- ,
).
.
,
( )
, .
.
, ,
, .
, , .
, ,
,
(,
, , a D ).
;
(
).
,
, ,
- .
, , ,
, ,
,
. ,
89
,
,
.
, ,
,
,
- , , ,
.
,
, ,
,
.
, , ,
,
. , Haykin (1994, . 2)
:
[! -
,
. : (1)
(2)
,
.
, (1996), . ,
,
, .
4.6.
4.6.1.
Data Mining, , ,
. .
, ,
Data Mining
.
90
, (. 4.2):
;
();
Data Mining ;
;
.
. 4.2.
, Data Mining.
, . . .
,
Data Mining.
, , ,
, ,
.
91
Data Mining.
,
.
.
, ,
, . ,
, . ,
Data Mining, ,
.
.
,
. .
, Data Mining
. , ,
.
Data Mining
.
, ,
,
, .
Data Mining .
4.6.2.
, Data Mining
. ,
, .
.
Data Mining .
,
, :
, , ,
, . .
,
, .
,
, , , .
92
,
, ,
, .
Data Mining
.
Data Mining,
. -,
, .
,
, ,
. , . , ,
. ,
. ,
,
, ,
Data Mining
( ,
).
.
, -, ,
,
- ,
. ,
, ,
, . .
, ,
.
().
().
, , , , , ,
Data Mining,
. .
,
, ,
93
.
.
, ,
.
,
, ,
(,
OLAP).
Data Mining (,
) ,
: , , ,
.
Data Mining :
, .
.
supervised learning ( )
unsupervised learning ( ).
.
,
, .
( ) .
,
.
(
) .
. ,
, ,
.
Data Mining
: , , ,
.
:
,
94
, Data Mining ,
, .
Data Mining
. .
Data Mining
: , , ,
.
5.1.
,
.
. :
I
ij .
(. 5.1).
Het
5. 1
96
5 . 1 ()
:
//= {\,2, ...,xh,
xh , ^.
: , ,
. .
Data Mining
:
= { 2, ...,xh, ...,,}.
\ :
7
,
, . ,
{, ,
}.
= {\, 2,
..., } ,
.
R ,
.
5.2.
5.2.1.
,
.
: ,
.
97
: :
() ().
.
"", "" "".
. :
( = = ) ( = );
( = = ) ( = ) .
.
.
.
. ,
,
,
. , :
( = ) ( = );
( = = ) ( = ).
, ,
. .
,
, .
5.2.2.
,
. . 5.1
, . 5.1.
.
.
, ,
, ,
. , ,
.
.
.
98
. 5.1.
, . .
. ,
,
, .
-
, , ,
.
.
,
(, ,
).
.
.
, ,
, . ,
, . 5.1, :
= = = ;
= = = ;
= = = ;
= = = .
,
. . ,
""
.
99
5.2.3.
.
( + 1)- . i} = {\, 2,
, } ,
:
$y = \omega_0 + \omega_1 x_1 + \omega_2 x_2 + \dots + \omega_n x_n$,
$\omega_0, \omega_1, \dots, \omega_n$ ,
.
,
.
.
, , 1 0.
1, 0.
. ,
. .
.
.
, = {, , }
{0, 1, 2}.
.
,
. 1
, .
0. ,
: {001, 010, 100}.
.
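The two encodings discussed above, integer codes such as {0, 1, 2} and binary vectors such as {001, 010, 100}, can be produced as follows. This is a minimal sketch; the function names and the sample category labels are hypothetical, and the order of categories (alphabetical) is an arbitrary choice.

```python
def integer_codes(values):
    """Map each distinct category to an integer 0, 1, 2, ..."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return [index[v] for v in values], categories


def one_hot(values):
    """Map each category to a binary vector with a single 1."""
    codes, categories = integer_codes(values)
    width = len(categories)
    vectors = [[1 if i == code else 0 for i in range(width)] for code in codes]
    return vectors, categories


codes, categories = integer_codes(["low", "mid", "high", "mid"])
vectors, _ = one_hot(["low", "mid", "high", "mid"])
```

Integer codes impose an ordering that a linear function will treat as meaningful, whereas the binary vectors give each category its own weight.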
5.3.
5.3.1. 1-
.
100
5.2
( = ) ( = )      2/5
( = ) ( = )      0/4
( = ) ( = )      2/5
( = ) ( = ) *    2/4
( = ) ( = )      2/6
( = ) ( = )      1/4
( = ) ( = )      3/7
( = ) ( = )      1/7
( = ) ( = )      2/8
( = ) ( = ) *    3/6
, 1R
.
. ,
, .
101
,
.
, .
, , . 5.1,
:
8
||
12
15
15
| |
10
11
12
||
20
21
23
25
:
{ 4,5; 4,5-7,5; 7,5-10,5; 10,5-12; 12-17,5; 17,5-20,5; 20,5-24;
24 }.
(overfitting). , ,
, . .
. , ,
(. . ),
. ,
.
, 1R,
, .
,
. ,
.
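The 1R procedure can be sketched in a few lines: for every attribute, predict the majority class for each of its values, count the errors, and keep the attribute with the fewest. The data below is an assumption: it is the classic 14-row weather data set in its all-nominal form, chosen because it reproduces the error fractions listed in Table 5.2 (2/5, 0/4, 2/5, 2/4, ...); the English attribute and value names are not from the text.

```python
from collections import Counter, defaultdict

ATTRS = ("outlook", "temperature", "humidity", "windy")
# Assumed data: (outlook, temperature, humidity, windy, play)
DATA = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def one_r(data, n_attrs):
    """Return (attribute index, value -> class rules, total errors)."""
    best = None
    for a in range(n_attrs):
        counts = defaultdict(Counter)      # attribute value -> class counts
        for row in data:
            counts[row[a]][row[-1]] += 1
        rules = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(sum(c.values()) - c.most_common(1)[0][1]
                     for c in counts.values())
        if best is None or errors < best[2]:   # ties keep the earlier attribute
            best = (a, rules, errors)
    return best


best_attr, rules, errors = one_r(DATA, len(ATTRS))
```

On this data two attributes tie with 4 errors out of 14; the sketch keeps the first one it meets, which is only one of several reasonable tie-breaking choices.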
102
().
,-
. ,
:
$P(y = c_r \mid E) = P(E \mid y = c_r)\,P(y = c_r) / P(E)$.
, ,
.
:
\ = ch
2 =
... = ch')] = .
. ,
, (\ = )
:
( I = ,) = ( I = \ I = ) X ( 2 = ] I . = ,) X ... ( = '" I = ,).
:
( = ,. I ) = ( I = I = ,) ( 2 = ] \ = ,) X ... X
(= ? I = ) ( = ,)/().
xh ch
d :
P(xh = Cj I = ) = P(xh = 4
= ,)/( = ),
,
Xh~ ch
d - , .
, . 5.1
:
=
^(
^(
^(
/(
= 2/9;
= 4/9;
= 3/9;
= 3/5;
| =
^( = | = ) =0/5;
^( = | = ) =2/5.
103
( = ) ,
,., .
:
P(play = yes) = 9/14;
P(play = no) = 5/14.
, ,
( )\
= ;
= ;
= ;
= ,
:
/*(= |) /^ = | = ) X
X /^( = | = ) X
X /^( = | = ) X
X ( = | = )*/>( = )/()\
/*( = | ) = /^( = | = ) X
X /^( = | = ) X
X /( = | = ) X
X /*( = | = ) X /*( = ().
, :
P(play = yes | E) = 2/9 · 3/9 · 3/9 · 3/9 · 9/14 / P(E) = 0,0053 / P(E);
'( = cr \ ) = ( = | )/ % ( = , | ).
,
:
'(
104
, ,
.
, ,
, ch
d
. 0,
, 0. ,
, .
.
,
.
,
.
,
.
.
, . . , .
.
| :
(*-)2
* )=
4 l
, .
/ () ,
(, - !2 + /2)
/ ().
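The worked example above (the product 2/9 · 3/9 · 3/9 · 3/9 · 9/14 = 0.0053) can be reproduced with a short Naive Bayes sketch for nominal attributes. The data set is an assumption: it is the classic 14-row weather data, which yields exactly the priors 9/14 and 5/14 and the conditional fractions quoted in the text; the English names and the evidence tuple (sunny, cool, high, true) are inferred from those fractions, not taken from the text.

```python
from collections import Counter

# Assumed data: (outlook, temperature, humidity, windy, play)
DATA = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def naive_bayes_scores(data, evidence):
    """Return {class: P(E | class) * P(class)}, i.e. before dividing by P(E)."""
    class_counts = Counter(row[-1] for row in data)
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / len(data)                  # prior P(y = c)
        for i, value in enumerate(evidence):
            matches = sum(1 for row in data
                          if row[-1] == cls and row[i] == value)
            score *= matches / n_cls               # P(x_i = value | y = c)
        scores[cls] = score
    return scores


scores = naive_bayes_scores(DATA, ("sunny", "cool", "high", "true"))
total = sum(scores.values())                       # plays the role of P(E)
posterior = {cls: s / total for cls, s in scores.items()}
```

No smoothing is applied, so a value that never occurs with a class drives that class's score to zero, which is the problem the text raises right after the example.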
5.4.
5.4.1. " "
,
",
,
, .
105
:
,
. , \
( ).
, , ,
, , ,
;
, .
.
,
\ 9 ch2 ,..., cn
h ;
2, ..., , ,
\ .
,
.
,
.
.
" ". ,
,
,
. ,
.
, .
(),
, ,
.
.
: ,
,
, , . .
("")
. .
.
:
, (prepruning).
106
" "
,
;
, " " .
JI. .
: " ";
. ,
, ;
, . .
.
,
,
. , . .
- .
,
" ", . "
" . , ,
,
, .
, , , 23 ,
, .
, ,
. :
, ,
, (L. Hyafil) (R. Rivest) , NP-, ,
, .
(pruning).
()
,
. ,
, .
:
;
,
.
107
,
,
. ,
,
.
ID3 ,
ID3 C4.5. ,
\\ (
).
(
), /,\, /,2, ..., ^.
xh \, 2,
, xh
/?1, ch2, ..., chm.
,
. .
(. 5.2).
. 5.2.
108
freq(cv, / ) /,
. ,
/ \
freq(cr ,7)
, . 5.1, ,
, 9/14.
,
, :
$\mathrm{Info}(T) = -\sum_{r=1}^{k} \frac{\mathrm{freq}(c_r, T)}{|T|} \log_2 \frac{\mathrm{freq}(c_r, T)}{|T|}$
,
. :
Info(T) = -9/14 · log2(9/14) - 5/14 · log2(5/14) = 0,940 bits.
, /,,
:
$\mathrm{Info}_{x_h}(T) = \sum_{i=1}^{m} \frac{|T_i|}{|T|}\,\mathrm{Info}(T_i)$.
, :
= (5/14) · 0,971 + (4/14) · 0 + (5/14) · 0,971 = 0,693 bits.
:
$\mathrm{Gain}(x_h) = \mathrm{Info}(T) - \mathrm{Info}_{x_h}(T)$.
.
:
Gain(outlook) = 0,247
Gain(temperature) = 0,029
Gain(humidity) = 0,152
Gain(windy) = 0,048
Gain().
,
.
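The entropy and information-gain figures above (Info(T) = 0.940 and gains of 0.247, 0.029, 0.152 and 0.048) can be checked numerically. The sketch below works from class counts only: each attribute is described by the (yes, no) counts in each of its subsets. Those counts are an assumption taken from the classic weather data set, which matches the numbers in the text; the attribute names are likewise assumed.

```python
from math import log2

# Assumed (yes, no) class counts in each subset T_i produced by an attribute
PARTITIONS = {
    "outlook": [(2, 3), (4, 0), (3, 2)],
    "temperature": [(2, 2), (4, 2), (3, 1)],
    "humidity": [(3, 4), (6, 1)],
    "windy": [(6, 2), (3, 3)],
}


def info(counts):
    """Info(T): entropy in bits of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)


def gain(partition):
    """Gain(x) = Info(T) - sum_i |T_i|/|T| * Info(T_i)."""
    class_totals = [sum(col) for col in zip(*partition)]
    n = sum(class_totals)
    info_x = sum(sum(part) / n * info(part) for part in partition)
    return info(class_totals) - info_x


gains = {name: gain(p) for name, p in PARTITIONS.items()}
root = max(gains, key=gains.get)   # attribute chosen for the root node
```

The attribute with the largest gain becomes the test at the current node, and the same computation is then repeated inside each subset.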
109
. ,
.
2, ..., ,
.
,
, :
()
= 0,571
Gain( ) = 0,971
Gain( ) = 0,020
, ,
^? .
, . 5.3.
. 5.3.
: ,
(
110
),
.
, Gain(J0
. ,
,
. Info* ,
.
,
. ,
InfoY , , Gain(J0
.
C4.5
ID3. (overfitting), . .
"" , . ,
,
, .
"", ,
, - ,
InfoY(7) = 0.
Gain(Jf) , ,
, .
,
.
C4.5 .
, ,
, , . ,
Info(T), :
$\mathrm{split\ info}(x_h) = -\sum_{i=1}^{m} \frac{|T_i|}{|T|} \log_2 \frac{|T_i|}{|T|}$
,
m .
:
gain ratio(x_h) = Gain(x_h) / split info(x_h).
.
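The gain ratio criterion divides the information gain by the split information, which penalises attributes that fragment the data into many small subsets. A minimal sketch follows; the (yes, no) counts per subset are again assumed from the classic weather data set, and the function names are hypothetical.

```python
from math import log2

# Assumed (yes, no) class counts in each subset produced by an attribute
PARTITIONS = {
    "outlook": [(2, 3), (4, 0), (3, 2)],
    "humidity": [(3, 4), (6, 1)],
}


def info(counts):
    """Entropy in bits of a list of counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)


def gain(partition):
    """Information gain of the split described by the partition."""
    class_totals = [sum(col) for col in zip(*partition)]
    n = sum(class_totals)
    return info(class_totals) - sum(sum(p) / n * info(p) for p in partition)


def split_info(partition):
    """Entropy of the subset sizes |T_i|, ignoring the class labels."""
    return info([sum(part) for part in partition])


def gain_ratio(partition):
    return gain(partition) / split_info(partition)


ratios = {name: gain_ratio(p) for name, p in PARTITIONS.items()}
```

With these counts the three-way split is penalised (split info of about 1.58 bits) while the balanced two-way split is not (exactly 1 bit), so the two attributes end up much closer than their raw gains suggest.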
111
, , ,
gain ratio. ,
log2 ^,
,
log2/7. ,
,
, , , , .
,
,
. ,
:
(> 1); .
. ,
, , .
, ,
, ,
.
,
, , ,
.
.
, ,
. ,
,
, .
.
X .
U .
, ,
:
$\mathrm{Info}(T) = -\sum_{r=1}^{k} \frac{\mathrm{freq}(c_r, T)}{|T| - U} \log_2 \frac{\mathrm{freq}(c_r, T)}{|T| - U}$,
freq(c_r, T)
.
112
:
$\mathrm{Gain}(X) = \frac{|T| - U}{|T|}\,(\mathrm{Info}(T) - \mathrm{Info}_X(T))$.
gain ratio.
, gain ratio ,
n + 1 .
xh c/7b ch2, ..., /
.
, .
chl
, /? 1.
, , ,
,. ,
1,
\, 2 ... 1 :
|,| / (|71 - U), \2\ / (|7] - ),\ ,\ / (|7] - U).
, :
m
/= 1
: ,
.
5.4.2.
,
.
,
.
" " ,
. , . .
,
.
(. 5.4).
. , ,
-"+"
,
, .
113
. , ,
, . 5.3.
5.3
114
5.3
()
,
.
, , ,
. ,
, .
,
, :
1. (
), ,
,
, .
2.
,
.
3. ,
.
4.
.
.
115
, . 5.3. ,
,
:
(?) =
:
= - 2/8;
= 1/8;
= 1/8;
= 3/12;
= 1/12;
= 0/12;
= 4/12;
= 0/12;
= 4/12.
, :
= .
( = ?) =
, ,
. ,
, (. 5.4).
5.4
116
5 . 4 ()
, :
= 2/4
= 1/4
= 1/4
= 3/6
= 1/6
= 0/6
= 4/6
, . 5.5:
( = = )
=
5. 5
,
, :
= 2/2
= 1/2
= 1/2
= 3/3
= 1/3
117
, :
( = = =
) =
,
, ,
.
5.5.
5.5.1.
, ,
.
, ,
.
. , ,
.
. 5.5.
. 5.5.
(
),
118
,
.
:
(5.1)
F ; c(y,j{x))
(loss function), j{x,) ,
/
, () .
,
. , "
.
, ,
(, ).
f(x) =.
,
, ( ,
, ).
. ,
(
; +1,
-1) ( 0-1 loss
) 1 0
:
) =1 ( = -1
, ^ ) = -1, = 1 ), .
:
c'(x,y,j(x))
.
119
.
, ( ).
, ,
j ( x ) - y .
(,
).
,
$c(x, y, f(x)) = c'(f(x) - y)$.
j{x) - .
, ^,.
, :
$c(x, y, f(x)) = (f(x) - y)^2$.
(5.2)
5.5.2. .
: .
F
coo, coj,..., .
,
(5.1). , ,
(5.2), F:
f
:
120
R{f) Y,, :
:
Y ry = Y TYa >.
:
= {Yr ) 1Y 1.
.
/ .
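With the squared loss (5.2) and a linear function, minimising the total loss has a closed-form solution. The sketch below covers only the simplest case, one independent variable with f(x) = w0 + w1·x, where the normal equations can be solved by hand; the function and variable names and the sample points are hypothetical.

```python
def squared_loss(y, fx):
    """Squared loss c = (f(x) - y)^2 for a single observation."""
    return (fx - y) ** 2


def fit_line(xs, ys):
    """Least-squares fit of f(x) = w0 + w1 * x via the normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx            # zero only if all x are identical
    w1 = (n * sxy - sx * sy) / det
    w0 = (sy - w1 * sx) / n
    return w0, w1


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]              # points lying exactly on y = 1 + 2x
w0, w1 = fit_line(xs, ys)
total_loss = sum(squared_loss(y, w0 + w1 * x) for x, y in zip(xs, ys))
```

For several independent variables the same idea is normally written in matrix form and solved with a linear algebra library rather than by hand.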
5.5.2.
,
. (5.1).
F .
-
.
:
: -> '.
,
.
(. 5.6).
. 5.6.
.
,
. ,
( ),
.
.
Support Vector Machines (SVM),
, .
121
,
, ,
. d.
( )
. , ,
, .
, ,
(support vectors).
,
:
122
91.
*) , :
$f(x) = \langle w, x \rangle + b$,
9 ^ ,
9?,
<, > \ ,
0.
,
:
INI = >/( )
j{x)
~||||2 :
< , > + - ' <
\ y i- < w , x > - b < e
:
$f(x) = \sum_{i=1}^{\ell} (\alpha_i - \alpha_i^*) \langle x_i, x \rangle + b$, (5.3)
, * ,
:
! * ,- ; ) = < >
{ i=1
,., * [0, ]
)
.
), SVM
.
(19)
:
$k(x, x') := \langle \Phi(x), \Phi(x') \rangle$.
123
{ , ) .
5.3 :
$f(x) := \sum_{i=1}^{\ell} (\alpha_i - \alpha_i^*)\, k(x_i, x) + b$.
SVM ,
, .
,
SVM ,
.
, .
, k (xh ),
. . 5.6
, SVM-.
(, ) =
) = ( +
5.6
(, ) = tanh(y + 0)
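Kernel functions of the kind listed in Table 5.6 are ordinary functions of two vectors, and the prediction formula simply sums kernel values against the support vectors. The sketch below shows three widely used kernels (polynomial, Gaussian RBF, sigmoid) and the prediction step. Because the table is only partly legible, the exact forms and the parameter names (`degree`, `c`, `gamma`) are assumptions, and the coefficients in the usage example are made-up numbers rather than the result of training.

```python
from math import exp, tanh


def dot(x, z):
    """Ordinary scalar product <x, z> (the linear kernel)."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, c=1.0):
    return (dot(x, z) + c) ** degree


def rbf_kernel(x, z, gamma=0.5):
    return exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def sigmoid_kernel(x, z, gamma=1.0, c=0.0):
    return tanh(gamma * dot(x, z) + c)


def predict(x, support_vectors, coeffs, b, kernel):
    """f(x) = sum_i coeff_i * k(x_i, x) + b, with coeff_i = alpha_i - alpha_i*."""
    return sum(a * kernel(sv, x) for sv, a in zip(support_vectors, coeffs)) + b


# Hypothetical support vectors and coefficients, used only to show the call
value = predict([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, -0.5], 0.1, dot)
```

Swapping `dot` for another kernel changes the shape of the fitted function without changing the prediction code, which is the practical point of the kernel substitution.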
SVM :
;
. ( , ),
;
, ;
overfitting;
.
:
;
;
;
.
124
5.6.
,
. ,
, :
, . .
.
,
( ) .
, ,
, (
). ,
. ,
, . ,
-
.
,
.
, ,
, . ,
.
.
, ,
. ,
, .
.
,
.
, .
,
.
,
. , ,
. ,
, .
.
, , "" ,
- .
125
,
. ,
.
(,
, ,
, :
,
. .).
,
. ,
.
.
,
,
.
.
, . ,
3 ( ).
,
( ,
).
(
[0, 1]
),
. (. 5.8).
. 5.8.
126
, ,
, , . . ,
, , , .
, (
, , ). ,
,
, (
) ,
. .
,
(. 5.9).
, "", ,
, :
( )
. (. . "",
,
). (), .
,
.
,
, , ,
(. 5.10). ,
.
, .
. , ,
.
127
.
. (
, ) ,
. ,
, ,
. ,
,
.
.
. ,
, ,
. ,
, ,
-,
. .
.
, ,
. .
,
,
. , ?
.
, .
128
, ,
.
,
.
,
, , ,
( ) .
:
. , , , 1R
Naive Bayes.
,
. , ID3,
C4.5 .
. ,
SVM.
1R
,
.
Naive Bayes
.
ID3 and C4.5 ",
, ,
.
.
SVM ,
m- , - 1
( fix )), ,
.
6.1.
6.1.1.
.
. ,
(itemsets), :
i, , ;
.
, , ,
- (. 6.1).
30.00
12.00
10.00
4.00
14.00
15.00
6.1
:
/ = {, , , , , }.
/, ,
.
I :
= , , 1 ) .
,
.
, ,
. , ,
:
\ = {, , };
= {, , }.
, ,
:
= {;,2, ...,, ...,rw},
m .
, :
D= {{, , },
{, , },
{, , , , },
{, , }}.
Data Mining D
. 6.2.
6.2
12.00
4.00
14.00
10.00
4.00
15.00
6. 2 ()
15.00
10.00
12.00
10.00
4.00
10.00
15.00
10.00
, ip
:
Deoda={{, , },
{, , },
{, , , , }}.
(itemset)
:
F = {ij | ij I J = 1..}.
,
F = {, }.
, , ^- (
2- ).
, F,
:
$D_F = \{T_r \mid F \subseteq T_r;\ r = 1..m\} \subseteq D$.
:
^{,}={{, , },
{, , , , }}.
, F,
(support) F
Supp(F):
$\mathrm{Supp}(F) = \frac{|D_F|}{|D|}$.
{, } 0,5, . .
( 1 2), 4.
Suppmjn. (large itemset),
,
:
$\mathrm{Supp}(F) \ge \mathrm{Supp}_{\min}$.
,
:
$L = \{F \mid \mathrm{Supp}(F) \ge \mathrm{Supp}_{\min}\}$.
Suppmjn = 0,5 :
{} Suppmjn=0,5;
{, } Suppmin= 0,5;
{} Suppmin= 0,75;
{, } Suppmjn= 0,5;
{, , } Suppmjn= 0,5;
{, } Suppmjn0,75;
{} Suppmjn= 0,75;
{, } Suppmin= 0,5;
{} Suppmi =0,75.
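The support of an itemset is the share of transactions that contain it, which takes only a few lines of Python. In this sketch items are coded by integers and the four transactions are a hypothetical database; both are assumptions made for brevity.

```python
from fractions import Fraction


def support(itemset, transactions):
    """Share of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return Fraction(hits, len(transactions))


def frequent(itemsets, transactions, min_support):
    """Keep the itemsets whose support is at least min_support."""
    return [s for s in itemsets if support(s, transactions) >= min_support]


D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
print(support({2, 5}, D))  # 3/4
print(support({1, 3}, D))  # 1/2
print(frequent([{1}, {4}, {2, 5}], D, Fraction(1, 2)))  # [{1}, {2, 5}]
```

An itemset whose support equals the threshold exactly is treated as frequent, which is why the comparison is non-strict.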
6.1.2.
.
, .
.
.
D= {{(), ()},
{(, ), (), (, , )},
{(, , ), ()}}.
,
. ,
.
. 6.3.
ID
6.3
(), ()
(, ), (), (:, , )
(, , ), ()
:
1 ,
, , .
, , {(), ()} 2/3, . .
0 1.
{(), ()},
( , ).
.
, ,
.
.
, , . 6.4.
01.01.03
15:04:23
1001
01.01.03
16:45:46
1001
01.01.03
18:32:26
1004
01.01.03
20:07:11
1005
01.01.03
20:54:43
1001
6.4
...
I ,
. Ss^
, sid.
(eid, t), eid , a t ,
. ,
sid :
*^sid = {(eid\, tt), (eid2, t2),..., ( eid, /)}.
, . 6.4, :
(, 0:25) (, 0:38)
0:15
0:30
(D, 0:53)
0:45
1:00
(, 1:25)
(, 1:42) (, 1:51)
I
1
1
1:15
1:30
1:45
2:00
. 6.1.
.
, , .
, , . , . 6.1
,
. , ,
{ , } : {(, 0:12), (, 0:25)},
{(, 0:38), (, 1:42)}, {(, 1:25), (, 1:42)}.
: {(, 0:12), (, 1:42)}, {{, 0:12), (, 1:51)},
{(, 0:38), (, 1:51)} {(, 1:25), (, 1:51)}, . .
.
6.1.3.
/
,
, . . ,
, . 6.2.
, . 6.1,
:
;
;
D ;
;
D ;
;
;
;
;
.
, /
. , , ,
, :
SuppCTp^SuppO;),
ij .
,
, , ,
. ,
S u p p {K0K0Cb.,
= 2/4, S u p p {Wa, } = 2/4, ..
0, 1 2.
}
,
, ,
, .
, , F - {/1/ /}
F = {Ig | I g /^ +1} ,
F = { i j 8 \ i e l g , /^ +1} .
.
. ,
/, .
.
.
, , .
, . . ,
.
:
, , ,
.
,
.
(/; 7)
(ijT). , ,
. , :
.
,
. , , ,
, ,
.
.
, .
, ,
, ,
.
6.2.
, ,
.
Data Mining.
, ,
.
:
;
.
:
() (),
(
), /, (
) , .
, :
(, ) ()
, ,
.
,
.
X Y,
1 / , / 7 = .
X = > Y , r X e I , Y G l , X v Y = q>.
.
. :
,
, .
, ;
, . , ,
- , . .
, .
,
;
,
.
, .
, . .
.
.
. , ,
F (. . X u Y = F),
, .
, {, , }
:
() () ;
() () ;
() (, ) ;
(, ) () ;
() () ;
() () ;
() (, ) ;
(, ) () ;
() () ;
() () ;
() (, ) ;
(, ) () .
,
. ,
.
.
(support) ,
. , , ,
~> Y , F ,
X Y:
$\mathrm{Supp}_{X \Rightarrow Y} = \mathrm{Supp}_F = \frac{|D_{F = X \cup Y}|}{|D|}$.
, , ,
, ,
S u p p (05) () S u p p {, , )
1/2.
(confidence) ,
X Y.
=> Y , X ,
, \
C o n f^ r =
I Dx I
= supp
Suppx
, , ,
, ,
, :
Coilf () () 2 / 3 ,
C o n f () ()
2/3,
C o n f (, ) (^)
15
Coilf () (, ) 2 / 3 .
, .
Y ^
, Y, . .:
C o n f ^ r = S^ PPvv-' 1' < Supp .
SuppA,
, Y
, => Y.
.
(improvement) ,
. ,
X Y, ,
X , , Y:
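$\mathrm{impr}_{X \Rightarrow Y} = \frac{|D_{F = X \cup Y}| \cdot |D|}{|D_X| \cdot |D_Y|} = \frac{\mathrm{Supp}_{X \cup Y}}{\mathrm{Supp}_X \cdot \mathrm{Supp}_Y}$.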
,
i m p r (, ) () 0 , 5 / ( 0 , 5 * 0 , 5 ) 2 .
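The three rule measures (support, confidence and improvement) can be computed directly from their definitions. The sketch below assumes integer item codes and a hypothetical four-transaction database; the function names are illustrative.

```python
from fractions import Fraction


def support(itemset, transactions):
    """Share of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return Fraction(hits, len(transactions))


def confidence(x, y, transactions):
    """Supp(X u Y) / Supp(X): share of transactions with X that also contain Y."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def improvement(x, y, transactions):
    """Supp(X u Y) / (Supp(X) * Supp(Y)); values above 1 mean the rule beats chance."""
    joint = support(set(x) | set(y), transactions)
    return joint / (support(x, transactions) * support(y, transactions))


D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
print(confidence({3}, {5}, D))   # 2/3
print(improvement({3}, {5}, D))  # 8/9
print(improvement({2}, {5}, D))  # 4/3
```

In this toy database the rule {3} => {5} has an improvement below 1, so knowing that a transaction contains item 3 makes item 5 slightly less likely than it is overall, while {2} => {5} has an improvement above 1.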
, ,
Y , ,
, .
, . . ,
Y:
X => Y .
, . . SuppHeK = 1-S u p p K.
, ,
, . ,
. , :
(, )
, . . .
.
. , ,
.
, .
, :
= {
=> = { ),
= { => = &}
:
,
10 ., , , 7 .
6.3.
6.3.1. Apriori
,
, , . Apriori
1994 . (Ramakrishnan Srikant) (Rakesh Agrawal). , :
:
$\mathrm{Supp}_F \le \mathrm{Supp}_E$ for $E \subset F$.
, 3- {, , }
2- {, }, {,
}, {, }. , ,
{, , }, {, }, {, },
{, }, .
Apriori
. /- /-
. : (candidate
generation) (candidate counting).
/- .
/- ,
.
, -.
,
,
/- . 1- 1- .
.
:
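L1 = {frequent 1-itemsets}
for (k = 2; Lk-1 <> ∅; k++) {
    Ck = Apriorigen(Lk-1)                  // generate candidate k-itemsets
    for all transactions t ∈ D {
        Ct = subset(Ck, t)                 // candidates contained in transaction t
        for all candidates c ∈ Ct
            c.count++
    }
    Lk = {c ∈ Ck | c.count >= Suppmin}     // keep candidates with sufficient support
}
Result = ∪k Lk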
, :
Lk ^- ,
.
(ij < ip, j <) F
Supp/r> Suppmin:
4 {(^1 >Supp,),(F2,Supp2), ..^(F^Supp^)},
Fj
Ck ^- .
(/, < ip, j < )
F Supp.
.
1. k = 1 1- ,
Suppmjn.
2 . = k + 1.
3. ^- ,
, .
4. ^-
. ^- ( - 1)
.
{ - 1 )-
( - 1)- q.
q , ,
(p.item_{k-1} < q.item_{k-1}). k - 2
(p.item_1 = q.item_1, p.item_2 = q.item_2, ..., p.item_{k-2} = q.item_{k-2}).
SQL- .
insert into Ck
select p.item1, p.item2, ..., p.itemk-1, q.itemk-1
from Lk-1 p, Lk-1 q
where p.item1 = q.item1, ..., p.itemk-2 = q.itemk-2,
      p.itemk-1 < q.itemk-1
5. D ,
*, .
, ( - 1 )
, . .
Lk-\. :
(1) - s
(s g Lk_!)
6. *
.
7. Lk *,
Suppmin. 2.
.
, . 6.1,
Suppmin = 0,5. \
( ) (. 6.5).
Table 6.5
No.  Itemset  Supp
1    {0}      0
2    {1}      0,5
3    {2}      0,75
4    {4}      0,25
5    {3}      0,75
6    {5}      0,75
2, 3, 5
6 , :
$L_1 = \{\{1\}, \{2\}, \{3\}, \{5\}\}$.
.
2 - , (. 6 .6 ).
Table 6.6
No.  Itemset  Supp
1    {1, 2}   0,25
2    {1, 3}   0,5
3    {1, 5}   0,25
4    {2, 3}   0,5
5    {2, 5}   0,75
6    {3, 5}   0,5
2, 4, 5 6 , :
$L_2 = \{\{1, 3\}, \{2, 3\}, \{2, 5\}, \{3, 5\}\}$.
3-
. (. 6.7).
Table 6.7
No.  Itemset    Supp
1    {2, 3, 5}  0,5
, :
$L_3 = \{\{2, 3, 5\}\}$.
4- ,
:
$L = L_1 \cup L_2 \cup L_3 = \{\{1\}, \{2\}, \{3\}, \{5\}, \{1, 3\}, \{2, 3\}, \{2, 5\}, \{3, 5\}, \{2, 3, 5\}\}$.
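The level-wise search of the worked example can be reproduced with a short Python sketch. It is a simplified illustration rather than an optimised implementation: the join step takes every union of two frequent (k-1)-itemsets that has exactly k items, and the prune step then removes candidates that have an infrequent (k-1)-subset. Items are coded by integers as in Tables 6.5 to 6.7, and the four transactions are an assumption chosen so that the supports agree with those tables.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return a dict mapping every frequent itemset to its support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def supp(candidate):
        return sum(1 for t in transactions if candidate <= t) / n

    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items if supp(frozenset([i])) >= min_support}
    result = {}
    k = 1
    while current:
        for itemset in current:
            result[itemset] = supp(itemset)
        k += 1
        # Join: unions of two frequent (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count: keep the candidates with sufficient support.
        current = {c for c in candidates if supp(c) >= min_support}
    return result


D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
L = apriori(D, 0.5)
print(sorted(sorted(s) for s in L))
# [[1], [1, 3], [2], [2, 3], [2, 3, 5], [2, 5], [3], [3, 5], [5]]
```

The pruning step discards {1, 2, 3} and {1, 3, 5} before any transaction is scanned, because {1, 2} and {1, 5} are not frequent.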
. ,
.
,
-. -
, .
.
- , .
, ,
-. ,
, , ,
- .
, -, . .
, -.
- - , .
,
-, ,
.
- - ,
. ""
,
, Ck n 1 - .
- . ,
, - . .
- ^-, ,
. , ,
,
.
""
, ,
. ,
, . ,
, .
( + 1 )-
. .
6.3.2. Apriori
AprioriTid Apriori.
,
(^- )
, TID ,
.
<TID, {Fk}>, Fk
^- ,
TID. } = D
,
}, . > 1
, .
, ,
:
<T.TID, {cGC k| cg
TID (. .
) .
^- ,
. ,
. ,
, , . .
.
, , . .
^- ,
.
Apriori MSAP (Mining
Sequential Alarm Patterns), .
:
Lk ,
\.
MSAP , ,
(Urgent Window).
, .
, Apriori.
, ,
.
.
.
.
.
,
.
, .
(support) ,
.
(confidence) , ,
.
(improvement) ,
.
.
.
.
Apriori , :
.
7.1.
30- .
,
60- 70- .
.
, ,
.
, ,
, , ,
, .
, ,
.
( ).
,
,
unsupervised learning.
, .
,
(descriptive).
-
, . ,
, .
,
. .
Data Mining,
, ,
.
,
,
. , ,
- ,
. ,
,
,
.
, ,
.
, .
(cluster), , , .
, , , ,
.
.
( )
,
.
( ) .
.
,
.
, .
-, (
). , , ,
, ,
.
-,
() (
). , , /
().
():
(/ ), (
), ( ).
.
,
, .
:
1)
, ,
/
;
2)
, ,
0...1. ,
.
JI. .
, ,
. ,
,
. J1. ,
: "... ,
,
".
,
,
. ,
. , ,
.
, ,
.
, , .
, , ,
.
, .
.
,
,
.
,
. ,
, .
, ,
, .
,
. ,
.
7.1.1.
:
;
.
:
( );
;
.
.
/,
. F
/ , . . F: I . F
, .
.
/ :
/ = {/ /2,
- } ,
ij .
,
30-
. . ( ).
Iris setosa, Iris versicolor Iris virginica.
50
: , . . 7.1
.
:
ij
, ,
, .
:
Xh = {vh\ Vh2,
.
:
{\, ?2, ? Cb >
,
/:
= {/ ip \ij e l , i p e I d(ij9 ip) < },
,
; d{ij9 ip) ,
.
Table 7.1
No.   Sepal length   Sepal width   Petal length   Petal width   Class
1     5,1            3,5           1,4            0,2           Iris setosa
2     4,9            3,0           1,4            0,2           Iris setosa
3     4,7            3,2           1,3            0,2           Iris setosa
4     4,6            3,1           1,5            0,2           Iris setosa
5     5,0            3,6           1,4            0,2           Iris setosa
51    7,0            3,2           4,7            1,4           Iris versicolor
52    6,4            3,2           4,5            1,5           Iris versicolor
53    6,9            3,1           4,9            1,5           Iris versicolor
54    5,5            2,3           4,0            1,3           Iris versicolor
55    6,5            2,8           4,6            1,5           Iris versicolor
101   6,3            3,3           6,0            2,5           Iris virginica
102   5,8            2,7           5,1            1,9           Iris virginica
103   7,1            3,0           5,9            2,1           Iris virginica
104   6,3            2,9           5,6            1,8           Iris virginica
105   6,5            3,0           5,8            2,2           Iris virginica
d(ij, ip)
ij ip, :
a) $d(i_j, i_p) \ge 0$ for all $i_j$ and $i_p$;
b) $d(i_j, i_p) = 0$ if and only if $i_j = i_p$;
c) $d(i_j, i_p) = d(i_p, i_j)$;
d) $d(i_j, i_p) \le d(i_j, i_r) + d(i_r, i_p)$.
d{ip ip) , ,
. ,
.
, ,
D.
/.
/(/, ip) j . ,
:
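$D = \begin{pmatrix} 0 & d(e_1, e_2) & \dots & d(e_1, e_n) \\ d(e_2, e_1) & 0 & \dots & d(e_2, e_n) \\ \dots & \dots & \dots & \dots \\ d(e_n, e_1) & d(e_n, e_2) & \dots & 0 \end{pmatrix}$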
.
,
:
$(D + D^T) / 2$.
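A minimal sketch of building a matrix of pairwise distances and of making a non-symmetric matrix symmetric by averaging it with its transpose. The function names are illustrative, and the matrix is stored as a plain list of rows.

```python
def distance_matrix(objects, d):
    """Matrix whose element (j, p) is the distance d between objects j and p."""
    return [[d(a, b) for b in objects] for a in objects]


def symmetrize(matrix):
    """Average a square matrix with its transpose: (D + D^T) / 2."""
    n = len(matrix)
    return [[(matrix[i][j] + matrix[j][i]) / 2 for j in range(n)] for i in range(n)]


print(distance_matrix([1, 4, 6], lambda a, b: abs(a - b)))
# [[0, 3, 5], [3, 0, 2], [5, 2, 0]]
print(symmetrize([[0, 2], [4, 0]]))
# [[0.0, 3.0], [3.0, 0.0]]
```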
7.1.2. , ,
- Rm.
.
,
.
:
X 0 c :R m , w-
;
, = (,
) () , / = 1,(2 ;
,
$\bar{x} = \frac{1}{Q} \sum_{i=1}^{Q} x_i$ ;
$S = \frac{1}{Q - 1} \sum_{i=1}^{Q} (x_i - \bar{x})(x_i - \bar{x})^T$.
, .
.
,
.
(. . 7.1.1):
$d_E(x_i, x_j) = \sqrt{\sum_{t=1}^{m} (x_{it} - x_{jt})^2}$.   (7.1)
.
.
, ,
() (. .
).
$d_H(x_i, x_j) = \sum_{t=1}^{m} |x_{it} - x_{jt}|$.   (7.2)
. ,
"",
- (- ).
$d_\infty(x_i, x_j) = \max_{1 \le t \le m} |x_{it} - x_{jt}|$.   (7.3)
,
,
. ,
( ),
:
$d_M(x_i, x_j) = (x_i - x_j)^T S^{-1} (x_i - x_j)$.   (7.4)
, .
:
(7.5)
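The distance measures of this subsection can be written in a few lines of plain Python. This is a sketch under stated assumptions: objects are sequences of numbers of equal length, the function names are illustrative, and the Mahalanobis form takes the already inverted covariance matrix s_inv as a list of rows instead of inverting S itself.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of absolute coordinate differences, as in (7.2)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def chebyshev(x, y):
    """Largest absolute coordinate difference, as in (7.3)."""
    return max(abs(a - b) for a, b in zip(x, y))


def mahalanobis(x, y, s_inv):
    """Quadratic form (x - y)^T S^-1 (x - y), as in (7.4)."""
    diff = [a - b for a, b in zip(x, y)]
    m = len(diff)
    return sum(diff[i] * s_inv[i][j] * diff[j] for i in range(m) for j in range(m))


p, q = (0, 0), (3, 4)
print(euclidean(p, q), manhattan(p, q), chebyshev(p, q))  # 5.0 7 4
print(mahalanobis(p, q, [[1, 0], [0, 1]]))                # 25
```

With the identity matrix in place of the inverse covariance matrix, the Mahalanobis form reduces to the squared Euclidean distance.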
, ,
.
, ,
,
.
.
7.2.
,
. ,
, .
, ,
.
, ,
. . 7.1
, . 7.1.
Fig. 7.1.
, ,
.
,
, , . 7.2.
Fig. 7.2.
,
.
.
, ,
.
.
, . . - .
.
. . ,
.
(dendrograms).
( . dendron ).
.
, . 7.3.
7.3.
7.3.1.
,
. ,
.
, ,
,
, .
. ,
.
, , ,
?
.
.
.
, ,
.
.
,
(),
.
, .
,
.
, ().
, ,
,
() . :
, ,
,
. ,
.
:
) ,
( );
) (),
,
( ).
7.3.2.
/ :
I {^*1}? ^2 {7*2}? ? {//77}
(
, cCJ) . ,
m- 1 , :
] ~ {7 1 }? 2~ {/2}? ? {ip, iq) ,..., {im}.
, ,
( - 2), (-3), (-4) . .
,
/.
.
.
,
:
$d_{rs} = \alpha_p d_{ps} + \alpha_q d_{qs} + \beta d_{pq} + \gamma |d_{ps} - d_{qs}|$,
q
s,
,
a p, a q, .
. 7.2 , < .
7. 2
-
(Nearest neighbor)
1/2
1/2
1/2
(Furthest neighbor)
1/2
1/2
1/2
,
(Median
clustering)
1/2
1/2
1/4
(Between-groups linkage)
1/2
1/2
(Within-groups linkage)
(Centroid clustering),
.
,
,
(Ward's method).
,
- ,
+ ,
+ +
kp
+ ,
+ +
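The recurrence for the distance between a merged cluster and any other cluster lends itself to a compact sketch of agglomerative clustering. Only the nearest-neighbour and furthest-neighbour coefficient sets are shown, with values alpha_p = alpha_q = 1/2, beta = 0 and gamma = -1/2 or +1/2 respectively. The dictionary-based storage of distances and the function names are assumptions made for illustration.

```python
def lance_williams(d_ps, d_qs, d_pq, alpha_p, alpha_q, beta, gamma):
    """Distance from cluster s to the cluster obtained by merging p and q."""
    return alpha_p * d_ps + alpha_q * d_qs + beta * d_pq + gamma * abs(d_ps - d_qs)


NEAREST = dict(alpha_p=0.5, alpha_q=0.5, beta=0.0, gamma=-0.5)
FURTHEST = dict(alpha_p=0.5, alpha_q=0.5, beta=0.0, gamma=0.5)


def agglomerate(dist, coeffs):
    """Merge the two closest clusters until one remains; return the merge history.

    dist maps frozenset({a, b}) to the distance between initial objects a and b.
    """
    clusters = sorted({c for pair in dist for c in pair})
    d = dict(dist)
    history = []
    while len(clusters) > 1:
        pairs = ((a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:])
        p, q = min(pairs, key=lambda ab: d[frozenset(ab)])
        d_pq = d[frozenset((p, q))]
        merged = (p, q)  # the new cluster is labelled by the pair it was built from
        for s in clusters:
            if s not in (p, q):
                d[frozenset((merged, s))] = lance_williams(
                    d[frozenset((p, s))], d[frozenset((q, s))], d_pq, **coeffs)
        clusters = [s for s in clusters if s not in (p, q)] + [merged]
        history.append((p, q, d_pq))
    return history


dist = {frozenset("ab"): 1, frozenset("ac"): 4, frozenset("bc"): 3}
print(agglomerate(dist, NEAREST))   # [('a', 'b', 1), ('c', ('a', 'b'), 3.0)]
print(agglomerate(dist, FURTHEST))  # [('a', 'b', 1), ('c', ('a', 'b'), 4.0)]
```

With gamma = -1/2 the update returns the smaller of the two old distances, and with gamma = +1/2 the larger, which is exactly the nearest-neighbour and furthest-neighbour behaviour.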
, ,
/ .
.
.
, . ,
I
.
1965 .
\ = I.
,
.
, ,
Dc \=\/Nc\ X X d(iP, iq) V ip, i4 e C.
\
.
, ,
, \, , 2.
\ pop,
, . .
,
\.
,
.
,
. .
1990 .
,
7.3.3.
,
().
.
( ) ,
().
,
. :
( ) ,
;
:
$d_A^2(m_j, c^{(i)}) = \|m_j - c^{(i)}\|_A^2 = (m_j - c^{(i)})^T A (m_j - c^{(i)})$,   (7.6)
. ,
;
;
/;
J=J(M , d , , U);
.
-means (Hard-c-means)
. 7.1.
.
(. 7.4). .
. , ,
, ,
. , /.
1, 2 3.
.
,
. ,
(,
). . 7.5 .
. 7.4.
. 7.5.
.
,
. , ,
. . 7.6
. , ,
, (
1, 2, 5 .).
, .
. 7.6.
,
(, ) .
.
, ,
Iris setosa.
.
, .
,
.
, .
,
, .
,
, ,
, (
),
.
.
,
.
, ^- ,
,
.
,
, .
:
M = {m}}dJ=], d ()
;
, (7.6);
= {(/)}^=1,
d
.(') _ 221d
uomj
(7.7)
, 1< i < c ,
U = {utj},
$u_{ij} = \begin{cases} 1, & d(m_j, c^{(i)}) = \min_{1 \le k \le c} d(m_j, c^{(k)}), \\ 0, & \text{otherwise}. \end{cases}$   (7.8)
<1
(7.9)
(7.10)
,
.
, .
.
1. (,
), 8 (
), / = 0 .
2. :
---------- l5^
c-
(71)
I
/=1
3. ,
,
ij
[0
( 7) 2)
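The three steps above (choose initial centres, recompute each centre as the mean of its cluster, reassign every object to the nearest centre) can be sketched in plain Python. The sketch assumes the squared Euclidean distance, takes the initial centres as an argument, and stops when the centres no longer move by more than eps; the function and parameter names are illustrative.

```python
def sq_dist(x, y):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans(points, centers, eps=1e-9, max_iter=100):
    """Return (centers, labels) after alternating assignment and update steps."""
    centers = list(centers)
    labels = []
    for _ in range(max_iter):
        # Assignment: each point goes to the cluster with the nearest centre.
        labels = [min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
                  for p in points]
        # Update: each centre becomes the mean of the points assigned to it.
        new_centers = []
        for i, old in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(old)  # an empty cluster keeps its centre
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= eps:
            break
    return centers, labels


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(kmeans(points, [(0, 0), (10, 0)]))
# ([(0.0, 1.0), (10.0, 1.0)], [0, 0, 1, 1])
```

The result depends on the initial centres, so in practice the procedure is usually run several times from different starting points.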
Fuzzy -Means
.
,
.
.
:
= { ^ =], d ()
;
(. (7.6));
= {(,)}^=1,
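$c^{(i)} = \frac{\sum_{j=1}^{d} u_{ij}^{w} m_j}{\sum_{j=1}^{d} u_{ij}^{w}}, \quad 1 \le i \le c$,   (7.13)
U = {u_ij},
$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \frac{d_A^2(m_j, c^{(i)})}{d_A^2(m_j, c^{(k)})} \right)^{\frac{1}{w-1}}}$,   (7.14)
$J = \sum_{i=1}^{c} \sum_{j=1}^{d} u_{ij}^{w} \, d_A^2(m_j, c^{(i)})$   (7.15)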
we( l , ) ( ),
. w = 2;
:
$u_{ij} \in [0, 1]$;
$\sum_{i=1}^{c} u_{ij} = 1$;
$0 < \sum_{j=1}^{d} u_{ij} < d$,   (7.16)
,
,
.
.
1 . 2 < c < d .
2.
.
3. 8 .
4. w e ( 1, ), w = 2.
5. (,
).
6. () :
" ) ,
------------- .
1 / .
(7.17)
10
7=1
7.
() :
d A(*, >/ ) ^
~ mj
~ 1)
1 8)
8. :
^ ------------------------- 1 < / < ,
1< j < d ,
(7.19)
=1
(7.16).
9. |t / (/) 1 / (/-1)|| < 8 . ,
, 7 / = / + 1.
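The iteration above can be sketched in plain Python. The sketch follows the standard Fuzzy C-Means scheme with the Euclidean norm (the matrix A taken as the identity), which is an assumption; the random initialisation of the membership matrix, the seed and the function name are likewise illustrative.

```python
import math
import random


def fuzzy_c_means(points, c, w=2.0, eps=1e-6, max_iter=200, seed=0):
    """Return (centers, U), where U[i][j] is the membership of point j in cluster i."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalised so that each column sums to 1.
    U = [[rng.random() for _ in range(n)] for _ in range(c)]
    for j in range(n):
        total = sum(U[i][j] for i in range(c))
        for i in range(c):
            U[i][j] /= total
    centers = []
    for _ in range(max_iter):
        # Centres: means of the points weighted by membership to the power w.
        centers = []
        for i in range(c):
            weights = [U[i][j] ** w for j in range(n)]
            norm = sum(weights)
            centers.append([sum(wt * p[t] for wt, p in zip(weights, points)) / norm
                            for t in range(dim)])
        # Memberships: inversely related to the relative distance to each centre.
        new_U = [[0.0] * n for _ in range(c)]
        for j, p in enumerate(points):
            dists = [math.sqrt(sum((a - b) ** 2 for a, b in zip(p, ctr)))
                     for ctr in centers]
            if min(dists) == 0.0:
                hit = dists.index(0.0)  # the point coincides with a centre
                for i in range(c):
                    new_U[i][j] = 1.0 if i == hit else 0.0
            else:
                for i in range(c):
                    new_U[i][j] = 1.0 / sum((dists[i] / dists[k]) ** (2.0 / (w - 1.0))
                                            for k in range(c))
        change = max(abs(new_U[i][j] - U[i][j]) for i in range(c) for j in range(n))
        U = new_U
        if change < eps:
            break
    return centers, U


centers, U = fuzzy_c_means([(0.0,), (1.0,), (10.0,), (11.0,)], c=2)
print([max(range(2), key=lambda i: U[i][j]) for j in range(4)])
```

The printed list gives, for every point, the index of the cluster in which its membership is highest; the cluster numbering itself depends on the random initialisation.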
A:-means,
, (. 7.7),
. .
-
(. 7.8), .
,
, .
,
:
Z (</)"/ - <>)(, ~ </)
F 0) = --------------------------------- ,
(7.20)
7=1
Xjk - F^l\ ,* -
,
. ,1...,(_1)
/- , />7
. . 7.9.
. 7.9.
-
. :
d\ " = ((,) - / ) } (<,) - mi ) >
( 7 -2 1 )
0) = ^|/ (-)|+1
( 7 .2 2 )
a F (,) (7.20).
:
M = {w7}^=, , d ()
;
, (7.21) (7.22);
= {(/)}^=1,
d
(0 = >=1_____
| ---------- }
I< ,
(7.23)
= ------------------------- >
(7-24)
1<
U = { },
( 2
d \ , .(//, c(t)) y
=\
(/,/,) =
(/,(,)),
(7.25)
/=1 7=1
we( l ,
( );
:
[0, 1];
2>=1;
/=1
<<</,
(7.26)
/=1
,
,
.
.
1. 2 < c < d .
2.
> 0.
3. w e ( 1 , ), 2.
4. ,
.
5. :
).(/) _ iz !d
1< / < .
(7.27)
6. :
a
(>)
(7.28)
7.
^ /2.((c/('),/y) = (c/(') - w / ) |F (,)|r+1 ( f (,)) ' { - ^ .
(7.29)
8.
,
(7.30)
(7.26).
9. ||f7(/) - / ()||< 8 . ,
, 5 1 = 1+ 1.
,
. -
,
.
.
. ,
.
, ,
,
( ,
,
.). ,
,
. :
;
, ;
()
.
:
, ,
, , , .
, ,
;
,
( ),
, ( )
,
, ,
;
,
.
,
. Fuzzy C-Means, ,
,
.
,
,
, (
).
,
.
, .
,
.
7.4.
7.4.1.
^ ,..., ^.
1 k- R
\ ... . ,
R eF (Xxx...xXk).
2 k- R
\ = ..= = X. ,
* ).
3 ()
R , = 2.
ReF(X\ 2).
R e F (X 2).
[ 1.\ -
-
|*() .
, /
X.
,
.
X ,
[0, 1]. , :
* ) : /- > [ 0 , 1];
f l,
0,
X^U
&
^ .
),
X , .
.
, (
,
R c X 2).
4 . R
, \ / (, x)eR. , R
.
5 . R , Vxe X (, x)R. , R
.
[ 2. \
, . .
, ,
.
} 2 X , ( x \)e R (2, x2)gi?.
6 . R
, ( 2)/? (2, X\)eR Vx,,x2 g I .
, R .
? . R
, (xb x2)ei? (2, xi)e/? \ = 2
Vx1?x2 X . , R
.
\3 3 .:
, . .
, ,
.
8 . R ,
Vx,,2,3 X (;ci,X2)ei? (2, ) ()/?.
, R .
9 . R
, .
10 . R
, , .
11 . R
, ,
.
1 2 . R
,
(jti, x2)e R (2, X\)eR \/] ,2 X .
.
, . . ReF(x2).
13 .
13.1 . R
,
((*,*)) > 0 .
, R .
13.2 -.
-, |n/e((x,x)) > , > 0 .
13.3 -. - ,
|ii/e((x,x)) = , > 0 V i g J .
13.4 . [iR((x,x)) = \
R ,
VxeX.
13.5 . |?((*,*)) = 0 \ /
R ,
.
[ 4. \
,
X:
(7.31)
, - ,
, .
14 .
14.1 . R
, \/],2 /^ , ^ ) ) > 0
14.2 .
R ,
Vjc, , 2 g X
( ,, 2)) > 0 |/((*,, 2)) = |/{(2, ])) > 0.
14.3 -. R
-, \/]92 ((*,,*2) ) >
^ ( ( * 2 , *, ) ) > > 0 .
14.4 -.
R - ,
Vjc, , 2 X |/((jc,, 2)) > [iR((*,, 2)) = |/((2, ])) >
> 0.
14.5 -.
R -
, - ,
\/]92 |1 ((,, 2)) = |n/e((x,, 2)) = \iR((x2, )) =
> 0.
14.6 . R
, , ^ ( ( , , ^ ) ) = 1
[xR((2, jc, )) = 1 Vx,, 2 X .
14.7 . R
, ^ ( ( , ,^ ) ) > 0
|n7?((*2 >*i)) > 0 } = 2 Vx,,x2 e X .
(----------------- 1
\
,
X :
.
_ i
" \
/q
, - ,
, .
[ ( ).
,
,
. 13 14,
,
,
,
.
, , ,
.
15 -. R ,
-, -.
-,
-:
| , ) ) = 1
>
V jc j 92
\/.
------------------------,
[ 7.
,
, . ., , ,
,
,
,
.
:
;
b
.
,
. ,
.
16 , w-
, , ,
.
, .
d.
Q ( X q ), -.
, d , ,
", q".
,
.
17 .
0 [iX) : X - [ 0 , 1] 0 ,
:
(7.33)
,
( ) .
* 8._\ ,
d , a [iX) (0) = 1 .
|^ .
18
,
X.
17 18
.
.
-
, -
17. , 16 17 8
:
(7.34)
(7.35)
1 \
<1,(2]
. ^
,
18 .
.
, ,
,
, ( 1).
{. ,
,
Xt ( 0 ).
,
(, d(x q, ), q ,
= l , Q ), .
,
, ,
, .
1 2
" , kxi" . 7.10.
1 , 2 '.
. 7.10.
:
Q ,
, q";
,
,
,
( q)\
.
-
- ,
.
.
19
0 ^ : X 2 -> [0 , 1]
xQ X , :
(7-36)
, ^
-.
,
X. :
^X[(a,b) ... (, 6 ) , ^ ( a , b ) .
: " \ ...
,
^. ,
.
20 X
: X 2 ->[ 0 , 1 ],
, ) = & 1( , ..., ^ ( * , 6 )),
/-, , (, ) ,
(7.37)
i-\Q ,
a , b e X . , ,
X.
20 ,
X, -.
:
-,
2 0 ,
X;
;
,
, ,
.
7.4.2. -
X
.
.
,
,
,
.
21 . R
, V x 19x2,x3 e X |Lifl((xj,x2))> 0 7(2 , 3)) >
|((,, 3)) > 0 .
22 -. R
-, 7(,,2) ) > [iR((2, 3)) >
^ ( ( ^ ) ) > Vx,,x2 ,x3 e X > 0 .
[?. :
|/(1,2)) = 1
/((2 ,3)) = 1
|/(1,)) = 1 ,
.
23 -. R
-, Vx,,x2 ,x3 e X
(*|>*)) = ( * 1>*2 ))> (* 2 >*)))> T - t - .
MIN-. R MIN-, Vx,, 2, 3 X
Ixr ((*1 >*?)) = min(|aft ((,, 2)),
((2, ,))).
1?10.\ ( ).
( 6 ),
,
(
13.1 13.5), ( 14.1 14.7)
( 2123).
-, ,
, -
,
, , ,
-.
.
2 4 -.
R -,
, - -.
--------------------- ,
[ 11.} ( --
). -
,
,
, ,
.
25 R ,
X, ,
:
$\hat{R} = \bigcup_{i=1}^{|X|} R^i$.   (7.38)
.
26 . : [, 1][0, 1]>[0, 1]
(/-), , ,
$T(a, 1) = a \quad \forall a \in [0, 1]$.
, /- :
$T(x, y) = T(y, x)$;
$T(x, T(y, z)) = T(T(x, y), z)$;
$T(x, y) \le T(x', y')$ if $x \le x'$, $y \le y'$;
$T(x, 1) = x \quad \forall x \in [0, 1]$.
, ,
$T(x, y) \le \min\{x, y\}$.
27 t-. .
- ( B)(t) = T(A(t),B(t)) \/t X .
28 .
S :[ 0 , 1] [0 , 1] [ 0 , 1]
(-), , ,
$S(a, 0) = a \quad \forall a \in [0, 1]$.
, - S :
$S(x, y) = S(y, x)$;
$S(x, S(y, z)) = S(S(x, y), z)$;
$S(x, y) \le S(x', y')$ if $x \le x'$, $y \le y'$;
$S(x, 0) = x \quad \forall x \in [0, 1]$.
, ,
$S(x, y) \ge \max\{x, y\}$.
29 -. S . - ( B)(t) = S(A(t), B(t))
VteX.
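Three standard pairs of t-norms and t-conorms (minimum and maximum, product and probabilistic sum, and the Lukasiewicz pair) illustrate the definitions above. The sketch also checks the boundary condition, commutativity and the bounds by min and max on a small grid of values. The choice of these particular operators and all function names are assumptions made for illustration.

```python
def t_min(a, b):
    return min(a, b)


def t_product(a, b):
    return a * b


def t_lukasiewicz(a, b):
    return max(0.0, a + b - 1.0)


def s_max(a, b):
    return max(a, b)


def s_probabilistic(a, b):
    return a + b - a * b


def s_lukasiewicz(a, b):
    return min(1.0, a + b)


GRID = [0.0, 0.25, 0.5, 0.75, 1.0]


def looks_like_t_norm(t):
    """Check T(a, 1) = a, commutativity and T(a, b) <= min(a, b) on the grid."""
    return all(t(a, 1.0) == a for a in GRID) and all(
        t(a, b) == t(b, a) and t(a, b) <= min(a, b) for a in GRID for b in GRID)


def looks_like_t_conorm(s):
    """Check S(a, 0) = a, commutativity and S(a, b) >= max(a, b) on the grid."""
    return all(s(a, 0.0) == a for a in GRID) and all(
        s(a, b) == s(b, a) and s(a, b) >= max(a, b) for a in GRID for b in GRID)


print(all(looks_like_t_norm(t) for t in (t_min, t_product, t_lukasiewicz)))          # True
print(all(looks_like_t_conorm(s) for s in (s_max, s_probabilistic, s_lukasiewicz)))  # True
```

A check on a finite grid is only a sanity test and does not prove the properties for all values in [0, 1].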
[ 12. \ (
.) (7.38)
, :
$A \circ B = \{a_{ik}\}_{i=1...N,\ k=1...Q} \circ \{b_{kj}\}_{k=1...Q,\ j=1...M} = \{c_{ij}\}_{i=1...N,\ j=1...M}$,   (7.39)
$c_{ij} = S(T(a_{i1}, b_{1j}), \dots, T(a_{iQ}, b_{Qj})) = \mathop{S}_{k=1}^{Q} T(a_{ik}, b_{kj})$,   (7.40)
S -, a -;
$R^n = R \circ R \circ \dots \circ R = R^{n-1} \circ R$,   (7.41)
$A \cup B = S(A, B)$,   (7.42)
$R^1 = R$.
S -.
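The composition (7.40), the powers (7.41) and the union (7.42) are enough to compute a transitive closure numerically. The sketch below assumes that a relation on a finite set is stored as a square list of rows with membership degrees, uses min as the default t-norm and max as the default t-conorm, and accumulates the union of the powers from 1 to |X|. The function names are illustrative.

```python
from functools import reduce


def compose(A, B, t_norm=min, s_norm=max):
    """(S, T)-composition: c_ij = S over k of T(a_ik, b_kj)."""
    inner, cols = len(B), len(B[0])
    return [[reduce(s_norm, (t_norm(row[k], B[k][j]) for k in range(inner)))
             for j in range(cols)] for row in A]


def union(A, B, s_norm=max):
    """Element-wise union of two relations through the t-conorm."""
    return [[s_norm(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(A, B)]


def transitive_closure(R, t_norm=min, s_norm=max):
    """Union of the powers R^1, R^2, ..., R^n, where n is the size of the set."""
    closure, power = R, R
    for _ in range(len(R) - 1):
        power = compose(power, R, t_norm, s_norm)
        closure = union(closure, power, s_norm)
    return closure


R = [[1.0, 0.8, 0.0],
     [0.8, 1.0, 0.4],
     [0.0, 0.4, 1.0]]
print(transitive_closure(R))
# [[1.0, 0.8, 0.4], [0.8, 1.0, 0.4], [0.4, 0.4, 1.0]]
```

In the example the first and third objects are not related directly, but the closure links them with degree 0.4 through the second object.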
1. R -,
: R R2 ... R" ....
, Rk Rk l > \ .
Rk Rk+' , My (jc, j^) <
(jc, ) . , Rc: R2 .
12 (7.41) :
] = S(T(rn , ),..., T(rlQ,r(Jl)) = S T ( r lk,rkj).
= i k = j --
=\
k*i
k*j
:
1) > {7^}. max { , {7^}} = ;
k*i
k*i
2) < {} =
k*i
*j
k*i
j
2,
, ,
k*i
* I
< \xr2 ( , ) , R R2 .
, R2 R3,..., Rk R k+l, ,
, R = R2 ; ...; R" =... -.
.
L, R! , R
-, -.
: 2
rt >
R2 :
rik >
> ,
\/ = \9\ \ . R 1:
V,,...,M = 1,| X | .
, , R -, )^ >
V i, = l J F j .
2.
/=1
R \
- R X R ^ .
^ . 3 ^ . .
12
. \\ .
R = [^)R! = S Rl .
/=i
,=1
\\
(\\ ;
= S Rl = sup S R 1
' - 1 /
- , S{xt) > max{jc,}. , sup(5'(x/)) = max{jt,}.
R=
\\
(\ \
fm
, R = [ j R l = S Rl = sup S Rl = m ax |S ' R 1 . /=i
1 2 . . ,
[\x\ .]
, , max j S Rl = R|x |. ,
M
\x\
f\x\
[\x\ .]
R = ( J Rl = 5 R' = sup 5 R' J = max | S Rl j =
.
.
3. -
-.
R\ ?2 -. , R = R^ u /?2
-.
/? = /?, u /?2 = S(Rl,R2) >max{ R],R2} .
, R .
,
-,
* , ) ^ { ^, (*>*)> /?2 (X*)} = { 1, 1} = 1 ,
(,) < 1 , , |,) = 1 .
, R .
, R
. R\ R2 - :
(X ) ^
( , ) =
(, ) > ,
(, ) >
\iR ( y 9x) = \iR (9 ) >
,
\/9 . ,
(9) >
(*,>>)>
\iR(x9y) =
= [iR( y 9x ) > a 9 . . R --
. - ,
-,
. , :
|Hy?(jc, jc) = 1 \ / X ;
/;,>) = |1 /,< ) >
.
\ / 9 --
, R a - ,
.
.
[ 1. \ 1 3, ,
a -
-.
1. -
a - X.
-:
|( ( * , )) = 1 \ / X ;
VjC| ,jc2 X .
a -:
( ( , ) ) = 1 \ / ;
\/]92 X ;
((**2) ) ^
\xR((xl9x3) ) > a
\/]9193 > 0 .
3 , -
a -. ,
, a -,
. ,
a - a -.
2 ]^ >
^(^^^,
|(/?.;7 ) > .
V/, j = 1,| X | , . .
, ) > ,
a - -.
, ,
a - a -.
.
.
30 - X
ajnf, X
a -, a > ajnf
. , a -
-.
31 a -. - a L > ajnf.
.
4 ( ). R
X
, X
.
2. a -
X ,
X .
11, a -,
, 4 ,
X
, X
. .
.
32 -
a , e [ a inf,l),
R (a L, ) * ( ,) V / * . / .
R (a L ), ,
-
.
-
.
: = {/}^1? , = (xiU...!)xin) , x ^ e R ,
n , Q = | X | X.
- X.
- -. MIN- -.
1. xi = (xiU...,xm)
( 18) (7.34):
=1
d( x ,xt)
|Ujc (*/) = 1----------- ---------- 9
XJ]<,>**))
2.
( 19) (7.36):
( * .. * / ) =
I 1 \ ( * / ) - ) 1 , Uj , q = 1, Q-
3. X (
20) (7.37):
l(a,b) = (%
(a,b)) = m j n ^ (a,b), a , b e X .
/=1,0 '
-
X.
4.
X , 25, 12,
2 -
-.
, :
. 7.11.
q - 2 , Q
/.
2, ^
.
--
- ^
--
.
. 7.11.
-
-,
, .
-,
,
a - (
)
" "
,
-.
, -,
, .
-,
.
.
-
.
1. .
,
,
, .
: X = { ..., % } (. 7.3).
7.3
(0,5; 3)
*13
(0,35; 2,7)
(0,09; 1,1)
*8
(0,4; 3)
*14
(3,4; 0)
*3
(0,1; 0,9)
(0,42; 2,5)
*15
(3,6; 0)
*4
(0,12; 0,9)
*10
(0,48; 2,5)
*16
(3,6; 1)
*5
(0,4; 2)
\ ]
(0,45; 2,8)
*17
(3,4; 1)
*6
(0,5; 2)
*12
(0,45; 2,2)
*18
(3,5; 0,5)
X,
( , ; 1)
*2
X
(. 7.12).
, - X
.