Presidency University
September, 2016
So far we have done
Example of Hierarchical Clustering
\[
D = \begin{pmatrix}
0 & 9 & 3 & 6 & 11 \\
9 & 0 & 7 & 5 & 10 \\
3 & 7 & 0 & 9 & 2 \\
6 & 5 & 9 & 0 & 8 \\
11 & 10 & 2 & 8 & 0
\end{pmatrix}
\]
Stage 1
- $\min_{i,j} d_{ij} = d_{53} = 2$, so objects 3 and 5 are merged into the cluster (35).
Stage 2
\[
\begin{array}{c|cccc}
 & (35) & 1 & 2 & 4 \\ \hline
(35) & 0 & 3 & 7 & 8 \\
1 & 3 & 0 & 9 & 6 \\
2 & 7 & 9 & 0 & 5 \\
4 & 8 & 6 & 5 & 0
\end{array}
\]
- $\min_{i,j} d_{ij} = d_{(35)1} = 3$, so object 1 joins (35) to form the cluster (135).
- $d_{(135)2} = \min\{d_{(35)2}, d_{12}\} = \min\{7, 9\} = 7$
- $d_{(135)4} = \min\{d_{(35)4}, d_{14}\} = \min\{8, 6\} = 6$
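Stage 3
\[
\begin{array}{c|ccc}
 & (135) & 2 & 4 \\ \hline
(135) & 0 & 7 & 6 \\
2 & 7 & 0 & 5 \\
4 & 6 & 5 & 0
\end{array}
\]
- $\min_{i,j} d_{ij} = d_{42} = 5$, so objects 2 and 4 are merged into the cluster (24).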
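Stage 4
\[
\begin{array}{c|cc}
 & (135) & (24) \\ \hline
(135) & 0 & 6 \\
(24) & 6 & 0
\end{array}
\]
- $d_{(135)(24)} = \min\{d_{(135)2}, d_{(135)4}\} = \min\{7, 6\} = 6$, so the two remaining clusters merge at height 6.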
[Figure: Cluster Dendrogram of the five objects; vertical axis: Height; produced by hclust (*, "single")]
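The stages of the worked example can be checked in R. The following is a minimal sketch that assumes the matrix D from the example and single linkage, as the dendrogram label suggests; the object names `D` and `hc` are chosen here for illustration only.

```r
# Dissimilarity matrix D for the five objects of the example
D <- matrix(c( 0,  9, 3, 6, 11,
               9,  0, 7, 5, 10,
               3,  7, 0, 9,  2,
               6,  5, 9, 0,  8,
              11, 10, 2, 8,  0),
            nrow = 5, byrow = TRUE,
            dimnames = list(1:5, 1:5))

# Single linkage: the distance between two clusters is the smallest
# pairwise distance between their members
hc <- hclust(as.dist(D), method = "single")

hc$merge    # which objects/clusters are joined at each stage
hc$height   # merge heights: 2 3 5 6, matching stages 1 to 4

# Draw the dendrogram when running interactively
if (interactive()) plot(hc)
```

In `hc$merge`, negative entries denote single objects and positive entries denote clusters formed at an earlier stage.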
K-means
Item x1 x2
A 5 3
B -1 1
C 1 -2
D -3 -2
Step 2
- The current clusters are (AB) and (CD), with centers (2, 2) and (-1, -2).
- If A is not moved:
\[
d^2(A,(AB)) = (5-2)^2 + (3-2)^2 = 10, \qquad
d^2(A,(CD)) = (5+1)^2 + (3+2)^2 = 61
\]
- If A is moved to the group (CD), then the cluster centers are:
\[
\text{Group (B):}\quad \bar{x}_{1,new} = \frac{2(2)-5}{2-1} = -1, \qquad \bar{x}_{2,new} = \frac{2(2)-3}{2-1} = 1
\]
\[
\text{Group (ACD):}\quad \bar{x}_{1,new} = \frac{2(-1)+5}{2+1} = 1, \qquad \bar{x}_{2,new} = \frac{2(-2)+3}{2+1} = -0.33
\]
- and consequently we get:
\[
d^2(A,(B)) = (5+1)^2 + (3-1)^2 = 40, \qquad
d^2(A,(ACD)) = (5-1)^2 + (3+0.33)^2 = 27.09
\]
- Since A is closer to the center of (AB) than it is to the center of (ACD), it is not reassigned.
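The center updates used when an item leaves or joins a group can be written as two small functions. This is a sketch only: `move_out` and `move_in` are hypothetical helper names, and the starting centers (2, 2) and (-1, -2) with group size 2 are read off the formulas in the step for A.

```r
# Center of a group of size n after item x leaves it
move_out <- function(center, n, x) (n * center - x) / (n - 1)

# Center of a group of size n after item x joins it
move_in <- function(center, n, x) (n * center + x) / (n + 1)

A <- c(5, 3)

center_B   <- move_out(c(2, 2),   n = 2, x = A)   # -1, 1
center_ACD <- move_in(c(-1, -2),  n = 2, x = A)   #  1, -1/3

# Squared Euclidean distances from A to the updated centers
d2_B   <- sum((A - center_B)^2)     # 40
d2_ACD <- sum((A - center_ACD)^2)   # 16 + 100/9, about 27.11
```

The exact value of the second distance is about 27.11; the figure 27.09 in the step above comes from rounding the center coordinate to -0.33 before squaring.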
- If B is not moved:
\[
d^2(B,(AB)) = (-1-2)^2 + (1-2)^2 = 10, \qquad
d^2(B,(CD)) = (-1+1)^2 + (1+2)^2 = 9
\]
- B is closer to the center of (CD) than to the center of (AB); the next check accordingly works with the clusters (A) and (BCD).
- If C is not moved:
\[
d^2(C, A) = (1-5)^2 + (-2-3)^2 = 41, \qquad
d^2(C,(BCD)) = (1+1)^2 + (-2+1)^2 = 5
\]
- C is much closer to the center of (BCD), which is (-1, -1), than to A.
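The distances in these steps can be reproduced numerically. The sketch below assumes the four items from the table and the initial partition (AB), (CD). Note that it runs `kmeans` with `algorithm = "Lloyd"`, which reassigns all items at once and then recomputes the centers; this is a related procedure, not the item-by-item update used on the slides, although on this small example it ends with the same clusters (A) and (BCD).

```r
# The four items of the example
X <- rbind(A = c(5, 3), B = c(-1, 1), C = c(1, -2), D = c(-3, -2))
colnames(X) <- c("x1", "x2")

# Centers of the initial groups (AB) and (CD): (2, 2) and (-1, -2)
init <- rbind(colMeans(X[c("A", "B"), ]),
              colMeans(X[c("C", "D"), ]))

# Squared Euclidean distance from an item to a center
sqdist <- function(x, center) sum((x - center)^2)

sqdist(X["A", ], init[1, ])   # 10
sqdist(X["A", ], init[2, ])   # 61
sqdist(X["B", ], init[1, ])   # 10
sqdist(X["B", ], init[2, ])   # 9

# Lloyd-type K-means started from the same two centers
fit <- kmeans(X, centers = init, algorithm = "Lloyd")
fit$cluster   # A in cluster 1; B, C, D in cluster 2
fit$centers   # (5, 3) and (-1, -1)
```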
Old Faithful Geyser Eruptions
(Weisberg, 1985, p. 231)
Old Faithful Geyser Eruptions I
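# Standardize both variables so that neither dominates the distances
data=scale(faithful)
# Euclidean distances between all pairs of observations
d=dist(data)
# hclust uses complete linkage unless a method is given
mytree=hclust(d)
plot(mytree)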
Old Faithful Geyser Eruptions II
[Figure: cluster dendrogram of the standardized faithful data; vertical axis: Height (0 to 5); leaves labeled by observation number]
Visualizing the clusters I
classes=cutree(mytree,h=2.5)
table(classes)
## classes
## 1 2 3
## 125 97 50
plot(faithful,col=classes)
Visualizing the clusters II
[Figure: scatter plot of the faithful data, eruptions on the horizontal axis and waiting on the vertical axis (about 50 to 90), points colored by cluster]
Partitioning the dendrogram I
plot(mytree)
rect.hclust(mytree, k=3, border="red")
Partitioning the dendrogram II
[Figure: the dendrogram of the standardized faithful data with red rectangles around the k = 3 clusters; vertical axis: Height (0 to 5)]
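The rectangles only mark the clusters on the plot; the corresponding labels can be obtained with `cutree`, either by number of clusters or by cut height. The following sketch rebuilds the tree from the built-in `faithful` data so that it runs on its own; the names `by_k` and `by_h` are illustrative.

```r
# Rebuild the tree: standardized data, Euclidean distance,
# hclust's default (complete) linkage
data <- scale(faithful)
mytree <- hclust(dist(data))

# Cut by number of clusters, as rect.hclust(mytree, k = 3) does
by_k <- cutree(mytree, k = 3)

# Cut at a fixed height, as in cutree(mytree, h = 2.5)
by_h <- cutree(mytree, h = 2.5)

table(by_k)             # cluster sizes for k = 3
identical(by_k, by_h)   # TRUE when the cut at h = 2.5 also gives 3 clusters
```

Cutting by `k` is convenient when the number of groups is fixed in advance; cutting by `h` is convenient when a dissimilarity threshold has a meaning in the application.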
Landsat Satellite Image Data
see http://edc.usgs.gov/guides/landsat mss.html
Applying K-means
- K-means is implemented by the command kmeans, with K being specified.
data=read.table("/home/user/pics/satimage.txt",header=TRUE)
data=scale(data); cl=kmeans(data,6);
cl$size
cl$tot.withinss; cl$betweenss;
## [1] 35207.24
## [1] 128850.8
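The two reported quantities split the total sum of squares into a within-cluster and a between-cluster part. Since the satellite file is not bundled with R, the sketch below illustrates this on the built-in `faithful` data instead; the choices `centers = 2`, `nstart = 20`, and the seed are assumptions made for the illustration, not settings from the slides.

```r
set.seed(1)                      # kmeans starts from random centers
x <- scale(faithful)             # standardize, as done for the satellite data

# nstart = 20 repeats the algorithm from 20 random starts and keeps
# the solution with the smallest total within-cluster sum of squares
cl <- kmeans(x, centers = 2, nstart = 20)

cl$size                          # number of observations per cluster
cl$tot.withinss + cl$betweenss   # equals cl$totss
cl$totss                         # (n - 1) * p = 271 * 2 = 542 for scaled data
cl$betweenss / cl$totss          # share of variability between clusters
```

A larger ratio `betweenss / totss` indicates tighter, better separated clusters, but it always increases with K, so it cannot be used on its own to choose K.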
Visualizing Clusters I
library(cluster)
clusplot(data,cl$cluster,
color=TRUE,shade=TRUE,lines=0)
Visualizing Clusters II
[Figure: CLUSPLOT( data ), the clusters drawn in the plane of the first two components; horizontal axis: Component 1, vertical axis: Component 2. These two components explain 84.09 % of the point variability.]
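The idea behind such a plot, projecting the data onto two principal components and reporting how much variability they retain, can be sketched with `prcomp`. This is an illustration under assumptions: it uses the built-in `USArrests` data and K = 3, and it is not a reproduction of the internals of `clusplot`.

```r
x <- scale(USArrests)            # four standardized variables

# Principal components of the standardized data
pc <- prcomp(x)

# Proportion of the total variance carried by each component
var_explained <- pc$sdev^2 / sum(pc$sdev^2)
sum(var_explained[1:2])          # share retained by the first two components

set.seed(1)
cl <- kmeans(x, centers = 3, nstart = 20)

# Scatter plot of the first two component scores, colored by cluster
if (interactive()) {
  plot(pc$x[, 1:2], col = cl$cluster,
       xlab = "Component 1", ylab = "Component 2")
}
```

When the first two components retain most of the variability, the two-dimensional picture is a fair summary of how well the clusters separate; when they retain little, clusters may overlap in the plot while being distinct in the full space.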