
Clustering Example

The purpose of the analysis was to look for "sub-populations" of adult females, with respect to a selection of clinically
relevant variables.

It is a good idea to work with Z-scores of the variables if the variables being used differ in their variability. Otherwise,
the variables with greater variability will dominate the clustering.
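A minimal Python sketch of what the standardization step does. It assumes the sample standard deviation (n - 1 denominator), which is what SPSS Descriptives is generally understood to use. The function names and the two-variable data are hypothetical and for illustration only.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize one variable to mean 0 and standard deviation 1.

    Uses the sample standard deviation (n - 1 denominator).
    """
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def standardize_columns(rows):
    """Z-score every column of a cases-by-variables list of rows."""
    z_columns = [z_scores(list(col)) for col in zip(*rows)]
    # Transpose back so each inner list is one case again.
    return [list(case) for case in zip(*z_columns)]


# Hypothetical data: variable 1 spans about 45 points, variable 2 less than 1,
# so unstandardized distances would be driven almost entirely by variable 1.
raw = [[120, 2.1], [95, 2.4], [140, 1.9], [110, 2.8]]
z = standardize_columns(raw)
```

After standardizing, both columns have mean 0 and SD 1, so each variable contributes on the same scale to the distances used in the clustering.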

Analyze → Descriptive Statistics → Descriptives




Analyze → Classify → Hierarchical Clustering





Select the variables for the analysis and
click the "Save standardized values as
variables" box.

The clustering will be done with the
resulting Z-score variables, zruls, zsoss,
etc.
Select the variables to be clustered.

Notice that we can cluster people or variables, as we
discussed earlier in the semester.
The "agglomeration schedule" will help us
decide how many clusters to include in our
solution.

Knowing the cluster membership of each
case for different numbers of clusters can be very
useful too, but we'll use a different way of
looking at this information.



SPSS gives two different "pictures" of what
cases are combined into clusters at each
step.

The dendrogram is large and often messy,
but can be really helpful for identifying
when clustering steps involve "combining
groups" vs. "adding strays".
This is where you select the clustering
method (how to decide which clusters will
be combined on each step) and the
dissimilarity measure (how to quantify
how similar the cases/clusters are to each
other).
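To make "method" and "dissimilarity measure" concrete, here is a naive Python sketch of Ward's method with squared Euclidean distance. Ward's method is the one this example uses, going by the "Ward Method" labels in the output. Squared Euclidean as the measure is an assumption, since the dialog settings are not shown. At each step the sketch merges the pair of clusters whose union increases the within-cluster sum of squares least, and reports the running total as the coefficient. Naming a merged cluster by its lowest case number imitates the look of the SPSS schedule. This is an illustration, not SPSS's own algorithm.

```python
def sq_euclidean(x, y):
    """Squared Euclidean distance between two cases (or centroids)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def centroid(points):
    """Mean vector of a list of cases."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def ward_increase(a, b):
    """Increase in the error sum of squares if clusters a and b are merged."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * sq_euclidean(centroid(a), centroid(b))


def ward_schedule(cases):
    """Naive O(n^3) Ward agglomeration.

    Returns (stage, cluster 1, cluster 2, coefficient) rows. Clusters are named
    by their lowest 1-based case number, and the coefficient is the running
    total within-cluster sum of squares after that stage.
    """
    clusters = {i + 1: [list(c)] for i, c in enumerate(cases)}
    total_error = 0.0
    schedule = []
    for stage in range(1, len(cases)):
        labels = sorted(clusters)
        # Cheapest merge; ties go to the smallest labels.
        cost, keep, absorb = min(
            (ward_increase(clusters[a], clusters[b]), a, b)
            for i, a in enumerate(labels)
            for b in labels[i + 1:]
        )
        clusters[keep].extend(clusters.pop(absorb))
        total_error += cost
        schedule.append((stage, keep, absorb, total_error))
    return schedule


sched = ward_schedule([[0, 0], [0, 1], [5, 5], [5, 6]])
```

For these four hypothetical cases the schedule is stages (1, 2) at 0.5, (3, 4) at 1.0, then (1, 3) at 51.0. The last coefficient equals the total sum of squares of the data, because every case ends up in one cluster. For real data sets a library routine such as `scipy.cluster.hierarchy.linkage(X, method="ward")` would be the practical choice. Its merge heights are on a different scale from these coefficients, though.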

You can tell SPSS to work with transformed
values. I prefer to save the transformed
values separately (as above), so that they
are available for additional analyses.
This allows you to save the cluster
membership of each case for each
clustering solution you specify.

Usually a range of 2-12 clusters is enough; it depends upon
whether groups or "strays" are being
combined to form the successive clusters.
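Saving memberships for a range of solutions amounts to cutting the merge history at each number of clusters. Below is a hypothetical Python sketch; the function names are illustrative and are not SPSS features. It assumes the merged cluster keeps the first label, as the schedule in this example suggests: 207 absorbs 210 at stage 120, and 206 then absorbs 207 at stage 122. It also assumes clusters are numbered by their first case in file order.

```python
def memberships_from_schedule(case_ids, merges, k):
    """Cluster number (1..k) for each case when the tree is cut at k clusters.

    case_ids: case numbers in data-file order, e.g. [1, 2, 3, 4].
    merges:   (cluster 1, cluster 2) pairs in stage order; the merged cluster
              keeps the cluster 1 label (assumed SPSS-style naming).
    """
    label = {cid: cid for cid in case_ids}
    # n cases need n - k merges to leave k clusters.
    for keep, absorb in merges[: len(case_ids) - k]:
        for cid in case_ids:
            if label[cid] == absorb:
                label[cid] = keep
    numbering = {}
    for cid in case_ids:  # first appearance in file order decides the number
        numbering.setdefault(label[cid], len(numbering) + 1)
    return [numbering[label[cid]] for cid in case_ids]


def membership_range(case_ids, merges, lo, hi):
    """Memberships for every solution from lo to hi clusters."""
    return {k: memberships_from_schedule(case_ids, merges, k) for k in range(lo, hi + 1)}


solutions = membership_range([1, 2, 3, 4], [(1, 2), (3, 4), (1, 3)], 2, 3)
```

With real output, `membership_range(ids, merges, 2, 12)` would correspond to saving the 2-12 cluster range.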

Agglomeration Schedule

         Cluster Combined                       Stage Cluster First Appears
Stage    Cluster 1   Cluster 2   Coefficients   Cluster 1   Cluster 2   Next Stage
1        235         289         .092           0           0           78
2        245         338         .223           0           0           10
3        212         387         .409           0           0           48
(stages 4-110 omitted)
111      210         226         289.703        101         93          119
112      212         215         304.766        108         78          121
113      207         208         320.378        100         90          114
114      207         247         336.982        113         97          118
115      219         242         355.247        103         0           118
116      206         213         375.485        104         109         117
117      206         297         402.101        116         105         121
118      207         219         432.390        114         115         120
119      210         218         469.263        111         110         120
120      207         210         542.696        118         119         122
121      206         212         633.798        117         112         122
122      206         207         976.000        121         120         0
The agglomeration schedule shows the step-by-step clustering process:
• Cluster Combined: which clusters were combined on that step
• Coefficients: the resulting total "error" in the clustering solution
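A quick check on what the Coefficients column measures rests on two assumptions. The first is that for Ward's method the coefficient is the total within-cluster (error) sum of squares. The second is that the Z-scores were computed on these same 123 cases. After the last stage every case sits in one cluster, so the error is the total sum of squares of the Z-scores. Each Z-score variable has sample variance 1, so its sum of squares is n - 1. With the 8 Z-score variables that appear in the Means report:

```latex
\mathrm{ESS}_{122} = \sum_{j=1}^{p} \sum_{i=1}^{n} \left(z_{ij} - \bar{z}_j\right)^2
                   = p\,(n - 1) = 8 \times (123 - 1) = 976
```

This matches the 976.000 reported for stage 122.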

Stage 118: 6 clusters → 5
Stage 119: 5 clusters → 4
Stage 120: 4 clusters → 3
Stage 121: 3 clusters → 2
Stage 122: 2 clusters → 1
We look for the "big jump" in error
-- as a sign that two "different"
clusters have been combined.

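Pretty big jump on step 120 (from
4 → 3 clusters): the error rises by about 73
(469.263 → 542.696), roughly twice the increase
of about 37 on step 119. This suggests that 3
is "too few" and 4 is "just right".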
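The "big jump" can also be computed directly from the schedule. The sketch below uses the stage 111-122 coefficients copied from the table. It reports each stage's increase in error and its ratio to the previous increase. The ratio is one simple way to judge "big"; the original does not prescribe a formula, and the function name is hypothetical.

```python
# Coefficients for stages 111-122, copied from the agglomeration schedule.
COEFFICIENTS = {
    111: 289.703, 112: 304.766, 113: 320.378, 114: 336.982,
    115: 355.247, 116: 375.485, 117: 402.101, 118: 432.390,
    119: 469.263, 120: 542.696, 121: 633.798, 122: 976.000,
}
N_CASES = 123  # after stage s, N_CASES - s clusters remain


def jump_table(coefs, n_cases):
    """Rows of (stage, clusters before, clusters after, increase, ratio).

    ratio = this stage's increase in error / the previous stage's increase.
    """
    stages = sorted(coefs)
    rows, prev_inc = [], None
    for prev, cur in zip(stages, stages[1:]):
        inc = coefs[cur] - coefs[prev]
        ratio = inc / prev_inc if prev_inc else float("nan")
        rows.append((cur, n_cases - cur + 1, n_cases - cur, inc, ratio))
        prev_inc = inc
    return rows


for stage, before, after, inc, ratio in jump_table(COEFFICIENTS, N_CASES):
    print(f"stage {stage}: {before} -> {after} clusters, error +{inc:.3f} (x{ratio:.2f})")
```

Stage 120 stands out: the error grows by 73.433, about 1.99 times the 36.873 added at stage 119. For stages 113-119 the ratio stays between roughly 1.04 and 1.32. The final 2 → 1 merge (+342.202) is bigger still, but it only tells us that one cluster is too few.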

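We also have to worry about "strays": a jump can come from adding a single outlying case rather than from combining two real groups!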

It can be very helpful to also consider the
frequencies of the clusters for the different solutions
and to look at the dendrogram. Both of these can
tell you about the sizes of clusters and which
clusters are combined on successive steps.
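Tabulating cluster sizes, as the Frequencies procedure does, can be sketched as follows. The function name is hypothetical. The membership list is rebuilt from the reported 5-cluster sizes, so the case order is arbitrary and is not the real data.

```python
from collections import Counter


def frequency_table(membership):
    """(cluster, frequency, percent) rows plus a Total row, like a frequency table."""
    counts = Counter(membership)
    n = len(membership)
    rows = [(k, counts[k], round(100 * counts[k] / n, 1)) for k in sorted(counts)]
    rows.append(("Total", n, 100.0))
    return rows


# Membership rebuilt from the reported 5-cluster sizes (case order is arbitrary).
five = [1] * 41 + [2] * 19 + [3] * 8 + [4] * 43 + [5] * 12
for row in frequency_table(five):
    print(*row)
```

Reading successive tables side by side shows what each merge did. The 20-case cluster of the 4-cluster solution can only be the 8 + 12 cases of 5-cluster clusters 3 and 5. The 39 in the 3-cluster solution is 19 + 20, and the 84 in the 2-cluster solution is 41 + 43. Each of these merges combines two groups rather than adding a stray.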

Analyze → Descriptive Statistics → Frequencies


Ward Method (5 clusters)
           Frequency   Percent
Valid  1   41          33.3
       2   19          15.4
       3   8           6.5
       4   43          35.0
       5   12          9.8
   Total   123         100.0

Ward Method (4 clusters)
           Frequency   Percent
Valid  1   41          33.3
       2   19          15.4
       3   20          16.3
       4   43          35.0
   Total   123         100.0

Ward Method (3 clusters)
           Frequency   Percent
Valid  1   41          33.3
       2   39          31.7
       3   43          35.0
   Total   123         100.0

Ward Method (2 clusters)
           Frequency   Percent
Valid  1   84          68.3
       2   39          31.7
   Total   123         100.0

Most likely solutions
Dendrogram

[Figure: dendrogram of the hierarchical clustering; horizontal axis "Rescaled Distance Cluster Combine" (0 to 25), leaves labeled by case number.]














Be sure to use the Z-score versions of the variables used
in the clustering.

Move the variable that holds the cluster membership for
each participant for the chosen clustering solution into the
Independent List window.

Click Options




















A graph of the profile of each group is an important part of the interpretation of the clustering solution.

Analyze → Compare Means → Means

Consider the order in which you list the variables: a well-selected order can make the graph easier to read.

Be sure to request only means; other statistics (e.g.,
standard deviation, N, etc.) will make the chart
more complicated and less interpretable.
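The Means output is simply the average of each Z-score variable within each cluster. Here is a minimal sketch of that computation; the function name and the data are hypothetical.

```python
from statistics import mean


def cluster_profiles(z_rows, membership):
    """Mean of each Z-score variable within each cluster, plus a Total row.

    z_rows:     one list of Z-scores per case (cases x variables).
    membership: the cluster number of each case, in the same order.
    """
    groups = {}
    for case, k in zip(z_rows, membership):
        groups.setdefault(k, []).append(case)
    profiles = {k: [mean(col) for col in zip(*cases)] for k, cases in sorted(groups.items())}
    profiles["Total"] = [mean(col) for col in zip(*z_rows)]
    return profiles


# Hypothetical Z-scores for three cases on two variables, in two clusters.
z_rows = [[1.0, -1.0], [0.0, 0.5], [-1.0, 0.5]]
profiles = cluster_profiles(z_rows, [1, 2, 2])
```

Each cluster's list of means becomes one line of the profile graph, with the variables in the order you listed them.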
Creating the Graph of these Means -- The Cluster Profiles

• Go to the SPSS Output window
• Right-click on the Means Table
  o Click SPSS Pivot Table Object → Edit
  o The Means Table will have a jagged box around it
• Right-click on the Means Table again
  o Click Create Graph → Line
Report
Mean
                                           Ward Method
Variable                                   1            2            3            Total
Zscore: significant other social support   -.4253659    -.5416278    .7391507     .0000000
Zscore: family social support              -.4586084    -.5330907    .8161275     .0000000
Zscore: friend social support              -.3073686    -.5011546    .7476080     .0000000
Zscore: loneliness                         .7185988     .8075645     -.8963086    .0000000
Zscore: state anxiety scale                -.3003678    1.0722159    -.6860777    .0000000
Zscore: trait anxiety scale                -.1480552    .9480438     -.7186848    .0000000
Zscore: depression                         -.2308712    .9206476     -.6148729    .0000000
Zscore: stress                             -.4889418    .9732350     -.4165012    .0000000





















Remember that clustering is an "exploratory analysis."

You should routinely "explore" and compare solutions derived from different combinations of:
• dissimilarity measures
• clustering methods
• numbers of clusters
looking for the solution with the greatest consistency and meaningfulness.
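One way to put a number on "consistency" between two solutions is the Rand index. The two solutions might come from Ward vs. another method, or from two different dissimilarity measures. The Rand index is the share of case pairs that both solutions treat alike: together in both, or apart in both. It is not mentioned in the original handout. The sketch below is a plain-Python illustration with hypothetical memberships.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Proportion of case pairs that two clusterings treat alike.

    A pair counts as agreement if both solutions put the two cases together,
    or both put them apart. The cluster numbers themselves do not matter.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical memberships of six cases under two different methods.
ward = [1, 1, 2, 2, 3, 3]
other = [2, 2, 1, 1, 1, 3]
```

Here 12 of the 15 pairs agree, so the index is 0.8; identical groupings give 1.0 even when the cluster numbers differ. The adjusted Rand index is a common refinement that corrects for chance agreement.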

Cluster Description / Interpretation

Group 3 -- the group we'd all like to be in
• High social support
• Low loneliness
• No mental health problems

Groups 1 & 2 both have
• Low social support
• High loneliness
However,
  o Group 1 has no major anxiety, depression or stress, whereas
  o Group 2 has high anxiety, depression and stress.
So both of these groups are lonely for good reason (low social support);
however, that loneliness is associated with poor mental
health indicators only for Group 2.

Here's the 4-group solution:
[Figure: line chart of the 4-cluster profiles (Report, Statistics: Mean). One line per Ward Method cluster (1, 2, 3, 4) plus Total; horizontal axis "Variables": Z-scores of significant other social support, family social support, friend social support, loneliness, state anxiety, trait anxiety, depression (BDI) and stress; vertical axis "Values" from -1.0 to 1.0.]



3-cluster group → 4-cluster group

Group 3 → Group 4
• High social support
• Low loneliness
• No mental health problems

Group 1 → Group 1
• Lower social support
• High loneliness
• No major anxiety, depression or stress

Group 2 → Group 2
• Lowest social support
• Most lonely
• High anxiety, depression & stress

Group 2 → Group 3
• Middle levels of social support
• Middle levels of loneliness
• High anxiety, depression & stress
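The two new groups (Groups 2 and 3 of the 4-cluster solution, which split the old Group 2) look interestingly different.
Both have high anxiety, depression and stress, but they differ in social support and loneliness.
This prompts us to go with the 4-cluster solution over the 3-cluster solution.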
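A cross-tabulation of the saved 3- and 4-cluster memberships is one way to confirm which group split. This is not a step in the original handout. Below is a minimal sketch with hypothetical function names. The membership lists are rebuilt from the reported cluster sizes and the group correspondence described above, not from the real data.

```python
from collections import Counter


def crosstab(coarse, fine):
    """Count cases in each (coarse-solution cluster, fine-solution cluster) cell."""
    return Counter(zip(coarse, fine))


def split_clusters(table):
    """Coarse clusters whose cases end up in more than one fine cluster."""
    children = {}
    for c, f in table:
        children.setdefault(c, set()).add(f)
    return {c: sorted(fs) for c, fs in children.items() if len(fs) > 1}


# Rebuilt from the reported sizes, assuming 3-cluster Group 2 (39 cases) holds
# 4-cluster Groups 2 (19) and 3 (20); case order is arbitrary.
three = [1] * 41 + [2] * 39 + [3] * 43
four = [1] * 41 + [2] * 19 + [3] * 20 + [4] * 43
table = crosstab(three, four)
```

With the real saved membership variables, a single non-zero cell per row except for one split row confirms that moving from 3 to 4 clusters divided exactly one group.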
