Академический Документы
Профессиональный Документы
Культура Документы
X1 X2 X3
Use correlation
1¡Cor(Xi ;Xj )
d(Xi; Xj ) = 2
d(i,j)
d(i,h)
j
h d(j,h)
Euclidean distance:
p
d(i; j) = (xi1 ¡ xj1)2 + (xi2 ¡ xj2)2 + ::: + (xip ¡ xjp)2
Manhattan distance:
d(i; j) = jxi1 ¡ xj1j + jxi2 ¡ xj2j + ::: + jxip ¡ xjpj
Maximum distance:
1
1 1 1
d(i; j) = (jxi1 ¡ xj1 j + jxi2 ¡ xj2 j + ::: + jxip ¡ xjp j ) 1
=
= maxpk=1 jxik ¡ xjk j
Special cases of Minkowski distance:
1
q q q
d(i; j) = (jxi1 ¡ xj1j + jxi2 ¡ xj2j + ::: + jxip ¡ xjpj ) q
Manhattan Maximum
distance distance
Close
No subgroups
anymore
A 13.3 38.0
Example 2 B 12.4 45.4
C -122.7 45.6
4 objects D -122.4 37.7
OR
OR
Scale if,
- variables measure different units (kg, meter, sec,…)
- you explicitly want to have equal weight for each variable
Don’t scale if units are the same for all variables
Most often: Better to scale.
Appl. Multivariate Statistics - Spring 2012 15
Dissimilarities
M P H
More flexible than distances M 10 1 8
D1: d(i,j) >= 0
P 10 5
D2: d(i,i) = 0
D3: d(i,j) = d(j,i) H 10
Object j
X=1 X=0
Object i
X=0 c d
b+c
d(i; j) = a+b+c+d
Proportion of variables,
in which people disagree
Object j
X=1 X=0
Object i
X=0 c d
Uninformative
Simple matching coefficient
b+c
d(i; j) = a+b+c
Proportion of variables,
in which people disagree
ignoring (0,0)
mm
d(i; j) = p
Proportion of variables,
in which people disagree
P
Aggregate: d(i; j) = p1 pi=1 d(f)
ij
dist, daisy