Clustering
Frank Keller
keller@coli.uni-sb.de
Computerlinguistik
Universität des Saarlandes
Clustering p.1/21
Clustering Algorithms
Examples
[Figure: example clustering; object labels h, b, k, i, c, g, f, i, g]
Hard, non-hierarchical, disjunctive
non-
cluster.
Examples
[Figure: example clustering; object labels g, a, c, i, e, d, k, b, j, f, h]
[Figure: cluster membership values, three per object]
0.4 0.1 0.5
0.1 0.8 0.1
0.3 0.3 0.4
0.1 0.1 0.8
0.4 0.2 0.4
0.1 0.4 0.5
0.7 0.2 0.1
0.5 0.4 0.1
Soft, non-hierarchical, disjunctive
Using Clustering
a feeling for what the data look like, what its properties
are. First step in building a model.
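To make the difference between hard and soft assignment concrete, here is a minimal Python sketch that takes the membership values of the soft example and turns each object's values into a hard assignment by picking the cluster with the largest value. The grouping of the values into triples (each triple sums to 1), the function name `harden`, and the tie-breaking rule (lowest cluster index wins) are assumptions made for illustration; they are not stated on the slides.

```python
# Membership values from the soft clustering example, grouped three per object.
# The grouping is an assumption: consecutive triples each sum to 1.
SOFT = [
    (0.4, 0.1, 0.5),
    (0.1, 0.8, 0.1),
    (0.3, 0.3, 0.4),
    (0.1, 0.1, 0.8),
    (0.4, 0.2, 0.4),
    (0.1, 0.4, 0.5),
    (0.7, 0.2, 0.1),
    (0.5, 0.4, 0.1),
]


def harden(memberships):
    """Return, for each object, the index of its highest-valued cluster.

    max() returns the first maximal index, so ties go to the lowest
    cluster index (an arbitrary choice made for this sketch).
    """
    return [max(range(len(row)), key=lambda j: row[j]) for row in memberships]


if __name__ == "__main__":
    for row, label in zip(SOFT, harden(SOFT)):
        print(row, "-> cluster", label)
```

A hard clustering keeps only the resulting index per object, while a soft clustering keeps the full row of values.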
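Euclidean distance.
$$\vec{x} = (x_1, x_2, \ldots, x_n) \tag{1}$$
$$|\vec{x} - \vec{y}| = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{2}$$
Calculate the centroid (mean) for each cluster, use it as
$$\vec{\mu} = \frac{1}{|c_j|} \sum_{\vec{x} \in c_j} \vec{x} \tag{3}$$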
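Equations (2) and (3) can be sketched in a few lines of Python. The surviving slide text does not spell out the full procedure, so the loop in `kmeans` below is an assumption: it follows the usual k-means pattern of assigning each point to its nearest center and then replacing each center by the centroid of its cluster. The function names, the toy points, the initial centers, and the handling of empty clusters are all hypothetical choices for illustration.

```python
import math


def euclidean(x, y):
    """Equation (2): Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def centroid(cluster):
    """Equation (3): component-wise mean of the vectors in a non-empty cluster."""
    n = len(cluster)
    return tuple(sum(column) / n for column in zip(*cluster))


def kmeans(points, centers, max_iter=100):
    """Assumed k-means-style loop: assign to nearest center, recompute centroids."""
    centers = [tuple(c) for c in centers]
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            # Index of the center closest to p under Euclidean distance.
            j = min(range(len(centers)), key=lambda k: euclidean(p, centers[k]))
            clusters[j].append(p)
        # An empty cluster keeps its previous center (a choice made for this sketch).
        new_centers = [
            centroid(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    return centers, clusters


if __name__ == "__main__":
    # Hypothetical toy data on a 0..8 grid; not the points from the slides.
    points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
    centers, clusters = kmeans(points, centers=[(0, 0), (8, 8)])
    for center, members in zip(centers, clusters):
        print(center, members)
```

The result depends on the initial centers, which is why they are passed in explicitly rather than chosen inside the function.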
Example:
Properties of the algorithm:
[Figure: two scatter plots, both axes ranging from 0 to 8]
Applications of Clustering
[Table: the words cosmonaut, astronaut, moon, car, truck against the words Soviet, American, spacewalking, red, full, old]
there is one.
Perform task-based evaluation, i.e., test if the
Summary
References
Manning, Christopher D., and Hinrich Schütze. 1999. Foundations of Statistical Natural Language