Spectral Clustering
Jing Gao
SUNY Buffalo
Motivation
• Complex cluster shapes
– K-means performs poorly because it can only find spherical
clusters
• Spectral approach
– Use similarity graphs to encode local neighborhood information
– Data points are vertices of the graph
– Connect points which are “close”
[Figure: 2-D scatter plot of a dataset whose clusters have complex, non-spherical shapes]
Similarity Graph
• Represent dataset as a weighted graph G(V,E)
• All vertices which can be reached from each other by a path form a connected component
• If there is only one connected component, the graph is fully connected: {v1, v2, ..., v6}
V={xi} Set of n vertices representing data points
E={Wij} Set of weighted edges indicating pair-wise similarity
between points
[Figure: weighted graph on six vertices with edges w12 = 0.8, w13 = 0.6, w15 = 0.1, w23 = 0.8, w34 = 0.2, w45 = 0.8, w46 = 0.7, w56 = 0.8]
Graph Construction
• ε-neighborhood graph
– Identify a threshold value, ε, and include edges if the
affinity between two points is greater than ε
• k-nearest neighbors
– Insert edges between a node and its k-nearest
neighbors
– Each node will be connected to (at least) k nodes
• Fully connected
– Insert an edge between every pair of nodes
– Weight of the edge represents similarity
– Gaussian kernel: wij = exp(-||xi - xj||² / (2σ²))
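As a sketch (NumPy; `sigma` and `eps` are illustrative parameters, not values from the lecture), the fully connected and ε-threshold constructions look like:

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    """Fully connected graph: w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)   # no self-loops
    return W

def epsilon_graph(W, eps):
    """epsilon-neighborhood graph: keep an edge only if the affinity exceeds eps."""
    return np.where(W > eps, W, 0.0)
```

Both return symmetric matrices, which the spectral methods below rely on.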
ε-neighborhood Graph
• ε-neighborhood
– Compute pairwise distance between any two
objects
– Connect each point to all other points which have
distance smaller than a threshold ε
• Weighted or unweighted
– Unweighted—There is an edge if one point
belongs to the ε–neighborhood of another point
– Weighted—Transform distance to similarity and
use similarity as edge weights
kNN Graph
• Directed graph
– Connect each point to its k nearest neighbors
• kNN graph
– Undirected graph
– An edge between xi and xj : There’s an edge from xi to
xj OR from xj to xi in the directed graph
• Mutual kNN graph
– Undirected graph
– Edge set is a subset of that in the kNN graph
– An edge between xi and xj : There’s an edge from xi to
xj AND from xj to xi in the directed graph
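The OR/AND symmetrization rules above can be sketched as follows (assuming a precomputed distance matrix `D`; the helper names are made up for illustration):

```python
import numpy as np

def knn_directed(D, k):
    """A[i, j] = True if x_j is one of the k nearest neighbors of x_i."""
    n = D.shape[0]
    A = np.zeros((n, n), dtype=bool)
    for i in range(n):
        order = [j for j in np.argsort(D[i]) if j != i]  # exclude the point itself
        A[i, order[:k]] = True
    return A

def knn_graph(D, k):
    A = knn_directed(D, k)
    return A | A.T      # edge if i -> j OR j -> i

def mutual_knn_graph(D, k):
    A = knn_directed(D, k)
    return A & A.T      # edge only if i -> j AND j -> i
```

By construction the mutual kNN graph's edge set is a subset of the kNN graph's, as the slide states.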
Clustering Objective
[Figure: the six-vertex weighted graph from the Similarity Graph slide; the objective is to partition it into well-separated clusters]
Graph Cuts
cut(C1, C2) = Σ_{i ∈ C1, j ∈ C2} wij

[Figure: the six-vertex example graph partitioned into C1 = {x1, x2, x3} and C2 = {x4, x5, x6}; cut(C1, C2) = 0.2 + 0.1 = 0.3]
Bi-partitional Cuts
Example
• Minimum Cut
• Minimum Cut

[Figure: example graph with vertices A–E; edge weights include 0.08, 0.09, 0.19, 0.2, 0.22, 0.24, 0.45, and 1]
Normalized Cuts
Vol(C) = Σ_{i ∈ C, j ∈ V} wij
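These quantities combine into the normalized cut, Ncut(C1, C2) = cut(C1, C2)/Vol(C1) + cut(C1, C2)/Vol(C2). A sketch on the six-vertex example (W here is reconstructed from the Laplacian shown later in the slides):

```python
import numpy as np

# Similarity matrix of the six-vertex example graph from the earlier slides
W = np.array([
    [0.0, 0.8, 0.6, 0.0, 0.1, 0.0],
    [0.8, 0.0, 0.8, 0.0, 0.0, 0.0],
    [0.6, 0.8, 0.0, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.0, 0.8, 0.7],
    [0.1, 0.0, 0.0, 0.8, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.7, 0.8, 0.0],
])

def cut(W, C1, C2):
    """Total weight of the edges crossing the partition."""
    return W[np.ix_(C1, C2)].sum()

def vol(W, C):
    """Total weight of the edges incident to vertices in C."""
    return W[C, :].sum()

def ncut(W, C1, C2):
    c = cut(W, C1, C2)
    return c / vol(W, C1) + c / vol(W, C2)

C1, C2 = [0, 1, 2], [3, 4, 5]    # the two triangles
print(cut(W, C1, C2))            # ≈ 0.3, matching the slide
```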
Example

[Figure: the A–E example graph, comparing candidate cuts by their normalized cut values]
Problem
Matrix Representations
• Similarity matrix (W)
– n x n matrix
– W = [wij] : wij is the edge weight between vertex xi and xj
      x1    x2    x3    x4    x5    x6
x1    0    0.8   0.6    0    0.1    0
x2   0.8    0    0.8    0     0     0
x3   0.6   0.8    0    0.2    0     0
x4    0     0    0.2    0    0.8   0.7
x5   0.1    0     0    0.8    0    0.8
x6    0     0     0    0.7   0.8    0
• Important properties
– Symmetric matrix
Matrix Representations
• Degree matrix (D)
– n x n diagonal matrix
– D(i, i) = Σj wij : total weight of edges incident to vertex xi
      x1    x2    x3    x4    x5    x6
x1   1.5    0     0     0     0     0
x2    0    1.6    0     0     0     0
x3    0     0    1.6    0     0     0
x4    0     0     0    1.7    0     0
x5    0     0     0     0    1.7    0
x6    0     0     0     0     0    1.5
• Used to
– Normalize the adjacency matrix
Matrix Representations
• Laplacian matrix (L): L = D - W
– n x n symmetric matrix
– Its eigenvectors are used to derive the cluster assignment
Optimal Min-Cut
Spectral Bi-partitioning Algorithm
1. Pre-processing
– Build the Laplacian matrix L of the graph:

      x1    x2    x3    x4    x5    x6
x1   1.5  -0.8  -0.6    0   -0.1    0
x2  -0.8   1.6  -0.8    0     0     0
x3  -0.6  -0.8   1.6  -0.2    0     0
x4    0     0   -0.2   1.7  -0.8  -0.7
x5  -0.1    0     0   -0.8   1.7  -0.8
x6    0     0     0   -0.7  -0.8   1.5

2. Decomposition
– Find the eigenvalues Λ and eigenvectors X of the matrix L
– Map each vertex to the corresponding component of the eigenvector for λ2 (the second smallest eigenvalue):

x1   0.2
x2   0.2
x3   0.2
x4  -0.4
x5  -0.7
x6  -0.7
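The two steps above can be sketched with NumPy (a minimal sketch, not the lecture's own code); the bi-partition then falls out of the signs of the λ2 eigenvector (the Fiedler vector):

```python
import numpy as np

# Similarity matrix of the six-vertex example graph
W = np.array([
    [0.0, 0.8, 0.6, 0.0, 0.1, 0.0],
    [0.8, 0.0, 0.8, 0.0, 0.0, 0.0],
    [0.6, 0.8, 0.0, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.0, 0.8, 0.7],
    [0.1, 0.0, 0.0, 0.8, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.7, 0.8, 0.0],
])

# 1. Pre-processing: Laplacian L = D - W
D = np.diag(W.sum(axis=1))
L = D - W

# 2. Decomposition: eigh returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(L)
fiedler = eigvecs[:, 1]          # eigenvector for the second smallest eigenvalue

# 3. Grouping: split the vertices by the sign of the Fiedler components
labels = (fiedler > 0).astype(int)
```

For this graph the split recovers the two triangles, {x1, x2, x3} versus {x4, x5, x6}.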
Spectral Bi-partitioning Algorithm
The columns of X are the eigenvectors of the Laplacian, ordered to match the eigenvalues in increasing order.
Normalized Laplacian
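The heading refers to the normalized variants of L = D - W; assuming the usual textbook definitions:

```latex
L_{\text{sym}} = D^{-1/2} L \, D^{-1/2} = I - D^{-1/2} W D^{-1/2},
\qquad
L_{\text{rw}} = D^{-1} L = I - D^{-1} W
```

Both rescale the Laplacian by vertex degrees, which makes the spectrum less sensitive to widely varying degrees.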
K-way Spectral Clustering Algorithm
• Pre-processing
– Compute Laplacian matrix L
• Decomposition
– Find the eigenvalues and eigenvectors of L
– Build embedded space from the eigenvectors
corresponding to the k smallest eigenvalues
• Clustering
– Apply k-means to the reduced n x k space to
produce k clusters
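A minimal sketch of the three steps (assuming the unnormalized Laplacian and a tiny hand-rolled k-means with deterministic farthest-point seeding; a production version would normalize rows and use better initialization):

```python
import numpy as np

def kmeans(P, k, iters=100):
    """Tiny k-means with deterministic farthest-point initialization."""
    centers = [P[0]]
    for _ in range(k - 1):
        nearest = np.min([((P - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(P[int(np.argmax(nearest))])
    C = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = ((P[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = P[labels == j].mean(axis=0)
    return labels

def spectral_clustering(W, k):
    D = np.diag(W.sum(axis=1))      # degree matrix
    L = D - W                       # Laplacian
    _, vecs = np.linalg.eigh(L)     # eigenvalues ascending
    emb = vecs[:, :k]               # n x k embedded space
    return kmeans(emb, k)           # k-means in the reduced space

# Usage on the six-vertex example graph from the earlier slides:
W = np.array([
    [0.0, 0.8, 0.6, 0.0, 0.1, 0.0],
    [0.8, 0.0, 0.8, 0.0, 0.0, 0.0],
    [0.6, 0.8, 0.0, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.0, 0.8, 0.7],
    [0.1, 0.0, 0.0, 0.8, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.7, 0.8, 0.0],
])
labels = spectral_clustering(W, 2)
```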
How to select k?
• Eigengap: the difference between two consecutive eigenvalues
• Most stable clustering is generally given by the value k that
maximizes the expression
Δk = |λk − λk+1|

Choose k = 2

[Figure: the 20 leading eigenvalues plotted in decreasing order; λ1 and λ2 stand out and the largest gap falls after λ2]
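The heuristic above can be sketched as follows (the eigenvalue list in the usage line is illustrative, not taken from the lecture's plot):

```python
import numpy as np

def choose_k(eigvals):
    """Pick the k that maximizes the eigengap between consecutive eigenvalues."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # lambda_1 >= lambda_2 >= ...
    gaps = lam[:-1] - lam[1:]     # gaps[k-1] = lambda_k - lambda_{k+1}
    return int(np.argmax(gaps)) + 1

# Two dominant eigenvalues followed by a sharp drop -> k = 2
k = choose_k([50.0, 45.0, 10.0, 8.0, 7.0, 6.0])
```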
Summary
Hierarchical Clustering
Jing Gao
SUNY Buffalo
Hierarchical Clustering
• Agglomerative approach
Initialization:
Each object is a cluster
Iteration:
Merge the two clusters which are most similar to each other;
Until all objects are merged into a single cluster

[Figure: dendrogram for objects a–e: a and b merge into ab; d and e merge into de; c joins de to form cde; ab and cde merge into abcde]
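The merge loop above can be sketched as a naive O(n³) implementation (a sketch, not the lecture's code; `linkage=min` gives single link, `linkage=max` complete link, as defined on later slides):

```python
import numpy as np

def agglomerative(points, linkage=min):
    """Repeatedly merge the closest pair of clusters until one remains.

    Cluster distance = `linkage` applied to all pairwise point distances.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples."""
    pts = np.asarray(points, dtype=float)
    clusters = {i: [i] for i in range(len(pts))}
    history = []
    while len(clusters) > 1:
        best = None
        for a in clusters:
            for b in clusters:
                if a >= b:
                    continue
                d = linkage(np.linalg.norm(pts[i] - pts[j])
                            for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge b into a
        del clusters[b]
        history.append((a, b, d))
    return history
```

The recorded merge distances are exactly the heights at which a dendrogram would join the clusters.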
Dendrogram
• A tree that shows how clusters are merged/split
hierarchically
• Each node on the tree is a cluster; each leaf node is a
singleton cluster
Dendrogram
• A clustering of the data objects is obtained by cutting the dendrogram at the desired level; each connected component then forms a cluster
Agglomerative Clustering Algorithm
Starting Situation
• Start with clusters of individual points and a distance matrix
[Distance matrix over p1, p2, p3, p4, p5, ...]
[Figure: points p1 ... p12, each starting as its own cluster]
Intermediate Situation
• After some merging steps, we have some clusters
• Choose the two clusters that have the smallest distance (largest similarity) to merge

[Figure: current clusters C1 ... C5 over the points, with the corresponding distance matrix indexed by C1 ... C5]
Intermediate Situation
• We want to merge the two closest clusters (C2 and C5) and update
the distance matrix.

[Figure: clusters C1 ... C5, with C2 and C5 highlighted for merging, and the distance matrix indexed by C1 ... C5]
After Merging
• The question is “How do we update the distance matrix?”
[Figure: after merging, the rows and columns for C2 and C5 in the distance matrix are replaced by a single row and column for C2 ∪ C5, whose entries (marked “?”) must be recomputed]
How to Define Inter-Cluster Distance
Distance?
• MIN
• MAX
• Group Average
• Distance Between Centroids
• ...

[Figure: two clusters of points and the distance matrix over p1, p2, p3, p4, p5, ...]
MIN or Single Link
• Inter-cluster distance
– The distance between two clusters is represented by the
distance of the closest pair of data objects belonging to
different clusters.
– Determined by one pair of points, i.e., by one link in the
proximity graph
[Figure: six points and the single-link dendrogram; merge heights range from about 0.05 to 0.2, with leaf order 3, 6, 2, 5, 4, 1]
Strength of MIN
• Can handle clusters of non-elliptical shapes
Limitations of MIN
• Sensitive to noise and outliers
MAX or Complete Link
• Inter-cluster distance
– The distance between two clusters is represented by the
distance of the farthest pair of data objects belonging to
different clusters
MAX
[Figure: six points and the complete-link dendrogram; merge heights range up to about 0.4, with leaf order 3, 6, 4, 1, 2, 5]
Strength of MAX
• Less susceptible to noise and outliers
Limitations of MAX
•Tends to break large clusters
[Figure: original points vs. the two clusters produced by complete link, with one large cluster split across them]
Limitations of MAX

[Figure: six points and the complete-link dendrogram (leaf order 3, 6, 4, 1, 2, 5)]
Group Average
• Inter-cluster distance
– The distance between two clusters is the average of the distances of all pairs of data objects belonging to different clusters
• Strengths
– Less susceptible to noise and outliers
• Limitations
– Biased towards globular clusters
Centroid Distance
• Inter-cluster distance
– The distance between two clusters is represented by the
distance between the centers of the clusters
– Determined by cluster centroids
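The four inter-cluster distances above can be sketched as small functions of two point sets (a sketch, not the lecture's code; clusters are passed as NumPy arrays of points):

```python
import numpy as np

def _pairwise(A, B):
    """All distances between points of cluster A and points of cluster B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

def d_min(A, B):       # MIN / single link: closest pair
    return _pairwise(A, B).min()

def d_max(A, B):       # MAX / complete link: farthest pair
    return _pairwise(A, B).max()

def d_avg(A, B):       # Group average: mean over all pairs
    return _pairwise(A, B).mean()

def d_centroid(A, B):  # Distance between cluster centroids
    return np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))
```

For any pair of clusters, d_min ≤ d_avg ≤ d_max, which is why the three linkages react so differently to outliers.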
Summary