CS498
Today's lecture
! Clustering and unsupervised learning
! Hierarchical clustering
! K-means, K-medoids, VQ
Unsupervised learning
! Supervised learning
  ! Use labeled data to do something smart
! Unsupervised learning
  ! Find structure in data without any labels
Some inspiration
[Scatter plot: data points in red–green feature space, both axes from 0 to 1]
A new question
! I see classes, but there are no labels to learn from
! Answer: clustering
[Scatter plot: the same red–green data, now grouped into clusters]
Clustering
! Discover classes in data
! Divide data into sensible clusters
Clustering process
! Describe your data using features
! What's your objective?
[Example feature spaces: players' height; altitude vs. speed and intensity; wavefront position (west, east, south)]
How to cluster
! Tons of methods
! We can use simple, step-by-step logic
  ! e.g., find the two closest points and merge them, then repeat
Hierarchical methods
! Agglomerative algorithms
! Keep pairing up your data
! Divisive algorithms
! Keep breaking up your data
Agglomerative Approach
! Look at your data points and form pairs
! Keep at it
More formally
! Represent data as vectors: $X = \{x_i,\ i = 1, \dots, N\}$
! Represent clusters by: $C_j$
! Represent the clustering by: $R = \{C_j,\ j = 1, \dots, m\}$
! Define a distance measure: $d(C_i, C_j)$
e.g. $R = \{\{x_1, x_3\},\ \{x_2\},\ \{x_4, x_5, x_6\}\}$
Agglomerative clustering
! Choose: $R_0 = \{C_i = \{x_i\},\ i = 1, \dots, N\}$
! For $t = 1, 2, \dots$
  ! Among all clusters in $R_{t-1}$, find the cluster pair $\{C_i, C_j\}$ achieving $\arg\min_{i,j} d(C_i, C_j)$
  ! Form a new cluster and replace the pair: $C_q = C_i \cup C_j$
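The loop above can be sketched in a few lines of Python. This is a minimal sketch, not the lecture's code: it uses single linkage for the cluster distance, and the function name and toy data are illustrative.

```python
import numpy as np

def agglomerative(X, num_clusters):
    """Merge the closest pair of clusters until num_clusters remain.

    Single linkage: cluster distance = distance of the two closest points.
    """
    # R0: every point starts in its own cluster
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > num_clusters:
        best = None
        # arg min over all cluster pairs of d(Ci, Cj)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(np.linalg.norm(X[p] - X[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Cq = Ci ∪ Cj: merge the pair and replace it
        clusters[i] += clusters.pop(j)
    return clusters

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(agglomerative(X, 2))  # [[0, 1], [2, 3]]
```

Each pass scans every cluster pair, which is exactly why the comparison count discussed later grows so quickly with $N$.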
[Dendrogram: points x1–x5 merged step by step through clusterings R0–R4; vertical axis = similarity]
Cluster distance?
! Complete linkage
! Merge clusters that result in the smallest diameter
! Single linkage
! Merge clusters with two closest data points
! Group average
! Use average of distances
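The three criteria differ only in how the point-to-point distances between two clusters are aggregated. A small sketch (function names are illustrative):

```python
import numpy as np

def pairwise(A, B):
    """All point-to-point Euclidean distances between clusters A and B."""
    return [np.linalg.norm(a - b) for a in A for b in B]

def single_linkage(A, B):      # distance of the two closest points
    return min(pairwise(A, B))

def complete_linkage(A, B):    # distance of the two farthest points (diameter-driven)
    return max(pairwise(A, B))

def group_average(A, B):       # average of all cross-cluster distances
    d = pairwise(A, B)
    return sum(d) / len(d)

A = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
B = [np.array([3.0, 0.0]), np.array([5.0, 0.0])]
print(single_linkage(A, B))    # 2.0
print(complete_linkage(A, B))  # 5.0
print(group_average(A, B))     # 3.5
```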
What's involved
! At level $t$ we have $N - t$ clusters
! At level $t+1$ the pairs we consider are: $\binom{N-t}{2} = \frac{(N-t)(N-t-1)}{2}$
! Overall comparisons: $\sum_{t=0}^{N-2} \binom{N-t}{2} = \frac{(N-1)N(N+1)}{6}$
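The closed form for the total comparison count can be sanity-checked numerically (the summation limit $t = 0 \dots N-2$ is the one that makes the identity exact):

```python
from math import comb

# sum_{t=0}^{N-2} C(N-t, 2) should equal (N-1)N(N+1)/6
for N in (5, 10, 100):
    total = sum(comb(N - t, 2) for t in range(N - 1))
    assert total == (N - 1) * N * (N + 1) // 6
    print(N, total)  # grows like N^3 / 6
```

The cubic growth is the practical takeaway: naive agglomerative clustering gets expensive fast.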
Divisive Clustering
! Works the other way around
! Start with all data in one cluster
! Start dividing into sub-clusters
Divisive Clustering
! Choose: $R_0 = \{X\}$
! For $t = 1, 2, \dots$
  ! For $k = 1, \dots, t$: find the least similar sub-clusters within each cluster
  ! Pick the least similar pair overall: $\arg\max_{k,i,j} d(C_{k,i}, C_{k,j})$
  ! The new clustering is: $R_t = (R_{t-1} \setminus \{C_k\}) \cup \{C_{k,i}, C_{k,j}\}$
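Searching over all possible sub-cluster splits is exponentially expensive, so practical implementations use shortcuts. The sketch below is one such heuristic (an assumption, not the lecture's prescription): seed each candidate split with the cluster's two most distant points, and always split the cluster with the largest diameter.

```python
import numpy as np

def split_cluster(C, X):
    """Split one cluster around its two most distant points."""
    a, b = max(((p, q) for p in C for q in C if p < q),
               key=lambda pq: np.linalg.norm(X[pq[0]] - X[pq[1]]))
    # assign every member to the nearer of the two seed points
    Ca = [p for p in C
          if np.linalg.norm(X[p] - X[a]) <= np.linalg.norm(X[p] - X[b])]
    Cb = [p for p in C if p not in Ca]
    return Ca, Cb

def divisive(X, num_clusters):
    R = [list(range(len(X)))]            # R0 = {X}: everything in one cluster
    while len(R) < num_clusters:
        # split the cluster with the largest diameter (least similar members)
        k = max(range(len(R)),
                key=lambda i: max(np.linalg.norm(X[p] - X[q])
                                  for p in R[i] for q in R[i])
                if len(R[i]) > 1 else -1)
        Ca, Cb = split_cluster(R.pop(k), X)
        R += [Ca, Cb]
    return R

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(divisive(X, 2))  # [[0, 1], [2, 3]]
```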
Comparison
! Which one is faster?
! Agglomerative
! Divisive has a complicated search step
! Cost-based view: minimize $\sum_i \sum_j u_{ij}\, d(x_i, \theta_j)$, where
  ! $\theta$ are the cluster parameters
  ! $U$, with $u_{ij} \in \{0,1\}$, is an assignment matrix
  ! $d(\cdot,\cdot)$ is a distance function
An iterative solution
! We can't use a gradient method
! The assignment matrix is binary-valued
Overall process
! Initialize $\theta$ and iterate:
  ! Estimate $U$:
    $u_{ij} = \begin{cases} 1, & \text{if } d(x_i, \theta_j) = \min_k d(x_i, \theta_k) \\ 0, & \text{otherwise} \end{cases}$
  ! Estimate $\theta$:
    $\sum_i u_{ij} \dfrac{\partial d(x_i, \theta_j)}{\partial \theta_j} = 0$
K-means
! Standard and super-popular algorithm
! Finds clusters in terms of region centers
! Optimizes squared Euclidean distance
$d(x, \theta) = \lVert x - \theta \rVert^2$
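Plugging this distance into the $\theta$-estimation step explains why the update is simply a mean (a standard derivation, consistent with the update equation $\sum_i u_{ij}\,\partial d(x_i,\theta_j)/\partial\theta_j = 0$):

```latex
\sum_i u_{ij}\,\frac{\partial}{\partial \theta_j}\,\lVert x_i - \theta_j \rVert^2
  = -2 \sum_i u_{ij}\,(x_i - \theta_j) = 0
\quad\Longrightarrow\quad
\theta_j = \frac{\sum_i u_{ij}\, x_i}{\sum_i u_{ij}}
```

Each $\theta_j$ is the average of the samples currently assigned to cluster $j$, hence the name k-means.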
K-means algorithm
! Initialize k means
! Iterate
  ! Assign each sample $x_i$ to its closest mean
  ! Re-estimate each mean from its assigned samples
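The two steps above as a minimal NumPy sketch. Initializing the means from the first k samples is a simplifying assumption here; real implementations use random or k-means++-style initialization.

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Plain k-means: alternate the assignment and mean-update steps."""
    means = X[:k].astype(float).copy()   # simple init: first k samples
    for _ in range(iters):
        # step 1: assign each sample to its closest mean (squared Euclidean)
        d = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # step 2: re-estimate each mean from its assigned samples
        for j in range(k):
            if np.any(labels == j):
                means[j] = X[labels == j].mean(axis=0)
    return means, labels

# two tight clusters near (0,0) and (5,5), interleaved so the init spans both
X = np.array([[0, 0], [5, 5], [0.1, 0], [5.1, 5], [0, 0.1], [5, 5.1]], float)
means, labels = kmeans(X, 2)
print(labels)  # [0 1 0 1 0 1]
```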
[Figure: successive k-means iterations on 2-D data (axes −6 to 6); the means move until convergence]
K-Means on El Capitan
[Figures: El Capitan image pixels plotted in red–green color space (axes 0 to 1, blue/green cluster labels), segmented into color clusters by k-means]
One problem
! K-means struggles with outliers
[Figure: k-means on 2-D data containing outliers; the means get pulled away from the true clusters]
K-medoids
! Medoid:
! Least dissimilar data point to all others
! Not as influenced by outliers as the mean
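A minimal k-means-style sketch that re-estimates medoids instead of means. The alternating scheme and the first-k initialization are simplifying assumptions; the classic PAM algorithm does a fuller swap search.

```python
import numpy as np

def k_medoids(X, k, iters=20):
    """Alternate assignment and medoid re-estimation."""
    # full pairwise distance matrix
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    medoids = list(range(k))                 # simple init: first k samples
    for _ in range(iters):
        # assign each sample to its closest medoid
        labels = D[:, medoids].argmin(axis=1)
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members):
                # medoid: the member least dissimilar to all other members
                within = D[np.ix_(members, members)].sum(axis=1)
                medoids[j] = int(members[within.argmin()])
    return medoids, labels

# two clusters plus one far outlier: the medoid stays on a real data point
X = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [100.0]])
medoids, labels = k_medoids(X, 2)
print(X[medoids].ravel())
```

Unlike a mean, a medoid is always an actual sample, so the lone outlier at 100 cannot drag a cluster center off the data.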
[Figure: input data vs. k-means vs. k-medoids on data with outliers]
Vector Quantization
! Use of clustering for compression
! Keep a codebook (e.g., obtained with k-means)
! Transmit the nearest codebook vector instead of the current sample
! We transmit only the cluster index, not the entire data for each sample
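The encode/decode round trip in a toy sketch. The codebook values here are made up for illustration; in practice they would be the cluster centers learned by k-means, shared by sender and receiver.

```python
import numpy as np

# hypothetical codebook, e.g. k-means cluster centers
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])

def encode(samples, codebook):
    """Replace each sample by the index of its nearest codebook vector."""
    d = ((samples[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def decode(indices, codebook):
    """Reconstruct: look each transmitted index up in the shared codebook."""
    return codebook[indices]

samples = np.array([[0.1, -0.1], [0.9, 1.2], [2.1, 0.1], [0.0, 0.2]])
idx = encode(samples, codebook)
print(idx)  # [0 1 2 0] — only these small integers are transmitted
```

The savings come from sending one small integer per sample instead of a full vector, at the cost of the quantization error between each sample and its codebook vector.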
Simple example
[Figures: codebook vectors over 2-D data; coded sequence (cluster index vs. time); spectrogram (frequency vs. time)]
Recap
! Hierarchical clustering
! Agglomerative, divisive
! Issues with performance
! K-means
! Fast and easy
! K-medoids for more robustness (but slower)