
Clustering

CS498

Today's lecture
! Clustering and unsupervised learning
! Hierarchical clustering
! K-means, K-medoids, VQ

Unsupervised learning
! Supervised learning
! Use labeled data to do something smart

! What if the labels don't exist?

Some inspiration

El Capitan, Yosemite National Park

The way we'll see it


[Figure: scatter plot of the image's pixels in RGB space, with Red, Green, and Blue axes from 0 to 1]

A new question
! I see classes, but
! How do I find them?
! Can I automate this?
! How many are there?
! Answer: Clustering

[Figure: the same RGB scatter plot of the pixels, with apparent classes visible]

Clustering
! Discover classes in data
! Divide data into sensible clusters
! Fundamentally ill-defined problem
! There is often no correct solution
! Relies on many user choices

Clustering process
! Describe your data using features
! What's your objective?
! Define a proximity measure
! How is the feature space shaped?
! Define a clustering criterion
! When do samples make a cluster?

Know what you want
! Features & objective matter
! Which are the two classes?

[Figure: basketball player recruiting example, plotting each player's knowledge of entomology against the player's height]

Know your space
! Define a sensible proximity measure

[Figure: angle-of-incidence example]

Know your cluster type
! What forms a cluster in your space?

[Figures: planes near airports (altitude vs. speed), sound bouncing off a wall (intensity vs. wavefront position), ants departing a colony (north/south/east/west)]

How many clusters?
! The deeper you look, the more you'll get

There are no right answers!
! Clustering is partly an art
! You need to experiment to get there
! But some good starting points exist

How to cluster
! Tons of methods
! We can use a sequence of logical steps
! e.g., find the two closest points and merge them; repeat
! Or formulate a global criterion to optimize

Hierarchical methods
! Agglomerative algorithms
! Keep pairing up your data

! Divisive algorithms
! Keep breaking up your data

Agglomerative Approach
! Look at your data points and form pairs
! Keep at it

More formally
! Represent data as vectors: X = {x_i, i = 1, ..., N}
! Represent clusters by: C_j
! Represent the clustering by: R = {C_j, j = 1, ..., m}
! Define a distance measure: d(C_i, C_j)

e.g. R = { {x_1, x_3}, {x_2}, {x_4, x_5, x_6} }

Agglomerative clustering
! Choose: R_0 = {C_i = {x_i}, i = 1, ..., N}
! For t = 1, 2, ...
! Among all clusters in R_{t−1}, find the pair {C_i, C_j} with arg min_{i,j} d(C_i, C_j)
! Form the new cluster C_q = C_i ∪ C_j and replace the pair: R_t = (R_{t−1} − {C_i, C_j}) ∪ {C_q}
! Until we have only one cluster
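As a concrete reference, here is a minimal sketch of the agglomerative loop above in Python, assuming numpy and single linkage as the cluster distance (the function and variable names are illustrative, not from the lecture):

```python
# Minimal single-linkage agglomerative clustering sketch (assumes numpy).
import numpy as np

def agglomerative(X, num_clusters=1):
    """Merge the two closest clusters until `num_clusters` remain."""
    clusters = [[i] for i in range(len(X))]          # R0: every point is its own cluster
    while len(clusters) > num_clusters:
        best = (None, None, np.inf)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: distance between the two closest members
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        clusters[a] = clusters[a] + clusters[b]       # Cq = Ci ∪ Cj
        del clusters[b]                               # Rt = (Rt−1 − {Ci, Cj}) ∪ {Cq}
    return clusters

# Example: three obvious groups on a line
X = np.array([[0.0], [0.1], [5.0], [5.1], [10.0]])
print(agglomerative(X, num_clusters=3))               # e.g. [[0, 1], [2, 3], [4]]
```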

Pretty picture version
! Dendrogram

[Figure: dendrogram over points x_1 ... x_5, showing the nested clusterings R_0 ... R_4; the vertical axis is similarity]

Pretty picture two
! Venn diagram

[Figure: nested Venn diagram over points x_1 ... x_5, showing the same clusterings R_1 ... R_4]

Cluster distance?
! Complete linkage
! Merge the clusters that result in the smallest diameter
! Single linkage
! Merge the clusters with the two closest data points
! Group average
! Use the average of all pairwise distances
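These three options can be written down directly; below is a small sketch, assuming numpy and clusters given as arrays of points (the function names are illustrative):

```python
# Three common cluster-distance definitions for two clusters A and B.
import numpy as np

def pairwise(A, B):
    """All point-to-point Euclidean distances between clusters A and B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

def single_linkage(A, B):      # distance between the two closest points
    return pairwise(A, B).min()

def complete_linkage(A, B):    # merging gives the smallest "diameter"
    return pairwise(A, B).max()

def group_average(A, B):       # average of all pairwise distances
    return pairwise(A, B).mean()
```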

What's involved
! At level t we have N − t clusters
! At level t+1 the pairs we consider are:

  (N − t choose 2) = (N − t)(N − t − 1) / 2

! Overall comparisons:

  Σ_{t=0}^{N−1} (N − t choose 2) ≈ (N − 1) N (N + 1) / 6

Not good for our case
! The El Capitan picture has 63,140 pixels
! How many cluster comparisons is that?

  Σ_{t=0}^{N−3} (N − t choose 2) = 41,946,968,141,536

! Thanks, but no thanks
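A quick sanity check of that count, assuming N = 63,140 pixels and summing the pairwise comparisons over all merge levels; this only confirms the order of magnitude, since the exact total depends on the summation limits used:

```python
# Rough check of the total number of cluster comparisons.
from math import comb

N = 63_140                                   # pixels in the El Capitan image
total = sum(comb(N - t, 2) for t in range(N - 1))
print(f"{total:.3e}")                        # ≈ 4.2e13 pairwise comparisons
```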

Divisive Clustering
! Works the other way around
! Start with all data in one cluster
! Start dividing it into sub-clusters

Divisive Clustering
! Choose: R_0 = {X}
! For t = 1, 2, ...
! For each cluster k = 1, ..., t, find its least similar pair of sub-clusters
! Pick the least similar pair over all clusters: arg max_{k,i,j} d(C_{k,i}, C_{k,j})
! The new clustering is now: R_t = (R_{t−1} − {C_t}) ∪ {C_{t,i}, C_{t,j}}
! Until each point is its own cluster
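A simplified sketch of the divisive loop, assuming numpy. The split rule used here, breaking a cluster around its two farthest-apart points, is just one easy way to realize the "least similar sub-clusters" step, not necessarily the one intended on the slide:

```python
# Simplified divisive clustering sketch (assumes numpy).
import numpy as np

def farthest_pair(points):
    """Indices of the two most dissimilar points in a cluster and their distance."""
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmax(D), D.shape)
    return i, j, D[i, j]

def divisive(X, num_clusters):
    clusters = [np.arange(len(X))]                    # R0 = {X}
    while len(clusters) < num_clusters:
        # pick the cluster whose candidate split is the most dissimilar
        k = max((c for c in range(len(clusters)) if len(clusters[c]) > 1),
                key=lambda c: farthest_pair(X[clusters[c]])[2])
        idx = clusters.pop(k)
        i, j, _ = farthest_pair(X[idx])
        # assign every point to whichever of the two "seed" points is closer
        closer_to_i = (np.linalg.norm(X[idx] - X[idx[i]], axis=1)
                       <= np.linalg.norm(X[idx] - X[idx[j]], axis=1))
        clusters += [idx[closer_to_i], idx[~closer_to_i]]
    return clusters
```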

Comparison
! Which one is faster?
! Agglomerative
! Divisive has a complicated search step

! Which one gives better results?


! Divisive
! Agglomerative makes only local observations

Using cost functions
! Given a set of data x_i
! Define a cost function:

  J(θ, U) = Σ_i Σ_j u_ij d(x_i, θ_j)

! θ are the cluster parameters
! U is an assignment matrix with entries u_ij ∈ {0, 1}
! d(·) is a distance function
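Written out, the cost is a single weighted sum over samples and clusters; here is a small sketch assuming numpy and squared Euclidean distance as d (variable names are illustrative):

```python
# Cost J(theta, U) = sum_i sum_j u_ij * d(x_i, theta_j), with squared
# Euclidean distance chosen as d for this sketch.
import numpy as np

def cost(X, theta, U):
    # d[i, j] = squared Euclidean distance between sample i and cluster j
    d = ((X[:, None, :] - theta[None, :, :]) ** 2).sum(axis=-1)
    return (U * d).sum()
```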

An iterative solution
! We can't use a gradient method
! The assignment matrix is binary-valued
! We have two sets of parameters to find: θ and U
! Fix one and solve for the other, then swap roles and repeat
! Iterate until happy

Overall process
! Initialize θ and iterate:
! Estimate U:

  u_ij = 1 if d(x_i, θ_j) = min_k d(x_i, θ_k), and u_ij = 0 otherwise

! Estimate θ by solving:

  Σ_i u_ij ∂d(x_i, θ_j) / ∂θ_j = 0

! Repeat until satisfied

K-means
! Standard and super-popular algorithm
! Finds clusters in terms of region centers
! Optimizes the squared Euclidean distance:

  d(x, θ) = ‖x − θ‖²

K-means algorithm
! Initialize k means
! Iterate
! Assign each sample x_i to its closest mean
! Re-estimate each mean from its assigned samples x_i

! Repeat until convergence
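A compact sketch of this loop, assuming numpy; the initialization here is a random subset of the data points, which is only one of many possible choices:

```python
# Minimal k-means sketch (assumes numpy).
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=k, replace=False)]     # initialize k means
    for _ in range(iters):
        # assignment step: each sample goes to its closest mean
        d = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # update step: each mean becomes the average of its assigned samples
        new_means = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else means[j] for j in range(k)])
        if np.allclose(new_means, means):                    # converged
            break
        means = new_means
    return means, labels
```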

Example run
! Steps 1–5

[Figures: a 2-D data set with the k-means means and sample assignments updating at each iteration until they settle]

How well does it work?
! Converges to a minimum of the cost function
! Not for all distances, though!
! Is heavily biased by the starting positions
! Various initialization tricks exist
! It's pretty fast!

K-means on El Capitan

[Figures: RGB scatter plots of the El Capitan pixels (Red, Green, Blue axes from 0 to 1) and the resulting k-means clusterings]

One problem
! K-means struggles with outliers

[Figure: 2-D example illustrating the effect of outliers on k-means]

K-medoids
! Medoid:
! The data point least dissimilar to all others
! Not as influenced by outliers as the mean
! Replace the means with medoids
! Redesign k-means as k-medoids
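A small k-medoids sketch in the same alternating spirit as k-means, assuming numpy; this is the simple "assign, then re-pick each medoid" variant rather than the classic PAM swap search:

```python
# Minimal k-medoids sketch (assumes numpy).
import numpy as np

def kmedoids(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    medoid_idx = rng.choice(len(X), size=k, replace=False)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # all pairwise distances
    for _ in range(iters):
        labels = D[:, medoid_idx].argmin(axis=1)         # assign to nearest medoid
        new_idx = medoid_idx.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members):
                # medoid = the member least dissimilar to all other members
                within = D[np.ix_(members, members)].sum(axis=1)
                new_idx[j] = members[within.argmin()]
        if np.array_equal(new_idx, medoid_idx):          # converged
            break
        medoid_idx = new_idx
    return medoid_idx, labels
```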

K-medoids

[Figures: the same 2-D input data clustered with k-means vs. k-medoids; panels labeled "Input data", "k-means", "k-medoids"]

Vector Quantization
! Use of clustering for compression
! Keep a codebook (e.g., built with k-means)
! Transmit the nearest codebook vector instead of the current sample
! We transmit only the cluster index, not the entire data for each sample
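A sketch of the encode/decode step on top of a k-means codebook, assuming numpy and the `kmeans` function sketched earlier (names are illustrative):

```python
# Vector quantization on top of a learned codebook (assumes numpy).
import numpy as np

def vq_encode(X, codebook):
    """Replace each sample by the index of its nearest codebook vector."""
    d = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=-1)
    return d.argmin(axis=1)                 # only these small indices are transmitted

def vq_decode(indices, codebook):
    """The receiver looks the indices back up in the shared codebook."""
    return codebook[indices]

# Usage: build the codebook from training data, then code new samples
# codebook, _ = kmeans(train_X, k=16)
# received = vq_decode(vq_encode(new_X, codebook), codebook)
```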

Simple example

[Figure: 2-D example of vector quantization]

Vector Quantization in Audio

[Figure: an input sequence (frequency vs. time), the cluster index assigned to each frame, and the coded sequence reconstructed from the codebook]

Recap
! Hierarchical clustering
! Agglomerative, Divisive
! Issues with performance
! K-means
! Fast and easy
! K-medoids for more robustness (but slower)
