
K-Means Clustering: Example

Real-Life Numerical Example

We have 4 medicines as our training data points, and each medicine has 2
attributes. Each attribute represents one coordinate of the object. We have to
determine which medicines belong to cluster 1 and which belong to the
other cluster.

Object       Attribute 1 (X): weight index   Attribute 2 (Y): pH
Medicine A   1                               1
Medicine B   2                               1
Medicine C   4                               3
Medicine D   5                               4
Iteration 1: initial centroids c1 = (1, 1) and c2 = (2, 1).

Object       X   Y   Dist. to c1 (1,1)   Dist. to c2 (2,1)   Cluster
Medicine A   1   1   0                   1                   1
Medicine B   2   1   1                   0                   2
Medicine C   4   3   3.6                 2.8                 2
Medicine D   5   4   5                   4.2                 2

New centroids:
c1 = (1, 1)
c2 = ((2 + 4 + 5)/3, (1 + 3 + 4)/3) = (11/3, 8/3) ≈ (3.67, 2.67)
Iteration 2: c1 = (1, 1), c2 = (3.67, 2.67).

Object       X   Y   Dist. to c1 (1,1)   Dist. to c2 (3.67,2.67)   Cluster
Medicine A   1   1   0                   3.1                       1
Medicine B   2   1   1                   2.4                       1
Medicine C   4   3   3.6                 0.5                       2
Medicine D   5   4   5                   1.9                       2

New centroids:
c1 = ((1 + 2)/2, (1 + 1)/2) = (1.5, 1)
c2 = ((4 + 5)/2, (3 + 4)/2) = (4.5, 3.5)
Iteration 3: c1 = (1.5, 1), c2 = (4.5, 3.5).

Object       X   Y   Dist. to c1 (1.5,1)   Dist. to c2 (4.5,3.5)   Cluster
Medicine A   1   1   0.5                   4.3                     1
Medicine B   2   1   0.5                   3.5                     1
Medicine C   4   3   3.2                   0.7                     2
Medicine D   5   4   4.6                   0.7                     2

The cluster assignments are unchanged from iteration 2, so the algorithm stops.
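The iterations above can be sketched directly in code. This is a minimal NumPy implementation (my own illustration, not from the slides): assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and stop when the assignments no longer change.

```python
import numpy as np

points = np.array([[1, 1], [2, 1], [4, 3], [5, 4]], dtype=float)  # A, B, C, D
centroids = np.array([[1, 1], [2, 1]], dtype=float)               # initial c1, c2

labels = None
while True:
    # Euclidean distance from every point to every centroid
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    new_labels = dists.argmin(axis=1)
    if labels is not None and np.array_equal(new_labels, labels):
        break  # same clusters as last iteration -> stop
    labels = new_labels
    # move each centroid to the mean of its assigned points
    centroids = np.array([points[labels == k].mean(axis=0) for k in range(2)])

print(labels)     # A and B end up in one cluster, C and D in the other
print(centroids)  # final centroids (1.5, 1) and (4.5, 3.5)
```

Running this reproduces the hand calculation: the assignments settle after the second update and the final centroids match (1.5, 1) and (4.5, 3.5).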


How do you choose k?

• The Elbow Method:
– Plot the percentage of variance explained against the number of clusters, and choose the k at the "elbow", where adding another cluster no longer explains much additional variance.
[Figure: percentage of variance explained vs. number of clusters]
• Other methods:
– Information Criterion Approach
– Silhouette method
– Jump method
– Gap statistic
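The elbow method can be sketched with scikit-learn (library choice and data are mine, reusing the 4-medicine example): fit k-means for several values of k and compare the inertia (within-cluster sum of squared distances); the elbow is where the curve flattens.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 1], [2, 1], [4, 3], [5, 4]], dtype=float)

# Inertia for k = 1, 2, 3; look for the "elbow" where it stops dropping sharply
inertias = []
for k in range(1, 4):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

for k, inertia in zip(range(1, 4), inertias):
    print(k, round(inertia, 2))
```

On this tiny dataset the inertia drops steeply from k = 1 to k = 2 and only slightly afterwards, so the elbow suggests k = 2, matching the worked example.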
Weaknesses of K-Means Clustering

1. When there are only a few data points, the initial grouping largely
determines the resulting clusters.
2. The number of clusters, K, must be determined beforehand.
3. The algorithm does not necessarily yield the same result on each run,
since the resulting clusters depend on the initial random assignments;
with the same data, presenting the points in a different order may
produce different clusters when the dataset is small.
4. It is sensitive to initial conditions: different initial centroids may
produce different clusters, and the algorithm may get trapped in a local
optimum.
Applications of K-Means Clustering

• It is relatively efficient and fast: it computes a result in O(tkn), where n is the
number of objects or points, k is the number of clusters, and t is the number of
iterations.
• K-means clustering can be applied to machine learning and data mining tasks.
• Used on acoustic data in speech understanding to convert waveforms into
one of k categories (known as vector quantization); the same idea is used for
image segmentation.
• Also used for choosing color palettes on old-fashioned graphical display
devices and for image quantization.
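The color-palette application can be illustrated with a toy sketch (my own example, not from the slides): treat each pixel as a 3-D point in RGB space, cluster the pixels, and replace every pixel with its nearest cluster center, yielding a k-color palette.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Fake 100-pixel "image": half reddish pixels, half bluish pixels
pixels = np.vstack([
    rng.normal([200, 30, 30], 10, size=(50, 3)),   # reddish
    rng.normal([30, 30, 200], 10, size=(50, 3)),   # bluish
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)
palette = km.cluster_centers_       # the 2-color palette
quantized = palette[km.labels_]     # every pixel snapped to a palette color

print(palette.round())  # roughly a red and a blue palette entry
```

This is exactly vector quantization: the palette is the codebook, and each pixel is encoded by the index of its nearest code vector.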
Practice Example

Consider the dataset given below; cluster the subjects into three
clusters using the k-means algorithm.

Subject   A     B
1         1.0   1.0
2         1.5   2.0
3         3.0   4.0
4         5.0   7.0
5         3.5   5.0
6         4.5   5.0
7         3.5   4.5
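One way to check your hand calculation is to run the practice dataset through scikit-learn (the library choice is mine; the numeric labels depend on initialization, so only the groupings matter):

```python
import numpy as np
from sklearn.cluster import KMeans

# Subjects 1..7 with attributes A and B from the table above
X = np.array([
    [1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0],
    [3.5, 5.0], [4.5, 5.0], [3.5, 4.5],
])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
for subject, label in enumerate(km.labels_, start=1):
    print(subject, label)
```

With a well-initialized run, subjects 1 and 2 group together, subjects 3, 5, 6, and 7 form the middle cluster, and subject 4 (the outlying point at (5, 7)) sits in a cluster of its own.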
Python command for k-means

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10)  # i = chosen number of clusters

• n_clusters: int, default=8
The number of clusters to form as well as the number of centroids to generate.
• init: {'k-means++', 'random'}, ndarray, or callable, default='k-means++'
Method for initialization:
'k-means++': selects initial cluster centers for k-means clustering in a smart way to speed up convergence.
'random': choose n_clusters observations (rows) at random from the data for the initial centroids.
If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers.
If a callable is passed, it should take arguments X, n_clusters and a random state and return an initialization.
• n_init: int, default=10
Number of times the k-means algorithm will be run with different centroid seeds. The final result will be the best
output of n_init consecutive runs in terms of inertia.
• max_iter: int, default=300
Maximum number of iterations of the k-means algorithm for a single run.
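Putting these parameters together on the 4-medicine example from earlier in the slides, a complete fit looks like this (random_state is my addition, for reproducibility):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 1], [2, 1], [4, 3], [5, 4]], dtype=float)  # medicines A-D

kmeans = KMeans(n_clusters=2, init='k-means++', max_iter=300, n_init=10,
                random_state=0)
kmeans.fit(X)

print(kmeans.labels_)           # cluster index assigned to each medicine
print(kmeans.cluster_centers_)  # should match (1.5, 1) and (4.5, 3.5)
print(kmeans.inertia_)          # within-cluster sum of squared distances
```

The fitted `cluster_centers_` match the centroids derived by hand in the worked example, and `inertia_` is the quantity minimized across the `n_init` restarts.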
CONCLUSION

• The K-means algorithm is useful for undirected knowledge discovery and is
relatively simple.
• K-means has found widespread usage in many fields, ranging from
unsupervised learning of neural networks to pattern recognition, classification
analysis, artificial intelligence, image processing, machine vision, and many
others.
THANK YOU

