13 views

Uploaded by minichel

- Clustering for portfolio management
- 10.1.1.107.9846_TM_004
- A Fast Implementation of the Isodata Clustering
- Clustering Validacao
- JACST-4749
- Turi Image
- A Prototype for Intrusion Detection in Wireless Sensor Networks Using Data Mining Methods
- Comparative Analysis of K-Means and Fuzzy C-Means Algorithms
- Model Order Selection for Multiple Cooperative Swarms Clustering
- Xiao 2014
- Free Rooms 182
- A framework for texture classification using ccr 2009.pdf
- 1 Song Etal Fuzzy Clustering Book Chapter
- 158 9(Clustering)
- qmat97_iwanowski_skoneczny_szostakowski.pdf
- A Real Time Approach for Indian Road Analysis using Image Processing and Computer Vision
- Articulated Pose Estimation With Flexible Mixtures-Of-parts
- Assignment No 1
- Enhancing_Data_Analysis_with_Noise_Remov.pdf
- 10.1.1.195.39

You are on page 1of 3

k-means is one of the simplest unsupervised learning algorithms that solve the well known clustering

problem. The procedure follows a simple and easy way to classify a given data set through a certain number

of clusters (assume k clusters) fixed apriori. The main idea is to define k centers, one for each cluster. These

centers should be placed in a cunning way because of different location causes different result. So, the

better choice is to place them as much as possible far away from each other. The next step is to take each

point belonging to a given data set and associate it to the nearest center. When no point is pending, the first

step is completed and an early group age is done. At this point we need to re-calculate k new centroids as

barycenter of the clusters resulting from the previous step. After we have these k new centroids, a new binding

has to be done between the same data set points and the nearest new center. A loop has been generated. As a

result of this loop we may notice that the k centers change their location step by step until no more changes

are done or in other words centers do not move any more. Finally, this algorithm aims at minimizing an

objective function know as squared error function given by:

where,

||xi - vj|| is the Euclidean distance between xi and vj.

ci is the number of data points in ith cluster.

c is the number of cluster centers.

Let X = {x1,x2,x3,..,xn} be the set of data points and V = {v1,v2,.,vc} be the set of centers.

1) Randomly select c cluster centers.

2) Calculate the distance between each data point and cluster centers.

3) Assign the data point to the cluster center whose distance from the cluster center is minimum of all the cluste

centers..

4) Recalculate the new cluster center using:

5) Recalculate the distance between each data point and new obtained cluster centers.

6) If no data point was reassigned then stop, otherwise repeat from step 3).

Advantages

1) Fast, robust and easier to understand.

2) Relatively efficient: O(tknd), where n is # objects, k is # clusters, d is # dimension of each object, and t is #

iterations. Normally, k, t, d << n.

3) Gives best result when data set are distinct or well separated from each other.

Note: For more detailed figure for k-means algorithm please refer to k-means figure sub page.

Disadvantages

1) The learning algorithm requires apriori specification of the number of cluster centers.

2) The use of Exclusive Assignment - If there are two highly overlapping data then k-means will not be able to

resolve

that there are two clusters.

3) The learning algorithm is not invariant to non-linear transformations i.e. with different representation of data

we get

different results (data represented in form of cartesian co-ordinates and polar co-ordinates will give different

results).

4) Euclidean distance measures can unequally weight underlying factors.

5) The learning algorithm provides the local optima of the squared error function.

6) Randomly choosing of the cluster center cannot lead us to the fruitful result. Pl. refer Fig.

7) Applicable only when mean is defined i.e. fails for categorical data.

8) Unable to handle noisy data and outliers.

9) Algorithm fails for non-linear data set.

Fig II: Showing the non-linear data set where k-means algorithm fails

- Clustering for portfolio managementUploaded byChiragDahiya
- 10.1.1.107.9846_TM_004Uploaded bysudheer105
- A Fast Implementation of the Isodata ClusteringUploaded bykalokos
- Clustering ValidacaoUploaded byAndrey de Souza
- JACST-4749Uploaded byMoorthi v
- Turi ImageUploaded byPrateek Gupta
- A Prototype for Intrusion Detection in Wireless Sensor Networks Using Data Mining MethodsUploaded byEditor IJRITCC
- Comparative Analysis of K-Means and Fuzzy C-Means AlgorithmsUploaded byFormat Seorang Legenda
- Model Order Selection for Multiple Cooperative Swarms ClusteringUploaded bymrpezhman
- Xiao 2014Uploaded bysonia
- Free Rooms 182Uploaded byXXxSpeeedyxXX
- A framework for texture classification using ccr 2009.pdfUploaded byAxel Herrera Cabrera
- 1 Song Etal Fuzzy Clustering Book ChapterUploaded byMag Nae
- 158 9(Clustering)Uploaded byHendy Kurniawan
- qmat97_iwanowski_skoneczny_szostakowski.pdfUploaded byhub23
- A Real Time Approach for Indian Road Analysis using Image Processing and Computer VisionUploaded byIOSRjournal
- Articulated Pose Estimation With Flexible Mixtures-Of-partsUploaded byJosé Alejandro Mesejo Chiong
- Assignment No 1Uploaded byqqq
- Enhancing_Data_Analysis_with_Noise_Remov.pdfUploaded byvivekreddy
- 10.1.1.195.39Uploaded byNeko Plus
- Iaetsd-A Survey on One Class ClusteringUploaded byiaetsdiaetsd
- JADM Volume 6 Issue 1 Pages 121-132Uploaded byreza kh
- speech recognition system IMUploaded byCj Kao
- Chen, Shrivastava, Gupta - 2013 - Neil Extracting Visual Knowledge From Web DataUploaded bypeter
- xplore1417.pdfUploaded bykumar kumar
- IJETTCS-2014-12-03-83Uploaded byAnonymous vQrJlEN
- Final First Progress Report for Ph.DUploaded byShakir Khan
- Constantoglou 2009 Tipologias Turismo CosteiroUploaded byemasousaabreu
- A Review of Data Mining TechniquesUploaded byEditor IJRITCC
- NETWORK PERFORMANCE ENHANCEMENT WITH OPTIMIZATION SENSOR PLACEMENT IN WIRELESS SENSOR NETWORKUploaded byJohn Berg

- MPLUploaded byminichel
- NLP steemerUploaded byminichel
- Chapter-5Uploaded byminichel
- tabu_TSPUploaded byBedadipta Bain
- W1L2-Complexity.pdfUploaded byminichel
- Example CodeUploaded byminichel
- Facebook-Distributed-System-Case-Study-For-Distributed-System-Inside-Facebook-Datacenters.pdfUploaded byminichel
- Final exam review.docUploaded byminichel
- IoT Assignment.docxUploaded byminichel
- LicenseUploaded byminichel
- costQuality.pdfUploaded byminichel
- Face Recognition 2nd presentationUploaded byminichel
- Chapter11 -2Uploaded byminichel
- Data Mining Practice Final SolUploaded byminichel
- Domain Area PUploaded byminichel
- Cybersafety Basics(1)Uploaded byminichel
- 10 Facts on the Relationship Between a Language and Culture for an English ProjectUploaded byminichel
- LcmUploaded byminichel
- horowitz-and-sahani-fundamentals-of-computer-algorithms-2nd-edition.pdfUploaded byminichel
- History IsUploaded byminichel
- STQA2 AssignmentUploaded byminichel
- IoT AssignmentUploaded byminichel
- wubCPPUploaded byminichel
- Pervasive ComputingUploaded bysavidasan
- Pervasive ComputingUploaded bysavidasan
- asstqaUploaded byminichel
- Expert System to Suggest a Natural Drink to-1Uploaded byminichel
- STQA Assigment 2Uploaded byminichel
- Cloud ComputingUploaded byminichel

- Fourier Series, part 1.pdfUploaded byJacob Bains
- A note of Lambert´s Theorem - R.Devaney, and R. BlanchardUploaded byalvaroroman92
- Solving Problems With Two VariablesUploaded byDaisyListening
- psdUploaded bygarg_arpit
- MPZ3132 Assignment 2 2011-2012Uploaded byUditha Muthumala
- Modelling Portferry Choice in RoRo Freight Transportation (2002)Uploaded byLaura Andrea Vega
- An Overview of the Hamilton-jacobi Equation - 21Uploaded byJohn Bird
- RIDT98 SymmetryUploaded bySanti Lopez
- Digital Maheswari R CSE1003 Material 1Uploaded byNikhilesh Prabhakar
- Generating Signal With MATLABUploaded byondoy4925
- Book Review Averaging Methods Nonlinear Dynamical SystemsUploaded byDerrick Powers
- divisibility rulesUploaded byapi-242276181
- ch2.pdfUploaded byVienNgocQuang
- Assignment2 ProjUploaded byTrần Dũng
- Descriptive vs Inferential StatisticsUploaded byyashar2500
- Statistics Khaled BhuiyanUploaded bybhuiyankhaled
- g8m4 sg key system of equations l21-27Uploaded byapi-276774049
- 59 Sample ChapterUploaded byPRASENJIT MUKHERJEE
- UNIT 9Uploaded bygfhgffggg
- Eg AssignmentUploaded byPreethi Sharmi
- Computer lab fileUploaded byPiyush Menghani
- Solidworks-Sheetmetal- Tips and TricksUploaded bycveta
- LMS Adaptive FiltersUploaded byalialibaba
- Lect7-UWAUploaded byIlham Yacine
- David F. Batten, Cristoforo S. Bertuglia, Dino Martellato, Sylvie Occelli Auth., David F. Batten, Cristoforo S. Bertuglia, Dino Martellato, Sylvie Occelli Eds. Learning, Innovation and Urban EvolutionUploaded byJuan Carlos Lobato Valdespino
- Todini - Hydrological Catchment Modelling- Past-present and FutureUploaded byrutitassi
- P5 Maths CA1 Worksheet - Word Problems Pt 1Uploaded bymra
- Risk Analysis in Capital BudgetingUploaded byarashpreet
- 12_03_0865_0874.pdfUploaded byRACHEL KATHI
- Stress Analysis of Plate With HoleUploaded byketu009