11 views

Uploaded by baskarbaju1

Syll 6

- SEGMENTATION USING ‘NEW’ TEXTURE FEATURE
- Clustering
- LTE.pdf
- FINDING SIMILARITY BETWEEN RAGAS USING K-MEANS CLUSTERING
- Lecture 10 Clustering Basic
- Fuzzy Clustering as an Intrusion Detection Technique
- e1071
- 3 Cluster Analysis
- On the Use of Side Information for Mining
- Image Segment at i On
- reference 2.pdf
- M.Tech
- A Survey on: Hyper Spectral Image Segmentation and Classification Using FODPSO
- A Review of Network Traffic Analysis and Prediction Techniques
- kmeans1
- Wireless Fingerprinting Wisec08
- 285 Project Paper
- Increasing Information Shareability by Using NTBS Clustering Approach for VANET
- Sample Data Mining Project Paper
- 1-s2.0-S1569190X16302350-main

You are on page 1of 45

Chapter 3

PPDM Cl

Class

Tan,Steinbach, Kumar

4/18/2004

z

will be similar (or related) to one another and different

from (or unrelated to) the objects in other groups

Inter-cluster

Inter

cluster

distances are

maximized

Intra-cluster

distances are

minimized

Tan,Steinbach, Kumar

4/18/2004

z

Discovered Clusters

Understanding

Group

p related documents

for browsing, group genes

and proteins that have

similar functionality, or

group stocks with similar

price fluctuations

Industry Group

Applied-Matl-DOWN,Bay-Network-Down,3-COM-DOWN,

Cabletron-Sys-DOWN,CISCO-DOWN,HP-DOWN,

DSC-Comm-DOWN,INTEL-DOWN,LSI-Logic-DOWN,

Micron-Tech-DOWN,Texas-Inst-Down,Tellabs-Inc-Down,

Natl-Semiconduct-DOWN,Oracl-DOWN,SGI-DOWN,

Sun-DOWN

Apple-Comp-DOWN,Autodesk-DOWN,DEC-DOWN,

ADV-Micro-Device-DOWN,Andrew-Corp-DOWN,

Computer-Assoc-DOWN,Circuit-City-DOWN,

Compaq-DOWN, EMC-Corp-DOWN, Gen-Inst-DOWN,

Motorola-DOWN,Microsoft-DOWN,Scientific-Atl-DOWN

1

2

Fannie-Mae-DOWN,Fed-Home-Loan-DOWN,

MBNA-Corp-DOWN,Morgan-Stanley-DOWN

3

4

Baker-Hughes-UP,Dresser-Inds-UP,Halliburton-HLD-UP,

Louisiana-Land-UP,Phillips-Petro-UP,Unocal-UP,

Schlumberger-UP

Technology1-DOWN

Technology2-DOWN

Financial-DOWN

Oil-UP

Summarization

Reduce the size of large

data sets

C uste g p

Clustering

precipitation

ec p tat o

in Australia

Tan,Steinbach, Kumar

4/18/2004

z

Supervised classification

Have class label information

Simple segmentation

Di

Dividing

idi students

t d t into

i t different

diff

t registration

i t ti groups

alphabetically, by last name

Results of a query

Groupings are a result of an external specification

Graph partitioning

Some mutual relevance and synergy, but areas are not

identical

Tan,Steinbach, Kumar

4/18/2004

Six Clusters

Two Clusters

Four Clusters

Tan,Steinbach, Kumar

4/18/2004

Types of Clusterings

z

partitional sets of clusters

Partitional Clustering

A division data objects into non-overlapping subsets (clusters)

such that each data object is in exactly one subset

Hierarchical clustering

A set of nested clusters organized as a hierarchical tree

Tan,Steinbach, Kumar

4/18/2004

Partitional Clustering

Original

g

Points

Tan,Steinbach, Kumar

A Partitional Clustering

g

4/18/2004

Hierarchical Clustering

p1

p3

p4

p2

p1 p2

Traditional Hierarchical Clustering

p3 p4

Traditional Dendrogram

p1

p3

p4

p2

p1 p2

Non-traditional Hierarchical Clustering

Tan,Steinbach, Kumar

p3 p4

Non-traditional Dendrogram

4/18/2004

z

In non-exclusive clusterings, points may belong to multiple

clusters.

clusters

Can represent multiple classes or border points

In fuzzy clustering, a point belongs to every cluster with some

weight between 0 and 1

Weights must sum to 1

Probabilistic clustering has similar characteristics

In

I some cases, we only

l wantt to

t cluster

l t some off the

th data

d t

Cluster of widely different sizes, shapes, and densities

Tan,Steinbach, Kumar

4/18/2004

Types of Clusters

z

Well-separated clusters

Center-based clusters

Contiguous clusters

Density-based clusters

P

Property

t or Conceptual

C

t l

Tan,Steinbach, Kumar

4/18/2004

10

z

Well-Separated Clusters:

A cluster is a set of points such that any point in a cluster is

closer (or more similar) to every other point in the cluster than

to any point not in the cluster.

3 well-separated clusters

Tan,Steinbach, Kumar

4/18/2004

11

z

Center-based

A cluster is a set of objects such that an object in a cluster is

closer (more similar) to the center of a cluster, than to the

center of any other cluster

The center of a cluster is often a centroid, the average of all

th points

the

i t iin th

the cluster,

l t or a medoid,

d id the

th mostt representative

t ti

point of a cluster

4 center-based clusters

Tan,Steinbach, Kumar

4/18/2004

12

z

Transitive)

A cluster is a set of points such that a point in a cluster is

closer (or more similar) to one or more other points in the

cluster than to any point not in the cluster.

8 contiguous clusters

Tan,Steinbach, Kumar

4/18/2004

13

z

Density-based

A cluster is a dense region of points, which is separated by

low-density regions, from other regions of high density.

Used when the clusters are irregular or intertwined, and when

noise and outliers are present.

6 density-based clusters

Tan,Steinbach, Kumar

4/18/2004

14

z

Finds clusters that share some common property or represent

a particular concept.

.

2 Overlapping Circles

Tan,Steinbach, Kumar

4/18/2004

15

z

Finds clusters that minimize or maximize an objective function.

Enumerate all possible ways of dividing the points into clusters and

evaluate the `goodness' of each potential set of clusters by using

the given objective function

function. (NP Hard)

P titi

Partitional

l algorithms

l ith

typically

t i ll h

have global

l b l objectives

bj ti

data to a parameterized model.

statistical distributions.

Tan,Steinbach, Kumar

4/18/2004

16

z

and solve a related problem in that domain

Proximity matrix defines a weighted graph, where the

nodes are the points being clustered, and the

weighted edges represent the proximities between

points

Clustering is equivalent to breaking the graph into

connected components, one for each cluster.

Want to minimize the edge weight between clusters

and maximize the edge weight within clusters

Tan,Steinbach, Kumar

4/18/2004

17

z

This is a derived measure, but central to clustering

Sparseness

Dictates type of similarity

Adds to efficiency

Attribute type

Dictates type of similarity

Type of Data

Dictates type of similarity

Other characteristics, e.g., autocorrelation

z

z

z

Dimensionality

Di

i

lit

Noise and Outliers

Type of Distribution

Tan,Steinbach, Kumar

4/18/2004

18

Clustering Algorithms

z

Hierarchical clustering

Density-based clustering

Tan,Steinbach, Kumar

4/18/2004

19

K-means Clustering

z

z

Each cluster is associated with a centroid (center point)

centroid

Number of clusters,

clusters K

K, must be specified

The basic algorithm is very simple

Tan,Steinbach, Kumar

4/18/2004

20

z

cluster.

similarity correlation

similarity,

correlation, etc

etc.

K-means will converge for common similarity measures

mentioned above.

z

z

iterations.

pp g condition is changed

g to Until relatively

y few

points change clusters

Complexity is O( n * K * I * d )

I = number of iterations, d = number of attributes

Tan,Steinbach, Kumar

4/18/2004

21

3

2.5

Original Points

15

1.5

0.5

0

-2

-1.5

-1

-0.5

0.5

1.5

2.5

2.5

1.5

1.5

0.5

0.5

-2

-1.5

-1

-0.5

0.5

1.5

-1.5

-1

-0.5

0.5

1.5

Optimal Clustering

Tan,Steinbach, Kumar

-2

Sub-optimal Clustering

4/18/2004

22

Iteration 6

1

2

3

4

5

3

2.5

1.5

0.5

0

-2

-1.5

-1

-0.5

0.5

1.5

Tan,Steinbach, Kumar

4/18/2004

23

Iteration 1

Iteration 2

Iteration 3

2.5

2.5

2.5

1.5

1.5

1.5

0.5

0.5

0.5

-2

-1.5

-1

-0.5

0.5

1.5

-2

-1.5

-1

-0.5

0.5

1.5

-2

It ti 4

Iteration

It ti 5

Iteration

2.5

1.5

1.5

1.5

0.5

0.5

0.5

-0.5

0.5

Tan,Steinbach, Kumar

1.5

0.5

1.5

1.5

2.5

2.5

-1

-0.5

It ti 6

Iteration

-1.5

-1

-2

-1.5

-2

-1.5

-1

-0.5

0.5

1.5

-2

-1.5

-1

-0.5

0.5

4/18/2004

24

z

For each point, the error is the distance to the nearest cluster

To get SSE, we square these errors and sum them.

K

SSE = dist 2 ( mi , x )

i =1 xCi

cluster Ci

Given two clusters, we can choose the one with the smallest

error

One easy way to reduce SSE is to increase K, the number of

clusters

A good clustering with smaller K can have a lower SSE than a poor

clustering

l t i with

ith hi

higher

h K

Tan,Steinbach, Kumar

4/18/2004

25

Iteration 5

1

2

3

4

3

2.5

1.5

0.5

0

-2

-1.5

-1

-0.5

0.5

1.5

Tan,Steinbach, Kumar

4/18/2004

26

Iteration 1

Iteration 2

2.5

2.5

1.5

1.5

0.5

0.5

-2

-1.5

-1

-0.5

0.5

1.5

-2

-1.5

-1

-0.5

0.5

Iteration 3

2.5

1.5

1.5

1.5

2.5

2.5

0.5

0.5

0.5

-1

1

-0.5

05

05

0.5

Tan,Steinbach, Kumar

Iteration 5

-1.5

15

1.5

Iteration 4

-2

2

15

1.5

-2

2

-1.5

15

-1

1

-0.5

05

05

0.5

15

1.5

-2

2

-1.5

15

-1

1

-0.5

05

05

0.5

15

1.5

4/18/2004

27

z

one centroid from each cluster is small.

Ch

Chance

is

i relatively

l ti l smallll when

h K iis llarge

i ht way, and

right

d sometimes

ti

they

th d

dont

t

Tan,Steinbach, Kumar

4/18/2004

28

10 Clusters Example

Iteration 4

1

2

3

8

6

4

2

0

-2

-4

-6

0

10

15

20

x

Starting with two initial centroids in one cluster of each pair of clusters

Tan,Steinbach, Kumar

4/18/2004

29

10 Clusters Example

Iteration 2

8

Iteration 1

8

-2

-2

-4

-4

-6

-6

0

10

15

20

15

20

-2

-2

-4

-4

-6

-6

10

20

Iteration 4

8

Iteration 3

15

10

15

20

10

Starting with two initial centroids in one cluster of each pair of clusters

Tan,Steinbach, Kumar

4/18/2004

30

10 Clusters Example

Iteration 4

1

2

3

8

6

4

2

0

-2

-4

-6

0

10

15

20

x

Starting with some pairs of clusters having three initial centroids, while other have only one.

Tan,Steinbach, Kumar

4/18/2004

31

10 Clusters Example

Iteration 2

8

Iteration 1

8

-2

-2

-4

-4

-6

-6

0

10

15

20

-2

-4

-4

-6

-6

5

10

15

20

15

20

-2

10

x

Iteration

4

x

Iteration

3

15

20

10

Starting with some pairs of clusters having three initial centroids, while other have only one.

Tan,Steinbach, Kumar

4/18/2004

32

z

Multiple runs

Helps,

p , but probability

p

y is not on yyour side

determine initial centroids

z Select more than k initial centroids and then

select among these initial centroids

z

Postprocessing

z Bisecting K-means

z

Tan,Steinbach, Kumar

4/18/2004

33

z

clusters

Several strategies

Choose the point that contributes most to SSE

Choose a point from the cluster with the highest SSE

If there are several empty clusters, the above can be

repeated several times.

Tan,Steinbach, Kumar

4/18/2004

34

z

updated after all points are assigned to a centroid

each assignment (incremental approach)

More expensive

Introduces an order dependency

N

Never

gett an empty

t cluster

l t

Can use weights to change the impact

Tan,Steinbach, Kumar

4/18/2004

35

z

Pre-processing

Normalize the data

Eliminate outliers

Post processing

Post-processing

Eliminate small clusters that may represent outliers

Split loose

loose clusters,

clusters ii.e.,

e clusters with relatively high

SSE

Merge

g clusters that are close and that have relatively

y

low SSE

Can use these steps during the clustering process

ISODATA

SO

Tan,Steinbach, Kumar

4/18/2004

36

Bisecting K-means

z

hierarchical clustering

Tan,Steinbach, Kumar

4/18/2004

37

Tan,Steinbach, Kumar

4/18/2004

38

Limitations of K-means

z

differing

Sizes

Densities

Non-globular shapes

outliers.

Tan,Steinbach, Kumar

4/18/2004

39

K-means (3 Clusters)

Original Points

Tan,Steinbach, Kumar

4/18/2004

40

K-means (3 Clusters)

Original Points

Tan,Steinbach, Kumar

4/18/2004

41

Original Points

Tan,Steinbach, Kumar

K-means (2 Clusters)

4/18/2004

42

Original Points

K-means Clusters

y clusters.

Find parts of clusters, but need to put together.

Tan,Steinbach, Kumar

4/18/2004

43

Original Points

Tan,Steinbach, Kumar

K-means Clusters

4/18/2004

44

Original Points

Tan,Steinbach, Kumar

K-means Clusters

4/18/2004

45

- SEGMENTATION USING ‘NEW’ TEXTURE FEATUREUploaded byAnonymous IlrQK9Hu
- ClusteringUploaded byPhạmTrungKiên
- LTE.pdfUploaded byalinir100
- FINDING SIMILARITY BETWEEN RAGAS USING K-MEANS CLUSTERINGUploaded byJournal 4 Research
- Lecture 10 Clustering BasicUploaded byRifki Maulana
- Fuzzy Clustering as an Intrusion Detection TechniqueUploaded byeditor9891
- e1071Uploaded byLuis Campos Airola
- 3 Cluster AnalysisUploaded byRizal Espe Setya Perdana
- On the Use of Side Information for MiningUploaded bykpramyait
- Image Segment at i OnUploaded byRam Choudhary
- reference 2.pdfUploaded byPolisetty Geetha Manohari
- M.TechUploaded byAli
- A Survey on: Hyper Spectral Image Segmentation and Classification Using FODPSOUploaded byEditor IJRITCC
- A Review of Network Traffic Analysis and Prediction TechniquesUploaded byPatrick Gelard
- kmeans1Uploaded byAnibal
- Wireless Fingerprinting Wisec08Uploaded byRishabh Goyal
- 285 Project PaperUploaded byΒασίλης Πλιάσσας
- Increasing Information Shareability by Using NTBS Clustering Approach for VANETUploaded byAnonymous vQrJlEN
- Sample Data Mining Project PaperUploaded byAdisu Wagaw
- 1-s2.0-S1569190X16302350-mainUploaded byRajmeet Singh
- Performing Cluster AnalysisUploaded byTamanna Bavishi Shah
- reportUploaded byleo2004
- SAS analyst/programmer/modelerUploaded byapi-79290812
- Dashboard s asssssssssssssssssss sssssssssssssssssssssssssssss aasssssssssssssUploaded byBala Sri
- Jurnal Inter MatematikaUploaded byPatrice Ester Irala-Paruntu
- Dwm CourseUploaded bynaresh.kosuri6966
- High Accurate Internet TrafficUploaded byAuliya Burhanuddin
- IRJET-Facility Location Decisions in the Automobile IndustryUploaded byIRJET Journal
- Jiajia Sun_2011_SBGf_Geophysical Inversion Constrained by PetrophysicsUploaded bykeoley
- Example Process Chain - OpenForisToolkit WikiUploaded byagonas

- Chap009 Test Bank(1) SolutionUploaded byMax Somerton
- Employee Benefits IndiaUploaded bybaskarbaju1
- HouseUploaded bybaskarbaju1
- dwdm u-2Uploaded bybaskarbaju1
- PS3aUploaded byShrey Budhiraja
- Gmail - Interview Call Letter With Capgemini for F2F on 25th June 2016- Oracle Apps TechnicalUploaded bybaskarbaju1
- Real Time ScenarioesUploaded bybaskarbaju1
- SCF QPUploaded bybaskarbaju1
- cn-unit5Uploaded bybaskarbaju1
- APSRTC Official Website for Online Bus Ticket Booking - APSRTConlineUploaded bybaskarbaju1
- Interview Questions Baseline - Latest Version - CopyUploaded bybaskarbaju1
- APSRTC Official Website for Online Bus Ticket Booking - APSRTConlineUploaded bybaskarbaju1
- jntuk r10 syllabusUploaded bysubramanyam62
- Syllabus 07 08 It III II Computer NetworksUploaded bybaskarbaju1
- Guidelines to Upload Documents - NEw SVSUploaded byYash Kalyani
- IPF KakinadaUploaded bybaskarbaju1
- h10719 Isilon Onefs Technical Overview WpUploaded bybaskarbaju1
- TestimonialsUploaded bybaskarbaju1
- AssignmentUploaded bybaskarbaju1
- Control mUploaded bybaskarbaju1
- MBA(EE) 14-15 Project ScheduleUploaded bybaskarbaju1
- Medical Fitness Test – FAQ’sUploaded bybaskarbaju1
- New DocUploaded bybaskarbaju1
- Intra Day MarginsUploaded bybaskarbaju1
- 15472Uploaded bybaskarbaju1
- Security Analysis_Investment ManagementUploaded bybaskarbaju1
- FM Prob Post Mid Term EMBA SAPMUploaded bybaskarbaju1
- 5Uploaded bybaskarbaju1
- Nivesh Portfolio Tracker_October 31, 2014Uploaded bybaskarbaju1

- Introduction to HCIUploaded byCHaz Lina
- lesson Plan Banat (3).docxUploaded byBenjie Fiel Andriano
- MISVC Software User GuideUploaded bynbr67sce
- CSS Hands OnUploaded bymonisaanth t r
- pythonUploaded bysrikanthpc
- 7 Stages of a Typical CFD SimulationUploaded byHi-Tech CFD
- Comparision of Apple and Q mobileUploaded bySyed Hashim Shah Hashmi
- Optimal Client Server Assignment for Internet Distributed Systems DocxUploaded bybhagathnagar
- UTA 2011 AUVSI SUAS Journal Paper_full MicropilotUploaded byCristhian Murcia
- Atomic Bomb Tutorial En blenderUploaded byfiven
- FRDMKE06UGUploaded byBOLFRA
- Acer Al1916wpUploaded byDragos-Neculai Terlita-Rautchi
- FortiGate_IPS_Guide_01_30005_0080_20070725Uploaded bywci98
- SQL for Testing ProfessionalUploaded byAnshuman Singh
- Biopython Tutorial and CookbookUploaded byninagika
- Fractal Image Compression - A Review.Uploaded byIJAR Journal
- MemTest86_User_Guide_UEFI.pdfUploaded byPablo Cerda
- QuizUploaded bycicroy
- Algorithms for Two-dimensional Bin Packing and Assignment Problems - Lodi (1)Uploaded bytrivilintarolas
- Omron PLC StudyUploaded byferoztechit
- List of IEC Standards - Wikipedia, The Free EncyclopediaUploaded byAnonymous AvwRPJ
- Applying Dijkstra Algorithm for Solving Neutrosophic Shortest Path ProblemUploaded byAnonymous 0U9j6BLllB
- dragonjeen python-noteUploaded byapi-295680726
- ABB Various Releases .pdfUploaded byabhi_26t
- atomicwallet-whitepaperUploaded bykawsaya
- 42-Queues - Laravel - The PHP Framework for Web ArtisansUploaded byLabel Excellence
- vi editor tutorialUploaded byNurcahyo Teguh Permadi
- va-deployment-guide.pdfUploaded byVlad Alexandru Ionut
- The Many Ways of Programming an ARM Cortex-M MicrocontrollerUploaded byfazrez
- Ethical Hacking - The 5 Phases Every Hacker Must Follow2Uploaded bygreenostrich