
Predictive Analytics and Data Mining

Segmentation using Clustering


Automatic Cluster Detection

Data mining techniques are used to find patterns in data

• Patterns are not always easy to identify
  - No observable pattern
  - Too many patterns

Decomposition (breaking the data down into smaller pieces) can help

Automatic Cluster Detection is useful for finding “better behaved” clusters
of data within a larger dataset
Automatic Cluster Detection

The K-means clustering algorithm depends on a geometric interpretation of
the data

Other automatic cluster detection (ACD) algorithms include:
• Gaussian mixture models
• Agglomerative clustering
• Divisive clustering
• Self-organizing maps (SOM) – neural nets

ACD is a tool
• No preclassified training data set
• No distinction between independent and dependent variables
• Marketing clusters are referred to as “segments”
• Customer segmentation is a popular application of clustering

ACD is rarely used in isolation – other methods follow up


Segmentation

Organizing customers into groups with similar traits, product
preferences, or expectations
• Demographic characteristics
• Psychographics (interests, attitudes, opinions, personality, values,
lifestyles…)
• Desired benefits from products/services
• Past-purchase or product-use behaviors
K-means Clustering

“K-means” – circa 1967 – this algorithm looks for a fixed number of
clusters, which are defined in terms of the proximity of data points to
each other

How K-means works (see next slide figures):
• The algorithm selects K data points at random as initial cluster seeds
• Each of the remaining data points is assigned to one of the K clusters
• The mean of the cases in each cluster is calculated, and each cluster
seed is moved to the mean of its cluster
• Cases closest to the new seed i are reassigned as belonging to cluster
i; the assign-and-move steps repeat until the seeds stop moving
• Euclidean distance between two points (u1, v1) and (u2, v2) is
sqrt((u1 − u2)² + (v1 − v2)²)
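The steps above can be sketched in a few lines of plain Python. The data points and the choice of initial seeds (indices 0 and 3, fixed for reproducibility instead of the random selection the slide describes) are invented for illustration, and the sketch assumes no cluster ever goes empty.

```python
import math

def euclidean(p, q):
    # sqrt((u1 - u2)^2 + (v1 - v2)^2), as on the slide
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def k_means(points, seeds, max_iter=100):
    centroids = list(seeds)
    for _ in range(max_iter):
        # Assign each point to the nearest cluster seed
        labels = [min(range(len(centroids)),
                      key=lambda i: euclidean(p, centroids[i]))
                  for p in points]
        # Move each seed to the mean of the cases in its cluster
        new_centroids = []
        for i in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == i]
            new_centroids.append(
                (sum(m[0] for m in members) / len(members),
                 sum(m[1] for m in members) / len(members)))
        if new_centroids == centroids:  # seeds stopped moving
            break
        centroids = new_centroids
    return labels, centroids

data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = k_means(data, seeds=[data[0], data[3]])
```

With two well-separated groups, the seeds settle on the two group means after a couple of iterations.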
K-means Clustering

(figure slide: the k-means steps illustrated step by step)

K-means Clustering

Resulting clusters describe underlying structure in the data; however,
there is no one right description of that structure
Similarity & Difference

Automatic Cluster Detection is quite simple for a software program to
accomplish – data points and clusters are mapped in space

However, business data is not about points in space but about purchases,
phone calls, airplane trips, car registrations, etc., which have no
obvious connection to the dots in a cluster diagram
Similarity & Difference

Clustering business data requires some notion of natural association –
records in a given cluster are more similar to each other than to those
in another cluster

For DM software, this concept of association must be translated into some
sort of numeric measure of the degree of similarity

The most common approach is to translate data values (e.g., gender, age,
product) into numeric values so they can be treated as points in space

If two points are close in the geometric sense, then they represent
similar data in the database
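A toy illustration of that translation, with invented field names: a categorical field (gender) is one-hot encoded and a numeric field (age) is rescaled, so the records become points whose Euclidean distance reflects similarity. Without the rescaling, the field with the largest raw range would dominate the distance.

```python
import math

def encode(record, genders=("F", "M"), max_age=100):
    # gender -> one-hot dimensions; age -> scaled into [0, 1]
    onehot = tuple(1.0 if record["gender"] == g else 0.0 for g in genders)
    return onehot + (record["age"] / max_age,)

def distance(a, b):
    # Euclidean distance in the encoded space
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = encode({"gender": "F", "age": 30})
b = encode({"gender": "F", "age": 35})  # a similar customer
c = encode({"gender": "M", "age": 70})  # a dissimilar customer
```

Here the similar pair ends up close in the encoded space while the dissimilar pair does not.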
Similarity & Difference

Business variable (field) types:
• Categorical (e.g., mint, cherry, chocolate)
• Ranks (e.g., freshman, sophomore, etc., or valedictorian, salutatorian)
• Intervals (e.g., 56 degrees, 72 degrees, etc.)
• True measures – interval variables measured from a meaningful zero
point
  - Age, weight, height, length, and tenure are good examples
Pattern Discovery

“…the discovery of interesting, unexpected, or valuable structures in
large data sets.”
– David Hand, Professor of Statistics, Imperial College

“If you’ve got terabytes of data, and you’re relying on data mining to
find interesting things in there for you, you’ve lost before you’ve even
begun. You really need people who understand what it is they are looking
for – and what they can do with it once they find it.”
– Herb Edelstein, President of Two Crows Corporation
Inputs (Desirable Characteristics)

Meaningful to the analysis objective

Relatively independent

Limited in number

Have a measurement level of Interval

Have low kurtosis and skewness statistics
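The last check above can be automated. A minimal, pure-Python sketch of moment-based skewness and excess kurtosis follows; the example data is invented, and any cutoff for "low" (e.g., |skewness| > 1 suggesting a transform) is a rule of thumb, not a fixed standard.

```python
def moments(xs):
    # Population skewness m3/m2^1.5 and excess kurtosis m4/m2^2 - 3
    # (both are 0 for a perfect normal distribution).
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]            # skewness 0
right_skewed = [1, 1, 1, 1, 2, 2, 3, 20]  # one large outlier
```

An input like `right_skewed` would fail the screening and is a candidate for a log or similar transformation before clustering.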


What Value of K to Use

Subject matter knowledge (there are most likely five groups)

Convenience (it is convenient to market to 3 or 4 groups)

Constraints (you have 5 products and need 5 segments)

Arbitrarily (always pick 10)

Based on the data (Ward’s method or the elbow criterion)

(Elbow plot – a plot of the ratio of within-cluster variance to
between-cluster variance versus the number of clusters)
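A sketch of the elbow idea on invented one-dimensional data: run a simple k-means for several values of k and record the within-cluster sum of squares (WSS). The deterministic seeding rule below (evenly spaced sorted points) is an illustration choice, not part of the criterion.

```python
def k_means_1d(data, k, iters=50):
    pts = sorted(data)
    # seed with k evenly spaced points from the sorted data
    cents = [pts[i * (len(pts) - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in cents]
        for x in pts:
            groups[min(range(k), key=lambda i: abs(x - cents[i]))].append(x)
        new = [sum(g) / len(g) if g else cents[i] for i, g in enumerate(groups)]
        if new == cents:  # converged
            break
        cents = new
    return groups

def wss(groups):
    # within-cluster sum of squared distances to each cluster mean
    total = 0.0
    for g in groups:
        if g:
            m = sum(g) / len(g)
            total += sum((x - m) ** 2 for x in g)
    return total

data = [1, 2, 3, 20, 21, 22, 40, 41, 42]   # three obvious groups
curve = {k: wss(k_means_1d(data, k)) for k in range(1, 5)}
```

Plotting `curve` would show WSS dropping sharply until k = 3 and barely improving afterward – the "elbow" that suggests three clusters.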
Ward’s Method

An algorithm for hierarchical cluster analysis

In this method each observation starts as its own cluster, and the
clusters are hierarchically joined, at each step minimizing the ratio of
within-cluster to between-cluster variation

Based on a statistical analysis, the number of clusters is selected

This number of clusters is then used for k-means cluster analysis
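A brute-force sketch of Ward-style agglomeration on invented 1-D data: at each step, merge the pair of clusters whose union increases the total within-cluster sum of squared errors (SSE) the least. Real implementations use the Lance-Williams update instead of this O(n³) pair search; this is only to show the criterion.

```python
def sse(cluster):
    # sum of squared differences from the cluster mean
    m = sum(cluster) / len(cluster)
    return sum((x - m) ** 2 for x in cluster)

def ward_merge_order(points):
    clusters = [[p] for p in points]
    merges = []  # (merged cluster, SSE increase) at each step
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                inc = (sse(clusters[i] + clusters[j])
                       - sse(clusters[i]) - sse(clusters[j]))
                if best is None or inc < best[0]:
                    best = (inc, i, j)
        inc, i, j = best
        merged = clusters[i] + clusters[j]
        merges.append((sorted(merged), inc))
        clusters = [c for k, c in enumerate(clusters)
                    if k not in (i, j)] + [merged]
    return merges

merges = ward_merge_order([1, 2, 10, 11])
```

The two tight pairs merge cheaply first; the final merge across the gap costs far more, which is the signal used to pick the number of clusters.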


Ward’s Method in SAS Enterprise Miner

1. Preliminary k-means clustering on the data to save many cluster
centroids (default 50)

2. Ward’s hierarchical clustering on the saved cluster centroids (k, then
k−1, k−2, etc.) to determine the ideal value of k (greater than the
minimum specified in the selection criteria, and with a CCC (cubic
clustering criterion) statistic greater than the threshold specified in
the selection criteria)

3. K-means clustering on the original dataset using the k from step 2


Evaluating Clusters

What does it mean to say that a cluster is “good”?

• Clusters should have members with a high degree of similarity
• The standard way to measure within-cluster similarity is variance* –
the cluster with the lowest variance is considered best
• Cluster size is also important, so an alternate approach is to use the
average variance**

* The sum of the squared differences of each element from the mean
** The total variance divided by the size of the cluster
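The two footnoted measures are one-liners; the example clusters are invented.

```python
def variance(cluster):
    # * sum of squared differences of each element from the cluster mean
    m = sum(cluster) / len(cluster)
    return sum((x - m) ** 2 for x in cluster)

def average_variance(cluster):
    # ** total variance divided by the size of the cluster
    return variance(cluster) / len(cluster)

tight = [10, 10, 11, 11]
loose = [0, 5, 10, 15]
```

The size adjustment matters: a big cluster can have a large raw variance simply because it has many members, while its average variance may still be small.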
Evaluating Clusters

Finally, if detection identifies good clusters along with weak ones, it
can be useful to set the good ones aside (for further study) and run the
analysis again, to see whether improved clusters emerge from only the
weaker ones
Validating Clusters
Interpretation

Goal: obtain meaningful and useful clusters

Caveats:
(1) Random chance can often produce apparent clusters
(2) Different clustering methods produce different results

Solutions:

Obtain summary statistics

Also review clusters in terms of variables not used in the clustering

Label the clusters (e.g., clustering financial firms in 2008 might yield
a label like “midsize, sub-prime loser”)
Desirable Cluster Features

Stability – are clusters and cluster assignments sensitive to slight
changes in the inputs? Are cluster assignments in partition B similar to
those in partition A?

Separation – check the ratio of between-cluster variation to
within-cluster variation (higher is better)
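The separation check can be sketched directly, using invented 1-D clusters: between-cluster variation is the size-weighted spread of the cluster means around the grand mean, and within-cluster variation is the total SSE inside the clusters.

```python
def sse(xs, center):
    return sum((x - center) ** 2 for x in xs)

def separation_ratio(clusters):
    # between-cluster variation / within-cluster variation (higher is better)
    means = [sum(c) / len(c) for c in clusters]
    all_pts = [x for c in clusters for x in c]
    grand = sum(all_pts) / len(all_pts)
    within = sum(sse(c, m) for c, m in zip(clusters, means))
    between = sum(len(c) * (m - grand) ** 2
                  for c, m in zip(clusters, means))
    return between / within   # assumes within > 0

well_separated = [[1, 2, 3], [20, 21, 22]]
overlapping = [[1, 5, 9], [4, 8, 12]]
```

The well-separated partition scores orders of magnitude higher than the overlapping one, matching the "higher is better" rule on the slide.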
CLUSTERING USING SAS ENTERPRISE MINER
Grocery Store Case Study

Analysis goal:
Where should you open new grocery store locations?
Group geographic regions into segments based on income, household size,
and population density.

Analysis plan:
• Select and transform segmentation inputs.
• Select the number of segments to create.
• Create segments with the Cluster tool.
• Interpret the segments.

