Академический Документы
Профессиональный Документы
Культура Документы
Overview
Cluster analysis Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Overview
Like factor analysis, cluster analysis also examines an entire set of interdependent relationships The primary objective of cluster analysis is to classify objects into relatively homogeneous groups based on the variables In essence, cluster analysis is the obverse of factor analysis reducing the number of objects rather than variables Clustering is very useful for dening target markets
Overview Cluster analysis Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Market segmentation
Recognising customers differences is the key to successful marketing It can lead to a closer matching between products and customer needs Consumers may be clustered on the basis of benets sought from the purchase of a product (benet segmentation) Understanding buying behaviour, for example what kind of strategies do car buyers use for buying a car? Identifying new product opportunities clustering brands and products Selecting test markets
Overview Cluster analysis Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Cluster analysis
Overview Cluster analysis Statistics associated with cluster analysis Example Formulate the problem
Cluster analysis is a class of techniques used to classify objects or cases into relatively homogeneous groups called clusters Objects in each cluster tend to be similar to each other and dissimilar to objects in the other clusters
Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Cluster analysis
Both cluster analysis and discriminant analysis are concerned with classication However, discriminant analysis requires prior knowledge of the group membership for each object or case included, to develop the classication rule In contrast, in cluster analysis there is no a priori information about the group or cluster membership for any of the objects Groups or clusters are suggested by the data, not dened a priori
Overview Cluster analysis Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Overview Cluster analysis Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Overview Cluster analysis Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Overview Cluster analysis Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Overview Cluster analysis Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Overview Cluster analysis Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Consumers were asked to express their degree of agreement with the following statements Use 1 = disagree, 7 = agree
1. 2. 3. 4. 5. 6. Shopping is fun Shopping is bad for your budget I combine shopping with eating out I try to get the best buys while shopping I dont care about shopping You can save a lot of money by comparing prices
Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Construction of distances between pairs of objects (subjectively or based on a formal measure) The development of a routine, or algorithm, for forming clusters on the basis of these distances The unit of analysis is either the distance or dissimilarity between objects or the closeness, proximity or similarity
Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Overview Cluster analysis Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Overview Cluster analysis Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Overview Cluster analysis Statistics associated with cluster analysis Example Formulate the problem
ij =
k =1
(xik xjk )2
Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters
For p = 2, the Euclidean distance corresponds to the straight line distance between the two points (xi 1 , xi 2 ) and (xj 1 , xj 2 ) A weight wk could be assigned to variable k if it was believed that more importance should be attached to some variables over others giving
p
Interpreting and proling the clusters Assess reliability and validity Example
ij =
k =1
wk (xik xjk )2
Overview Cluster analysis Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure
ij =
k =1
|xik xjk |
Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Compared to Euclidean distance, this gives less relative weight to large differences The Chebychev distance is ij = max |xik xjk |
k
Overview Cluster analysis Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Overview Cluster analysis Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Non-hierarchical clustering clusters are formed by adjusting the membership of those clusters existing at any stage in the process by moving individuals in or out
Some use multivariate analysis of variance ideas in the sense that they divide the objects into groups such that the between groups variation is large and the within groups variation is small
Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Overview Cluster analysis Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Overview Cluster analysis Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Overview Cluster analysis Statistics associated with cluster analysis Example Formulate the problem
At each stage the two clusters with the smallest increase in the overall sum of squares within cluster distances are combined The method is designed for the case where the elements of the data matrix are metrical variables since only then is it sensible to compute the sums of squares involved
Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Overview Cluster analysis Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Overview Cluster analysis Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Interpreting and proling clusters involves examining the cluster centroids The centroids enable us to describe each cluster by assigning it a name or label It is often helpful to prole the clusters in terms of variables that were not used for clustering These may include demographic, psychographic, product usage, media usage or other variables
Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Overview Cluster analysis Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Overview Cluster analysis Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
A B C D E
6 7
Overview Cluster analysis Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
We repeat the procedure until all objects are in one cluster More specically, the next most similar pair of objects is Donna and Eve (A, B) C (D, E) (A, B) 7 9 C 6 (D, E)
Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
The agglomeration schedule is: Stage Initial 1 2 3 4 Number of clusters 5 4 3 2 1 Clusters (A) (B) (C) (D) (E) (A, B) (C) (D) (E) (A, B) (C) (D, E) (A, B) (C, D, E) (A, B, C, D, E) Distance level 0 3 5 6 7
Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Overview Cluster analysis Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Overview
The sets of clusters produced by the farthest neighbour method coincide with those from the nearest neighbour method, although the distance levels differ at which the clusters merge Generally, the nearest and farthest neighbour methods give different results, sometimes very different, especially when there are many objects or individuals to be clustered Both methods only depend on the ordinal properties of the distances
Cluster analysis Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Overview Cluster analysis Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example
Overview Cluster analysis Statistics associated with cluster analysis Example Formulate the problem Select a distance or similarity measure Select a clustering procedure Decide on the number of clusters Interpreting and proling the clusters Assess reliability and validity Example