
WARD'S CLUSTERING ALGORITHM

Ward's clustering algorithm is a popular procedure within the set of algorithms called agglomerative hierarchical methods. These methods apply a sorting strategy to produce a hierarchical or tree-like structure among n objects. Starting with n clusters, where each object is a cluster, an agglomerative hierarchical method proceeds in a stagewise manner to reduce the number of clusters one at a time until all n objects are in one cluster. See also HIERARCHICAL CLUSTER ANALYSIS.

WARD'S ALGORITHM

The following steps describe the usual implementation of an (agglomerative) hierarchical cluster analysis (a sketch in code follows this description):

1. Define a triangular matrix that shows a measure of similarity or proximity between each pair of the n objects (see PROXIMITY DATA). This matrix has n(n - 1)/2 entries and is often constructed by computing proximity measures, such as Euclidean distances or correlations, from an original n x m data matrix of n objects and m attributes or variables.

2. Search the proximity matrix for the most similar pair of clusters and join these two clusters. The proximity value between the two merged clusters is called the criterion or objective-function value for stage k, z_k.

3. Update the proximity matrix by recomputing proximity values between the new cluster and all other clusters. The new proximity matrix has one less row (or column) than the preceding proximity matrix.

4. Repeat steps 2 and 3 until all objects reside in one cluster.

The result is a tree-like structure that shows which two clusters were merged at each stage k, k = 1, ..., n - 1, and the corresponding criterion value z_k for each stage, where stage k corresponds to n - k clusters.
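The following sketch illustrates the four steps in Python under simplifying assumptions: squared Euclidean distance for the proximity matrix in step 1 and a single-linkage (nearest-neighbour) update in step 3. The update rule is deliberately a placeholder, since it is precisely what distinguishes the hierarchical methods discussed below.

```python
import numpy as np

def agglomerate(X):
    """Merge history [(cluster_i, cluster_j, z_k), ...] for stages k = 1..n-1."""
    n = X.shape[0]
    # Step 1: proximity matrix of squared Euclidean distances.
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2).astype(float)
    np.fill_diagonal(D, np.inf)
    alive = np.ones(n, dtype=bool)         # which rows still label a cluster
    merges = []
    for _ in range(n - 1):
        # Step 2: most similar live pair; its proximity is the criterion z_k.
        masked = np.where(alive[:, None] & alive[None, :], D, np.inf)
        i, j = divmod(int(np.argmin(masked)), n)
        merges.append((i, j, D[i, j]))
        # Step 3: recompute proximities to the merged cluster (single-linkage
        # rule used here for simplicity; Ward's rule is different).
        D[i, :] = np.minimum(D[i, :], D[j, :])
        D[:, i] = D[i, :]
        D[i, i] = np.inf
        alive[j] = False                   # the matrix loses one row/column
    # Step 4 is the loop itself: it runs until one cluster remains.
    return merges
```

Running agglomerate on an n x m data array returns the n - 1 criterion values z_k referred to throughout this entry.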

Differences among hierarchical methods center primarily on two procedural steps: the definition of the most similar pair of clusters (step 2) and the method of updating similarity measures from one stage to the next (step 3). Lance and Williams [4] developed a generalized transformation model that elegantly defines the measure of proximity in step 3 for six popular hierarchical models. Later, this transformation model was extended by Wishart [12] to include Ward's method, which is sometimes called the minimum variance or Ward's error sum of squares method. Interestingly, the original article by Ward [11] described a generalized hierarchical method similar to, but no less general than, the four-step description above. In particular, Ward specified that the loss from joining two groups (i.e., the criterion value in step 2) is best expressed by whatever objective function makes sense to the investigator, and he then described various objective functions that he had used in his research for the Air Force [11, p. 237]. Indeed, in a subsequent communication, Ward's own preference for naming his model was the MAXOF (MAXimize an Objective Function) clustering model. In his numerical example, Ward used the sum of squared deviations about the group mean, or error sum of squares, which in multidimensional Euclidean space is defined for cluster c as
$$
\mathrm{ESS}_c = \sum_{j=1}^{m} \sum_{i=1}^{n_c} \left( x_{cij} - \bar{x}_{cj} \right)^2, \qquad (1)
$$
where $m$ is the number of attributes; $n_c$ is the number of objects in cluster $c$; $x_{cij}$ is the measure (raw, standardized, etc.) of attribute $j$ on object $i$ within cluster $c$; and $\bar{x}_{cj}$ is the mean of the $j$th attribute in cluster $c$. The overall error sum of squares objective function at stage $k$ is then given by
$$
\mathrm{ESS}_k = \sum_{c=1}^{n-k} \mathrm{ESS}_c, \qquad (2)
$$
and the loss, or increase in ESS, resulting from the fusion of two clusters at stage $k$ is given by

$$
z_k = \mathrm{ESS}_k - \mathrm{ESS}_{k-1}, \qquad (3)
$$



which defines an error sum of squares or minimum variance criterion for step 3. Subsequently, Wishart [12, p. 167] showed that the criterion in (3) is equivalent to one-half of the squared Euclidean distance between two joined single-object clusters, and he proved that the use of a squared Euclidean distance proximity matrix is functionally equivalent to Ward's ESS example and implementable through the transformation function first described by Lance and Williams. Thus several factors came together to transform Ward's perfectly general algorithm into an algorithm with an exclusive, distance-based, minimum variance focus: Ward's early choice of an ESS example; Wishart's link to Euclidean distance and the Lance and Williams transformation function; the attractive, but not necessarily valid, conceptualization of clusters as swarms in Euclidean space; the closeness of clusters based on the proportionality between the increase in ESS and the squared Euclidean distance separating merged-cluster centroids [1, p. 143]; and subsequent implementations by commercial computer packages. See Ward [11, p. 241] and Anderberg [1, p. 43] for numerical examples of Ward's method based on the ESS criterion. Anderberg [1] and Everitt [3] are excellent sources for descriptions of the various hierarchical algorithms.
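For reference, the transformation model of Lance and Williams [4] updates the proximity between an existing cluster h and the newly merged pair (i, j) as a linear combination of the old proximities; the Ward coefficients below follow standard accounts of the recurrence (stated here for squared Euclidean distances, not quoted from [4] directly):

$$
d_{h(ij)} = \alpha_i d_{hi} + \alpha_j d_{hj} + \beta d_{ij} + \gamma \, \lvert d_{hi} - d_{hj} \rvert,
\qquad
\alpha_i = \frac{n_i + n_h}{n_i + n_j + n_h}, \quad
\alpha_j = \frac{n_j + n_h}{n_i + n_j + n_h}, \quad
\beta = \frac{-n_h}{n_i + n_j + n_h}, \quad
\gamma = 0.
$$

Wishart's equivalence is also easy to verify numerically. The following minimal sketch (assuming only NumPy) computes the ESS of equation (1) directly and checks that the criterion increase in (3) from merging two single-object clusters equals one-half of their squared Euclidean distance:

```python
import numpy as np

def ess(cluster):
    """Error sum of squares of one cluster, as in equation (1)."""
    cluster = np.asarray(cluster, dtype=float)
    return ((cluster - cluster.mean(axis=0)) ** 2).sum()

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=3)    # two single-object clusters
delta_ess = ess([x, y]) - (ess([x]) + ess([y]))  # z_k in equation (3)
assert np.isclose(delta_ess, 0.5 * ((x - y) ** 2).sum())
```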

STOPPING RULES

Users of hierarchical clustering algorithms often wish to determine the best number of clusters. Mojena [7] and Mojena and Wishart [8] proposed and evaluated three statistical rules for this task, based on the behavior of the criterion vector z as a monotonically increasing function. These rules detect a significant increase from z_k to z_{k+1}, if any, which implies an undesirable fusion, and identify the stage with n - k clusters as best. Ward's method, together with a simple upper-tail rule, gave consistently good results across Monte Carlo data sets that conceptualized clusters as compact swarms in Euclidean space. Morey et al. [9] further confirmed successful results with this rule, along with an alternative rule based on an adaptation of Cattell's scree test. Binder [2] has proposed a Bayesian approach to estimating the best number of clusters, but its usefulness is restricted to small problems. See [10] for other procedures.
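A minimal sketch of an upper-tail rule in the spirit of Mojena's first rule follows; the function name and the default threshold constant are illustrative assumptions, not the published specification:

```python
import numpy as np

def upper_tail_rule(z, c=2.75):
    """Suggest a number of clusters from criterion values z_1, ..., z_{n-1}.

    Flags the first fusion whose criterion value exceeds the mean of the
    criterion vector by c standard deviations; c is a tuning constant.
    """
    z = np.asarray(z, dtype=float)
    n = len(z) + 1                      # n objects produce n - 1 fusions
    threshold = z.mean() + c * z.std(ddof=1)
    for k, z_k in enumerate(z, start=1):
        if z_k > threshold:             # first "significant" jump
            return n - k + 1            # best partition precedes stage k
    return 1                            # no jump: all objects in one cluster
```

The criterion vector z produced by the agglomerate sketch above can be fed directly to this rule.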

SOFTWARE PACKAGES

There are many sources of computer software for implementing cluster analyses, but two commercially available packages stand out, both of which include Ward's method: SAS and CLUSTAN (see STATISTICAL SOFTWARE). SAS [10] is a comprehensive system for data analysis that is widely used in universities and other research-oriented groups. It implements seven clustering procedures, including four common hierarchical methods. It uses the n x m multivariate data matrix as input, has procedures for printing dendrograms, and prints reports that include a criterion for estimating the best number of clusters. CLUSTAN [13] is by far the most comprehensive clustering package available. It includes 28 clustering procedures, 10 of which are hierarchical. Input data options include multivariate data matrices and user-defined similarity matrices. Users have a choice of 40 proximity measures, depending on data types (numeric or binary) and user needs. The package also includes relocation routines for improving an initial clustering, various forms of graphical output, stopping-rule procedures from Mojena [7], the ability to read data files previously created through the SPSS software system, and a conversational preprocessor.

EVALUATION

Ward's method figures prominently in the literature that addresses the evaluation of clustering algorithms. The effectiveness of Ward's method as a clustering procedure can be viewed from various perspectives. First, does it give an optimal solution with respect to minimum error sum of squares? Second, just how good is it at recovering cluster structure, i.e., at identifying both the correct number of clusters and the correct membership of objects? Finally, how does it compare to other clustering procedures?


Ward's method is a heuristic rather than an optimization algorithm. As such, it does not ensure that the resulting clustering yields an overall minimum variance solution. Indeed, it would be very surprising if it were to yield an optimal or even near-optimal solution except for trivial data sets. Optimal solutions to the ESS clustering problem have been generated by dynamic and 0-1 integer programming formulations, but these severely limit the size (n) of the problem because of storage and computational constraints. An attractive strategy, suggested by Wishart [13] and implemented by Morey et al. [9], is to generate an initial solution by Ward's method and then systematically reassign objects by using relocation techniques (a sketch of this two-stage strategy follows the numbered list below).

The evaluation literature primarily reports on the recovery performances of clustering techniques and on their comparisons. The literature tends to favor Ward's method, although results are mixed. It would now appear that some tentative conclusions are emerging, based on work by Morey et al. [9] and Milligan et al. [6], and on the thorough review by Milligan [5]:

1. Ward's method performs quite well across a variety of data sets that include Monte Carlo mixtures, ultrametric data, and real data; however, performance can vary widely depending on the selection of clustering parameters, such as proximity measures, and on certain data-set characteristics, such as cluster size and cluster overlap.

2. The ESS focus of Ward's method dictates the use of squared Euclidean distance as a measure of proximity; yet this measure of association may not be warranted for all studies, as it mixes together object associations due to shape, scatter, and height. If only shape is of interest, then correlation-type measures are more appropriate. In this case, an algorithm such as the group average method based on a correlation criterion can give more legitimate clustering results.

3. The extent of cluster overlap affects the performance of various algorithms. Ward's method appears to give the best recovery as overlap increases, but the group average method seems to outperform Ward's method with nonoverlapping structures.

4. Ward's method tends to fuse small clusters and appears to favor the creation of clusters having roughly the same number of observations. The group average method is as good or better when clusters are of unequal size.

5. Ward's method and other hierarchical algorithms are not very robust with respect to various types of error perturbations, such as outliers.

6. Ward's method is sensitive to profile elevation, with a tendency to give distinct, but not necessarily valid, clusters along the principal component of a multivariate distribution. If the elevation component is pervasive, then the solution can be valid, as in the alcohol abuse study by Morey et al. [9].
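The two-stage strategy described before the list can be sketched with standard library routines; pairing SciPy's Ward linkage with a k-means relocation pass is an illustrative substitute for the specific relocation routines of Wishart [13] and Morey et al. [9]:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

def ward_then_relocate(X, k):
    # Stage 1: Ward's method, cut to k clusters (labels are 1..k).
    labels = fcluster(linkage(X, method="ward"), t=k, criterion="maxclust")
    # Stage 2: relocate objects, starting from the Ward cluster centroids.
    centroids = np.array([X[labels == c].mean(axis=0) for c in range(1, k + 1)])
    km = KMeans(n_clusters=k, init=centroids, n_init=1).fit(X)
    return km.labels_  # refined partition with (weakly) lower total ESS
```

Because k-means minimizes the same total ESS criterion, the relocation pass can only preserve or improve the ESS of the Ward partition it starts from.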
REFERENCES

1. Anderberg, M. R. (1973). Cluster Analysis for Applications. Academic Press, New York.
2. Binder, D. A. (1981). Biometrika, 68, 275-285.
3. Everitt, B. S. (1980). Cluster Analysis, 2nd ed. Heinemann, London.
4. Lance, G. N. and Williams, W. T. (1967). Computer J., 9, 373-380.
5. Milligan, G. W. (1981). Multivariate Behav. Res., 16, 379-407.
6. Milligan, G. W., Soon, S. C., and Sokol, L. M. (1983). IEEE Trans. Pattern Anal. Machine Intell., PAMI-5, 40-47.
7. Mojena, R. (1977). Computer J., 20, 359-363.
8. Mojena, R. and Wishart, D. (1980). In COMPSTAT 1980: Proceedings. Physica-Verlag, Vienna, pp. 426-432.
9. Morey, L. C., Blashfield, R. K., and Skinner, H. A. (1983). Multivariate Behav. Res., 18, 309-329.
10. SAS User's Guide: Statistics (1982). SAS Institute, Cary, NC.
11. Ward, J. H. (1963). J. Amer. Statist. Ass., 58, 236-244.
12. Wishart, D. (1969). Biometrics, 25, 165-170.
13. Wishart, D. (1982). CLUSTAN User Manual. Program Library Unit, Edinburgh University, Edinburgh.


See also DENDROGRAMS; HIERARCHICAL CLUSTER ANALYSIS; PROXIMITY DATA; and RECURSIVE PARTITIONING.

RICHARD MOJENA
