Ward's Hierarchical Clustering Method: Clustering Criterion and Agglomerative Algorithm (1111.6285v2)
Abstract: The Ward error sum of squares hierarchical clustering method has been very widely used since its first description by Ward in a 1963 publication. It has also been generalized in various ways. However there are different interpretations in the literature and there are different implementations of the Ward agglomerative algorithm in commonly used software systems, including differing expressions of the agglomerative criterion. Our survey work and case studies will be useful for all those involved in developing software for data analysis using Ward's hierarchical clustering method.
Summary
- The paper clarifies critical differences in dissimilarity input treatment and dendrogram scaling that affect Ward’s clustering outcomes.
- It employs detailed case studies to distinguish algorithmic variations between Ward1 and Ward2 implementations across software systems.
- The findings guide researchers in selecting and verifying hierarchical clustering parameters for consistent and accurate data analysis.
Ward's Hierarchical Clustering Method: Clustering Criterion and Agglomerative Algorithm
The paper "Ward's Hierarchical Clustering Method: Clustering Criterion and Agglomerative Algorithm" by Fionn Murtagh and Pierre Legendre offers an in-depth review and analysis of Ward's error sum of squares hierarchical clustering method. This method, originally described by Ward in 1963, has seen extensive use and numerous generalizations. Murtagh and Legendre address the complexities and variances in its implementation across different software systems, aiming to clarify these distinctions for developers and researchers in data analysis.
Key Issues in Ward's Method Implementations
The paper identifies several critical factors contributing to the confusion surrounding Ward's method:
- Input Dissimilarities: Whether these inputs should be squared before processing.
- Output Dendrogram Heights: Whether or not to take the square root of these heights.
- Algorithmic Differences: Variations in the loop structure of the stepwise dissimilarity-based agglomerative algorithm.
The authors stress the importance of recognizing these distinctions and urge users to verify the specifics of their chosen software implementations.
Definitions and Concepts
The paper provides a detailed overview of various related terms and concepts:
- Distance and Dissimilarity: A distance must satisfy the triangle inequality; a dissimilarity need not.
- Ultrametric Distance: The distance induced on the observations by a hierarchical clustering (dendrogram); it satisfies the strong triangle inequality d(i, k) ≤ max(d(i, j), d(j, k)).
- Error Sum of Squares and Variance: The quantities in terms of which Ward's clustering criterion is defined; their exact form influences the clustering outcomes.
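The error sum of squares criterion can be made concrete with a small sketch (NumPy; the helper name `ess` is illustrative, not from the paper):

```python
import numpy as np

# Error sum of squares (ESS) of a cluster: the sum of squared
# Euclidean distances of its points to the cluster centroid.
def ess(X):
    centroid = X.mean(axis=0)
    return ((X - centroid) ** 2).sum()

# Ward's criterion merges the pair of clusters whose union
# increases the total ESS the least.
A = np.array([[0.0, 0.0], [0.0, 2.0]])
B = np.array([[5.0, 0.0], [5.0, 2.0]])

increase = ess(np.vstack([A, B])) - ess(A) - ess(B)
print(increase)  # 25.0: the cost of merging A and B
```

The merge cost also equals n_A n_B / (n_A + n_B) times the squared distance between the two centroids, here 1 × 25 = 25, which is the form usually plugged into agglomerative implementations.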
Variance in Clustering Criteria
Ward's method can be expressed with different clustering criteria, depending on whether the population variance or sample variance is utilized. The paper emphasizes understanding these distinctions as they impact the implementation outcomes.
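The population/sample distinction is a one-line difference in code, but criteria written in terms of "variance" inherit whichever convention is chosen:

```python
import numpy as np

# "Variance" may mean the sum of squares divided by n (population)
# or by n - 1 (sample); implementations of variance-based criteria
# can differ in exactly this choice.
x = np.array([1.0, 2.0, 3.0, 4.0])

pop_var = np.var(x)           # divides by n      -> 1.25
samp_var = np.var(x, ddof=1)  # divides by n - 1  -> ~1.667
print(pop_var, samp_var)
```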
Lance-Williams Dissimilarity Update Formula
The Lance-Williams recurrence formula updates the dissimilarities between a newly formed cluster and every remaining cluster after each agglomeration. Particular choices of its coefficients recover the various variance- and minimum-variance-based criteria, which is fundamental to understanding how Ward's method is implemented.
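For Ward's method the recurrence takes the form d(i∪j, k) = α_i d(i,k) + α_j d(j,k) + β d(i,j) with size-dependent coefficients (the γ term of the general formula is zero). A minimal sketch, with `ward_update` as an illustrative helper name:

```python
# Lance-Williams update for Ward's method:
#   d(i U j, k) = a_i*d(i,k) + a_j*d(j,k) + b*d(i,j)
# where n_i, n_j, n_k are the sizes of clusters i, j, k.
def ward_update(d_ik, d_jk, d_ij, n_i, n_j, n_k):
    n = n_i + n_j + n_k
    a_i = (n_i + n_k) / n
    a_j = (n_j + n_k) / n
    b = -n_k / n
    return a_i * d_ik + a_j * d_jk + b * d_ij

# For three singletons: (2*1 + 2*2 - 3) / 3 = 1.0
print(ward_update(1.0, 2.0, 3.0, 1, 1, 1))
```

Whether this update is applied to squared or unsquared input dissimilarities is precisely what separates the implementations compared below.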
Implementation Comparisons
The authors compare the Ward1 and Ward2 implementations:
- Ward1 (e.g. hclust method "ward.D") implements Ward's criterion only when the input dissimilarities are squared Euclidean distances; its dendrogram heights are then on the squared scale.
- Ward2 (e.g. method "ward.D2", or agnes with method "ward") takes the unsquared Euclidean distances as input, squaring and un-squaring within its update, so its heights are on the original distance scale.
Despite these differences, both implementations aim to minimize the increase in within-cluster variance at each agglomeration, reflecting the core of Ward's method.
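The scale of the reported heights is easy to probe. SciPy's `'ward'` linkage behaves like a Ward2-style implementation (an observation of mine about this library, not a claim from the paper): fed unsquared Euclidean distances, it reports merge heights on the distance scale.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Three points on a line: pairwise distances 3, 7, 10.
X = np.array([[0.0], [3.0], [10.0]])

Z = linkage(pdist(X), method='ward')
print(Z[0, 2])  # first merge height: 3.0 (distance scale, not 9.0)
print(Z[1, 2])  # second: sqrt((2*100 + 2*49 - 9)/3) ~ 9.815
```

A Ward1-style routine fed the squared distances would instead report 9.0 and 289/3 for the same two merges.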
Case Studies
Several experiments comparing different implementations of Ward's method were conducted using R software packages. These experiments highlighted that:
- Experiment 1: Shows that hclust.PL with method = "ward.D2" and agnes with method = "ward" produce identical results.
- Experiment 2: Demonstrates that hclust on squared input is identical to hclust.PL with method = "ward.D" on squared input.
- Experiment 3: Validates that changing the form of the input (squared versus unsquared dissimilarities) changes the clustering outputs significantly.
- Experiment 4: Highlights the equivalence of Ward1 and Ward2 outputs when the input dissimilarities are squared or not squared and the corresponding options are used.
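The spirit of Experiment 4 can be reproduced outside R (the paper's experiments use R packages; this is a Python sketch, with `ward1_heights` as an illustrative naive implementation): a Ward1-style Lance-Williams loop run on squared Euclidean distances yields heights whose square roots match a Ward2-style routine run on the unsquared distances.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform

def ward1_heights(D):
    """Naive 'ward.D'-style agglomeration: repeatedly merge the
    closest pair of clusters and update all dissimilarities with the
    Lance-Williams formula. Returns the sequence of merge heights."""
    D = D.astype(float).copy()
    n = D.shape[0]
    np.fill_diagonal(D, np.inf)
    size = np.ones(n)
    active = np.ones(n, dtype=bool)
    heights = []
    for _ in range(n - 1):
        sub = np.flatnonzero(active)
        block = D[np.ix_(sub, sub)]
        i_, j_ = np.unravel_index(np.argmin(block), block.shape)
        i, j = sub[i_], sub[j_]
        heights.append(D[i, j])
        for k in sub:                      # Lance-Williams Ward update
            if k == i or k == j:
                continue
            t = size[i] + size[j] + size[k]
            D[i, k] = D[k, i] = ((size[i] + size[k]) * D[i, k]
                                 + (size[j] + size[k]) * D[j, k]
                                 - size[k] * D[i, j]) / t
        size[i] += size[j]
        active[j] = False
        D[j, :] = D[:, j] = np.inf
    return np.array(heights)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
d = pdist(X)

h1 = ward1_heights(squareform(d) ** 2)   # Ward1 on squared input
h2 = linkage(d, method='ward')[:, 2]     # Ward2-style (SciPy)

print(np.allclose(np.sqrt(h1), h2))  # True
```

Both height sequences are monotone increasing (Ward's method produces no dendrogram inversions), so they can be compared merge by merge.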
Practical Implications
The paper's findings underscore the importance of careful implementation and parameter selection in hierarchical clustering methods. Users and developers are encouraged to:
- Verify the nature of input dissimilarities (squared or not).
- Understand the updates and transformations applied to these inputs throughout the clustering process.
- Ensure consistency in reporting and interpreting dendrogram heights, especially when developing or utilizing various software packages.
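One cheap consistency check on any dendrogram, whichever package produced it, is that its cophenetic distances must be ultrametric. A sketch using SciPy:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist, squareform

# The cophenetic distances read off a dendrogram must satisfy the
# strong triangle inequality d(i,k) <= max(d(i,j), d(j,k)).
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))

Z = linkage(pdist(X), method='ward')
U = squareform(cophenet(Z))

n = U.shape[0]
ok = all(U[i, k] <= max(U[i, j], U[j, k]) + 1e-9
         for i in range(n) for j in range(n) for k in range(n))
print(ok)  # True
```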
Future Directions
The work suggests potential improvements in the clarity and robustness of software implementations. On the theoretical side, exploring how different distance metrics affect clustering outcomes could yield deeper insight into hierarchical clustering methodologies.
Conclusion
By clarifying the differences in Ward's method implementations and their effects on clustering outcomes, Murtagh and Legendre contribute significantly to the field of hierarchical clustering. Their work serves as a valuable guide for researchers and practitioners aiming to ensure consistency and accuracy in their data analysis endeavors.
## References

- Anderberg, M.R., 1973. Cluster Analysis for Applications, Academic Press.
- Batagelj, V., 1988. Generalized Ward and related clustering problems, in H.H. Bock, ed., Classification and Related Methods of Data Analysis, North-Holland, pp. 67--74.
- Benzécri, J.P., 1976. L'Analyse des Données, Tome 1, La Taxinomie, Dunod (2nd edn.; 1973, 1st edn.).
- Cailliez, F., and Pagès, J.-P., 1976. Introduction à l'Analyse des Données, SMASH (Société de Mathématiques Appliquées et Sciences Humaines).
- Fisher, R.A., 1936. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, 179--188.
- Jain, A.K., and Dubes, R.C., 1988. Algorithms for Clustering Data, Prentice-Hall.
- Jambu, M., 1978. Classification Automatique pour l'Analyse des Données. I. Méthodes et Algorithmes, Dunod.
- Jambu, M., 1989. Exploration Informatique et Statistique des Données, Dunod.
- Kaufman, L., and Rousseeuw, P.J., 1990. Finding Groups in Data: An Introduction to Cluster Analysis, Wiley.
- Lance, G.N., and Williams, W.T., 1967. A general theory of classificatory sorting strategies. 1. Hierarchical systems, The Computer Journal, 9(4), 373--380.
- Legendre, P., and Fortin, M.-J., 2010. Comparison of the Mantel test and alternative approaches for detecting complex relationships in the spatial analysis of genetic data. Molecular Ecology Resources, 10, 831--844.
- Legendre, P., 2011. const.clust, R package to compute space-constrained or time-constrained agglomerative clustering. http://www.bio.umontreal.ca/legendre/indexEn.html
- Legendre, P., and Legendre, L., 2012. Numerical Ecology, 3rd edn., Elsevier.
- Le Roux, B., and Rouanet, H., 2004. Geometric Data Analysis: From Correspondence Analysis to Structured Data Analysis, Kluwer.
- Murtagh, F., 1983. A survey of recent advances in hierarchical clustering algorithms, The Computer Journal, 26, 354--359.
- Murtagh, F., 1985. Multidimensional Clustering Algorithms, Physica-Verlag.
- Murtagh, F., 2000. Multivariate data analysis software and resources, http://www.classification-society.org/csna/mda-sw
- Székely, G.J., and Rizzo, M.L., 2005. Hierarchical clustering via joint between-within distances: extending Ward's minimum variance method, Journal of Classification, 22(2), 151--183.
- Ward, J.H., 1963. Hierarchical grouping to optimize an objective function, Journal of the American Statistical Association, 58, 236--244.
- Wishart, D., 1969. An algorithm for hierarchical classifications, Biometrics, 25, 165--170.
- XploRe, 2007. http://fedc.wiwi.hu-berlin.de/xplore/tutorials/xaghtmlnode53.html