- The paper critiques the elbow criterion by demonstrating its theoretical shortcomings and practical limitations in accurately identifying cluster structures.
- It evaluates alternative methods such as the Variance Ratio Criterion, Dunn Index, Davies-Bouldin Index, BIC, and Gap statistic to enhance clustering validity.
- The study argues for abandoning the elbow heuristic in favor of statistically grounded approaches, urging researchers and educators to adopt better practices in clustering analysis.
Critique of the Elbow Criterion in K-Means Clustering
The paper "Stop using the elbow criterion for k-means" by Erich Schubert provides a critical analysis of the elbow criterion commonly used for determining the number of clusters, k, in k-means clustering. The author argues against the prevalent use of this heuristic due to its significant theoretical shortcomings and offers alternative methods which are better supported by existing clustering literature.
Analysis of K-Means and Its Challenges
K-means clustering, a widely taught and used method, is valued for its simplicity and computational efficiency. The algorithm partitions data points into k clusters so as to minimize the inertia, i.e., the within-cluster sum of squares (SSE). Despite these advantages, selecting an appropriate number of clusters remains a central challenge. The elbow method, which relies on visual inspection of a plot of SSE against k to find the point where the rate of decrease changes sharply, is a common way to address this challenge.
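To make the quantities concrete, here is a minimal, pure-Python sketch of Lloyd's k-means and the SSE value that an elbow plot shows for each k. The helper names (`kmeans`, `sse`) are my own, not from the paper:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm on 2-D points; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment: each point joins its nearest center (squared Euclidean).
        for i, (x, y) in enumerate(points):
            labels[i] = min(range(k),
                            key=lambda c: (x - centers[c][0]) ** 2
                                        + (y - centers[c][1]) ** 2)
        # Update: each center moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return centers, labels

def sse(points, centers, labels):
    """Within-cluster sum of squares: the quantity an elbow plot shows against k."""
    return sum((x - centers[l][0]) ** 2 + (y - centers[l][1]) ** 2
               for (x, y), l in zip(points, labels))
```

Plotting `sse` for k = 1, 2, 3, ... yields the elbow curve. Note that the curve decreases monotonically in k by construction, so some "bend" can always be read off, whether or not the data has cluster structure.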
Critique of the Elbow Method
Schubert's paper highlights that the elbow method has little theoretical support and can easily lead to misleading conclusions. While it may appear effective on artificially well-separated datasets, it produces similar-looking SSE curves even on data with no inherent cluster structure at all, such as uniform or Gaussian noise.
The paper also investigates several attempts to formalize elbow detection algorithmically. These methods, such as measuring curvature or fitting linear models, are themselves heuristics without robust theoretical underpinning, and they are sensitive to axis scaling and to the range of k evaluated. As the paper's experiments demonstrate, these detectors frequently fail to identify the true cluster structure, further questioning their utility.
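One widely used formalization of elbow detection (not a method from the paper itself, but an illustrative heuristic of the kind it critiques) picks the k whose SSE point lies farthest from the straight chord joining the endpoints of the curve:

```python
def elbow_point(ks, sses):
    """Detected "elbow": the k whose SSE point is farthest from the straight
    chord joining the first and last points of the curve."""
    x1, y1 = ks[0], sses[0]
    x2, y2 = ks[-1], sses[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = (dx * dx + dy * dy) ** 0.5
    def dist(i):
        # Perpendicular distance from point i to the chord.
        return abs(dy * (ks[i] - x1) - dx * (sses[i] - y1)) / norm
    return ks[max(range(len(ks)), key=dist)]
```

This illustrates the range sensitivity noted above: on the SSE curve 100, 30, 10, 9, 8 over k = 1..5 this detector picks k = 2, yet extending the same curve with a slowly decaying tail out to k = 10 shifts the pick to k = 3, even though nothing about the underlying data changed.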
Alternative Methods
Rather than relying on the elbow criterion, the paper recommends various established alternatives for selecting k:
- Variance-based Criteria: The Variance Ratio Criterion (VRC), which compares the between-cluster variance with the within-cluster variance, offers a more statistically grounded solution.
- Distance-based Criteria: Techniques like the Dunn Index and the Davies-Bouldin Index evaluate separation between clusters relative to their internal compactness, providing a more informed perspective on cluster quality.
- Information-theoretic Criteria: The use of Bayesian Information Criterion (BIC) integrates model complexity with fit quality, favoring models that explain data more succinctly.
- Simulation-based Approaches: The Gap statistic compares SSE against what would be expected under a null distribution, helping to assess the significance of clustering results.
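As an example of the first alternative, the Variance Ratio Criterion (also known as the Calinski-Harabasz index) can be computed in a few lines; this is a minimal sketch in pure Python, with the function name `vrc` my own choice:

```python
def vrc(points, labels):
    """Variance Ratio Criterion (Calinski-Harabasz index): between-cluster
    dispersion over within-cluster dispersion, each normalized by its degrees
    of freedom. Higher is better; choose the k that maximizes it."""
    n = len(points)
    clusters = sorted(set(labels))
    k = len(clusters)
    dim = len(points[0])
    # Overall mean of the data.
    mean = tuple(sum(p[d] for p in points) / n for d in range(dim))
    bgss = wgss = 0.0
    for c in clusters:
        members = [p for p, l in zip(points, labels) if l == c]
        centroid = tuple(sum(p[d] for p in members) / len(members)
                         for d in range(dim))
        # Between-cluster term: cluster size times squared centroid offset.
        bgss += len(members) * sum((centroid[d] - mean[d]) ** 2
                                   for d in range(dim))
        # Within-cluster term: squared distances of members to their centroid.
        wgss += sum(sum((p[d] - centroid[d]) ** 2 for d in range(dim))
                    for p in members)
    return (bgss / (k - 1)) / (wgss / (n - k))
```

A labeling that matches well-separated groups scores far higher than one that mixes them, which is exactly what makes the criterion usable for selecting k: compute it for each candidate clustering and keep the maximum.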
Implications and Future Directions
The paper emphasizes revisiting classical statistical measures that have fallen out of favor, rather than relying on heuristics. The Variance Ratio Criterion, the Bayesian Information Criterion, and the Gap statistic are particularly recommended, given their robustness across diverse data scenarios.
Furthermore, Schubert suggests that educators and researchers should stop teaching or using the elbow method without highlighting its pitfalls. The paper also serves as a call for critical evaluation of the whole clustering workflow, including data preprocessing and the choice of clustering technique, as key determinants of a successful application.
Conclusion
In conclusion, Schubert's paper makes a compelling case for abandoning the elbow criterion in favor of more statistically sound methods. The advocated approaches yield better choices of k and more interpretable results, grounding cluster evaluation in theory rather than visual intuition. As data analysis becomes ever more pervasive, such rigor is essential for accurate and trustworthy data-driven decisions.