Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
121 tokens/sec
GPT-4o
9 tokens/sec
Gemini 2.5 Pro Pro
47 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

K-Splits: Improved K-Means Clustering Algorithm to Automatically Detect the Number of Clusters (2110.04660v2)

Published 9 Oct 2021 in cs.CV, cs.AI, cs.IT, and math.IT

Abstract: This paper introduces k-splits, an improved hierarchical algorithm based on k-means to cluster data without prior knowledge of the number of clusters. K-splits starts from a small number of clusters and uses the most significant data distribution axis to split these clusters incrementally into better fits if needed. Accuracy and speed are two main advantages of the proposed method. We experiment on six synthetic benchmark datasets plus two real-world datasets MNIST and Fashion-MNIST, to prove that our algorithm has excellent accuracy in finding the correct number of clusters under different conditions. We also show that k-splits is faster than similar methods and can even be faster than the standard k-means in lower dimensions. Finally, we suggest using k-splits to uncover the exact position of centroids and then input them as initial points to the k-means algorithm to fine-tune the results.

Citations (4)

Summary

We haven't generated a summary for this paper yet.