Fast and Accurate $k$-means++ via Rejection Sampling (2012.11891v1)

Published 22 Dec 2020 in cs.LG and cs.DS

Abstract: $k$-means++ \cite{arthur2007k} is a widely used clustering algorithm that is easy to implement, has nice theoretical guarantees, and shows strong empirical performance. Despite its wide adoption, $k$-means++ can be slow on large datasets, so a natural question is whether more efficient algorithms with similar guarantees can be obtained. In this paper, we present a near-linear-time algorithm for $k$-means++ seeding. Interestingly, our algorithm obtains the same theoretical guarantees as $k$-means++ and significantly improves on earlier results on fast $k$-means++ seeding. Moreover, we show empirically that our algorithm is significantly faster than $k$-means++ and obtains solutions of equivalent quality.
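
For context, the standard $k$-means++ seeding step that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration of classic $D^2$ sampling only, not the rejection-sampling algorithm the paper proposes; the names `sq_dist` and `kmeanspp_seed` are hypothetical and chosen just for this example. Each round scans all $n$ points, so selecting $k$ seeds costs roughly $O(ndk)$ time in $d$ dimensions, which is the cost the paper aims to reduce.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Pick k seeds by classic D^2 sampling (baseline k-means++ seeding).

    Illustrative sketch only; this is NOT the paper's faster algorithm.
    """
    if rng is None:
        rng = random.Random(0)
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # First seed: chosen uniformly at random.
    centers = [points[rng.randrange(len(points))]]
    # d2[i] = squared distance from points[i] to its nearest chosen seed.
    d2 = [sq_dist(p, centers[0]) for p in points]

    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a seed already; fall back to uniform.
            idx = rng.randrange(len(points))
        else:
            # Sample a point with probability proportional to d2.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[idx])
        # Full pass over the data to refresh nearest-seed distances.
        d2 = [min(old, sq_dist(p, points[idx])) for p, old in zip(points, d2)]
    return centers
```

With the points `[(0.0, 0.0), (0.0, 0.0), (10.0, 10.0)]` and `k = 2`, the second seed always lands in the cluster that the first seed missed, because points that coincide with an existing seed have sampling weight zero.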

Authors (5)
  1. Vincent Cohen-Addad (88 papers)
  2. Silvio Lattanzi (47 papers)
  3. Ashkan Norouzi-Fard (24 papers)
  4. Christian Sohler (27 papers)
  5. Ola Svensson (55 papers)
Citations (17)
