
Sensitivity Sampling for $k$-Means: Worst Case and Stability Optimal Coreset Bounds (2405.01339v1)

Published 2 May 2024 in cs.DS

Abstract: Coresets are arguably the most popular compression paradigm for center-based clustering objectives such as $k$-means. Given a point set $P$, a coreset $\Omega$ is a small, weighted summary that preserves the cost of all candidate solutions $S$ up to a $(1\pm \varepsilon)$ factor. For $k$-means in $d$-dimensional Euclidean space the cost for solution $S$ is $\sum_{p\in P}\min_{s\in S}\|p-s\|^2$. A very popular method for coreset construction, both in theory and practice, is Sensitivity Sampling, where points are sampled in proportion to their importance. We show that Sensitivity Sampling yields optimal coresets of size $\tilde{O}(k/\varepsilon^2\cdot\min(\sqrt{k},\varepsilon^{-2}))$ for worst-case instances. Uniquely among all known coreset algorithms, for well-clusterable data sets with $\Omega(1)$ cost stability, Sensitivity Sampling gives coresets of size $\tilde{O}(k/\varepsilon^2)$, improving over the worst-case lower bound. Notably, Sensitivity Sampling does not have to know the cost stability in order to exploit it: It is appropriately sensitive to the clusterability of the data set while being oblivious to it. We also show that any coreset for stable instances consisting of only input points must have size $\Omega(k/\varepsilon^2)$. Our results for Sensitivity Sampling also extend to the $k$-median problem, and more general metric spaces.
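To make the sampling step concrete, below is a minimal Python sketch of the generic Sensitivity Sampling scheme the abstract refers to: points are sampled in proportion to a sensitivity upper bound and reweighted by the inverse sampling probability. The function name, the particular sensitivity bound, and the use of a precomputed approximate solution are illustrative assumptions, not the paper's exact construction or its guarantees.

```python
import numpy as np

def sensitivity_sampling_coreset(P, centers, m, rng=None):
    """Illustrative sketch of Sensitivity Sampling for k-means coresets.

    P       : (n, d) array of input points
    centers : (k, d) array from an approximate k-means solution
    m       : target coreset size
    Returns a subsample of P together with importance weights.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(P)

    # Squared distance of each point to its nearest approximate center.
    dist2 = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    labels = dist2.argmin(axis=1)
    d2 = dist2[np.arange(n), labels]
    total_cost = d2.sum()

    # One common sensitivity upper bound (constants omitted): the point's
    # share of the total cost plus its share of its own cluster's mass.
    cluster_sizes = np.bincount(labels, minlength=len(centers))
    sens = d2 / total_cost + 1.0 / cluster_sizes[labels]

    # Sample m points proportionally to sensitivity; reweight so that
    # clustering costs are preserved in expectation.
    probs = sens / sens.sum()
    idx = rng.choice(n, size=m, replace=True, p=probs)
    weights = 1.0 / (m * probs[idx])
    return P[idx], weights
```

In the regimes discussed in the abstract, the target size `m` would be taken on the order of the stated bounds, e.g. $\tilde{O}(k/\varepsilon^2)$ on cost-stable instances.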

Authors (5)
  1. Nikhil Bansal (61 papers)
  2. Vincent Cohen-Addad (88 papers)
  3. Milind Prabhu (4 papers)
  4. David Saulpic (21 papers)
  5. Chris Schwiegelshohn (35 papers)
Citations (3)
