Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
97 tokens/sec
GPT-4o
53 tokens/sec
Gemini 2.5 Pro Pro
44 tokens/sec
o3 Pro
5 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Coresets for Clustering in Euclidean Spaces: Importance Sampling is Nearly Optimal (2004.06263v3)

Published 14 Apr 2020 in cs.CG, cs.DC, and cs.DS

Abstract: Given a collection of $n$ points in $\mathbb{R}d$, the goal of the $(k,z)$-clustering problem is to find a subset of $k$ "centers" that minimizes the sum of the $z$-th powers of the Euclidean distance of each point to the closest center. Special cases of the $(k,z)$-clustering problem include the $k$-median and $k$-means problems. Our main result is a unified two-stage importance sampling framework that constructs an $\varepsilon$-coreset for the $(k,z)$-clustering problem. Compared to the results for $(k,z)$-clustering in [Feldman and Langberg, STOC 2011], our framework saves a $\varepsilon2 d$ factor in the coreset size. Compared to the results for $(k,z)$-clustering in [Sohler and Woodruff, FOCS 2018], our framework saves a $\operatorname{poly}(k)$ factor in the coreset size and avoids the $\exp(k/\varepsilon)$ term in the construction time. Specifically, our coreset for $k$-median ($z=1$) has size $\tilde{O}(\varepsilon{-4} k)$ which, when compared to the result in [Sohler and Woodruff, STOC 2018], saves a $k$ factor in the coreset size. Our algorithmic results rely on a new dimensionality reduction technique that connects two well-known shape fitting problems: subspace approximation and clustering, and may be of independent interest. We also provide a size lower bound of $\Omega\left(k\cdot \min \left{2{z/20},d \right}\right)$ for a $0.01$-coreset for $(k,z)$-clustering, which has a linear dependence of size on $k$ and an exponential dependence on $z$ that matches our algorithmic results.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (2)
  1. Lingxiao Huang (39 papers)
  2. Nisheeth K. Vishnoi (73 papers)
Citations (69)

Summary

We haven't generated a summary for this paper yet.