Coresets for Robust Clustering via Black-box Reductions to Vanilla Case

Published 11 Feb 2025 in cs.DS | (2502.07669v1)

Abstract: We devise $\epsilon$-coresets for robust $(k,z)$-Clustering with $m$ outliers through black-box reductions to the vanilla case. Given an $\epsilon$-coreset construction for vanilla clustering with size $N$, we construct coresets of size $N\cdot \mathrm{poly}\log(km\epsilon^{-1}) + O_z\left(\min\{km\epsilon^{-1}, m\epsilon^{-2z}\log^z(km\epsilon^{-1})\}\right)$ for various metric spaces, where $O_z$ hides $2^{O(z\log z)}$ factors. This increases the size of the vanilla coreset by a small multiplicative factor of $\mathrm{poly}\log(km\epsilon^{-1})$, and the additive term is within a $(\epsilon^{-1}\log (km))^{O(z)}$ factor of the size of the optimal robust coreset. Plugging in the vanilla coreset results of [Cohen-Addad et al., STOC'21], we obtain the first coresets for $(k,z)$-Clustering with $m$ outliers with size near-linear in $k$, while previous results have size at least $\Omega(k^2)$ [Huang et al., ICLR'23; Huang et al., SODA'25]. Technically, we establish two conditions under which a vanilla coreset is also a robust coreset. The first condition requires the dataset to have a special structure: it can be broken into "dense" parts with bounded diameter. We combine this with a new bounded-diameter decomposition that has only $O_z(km\epsilon^{-1})$ non-dense points to obtain the $O_z(km\epsilon^{-1})$ additive bound. The second condition requires the vanilla coreset to possess an extra size-preserving property. We further give a black-box reduction that turns a vanilla coreset into one satisfying this size-preserving property, leading to the alternative $O_z(m\epsilon^{-2z}\log^{z}(km\epsilon^{-1}))$ additive bound. We also implement our reductions in the dynamic streaming setting and obtain the first streaming algorithms for $k$-Median and $k$-Means with $m$ outliers, using space $\tilde{O}(k+m)\cdot\mathrm{poly}(d\epsilon^{-1}\log\Delta)$ for inputs on the grid $[\Delta]^d$.
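
For orientation, the objective that a robust coreset must approximate can be sketched as follows. This is a minimal illustration under the standard definition of $(k,z)$-Clustering with $m$ outliers (sum of $z$-th powers of distances to the nearest center, after discarding the $m$ most expensive points), assuming unweighted points in Euclidean space. The function name `robust_cost` is hypothetical and does not come from the paper, and the code does not implement the paper's coreset constructions.

```python
import math

def robust_cost(points, centers, z, m):
    """Cost of `centers` on `points` after excluding the m costliest points.

    Assumptions (not from the paper): Euclidean distance, unit weights.
    """
    # z-th power of the distance from each point to its nearest center
    per_point = [min(math.dist(p, c) for c in centers) ** z for p in points]
    # Sort ascending so the m largest costs (the outliers) sit at the end
    per_point.sort()
    keep = max(len(per_point) - m, 0)
    return sum(per_point[:keep])

# Three points on a line, one center at the origin, k-Means-style cost (z = 2)
pts = [(0, 0), (1, 0), (10, 0)]
print(robust_cost(pts, [(0, 0)], z=2, m=0))  # vanilla cost, no outliers removed
print(robust_cost(pts, [(0, 0)], z=2, m=1))  # the far point is treated as an outlier
```

Setting `m=0` recovers the vanilla cost, which is the case the black-box reductions start from.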
