Coresets for Robust Clustering via Black-box Reductions to Vanilla Case

Published 11 Feb 2025 in cs.DS | (2502.07669v1)

Abstract: We devise $\epsilon$-coresets for robust $(k,z)$-Clustering with $m$ outliers through black-box reductions to the vanilla case. Given an $\epsilon$-coreset construction for vanilla clustering with size $N$, we construct coresets of size $N\cdot \mathrm{poly}\log(km\epsilon^{-1}) + O_z\left(\min\{km\epsilon^{-1}, m\epsilon^{-2z}\log^{z}(km\epsilon^{-1})\}\right)$ for various metric spaces, where $O_z$ hides $2^{O(z\log z)}$ factors. This increases the size of the vanilla coreset by a small multiplicative factor of $\mathrm{poly}\log(km\epsilon^{-1})$, and the additive term is within a $(\epsilon^{-1}\log (km))^{O(z)}$ factor of the size of the optimal robust coreset. Plugging in the vanilla coreset results of [Cohen-Addad et al., STOC'21], we obtain the first coresets for $(k,z)$-Clustering with $m$ outliers with size near-linear in $k$, whereas previous results have size at least $\Omega(k^2)$ [Huang et al., ICLR'23; Huang et al., SODA'25]. Technically, we establish two conditions under which a vanilla coreset is also a robust coreset. The first condition requires the dataset to have a special structure: it can be broken into "dense" parts with bounded diameter. We combine this with a new bounded-diameter decomposition that has only $O_z(km \epsilon^{-1})$ non-dense points to obtain the $O_z(km \epsilon^{-1})$ additive bound. The second condition requires the vanilla coreset to possess an extra size-preserving property. We further give a black-box reduction that turns a vanilla coreset into one satisfying this size-preserving property, leading to the alternative $O_z(m\epsilon^{-2z}\log^{z}(km\epsilon^{-1}))$ additive bound. We also implement our reductions in the dynamic streaming setting and obtain the first streaming algorithms for $k$-Median and $k$-Means with $m$ outliers, using space $\tilde{O}(k+m)\cdot\mathrm{poly}(d\epsilon^{-1}\log\Delta)$ for inputs on the grid $[\Delta]^d$.
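
For readers unfamiliar with the objective, the following is a minimal sketch of the robust $(k,z)$-Clustering cost as it is commonly defined for unweighted Euclidean inputs: the sum of $z$-th powers of distances to the nearest center, after discarding the $m$ points farthest from the centers. The abstract does not spell out this definition, so the function name `robust_cost` and the Euclidean, unweighted setting are assumptions made for illustration only; this is not the paper's coreset construction.

```python
import math


def robust_cost(points, centers, m, z):
    """Illustrative robust (k,z)-clustering cost (assumed standard definition).

    Each point pays dist(point, nearest center) ** z, and the m points
    with the largest such distance are treated as outliers and excluded.
    """
    # Distance from every point to its nearest center, in ascending order.
    dists = sorted(min(math.dist(p, c) for c in centers) for p in points)
    # Drop the m largest distances (the outliers); never keep a negative count.
    kept = dists[: max(len(dists) - m, 0)]
    return sum(d ** z for d in kept)


# Hypothetical toy data: one far-away point dominates the vanilla cost.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
centers = [(0.0, 0.0)]

print(robust_cost(points, centers, m=0, z=2))  # vanilla k-Means cost: 101.0
print(robust_cost(points, centers, m=1, z=2))  # one outlier removed: 1.0
```

Setting `m=0` recovers the vanilla objective, with `z=1` corresponding to $k$-Median and `z=2` to $k$-Means.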