Coresets for Robust Clustering via Black-box Reductions to Vanilla Case
Abstract: We devise $\epsilon$-coresets for robust $(k,z)$-Clustering with $m$ outliers through black-box reductions to vanilla case. Given an $\epsilon$-coreset construction for vanilla clustering with size $N$, we construct coresets of size $N\cdot \mathrm{poly}\log(km\epsilon{-1}) + O_z\left(\min{km\epsilon{-1}, m\epsilon{-2z}\logz(km\epsilon{-1}) }\right)$ for various metric spaces, where $O_z$ hides $2{O(z\log z)}$ factors. This increases the size of the vanilla coreset by a small multiplicative factor of $\mathrm{poly}\log(km\epsilon{-1})$, and the additive term is up to a $(\epsilon{-1}\log (km)){O(z)}$ factor to the size of the optimal robust coreset. Plugging in vanilla coreset results of [Cohen-Addad et al., STOC'21], we obtain the first coresets for $(k,z)$-Clustering with $m$ outliers with size near-linear in $k$ while previous results have size at least $\Omega(k2)$ [Huang et al., ICLR'23; Huang et al., SODA'25]. Technically, we establish two conditions under which a vanilla coreset is as well a robust coreset. The first condition requires the dataset to satisfy special structures - it can be broken into "dense" parts with bounded diameter. We combine this with a new bounded-diameter decomposition that has only $O_z(km \epsilon{-1})$ non-dense points to obtain the $O_z(km \epsilon{-1})$ additive bound. Another condition requires the vanilla coreset to possess an extra size-preserving property. We further give a black-box reduction that turns a vanilla coreset to the one satisfying the said size-preserving property, leading to the alternative $O_z(m\epsilon{-2z}\log{z}(km\epsilon{-1}))$ additive bound. We also implement our reductions in the dynamic streaming setting and obtain the first streaming algorithms for $k$-Median and $k$-Means with $m$ outliers, using space $\tilde{O}(k+m)\cdot\mathrm{poly}(d\epsilon{-1}\log\Delta)$ for inputs on the grid $[\Delta]d$.
Sponsor
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.