Information theoretic limits of robust sub-Gaussian mean estimation under star-shaped constraints (2412.03832v2)
Abstract: We obtain the minimax rate for a mean location model with a bounded star-shaped set $K \subseteq \mathbb{R}^n$ constraint on the mean, in an adversarially corrupted data setting with Gaussian noise. We assume that an unknown fraction $\epsilon \le 1/2-\kappa$ of the $N$ observations, for some fixed $\kappa\in(0,1/2]$, is arbitrarily corrupted. We show that the minimax risk, up to proportionality constants, under the squared $\ell_2$ loss is $\max(\eta^{*2},\sigma^2\epsilon^2)\wedge d^2$ with
\begin{align*}
\eta^* = \sup \bigg\{\eta \ge 0 : \frac{N\eta^2}{\sigma^2} \leq \log \mathcal{M}_K^{\operatorname{loc}}(\eta,c)\bigg\},
\end{align*}
where $\log \mathcal{M}_K^{\operatorname{loc}}(\eta,c)$ denotes the local entropy of the set $K$, $d$ is the diameter of $K$, $\sigma^2$ is the noise variance, and $c$ is a sufficiently large absolute constant. A variant of our algorithm achieves the same rate in settings with known or symmetric sub-Gaussian noise, with a smaller, though still constant-order, breakdown point. We further study the case of unknown sub-Gaussian noise and show that the rate is slightly slower: $\max(\eta^{*2},\sigma^2\epsilon^2\log(1/\epsilon))\wedge d^2$. We generalize our results to the case in which $K$ is star-shaped but unbounded.
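As an illustrative sanity check (our own specialization, not a claim stated in the abstract), suppose $K$ is a Euclidean ball in $\mathbb{R}^n$, for which the local entropy satisfies $\log \mathcal{M}_K^{\operatorname{loc}}(\eta,c) \asymp n$ at the relevant scales $\eta$. The defining inequality for $\eta^*$ then gives
\begin{align*}
\frac{N\eta^{*2}}{\sigma^2} \asymp n
\quad\Longrightarrow\quad
\eta^{*2} \asymp \frac{\sigma^2 n}{N},
\qquad\text{so the rate reads}\quad
\max\Big(\frac{\sigma^2 n}{N},\,\sigma^2\epsilon^2\Big)\wedge d^2,
\end{align*}
which recovers the familiar rate for robust Gaussian mean estimation in $\mathbb{R}^n$: a statistical term $\sigma^2 n/N$ plus a contamination term $\sigma^2\epsilon^2$, truncated at the squared diameter of the constraint set.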