Robust Sparse Estimation for Gaussians with Optimal Error under Huber Contamination (2403.10416v1)

Published 15 Mar 2024 in cs.LG, cs.DS, math.ST, stat.ML, and stat.TH

Abstract: We study Gaussian sparse estimation tasks in Huber's contamination model with a focus on mean estimation, PCA, and linear regression. For each of these tasks, we give the first sample and computationally efficient robust estimators with optimal error guarantees, within constant factors. All prior efficient algorithms for these tasks incur quantitatively suboptimal error. Concretely, for Gaussian robust $k$-sparse mean estimation on $\mathbb{R}^d$ with corruption rate $\epsilon>0$, our algorithm has sample complexity $(k^2/\epsilon^2)\,\mathrm{polylog}(d/\epsilon)$, runs in sample polynomial time, and approximates the target mean within $\ell_2$-error $O(\epsilon)$. Previous efficient algorithms inherently incur error $\Omega(\epsilon \sqrt{\log(1/\epsilon)})$. At the technical level, we develop a novel multidimensional filtering method in the sparse regime that may find other applications.
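To make the setting concrete, the sketch below simulates Huber's $\epsilon$-contamination model for $k$-sparse mean estimation and compares the empirical mean against a coordinate-wise median, a classical robust baseline. This is only an illustration of the problem setup: the outlier distribution, dimensions, and thresholding heuristic are assumptions chosen for the demo, and the median baseline is not the paper's multidimensional filtering algorithm (which achieves the optimal $O(\epsilon)$ error).

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, eps, n = 100, 5, 0.1, 2000  # illustrative parameters

# True k-sparse mean: supported on the first k coordinates.
mu = np.zeros(d)
mu[:k] = 1.0

# Huber's eps-contamination model: each sample comes from N(mu, I)
# with probability 1 - eps, and from an arbitrary outlier
# distribution (the adversary's choice) with probability eps.
clean = rng.standard_normal((n, d)) + mu
outliers = 0.5 * rng.standard_normal((n, d)) + 10.0
mask = rng.random(n) < eps
X = np.where(mask[:, None], outliers, clean)

def top_k_threshold(v, k):
    """Keep only the k largest-magnitude coordinates of v."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

# Non-robust baseline: empirical mean, hard-thresholded to k coords.
naive_sparse = top_k_threshold(X.mean(axis=0), k)

# Simple robust baseline: coordinate-wise median, then threshold.
# (Classical; NOT the paper's filtering method.)
median_sparse = top_k_threshold(np.median(X, axis=0), k)

err_naive = np.linalg.norm(naive_sparse - mu)
err_median = np.linalg.norm(median_sparse - mu)
print(f"naive l2 error:  {err_naive:.3f}")
print(f"median l2 error: {err_median:.3f}")
```

With outliers placed far from the true mean, the empirical mean is dragged toward them while the coordinate-wise median moves only slightly, which is the qualitative gap that robust estimators formalize.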

