A Sub-Quadratic Time Algorithm for Robust Sparse Mean Estimation (2403.04726v1)

Published 7 Mar 2024 in cs.DS, cs.LG, math.ST, stat.ML, and stat.TH

Abstract: We study the algorithmic problem of sparse mean estimation in the presence of adversarial outliers. Specifically, the algorithm observes a \emph{corrupted} set of samples from $\mathcal{N}(\mu,\mathbf{I}_d)$, where the unknown mean $\mu \in \mathbb{R}^d$ is constrained to be $k$-sparse. A series of prior works has developed efficient algorithms for robust sparse mean estimation with sample complexity $\mathrm{poly}(k,\log d, 1/\epsilon)$ and runtime $d^2 \mathrm{poly}(k,\log d,1/\epsilon)$, where $\epsilon$ is the fraction of contamination. In particular, the fastest runtime of existing algorithms is quadratic ($\Omega(d^2)$), which can be prohibitive in high dimensions. This quadratic barrier in the runtime stems from the reliance of these algorithms on the sample covariance matrix, which is of size $d^2$. Our main contribution is an algorithm for robust sparse mean estimation which runs in \emph{subquadratic} time using $\mathrm{poly}(k,\log d,1/\epsilon)$ samples. We also provide analogous results for robust sparse PCA. Our results build on algorithmic advances in detecting weak correlations, a generalized version of the light-bulb problem by Valiant.
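The abstract describes the problem setting rather than giving pseudocode. The following is a minimal Python sketch of that setting only: it simulates an $\epsilon$-corrupted sample from $\mathcal{N}(\mu,\mathbf{I}_d)$ with a $k$-sparse mean and compares the non-robust empirical mean against a crude coordinate-wise-median-plus-top-$k$ baseline. The function names, the unit-valued support, and the specific corruption pattern are illustrative assumptions; this is not the sub-quadratic algorithm developed in the paper.

```python
# Illustrative sketch of epsilon-corrupted k-sparse mean estimation.
# Hypothetical helper names; not the paper's algorithm.
import numpy as np

def corrupted_sparse_gaussian(n, d, k, eps, rng):
    """Draw n samples from N(mu, I_d) with a k-sparse mu, then replace an
    eps-fraction of them with outliers (one fixed corruption pattern; a true
    adversary could choose any points)."""
    mu = np.zeros(d)
    support = rng.choice(d, size=k, replace=False)
    mu[support] = 1.0                                  # k-sparse mean, unit entries
    samples = rng.standard_normal((n, d)) + mu
    n_bad = int(eps * n)
    samples[:n_bad] = rng.standard_normal((n_bad, d)) + 5.0   # shifted outliers
    return samples, mu

def topk_median_estimate(samples, k):
    """Coordinate-wise median truncated to its k largest-magnitude entries:
    a simple robust baseline, far weaker than the paper's guarantees."""
    med = np.median(samples, axis=0)
    est = np.zeros_like(med)
    top = np.argsort(np.abs(med))[-k:]
    est[top] = med[top]
    return est

rng = np.random.default_rng(0)
X, mu = corrupted_sparse_gaussian(n=2000, d=10_000, k=10, eps=0.05, rng=rng)
naive = X.mean(axis=0)                                 # badly biased by outliers
robust = topk_median_estimate(X, k=10)
print("empirical mean error  :", np.linalg.norm(naive - mu))
print("median + top-k error  :", np.linalg.norm(robust - mu))
```

Even this naive baseline avoids the $\Omega(\epsilon\sqrt{d})$ error that the empirical mean incurs here, which is the gap that motivates robust estimators; the paper's contribution is achieving near-optimal error with $\mathrm{poly}(k,\log d,1/\epsilon)$ samples in subquadratic time.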

References (44)
  1. “Robust Estimates of Location: Survey and Advances” Princeton, NJ, USA: Princeton University Press, 1972
  2. J. Alman “An Illuminating Algorithm for the Light Bulb Problem” In Proc. 2nd Symposium on Simplicity in Algorithms (SOSA), 2019
  3. “Reducibility and Statistical-Computational Gaps from Secret Leakage” In Proc. 33rd Annual Conference on Learning Theory (COLT), 2020
  4. “Statistical query algorithms and low-degree tests are almost equivalent” In Proc. 34th Annual Conference on Learning Theory (COLT), 2021
  5. “Computationally Efficient Robust Sparse Estimation in High Dimensions” In Proc. 30th Annual Conference on Learning Theory (COLT), 2017
  6. “Optimal Robust Linear Regression in Nearly Linear Time” In arXiv abs/2007.08137, 2020
  7. Y. Cheng, I. Diakonikolas and R. Ge “High-Dimensional Robust Mean Estimation in Nearly-Linear Time” In Proc. 30th Annual Symposium on Discrete Algorithms (SODA), 2019 DOI: 10.1137/1.9781611975482.171
  8. “Faster Algorithms for High-Dimensional Robust Covariance Estimation” In Proc. 32nd Annual Conference on Learning Theory (COLT), 2019
  9. “Outlier-Robust Sparse Estimation via Non-Convex Optimization” In Advances in Neural Information Processing Systems 35 (NeurIPS), 2022
  10. Y. Cheng “High-Dimensional Robust Statistics: Faster Algorithms and Optimization Landscape” See timestamp 22:00 in the talk, Robustness in High-dimensional Statistics and Machine Learning at IDEAL Institute, 2021
  11. Y. Cherapanamjeri, S. Mohanty and M. Yau “List decodable mean estimation in nearly linear time” In Proc. 61st IEEE Symposium on Foundations of Computer Science (FOCS), 2020
  12. “A Direct Formulation for Sparse PCA Using Semidefinite Programming” In SIAM Review 49.3, 2007, pp. 434–448
  13. Y. Dong, S.B. Hopkins and J. Li “Quantum Entropy Scoring for Fast Robust Mean Estimation and Improved Outlier Detection” In Advances in Neural Information Processing Systems 32 (NeurIPS), 2019
  14. I. Diakonikolas “Computational-Statistical Tradeoffs and Open Problems” See page number 36 in http://www.iliasdiakonikolas.org/stoc19-tutorial/Tradeoffs-and-Open-Problems.pdf, STOC 2019 Tutorial: Recent Advances in High-Dimensional Robust Statistics, 2019
  15. I. Diakonikolas “Algorithmic Robust Statistics” Available online at https://youtu.be/HKm0L2Cy69Y?t=3527, Statistical thinking in the age of AI: robustness, fairness and privacy (Meeting in Mathematical Statistics), 2023
  16. “Algorithmic High-Dimensional Robust Statistics” Cambridge University Press, 2023
  17. “Robust Estimators in High Dimensions without the Computational Intractability” In Proc. 57th IEEE Symposium on Foundations of Computer Science (FOCS), 2016 DOI: 10.1109/FOCS.2016.85
  18. “Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation” In Proc. 54th Annual ACM Symposium on Theory of Computing (STOC), 2022
  19. “Robust Sparse Mean Estimation via Sum of Squares” In Proc. 35th Annual Conference on Learning Theory (COLT), 2022
  20. “Outlier-Robust High-Dimensional Sparse Estimation via Iterative Filtering” In Advances in Neural Information Processing Systems 32 (NeurIPS), 2019
  21. “Outlier-Robust Sparse Mean Estimation for Heavy-Tailed Distributions” In Advances in Neural Information Processing Systems 35 (NeurIPS), 2022
  22. “Streaming Algorithms for High-Dimensional Robust Statistics” In Proc. 39th International Conference on Machine Learning (ICML), 2022
  23. “Near-Optimal Algorithms for Gaussians with Huber Contamination: Mean Estimation and Linear Regression” In Advances in Neural Information Processing Systems 36 (NeurIPS), 2023
  24. “Nearly-Linear Time and Streaming Algorithms for Outlier-Robust PCA” In Proc. 40th International Conference on Machine Learning (ICML), 2023
  25. I. Diakonikolas, D.M. Kane and A. Stewart “Statistical Query Lower Bounds for Robust Estimation of High-Dimensional Gaussians and Gaussian Mixtures” In Proc. 58th IEEE Symposium on Foundations of Computer Science (FOCS), 2017 DOI: 10.1109/FOCS.2017.16
  26. “Robust Subgaussian Estimation of a Mean Vector in Nearly Linear Time” In The Annals of Statistics 50.1 Institute of Mathematical Statistics, 2022, pp. 511–536
  27. “Detecting correlations with little memory and communication” In Proc. 31st Annual Conference on Learning Theory (COLT), 2018
  28. “Compressed sensing: theory and applications” Cambridge University Press, 2012
  29. “Robust Statistics” John Wiley & Sons, 2009
  30. T. Hastie, R. Tibshirani and M. Wainwright “Statistical Learning with Sparsity: The Lasso and Generalizations”, 2015
  31. P.J. Huber “Robust Estimation of a Location Parameter” In The Annals of Mathematical Statistics 35.1, 1964, pp. 73–101 DOI: 10.1214/aoms/1177703732
  32. “A Faster Interior Point Method for Semidefinite Programming” In Proc. 61st IEEE Symposium on Foundations of Computer Science (FOCS), 2020
  33. A. Jambulapati, J. Li and K. Tian “Robust sub-gaussian principal component analysis and width-independent schatten packing” In Advances in Neural Information Processing Systems 33 (NeurIPS), 2020
  34. M. Karppa, P. Kaski and J. Kohonen “A Faster Subquadratic Algorithm for Finding Outlier Correlations” In ACM Trans. Algorithms 14.3, 2018
  35. “Explicit Correlation Amplifiers for Finding Outlier Correlations in Deterministic Subquadratic Time” In Algorithmica 82.11, 2020
  36. “A Fast Spectral Algorithm for Mean Estimation with Sub-Gaussian Rates” In Proc. 33rd Annual Conference on Learning Theory (COLT), 2020
  37. K.A. Lai, A.B. Rao and S. Vempala “Agnostic Estimation of Mean and Covariance” In Proc. 57th IEEE Symposium on Foundations of Computer Science (FOCS), 2016, pp. 665–674 DOI: 10.1109/FOCS.2016.76
  38. “Information Theory: From Coding to Learning” Cambridge University Press, 2023
  39. “Faster Algorithms via Approximation Theory” In Foundations and Trends® in Theoretical Computer Science 9, 2014, pp. 125–210
  40. G. Valiant “Finding Correlations in Subquadratic Time, with Applications to Learning Parities and the Closest Pair Problem” In Journal of the ACM 62.2, 2015 DOI: 10.1145/2728167
  41. L.G. Valiant “Functionality in Neural Nets” In Proc. of the Seventh AAAI National Conference on Artificial Intelligence AAAI Press, 1988, pp. 629–634
  42. S. van de Geer “Estimation and Testing Under Sparsity”, École d’Été de Probabilités de Saint-Flour, Springer, 2016
  43. R. Vershynin “High-Dimensional Probability: An Introduction with Applications in Data Science” Cambridge University Press, 2018
  44. B. Zhu, J. Jiao and J. Steinhardt “Robust Estimation via Generalized Quasi-Gradients” In Information and Inference: A Journal of the IMA, 2022, pp. 581–636 DOI: 10.1093/imaiai/iaab018
