A Combinatorial Approach to Robust PCA

Published 28 Nov 2023 in cs.DS, cs.LG, and stat.ML | (2311.16416v1)

Abstract: We study the problem of recovering Gaussian data under adversarial corruptions when the noises are low-rank and the corruptions are on the coordinate level. Concretely, we assume that the Gaussian noises lie in an unknown $k$-dimensional subspace $U \subseteq \mathbb{R}^d$, and $s$ randomly chosen coordinates of each data point fall into the control of an adversary. This setting models the scenario of learning from high-dimensional yet structured data that are transmitted through a highly-noisy channel, so that the data points are unlikely to be entirely clean. Our main result is an efficient algorithm that, when $ks^2 = O(d)$, recovers every single data point up to a nearly-optimal $\ell_1$ error of $\tilde O(ks/d)$ in expectation. At the core of our proof is a new analysis of the well-known Basis Pursuit (BP) method for recovering a sparse signal, which is known to succeed under additional assumptions (e.g., incoherence or the restricted isometry property) on the underlying subspace $U$. In contrast, we present a novel approach via studying a natural combinatorial problem and show that, over the randomness in the support of the sparse signal, a high-probability error bound is possible even if the subspace $U$ is arbitrary.
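To make the Basis Pursuit idea concrete, here is a minimal sketch of its simplest special case, not the paper's algorithm. It assumes the subspace is already known, is one-dimensional ($k = 1$, spanned by a vector `u`), and that the data has zero mean. Under those assumptions, minimizing $\|y - c\,u\|_1$ over the scalar $c$ reduces to a weighted median of the ratios $y_i / u_i$ with weights $|u_i|$. The names `l1_project_onto_line` and `demo` are hypothetical, and random perturbations stand in for the adversary's choices.

```python
import random


def l1_project_onto_line(y, u):
    """Return the scalar c minimizing sum_i |y[i] - c * u[i]|.

    For a one-dimensional subspace this is a weighted median of the
    ratios y[i] / u[i] with weights |u[i]|.
    """
    # Coordinates with u[i] == 0 contribute a constant, so they are skipped.
    pairs = sorted((y_i / u_i, abs(u_i)) for y_i, u_i in zip(y, u) if u_i != 0)
    if not pairs:
        raise ValueError("u must have at least one nonzero coordinate")
    half = sum(weight for _, weight in pairs) / 2.0
    running = 0.0
    for ratio, weight in pairs:
        running += weight
        if running >= half:
            return ratio
    return pairs[-1][0]


def demo(d=40, s=4, seed=0):
    """Corrupt s random coordinates of a point on span(u); return the l1 error."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]  # arbitrary direction (assumed known)
    c_true = 2.5
    x = [c_true * u_i for u_i in u]              # clean data point in span(u)
    y = list(x)
    for i in rng.sample(range(d), s):            # random support of the corruption
        y[i] += rng.uniform(-100.0, 100.0)       # stand-in for adversarial values
    c_hat = l1_project_onto_line(y, u)
    x_hat = [c_hat * u_i for u_i in u]
    return sum(abs(a - b) for a, b in zip(x, x_hat))


if __name__ == "__main__":
    print("l1 recovery error:", demo())
```

In this toy case recovery succeeds whenever the corrupted coordinates carry less than half of the total weight $\sum_i |u_i|$. That depends on where the random support lands, not on an incoherence-type condition on `u`. For general $k$, the same $\ell_1$ objective would typically be solved as a linear program.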

Authors (3)