A Combinatorial Approach to Robust PCA
Abstract: We study the problem of recovering Gaussian data under adversarial corruptions when the noise is low-rank and the corruptions are on the coordinate level. Concretely, we assume that the Gaussian noise lies in an unknown $k$-dimensional subspace $U \subseteq \mathbb{R}^d$, and $s$ randomly chosen coordinates of each data point fall under the control of an adversary. This setting models learning from high-dimensional yet structured data that are transmitted through a highly noisy channel, so that the data points are unlikely to be entirely clean. Our main result is an efficient algorithm that, when $ks^2 = O(d)$, recovers every single data point up to a nearly-optimal $\ell_1$ error of $\tilde O(ks/d)$ in expectation. At the core of our proof is a new analysis of the well-known Basis Pursuit (BP) method for recovering a sparse signal. BP is known to succeed under additional assumptions on the underlying subspace $U$ (e.g., incoherence or the restricted isometry property). In contrast, we present a novel approach that studies a natural combinatorial problem, and we show that, over the randomness in the support of the sparse signal, a high-probability error bound is possible even if the subspace $U$ is arbitrary.
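The abstract's data model and the Basis Pursuit step can be illustrated with a small sketch. This is a simplification and is not the authors' algorithm. It assumes the subspace $U$ is known exactly, given as a $d \times k$ orthonormal basis. It writes each observation as $y = x + e$, where $x \in U$ and $e$ is supported on the $s$ corrupted coordinates. It then recovers $x$ by solving $\min \|e\|_1$ subject to $y - e \in U$, posed as a linear program with SciPy's `linprog`.

The helper names `sample_corrupted_point` and `basis_pursuit_recover` are hypothetical. The "adversary" in the sketch is only a placeholder that writes random large values. The random subspace used in the usage lines is incoherent, which is the easy case. The abstract's contribution concerns arbitrary $U$, which this toy run does not exercise.

```python
import numpy as np
from scipy.optimize import linprog


def sample_corrupted_point(U, s, rng):
    """Draw x = U z with z ~ N(0, I_k), then corrupt s uniformly random coordinates.

    Hypothetical placeholder adversary: it overwrites the chosen coordinates with
    large random values, whereas the setting allows arbitrary values there.
    """
    d, k = U.shape
    x = U @ rng.standard_normal(k)                  # clean point lying in span(U)
    support = rng.choice(d, size=s, replace=False)  # s coordinates chosen at random
    y = x.copy()
    y[support] = 10.0 * rng.standard_normal(s)      # adversarially controlled entries
    return x, y, support


def basis_pursuit_recover(U, y):
    """Estimate x in span(U) from y by Basis Pursuit on the residual e = y - x.

    Solves  min ||e||_1  s.t.  A e = A y,  where the rows of A form an orthonormal
    basis of the orthogonal complement of span(U); the constraint says y - e is in span(U).
    Splitting e = p - n with p, n >= 0 turns this into a linear program.
    """
    d, k = U.shape
    Q, _ = np.linalg.qr(U, mode="complete")  # first k columns span span(U), the rest its complement
    A = Q[:, k:].T                            # (d - k) x d, full row rank
    c = np.ones(2 * d)                        # sum(p) + sum(n) equals ||e||_1 at the optimum
    res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=A @ y,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    e = res.x[:d] - res.x[d:]
    return y - e


# Usage: d = 200, k = 2, s = 3, so k * s**2 = 18 is well below d.
rng = np.random.default_rng(0)
d, k, s = 200, 2, 3
U, _ = np.linalg.qr(rng.standard_normal((d, k)))  # random orthonormal basis of a k-dim subspace
x, y, support = sample_corrupted_point(U, s, rng)
x_hat = basis_pursuit_recover(U, y)
err = np.abs(x_hat - x).sum()                     # l1 recovery error for this one point
print(err)
```

The $\ell_1$ objective is used instead of least squares because it favors residuals concentrated on few coordinates, which matches a corruption that touches only $s$ entries. In this noiseless, known-$U$ toy run, the recovered point should match $x$ up to solver tolerance.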