Attribute-Efficient PAC Learning of Low-Degree Polynomial Threshold Functions with Nasty Noise (2306.00673v2)
Abstract: The concept class of low-degree polynomial threshold functions (PTFs) plays a fundamental role in machine learning. In this paper, we study PAC learning of $K$-sparse degree-$d$ PTFs on $\mathbb{R}^n$, where any such concept depends only on $K$ out of the $n$ attributes of the input. Our main contribution is a new algorithm that runs in time $({nd}/{\epsilon})^{O(d)}$ and, under the Gaussian marginal distribution, PAC learns the class up to error rate $\epsilon$ with $O(\frac{K^{4d}}{\epsilon^{2d}} \cdot \log^{5d} n)$ samples even when an $\eta \leq O(\epsilon^d)$ fraction of them are corrupted by the nasty noise of Bshouty et al. (2002), possibly the strongest corruption model. Prior to this work, attribute-efficient robust algorithms had been established only for the special case of sparse homogeneous halfspaces. Our key ingredients are: 1) a structural result that translates the attribute sparsity to a sparsity pattern of the Chow vector under the basis of Hermite polynomials, and 2) a novel attribute-efficient robust Chow vector estimation algorithm which uses exclusively a restricted Frobenius norm to either certify a good approximation or to validate a sparsity-induced degree-$2d$ polynomial as a filter to detect corrupted samples.
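The first ingredient above — attribute sparsity of the concept inducing sparsity of its Chow vector in the Hermite basis — can be illustrated with a minimal sketch. The code below is *not* the paper's certified/filtered estimator; it is a simplified, noise-free illustration that empirically estimates the degree-$\leq d$ Chow parameters $\mathbf{E}[y \cdot He_\alpha(x)]$ under the Gaussian marginal and hard-thresholds them, after which the surviving multi-indices touch only the relevant attributes. The helper names (`chow_vector`, `hard_threshold`) are ours, chosen for illustration.

```python
import math
from itertools import combinations_with_replacement

import numpy as np
from numpy.polynomial.hermite_e import hermeval


def normalized_hermite(x, k):
    """Probabilists' Hermite polynomial He_k(x)/sqrt(k!), orthonormal under N(0,1)."""
    c = np.zeros(k + 1)
    c[k] = 1.0
    return hermeval(x, c) / math.sqrt(math.factorial(k))


def chow_vector(X, y, d):
    """Empirical Chow parameters E[y * He_alpha(x)] over all multi-indices
    alpha of total degree 1..d (the constant term is omitted for brevity)."""
    n = X.shape[1]
    chow = {}
    for deg in range(1, d + 1):
        for alpha in combinations_with_replacement(range(n), deg):
            # He_alpha(x) is a product of univariate Hermite polynomials,
            # one per distinct coordinate, with the coordinate's multiplicity.
            vals = np.ones(X.shape[0])
            for j in set(alpha):
                vals *= normalized_hermite(X[:, j], alpha.count(j))
            chow[alpha] = float(np.mean(y * vals))
    return chow


def hard_threshold(chow, s):
    """Keep the s largest-magnitude Chow coordinates (the sparsity prior)."""
    top = sorted(chow, key=lambda a: -abs(chow[a]))[:s]
    return {a: chow[a] for a in top}


rng = np.random.default_rng(0)
X = rng.standard_normal((20000, 8))
# A sparse degree-2 PTF depending only on attributes 0 and 1.
y = np.sign(X[:, 0] * X[:, 1] + 0.5 * X[:, 0])

chow = hard_threshold(chow_vector(X, y, d=2), s=2)
# Multi-indices surviving the threshold involve only the relevant attributes.
relevant = set().union(*chow)
print(sorted(relevant))
```

In this clean setting the two dominant Chow coordinates correspond to the monomials $x_0 x_1$ and $x_0$, so thresholding recovers the relevant attribute set $\{0, 1\}$; the paper's contribution is making this estimation step robust to nasty noise while keeping the sample complexity polylogarithmic in $n$.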
- Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.
- Efficient PAC learning from the crowd. In Proceedings of the 30th Annual Conference on Learning Theory, pages 127–150, 2017.
- Learning and 1-bit compressed sensing under asymmetric noise. In Proceedings of the 29th Annual Conference on Learning Theory, pages 152–192, 2016.
- The power of localization for efficiently learning linear separators with noise. Journal of the ACM, 63(6):50:1–50:27, 2017.
- Learning sparse polynomial functions. In Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 500–510, 2014.
- Computationally efficient robust sparse estimation in high dimensions. In Proceedings of the 30th Annual Conference on Learning Theory, pages 169–212, 2017.
- Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36(4):929–965, 1989.
- PAC learning with nasty noise. Theoretical Computer Science, 288(2):255–275, 2002.
- Avrim Blum. Learning boolean functions in an infinite attribute space. In Proceedings of the 22nd Annual ACM Symposium on Theory of Computing, pages 64–72, 1990.
- Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1):33–61, 1998.
- Chung-Kong Chow. On the characterization of threshold functions. In Proceedings of the 2nd Annual Symposium on Switching Circuit Theory and Logical Design (FOCS), pages 34–38, 1961.
- Robust principal component analysis? Journal of the ACM, 58(3):11:1–11:37, 2011.
- Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Transactions on Information Theory, 61(4):1985–2007, 2015.
- Learning polynomials in few relevant dimensions. In Proceedings of the 34th Annual Conference on Learning Theory, pages 1161–1227, 2020.
- Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772, 2009.
- Phaselift: Exact and stable signal recovery from magnitude measurements via convex programming. Communications on Pure and Applied Mathematics, 66(8):1241–1274, 2013.
- Decoding by linear programming. IEEE Transactions on Information Theory, 51(12):4203–4215, 2005.
- An introduction to compressive sampling. IEEE Signal Processing Magazine, 25(2):21–30, 2008.
- Amit Daniely. Complexity theoretic limitations on learning halfspaces. In Proceedings of the 48th Annual ACM Symposium on Theory of Computing, pages 105–117, 2016.
- Nearly optimal solutions for the Chow parameters problem and low-weight approximation of halfspaces. Journal of the ACM, 61(2):11:1–11:36, 2014.
- Recent advances in algorithmic high-dimensional robust statistics. CoRR, abs/1911.05911, 2019.
- Robust estimators in high dimensions without the computational intractability. In Proceedings of the 57th Annual IEEE Symposium on Foundations of Computer Science, pages 655–664, 2016.
- Outlier-robust high-dimensional sparse estimation via iterative filtering. In Proceedings of the 33rd Annual Conference on Neural Information Processing Systems, pages 10688–10699, 2019.
- Robust sparse mean estimation via sum of squares. In Proceedings of the 35th Annual Conference on Learning Theory, pages 4703–4763, 2022.
- Statistical query lower bounds for robust estimation of high-dimensional gaussians and gaussian mixtures. In Proceedings of the 58th IEEE Annual Symposium on Foundations of Computer Science, pages 73–84, 2017.
- Learning geometric concepts with nasty noise. In Proceedings of the 50th Annual ACM Symposium on Theory of Computing, pages 1061–1073, 2018.
- List-decodable robust mean estimation and learning mixtures of spherical gaussians. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 1047–1060, 2018.
- 1-bit matrix completion. Information and Inference: A Journal of the IMA, 3(3):189–223, 2014.
- New results for learning noisy parities and halfspaces. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, pages 563–574, 2006.
- Simon Foucart. Hard thresholding pursuit: An algorithm for compressive sensing. SIAM Journal on Numerical Analysis, 49(6):2543–2563, 2011.
- Claudio Gentile. The robustness of the $p$-norm algorithms. Machine Learning, 53(3):265–299, 2003.
- Hardness of learning halfspaces with noise. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, pages 543–552, 2006.
- On PAC learning algorithms for rich Boolean function classes. Theoretical Computer Science, 384(1):66–76, 2007.
- Svante Janson. Gaussian Hilbert Spaces. Cambridge Tracts in Mathematics. Cambridge University Press, 1997.
- Agnostically learning halfspaces. In Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science, pages 11–20, 2005.
- Learning in the presence of malicious errors. In Proceedings of the 20th Annual ACM Symposium on Theory of Computing, pages 267–280, 1988.
- Learning halfspaces with malicious noise. Journal of Machine Learning Research, 10:2715–2740, 2009.
- Nick Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. In Proceedings of the 28th Annual IEEE Symposium on Foundations of Computer Science, pages 68–77, 1987.
- Agnostic estimation of mean and covariance. In Proceedings of the 57th Annual IEEE Symposium on Foundations of Computer Science, pages 665–674, 2016.
- Learning large-margin halfspaces with more malicious noise. In Proceedings of the 25th Annual Conference on Neural Information Processing Systems, pages 91–99, 2011.
- Yishay Mansour. Learning boolean functions via the Fourier transform. In Theoretical advances in neural computation and learning, pages 391–424. Springer, 1994.
- How fast can a threshold gate learn? In Proceedings of a workshop on computational learning theory and natural learning systems (vol. 1): constraints and prospects, pages 381–414, 1994.
- Phase retrieval using alternating minimization. In Proceedings of the 27th Annual Conference on Neural Information Processing Systems, pages 2796–2804, 2013.
- Ryan O’Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014.
- One-bit compressed sensing by linear programming. Communications on Pure and Applied Mathematics, 66(8):1275–1297, 2013.
- Robust 1-bit compressed sensing and sparse logistic regression: A convex programming approach. IEEE Transactions on Information Theory, 59(1):482–494, 2013.
- Robust matrix completion from quantized observations. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, pages 397–407, 2019.
- Jie Shen. One-bit compressed sensing via one-shot hard thresholding. In Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence, pages 510–519, 2020.
- Jie Shen. On the power of localized Perceptron for label-optimal learning of halfspaces with adversarial noise. In Proceedings of the 38th International Conference on Machine Learning, pages 9503–9514, 2021.
- Jie Shen. Sample-optimal PAC learning of halfspaces with malicious noise. In Proceedings of the 38th International Conference on Machine Learning, pages 9515–9524, 2021.
- Jie Shen. PAC learning of halfspaces with malicious noise in nearly linear time. In Proceedings of the 26th International Conference on Artificial Intelligence and Statistics, pages 30–46, 2023.
- Learning structured low-rank representation via matrix factorization. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, pages 500–509, 2016.
- On the iteration complexity of support recovery via hard thresholding pursuit. In Proceedings of the 34th International Conference on Machine Learning, pages 3115–3124, 2017.
- Partial hard thresholding: Towards a principled analysis of support recovery. In Proceedings of the 31st Annual Conference on Neural Information Processing Systems, pages 3127–3137, 2017.
- A tight bound of hard thresholding. Journal of Machine Learning Research, 18(208):1–42, 2018.
- Online low-rank subspace clustering by basis dictionary pursuit. In Proceedings of the 33rd International Conference on Machine Learning, pages 622–631, 2016.
- Attribute-efficient learning and weight-degree tradeoffs for polynomial threshold functions. In Proceedings of the 25th Annual Conference on Learning Theory, pages 1–19, 2012.
- Online optimization for max-norm regularization. In Proceedings of the 28th Annual Conference on Neural Information Processing Systems, pages 1718–1726, 2014.
- Attribute-efficient learning of halfspaces with malicious noise: Near-optimal label complexity and noise tolerance. In Proceedings of the 32nd International Conference on Algorithmic Learning Theory, pages 1072–1113, 2021.
- Robert Tibshirani. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288, 1996.
- Joel A. Tropp. Greed is good: algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 50(10):2231–2242, 2004.
- Regularity, boosting, and efficiently simulating every high-entropy distribution. In Proceedings of the 24th Annual IEEE Conference on Computational Complexity, pages 126–136, 2009.
- Leslie G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.
- On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2):264, 1971.
- Provable variable selection for streaming features. In Proceedings of the 35th International Conference on Machine Learning, pages 5158–5166, 2018.
- Outlier-robust PCA: the high-dimensional case. IEEE Transactions on Information Theory, 59(1):546–572, 2013.
- Robust PCA via outlier pursuit. IEEE Transactions on Information Theory, 58(5):3047–3064, 2012.
- Chicheng Zhang. Efficient active learning of sparse halfspaces. In Proceedings of the 31st Annual Conference on Learning Theory, pages 1856–1880, 2018.
- Efficient PAC learning from the crowd with pairwise comparisons. In Proceedings of the 39th International Conference on Machine Learning, pages 25973–25993, 2022.
- List-decodable sparse mean estimation. In Proceedings of the 36th Annual Conference on Neural Information Processing Systems, pages 24031–24045, 2022.
- Semi-verified PAC learning from the crowd. In Proceedings of the 26th International Conference on Artificial Intelligence and Statistics, pages 505–522, 2023.
- Efficient active learning of sparse halfspaces with arbitrary bounded noise. In Proceedings of the 34th Annual Conference on Neural Information Processing Systems, pages 7184–7197, 2020.