Attribute-Efficient PAC Learning of Low-Degree Polynomial Threshold Functions with Nasty Noise (2306.00673v2)

Published 1 Jun 2023 in cs.DS, cs.LG, and stat.ML

Abstract: The concept class of low-degree polynomial threshold functions (PTFs) plays a fundamental role in machine learning. In this paper, we study PAC learning of $K$-sparse degree-$d$ PTFs on $\mathbb{R}^n$, where any such concept depends only on $K$ out of $n$ attributes of the input. Our main contribution is a new algorithm that runs in time $(nd/\epsilon)^{O(d)}$ and, under the Gaussian marginal distribution, PAC learns the class up to error rate $\epsilon$ with $O(\frac{K^{4d}}{\epsilon^{2d}} \cdot \log^{5d} n)$ samples even when an $\eta \leq O(\epsilon^d)$ fraction of them are corrupted by the nasty noise of Bshouty et al. (2002), possibly the strongest corruption model. Prior to this work, attribute-efficient robust algorithms had been established only for the special case of sparse homogeneous halfspaces. Our key ingredients are: 1) a structural result that translates the attribute sparsity to a sparsity pattern of the Chow vector under the basis of Hermite polynomials, and 2) a novel attribute-efficient robust Chow vector estimation algorithm which uses exclusively a restricted Frobenius norm to either certify a good approximation or to validate a sparsity-induced degree-$2d$ polynomial as a filter to detect corrupted samples.
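
As a rough illustration of the first ingredient, the sketch below is a minimal toy example, not the paper's algorithm: it estimates the empirical Chow vector of a sparse degree-2 PTF under the orthonormal Hermite basis and recovers the relevant attributes by plain hard thresholding, which stands in for the paper's restricted-Frobenius-norm certification and filtering. All names and parameters here (`hermite_features`, the sparsity level `s`, the toy concept, the corruption rate `eta`) are illustrative assumptions. The point is that since normalized Hermite products form an orthonormal basis under $N(0, I_n)$, each Chow coefficient is just the correlation of the label with one basis polynomial, and attribute sparsity of the concept confines the large coefficients to multi-indices supported on the $K$ relevant variables.

```python
import numpy as np
from math import factorial
from itertools import combinations_with_replacement

rng = np.random.default_rng(0)

def hermite_features(X, d):
    """Orthonormal Hermite features of total degree 1..d under N(0, I_n).

    He_k is the probabilists' Hermite polynomial; dividing each univariate
    factor by sqrt(k!) makes the multivariate products an orthonormal family.
    """
    m, n = X.shape
    H = np.ones((d + 1, m, n))            # H[k, i, j] = He_k(X[i, j])
    if d >= 1:
        H[1] = X
    for k in range(1, d):                 # He_{k+1}(x) = x He_k(x) - k He_{k-1}(x)
        H[k + 1] = X * H[k] - k * H[k - 1]
    feats, idx = [], []
    for deg in range(1, d + 1):
        for S in combinations_with_replacement(range(n), deg):
            ks = np.bincount(np.array(S), minlength=n)   # exponent multi-index
            f = np.ones(m)
            for j in np.flatnonzero(ks):
                f *= H[ks[j], :, j] / np.sqrt(factorial(ks[j]))
            feats.append(f)
            idx.append(tuple(ks))
    return np.column_stack(feats), idx    # (m, #multi-indices)

# Toy data: a 3-sparse degree-2 PTF on R^n; a small fraction of labels is
# flipped as a crude stand-in for the nasty noise (illustrative only).
n, m, d, eta = 20, 20000, 2, 0.02
X = rng.standard_normal((m, n))
y = np.sign(X[:, 0] * X[:, 1] + X[:, 2] - 0.5)
flip = rng.random(m) < eta
y[flip] *= -1

Phi, idx = hermite_features(X, d)
chow = Phi.T @ y / m                      # empirical Chow vector E[y * He_S(x)]

# Attribute sparsity concentrates the Chow vector's mass on multi-indices
# supported on the K relevant variables; keep the s largest entries.
s = 5                                     # working sparsity level (assumed)
keep = np.argsort(-np.abs(chow))[:s]
support = sorted({j for t in keep for j, k in enumerate(idx[t]) if k > 0})
print("recovered relevant attributes:", support)
```

On this toy instance the dominant Chow coefficients correspond to the monomials $x_0 x_1$ and $x_2$, so variables 0, 1, and 2 dominate the recovered support. The paper's actual estimator replaces both the naive averaging and the thresholding with a robust procedure designed to survive nasty corruption.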

References (72)
  1. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.
  2. Efficient PAC learning from the crowd. In Proceedings of the 30th Annual Conference on Learning Theory, pages 127–150, 2017.
  3. Learning and 1-bit compressed sensing under asymmetric noise. In Proceedings of the 29th Annual Conference on Learning Theory, pages 152–192, 2016.
  4. The power of localization for efficiently learning linear separators with noise. Journal of the ACM, 63(6):50:1–50:27, 2017.
  5. Learning sparse polynomial functions. In Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 500–510, 2014.
  6. Computationally efficient robust sparse estimation in high dimensions. In Proceedings of the 30th Annual Conference on Learning Theory, pages 169–212, 2017.
  7. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36(4):929–965, 1989.
  8. PAC learning with nasty noise. Theoretical Computer Science, 288(2):255–275, 2002.
  9. Avrim Blum. Learning boolean functions in an infinite attribute space. In Proceedings of the 22nd Annual ACM Symposium on Theory of Computing, pages 64–72, 1990.
  10. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1):33–61, 1998.
  11. Chung-Kong Chow. On the characterization of threshold functions. In Proceedings of the 2nd Annual Symposium on Switching Circuit Theory and Logical Design (FOCS), pages 34–38, 1961.
  12. Robust principal component analysis? Journal of the ACM, 58(3):11:1–11:37, 2011.
  13. Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Transactions on Information Theory, 61(4):1985–2007, 2015.
  14. Learning polynomials in few relevant dimensions. In Proceedings of the 33rd Annual Conference on Learning Theory, pages 1161–1227, 2020.
  15. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772, 2009.
  16. Phaselift: Exact and stable signal recovery from magnitude measurements via convex programming. Communications on Pure and Applied Mathematics, 66(8):1241–1274, 2013.
  17. Decoding by linear programming. IEEE Transactions on Information Theory, 51(12):4203–4215, 2005.
  18. An introduction to compressive sampling. IEEE Signal Processing Magazine, 25(2):21–30, 2008.
  19. Amit Daniely. Complexity theoretic limitations on learning halfspaces. In Proceedings of the 48th Annual ACM Symposium on Theory of Computing, pages 105–117, 2016.
  20. Nearly optimal solutions for the Chow parameters problem and low-weight approximation of halfspaces. Journal of the ACM, 61(2):11:1–11:36, 2014.
  21. Recent advances in algorithmic high-dimensional robust statistics. CoRR, abs/1911.05911, 2019.
  22. Robust estimators in high dimensions without the computational intractability. In Proceedings of the 57th Annual IEEE Symposium on Foundations of Computer Science, pages 655–664, 2016.
  23. Outlier-robust high-dimensional sparse estimation via iterative filtering. In Proceedings of the 33rd Annual Conference on Neural Information Processing Systems, pages 10688–10699, 2019.
  24. Robust sparse mean estimation via sum of squares. In Proceedings of the 35th Annual Conference on Learning Theory, pages 4703–4763, 2022.
  25. Statistical query lower bounds for robust estimation of high-dimensional gaussians and gaussian mixtures. In Proceedings of the 58th IEEE Annual Symposium on Foundations of Computer Science, pages 73–84, 2017.
  26. Learning geometric concepts with nasty noise. In Proceedings of the 50th Annual ACM Symposium on Theory of Computing, pages 1061–1073, 2018.
  27. List-decodable robust mean estimation and learning mixtures of spherical gaussians. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 1047–1060, 2018.
  28. 1-bit matrix completion. Information and Inference: A Journal of the IMA, 3(3):189–223, 2014.
  29. New results for learning noisy parities and halfspaces. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, pages 563–574, 2006.
  30. Simon Foucart. Hard thresholding pursuit: An algorithm for compressive sensing. SIAM Journal on Numerical Analysis, 49(6):2543–2563, 2011.
  31. Claudio Gentile. The robustness of the p-norm algorithms. Machine Learning, 53(3):265–299, 2003.
  32. Hardness of learning halfspaces with noise. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, pages 543–552, 2006.
  33. On PAC learning algorithms for rich Boolean function classes. Theoretical Computer Science, 384(1):66–76, 2007.
  34. Svante Janson. Gaussian Hilbert Spaces. Cambridge Tracts in Mathematics. Cambridge University Press, 1997.
  35. Agnostically learning halfspaces. In Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science, pages 11–20, 2005.
  36. Learning in the presence of malicious errors. In Proceedings of the 20th Annual ACM Symposium on Theory of Computing, pages 267–280, 1988.
  37. Learning halfspaces with malicious noise. Journal of Machine Learning Research, 10:2715–2740, 2009.
  38. Nick Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. In Proceedings of the 28th Annual IEEE Symposium on Foundations of Computer Science, pages 68–77, 1987.
  39. Agnostic estimation of mean and covariance. In Proceedings of the 57th Annual IEEE Symposium on Foundations of Computer Science, pages 665–674, 2016.
  40. Learning large-margin halfspaces with more malicious noise. In Proceedings of the 25th Annual Conference on Neural Information Processing Systems, pages 91–99, 2011.
  41. Yishay Mansour. Learning boolean functions via the Fourier transform. In Theoretical advances in neural computation and learning, pages 391–424. Springer, 1994.
  42. How fast can a threshold gate learn? In Proceedings of a workshop on computational learning theory and natural learning systems (vol. 1): constraints and prospects, pages 381–414, 1994.
  43. Phase retrieval using alternating minimization. In Proceedings of the 27th Annual Conference on Neural Information Processing Systems, pages 2796–2804, 2013.
  44. Ryan O’Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014.
  45. One-bit compressed sensing by linear programming. Communications on Pure and Applied Mathematics, 66(8):1275–1297, 2013.
  46. Robust 1-bit compressed sensing and sparse logistic regression: A convex programming approach. IEEE Transactions on Information Theory, 59(1):482–494, 2013.
  47. Robust matrix completion from quantized observations. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, pages 397–407, 2019.
  48. Jie Shen. One-bit compressed sensing via one-shot hard thresholding. In Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence, pages 510–519, 2020.
  49. Jie Shen. On the power of localized Perceptron for label-optimal learning of halfspaces with adversarial noise. In Proceedings of the 38th International Conference on Machine Learning, pages 9503–9514, 2021.
  50. Jie Shen. Sample-optimal PAC learning of halfspaces with malicious noise. In Proceedings of the 38th International Conference on Machine Learning, pages 9515–9524, 2021.
  51. Jie Shen. PAC learning of halfspaces with malicious noise in nearly linear time. In Proceedings of the 26th International Conference on Artificial Intelligence and Statistics, pages 30–46, 2023.
  52. Learning structured low-rank representation via matrix factorization. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, pages 500–509, 2016.
  53. On the iteration complexity of support recovery via hard thresholding pursuit. In Proceedings of the 34th International Conference on Machine Learning, pages 3115–3124, 2017.
  54. Partial hard thresholding: Towards a principled analysis of support recovery. In Proceedings of the 31st Annual Conference on Neural Information Processing Systems, pages 3127–3137, 2017.
  55. A tight bound of hard thresholding. Journal of Machine Learning Research, 18(208):1–42, 2018.
  56. Online low-rank subspace clustering by basis dictionary pursuit. In Proceedings of the 33rd International Conference on Machine Learning, pages 622–631, 2016.
  57. Attribute-efficient learning and weight-degree tradeoffs for polynomial threshold functions. In Proceedings of the 25th Annual Conference on Learning Theory, pages 1–19, 2012.
  58. Online optimization for max-norm regularization. In Proceedings of the 28th Annual Conference on Neural Information Processing Systems, pages 1718–1726, 2014.
  59. Attribute-efficient learning of halfspaces with malicious noise: Near-optimal label complexity and noise tolerance. In Proceedings of the 32nd International Conference on Algorithmic Learning Theory, pages 1072–1113, 2021.
  60. Robert Tibshirani. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288, 1996.
  61. Joel A. Tropp. Greed is good: algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 50(10):2231–2242, 2004.
  62. Regularity, boosting, and efficiently simulating every high-entropy distribution. In Proceedings of the 24th Annual IEEE Conference on Computational Complexity, pages 126–136, 2009.
  63. Leslie G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.
  64. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2):264–280, 1971.
  65. Provable variable selection for streaming features. In Proceedings of the 35th International Conference on Machine Learning, pages 5158–5166, 2018.
  66. Outlier-robust PCA: the high-dimensional case. IEEE Transactions on Information Theory, 59(1):546–572, 2013.
  67. Robust PCA via outlier pursuit. IEEE Transactions on Information Theory, 58(5):3047–3064, 2012.
  68. Chicheng Zhang. Efficient active learning of sparse halfspaces. In Proceedings of the 31st Annual Conference On Learning Theory, pages 1856–1880, 2018.
  69. Efficient PAC learning from the crowd with pairwise comparisons. In Proceedings of the 39th International Conference on Machine Learning, pages 25973–25993, 2022.
  70. List-decodable sparse mean estimation. In Proceedings of the 36th Annual Conference on Neural Information Processing Systems, pages 24031–24045, 2022.
  71. Semi-verified PAC learning from the crowd. In Proceedings of the 26th International Conference on Artificial Intelligence and Statistics, pages 505–522, 2023.
  72. Efficient active learning of sparse halfspaces with arbitrary bounded noise. In Proceedings of the 34th Annual Conference on Neural Information Processing Systems, pages 7184–7197, 2020.