
Learning Intersections of Halfspaces with Distribution Shift: Improved Algorithms and SQ Lower Bounds (2404.02364v2)

Published 2 Apr 2024 in cs.DS and cs.LG

Abstract: Recent work of Klivans, Stavropoulos, and Vasilyan initiated the study of testable learning with distribution shift (TDS learning), where a learner is given labeled samples from a training distribution $\mathcal{D}$ and unlabeled samples from a test distribution $\mathcal{D}'$, and the goal is to output a classifier with low error on $\mathcal{D}'$ whenever the training samples pass a corresponding test. Their model deviates from all prior work in that no assumptions are made on $\mathcal{D}'$. Instead, the test must accept (with high probability) when the marginals of the training and test distributions are equal. Here we focus on the fundamental case of intersections of halfspaces with respect to Gaussian training distributions and prove a variety of new upper bounds, including a $2^{(k/\epsilon)^{O(1)}} \mathsf{poly}(d)$-time algorithm for TDS learning intersections of $k$ homogeneous halfspaces to accuracy $\epsilon$ (prior work achieved $d^{(k/\epsilon)^{O(1)}}$). We work under the mild assumption that the Gaussian training distribution contains at least an $\epsilon$ fraction of both positive and negative examples ($\epsilon$-balanced). We also prove the first set of SQ lower bounds for any TDS learning problem and show (1) the $\epsilon$-balanced assumption is necessary for $\mathsf{poly}(d,1/\epsilon)$-time TDS learning of a single halfspace and (2) a $d^{\tilde{\Omega}(\log 1/\epsilon)}$ lower bound for the intersection of two general halfspaces, even with the $\epsilon$-balanced assumption. Our techniques significantly expand the toolkit for TDS learning. We use dimension reduction and coverings to give efficient algorithms for computing a localized version of discrepancy distance, a key metric from the domain adaptation literature.
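
The TDS protocol the abstract describes — train on labeled samples from $\mathcal{D}$, test the unlabeled marginal of $\mathcal{D}'$, and output a classifier only on acceptance — can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's algorithm: the moment-matching check is a crude stand-in for the localized discrepancy test, and the helper names, threshold, and logistic-regression learner are all assumptions made for the demo.

```python
# Hypothetical sketch of the TDS-learning workflow: learn from labeled
# training data, accept or reject based on a test of the unlabeled test
# marginal, and only promise accuracy on D' when the test accepts.
import numpy as np
from sklearn.linear_model import LogisticRegression

def marginals_close(x_train, x_test, tol=0.5):
    """Coarse marginal test: accept iff empirical means and covariances of the
    two unlabeled samples are close (tol is an uncalibrated demo threshold)."""
    mean_gap = np.linalg.norm(x_train.mean(axis=0) - x_test.mean(axis=0))
    cov_gap = np.linalg.norm(np.cov(x_train.T) - np.cov(x_test.T))
    return mean_gap < tol and cov_gap < tol

def tds_learn(x_train, y_train, x_test):
    """Return a classifier if the marginal test accepts, else None (reject)."""
    if not marginals_close(x_train, x_test):
        return None  # reject: no guarantee is claimed on D'
    return LogisticRegression().fit(x_train, y_train)  # placeholder learner

rng = np.random.default_rng(0)
d = 5
w = rng.normal(size=d)  # labels come from a homogeneous halfspace sign(w.x)
x_train = rng.normal(size=(2000, d))
y_train = (x_train @ w > 0).astype(int)

# Equal Gaussian marginals: the test should accept and return a classifier.
x_same = rng.normal(size=(2000, d))
print("same marginal:", "accept" if tds_learn(x_train, y_train, x_same) else "reject")

# Shifted test marginal: the test should reject rather than risk high error.
x_shift = rng.normal(loc=1.0, size=(2000, d))
print("shifted marginal:", "accept" if tds_learn(x_train, y_train, x_shift) else "reject")
```

The early-return structure mirrors the model's guarantee: the test must accept with high probability when the train and test marginals are equal, and the low-error promise on $\mathcal{D}'$ is only required upon acceptance.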

References (29)
  1. Analysis of representations for domain adaptation. Advances in Neural Information Processing Systems, 19, 2006.
  2. Learning an intersection of a constant number of halfspaces over a uniform distribution. J. Comput. Syst. Sci., 54(2):371–380, 1997.
  3. SQ lower bounds for learning mixtures of separated and bounded covariance Gaussians. In Proceedings of the Thirty-Sixth Conference on Learning Theory, volume 195 of Proceedings of Machine Learning Research, pages 2319–2349. PMLR, 2023.
  4. SQ lower bounds for non-Gaussian component analysis with weaker assumptions. In Thirty-Seventh Conference on Neural Information Processing Systems, 2023.
  5. Learning geometric concepts with nasty noise. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing (STOC 2018), pages 1061–1073. ACM, 2018.
  6. Statistical algorithms and a lower bound for detecting planted cliques. Journal of the ACM, 64(2):1–37, 2017.
  7. Statistical query algorithms for mean vector estimation and stochastic convex optimization. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1265–1277. SIAM, 2017.
  8. Dynamic approach to a stochastic domination: the FKG and Brascamp–Lieb inequalities. Proceedings of the American Mathematical Society, 135(6):1915–1922, 2007.
  9. Adversarial resilience in sequential prediction via abstention. In Thirty-Seventh Conference on Neural Information Processing Systems, 2023.
  10. Beyond perturbations: Learning guarantees with arbitrary adversarial test examples. Advances in Neural Information Processing Systems, 33:15859–15870, 2020.
  11. Learning functions of halfspaces using prefix covers. In COLT 2012 - The 25th Annual Conference on Learning Theory, volume 23 of JMLR Proceedings, pages 15.1–15.10. JMLR.org, 2012.
  12. An efficient tester-learner for halfspaces. arXiv preprint arXiv:2302.14853, 2023.
  13. Tester-learners for halfspaces: Universal algorithms. In Thirty-Seventh Conference on Neural Information Processing Systems (NeurIPS 2023), 2023.
  14. Michael Kearns. Efficient noise-tolerant learning from statistical queries. Journal of the ACM, 45(6):983–1006, 1998.
  15. Efficient learning with arbitrary covariate shift. In Algorithmic Learning Theory, pages 850–864. PMLR, 2021.
  16. Learning halfspaces under log-concave densities: Polynomial approximations and moment matching. In COLT 2013 - The 26th Annual Conference on Learning Theory, volume 30 of JMLR Workshop and Conference Proceedings, pages 522–545. JMLR.org, 2013.
  17. Baum's algorithm learns intersections of halfspaces with respect to log-concave distributions. In Lecture Notes in Computer Science, pages 588–600, 2009.
  18. Learning intersections and thresholds of halfspaces. J. Comput. Syst. Sci., 68(4):808–840, 2004.
  19. Learning geometric concepts via Gaussian surface area. In 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2008), pages 541–550. IEEE Computer Society, 2008.
  20. Cryptographic hardness for learning intersections of halfspaces. Journal of Computer and System Sciences, 75(1):2–12, 2009.
  21. Testable learning with distribution shift. arXiv preprint arXiv:2311.15142, 2023.
  22. Composite geometric concepts and polynomial predictability. Inf. Comput., 113(2):230–252, 1994.
  23. Raghu Meka. Personal communication, 2010.
  24. Domain adaptation: Learning bounds and algorithms. In Proceedings of the 22nd Annual Conference on Learning Theory (COLT 2009), 2009.
  25. A survey on domain adaptation theory: learning bounds and theoretical guarantees. arXiv preprint arXiv:2004.11829, 2020.
  26. Testing distributional assumptions of learning algorithms. In Proceedings of the Fifty-Fifth Annual ACM Symposium on Theory of Computing (STOC 2023), 2023.
  27. Santosh S. Vempala. Learning convex concepts from Gaussian distributions with PCA. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science (FOCS 2010), pages 124–130. IEEE, 2010.
  28. Santosh S. Vempala. A random-sampling-based algorithm for learning intersections of halfspaces. Journal of the ACM, 57(6):1–14, 2010.
  29. A useful variant of the Davis–Kahan theorem for statisticians. Biometrika, 102(2):315–323, 2015.