
Testable Learning with Distribution Shift (2311.15142v2)

Published 25 Nov 2023 in cs.DS and cs.LG

Abstract: We revisit the fundamental problem of learning with distribution shift, in which a learner is given labeled samples from training distribution $D$, unlabeled samples from test distribution $D'$ and is asked to output a classifier with low test error. The standard approach in this setting is to bound the loss of a classifier in terms of some notion of distance between $D$ and $D'$. These distances, however, seem difficult to compute and do not lead to efficient algorithms. We depart from this paradigm and define a new model called testable learning with distribution shift, where we can obtain provably efficient algorithms for certifying the performance of a classifier on a test distribution. In this model, a learner outputs a classifier with low test error whenever samples from $D$ and $D'$ pass an associated test; moreover, the test must accept if the marginal of $D$ equals the marginal of $D'$. We give several positive results for learning well-studied concept classes such as halfspaces, intersections of halfspaces, and decision trees when the marginal of $D$ is Gaussian or uniform on $\{\pm 1\}^d$. Prior to our work, no efficient algorithms for these basic cases were known without strong assumptions on $D'$. For halfspaces in the realizable case (where there exists a halfspace consistent with both $D$ and $D'$), we combine a moment-matching approach with ideas from active learning to simulate an efficient oracle for estimating disagreement regions. To extend to the non-realizable setting, we apply recent work from testable (agnostic) learning. More generally, we prove that any function class with low-degree $L_2$-sandwiching polynomial approximators can be learned in our model. We apply constructions from the pseudorandomness literature to obtain the required approximators.
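To make the moment-matching idea concrete, the sketch below shows the simplest form such a test could take, assuming the target marginal is the standard Gaussian $N(0, I)$: accept the unlabeled test sample only when all of its empirical low-degree moments are close to the Gaussian ones. This is an illustrative sketch, not the paper's algorithm; the `degree` and `tol` parameters are placeholder assumptions, and the paper's analysis is what ties acceptance of such a test to certified test error.

```python
import itertools
import numpy as np

def gaussian_moment(alpha):
    """E[prod_i x_i^{alpha_i}] for x ~ N(0, I): coordinates are independent,
    odd moments vanish, and E[x^{2m}] = (2m - 1)!! (double factorial)."""
    moment = 1.0
    for a in alpha:
        if a % 2 == 1:
            return 0.0
        if a > 0:
            moment *= float(np.prod(np.arange(1, a, 2)))
    return moment

def moment_matching_test(X_test, degree=4, tol=0.2):
    """Accept the unlabeled test sample X_test (an n x d array) only if every
    empirical moment of total degree <= `degree` is within `tol` of the
    corresponding standard-Gaussian moment. Degree and tolerance are
    illustrative placeholders, not the paper's parameter choices."""
    n, d = X_test.shape
    for k in range(1, degree + 1):
        # Each multiset of coordinate indices corresponds to one monomial.
        for idx in itertools.combinations_with_replacement(range(d), k):
            alpha = np.bincount(np.array(idx), minlength=d)
            empirical = np.mean(np.prod(X_test ** alpha, axis=1))
            if abs(empirical - gaussian_moment(alpha)) > tol:
                return False  # reject: a low-degree moment deviates
    return True  # accept: all low-degree moments match the target marginal

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Gaussian test marginal: accepts with high probability.
    print(moment_matching_test(rng.standard_normal((50000, 3))))
    # Uniform on [-1, 1]^3: E[x^2] = 1/3, far from 1, so the test rejects.
    print(moment_matching_test(rng.uniform(-1.0, 1.0, (50000, 3))))
```

Completeness of such a test is immediate from concentration: if the marginal of $D'$ really equals the Gaussian marginal of $D$, the empirical moments converge to the true ones and the test accepts with high probability. Soundness is the substantive direction, and it is where the low-degree $L_2$-sandwiching polynomial approximators in the paper enter.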

Authors (3)
  1. Adam R. Klivans
  2. Konstantinos Stavropoulos
  3. Arsen Vasilyan
Citations (6)
