Testable Learning with Distribution Shift (2311.15142v2)
Abstract: We revisit the fundamental problem of learning with distribution shift, in which a learner is given labeled samples from training distribution $D$, unlabeled samples from test distribution $D'$ and is asked to output a classifier with low test error. The standard approach in this setting is to bound the loss of a classifier in terms of some notion of distance between $D$ and $D'$. These distances, however, seem difficult to compute and do not lead to efficient algorithms. We depart from this paradigm and define a new model called testable learning with distribution shift, where we can obtain provably efficient algorithms for certifying the performance of a classifier on a test distribution. In this model, a learner outputs a classifier with low test error whenever samples from $D$ and $D'$ pass an associated test; moreover, the test must accept if the marginal of $D$ equals the marginal of $D'$. We give several positive results for learning well-studied concept classes such as halfspaces, intersections of halfspaces, and decision trees when the marginal of $D$ is Gaussian or uniform on $\{\pm 1\}^d$. Prior to our work, no efficient algorithms for these basic cases were known without strong assumptions on $D'$. For halfspaces in the realizable case (where there exists a halfspace consistent with both $D$ and $D'$), we combine a moment-matching approach with ideas from active learning to simulate an efficient oracle for estimating disagreement regions. To extend to the non-realizable setting, we apply recent work from testable (agnostic) learning. More generally, we prove that any function class with low-degree $L_2$-sandwiching polynomial approximators can be learned in our model. We apply constructions from the pseudorandomness literature to obtain the required approximators.
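As a rough illustration of the moment-matching component mentioned above, the sketch below compares low-degree empirical moments of the training and test marginals and rejects when any moment deviates by more than a tolerance. This is only a conceptual sketch, not the paper's algorithm: the degree, tolerance, and raw-monomial moment basis are placeholder choices, and no error guarantee is claimed for them.

```python
# Illustrative moment-matching test between the training marginal D and the
# test marginal D'.  Placeholder sketch only: the degree, tolerance, and
# monomial basis are hypothetical choices, not the paper's parameters.
import itertools
import numpy as np

def empirical_moments(X, degree):
    """Empirical moments E[x_{i1} * ... * x_{ik}] for all monomials of degree <= `degree`."""
    n, d = X.shape
    moments = {}
    for k in range(1, degree + 1):
        for alpha in itertools.combinations_with_replacement(range(d), k):
            prod = np.ones(n)
            for i in alpha:
                prod *= X[:, i]
            moments[alpha] = prod.mean()
    return moments

def moment_matching_test(X_train, X_test, degree=2, tol=0.1):
    """Accept iff every empirical moment of degree <= `degree` matches within `tol`."""
    m_train = empirical_moments(X_train, degree)
    m_test = empirical_moments(X_test, degree)
    return all(abs(m_train[a] - m_test[a]) <= tol for a in m_train)

# When D and D' share the same marginal (here standard Gaussian), the test
# accepts with high probability; a shifted test marginal is rejected.
rng = np.random.default_rng(0)
X_train = rng.standard_normal((20000, 3))
X_test = rng.standard_normal((20000, 3))
print(moment_matching_test(X_train, X_test))        # expected: True
print(moment_matching_test(X_train, X_test + 1.0))  # expected: False
```

In the paper's model, acceptance of such a test is what licenses the error guarantee of the learned classifier; the sketch only illustrates that comparing low-degree moments of $D$ and $D'$ is an efficiently computable check that passes (with high probability over the sample) whenever the two marginals coincide.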
Authors: Adam R. Klivans, Konstantinos Stavropoulos, Arsen Vasilyan