Minimax Excess Risk of First-Order Methods for Statistical Learning with Data-Dependent Oracles (2307.04679v3)
Abstract: In this paper, we analyse the generalization capabilities of first-order methods for statistical learning in several different yet related scenarios, including supervised learning, transfer learning, robust learning, and federated learning. To this end, we provide sharp upper and lower bounds on the minimax excess risk of strongly convex and smooth statistical learning when the gradient is accessed through partial observations given by a data-dependent oracle. This novel class of oracles can query the gradient with any given data distribution and is thus well suited to scenarios in which the training data distribution does not match the target (or test) distribution. In particular, our upper and lower bounds are proportional to the smallest mean squared error achievable by gradient estimators, which allows us to easily derive multiple sharp bounds in the aforementioned scenarios using the extensive literature on parameter estimation.
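As an illustrative schematic of the kind of result announced above (this is not the paper's exact theorem; the normalization by the strong-convexity constant and the precise definition of the estimator class are assumptions made here for illustration), for a $\mu$-strongly convex and smooth population risk $F$ the bound can be pictured as

$$
\inf_{\text{algorithms}} \;\sup_{F} \; \mathbb{E}\big[ F(\hat{\theta}) - \min_{\theta} F(\theta) \big] \;\asymp\; \frac{1}{\mu} \, \inf_{\hat{g}} \; \sup_{F} \; \mathbb{E}\big\| \hat{g} - \nabla F(\theta) \big\|^2,
$$

where the infimum on the right-hand side runs over gradient estimators $\hat{g}$ built from the data-dependent oracle's partial observations, and the supremum runs over the problem class. Read this way, the minimax excess risk is governed, up to problem-dependent factors, by the best achievable mean squared error of a gradient estimator, which is what lets results from the parameter-estimation literature be plugged into the supervised, transfer, robust, and federated settings mentioned in the abstract.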