
Minimax Excess Risk of First-Order Methods for Statistical Learning with Data-Dependent Oracles (2307.04679v3)

Published 10 Jul 2023 in cs.LG and math.OC

Abstract: In this paper, we analyse the generalization capabilities of first-order methods for statistical learning in several different yet related scenarios, including supervised learning, transfer learning, robust learning, and federated learning. To do so, we provide sharp upper and lower bounds for the minimax excess risk of strongly convex and smooth statistical learning when the gradient is accessed through partial observations given by a data-dependent oracle. This novel class of oracles can query the gradient with any given data distribution, and is thus well suited to scenarios in which the training data distribution does not match the target (or test) distribution. In particular, our upper and lower bounds are proportional to the smallest mean square error achievable by gradient estimators, allowing us to easily derive multiple sharp bounds in the aforementioned scenarios using the extensive literature on parameter estimation.
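As a rough illustration of the setting (not the paper's own construction), the sketch below runs SGD on a strongly convex, smooth quadratic objective where the gradient oracle samples from a shifted training distribution rather than the target distribution. The distribution shift plays the role of the gradient-estimator error whose mean square error the abstract says governs the excess-risk bounds. The objective, distributions, step size, and all names are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (illustrative, not the paper's algorithm): SGD on
# F(x) = E_{z~p} [0.5 * ||x - z||^2], which is 1-strongly convex and smooth,
# but with gradients built from samples of a shifted training distribution q.
rng = np.random.default_rng(0)
d = 5                               # parameter dimension
mu_target = np.ones(d)              # mean of the target distribution p
mu_train = np.ones(d) + 0.3         # shifted mean of the training distribution q

def target_risk(x):
    # F(x) = 0.5 * ||x - mu_target||^2 + constant variance term; minimized at mu_target
    return 0.5 * np.sum((x - mu_target) ** 2) + 0.5 * d

def oracle_gradient(x, n=32):
    # Data-dependent oracle: stochastic gradient computed from samples of q, not p
    z = mu_train + rng.standard_normal((n, d))
    return np.mean(x[None, :] - z, axis=0)

x = np.zeros(d)
for t in range(1, 2001):
    x -= (1.0 / t) * oracle_gradient(x)   # step size ~ 1 / (mu * t) with mu = 1

excess_risk = target_risk(x) - target_risk(mu_target)
grad_bias_mse = np.sum((mu_train - mu_target) ** 2)  # squared bias of the gradient estimator
print(f"excess risk ~ {excess_risk:.3f}, gradient-estimator squared bias ~ {grad_bias_mse:.3f}")
```

In this toy run the excess risk does not vanish but plateaus at a level controlled by the squared bias of the gradient estimator, which mirrors the abstract's claim that the minimax excess risk scales with the smallest mean square error achievable by gradient estimators.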

