On the Saturation Effect of Kernel Ridge Regression

Published 15 May 2024 in stat.ML and cs.LG | (2405.09362v2)

Abstract: The saturation effect refers to the phenomenon that kernel ridge regression (KRR) fails to achieve the information-theoretic lower bound when the smoothness of the underlying truth function exceeds a certain level. The saturation effect has been widely observed in practice, and a saturation lower bound for KRR has been conjectured for decades. In this paper, we provide a proof of this long-standing conjecture.
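
For readers unfamiliar with the estimator, the following is a minimal sketch of KRR in its standard closed form, alpha = (K + n * lam * I)^{-1} y, which minimizes (1/n) * sum_i (y_i - f(x_i))^2 + lam * ||f||_H^2. The abstract does not spell out this form, so the Gaussian kernel, its bandwidth, and every function name below are illustrative assumptions and not the paper's setup. The sketch uses only the standard library to stay self-contained, and it shows the estimator only; it does not demonstrate the saturation effect.

```python
import math


def gaussian_kernel(x, z, bandwidth=0.5):
    """Gaussian (RBF) kernel on scalars; the bandwidth is an assumed value."""
    return math.exp(-((x - z) ** 2) / (2.0 * bandwidth ** 2))


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's matrix is left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def krr_fit(xs, ys, lam, kernel=gaussian_kernel):
    """Return coefficients alpha solving (K + n * lam * I) alpha = y."""
    n = len(xs)
    gram = [[kernel(xi, xj) for xj in xs] for xi in xs]
    for i in range(n):
        gram[i][i] += n * lam
    return solve_linear_system(gram, ys)


def krr_predict(xs_train, alpha, x, kernel=gaussian_kernel):
    """Evaluate the fitted function f(x) = sum_i alpha_i * k(x_i, x)."""
    return sum(a * kernel(xi, x) for a, xi in zip(alpha, xs_train))
```

A typical call would be `alpha = krr_fit(xs, ys, lam=0.1)` followed by `krr_predict(xs, alpha, 0.3)`. The regularization parameter `lam` is the quantity whose best possible tuning the saturation effect concerns.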

Authors (3)