A Bound on the Maximal Marginal Degrees of Freedom (2402.12885v1)
Abstract: Standard kernel ridge regression is expensive in both memory and computation time. This paper addresses low-rank approximations and surrogates for kernel ridge regression, which mitigate these difficulties. The fundamental contribution of the paper is a lower bound on the rank of the low-dimensional approximation that is required for the prediction power to remain reliable. The bound relates the effective dimension to the largest statistical leverage score. We characterize the effective dimension and its growth behavior with respect to the regularization parameter in terms of the regularity of the kernel. This growth is shown to be asymptotically logarithmic for suitably chosen kernels, justifying low-rank approximations such as the Nyström method.
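The quantities discussed in the abstract can be made concrete with a small numerical sketch. The Python snippet below is illustrative only: the function names, the Gaussian kernel, and uniform landmark sampling are assumptions of this sketch, not the paper's construction. It computes the effective dimension d(λ) = tr(K(K + nλI)⁻¹), the statistical leverage scores (the diagonal entries of K(K + nλI)⁻¹), and a rank-m Nyström surrogate of kernel ridge regression.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian (RBF) kernel matrix between row-wise sample sets X and Y."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def effective_dimension_and_leverage(K, lam):
    """Effective dimension d(lam) = tr(K (K + n*lam*I)^{-1}) and the
    statistical leverage scores, the diagonal of K (K + n*lam*I)^{-1}."""
    n = K.shape[0]
    S = np.linalg.solve(K + n * lam * np.eye(n), K)   # (K + n*lam*I)^{-1} K
    leverage = np.diag(S)
    return leverage.sum(), leverage

def nystrom_krr_predict(X, y, X_test, lam, m, sigma=1.0, seed=None):
    """Rank-m Nystroem approximation of kernel ridge regression
    with uniformly sampled landmark points (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)        # landmark indices
    K_nm = gaussian_kernel(X, X[idx], sigma)          # n x m cross kernel
    K_mm = gaussian_kernel(X[idx], X[idx], sigma)     # m x m landmark kernel
    # Reduced problem: (K_nm^T K_nm + n*lam*K_mm) a = K_nm^T y
    A = K_nm.T @ K_nm + n * lam * K_mm
    a = np.linalg.solve(A + 1e-10 * np.eye(m), K_nm.T @ y)
    return gaussian_kernel(X_test, X[idx], sigma) @ a

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(500, 1))
    y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(500)
    K = gaussian_kernel(X, X, sigma=0.5)
    for lam in (1e-1, 1e-2, 1e-3, 1e-4):
        d_eff, lev = effective_dimension_and_leverage(K, lam)
        print(f"lambda={lam:.0e}  d_eff={d_eff:6.1f}  max leverage={lev.max():.3f}")
    # Low-rank surrogate with far fewer landmarks than samples
    y_hat = nystrom_krr_predict(X, y, X, lam=1e-3, m=50, sigma=0.5, seed=1)
    print("Nystroem KRR train MSE:", np.mean((y_hat - y) ** 2))
```

In this toy setup the printed effective dimension grows only mildly as the regularization parameter shrinks, which is the qualitative behavior the paper quantifies for suitably regular kernels.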
- F. Bach. Sharp analysis of low-rank kernel matrix approximations. In S. Shalev-Shwartz and I. Steinwart, editors, Proceedings of the 26th Annual Conference on Learning Theory, volume 30 of Proceedings of Machine Learning Research, pages 185–209, Princeton, NJ, USA, 12–14 Jun 2013. PMLR. URL https://proceedings.mlr.press/v30/Bach13.html.
- M. Belkin. Approximation beats concentration? An approximation view on inference with smooth radial kernels. In S. Bubeck, V. Perchet, and P. Rigollet, editors, Proceedings of the 31st Conference On Learning Theory, volume 75 of Proceedings of Machine Learning Research, pages 1348–1361. PMLR, 06–09 Jul 2018. URL https://proceedings.mlr.press/v75/belkin18a.html.
- C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, Berlin, Heidelberg, 2006. ISBN 0387310738.
- A. Caponnetto and E. De Vito. Optimal rates for the regularized least-squares algorithm. Foundations of Computational Mathematics, 7(3):331–368, Aug. 2006. ISSN 1615-3383. doi:10.1007/s10208-006-0196-8.
- F. Cucker and D. X. Zhou. Learning Theory: An Approximation Theory Viewpoint. Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, 2007. doi:10.1017/CBO9780511618796.
- P. Dommel and A. Pichler. Stochastic optimization with estimated objectives. Pure and Applied Functional Analysis, 2021.
- P. Dommel and A. Pichler. Dynamic programming for data independent decision sets. Journal of Convex Analysis, 2023.
- M. Eberts and I. Steinwart. Optimal learning rates for least squares SVMs using Gaussian kernels. In Advances in Neural Information Processing Systems (NeurIPS), volume 24, pages 1539–1547, 2011. URL https://proceedings.neurips.cc/paper/2011/file/51ef186e18dc00c2d31982567235c559-Paper.pdf.
- S. Fine and K. Scheinberg. Efficient SVM training using low-rank kernel representations. The Journal of Machine Learning Research, 2:243–264, 2002.
- S. Fischer and I. Steinwart. Sobolev norm learning rates for regularized least-squares algorithms. Journal of Machine Learning Research, 21(1), Jan. 2020. ISSN 1532-4435.
- Statistical robustness of empirical risks in machine learning. Journal of Machine Learning Research, 24(125):1–38, 2023. URL http://jmlr.org/papers/v24/20-1039.html.
- M. Honarkhah and J. Caers. Stochastic simulation of patterns using distance-based pattern modeling. Mathematical Geosciences, 42(5):487–517, Apr. 2010. ISSN 1874-8953. doi:10.1007/s11004-010-9276-7.
- H. König and S. Richter. Eigenvalues of integral operators defined by analytic kernels. Mathematische Nachrichten, 119(1):141–155, 1984. ISSN 1522-2616. doi:10.1002/mana.19841190113.
- S. Mendelson and J. Neeman. Regularization in kernel learning. The Annals of Statistics, 38(1), Feb. 2010. ISSN 0090-5364. doi:10.1214/09-aos728.
- Data-driven stochastic dual dynamic programming: Performance guarantees and regularization schemes. 2022. URL https://optimization-online.org/?p=21376.
- A. Rahimi and B. Recht. Random features for large-scale kernel machines. In Neural Information Processing Systems, 2007. URL https://api.semanticscholar.org/CorpusID:877929.
- G. Raskutti, M. J. Wainwright, and B. Yu. Early stopping for non-parametric regression: An optimal data-dependent stopping rule. In 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2011. doi:10.1109/Allerton.2011.6120320.
- A. Rudi, R. Camoriano, and L. Rosasco. Less is more: Nyström computational regularization. In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc., 2015. URL https://proceedings.neurips.cc/paper_files/paper/2015/file/03e0704b5690a2dee1861dc3ad3316c9-Paper.pdf.
- A. Rudi, L. Carratino, and L. Rosasco. FALKON: An optimal large scale kernel method. In Advances in Neural Information Processing Systems, 2017. URL https://api.semanticscholar.org/CorpusID:25900554.
- B. Schölkopf. Support Vector Learning. PhD thesis, 1997. URL https://pure.mpg.de/rest/items/item_1794215/component/file_3214422/content.
- B. Schölkopf, K. Tsuda, and J.-P. Vert, editors. Kernel Methods in Computational Biology. The MIT Press, July 2004. ISBN 9780262256926. doi:10.7551/mitpress/4057.001.0001.
- I. Steinwart and A. Christmann. Support Vector Machines. Springer Publishing Company, Incorporated, 1st edition, 2008. ISBN 0387772413.
- I. Steinwart, D. Hush, and C. Scovel. Optimal rates for regularized least squares regression. In Proceedings of the 22nd Annual Conference on Learning Theory, pages 79–93, 2009.
- R. Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science. Number 47 in Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2018. ISBN 978-1-108-41519-4.
- H. Wendland. Scattered Data Approximation. Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, 2004. doi:10.1017/CBO9780511617539.
- C. Williams and M. Seeger. Using the Nyström method to speed up kernel machines. In T. Leen, T. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems, volume 13. MIT Press, 2000. URL https://proceedings.neurips.cc/paper_files/paper/2000/file/19de10adbaa1b2ee13f77f679fa1483a-Paper.pdf.
- Y. Yang, M. Pilanci, and M. J. Wainwright. Randomized sketches for kernels: Fast and optimal nonparametric regression. The Annals of Statistics, 45(3):991–1023, 2017. ISSN 0090-5364. URL http://www.jstor.org/stable/26362822.
- Y. Yao, L. Rosasco, and A. Caponnetto. On early stopping in gradient descent learning. Constructive Approximation, 26:289–315, 2007. doi:10.1007/s00365-006-0663-2.
- J. Zhang, M. Marszałek, S. Lazebnik, and C. Schmid. Local features and kernels for classification of texture and object categories: A comprehensive study. International Journal of Computer Vision, 73(2):213–238, June 2007. ISSN 0920-5691. doi:10.1007/s11263-006-9794-4.
- Y. Zhang, J. Duchi, and M. Wainwright. Divide and conquer kernel ridge regression. In S. Shalev-Shwartz and I. Steinwart, editors, Proceedings of the 26th Annual Conference on Learning Theory, volume 30 of Proceedings of Machine Learning Research, pages 592–617, Princeton, NJ, USA, 12–14 Jun 2013. PMLR. URL https://proceedings.mlr.press/v30/Zhang13.html.
- Paul Dommel