Stochastic Gradient Descent for Nonparametric Regression
Abstract: This paper introduces an iterative algorithm for training nonparametric additive models that enjoys favorable memory and computational requirements. The algorithm can be viewed as the functional counterpart of stochastic gradient descent, applied to the coefficients of a truncated basis expansion of the component functions. We show that the resulting estimator satisfies an oracle inequality that allows for model mis-specification. In the well-specified setting, by choosing the learning rate carefully across three distinct stages of training, we demonstrate that its risk is minimax optimal in terms of the dependence on the dimensionality of the data and the size of the training sample. We also provide polynomial convergence rates even when the covariates do not have full support on their domain.
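The idea described in the abstract can be sketched in a few lines of NumPy. This is an illustrative toy version, not the paper's exact algorithm: it assumes a cosine basis on [0, 1], squared loss, and a single constant learning rate (the paper uses a carefully scheduled rate across three stages); all function names below are hypothetical.

```python
import numpy as np

# Sketch: each component function of an additive model is represented by a
# truncated cosine-basis expansion, and plain SGD on the squared loss
# updates the basis coefficients. Memory is O(d * num_basis), independent
# of the number of samples seen.

def cosine_basis(x, num_basis):
    """First `num_basis` orthonormal cosine basis functions at x in [0, 1]."""
    k = np.arange(num_basis)
    return np.where(k == 0, 1.0, np.sqrt(2.0) * np.cos(np.pi * k * x))

def sgd_additive_fit(X, y, num_basis=8, lr=0.02, epochs=5):
    """SGD over the samples; returns a coefficient matrix of shape (d, num_basis)."""
    n, d = X.shape
    theta = np.zeros((d, num_basis))
    for _ in range(epochs):
        for i in range(n):
            # Basis evaluations per coordinate: shape (d, num_basis).
            Phi = np.stack([cosine_basis(X[i, j], num_basis) for j in range(d)])
            residual = np.sum(theta * Phi) - y[i]  # additive prediction minus label
            theta -= lr * residual * Phi           # stochastic gradient step
    return theta

def predict(theta, x):
    """Evaluate the fitted additive function at a single point x (length d)."""
    return sum(theta[j] @ cosine_basis(x[j], theta.shape[1])
               for j in range(len(x)))

# Demo on a simple additive target f(x) = sin(2*pi*x1) + x2.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 2))
y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1]
theta = sgd_additive_fit(X, y)
mse = np.mean([(predict(theta, X[i]) - y[i]) ** 2 for i in range(len(X))])
```

Because the update touches only the coefficient matrix, each step costs O(d * num_basis) time, matching the abstract's claim of modest computational requirements.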