A finite sample analysis of the benign overfitting phenomenon for ridge function estimation (2007.12882v5)

Published 25 Jul 2020 in stat.ML and cs.LG

Abstract: Recent extensive numerical experiments in large-scale machine learning have uncovered a rather counterintuitive phase transition, as a function of the ratio between the sample size and the number of parameters in the model. As the number of parameters $p$ approaches the sample size $n$, the generalisation error increases, but surprisingly it starts decreasing again past the threshold $p=n$. This phenomenon, brought to the attention of the theoretical community in \cite{belkin2019reconciling}, has since been thoroughly investigated, mostly for models simpler than deep neural networks, such as the linear model with the parameter taken to be the minimum norm solution of the least-squares problem: first in the asymptotic regime where $p$ and $n$ tend to infinity, see e.g. \cite{hastie2019surprises}, and more recently in the finite dimensional regime, specifically for linear models \cite{bartlett2020benign}, \cite{tsigler2020benign}, \cite{lecue2022geometrical}. In the present paper, we propose a finite sample analysis of non-linear models of \textit{ridge} type, where we investigate the \textit{overparametrised regime} of the double descent phenomenon for both the \textit{estimation problem} and the \textit{prediction problem}. Our results provide a precise analysis of the distance of the best estimator from the true parameter as well as a generalisation bound which complements the recent works \cite{bartlett2020benign} and \cite{chinot2020benign}. Our analysis is based on tools closely related to the continuous Newton method \cite{neuberger2007continuous} and on a refined quantitative analysis of the performance in prediction of the minimum $\ell_2$-norm solution.
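For readers who want to see the overparametrised regime numerically, the sketch below (not part of the paper) computes the minimum $\ell_2$-norm least-squares solution for a growing number of features in a toy Gaussian linear model and prints the test error across the ratio $p/n$. The sample size, noise level, and sparse true signal are illustrative assumptions; the test error typically peaks near the interpolation threshold $p = n$ and decreases again past it.

```python
# Minimal double-descent sketch for the minimum l2-norm least-squares
# solution in a toy Gaussian linear model. The setup (n = 50, noise level,
# true signal supported on 10 coordinates) is an illustrative assumption,
# not the ridge-function model analysed in the paper.
import numpy as np

rng = np.random.default_rng(0)
n, p_max, n_test, sigma = 50, 200, 2000, 0.5

# True parameter supported on the first 10 coordinates (assumed toy signal).
beta_star = np.zeros(p_max)
beta_star[:10] = 1.0

X_train = rng.standard_normal((n, p_max))
y_train = X_train @ beta_star + sigma * rng.standard_normal(n)
X_test = rng.standard_normal((n_test, p_max))
y_test = X_test @ beta_star + sigma * rng.standard_normal(n_test)

for p in (10, 25, 40, 50, 60, 100, 200):
    # np.linalg.pinv(X) @ y is the minimum l2-norm solution: ordinary least
    # squares when p <= n, the minimum-norm interpolant when p > n
    # (the overparametrised regime).
    beta_hat = np.linalg.pinv(X_train[:, :p]) @ y_train
    mse = np.mean((X_test[:, :p] @ beta_hat - y_test) ** 2)
    print(f"p = {p:3d}  (p/n = {p / n:.1f})  test MSE = {mse:.3f}")
```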

References (18)
  1. Benign overfitting in linear regression. Proceedings of the National Academy of Sciences, 2020.
  2. Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences, 116(32):15849–15854, 2019.
  3. To understand deep learning we need to understand kernel learning. arXiv preprint arXiv:1802.01396, 2018.
  4. Near-ideal model selection by $\ell_1$ minimization. The Annals of Statistics, 37(5A):2145–2177, 2009.
  5. Alfonso Castro and J. W. Neuberger. An inverse function theorem via continuous Newton's method. 2001.
  6. Benign overfitting in the large deviation regime. arXiv preprint arXiv:2003.05838, 2020.
  7. Benign overfitting without linearity: Neural network classifiers trained by gradient descent for noisy linear data. In Conference on Learning Theory, pages 2668–2703. PMLR, 2022.
  8. Surprises in high-dimensional ridgeless least squares interpolation. arXiv preprint arXiv:1903.08560, 2019.
  9. A geometrical viewpoint on the benign overfitting property of the minimum $l_2$-norm interpolant estimator. arXiv preprint arXiv:2203.05873, 2022.
  10. Just interpolate: Kernel "ridgeless" regression can generalize. arXiv preprint arXiv:1808.00387, 2018.
  11. The generalization error of random features regression: Precise asymptotics and double descent curve. arXiv preprint arXiv:1908.05355, 2019.
  12. The generalization error of random features regression: Precise asymptotics and the double descent curve. Communications on Pure and Applied Mathematics, 75(4):667–766, 2022.
  13. Shahar Mendelson. Extending the small-ball method. arXiv preprint arXiv:1709.00843, 2017.
  14. John W. Neuberger. The continuous Newton's method, inverse functions, and Nash-Moser. The American Mathematical Monthly, 114(5):432–437, 2007.
  15. High dimensional statistics.
  16. Benign overfitting in ridge regression. arXiv preprint arXiv:2009.14286, 2020.
  17. Roman Vershynin. Introduction to the non-asymptotic analysis of random matrices. arXiv preprint arXiv:1011.3027, 2010.
  18. Roman Vershynin. High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge University Press, 2018.
Citations (6)
