How many samples are needed to train a deep neural network? (arXiv:2405.16696v1)

Published 26 May 2024 in math.ST, stat.ML, and stat.TH

Abstract: Neural networks have become standard tools in many areas, yet many important statistical questions remain open. This paper studies the question of how much data are needed to train a ReLU feed-forward neural network. Our theoretical and empirical results suggest that the generalization error of ReLU feed-forward neural networks scales at the rate $1/\sqrt{n}$ in the sample size $n$ rather than the usual "parametric rate" $1/n$. Thus, broadly speaking, our results underpin the common belief that neural networks need "many" training samples.
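
The claimed $1/\sqrt{n}$ rate suggests a simple empirical sanity check (not taken from the paper itself): train a ReLU network at several sample sizes $n$ and regress $\log(\text{test error})$ on $\log n$. Below is a minimal sketch using scikit-learn's MLPRegressor; the target function, network width, noise level, and sample sizes are illustrative assumptions rather than the paper's experimental setup, so the fitted exponent is only a rough diagnostic: a slope near $-0.5$ points to $1/\sqrt{n}$ scaling, while a slope near $-1$ would indicate the parametric $1/n$ rate.

```python
# Sketch (illustrative assumptions, not the paper's experiments): fit a small
# ReLU network at several sample sizes n and estimate the empirical scaling
# exponent of the generalization error on a log-log scale.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
d = 5  # input dimension (arbitrary choice for this sketch)

def target(X):
    # A smooth regression function; the paper's function class may differ.
    return np.sin(X @ np.ones(d)) + 0.5 * (X[:, 0] ** 2)

def test_error(n, noise=0.1, n_test=5000):
    # Training data with additive noise.
    X = rng.uniform(-1, 1, size=(n, d))
    y = target(X) + noise * rng.standard_normal(n)
    # Noiseless test targets: measures estimation error of the fitted network.
    X_te = rng.uniform(-1, 1, size=(n_test, d))
    y_te = target(X_te)
    net = MLPRegressor(hidden_layer_sizes=(64, 64), activation="relu",
                       alpha=1e-4, max_iter=2000, random_state=0)
    net.fit(X, y)
    return np.mean((net.predict(X_te) - y_te) ** 2)

ns = np.array([200, 500, 1000, 2000, 5000, 10000])
errs = np.array([test_error(int(n)) for n in ns])

# Slope of log(error) vs log(n): roughly -0.5 would support a 1/sqrt(n) rate.
slope, _ = np.polyfit(np.log(ns), np.log(errs), 1)
print(f"estimated scaling exponent: {slope:.2f}")
```

Evaluating on noiseless test targets isolates the network's estimation error from the irreducible noise, which keeps the log-log slope interpretable as a scaling exponent.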
