  How many samples are needed to train a deep neural network? (2405.16696v1)
    Published 26 May 2024 in math.ST, stat.ML, and stat.TH
  
  Abstract: Neural networks have become standard tools in many areas, yet many important statistical questions remain open. This paper studies the question of how much data are needed to train a ReLU feed-forward neural network. Our theoretical and empirical results suggest that the generalization error of ReLU feed-forward neural networks scales at the rate $1/\sqrt{n}$ in the sample size $n$ rather than the usual "parametric rate" $1/n$. Thus, broadly speaking, our results underpin the common belief that neural networks need "many" training samples.
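The abstract's central claim is about how the generalization error decays with the sample size $n$: roughly $1/\sqrt{n}$ rather than the parametric $1/n$. Below is a minimal sketch of how one might probe such a scaling empirically; it is not the paper's experiment. It uses scikit-learn's MLPRegressor on a toy regression problem, and the target function, network widths, and sample sizes are illustrative assumptions. The decay exponent is estimated by a least-squares fit of log(error) against log(n).

```python
# Minimal sketch (not the paper's experimental setup): estimate the exponent b
# in test_error ~ C * n^(-b) for a small ReLU feed-forward network by training
# at several sample sizes and fitting a line on a log-log scale.
# The data-generating function, widths, and sample sizes are illustrative.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def make_data(n, d=5, noise=0.1):
    # Toy target: smooth nonlinear function of a few coordinates plus noise.
    X = rng.uniform(-1, 1, size=(n, d))
    y = np.sin(3 * X[:, 0]) + X[:, 1] * X[:, 2] + noise * rng.standard_normal(n)
    return X, y

sample_sizes = [200, 400, 800, 1600, 3200]
test_X, test_y = make_data(20_000, noise=0.0)   # noiseless targets for clean error
errors = []
for n in sample_sizes:
    X, y = make_data(n)
    net = MLPRegressor(hidden_layer_sizes=(64, 64), activation="relu",
                       max_iter=2000, random_state=0)
    net.fit(X, y)
    errors.append(np.mean((net.predict(test_X) - test_y) ** 2))

# Slope of log(error) vs log(n): an exponent near 0.5 is consistent with a
# 1/sqrt(n) rate, while an exponent near 1 would point to the parametric 1/n rate.
slope, _ = np.polyfit(np.log(sample_sizes), np.log(errors), 1)
print(f"estimated decay exponent: {-slope:.2f}")
```

In practice, optimization error and the choice of architecture also affect the measured curve, so such a fit only suggests, rather than proves, a statistical rate.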