Universal Consistency of Wide and Deep ReLU Neural Networks and Minimax Optimal Convergence Rates for Kolmogorov-Donoho Optimal Function Classes (2401.04286v2)

Published 8 Jan 2024 in stat.ML and cs.LG

Abstract: In this paper, we prove the universal consistency of wide and deep ReLU neural network classifiers trained on the logistic loss. We also give sufficient conditions for a class of probability measures for which classifiers based on neural networks achieve minimax optimal rates of convergence. The result applies to a wide range of known function classes. In particular, while most previous works impose explicit smoothness assumptions on the regression function, our framework encompasses more general settings. The proposed neural networks are either the minimizers of the logistic loss or the $0$-$1$ loss. In the former case, they are interpolating classifiers that exhibit a benign overfitting behavior.
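To make the setting concrete, here is a minimal illustrative sketch (not the paper's actual construction or proof device) of the object the abstract discusses: a wide one-hidden-layer ReLU network trained by gradient descent on the logistic loss $\phi(z) = \log(1 + e^{-z})$ for labels $y \in \{-1, +1\}$, with the resulting classifier $\mathrm{sign}(f(x))$ evaluated under the $0$-$1$ loss. All names, the toy data, and the hyperparameters below are assumptions chosen for the demo.

```python
import numpy as np

# Hedged demo: a wide one-hidden-layer ReLU network fit by plain gradient
# descent on the logistic loss; this is only an illustration of the kind of
# classifier the paper analyzes, not its theoretical construction.
rng = np.random.default_rng(0)

# Toy separable data in R^2 with labels y in {-1, +1}.
n = 200
X = rng.normal(size=(n, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)

# "Wide" hidden layer (width chosen arbitrarily for the demo).
width = 512
W1 = rng.normal(size=(2, width)) / np.sqrt(2)
b1 = np.zeros(width)
w2 = rng.normal(size=width) / np.sqrt(width)

def forward(X):
    h = np.maximum(X @ W1 + b1, 0.0)   # ReLU activations
    return h, h @ w2                   # hidden layer, real-valued score f(x)

lr = 0.5
for _ in range(800):
    h, f = forward(X)
    # Gradient of the mean logistic loss w.r.t. the scores:
    # d/df log(1 + exp(-y f)) = -y / (1 + exp(y f))
    g = -y / (1.0 + np.exp(y * f)) / n
    w2 -= lr * (h.T @ g)
    dh = np.outer(g, w2) * (h > 0)     # backprop through the ReLU
    W1 -= lr * (X.T @ dh)
    b1 -= lr * dh.sum(axis=0)

# The induced classifier is sign(f); its 0-1 training error:
_, f = forward(X)
train_err = np.mean(np.sign(f) != y)
```

Because the logistic loss keeps pushing the margins $y f(x)$ upward on separable data, the 0-1 training error of `sign(f)` is driven toward zero, which is the interpolating, benign-overfitting regime the abstract refers to.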
