Universal Consistency of Wide and Deep ReLU Neural Networks and Minimax Optimal Convergence Rates for Kolmogorov-Donoho Optimal Function Classes (2401.04286v2)
Published 8 Jan 2024 in stat.ML and cs.LG
Abstract: In this paper, we prove the universal consistency of wide and deep ReLU neural network classifiers trained on the logistic loss. We also give sufficient conditions on a class of probability measures under which classifiers based on neural networks achieve minimax optimal rates of convergence. The result applies to a wide range of known function classes. In particular, while most previous works impose explicit smoothness assumptions on the regression function, our framework encompasses more general settings. The proposed neural networks are either the minimizers of the logistic loss or of the $0$-$1$ loss. In the former case, they are interpolating classifiers that exhibit benign overfitting.
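For orientation, the sketch below spells out the standard objects these claims are stated in terms of: the logistic surrogate loss, the $0$-$1$ risk, and the notion of universal consistency. The notation ($\phi$, $\widehat{R}^{\phi}_n$, $R^{*}$, $\widehat{f}_n$) is a common convention assumed here for illustration, not the paper's own definitions; the precise network widths, depths, and classes of measures are specified in the paper itself.

```latex
% A minimal sketch of the standard setup assumed here (not the paper's
% verbatim definitions): labels Y \in \{-1,+1\}, a real-valued network
% output f, and an i.i.d. sample (X_1, Y_1), ..., (X_n, Y_n).

% Logistic surrogate loss and its empirical risk:
\phi(z) = \log\bigl(1 + e^{-z}\bigr), \qquad
\widehat{R}^{\phi}_n(f) = \frac{1}{n} \sum_{i=1}^{n} \phi\bigl(Y_i f(X_i)\bigr)

% 0-1 risk of the induced classifier sign(f), and the Bayes risk:
R(f) = \mathbb{P}\bigl(\operatorname{sign} f(X) \neq Y\bigr), \qquad
R^{*} = \inf_{g \,\text{measurable}} R(g)

% Universal consistency: for every distribution of (X, Y), the trained
% networks' excess 0-1 risk vanishes as the sample size grows:
R(\widehat{f}_n) - R^{*} \;\longrightarrow\; 0 \quad (n \to \infty)
```

Minimax optimality then means that, over the stated class of probability measures, the worst-case rate at which $R(\widehat{f}_n) - R^{*}$ converges to zero cannot be improved by any classifier, up to constant factors.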