Optimal rates of approximation by shallow ReLU$^k$ neural networks and applications to nonparametric regression (2304.01561v3)
Abstract: We study the approximation capacity of some variation spaces corresponding to shallow ReLU$^k$ neural networks. It is shown that sufficiently smooth functions are contained in these spaces with finite variation norms. For functions with less smoothness, approximation rates in terms of the variation norm are established. Using these results, we prove the optimal approximation rates in terms of the number of neurons for shallow ReLU$^k$ neural networks. We also show how these results can be used to derive approximation bounds for deep neural networks and convolutional neural networks (CNNs). As applications, we study convergence rates for nonparametric regression using three ReLU neural network models: shallow neural networks, over-parameterized neural networks, and CNNs. In particular, we show that shallow neural networks can achieve the minimax optimal rates for learning Hölder functions, which complements recent results for deep neural networks. It is also proven that over-parameterized (deep or shallow) neural networks can achieve nearly optimal rates for nonparametric regression.
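For orientation, a shallow ReLU$^k$ network with $N$ neurons takes the form below. This is a notational sketch consistent with the abstract's setup, not an excerpt from the paper; the symbols $f_\theta$, $a_i$, $w_i$, $b_i$ are generic, and $k=1$ recovers the ordinary ReLU:

$$
f_\theta(x) \;=\; \sum_{i=1}^{N} a_i\,\sigma_k\!\left(w_i^{\top}x + b_i\right), \qquad \sigma_k(t) = \max(t,0)^k.
$$

The minimax optimal rate for learning $\alpha$-Hölder functions on a $d$-dimensional domain from $n$ samples, which the abstract states shallow networks attain, is the classical rate of Stone (1982), cited in the reference list below:

$$
\inf_{\hat f}\,\sup_{f \in \mathcal{H}^{\alpha}} \mathbb{E}\,\big\|\hat f - f\big\|_{L^2}^{2} \;\asymp\; n^{-\frac{2\alpha}{2\alpha+d}}.
$$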
- A general approximation lower bound in $L^p$ norm, with applications to feed-forward neural networks. arXiv:2206.04360, 2022.
- Neural network learning: Theoretical foundations. Cambridge University Press, 2009.
- Francis Bach. Breaking the curse of dimensionality with convex neural networks. Journal of Machine Learning Research, 18(19):1–53, 2017.
- Andrew R. Barron. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 39(3):930–945, 1993.
- Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3:463–482, 2002.
- Spectrally-normalized margin bounds for neural networks. In Advances in Neural Information Processing Systems, pages 6240–6249. 2017.
- Nearly-tight VC-dimension and Pseudodimension bounds for piecewise linear neural networks. Journal of Machine Learning Research, 20(63):1–17, 2019.
- Understanding neural networks with reproducing kernel Banach spaces. Applied and Computational Harmonic Analysis, 62:194–236, 2023.
- On deep learning as a remedy for the curse of dimensionality in nonparametric regression. The Annals of Statistics, 47(4):2261–2285, 2019.
- Approximation of zonoids by zonotopes. Acta Mathematica, 162:73–141, 1989.
- George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2(4):303–314, 1989.
- Approximation theory and harmonic analysis on spheres and balls, volume 23. Springer, 2013.
- Constructive approximation, volume 303. Springer Science & Business Media, 1993.
- Optimal nonlinear approximation. Manuscripta Mathematica, 63(4):469–478, 1989.
- Zeev Ditzian. Measures of smoothness on the sphere. In Frontiers in Interpolation and Approximation, pages 75–91. Chapman and Hall/CRC, 2006.
- Theory of deep convolutional neural networks II: Spherical analysis. Neural Networks, 131:154–162, 2020.
- Charles Fefferman. Whitney’s extension problem for $C^m$. Annals of Mathematics, 164(1):313–359, 2006.
- Charles Fefferman. Extension of $C^{m,\omega}$-smooth functions by linear operators. Revista Matemática Iberoamericana, 25(1):1–48, 2009.
- Fitting smooth functions to data. American Mathematical Society, 2020.
- Size-independent sample complexity of neural networks. Information and Inference: A Journal of the IMA, 9(2):473–504, 2020.
- David Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100(1):78–150, 1992.
- Kurt Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257, 1991.
- Approximation bounds for norm constrained neural networks with applications to regression and GANs. Applied and Computational Harmonic Analysis, 65:249–278, 2023.
- Approximation by combinations of ReLU and squared ReLU ridge functions with $\ell^1$ and $\ell^0$ controls. IEEE Transactions on Information Theory, 64(12):7649–7656, 2018.
- Adaptive regression estimation with multilayer feedforward neural networks. Journal of Nonparametric Statistics, 17(8):891–913, 2005.
- On the rate of convergence of fully connected deep neural network regression estimates. The Annals of Statistics, 49(4):2231–2249, 2021.
- Deep learning. Nature, 521(7553):436–444, 2015.
- Probability in Banach spaces: isoperimetry and processes. Springer, 1991.
- Universal consistency of deep convolutional neural networks. IEEE Transactions on Information Theory, 68(7):4610–4617, 2022.
- Deep network approximation for smooth functions. SIAM Journal on Mathematical Analysis, 53(5):5465–5506, 2021.
- Uniform approximation rates and metric entropy of shallow neural networks. Research in the Mathematical Sciences, 9(3):46, 2022.
- On the degree of approximation by manifolds of finite pseudo-dimension. Constructive Approximation, 15(2):291–300, 1999.
- Yuly Makovoz. Random approximants and neural networks. Journal of Approximation Theory, 85(1):98–109, 1996.
- Rates of approximation by ReLU shallow neural networks. Preprint, 2023.
- Jiří Matoušek. Improved upper bounds for approximation by zonotopes. Acta Mathematica, 177(1):55–73, 1996.
- Convergence rates for single hidden layer feedforward networks. Neural Networks, 7(1):147–158, 1994.
- Hrushikesh N. Mhaskar. Neural networks for optimal approximation of smooth and analytic functions. Neural Computation, 8(1):164–177, 1996.
- Foundations of machine learning. MIT Press, 2018.
- Adaptive approximation and generalization of deep neural network with intrinsic dimensionality. Journal of Machine Learning Research, 21(174):1–38, 2020.
- Norm-based capacity control in neural networks. In Proceedings of the 28th Conference on Learning Theory, pages 1376–1401. 2015.
- A PAC-Bayesian approach to spectrally-normalized margin bounds for neural networks. In 6th International Conference on Learning Representations, 2018.
- A function space view of bounded norm infinite width ReLU nets: The multivariate case. In 8th International Conference on Learning Representations, 2020.
- What kinds of functions do deep neural networks learn? Insights from variational spline theory. SIAM Journal on Mathematics of Data Science, 4(2):464–489, 2022.
- Near-minimax optimal estimation with shallow ReLU neural networks. IEEE Transactions on Information Theory, 69(2):1125–1140, 2023.
- Equivalence of approximation by convolutional neural networks and fully-connected networks. Proceedings of the American Mathematical Society, 148(4):1567–1581, 2020.
- Allan Pinkus. Approximation theory of the MLP model in neural networks. Acta Numerica, 8:143–195, 1999.
- Gilles Pisier. Remarques sur un résultat non publié de B. Maurey. Séminaire d’Analyse Fonctionnelle (dit “Maurey-Schwartz”), pages 1–12, 1981.
- Kh. P. Rustamov. On equivalence of different moduli of smoothness on the sphere. Trudy Matematicheskogo Instituta im. V. A. Steklova, 204:274–304, 1993.
- How do infinite width bounded norm networks look in function space? In Proceedings of the 32nd Conference on Learning Theory, pages 2667–2690. 2019.
- Johannes Schmidt-Hieber. Nonparametric regression using deep neural networks with ReLU activation function. The Annals of Statistics, 48(4):1875–1897, 2020.
- Understanding machine learning: From theory to algorithms. Cambridge University Press, 2014.
- Deep network approximation characterized by number of neurons. Communications in Computational Physics, 28(5):1768–1811, 2020.
- Approximation rates for neural networks with general activation functions. Neural Networks, 128:313–321, 2020.
- Sharp bounds on the approximation rates, metric entropy, and n-widths of shallow neural networks. Foundations of Computational Mathematics, pages 1–57, 2022.
- Characterization of the variation spaces corresponding to shallow neural networks. Constructive Approximation, 2023.
- Charles J. Stone. Optimal global rates of convergence for nonparametric regression. The Annals of Statistics, 10(4):1040–1053, 1982.
- Martin J. Wainwright. High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge University Press, 2019.
- Information-theoretic determination of minimax rates of convergence. The Annals of Statistics, 27(5):1564–1599, 1999.
- Yunfei Yang. Learning distributions by generative adversarial networks: approximation and generalization. PhD thesis, The Hong Kong University of Science and Technology, 2022.
- Approximation in shift-invariant spaces with deep ReLU neural networks. Neural Networks, 153:269–281, 2022.
- Dmitry Yarotsky. Error bounds for approximations with deep ReLU networks. Neural Networks, 94:103–114, 2017.
- Dmitry Yarotsky. Optimal approximation of continuous functions by very deep ReLU networks. In Proceedings of the 31st Conference on Learning Theory, pages 639–649. 2018.
- Ding-Xuan Zhou. Theory of deep convolutional neural networks: Downsampling. Neural Networks, 124:319–327, 2020a.
- Ding-Xuan Zhou. Universality of deep convolutional neural networks. Applied and Computational Harmonic Analysis, 48(2):787–794, 2020b.