Expressivity and Approximation Properties of Deep Neural Networks with ReLU$^k$ Activation (2312.16483v2)

Published 27 Dec 2023 in cs.LG, cs.NA, cs.NE, and math.NA

Abstract: In this paper, we investigate the expressivity and approximation properties of deep neural networks employing the ReLU$^k$ activation function for $k \geq 2$. Although deep ReLU networks can approximate polynomials effectively, deep ReLU$^k$ networks can represent higher-degree polynomials exactly. Our first contribution is a comprehensive, constructive proof of polynomial representation by deep ReLU$^k$ networks, which yields an upper bound on both the size and the number of network parameters. As a consequence, we establish a suboptimal approximation rate for functions from Sobolev spaces as well as for analytic functions. Additionally, by examining how deep ReLU$^k$ networks represent shallow networks, we show that deep ReLU$^k$ networks can approximate functions from a range of variation spaces, extending beyond those generated solely by the ReLU$^k$ activation function. This finding demonstrates the adaptability of deep ReLU$^k$ networks for approximating functions in various variation spaces.
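The exact polynomial representation rests on an elementary identity: writing $\sigma_k(x) = \max(0,x)^k$, one has $\sigma_2(x) + \sigma_2(-x) = x^2$, and the polarization identity $xy = \tfrac{1}{4}\big((x+y)^2 - (x-y)^2\big)$ turns squares into products, so compositions of ReLU$^2$ layers can build higher-degree monomials. The snippet below is only a minimal numerical check of these two identities, not the paper's construction (which additionally bounds the network size and parameter count).

```python
import numpy as np

def relu_k(x, k=2):
    """ReLU^k activation: max(0, x) raised to the power k."""
    return np.maximum(0.0, x) ** k

# A two-neuron ReLU^2 layer reproduces x^2 exactly:
#   relu_2(x) + relu_2(-x) = x^2  for every real x.
x = np.linspace(-3.0, 3.0, 101)
assert np.allclose(relu_k(x) + relu_k(-x), x ** 2)

# Products follow from the polarization identity
#   x * y = ((x + y)^2 - (x - y)^2) / 4,
# so compositions of such layers can build higher-degree monomials.
y = np.linspace(0.5, 2.5, 101)
xy = (relu_k(x + y) + relu_k(-(x + y))
      - relu_k(x - y) - relu_k(-(x - y))) / 4.0
assert np.allclose(xy, x * y)

print("ReLU^2 units reproduce x^2 and x*y exactly on the test grid.")
```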
