Universal Function Approximation by Deep Neural Nets with Bounded Width and ReLU Activations (1708.02691v3)

Published 9 Aug 2017 in stat.ML, cs.CG, cs.LG, math.FA, math.ST, and stat.TH

Abstract: This article concerns the expressive power of depth in neural nets with ReLU activations and bounded width. We are particularly interested in the following questions: what is the minimal width $w_{\text{min}}(d)$ so that ReLU nets of width $w_{\text{min}}(d)$ (and arbitrary depth) can approximate any continuous function on the unit cube $[0,1]^d$ arbitrarily well? For ReLU nets near this minimal width, what can one say about the depth necessary to approximate a given function? Our approach in this paper is based on the observation that, due to the convexity of the ReLU activation, ReLU nets are particularly well-suited for representing convex functions. In particular, we prove that ReLU nets with width $d+1$ can approximate any continuous convex function of $d$ variables arbitrarily well. These results then give quantitative depth estimates for the rate of approximation of any continuous scalar function on the $d$-dimensional cube $[0,1]^d$ by ReLU nets with width $d+3$.

Authors (1)
  1. Boris Hanin (50 papers)
Citations (329)

Summary

Universal Function Approximation by Deep Neural Nets with Bounded Width and ReLU Activations

This paper by Boris Hanin explores the expressive capacity of deep neural networks that use Rectified Linear Unit (ReLU) activations while keeping the network width bounded. The analysis centers on the minimal width needed for a ReLU network of arbitrary depth to approximate any continuous function on the $d$-dimensional unit cube, and on the depth required to achieve such an approximation, with particular emphasis on convex functions.

Main Contributions

  1. Minimal Width for Universal Approximation: The paper studies the minimal width $w_{\text{min}}(d)$ required for ReLU networks of arbitrary depth to approximate any continuous function on the $d$-dimensional unit cube. It proves that width $d+1$ suffices for universal approximation of continuous convex functions, and that width $d+3$ suffices for general continuous functions (a construction sketch for the convex case follows this list).
  2. Depth Requirements: The paper gives quantitative estimates of the depth of ReLU networks needed for such approximations. For convex functions the construction uses width $d+1$; for general continuous functions, width-$(d+3)$ networks are analyzed, and the depth required for a given accuracy is expressed in terms of the function's Lipschitz constant (a rough depth-scaling sketch also follows this list).
  3. Piecewise Affine Function Representation: A key ingredient is the exact representation of piecewise affine functions. The paper proves that any such function can be computed exactly by a ReLU net with hidden-layer width at most $d+3$, with depth growing with the number of affine pieces of the function.
  4. Complexity and Capacity: The paper also examines the complexity of the functions such networks compute, showing that narrow but deep ReLU architectures retain expressive power comparable to that of wider ones.
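
To make the convex result concrete: a continuous convex function on $[0,1]^d$ can be approximated arbitrarily well by a maximum of finitely many affine functions, and such a maximum can be evaluated by a ReLU net of width $d+1$ using one hidden layer per affine piece. The NumPy sketch below is written for this summary (the function name and interface are illustrative, not the paper's code); it assumes the input lies in $[0,1]^d$ so that ReLU acts as the identity on the $d$ pass-through channels, and folds in one affine piece per layer via the identity $\max(a,b) = b + \mathrm{ReLU}(a - b)$.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def narrow_relu_max_of_affine(x, A, b):
    """Evaluate max_k l_k(x), with l_k(x) = A[k] @ x + b[k], as a width-(d+1) ReLU net.

    x : (d,) point in [0,1]^d (nonnegative coordinates, so ReLU copies them)
    A : (K, d) slopes of the affine pieces
    b : (K,)   intercepts

    Each hidden layer has d+1 units: d units copy x, and one extra unit holds
    ReLU(m_{k-1} - l_k(x)), where the running max m_{k-1} = max_{j<k} l_j(x)
    is recovered affinely from the previous layer as l_{k-1}(x) + (extra unit).
    """
    d, K = x.shape[0], A.shape[0]

    # Layer 1: copy x, initialise the extra channel to 0, so the running
    # max m_1 = l_1(x) is an affine function of the activations.
    h = np.concatenate([relu(x), [0.0]])
    prev_slope, prev_bias = A[0], b[0]

    for k in range(1, K):
        # Pre-activation of the extra unit: m_{k-1} - l_k(x), which is
        # affine in h because h[:d] == x and m_{k-1} = l_{k-1}(x) + h[d].
        pre = (prev_slope - A[k]) @ h[:d] + (prev_bias - b[k]) + h[d]
        h = np.concatenate([relu(h[:d]), [relu(pre)]])
        prev_slope, prev_bias = A[k], b[k]

    # Affine output layer: m_K = l_K(x) + ReLU(m_{K-1} - l_K(x)).
    return prev_slope @ h[:d] + prev_bias + h[d]

# Sanity check against a direct evaluation of the maximum.
rng = np.random.default_rng(0)
d, K = 3, 8
A, b, x = rng.normal(size=(K, d)), rng.normal(size=K), rng.uniform(size=d)
assert np.isclose(narrow_relu_max_of_affine(x, A, b), np.max(A @ x + b))
```

The depth of this toy network grows linearly in the number of affine pieces, which is the same mechanism behind the paper's exact bounded-width representation of general (not necessarily convex) piecewise affine functions.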

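The depth estimates for general continuous functions can be read through the following back-of-the-envelope scaling, which is only illustrative and does not reproduce the paper's exact constants. Assume $f$ is $L$-Lipschitz on $[0,1]^d$ and the target sup-norm accuracy is $\varepsilon$:

```latex
% Heuristic depth scaling for an L-Lipschitz f on [0,1]^d
% (illustrative constants only, not the paper's exact bound).
\[
  \underbrace{\delta \;\lesssim\; \frac{\varepsilon}{L\sqrt{d}}}_{\text{grid mesh for error }\varepsilon}
  \;\Longrightarrow\;
  \underbrace{N(\varepsilon)\;\approx\;\Big\lceil \tfrac{L\sqrt{d}}{\varepsilon}\Big\rceil^{d}}_{\text{affine pieces}}
  \;\Longrightarrow\;
  \operatorname{depth}(\varepsilon)\;\lesssim\; C(d)\,\Big(\frac{L}{\varepsilon}\Big)^{d},
\]
% since the bounded-width constructions spend O(1) layers per affine piece.
```

The exponential dependence on $d$ shows up in the depth rather than the width, which is consistent with the trade-off the paper quantifies: for fixed, near-minimal width, depth absorbs the difficulty of the approximation.
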
Theoretical Implications and Future Directions

The research advances the theoretical understanding of the expressive capacity of ReLU networks, emphasizing the role of depth when the width is held fixed. This has significant implications for the design of neural architectures in scenarios where hardware or architectural constraints limit network width but allow depth to be increased freely.

The results also pave the way for more sophisticated neural network architectures that exploit the capabilities of deep ReLU networks, particularly in areas requiring precise function approximation, such as optimization and control problems.

Practical Implications

For practitioners, the findings imply that it is feasible to design narrow but very deep networks that achieve universal function approximation, especially when dealing with convex functions. This is notably beneficial in resource-constrained environments, or when deploying models on hardware where memory is limited but the number of sequential computational steps is less restricted.

Conclusion

The paper contributes significant theoretical insights into the capabilities of ReLU-activated networks, offering precise conditions under which these networks can perform universal function approximation. It highlights the nuanced roles that width and depth play in network expressivity, suggesting directions for future exploration in both theoretical research and applied machine learning. Researchers and practitioners alike stand to gain a deeper appreciation for the intricacies of neural network design and the potential of bounded-width architectures.