- The paper proves that deep narrow networks of width n+m+2, with any nonaffine continuous activation function that is continuously differentiable at at least one point with nonzero derivative there, can approximate any continuous function on compact sets.
- It extends universal approximation to nonaffine polynomial activations, a capability that shallow wide networks provably lack.
- The study also explores special cases like nowhere differentiable activations and function approximation on noncompact domains, underlining design versatility.
Universal Approximation with Deep Narrow Networks
This paper examines the Universal Approximation Theorem for neural networks of bounded width and arbitrary depth, often described as deep narrow networks. The principal contribution is a proof that networks of arbitrary depth and width n+m+2, whose activation is any nonaffine continuous function that is continuously differentiable at at least one point with nonzero derivative there, can approximate any continuous function on compact subsets of R^n with respect to the supremum norm. The result is significant because it covers essentially all practical activation functions, including polynomials, establishing a qualitative distinction between deep narrow and shallow wide neural networks.
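To make the architecture concrete, below is a minimal PyTorch sketch of a deep narrow network with fixed hidden width n+m+2, for hypothetical dimensions n=2 and m=1. The depth and the tanh activation are illustrative choices only (tanh is smooth and nonpolynomial, so the paper's refinements in fact allow width n+m+1 for it); this shows the shape of the architecture the theorem refers to, not the explicit construction used in the proof.

```python
# Minimal sketch of a deep narrow network with fixed hidden width n + m + 2.
# Hypothetical dimensions: n = 2 inputs, m = 1 output, so hidden width = 5.
import torch
import torch.nn as nn

n, m, depth = 2, 1, 16          # input dim, output dim, number of hidden layers
width = n + m + 2               # fixed hidden width from the general theorem

layers = [nn.Linear(n, width), nn.Tanh()]   # tanh stands in for any admissible activation
for _ in range(depth - 1):
    layers += [nn.Linear(width, width), nn.Tanh()]
layers += [nn.Linear(width, m)]

net = nn.Sequential(*layers)

x = torch.randn(8, n)           # a batch of 8 points in R^n
print(net(x).shape)             # -> torch.Size([8, 1])
```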
The traditional Universal Approximation Theorem concerns shallow networks: a single hidden layer of arbitrary width, which can approximate any continuous function provided the activation is nonpolynomial. The authors address the complementary regime of narrow deep networks and establish the universal approximation property for a broad class of activation functions.
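For reference, the classical arbitrary-width statement can be written as follows: for a continuous activation ρ, single-hidden-layer networks are dense in C(K) for every compact K ⊂ R^n exactly when ρ is nonpolynomial (Leshno et al., 1993; Pinkus, 1999):

$$
\overline{\mathrm{span}}\,\{\, x \mapsto \rho(w \cdot x + b) \;:\; w \in \mathbb{R}^n,\ b \in \mathbb{R} \,\} = C(K)
\quad\Longleftrightarrow\quad
\rho \ \text{is not a polynomial.}
$$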
Main Contributions and Results
- Nonpolynomial Activation Functions: For activation functions that are continuous, nonpolynomial, and continuously differentiable at at least one point with nonzero derivative there, the networks are dense in C(K; R^m) for compact K, with a reduced width of n+m+1.
- Polynomial Activation Functions: A notable contribution is the extension of universal approximation to nonaffine polynomial activation functions, with width n+m+2. No shallow network achieves this, since a single hidden layer with polynomial activation can only realize polynomials of bounded degree; a brief intuition sketch follows this list.
- Additional Special Cases: The research expands the scope to:
- Nowhere differentiable functions: By constructing specific examples, the authors show that networks with certain nowhere differentiable activation functions also exhibit universal approximation.
- Noncompact Domains: For the ReLU activation function, the networks can approximate functions in L^p(R^n; R^m), underscoring the framework's extensibility beyond compact domains.
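The degree-growth intuition behind the polynomial case (referenced in the list above) can be made concrete with a few lines of code. The snippet below uses a hypothetical quadratic activation s(x) = x^2: a single hidden layer of any width applied to affine maps of the input produces only a degree-2 polynomial, whereas composing L such layers reaches degree 2^L. This is an intuition aid only, not the paper's actual construction.

```python
# Degree growth under composition with a hypothetical quadratic activation s(x) = x**2.
# A single hidden layer (any width) applied to affine maps of x stays at degree <= 2,
# whereas composing `depth` such layers reaches degree 2**depth.
import numpy as np

def shallow(x, weights, biases, out_weights):
    # sum_i c_i * (w_i * x + b_i)**2  -- always a degree-2 polynomial in x
    return sum(c * (w * x + b) ** 2
               for w, b, c in zip(weights, biases, out_weights))

def deep(x, depth):
    # repeated squaring: x -> x**2 -> x**4 -> ... -> x**(2**depth)
    for _ in range(depth):
        x = x ** 2
    return x

x = np.linspace(0.0, 1.5, 5)
print(shallow(x, [1.0, 2.0], [0.0, -1.0], [0.5, 0.5]))  # gentle degree-2 curve
print(deep(x, 4))                                        # x**16: much faster growth
```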
Implications and Future Directions
The findings underscore the versatility of network design, suggesting that added depth can compensate for constrained width without sacrificing approximation power. The practical implications are significant for designing efficient network architectures, especially in resource-constrained environments.
Theoretically, these results pave the way for further investigations into how different architectural choices in neural networks influence their computational properties. Future work might explore the precise trade-offs between width and depth concerning depth-efficiency, training complexity, and generalization performance across various tasks. Additionally, understanding how activation function choice interacts with network capacity remains an open area, suggesting further exploration into less conventional functions might yield novel insights or applications.
In conclusion, this paper extends the landscape of the Universal Approximation Theorem by encompassing a wider variety of activation functions in deep narrow networks, affirming these networks' robust approximation capabilities and opening new avenues for research in neural network theory and practice.