Universal Approximation with Deep Narrow Networks (1905.08539v2)

Published 21 May 2019 in cs.LG, math.CA, and stat.ML

Abstract: The classical Universal Approximation Theorem holds for neural networks of arbitrary width and bounded depth. Here we consider the natural 'dual' scenario for networks of bounded width and arbitrary depth. Precisely, let $n$ be the number of input neurons, $m$ be the number of output neurons, and let $\rho$ be any nonaffine continuous function, with a continuous nonzero derivative at some point. Then we show that the class of neural networks of arbitrary depth, width $n + m + 2$, and activation function $\rho$, is dense in $C(K; \mathbb{R}^m)$ for $K \subseteq \mathbb{R}^n$ with $K$ compact. This covers every activation function possible to use in practice, and also includes polynomial activation functions, which is unlike the classical version of the theorem, and provides a qualitative difference between deep narrow networks and shallow wide networks. We then consider several extensions of this result. In particular we consider nowhere differentiable activation functions, density in noncompact domains with respect to the $L^p$-norm, and how the width may be reduced to just $n + m + 1$ for 'most' activation functions.

Citations (289)

Summary

  • The paper proves that deep narrow networks with width n+m+2 and nonaffine activation functions can approximate any continuous function on compact sets.
  • It extends universal approximation to polynomial activations, offering a distinct advantage over traditional shallow wide networks.
  • The study also explores special cases like nowhere differentiable activations and function approximation on noncompact domains, underlining design versatility.

Universal Approximation with Deep Narrow Networks

This paper examines the Universal Approximation Theorem in the context of neural networks with bounded width and arbitrary depth, often described as deep narrow networks. The principal contribution is a proof that networks of arbitrary depth and width $n + m + 2$, using any nonaffine continuous activation function with a continuous nonzero derivative at some point, can approximate any continuous function on compact subsets of $\mathbb{R}^n$ with respect to the supremum norm. This result is significant given its applicability to practical activation functions, including polynomial functions, creating a qualitative distinction between deep narrow and shallow wide neural networks.
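
As an illustration of the kind of architecture the theorem describes, the sketch below (assuming PyTorch) builds a network whose hidden layers all share the fixed width $n + m + 2$ and fits it to a toy target on a compact interval; the depth, activation, target function, and training loop are illustrative choices, not the paper's explicit construction.

```python
import torch
import torch.nn as nn

def deep_narrow_mlp(n, m, depth, activation=nn.Tanh):
    """Deep narrow MLP: every hidden layer has the fixed width n + m + 2."""
    width = n + m + 2
    layers = [nn.Linear(n, width), activation()]
    for _ in range(depth - 1):
        layers += [nn.Linear(width, width), activation()]
    layers.append(nn.Linear(width, m))
    return nn.Sequential(*layers)

# Toy check: approximate f(x) = sin(3x) on the compact set [-1, 1].
torch.manual_seed(0)
net = deep_narrow_mlp(n=1, m=1, depth=8)   # hidden width 1 + 1 + 2 = 4
x = torch.linspace(-1, 1, 512).unsqueeze(1)
y = torch.sin(3 * x)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    loss = ((net(x) - y) ** 2).mean()
    loss.backward()
    opt.step()

print(f"sup-norm error on the training grid: {(net(x) - y).abs().max().item():.3f}")
```

Increasing `depth` while keeping the width fixed is the regime the theorem speaks to; it guarantees that, for a suitable depth and choice of weights, the error can be made arbitrarily small, not that gradient descent will find those weights.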

The traditional Universal Approximation Theorem addresses shallow networks with a single hidden layer of arbitrary width, establishing that such networks can approximate any continuous function provided a nonpolynomial activation is employed. The authors study the complementary scenario of narrow deep networks and demonstrate the universal approximation property for a broad class of activation functions.
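
For reference, the shallow networks covered by the classical theorem are finite sums of the form $$x \;\mapsto\; \sum_{i=1}^{N} c_i\,\rho(w_i \cdot x + b_i), \qquad w_i \in \mathbb{R}^n,\ b_i \in \mathbb{R},\ c_i \in \mathbb{R}^m,$$ where the number of hidden units $N$ may grow without bound but there is only a single hidden layer; for continuous $\rho$, such sums are dense in $C(K; \mathbb{R}^m)$ exactly when $\rho$ is nonpolynomial.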

Main Contributions and Results

  1. Nonpolynomial Activation Functions: For continuous nonpolynomial activation functions that are continuously differentiable at at least one point with nonzero derivative there, the required width drops to $n + m + 1$, and the resulting networks are dense in $C(K; \mathbb{R}^m)$ for compact $K$.
  2. Polynomial Activation Functions: A notable achievement of this paper is extending universal approximation to networks using polynomial activation functions, a result not achievable with shallow networks. The theorem covers activation functions that are nonaffine polynomials and requires a width of $n + m + 2$ (see the degree-count note after this list).
  3. Additional Special Cases: The research expands the scope to:
    • Nowhere differentiable functions: By constructing specific examples, the authors show that networks with certain nowhere differentiable activation functions also exhibit universal approximation.
    • Noncompact Domains: They demonstrate that for the ReLU activation function, networks can approximate functions in $L^p(\mathbb{R}^n; \mathbb{R}^m)$, underscoring the framework’s extensibility beyond compact domains.
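
A quick degree count, separate from the paper's actual argument, shows why point 2 is special to depth: a single-hidden-layer network with a polynomial activation $\rho$ of degree $d$ computes $\sum_i c_i\,\rho(w_i \cdot x + b_i)$, which is itself a polynomial of degree at most $d$ in $x$, and a fixed-degree polynomial cannot approximate, say, $|x|$ on $[-1, 1]$ to arbitrary accuracy, so such shallow networks are never dense. Composition removes this cap: with $\rho(t) = t^2$, for example, $k$ hidden layers can realise polynomials of degree up to $2^k$, and the paper shows that a fixed width of $n + m + 2$ already suffices for deep networks with any nonaffine polynomial activation to be dense.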

Implications and Future Directions

The findings underscore the versatility of network design, suggesting that added depth can compensate for constraints on width without sacrificing approximation power. The practical implications are significant for designing efficient network architectures, especially in resource-constrained environments.

Theoretically, these results pave the way for further investigations into how different architectural choices in neural networks influence their computational properties. Future work might explore the precise trade-offs between width and depth concerning depth-efficiency, training complexity, and generalization performance across various tasks. Additionally, understanding how activation function choice interacts with network capacity remains an open area, suggesting further exploration into less conventional functions might yield novel insights or applications.

In conclusion, this paper extends the landscape of the Universal Approximation Theorem by encompassing a wider variety of activation functions in deep narrow networks, affirming these networks' robust approximation capabilities and opening new avenues for research in neural network theory and practice.
