The Expressive Power of Neural Networks: A View from the Width (1709.02540v3)

Published 8 Sep 2017 in cs.LG

Abstract: The expressive power of neural networks is important for understanding deep learning. Most existing works consider this problem from the view of the depth of a network. In this paper, we study how width affects the expressiveness of neural networks. Classical results state that depth-bounded (e.g. depth-$2$) networks with suitable activation functions are universal approximators. We show a universal approximation theorem for width-bounded ReLU networks: width-$(n+4)$ ReLU networks, where $n$ is the input dimension, are universal approximators. Moreover, except for a measure zero set, all functions cannot be approximated by width-$n$ ReLU networks, which exhibits a phase transition. Several recent works demonstrate the benefits of depth by proving the depth-efficiency of neural networks. That is, there are classes of deep networks which cannot be realized by any shallow network whose size is no more than an exponential bound. Here we pose the dual question on the width-efficiency of ReLU networks: Are there wide networks that cannot be realized by narrow networks whose size is not substantially larger? We show that there exist classes of wide networks which cannot be realized by any narrow network whose depth is no more than a polynomial bound. On the other hand, we demonstrate by extensive experiments that narrow networks whose size exceed the polynomial bound by a constant factor can approximate wide and shallow network with high accuracy. Our results provide more comprehensive evidence that depth is more effective than width for the expressiveness of ReLU networks.

Authors (5)
  1. Zhou Lu (22 papers)
  2. Hongming Pu (5 papers)
  3. Feicheng Wang (4 papers)
  4. Zhiqiang Hu (48 papers)
  5. Liwei Wang (240 papers)
Citations (836)

Summary

  • The paper establishes a Universal Approximation Theorem showing that width-(n+4) ReLU networks can approximate any Lebesgue-integrable function.
  • It identifies a phase transition in expressiveness, proving that networks with width at most n fail to approximate nearly all functions.
  • The study provides a polynomial lower bound for width efficiency and validates its claims with experiments demonstrating high approximation accuracy.

The Expressive Power of Neural Networks: A View from the Width

The paper "The Expressive Power of Neural Networks: A View from the Width" by Zhou Lu, Hongming Pu, Feicheng Wang, Zhiqiang Hu, and Liwei Wang provides a thorough investigation into the impact of network width on the expressiveness of neural networks, complementing the body of work that has traditionally focused on network depth.

Universal Approximation Theorem for Width-Bounded ReLU Networks

The authors establish a Universal Approximation Theorem for width-bounded ReLU networks. Specifically, they show that width-$(n+4)$ ReLU networks, where $n$ is the input dimension, are universal approximators: such networks can approximate any Lebesgue-integrable function on $n$-dimensional space to within an arbitrary $L^1$ distance. This result is particularly noteworthy because it contrasts with the classical universal approximation theorem, which addresses depth-bounded networks.
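Paraphrasing the statement in the notation of the abstract (a summary, not the paper's exact theorem wording): for any Lebesgue-integrable function $f: \mathbb{R}^n \to \mathbb{R}$ and any $\varepsilon > 0$, there exists a fully-connected ReLU network $F$ whose hidden layers each have width at most $n+4$ such that

$$\int_{\mathbb{R}^n} \lvert f(x) - F(x) \rvert \, dx < \varepsilon.$$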

The proof, albeit intricate, is constructive: the network architecture is assembled from a series of blocks, each of which approximates a function with specific properties, and these block-wise approximations are then summed. This explicit construction demonstrates that width-$(n+4)$ ReLU networks offer universal approximation capabilities.
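For a concrete picture of the shape such a network takes, the sketch below (PyTorch, purely illustrative) stacks generic ReLU layers of width $n+4$; it reproduces only the width constraint of the theorem, not the paper's block-by-block construction, and the depth is an arbitrary choice.

```python
import torch
import torch.nn as nn

def narrow_relu_net(n: int, depth: int) -> nn.Sequential:
    """Fully-connected ReLU network whose hidden layers all have width n + 4.

    This only mirrors the width constraint from the theorem; the paper's
    actual construction assembles special-purpose blocks, which we do not
    replicate here.
    """
    width = n + 4
    layers = [nn.Linear(n, width), nn.ReLU()]      # input layer: R^n -> R^(n+4)
    for _ in range(depth - 1):                     # hidden layers of width n + 4
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers.append(nn.Linear(width, 1))             # scalar output
    return nn.Sequential(*layers)

# Example: a depth-16 network on 3-dimensional inputs (width 7 throughout).
net = narrow_relu_net(n=3, depth=16)
y = net(torch.randn(8, 3))  # batch of 8 points in R^3 -> output shape (8, 1)
```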

Phase Transition in Expressive Power

One key finding is the identification of a phase transition in the expressiveness of ReLU networks at a critical width equal to the input dimension. For networks with width no greater than $n$, the paper proves that all functions, except for a measure-zero set, cannot be approximated in the $L^1$ sense. This starkly contrasts with the ability of width-$(n+4)$ networks to approximate any integrable function. The theoretical underpinning employs geometric constructions and linear-algebraic arguments to establish this transition.

Width Efficiency of ReLU Networks

The paper also rigorously explores width efficiency, the dual of depth efficiency: Are there wide networks that cannot be realized by narrow networks whose size is not substantially larger? The authors prove a polynomial lower bound, showing that there exist classes of wide networks that cannot be realized by any narrow network whose depth is no more than a polynomial bound. For the networks studied, this suggests that while depth efficiency enjoys exponential lower bounds, the bound established for width is only polynomial. The authors explicitly pose the open problem of whether an exponential lower bound or a polynomial upper bound can be proven for width efficiency; either result would significantly advance the theory in this area.
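To make the size comparison tangible, a small helper for counting parameters of fully-connected networks is sketched below; the widths and depths plugged in are arbitrary illustrative numbers, not the bounds proven in the paper.

```python
def mlp_param_count(layer_widths):
    """Number of weights and biases in a fully-connected network
    with the given sequence of layer widths (input, hidden..., output)."""
    return sum(w_in * w_out + w_out
               for w_in, w_out in zip(layer_widths, layer_widths[1:]))

n = 16  # input dimension (illustrative)

# A wide, shallow network versus a narrow, much deeper one.
# These sizes are chosen only for illustration; the paper's theorem concerns
# whether such a trade-off is possible at all, with a polynomial bound on the
# depth the narrow network would need.
wide_shallow = [n] + [256] * 2 + [1]
narrow_deep = [n] + [n + 4] * 40 + [1]

print(mlp_param_count(wide_shallow))  # ~70k parameters
print(mlp_param_count(narrow_deep))   # ~17k parameters
```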

Experimental Validation and Implications

Extensive experiments complement the theoretical results by demonstrating that narrow networks whose size exceeds the polynomial bound by a constant factor can approximate wide, shallow networks with high accuracy. Wide networks representing various functions are constructed, and narrower but deeper networks are trained to approximate them. The empirical results support the claim that depth is more effective than width for the expressiveness of ReLU networks.
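A minimal sketch of this style of experiment is given below, under the following assumptions (which may differ from the paper's actual setup): a randomly initialized wide, shallow "teacher" network serves as the target, and a narrow but deeper "student" is trained by mean-squared-error regression on uniformly sampled inputs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n = 8  # input dimension (illustrative)

# Wide, shallow target network with fixed random weights.
teacher = nn.Sequential(nn.Linear(n, 512), nn.ReLU(), nn.Linear(512, 1))
for p in teacher.parameters():
    p.requires_grad_(False)

# Narrow but deeper network that tries to approximate the teacher.
width, depth = n + 4, 12
layers = [nn.Linear(n, width), nn.ReLU()]
for _ in range(depth - 1):
    layers += [nn.Linear(width, width), nn.ReLU()]
layers.append(nn.Linear(width, 1))
student = nn.Sequential(*layers)

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(5000):
    x = torch.rand(256, n) * 2 - 1      # sample inputs uniformly from [-1, 1]^n
    loss = loss_fn(student(x), teacher(x))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Report approximation error on fresh samples.
with torch.no_grad():
    x = torch.rand(4096, n) * 2 - 1
    print("test MSE:", loss_fn(student(x), teacher(x)).item())
```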

Theoretical and Practical Implications

From a practical standpoint, this work informs neural network architecture design choices, emphasizing that while increasing depth often enhances expressiveness, the width also plays a crucial role, especially for certain classes of functions. From a theoretical perspective, the results contribute to a more nuanced understanding of neural networks' expressiveness, highlighting the phase transition and efficiency paradigms under the dimensionality constraints.

Speculations on Future AI Developments

Looking ahead, the implications of this research suggest several exciting avenues for future developments in AI. Understanding the intricate balance between depth and width could lead to the design of more efficient architectures that harness both dimensions' strengths. Moreover, the posed open problems invite further investigation, potentially leading to new theoretical breakthroughs that refine our comprehension of neural networks' capabilities.

In conclusion, "The Expressive Power of Neural Networks: A View from the Width" significantly enhances our understanding of neural networks by examining their expressiveness from a width perspective. It presents profound theoretical contributions, backed by robust empirical evidence, thereby influencing both the theoretical landscape and practical applications in neural network design.