
ResNet with one-neuron hidden layers is a Universal Approximator (1806.10909v2)

Published 28 Jun 2018 in cs.LG and stat.ML

Abstract: We demonstrate that a very deep ResNet with stacked modules with one neuron per hidden layer and ReLU activation functions can uniformly approximate any Lebesgue integrable function in $d$ dimensions, i.e. $\ell_1(\mathbb{R}^d)$. Because of the identity mapping inherent to ResNets, our network has alternating layers of dimension one and $d$. This stands in sharp contrast to fully connected networks, which are not universal approximators if their width is the input dimension $d$ [Lu et al., 2017; Hanin and Sellke, 2017]. Hence, our result implies an increase in representational power for narrow deep networks by the ResNet architecture.

Citations (216)

Summary

  • The paper demonstrates that deep ResNets with one-neuron hidden layers achieve universal approximation for Lebesgue-integrable functions using identity mappings and ReLU activations.
  • It provides rigorous proofs by constructing ResNets that approximate piecewise constant functions through shifting, min/max operations, and induction across dimensions.
  • The study explains why narrow ResNets are universal approximators while fully connected networks of the same width are not, pointing to the ResNet architecture as a width-efficient design for robust function approximation.

ResNet with One-Neuron Hidden Layers as a Universal Approximator

The paper "ResNet with one-neuron hidden layers is a Universal Approximator" by Hongzhou Lin and Stefanie Jegelka presents a novel exploration of the universal approximation capabilities of deep ResNet architectures. This investigation articulates the conditions under which a narrow, multilayered ResNet structure demonstrates the ability to approximate any Lebesgue-integrable function. This work stands in contrast to previous results that deemed fully connected networks with equivalent dimensions as not being universal approximators.

Summary and Contributions

The authors aim to settle a longstanding question regarding the depth and width required for universal function approximation in neural networks, focusing on residual networks (ResNets) with ReLU activation functions. The key finding is that deep ResNets with one neuron per hidden layer, so that layer dimensions alternate between one and $d$, achieve universal approximation, unlike analogous fully connected architectures, which lose this property when their width equals the input dimension $d$.
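To make the architecture concrete, the following is a minimal NumPy sketch of the kind of building block the paper analyzes: a residual block whose hidden layer contains a single ReLU unit, so the representation has dimension one inside the block and dimension $d$ on the skip path. The parameterization and the names `u`, `b`, `v` are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def one_neuron_res_block(x, u, b, v):
    """Residual block with a single hidden ReLU unit.

    x : (d,) input vector
    u : (d,) weights of the one-unit hidden layer
    b : scalar bias
    v : (d,) weights mapping the hidden unit back to R^d

    Returns x + v * ReLU(u . x + b), i.e. a map from R^d to R^d
    whose hidden layer has dimension one.
    """
    hidden = max(0.0, float(np.dot(u, x) + b))  # ReLU applied to one scalar
    return x + v * hidden                       # identity (skip) connection

def narrow_resnet(x, params):
    """Stack many one-neuron blocks; params is a list of (u, b, v) triples."""
    for u, b, v in params:
        x = one_neuron_res_block(x, u, b, v)
    return x
```

Under the paper's result, a sufficiently deep stack of such one-neuron blocks suffices to approximate any $\ell_1(\mathbb{R}^d)$ function to arbitrary accuracy.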

Two cornerstone contributions highlight this paper:

  1. Theoretical Proof of Universal Approximation: Through rigorous mathematical proofs, the paper demonstrates that ResNets with a single neuron per hidden layer, coupled with identity mappings, can approximate any function in $\ell_1(\mathbb{R}^d)$ to arbitrary accuracy as the network depth increases.
  2. Comparison Against Fully Connected Networks: A comparative analysis clarifies why narrow ResNets are universal approximators while fully connected networks of the same width are not. The identity mappings in ResNets increase representational power, circumventing a limitation that fully connected networks face even when depth is unrestricted.

Methodology and Theoretical Insights

The central step is constructing a ResNet that approximates piecewise constant functions, which serve as building blocks for approximating more general functions in higher dimensions. These constructions, referred to in the paper as grid indicator functions, rely on the ReLU non-linearity and on the identity mapping inherent in ResNets. The proof rests on a few basic operations that such networks can realize:

  • The ability to shift functions by constants.
  • Using min and max operations to trim level sets and adjust the slopes of the functions constructed so far (see the sketch after this list).
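As an illustration of the second point, min and max can each be expressed with a single ReLU, which is why a network built from one-neuron ReLU blocks can realize them. These identities are standard and are shown here only as a sketch, not as the paper's exact construction.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def max_via_relu(a, b):
    # max(a, b) = a + ReLU(b - a)
    return a + relu(b - a)

def min_via_relu(a, b):
    # min(a, b) = a - ReLU(a - b)
    return a - relu(a - b)

# Quick sanity check
assert max_via_relu(2.0, 5.0) == 5.0
assert min_via_relu(2.0, 5.0) == 2.0
```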

The proof builds up from one-dimensional approximations to higher dimensions by induction, carefully managing the construction despite the one-neuron width constraint.
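For intuition about the one-dimensional starting point, the indicator of an interval $[a, b]$ can be approximated by a trapezoid built from four ReLUs. This is a standard construction, included only to illustrate the kind of piecewise linear pieces from which grid indicators are assembled; the parameters below are illustrative, not taken from the paper.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def soft_indicator(x, a, b, delta):
    """Piecewise linear approximation of the indicator of [a, b].

    Equals 1 on [a + delta, b - delta], 0 outside [a, b],
    and ramps linearly (slope 1/delta) in between.
    Requires b - a > 2 * delta.
    """
    return (relu(x - a) - relu(x - (a + delta))
            - relu(x - (b - delta)) + relu(x - b)) / delta

x = np.linspace(-1.0, 2.0, 13)
print(soft_indicator(x, a=0.0, b=1.0, delta=0.1))
```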

Implications and Future Directions

The implications of this work are twofold. Theoretically, it broadens the architectural options for designing neural networks capable of universal function representation. Practically, it supports using narrow but deep networks in scenarios that demand high-fidelity approximation without increasing network width, lending theoretical backing to the trend of deepening networks rather than merely broadening them.

Prospectively, further research could develop learning algorithms tailored to such narrow networks and extend these findings to practical, high-dimensional, real-world datasets. A more detailed empirical comparison of ResNets and fully connected networks, relating network complexity to computational cost and approximation accuracy, would also be valuable.

To summarize, the paper presents a significant advancement in understanding the universal approximation capabilities of deep ResNets, contrasting with conventional wisdom regarding neural network architecture design. It sets a solid foundation for subsequent innovations in neural network design and application, particularly in deploying structurally efficient models capable of robust function approximation.