- Using the ridgelet transform, the paper establishes that neural networks with unbounded activation functions such as ReLU satisfy the universal approximation property.
- It employs rigorous harmonic analysis and distribution theory to derive reconstruction formulas for functions in L¹ and L² spaces.
- The findings offer actionable insights for optimizing neural network architectures and guiding the design of future activation functions.
Neural Networks with Unbounded Activation Functions as Universal Approximators
The paper by Sho Sonoda and Noboru Murata presents a thorough investigation into the approximation capabilities of neural networks utilizing unbounded activation functions. The focus is on activation functions such as the Rectified Linear Unit (ReLU), which have become standard in deep learning models. The primary assertion is that neural networks equipped with such activation functions possess the universal approximation property.
Theoretical Background and Novel Approach
The core contribution is the analysis of ReLU networks through the lens of the ridgelet transform, treating activation functions as Lizorkin distributions (a class of Schwartz distributions) as the foundational mathematical tool. The authors employ harmonic analysis and derive novel reconstruction formulas leveraging the Fourier slice theorem and the Radon transform. By doing so, they establish that neural networks with unbounded activation functions enjoy the same universal approximation property as the classical results for bounded activation functions, aligning them with traditional neural network theory.
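For orientation, the ridgelet transform, its dual, and the reconstruction identity take roughly the following form; the exact normalization and the weighting of the (a, b) measure follow conventions that vary across the literature, so this should be read as a schematic rather than the paper's precise definitions.

```latex
% Ridgelet transform of f with respect to a ridgelet function \psi (schematic)
\mathcal{R}_{\psi}f(a,b) = \int_{\mathbb{R}^m} f(x)\,\overline{\psi(a \cdot x - b)}\,\mathrm{d}x,
\qquad (a,b) \in \mathbb{R}^m \times \mathbb{R}

% Dual ridgelet transform with respect to an activation \eta:
% a one-hidden-layer network with a continuum of hidden units weighted by T(a,b)
\mathcal{R}^{\dagger}_{\eta}T(x) = \int_{\mathbb{R}^m \times \mathbb{R}} T(a,b)\,\eta(a \cdot x - b)\,\mathrm{d}a\,\mathrm{d}b

% Reconstruction formula for an admissible pair (\psi, \eta)
\mathcal{R}^{\dagger}_{\eta}\mathcal{R}_{\psi}f = K_{\psi,\eta}\, f
```

The dual transform is literally a one-hidden-layer network with a continuum of hidden units; a finite network corresponds to discretizing the integral over (a, b).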
Methodology and Results
The methodology is constructive: it extends the ridgelet and dual ridgelet transforms to act on distributions, yielding an integral representation of a one-hidden-layer network. The paper proceeds with full mathematical rigor to show that under admissibility conditions, essentially joint regularity and moment conditions on the pair of ridgelet and activation functions, such networks can reconstruct functions in spaces such as L¹(ℝᵐ) and L²(ℝᵐ).
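The admissibility condition can be stated, with the same normalization caveats as above, as a joint condition on the Fourier transforms of the ridgelet function ψ and the activation η; for unbounded activations such as ReLU, the Fourier transform of η exists only in the sense of distributions, which is exactly why the transforms must be extended to distribution spaces.

```latex
% Admissibility constant for the pair (\psi, \eta) (schematic normalization)
K_{\psi,\eta} = (2\pi)^{m-1} \int_{\mathbb{R}}
\frac{\overline{\widehat{\psi}(\zeta)}\,\widehat{\eta}(\zeta)}{|\zeta|^{m}}\,\mathrm{d}\zeta,
\qquad \text{admissible} \iff 0 < |K_{\psi,\eta}| < \infty
```

When this constant vanishes or diverges, exact reconstruction fails, which connects to the low-pass filtering behavior observed for non-admissible pairs.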
The authors present a comprehensive table of activation functions classified into bounded and unbounded categories, indicating for each its membership in function classes and whether it can serve as a universal approximator. Numerical results support the theoretical framework, particularly illustrating how non-admissible pairs produce low-pass filtering effects, thereby offering insight into the selection of activation functions in practice.
Implications and Future Developments
Practically, the research provides a framework for understanding how neural networks learn: the hidden-layer parameters of a trained network can be interpreted as sampling the ridgelet transform of the target function. This insight could influence approaches to neural network design and training, since discretizing the ridgelet transform offers a constructive pathway that potentially bypasses traditional backpropagation, as in the sketch below.
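As a concrete, purely illustrative sketch of that idea (not the paper's algorithm), the NumPy snippet below builds a one-hidden-layer ReLU network by sampling the hidden parameters (a_j, b_j) at random, i.e. crudely discretizing the integral representation, and fits only the outer weights by regularized least squares, so no backpropagation is involved. The target function, the sampling distributions, and all constants are assumptions chosen for the example.

```python
# Illustrative sketch (not the paper's method): approximate a target function by
# discretizing the integral representation  f(x) ~ sum_j c_j * relu(a_j . x - b_j).
# Hidden parameters (a_j, b_j) are sampled at random; only the outer weights c_j
# are fitted by least squares, so no backpropagation is needed.
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

# Target function on [-1, 1]^2 (arbitrary smooth example, an assumption).
def target(x):
    return np.sin(np.pi * x[:, 0]) * np.exp(-x[:, 1] ** 2)

m, n_hidden, n_train = 2, 500, 2000

# Training data.
X = rng.uniform(-1.0, 1.0, size=(n_train, m))
y = target(X)

# Sample hidden-layer parameters (a_j, b_j); these sampling distributions are
# assumptions for illustration, not the ridgelet-derived ones.
A = rng.normal(0.0, 3.0, size=(n_hidden, m))   # directions / scales a_j
b = rng.uniform(-3.0, 3.0, size=n_hidden)      # shifts b_j

# Hidden-layer feature matrix Phi[i, j] = relu(a_j . x_i - b_j).
Phi = relu(X @ A.T - b)

# Fit outer weights c by ridge-regularized least squares instead of backprop.
lam = 1e-6
c = np.linalg.solve(Phi.T @ Phi + lam * np.eye(n_hidden), Phi.T @ y)

# Evaluate approximation error on held-out points.
X_test = rng.uniform(-1.0, 1.0, size=(1000, m))
err = np.sqrt(np.mean((relu(X_test @ A.T - b) @ c - target(X_test)) ** 2))
print(f"RMSE of the sampled ReLU network: {err:.4f}")
```

In this extreme-learning-machine-style setup, accuracy depends heavily on how the hidden parameters are sampled; the ridgelet viewpoint suggests concentrating them where the ridgelet transform of the target function has significant mass rather than sampling blindly at random.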
Theoretically, the paper places neural networks with unbounded activations within a long line of approximation theory, broadening the understanding of neural network functionality in the context of high-dimensional spaces and complex data patterns.
Looking forward, the paper opens avenues for investigating the depth-wise behavior of neural network architectures via deep ridgelet or wavelet analyses. Additionally, exploring the interplay between ridgelet functions and multi-layer networks could yield insights into the efficacy of deep learning models beyond shallow networks, grounding empirical observations in theoretical underpinnings and potentially unlocking new optimization strategies derived from this analysis.
The implications for AI could be profound, with speculation that such insights might lead to more efficient network architectures capable of learning complex patterns with reduced data or computational requirements, aligning with ongoing efforts to push the boundaries of artificial intelligence. The notion of admissible pairs of ridgelet and activation functions provides a foundation that could guide the development of future activation functions tailored to specific application needs.
Conclusion
This work is methodologically significant: it equips neural network frameworks with robust theoretical backing for the use of unbounded activation functions, presenting both practical and theoretical advances in understanding neural network architectures for function approximation. The paper is a pivotal step in bridging classical approximation theory with modern deep learning paradigms.