Optimal approximation of piecewise smooth functions using deep ReLU neural networks (1709.05289v4)

Published 15 Sep 2017 in math.FA, cs.LG, and stat.ML

Abstract: We study the necessary and sufficient complexity of ReLU neural networks---in terms of depth and number of weights---which is required for approximating classifier functions in $L^2$. As a model class, we consider the set $\mathcal{E}^\beta(\mathbb{R}^d)$ of possibly discontinuous piecewise $C^\beta$ functions $f : [-1/2, 1/2]^d \to \mathbb{R}$, where the different smooth regions of $f$ are separated by $C^\beta$ hypersurfaces. For dimension $d \geq 2$, regularity $\beta > 0$, and accuracy $\varepsilon > 0$, we construct artificial neural networks with ReLU activation function that approximate functions from $\mathcal{E}^\beta(\mathbb{R}^d)$ up to $L^2$ error of $\varepsilon$. The constructed networks have a fixed number of layers, depending only on $d$ and $\beta$, and they have $O(\varepsilon^{-2(d-1)/\beta})$ many nonzero weights, which we prove to be optimal. In addition to the optimality in terms of the number of weights, we show that in order to achieve the optimal approximation rate, one needs ReLU networks of a certain depth. Precisely, for piecewise $C^\beta(\mathbb{R}^d)$ functions, this minimal depth is given---up to a multiplicative constant---by $\beta/d$. Up to a log factor, our constructed networks match this bound. This partly explains the benefits of depth for ReLU networks by showing that deep networks are necessary to achieve efficient approximation of (piecewise) smooth functions. Finally, we analyze approximation in high-dimensional spaces where the function $f$ to be approximated can be factorized into a smooth dimension reducing feature map $\tau$ and classifier function $g$---defined on a low-dimensional feature space---as $f = g \circ \tau$. We show that in this case the approximation rate depends only on the dimension of the feature space and not the input dimension.

Citations (447)

Summary

  • The paper establishes that deep ReLU networks can achieve optimal L²-approximation of piecewise smooth functions with O(ε^{−2(d−1)/β}) nonzero weights (the scaling is sketched numerically after this list).
  • It demonstrates that the necessary network depth depends solely on the domain dimension and the smoothness parameter, enabling efficient architecture design.
  • The research provides a theoretical basis for overcoming high-dimensional challenges by using smooth feature mappings that reduce approximation complexity.
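
As a back-of-the-envelope illustration of how the weight bound scales, the snippet below evaluates ε^{−2(d−1)/β} for a few target accuracies. The unspecified multiplicative constant is set to 1, and the values d = 2, β = 2 are example choices made here for illustration, not values taken from the paper.

```python
# Illustrative only: evaluates the scaling eps**(-2*(d-1)/beta) from the
# paper's weight bound, with the unknown multiplicative constant set to 1.

def weight_budget(eps: float, d: int, beta: float) -> float:
    """Nonzero weights needed (up to a constant) for L2 accuracy eps."""
    return eps ** (-2.0 * (d - 1) / beta)

if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-3):
        # Example values: d = 2 (a curved jump in the plane), beta = 2 (C^2 pieces)
        print(f"eps = {eps:.0e}  ->  ~{weight_budget(eps, d=2, beta=2.0):.1e} weights")
```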

Optimal Approximation of Piecewise Smooth Functions Using Deep ReLU Neural Networks

The paper under review, "Optimal approximation of piecewise smooth functions using deep ReLU neural networks" by Philipp Petersen and Felix Voigtlaender, quantifies the complexity that ReLU neural networks require in order to approximate piecewise smooth functions. The research investigates the interplay between the architecture of neural networks (their depth, width, and number of parameters) and their capability to represent functions of varying smoothness, measured in the $L^2$ sense.

Key Contributions

  1. Function Class and Approximation: The authors examine functions belonging to the class $\mathcal{E}^\beta(\mathbb{R}^d)$. These functions are defined over a $d$-dimensional domain and consist of smooth pieces separated by hypersurfaces of $C^\beta$ regularity. The paper establishes that ReLU neural networks can approximate such functions in $L^2$ to within an error of $\varepsilon$.
  2. Network Complexity: A significant result is that the networks achieving this approximation have a depth (number of layers) determined solely by the dimension $d$ and the regularity parameter $\beta$, while using $\mathcal{O}(\varepsilon^{-2(d-1)/\beta})$ nonzero weights, which is shown to be optimal. This offers insights into designing neural networks that balance complexity with performance; a toy numerical experiment illustrating this setup follows the list.
  3. Depth Analysis: The paper explores the necessity of deep networks for efficient function approximation, showing that a minimal depth proportional to $\beta/d$ is required to reach the optimal approximation rate. This analysis gives theoretical backing to the observed empirical benefits of depth, attributing the efficiency of deep networks to inherent requirements of the target function class.
  4. Feature Mapping and High-dimensionality: The paper also considers high-dimensional function spaces and offers a way to mitigate the curse of dimensionality via smooth feature mappings. When a target function can be written as a composition $f = g \circ \tau$, where $g$ is defined on a lower-dimensional feature space, the approximation complexity depends only on the dimension of that smaller space; a structural sketch of this factorization also appears after the list.
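
To make the complexity and depth statements above concrete, here is a minimal numerical sketch rather than the authors' explicit construction: it fits a fixed-depth ReLU network to a hand-picked piecewise smooth target on $[-1/2, 1/2]^2$ and estimates the $L^2$ error by Monte Carlo sampling. The layer widths, the target function, and the use of gradient-based training are assumptions made purely for illustration; the theorem is an existence result about approximants and says nothing about trainability.

```python
# A toy experiment (not the paper's construction): approximate a piecewise
# smooth target on [-1/2, 1/2]^2 with a fixed-depth ReLU network and estimate
# the L2 error by Monte Carlo. All sizes and hyperparameters are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 2  # input dimension

def target(x: torch.Tensor) -> torch.Tensor:
    # Piecewise smooth: a smooth part plus a jump across the circle |x| = 0.25
    smooth = torch.cos(4.0 * x[:, 0]) * torch.sin(3.0 * x[:, 1])
    jump = (x.norm(dim=1) < 0.25).float()
    return smooth + jump

# Fixed depth (4 hidden layers here); in the paper the depth depends only on d and beta.
width = 64
net = nn.Sequential(
    nn.Linear(d, width), nn.ReLU(),
    nn.Linear(width, width), nn.ReLU(),
    nn.Linear(width, width), nn.ReLU(),
    nn.Linear(width, width), nn.ReLU(),
    nn.Linear(width, 1),
)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2000):
    x = torch.rand(512, d) - 0.5                            # uniform samples on [-1/2, 1/2]^2
    loss = ((net(x).squeeze(-1) - target(x)) ** 2).mean()   # empirical squared L2 error
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    x = torch.rand(20_000, d) - 0.5
    l2_err = ((net(x).squeeze(-1) - target(x)) ** 2).mean().sqrt()
    n_params = sum(p.numel() for p in net.parameters())
print(f"estimated L2 error: {l2_err:.3f} using {n_params} parameters")
```

Repeating this for several target accuracies while counting nonzero weights is one crude way to probe the predicted $\varepsilon^{-2(d-1)/\beta}$ scaling, although a trained network need not realize the optimal rate.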
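
For the factorization result, the following sketch only conveys the architectural idea; the dimensions $D = 100$ and $k = 3$ and all layer sizes are hypothetical. The point is that the classifier subnetwork acts on the $k$-dimensional feature space, so its size is governed by $k$ rather than by the ambient input dimension $D$.

```python
# Structural sketch of the factorized setting f = g ∘ tau; all dimensions and
# layer sizes are made up for illustration and are not taken from the paper.
import torch.nn as nn

D, k = 100, 3  # ambient input dimension vs. low-dimensional feature space (hypothetical)

feature_map = nn.Sequential(   # plays the role of an approximation to the smooth, dimension-reducing tau
    nn.Linear(D, 32), nn.ReLU(),
    nn.Linear(32, k),
)
classifier = nn.Sequential(    # plays the role of an approximation to g on the k-dimensional feature space
    nn.Linear(k, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
f_approx = nn.Sequential(feature_map, classifier)  # overall approximant of f = g ∘ tau

n_classifier = sum(p.numel() for p in classifier.parameters())
print(f"classifier parameter count is governed by k={k}, not D={D}: {n_classifier}")
```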

Implications

  • Theoretical Insights: The optimal rates delineate the trade-offs between complexity and performance in neural network architecture design. These results can inform how networks are constructed in practice so as to avoid unnecessary complexity.
  • Practical Applications: Knowing the minimal required depth helps practitioners deploy networks that are computationally efficient and not needlessly overparameterized, which is especially advantageous in settings that demand efficient use of parameters.
  • Future Directions: The assumptions and frameworks around smooth feature mappings open avenues for further research into structured function classes that can improve generalization across high-dimensional data spaces. This could include exploring kernel methods or leveraging unsupervised learning to uncover underlying data structure.

Concluding Remarks

Through rigorous theoretical exploration, Petersen and Voigtlaender provide substantial contributions towards understanding the requisite complexity of neural networks meant for approximating piecewise smooth functions. By drawing connections between mathematical regularity and neural network design, this research underscores foundational principles that inform both present methodologies and future explorations in neural approximation theory.