Deep Network Approximation for Smooth Functions (2001.03040v8)

Published 9 Jan 2020 in cs.LG, cs.NA, math.NA, and stat.ML

Abstract: This paper establishes the (nearly) optimal approximation error characterization of deep rectified linear unit (ReLU) networks for smooth functions in terms of both width and depth simultaneously. To that end, we first prove that multivariate polynomials can be approximated by deep ReLU networks of width $\mathcal{O}(N)$ and depth $\mathcal{O}(L)$ with an approximation error $\mathcal{O}(N^{-L})$. Through local Taylor expansions and their deep ReLU network approximations, we show that deep ReLU networks of width $\mathcal{O}(N\ln N)$ and depth $\mathcal{O}(L\ln L)$ can approximate $f\in C^s([0,1]^d)$ with a nearly optimal approximation error $\mathcal{O}(\|f\|_{C^s([0,1]^d)}N^{-2s/d}L^{-2s/d})$. Our estimate is non-asymptotic in the sense that it is valid for arbitrary width and depth specified by $N\in\mathbb{N}^+$ and $L\in\mathbb{N}^+$, respectively.

Citations (223)

Summary

  • The paper presents nearly optimal error bounds for approximating smooth functions using ReLU networks with controlled width and depth.
  • It leverages local Taylor expansions and piecewise linear properties to derive an error rate of O(N^{-2s/d}L^{-2s/d}) for functions in C^s([0,1]^d).
  • The findings guide the design of practical deep learning architectures by balancing computational cost with approximation accuracy.

Deep Network Approximation for Smooth Functions

The paper "Deep Network Approximation for Smooth Functions" by Jianfeng Lu, Zuowei Shen, Haizhao Yang, and Shijun Zhang provides a rigorous analysis of the approximation capabilities of ReLU-based deep neural networks for smooth functions. The work offers a quantitative bound on the approximation error in terms of both network width and depth, contributing significant theoretical insights into the expressive power of neural networks for function approximation.

Key Contributions

The paper advances previous research on the capacity of feed-forward neural networks by presenting nearly optimal approximation error bounds for smooth functions. The authors focus on functions in $C^s([0,1]^d)$, the space of $s$-times continuously differentiable functions on the $d$-dimensional unit cube, and derive an approximation bound that is valid for arbitrary width $N$ and depth $L$. They show that ReLU neural networks with width $\mathcal{O}(N\ln N)$ and depth $\mathcal{O}(L\ln L)$ can achieve an approximation error of $\mathcal{O}(\|f\|_{C^s([0,1]^d)}\,N^{-2s/d}L^{-2s/d})$. This result holds non-asymptotically, providing practical relevance for finite $N$ and $L$.
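
For reference, the main bound can be written schematically as

$$\min_{\phi\in\Phi(N,L)}\ \|\phi - f\|_{L^\infty([0,1]^d)} \;\le\; C_3\,\|f\|_{C^s([0,1]^d)}\,N^{-2s/d}L^{-2s/d},$$

where $\Phi(N,L)$ denotes ReLU networks of width at most $C_1 N\ln N$ and depth at most $C_2 L\ln L$. The notation $\Phi(N,L)$ and the constants $C_1, C_2, C_3$ (which depend on $s$ and $d$) are shorthand used here for readability, not the paper's exact statement.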

Theoretical Insights

The paper provides a substantial mathematical framework for understanding the interplay between network width, depth, and the smoothness of the target function, emphasizing the central role of function composition and polynomial approximation. Using local Taylor expansions and ReLU's ability to represent piecewise linear functions, the authors construct networks that approximate multivariate polynomials with an error of $\mathcal{O}(N^{-L})$. A matching lower bound, obtained through VC-dimension arguments, shows that the resulting rate for $C^s$ functions is nearly optimal, reinforcing the theoretical soundness and precision of the results.
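
To make the role of composition concrete, the sketch below reproduces the well-known sawtooth construction for approximating $x^2$ on $[0,1]$ with composed ReLU "hat" functions, whose uniform error decays like $4^{-(m+1)}$ in the composition depth $m$. It is offered as an illustration of how depth buys exponential accuracy for polynomials, not as the paper's exact construction.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def hat(x):
    # Triangle "hat" function on [0, 1], built from three ReLU units.
    return 2 * relu(x) - 4 * relu(x - 0.5) + 2 * relu(x - 1.0)

def square_approx(x, m):
    # Sawtooth construction: x**2 ~= x - sum_{k=1..m} g_k(x) / 4**k,
    # where g_k is the k-fold composition of the hat function.
    approx = x.copy()
    g = x.copy()
    for k in range(1, m + 1):
        g = hat(g)                     # sawtooth with 2**(k-1) teeth
        approx = approx - g / 4.0**k
    return approx

x = np.linspace(0.0, 1.0, 1001)
for m in (2, 4, 8):
    err = np.max(np.abs(square_approx(x, m) - x**2))
    print(f"m = {m}: max error {err:.2e} (theory: {4.0**-(m + 1):.2e})")
```

Products, and hence monomials and local Taylor polynomials, are then obtained from the identity $xy = \tfrac{1}{2}\big((x+y)^2 - x^2 - y^2\big)$, which is the usual route from this squaring gadget to polynomial approximation.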

Practical Implications

The results bridge a gap in understanding how deep neural networks can effectively approximate functions typically encountered in scientific computing and data analysis, where smoothness is a reasonable assumption. The findings are poised to impact the design of deep learning architectures by guiding the selection of appropriate network width and depth for desired approximation accuracies without incurring prohibitive computational costs.
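
As a rough illustration of that trade-off, the sketch below inverts the rate $\|f\|_{C^s([0,1]^d)}\,(NL)^{-2s/d}$ to size a network for a target accuracy. The ignored constants and the equal split between $N$ and $L$ are simplifying assumptions made here for illustration, not prescriptions from the paper.

```python
import math

def size_for_accuracy(eps, s, d, f_norm=1.0):
    # Back-of-the-envelope sizing from the rate f_norm * (N*L)**(-2s/d):
    # solve for the product N*L, then split it evenly between N and L.
    # Constants are dropped; the ln-factors enter only via the width/depth scaling.
    nl_product = (f_norm / eps) ** (d / (2 * s))
    N = L = math.ceil(math.sqrt(nl_product))
    width = math.ceil(N * max(math.log(N), 1.0))   # width scale O(N ln N)
    depth = math.ceil(L * max(math.log(L), 1.0))   # depth scale O(L ln L)
    return width, depth

# e.g. target uniform error 1e-3 for a C^2 function on [0,1]^4
print(size_for_accuracy(eps=1e-3, s=2, d=4))       # -> (111, 111)
```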

Future Directions

A natural next step is to extend these results to neural networks with alternative architectures and activation functions, such as convolutional networks and sigmoidal activations, to determine whether similar bounds hold in those settings. Moreover, understanding the implications of these theoretical findings in practical settings such as natural language processing and image recognition, where high-dimensional smooth functions arise, could pave the way for more efficient model designs and training processes.

In conclusion, this paper contributes a nuanced understanding of the approximation power of deep networks for smooth functions by establishing bounds that are both practically applicable and theoretically elegant. These insights lay the groundwork for future research in neural network design and analysis, particularly in extending the scope of these results to other architectures and application domains.