
Optimal Approximation with Sparsely Connected Deep Neural Networks (1705.01714v4)

Published 4 May 2017 in cs.LG, cs.IT, math.FA, and math.IT

Abstract: We derive fundamental lower bounds on the connectivity and the memory requirements of deep neural networks guaranteeing uniform approximation rates for arbitrary function classes in $L^2(\mathbb{R}^d)$. In other words, we establish a connection between the complexity of a function class and the complexity of deep neural networks approximating functions from this class to within a prescribed accuracy. Additionally, we prove that our lower bounds are achievable for a broad family of function classes. Specifically, all function classes that are optimally approximated by a general class of representation systems---so-called \emph{affine systems}---can be approximated by deep neural networks with minimal connectivity and memory requirements. Affine systems encompass a wealth of representation systems from applied harmonic analysis such as wavelets, ridgelets, curvelets, shearlets, $\alpha$-shearlets, and more generally $\alpha$-molecules. Our central result elucidates a remarkable universality property of neural networks and shows that they achieve the optimum approximation properties of all affine systems combined. As a specific example, we consider the class of $\alpha^{-1}$-cartoon-like functions, which is approximated optimally by $\alpha$-shearlets. We also explain how our results can be extended to the case of functions on low-dimensional immersed manifolds. Finally, we present numerical experiments demonstrating that the standard stochastic gradient descent algorithm generates deep neural networks providing close-to-optimal approximation rates. Moreover, these results indicate that stochastic gradient descent can actually learn approximations that are sparse in the representation systems optimally sparsifying the function class the network is trained on.

Citations (243)

Summary

  • The paper derives fundamental lower bounds on network connectivity and memory needed for uniform function approximation in L²(ℝᵈ).
  • It shows that deep neural networks can match the optimal approximation rates of function classes that are best approximated by affine systems such as shearlets and ridgelets.
  • Numerical experiments with SGD-trained networks support the theory, and the findings inform the design of efficient DNN architectures for resource-constrained applications.

Optimal Approximation with Sparsely Connected Deep Neural Networks

This paper addresses the challenge of understanding the complexity inherent in approximating functions using deep neural networks (DNNs), particularly in terms of network connectivity and memory requirements. The central question it explores is how much connectivity a DNN needs, at a minimum, to approximate every function in a given class uniformly to within a prescribed accuracy.

One key contribution of this paper is the derivation of fundamental lower bounds on both the connectivity and the memory requirements of DNNs needed to guarantee uniform approximation rates in L²(ℝᵈ). This establishes a direct relationship between the complexity of the function class being approximated and that of the neural network structure required. The authors further demonstrate that for a broad category of function classes, namely those optimally approximated by affine systems, these lower bounds on network connectivity and memory are indeed achievable using DNNs. Affine systems incorporate a range of representation frameworks such as wavelets, ridgelets, shearlets, and, more generally, α-molecules.
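
To convey the shape of these results, the display below gives a schematic version of the connectivity bound; the notation is adopted here for illustration and is not quoted from the paper. Here M(Φ) denotes the number of nonzero weights (edges) of a network Φ, γ*(𝒞) denotes the optimal approximation exponent of the function class 𝒞, and logarithmic factors as well as technical conditions such as weight quantization are suppressed:

\sup_{f \in \mathcal{C}} \; \inf_{\Phi : \, \mathcal{M}(\Phi) \le M} \; \| f - \Phi \|_{L^2(\mathbb{R}^d)} \;\gtrsim\; M^{-\gamma^*(\mathcal{C})}

Read this way, the achievability side says that for function classes optimally approximated by affine systems there exist networks reaching accuracy ε with a number of nonzero weights on the order of ε^(-1/γ*(𝒞)), so the lower bound is matched up to logarithmic factors.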

The paper also elucidates a striking universality property of neural networks: even with minimal connectivity, they can achieve the optimal approximation properties of all affine systems combined. As a relevant instance, the authors discuss α⁻¹-cartoon-like functions, which are shown to be optimally approximated by α-shearlets, illuminating the practical utility of the theoretical framework developed.
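
For orientation, an affine system in the standard sense collects dilated and translated copies of a generator; the schematic definition below follows the common convention, with the symbols ψ, A_j, and J introduced here only for illustration, whereas the paper works with a more general setup (several generators, more flexible translation sets):

\mathcal{D} \;=\; \left\{ |\det A_j|^{1/2}\, \psi(A_j \, \cdot \, - b) \;:\; j \in J,\ b \in \mathbb{Z}^d \right\}

A class is then said to be optimally approximated by such a system if best N-term approximations in 𝒟 attain the best possible polynomial error decay in N for that class; the universality statement is that a network with a number of nonzero weights proportional to N can match this decay, whichever affine system realizes it.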

Furthermore, the authors extend their results to functions defined on low-dimensional immersed manifolds, a setting often encountered in practical applications. The numerical experiments presented demonstrate that networks trained with stochastic gradient descent (SGD) align well with the theoretical bounds, achieving close-to-optimal approximation rates while learning approximations that are sparse in the representation systems optimally sparsifying the function class the network is trained on.
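
As a rough, self-contained illustration of the kind of experiment described, and not a reproduction of the paper's setup, the sketch below trains a small ReLU network with plain SGD to approximate a piecewise-smooth one-dimensional function and then counts weights above a small magnitude threshold as a crude proxy for effective connectivity; the architecture, target function, step sizes, and threshold are arbitrary choices made for this example.

# Hypothetical illustration, not the paper's actual experiment: approximate a
# piecewise-smooth ("cartoon-like" in 1-D) function with a small ReLU network
# trained by plain SGD, then count non-negligible weights as a crude proxy for
# the network's effective connectivity.
import torch
import torch.nn as nn

torch.manual_seed(0)

def target(x):
    # Two smooth pieces separated by a jump at x = 0.5.
    return torch.where(x < 0.5,
                       torch.sin(4 * torch.pi * x),
                       0.5 * torch.cos(6 * torch.pi * x))

x_train = torch.rand(2048, 1)          # uniform samples on [0, 1]
y_train = target(x_train)

model = nn.Sequential(                 # depth and width are arbitrary choices
    nn.Linear(1, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

opt = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
loss_fn = nn.MSELoss()

for step in range(5000):
    opt.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()
    opt.step()

with torch.no_grad():
    x_test = torch.linspace(0, 1, 1000).unsqueeze(1)
    rmse = (model(x_test) - target(x_test)).pow(2).mean().sqrt().item()
    total = sum(p.numel() for p in model.parameters())
    active = sum((p.abs() > 1e-2).sum().item() for p in model.parameters())

print(f"test RMSE ~ {rmse:.4f}; weights above threshold: {active}/{total}")

In experiments of this flavor, one would compare the error achieved for a given number of active weights against the theoretical rate for the function class; none of the specific numbers above are taken from the paper.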

The paper's implications are significant both theoretically and practically. Theoretically, it provides a robust framework to assess neural network performance in function approximation, setting a benchmark for the minimal network specifications required to achieve certain approximation rates. Practically, the insights can guide the development of more efficient neural network architectures, optimizing both training and operational costs across various applications including signal processing and computer vision.

Speculatively, these results point to a future where DNN design can more effectively leverage sparse architectures without sacrificing performance, potentially leading to efficient computation in resource-constrained environments. Moreover, understanding these constraints may lead to novel approaches for network architecture design where the innate properties of the function space are exploited to enhance learning efficiency and capacity. As AI continues to evolve, such foundational insights will be critical in advancing the robustness and applicability of neural networks in increasingly complex and diverse contexts.
