- The paper demonstrates that deep ReLU networks achieve minimax optimal approximation rates in Besov spaces, outperforming traditional non-adaptive linear methods.
- It reveals that leveraging mixed smooth Besov spaces enables deep networks to mitigate the curse of dimensionality with improved convergence rates.
- It highlights the adaptive capacity of deep networks to capture spatially inhomogeneous smoothness, yielding superior approximation and estimation accuracy.
Adaptivity of Deep ReLU Networks in Besov Spaces and the Curse of Dimensionality
This paper presents a comprehensive analysis of the approximation and estimation capabilities of deep ReLU networks in Besov and mixed smooth Besov spaces, focusing on the networks' adaptivity to varying degrees of smoothness in the target functions. It demonstrates that deep learning can achieve minimax optimal rates of convergence and surpass non-adaptive linear methods such as kernel ridge regression, particularly for functions with spatially inhomogeneous smoothness. This superiority is primarily attributed to the adaptive nature of deep networks.
Key Results and Claims
- Approximation in Besov Spaces: The paper shows that functions in a Besov space can be efficiently approximated by deep ReLU networks, achieving an approximation error of O(N^(−s/d)) in the network size N, where s denotes the smoothness and d the dimensionality. This rate is unattainable by non-adaptive linear methods, which are constrained by the linear width of the Besov space. The advantage illustrates the superior adaptivity of neural networks in selecting features that capture the essential characteristics of the target function.
- Avoiding the Curse of Dimensionality: By extending the analysis to mixed smooth Besov spaces, the paper shows that deep learning can mitigate the curse of dimensionality. Specifically, deep networks achieve a convergence rate of O(N^(−s) log(N)^(c(d))), where c(d) grows only linearly with the dimensionality, easing the exponential dependence on d typically faced in high-dimensional spaces.
- Estimation Accuracy and Minimax Rates: The estimation error of deep networks is shown to be O(n^(−2s/(2s+d))) in Besov spaces, which is the minimax optimal rate. For mixed smooth Besov spaces, deep networks go further, reaching rates of O(n^(−2s/(2s+1)) log(n)^c) and thereby substantially reducing the dependence on the dimension.
- Adaptivity to Inhomogeneous Smoothness: A critical point is the network's ability to adapt to local variations in smoothness, which is pivotal when estimating functions with spatially non-uniform regularity. Linear estimators fail in this regard because they cannot adjust to spatial irregularities in the function landscape.
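To make the dimensional dependence of the two estimation rates above concrete, the following minimal sketch evaluates them numerically. It is not from the paper: the smoothness value s = 2 and the choice of polylog exponent c = d (motivated by the claim that c(d) grows linearly in d) are illustrative assumptions.

```python
import math

def besov_minimax_rate(n, s, d):
    """Minimax estimation rate n^(-2s/(2s+d)) in a standard Besov space:
    the exponent degrades as the dimension d grows (curse of dimensionality)."""
    return n ** (-2 * s / (2 * s + d))

def mixed_besov_rate(n, s, c):
    """Rate n^(-2s/(2s+1)) * log(n)^c for mixed smooth Besov spaces:
    d enters only through the polylog exponent c, not the main exponent."""
    return n ** (-2 * s / (2 * s + 1)) * math.log(n) ** c

n, s = 10**6, 2.0  # sample size and smoothness (illustrative values)
for d in (1, 10, 100):
    # Taking c = d here is an assumption for illustration only.
    print(f"d={d:>3}: Besov rate {besov_minimax_rate(n, s, d):.3e}, "
          f"mixed smooth rate {mixed_besov_rate(n, s, d):.3e}")
```

Running this shows the standard Besov rate collapsing toward a constant as d grows, while the mixed smooth rate keeps its n^(−2s/(2s+1)) main term and pays only a polylogarithmic price in d.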
Implications and Future Directions
The theoretical findings elucidate the exceptional potential of deep learning models to adaptively handle complex function spaces, positioning neural networks as powerful tools in scenarios characterized by high dimensionality and spatially variable smoothness. This ability to circumvent the curse of dimensionality and perform optimally in Besov spaces paves the way for further exploration into function-specific approximations and the development of network architectures tailored for such tasks.
Looking forward, an interesting area of research would focus on devising efficient optimization algorithms that harness these theoretical insights to achieve the demonstrated optimal rates in practical scenarios. Moreover, extending the analysis to include other activation functions and exploring their respective impacts on adaptivity and convergence could offer broader implications across various machine learning applications.
Conclusion
In summary, the paper substantiates the adaptivity and efficiency of deep ReLU networks within Besov and mixed smooth Besov settings, showcasing their capability to achieve optimal approximation and estimation rates. The findings serve as a theoretical foundation supporting the widespread empirical success of deep learning, also hinting at future enhancements in neural network design and application.