On the rates of convergence for learning with convolutional neural networks (2403.16459v2)
Abstract: We study the approximation and learning capacities of convolutional neural networks (CNNs) with one-side zero-padding and multiple channels. Our first result proves a new approximation bound for CNNs with a certain constraint on the weights. Our second result gives a new analysis of the covering number of feed-forward neural networks, which include CNNs as a special case. The analysis carefully takes the size of the weights into account and hence, in some situations, gives better bounds than the existing literature. Using these two results, we derive rates of convergence for CNN-based estimators in many learning problems. In particular, we establish minimax optimal convergence rates of least squares estimators based on CNNs for learning smooth functions in the nonparametric regression setting. For binary classification, we derive convergence rates for CNN classifiers with the hinge loss and the logistic loss, and show that the obtained rates are minimax optimal in some common settings.
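As a concrete illustration of the architecture class named in the abstract, below is a minimal PyTorch sketch of a CNN with one-side zero-padding and multiple channels, fitted by least squares as in the nonparametric regression setting. This is an assumption-laden sketch, not the paper's construction: the class name `OneSidedCNN` and the depth, channel count, and filter size are illustrative choices, and the paper's weight constraints are not reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OneSidedCNN(nn.Module):
    """Sketch of a 1D CNN with one-side zero-padding and multiple channels.

    Hypothetical instantiation of the architecture class in the abstract;
    depth, channels, and filter_size are illustrative, not the paper's values.
    """

    def __init__(self, in_dim, channels=16, filter_size=3, depth=4):
        super().__init__()
        self.filter_size = filter_size
        convs = [nn.Conv1d(1, channels, filter_size)]
        convs += [nn.Conv1d(channels, channels, filter_size) for _ in range(depth - 1)]
        self.convs = nn.ModuleList(convs)
        # Linear readout on top of the last feature map.
        self.out = nn.Linear(channels * in_dim, 1)

    def forward(self, x):
        # Treat each input vector x of shape (batch, in_dim) as a 1D signal
        # with a single input channel: (batch, 1, in_dim).
        h = x.unsqueeze(1)
        for conv in self.convs:
            # One-side zero-padding: pad only on the left, so each
            # convolution preserves the signal length.
            h = F.pad(h, (self.filter_size - 1, 0))
            h = torch.relu(conv(h))
        return self.out(h.flatten(1)).squeeze(-1)

# Least squares fit, mirroring the nonparametric regression setting.
model = OneSidedCNN(in_dim=8)
x, y = torch.randn(32, 8), torch.randn(32)
loss = F.mse_loss(model(x), y)
loss.backward()
```

Padding only on one side keeps the signal length fixed across layers, so the multiple channels, rather than the spatial dimension, carry the growing representation as depth increases.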