Tractability of approximation by general shallow networks (2308.03230v2)
Abstract: In this paper, we present a sharper version of the results in the paper "Dimension independent bounds for general shallow networks", Neural Networks, \textbf{123} (2020), 142-152. Let $\mathbb{X}$ and $\mathbb{Y}$ be compact metric spaces. We consider approximation of functions of the form $x\mapsto\int_{\mathbb{Y}} G(x,y)\,d\tau(y)$, $x\in\mathbb{X}$, by $G$-networks of the form $x\mapsto \sum_{k=1}^n a_kG(x,y_k)$, $y_1,\cdots,y_n\in\mathbb{Y}$, $a_1,\cdots,a_n\in\mathbb{R}$. Defining the dimensions of $\mathbb{X}$ and $\mathbb{Y}$ in terms of covering numbers, we obtain dimension-independent bounds on the degree of approximation in terms of $n$, where the constants involved depend at most polynomially on the dimensions. Applications include approximation by power rectified linear unit networks, zonal function networks, and certain radial basis function networks, as well as the important problem of function extension to higher dimensional spaces.
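The central object above is the approximation of an integral representation $x\mapsto\int_{\mathbb{Y}} G(x,y)\,d\tau(y)$ by a discrete $G$-network $x\mapsto\sum_{k=1}^n a_k G(x,y_k)$. The following minimal numerical sketch illustrates this idea; the Gaussian kernel, the choice $\mathbb{X}=\mathbb{Y}=[0,1]$ with $\tau$ the Lebesgue measure, and the equispaced centers with equal weights $a_k = 1/n$ are all illustrative assumptions of ours, not constructions from the paper, whose results cover far more general kernels and measures.

```python
import numpy as np

# Hedged sketch: approximate f(x) = ∫_Y G(x, y) dτ(y) by the G-network
# x ↦ Σ_{k=1}^n a_k G(x, y_k).  Purely for illustration we take
# X = Y = [0, 1], a Gaussian kernel G, τ = Lebesgue measure, equispaced
# centers y_k, and equal weights a_k = 1/n (a simple quadrature rule).

def G(x, y):
    """Illustrative kernel; any continuous G(x, y) could be substituted."""
    return np.exp(-10.0 * (x - y) ** 2)

def target(x, m=20000):
    """High-accuracy reference for ∫_0^1 G(x, y) dy via a fine midpoint sum."""
    y = (np.arange(m) + 0.5) / m
    return G(x, y).mean()

def g_network(x, n):
    """G-network with n equispaced centers y_k and equal weights a_k = 1/n."""
    y = (np.arange(n) + 0.5) / n
    return G(x, y).mean()

xs = np.linspace(0.0, 1.0, 101)
for n in (4, 16, 64):
    err = max(abs(g_network(x, n) - target(x)) for x in xs)
    print(f"n = {n:3d}  sup-norm error on the grid ≈ {err:.2e}")
```

The printed sup-norm error shrinks as $n$ grows, which is the qualitative behavior the paper's bounds quantify, with rates and constants that depend on the covering-number dimensions of $\mathbb{X}$ and $\mathbb{Y}$.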
- N. Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3):337–404, 1950.
- A. R. Barron. Universal approximation bounds for superpositions of a sigmoidal function. Information Theory, IEEE Transactions on, 39(3):930–945, 1993.
- F. Bartolucci, E. De Vito, L. Rosasco, and S. Vigogna. Understanding neural networks with reproducing kernel Banach spaces. arXiv preprint arXiv:2109.09710, 2021.
- M. Belkin, I. Matveeva, and P. Niyogi. Regularization and semi-supervised learning on large graphs. In Learning Theory: 17th Annual Conference on Learning Theory, COLT 2004, Banff, Canada, July 1-4, 2004. Proceedings 17, pages 624–638. Springer, 2004.
- M. Belkin and P. Niyogi. Semi-supervised learning on Riemannian manifolds. Machine Learning, 56:209–239, 2004.
- M. Belkin and P. Niyogi. Convergence of Laplacian eigenmaps. Advances in Neural Information Processing Systems, 19, 2006.
- K. Böröczky and G. Wintsche. Covering the sphere by equal spherical balls. Discrete and Computational Geometry: The Goodman-Pollack Festschrift, pages 235–251, 2003.
- J. Bourgain and J. Lindenstrauss. Distribution of points on spheres and approximation by zonotopes. Israel Journal of Mathematics, 64(1):25–31, 1988.
- R. A. DeVore, R. Howard, and C. A. Micchelli. Optimal nonlinear approximation. Manuscripta Mathematica, 63(4):469–478, 1989.
- R. A. DeVore and V. N. Temlyakov. Some remarks on greedy algorithms. Advances in Computational Mathematics, 5(1):173–187, 1996.
- J. Dick and F. Pillichshammer. Digital nets and sequences: discrepancy theory and quasi–Monte Carlo integration. Cambridge University Press, 2010.
- D. Dũng and V. K. Nguyen. Deep ReLU neural networks in high-dimensional approximation. Neural Networks, 142:619–635, 2021.
- M. Ehler, F. Filbir, and H. N. Mhaskar. Locally learning biomedical data using diffusion frames. Journal of Computational Biology, 19(11):1251–1264, 2012.
- Radial basis function approximation with distributively stored data on spheres. Constructive Approximation, pages 1–31, 2023.
- F. Filbir and H. N. Mhaskar. Marcinkiewicz–Zygmund measures on manifolds. Journal of Complexity, 27(6):568–596, 2011.
- M. Gavish, B. Nadler, and R. R. Coifman. Multiscale wavelets on trees, graphs and high dimensional data: Theory and applications to semi supervised learning. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 367–374, 2010.
- W. Hackbusch. Tensor spaces and numerical tensor calculus, volume 42. Springer Science & Business Media, 2012.
- T. Hangelbroek, F. J. Narcowich, and J. D. Ward. Polyharmonic and related kernels on manifolds: interpolation and approximation. Foundations of Computational Mathematics, 12:625–670, 2012.
- J. M. Klusowski and A. R. Barron. Approximation by combinations of ReLU and squared ReLU ridge functions with $\ell^1$ and $\ell^0$ controls. IEEE Transactions on Information Theory, 64(12):7649–7656, 2018.
- V. Kůrková and M. Sanguineti. Bounds on rates of variable basis and neural network approximation. IEEE Transactions on Information Theory, 47(6):2659–2665, 2001.
- V. Kůrková and M. Sanguineti. Comparison of worst case errors in linear and neural network approximation. IEEE Transactions on Information Theory, 48(1):264–275, 2002.
- Continuous and discrete least-squares approximation by radial basis functions on spheres. Journal of Approximation Theory, 143(1):124–133, 2006.
- L. Ma, J. W. Siegel, and J. Xu. Uniform approximation rates and metric entropy of shallow neural networks. Research in the Mathematical Sciences, 9(3):46, 2022.
- M. Maggioni and H. N. Mhaskar. Diffusion polynomial frames on metric measure spaces. Applied and Computational Harmonic Analysis, 24(3):329–353, 2008.
- Y. Makovoz. Uniform approximation by neural networks. Journal of Approximation Theory, 95(2):215–228, 1998.
- T. Mao and D.-X. Zhou. Approximation of functions from Korobov spaces by deep convolutional neural networks. Advances in Computational Mathematics, 48(6):84, 2022.
- Data based construction of kernels for semi-supervised learning with less labels. Frontiers in Applied Mathematics and Statistics, 5:21, 2019.
- H. N. Mhaskar. On the tractability of multivariate integration and approximation by neural networks. Journal of Complexity, 20(4):561–590, 2004.
- H. N. Mhaskar. Weighted quadrature formulas and approximation by zonal function networks on the sphere. Journal of Complexity, 22(3):348–370, 2006.
- H. N. Mhaskar. Eignets for function approximation on manifolds. Applied and Computational Harmonic Analysis, 29(1):63–87, 2010.
- H. N. Mhaskar. A generalized diffusion frame for parsimonious representation of functions on data defined manifolds. Neural Networks, 24(4):345–359, 2011.
- H. N. Mhaskar. Dimension independent bounds for general shallow networks. Neural Networks, 123:142–152, 2020.
- H. N. Mhaskar. Kernel-based analysis of massive data. Frontiers in Applied Mathematics and Statistics, 6:30, 2020.
- H. N. Mhaskar. Local approximation of operators. Applied and Computational Harmonic Analysis, 2023.
- H. N. Mhaskar. Function approximation with zonal function networks with activation functions analogous to the rectified linear unit functions. Journal of Complexity, 51:1–19, April 2019.
- H. N. Mhaskar and C. A. Micchelli. Dimension-independent bounds on the degree of approximation by neural networks. IBM Journal of Research and Development, 38(3):277–284, 1994.
- H. N. Mhaskar, S. V. Pereverzyev, and M. D. van der Walt. A deep learning approach to diabetic blood glucose prediction. Frontiers in Applied Mathematics and Statistics, 3:14, 2017.
- H. N. Mhaskar and T. Poggio. Deep vs. shallow networks: An approximation theory perspective. Analysis and Applications, 14(06):829–848, 2016.
- H. Montanelli and Q. Du. New error bounds for deep ReLU networks using sparse grids. SIAM Journal on Mathematics of Data Science, 1(1):78–92, 2019.
- E. Novak and H. Woźniakowski. Tractability of Multivariate Problems: Standard information for functionals, volume 12. European Mathematical Society, 2008.
- G. Pisier. Remarques sur un résultat non publié de B. Maurey. Séminaire d'Analyse Fonctionnelle (dit "Maurey-Schwartz"), pages 1–12, 1981.
- D. Pollard. Convergence of stochastic processes. Springer Science & Business Media, 2012.
- R. Schaback. Native Hilbert spaces for radial basis functions I. In New Developments in Approximation Theory: 2nd International Dortmund Meeting (IDoMAT) '98, Germany, February 23–27, 1998, pages 255–282. Springer, 1999.
- J. W. Siegel and J. Xu. Approximation rates for neural networks with general activation functions. Neural Networks, 128:313–321, 2020.
- J. W. Siegel and J. Xu. High-order approximation rates for shallow neural networks with cosine and ReLU$^k$ activation functions. Applied and Computational Harmonic Analysis, 58:1–26, 2022.
- J. W. Siegel and J. Xu. Sharp bounds on the approximation rates, metric entropy, and n-widths of shallow neural networks. Foundations of Computational Mathematics, pages 1–57, 2022.
- G. Song and H. Zhang. Reproducing kernel Banach spaces with the $\ell^1$ norm II: Error analysis for regularized least square regression. Neural Computation, 23(10):2713–2729, 2011.
- G. Song, H. Zhang, and F. J. Hickernell. Reproducing kernel Banach spaces with the $\ell^1$ norm. Applied and Computational Harmonic Analysis, 34(1):96–116, 2013.
- T. Suzuki. Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality. arXiv preprint arXiv:1810.08033, 2018.
- V. N. Temlyakov. Approximation of functions with bounded mixed derivative. Trudy Mat. Inst. Steklov, 178(1):112, 1986.
- J. L. Verger-Gaugry. Covering a ball with smaller equal balls in $\mathbb{R}^n$. Discrete & Computational Geometry, 38:143–155, 2005.
- J. Xu. The finite neuron method and convergence analysis. arXiv preprint arXiv:2010.01458, 2020.
- D. Yarotsky. Optimal approximation of continuous functions by very deep ReLU networks. arXiv preprint arXiv:1802.03620, 2018.
- Sampling for Nyström extension-based spectral clustering: Incremental perspective and novel analysis. ACM Transactions on Knowledge Discovery from Data (TKDD), 11(1):1–25, 2016.