Neural reproducing kernel Banach spaces and representer theorems for deep networks (2403.08750v1)
Abstract: Studying the function spaces defined by neural networks helps to understand the corresponding learning models and their inductive bias. While in some limits neural networks correspond to function spaces that are reproducing kernel Hilbert spaces, these regimes do not capture the properties of the networks used in practice. In contrast, in this paper we show that deep neural networks define suitable reproducing kernel Banach spaces. These spaces are equipped with norms that enforce a form of sparsity, enabling them to adapt to potential latent structures within the input data and their representations. In particular, leveraging the theory of reproducing kernel Banach spaces, combined with variational results, we derive representer theorems that justify the finite architectures commonly employed in applications. Our study extends analogous results for shallow networks and can be seen as a step towards the analysis of more practically realistic neural architectures.
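For context, the kind of statement meant by a representer theorem can be sketched in the shallow case. The following is a minimal illustration under assumptions common in this literature (an infinite-width network parametrized by a signed measure $\mu$ over neuron parameters $\theta = (w,b) \in \Theta$, with a sparsity-inducing total-variation norm); it is not the paper's precise statement for deep architectures. For data $(x_i, y_i)_{i=1}^{n}$, a loss $\ell$, and activation $\sigma$:

$$ f_\mu(x) \;=\; \int_{\Theta} \sigma\big(\langle w, x\rangle + b\big)\, d\mu(w,b), \qquad \|f\|_{\mathcal{F}} \;=\; \inf\big\{ \|\mu\|_{\mathrm{TV}} : f = f_\mu \big\}, $$

$$ \min_{f \in \mathcal{F}} \; \sum_{i=1}^{n} \ell\big(f(x_i), y_i\big) + \lambda \|f\|_{\mathcal{F}} \quad\Longrightarrow\quad \exists\, f^\star(x) \;=\; \sum_{j=1}^{m} a_j\, \sigma\big(\langle w_j, x\rangle + b_j\big), \quad m \le n. $$

The total-variation norm plays the role of the sparsity-enforcing norm mentioned in the abstract: it guarantees the existence of a minimizer supported on at most $n$ neurons, i.e. a finite architecture adapted to the data. The paper's contribution is to establish analogous statements for deep networks via reproducing kernel Banach spaces.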