Unified Universality Theorem for Deep and Shallow Joint-Group-Equivariant Machines (2405.13682v3)
Abstract: We present a constructive universal approximation theorem, based on group representation theory, for learning machines equipped with joint-group-equivariant feature maps, called joint-equivariant machines. "Constructive" here means that the distribution of parameters is given in a closed-form expression known as the ridgelet transform. Joint-group-equivariance covers a broad class of feature maps that generalizes classical group-equivariance; in particular, fully-connected networks are not group-equivariant but are joint-group-equivariant. Our main theorem also unifies the universal approximation theorems for shallow and deep networks. Until now, the universality of deep networks has been shown by methods different from those used for shallow networks, whereas our results treat both on common ground, so the approximation schemes of various learning machines can be understood in a unified manner. As applications, we establish the constructive universal approximation properties of four examples: a depth-$n$ joint-equivariant machine, a depth-$n$ fully-connected network, a depth-$n$ group-convolutional network, and a new depth-$2$ network with quadratic forms whose universality was previously unknown.
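For context, a minimal sketch of the classical depth-2, fully-connected case that the abstract generalizes: the ridgelet transform gives the network's parameter distribution in closed form, and plugging it back into the network recovers the target function. The notation below ($\gamma$, $\sigma$, $\psi$, $c_{\psi,\sigma}$) follows the standard ridgelet-analysis literature and is an assumption of this sketch, not the paper's more general joint-equivariant transform.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Classical depth-2 ridgelet analysis (the fully-connected special case).
\begin{align*}
  % Continuous (integral-representation) network with parameter distribution \gamma:
  S[\gamma](x) &= \int_{\mathbb{R}^d\times\mathbb{R}} \gamma(a,b)\,\sigma(a\cdot x - b)\,\mathrm{d}a\,\mathrm{d}b,\\
  % Ridgelet transform of a target f with respect to a test function \psi:
  R[f](a,b) &= \int_{\mathbb{R}^d} f(x)\,\overline{\psi(a\cdot x - b)}\,\mathrm{d}x,\\
  % Reconstruction: using \gamma = R[f] as the parameter distribution recovers f
  % up to an admissibility constant c_{\psi,\sigma} (assumed finite and nonzero),
  % which is the sense in which the universality statement is "constructive".
  S\bigl[R[f]\bigr] &= c_{\psi,\sigma}\, f.
\end{align*}
\end{document}
```

The paper's contribution can be read as replacing the affine map $(a,b)\mapsto a\cdot x - b$ above with a general joint-group-equivariant feature map, so that the same reconstruction scheme covers deep, group-convolutional, and other architectures in one statement.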