A unified Fourier slice method to derive ridgelet transform for a variety of depth-2 neural networks (2402.15984v2)
Abstract: To investigate neural network parameters, it is easier to study the distribution of parameters than the parameters of each individual neuron. The ridgelet transform is a pseudo-inverse operator that maps a given function $f$ to a parameter distribution $\gamma$ such that the network $\mathtt{NN}[\gamma]$ reproduces $f$, i.e. $\mathtt{NN}[\gamma]=f$. For depth-2 fully-connected networks on a Euclidean space, a closed-form expression of the ridgelet transform is known, so we can describe how the parameters are distributed. However, for a variety of modern neural network architectures, no closed-form expression is known. In this paper, we explain a systematic method based on Fourier expressions for deriving ridgelet transforms for a variety of modern networks, such as networks on finite fields $\mathbb{F}_p$, group convolutional networks on abstract Hilbert spaces $\mathcal{H}$, fully-connected networks on noncompact symmetric spaces $G/K$, and pooling layers (the $d$-plane ridgelet transform).
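For orientation, a minimal sketch of the classical Euclidean depth-2 case that the abstract refers to (following the usual convention in the ridgelet literature; the activation $\sigma$, the dual ridgelet function $\psi$, and the admissibility constant $((\sigma,\psi))$ are notation assumed here, not taken from the abstract):

$$
\mathtt{NN}[\gamma](x) \;=\; \int_{\mathbb{R}^m\times\mathbb{R}} \gamma(a,b)\,\sigma(a\cdot x - b)\,\mathrm{d}a\,\mathrm{d}b,
\qquad
\mathtt{R}[f](a,b) \;=\; \int_{\mathbb{R}^m} f(x)\,\overline{\psi(a\cdot x - b)}\,\mathrm{d}x,
$$

and for an admissible pair $(\sigma,\psi)$ the reconstruction formula $\mathtt{NN}\bigl[\mathtt{R}[f]\bigr] = ((\sigma,\psi))\, f$ holds, where $((\sigma,\psi))$ is a scalar constant determined by $\sigma$ and $\psi$. The paper's contribution is a unified Fourier-slice argument that produces analogous closed-form transforms when $\mathbb{R}^m$ is replaced by the other domains listed above.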