Symmetric Single Index Learning (2310.02117v1)
Abstract: Few neural architectures lend themselves to provable learning with gradient-based methods. One popular model is the single-index model, in which labels are produced by composing an unknown linear projection with a possibly unknown scalar link function. Learning this model with SGD is relatively well understood: the so-called information exponent of the link function governs a polynomial sample-complexity rate. However, extending this analysis to deeper or more complicated architectures remains challenging. In this work, we consider single-index learning in the setting of symmetric neural networks. Under analytic assumptions on the activation and maximum-degree assumptions on the link function, we prove that gradient flow recovers the hidden planted direction, represented as a finitely supported vector in the feature space of power sum polynomials. We characterize a notion of information exponent adapted to this setting that controls the efficiency of learning.
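To make the setup concrete, here is a minimal sketch of the two models the abstract contrasts, written as standalone LaTeX. The notation ($\sigma$ for the link function, $\theta^\ast$ and $w^\ast$ for the planted directions, $p_k$ for the power sums, $N$ for the set size) is illustrative and not taken from the paper itself.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Classical single-index model: an unknown linear projection composed
% with a scalar link function (notation here is illustrative).
In the classical single-index model, labels are generated as
\[
  y = \sigma\bigl(\langle \theta^\ast, x \rangle\bigr),
  \qquad x \in \mathbb{R}^d,
\]
for an unknown unit direction $\theta^\ast \in \mathbb{R}^d$ and a
(possibly unknown) scalar link function $\sigma$.

% Symmetric variant: the input is a set, the features are power sum
% polynomials, and the planted direction is a finitely supported
% vector over those features.
In the symmetric setting, the input is a point set $\{x_1,\dots,x_N\}$,
the relevant features are the power sums
\[
  p_k(x) = \sum_{i=1}^{N} x_i^{k},
\]
and labels take the form
\[
  y = \sigma\Bigl(\,\sum_{k \ge 1} w^\ast_k \, p_k(x)\Bigr),
\]
where $w^\ast$ is finitely supported, i.e.\ only finitely many
coordinates $w^\ast_k$ are nonzero.
\end{document}
```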