Fundamental computational limits of weak learnability in high-dimensional multi-index models (2405.15480v4)

Published 24 May 2024 in cs.LG, cond-mat.dis-nn, and cs.CC

Abstract: Multi-index models - functions which only depend on the covariates through a non-linear transformation of their projection on a subspace - are a useful benchmark for investigating feature learning with neural nets. This paper examines the theoretical boundaries of efficient learnability in this hypothesis class, focusing on the minimum sample complexity required for weakly recovering their low-dimensional structure with first-order iterative algorithms, in the high-dimensional regime where the number of samples $n = \alpha d$ is proportional to the covariate dimension $d$. Our findings unfold in three parts: (i) we identify under which conditions a trivial subspace can be learned with a single step of a first-order algorithm for any $\alpha > 0$; (ii) if the trivial subspace is empty, we provide necessary and sufficient conditions for the existence of an easy subspace consisting of directions that can be learned only above a certain sample complexity $\alpha > \alpha_c$, where $\alpha_c$ marks a computational phase transition. In a limited but interesting set of hard directions -- akin to the parity problem -- $\alpha_c$ is found to diverge. Finally, (iii) we show that interactions between different directions can result in an intricate hierarchical learning phenomenon, where directions can be learned sequentially when coupled to easier ones. We discuss in detail the grand staircase picture associated with these functions (and contrast it with the original staircase one). Our theory builds on the optimality of approximate message-passing among first-order iterative methods, delineating the fundamental learnability limit across a broad spectrum of algorithms, including neural networks trained with gradient descent, which we discuss in this context.

Summary

  • The paper identifies conditions for trivial subspace learning and establishes a critical sample complexity threshold via AMP state evolution.
  • It shows that above a critical ratio, an 'easy subspace' emerges, marking a computational phase transition in weak learnability.
  • The study reveals a hierarchical learning process where interdependent subspaces are progressively recovered, informing efficient neural network training.

Fundamental Limits of Weak Learnability in High-Dimensional Multi-Index Models

Introduction

The paper entitled "Fundamental computational limits of weak learnability in high-dimensional multi-index models" investigates the theoretical boundaries of learning multi-index models using first-order iterative algorithms in high-dimensional settings. Multi-index models are an important class for studying feature learning with neural networks, as they depend on the input covariates only through a non-linear transformation of their projection onto a lower-dimensional subspace. This work rigorously examines the minimum sample complexity needed to weakly recover their low-dimensional structure, focusing on the high-dimensional regime where the number of samples $n = \alpha d$ is proportional to the covariate dimension $d$.
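
To make the setting concrete, here is a minimal sketch of how data from such a model can be generated in the proportional regime $n = \alpha d$. The Gaussian covariates, the $1/\sqrt{d}$ normalization of the projections, and the 2-sparse parity link used as the example are illustrative conventions consistent with standard treatments of this setup, not details taken verbatim from the paper.

```python
import numpy as np

def sample_multi_index_data(d=1000, alpha=3.0, seed=0):
    """Draw n = alpha * d Gaussian covariates and labels from a toy multi-index
    teacher y = g(W x / sqrt(d)) with k = 2 hidden directions (2-sparse parity)."""
    rng = np.random.default_rng(seed)
    n, k = int(alpha * d), 2
    # Teacher subspace: k orthonormal directions, rescaled so that the
    # projections W x / sqrt(d) have unit variance (illustrative convention).
    W = np.linalg.qr(rng.standard_normal((d, k)))[0].T * np.sqrt(d)   # k x d
    X = rng.standard_normal((n, d))                                   # n x d covariates
    Z = X @ W.T / np.sqrt(d)                                          # n x k hidden projections
    y = np.sign(Z[:, 0] * Z[:, 1])                                    # parity link g(z1, z2) = sign(z1 z2)
    return X, y, W

X, y, W_star = sample_multi_index_data()
print(X.shape, y.shape, W_star.shape)   # (3000, 1000) (3000,) (2, 1000)
```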

Main Contributions

The primary contributions of this paper are threefold:

  1. Trivial Subspace Learning: The paper identifies the conditions under which a trivial subspace—one that can be learned with a single iteration of a first-order method—exists for any positive $\alpha$. This is achieved by examining the fixed point of the state evolution equations at initial zero overlap.
  2. Easy Directions and Phase Transition: In cases where the trivial subspace is empty, the paper provides necessary and sufficient conditions for the existence of an "easy subspace." These directions can be learned only above a critical sample complexity $\alpha > \alpha_c$, where $\alpha_c$ marks a computational phase transition. The derived formula for $\alpha_c$ establishes the boundary between efficient and inefficient learnability (a schematic numerical reading of this criterion is sketched after this list).
  3. Hierarchical Learning Phenomena: The paper demonstrates that interactions between learned and yet-to-be-learned directions can result in a hierarchical learning process. Some directions can be learned sequentially when coupled with easier ones. This nested learning ability is analyzed through the state evolution of Approximate Message Passing (AMP) algorithms.
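
One operational reading of contributions 1 and 2, referenced in the list above, is the following: given the model's state evolution map $M \mapsto F_\alpha(M)$, a trivial subspace corresponds to the map moving away from zero overlap after a single step ($F_\alpha(0) \neq 0$ for every $\alpha > 0$), while an easy subspace corresponds to the zero-overlap fixed point losing stability as $\alpha$ crosses $\alpha_c$. The sketch below is a hedged numerical illustration of that criterion; `se_update` is a hypothetical user-supplied stand-in for the paper's SE map, and the scalar map in the demo is purely a placeholder whose transition happens at $\alpha = 1$.

```python
import numpy as np

def spectral_radius_at_zero(se_update, alpha, k, eps=1e-6):
    """Finite-difference estimate of the Jacobian of M -> se_update(M, alpha)
    at M = 0, returning its spectral radius."""
    F0 = se_update(np.zeros((k, k)), alpha).ravel()
    J = np.zeros((k * k, k * k))
    for idx in range(k * k):
        E = np.zeros(k * k)
        E[idx] = eps
        J[:, idx] = (se_update(E.reshape(k, k), alpha).ravel() - F0) / eps
    return np.max(np.abs(np.linalg.eigvals(J)))

def locate_alpha_c(se_update, k, alpha_lo=1e-3, alpha_hi=50.0, tol=1e-3):
    """Classify the zero-overlap fixed point of a (user-supplied) SE map and,
    if it exists, bisect for the sample ratio alpha_c where it loses stability."""
    if np.linalg.norm(se_update(np.zeros((k, k)), alpha_lo)) > 1e-8:
        return 0.0              # 'trivial': overlap appears after a single step
    if spectral_radius_at_zero(se_update, alpha_hi, k) < 1.0:
        return np.inf           # 'hard' within the scanned range of alpha
    while alpha_hi - alpha_lo > tol:
        mid = 0.5 * (alpha_lo + alpha_hi)
        if spectral_radius_at_zero(se_update, mid, k) > 1.0:
            alpha_hi = mid
        else:
            alpha_lo = mid
    return 0.5 * (alpha_lo + alpha_hi)

# Purely illustrative scalar stand-in (NOT the paper's SE map): F(m) = tanh(alpha * m),
# whose zero-overlap fixed point destabilizes at alpha = 1.
toy_map = lambda M, alpha: np.tanh(alpha * M)
print(locate_alpha_c(toy_map, k=1))   # ~ 1.0
```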

Analytical Framework

The analytical backbone of this research is the use of AMP algorithms tailored to multi-index models. These algorithms perform approximate Bayesian inference under the posterior distribution and are optimal among first-order iterative methods, the property on which the paper's learnability limits are built.
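
For reference, the snippet below is a minimal sketch of the algorithmic class being discussed: a generic first-order iteration that alternates a pass through the data with a componentwise non-linearity, re-orthonormalizes, and tracks the overlap with the teacher subspace. It deliberately omits the Onsager correction terms and Bayes-optimal denoisers that distinguish AMP proper, and the non-linearity used here is an arbitrary placeholder, so this is an illustration of the class of first-order methods rather than of the paper's algorithm.

```python
import numpy as np

def first_order_iteration(X, y, W_star, k=2, T=50, seed=1):
    """Generic first-order iterative method: the estimate is rebuilt at each step
    from X^T h(X W / sqrt(d), y), with h a componentwise non-linearity (here the
    placeholder h(z, y) = y * tanh(z)).  Bayes-AMP would add Onsager memory terms
    and model-specific denoisers on top of this skeleton."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.linalg.qr(rng.standard_normal((d, k)))[0]      # random orthonormal start, d x k
    for t in range(T):
        Z = X @ W / np.sqrt(d)                            # n x k pre-activations
        G = y[:, None] * np.tanh(Z)                       # placeholder label-weighted denoiser
        W = np.linalg.qr(X.T @ G / np.sqrt(d))[0]         # first-order update + re-orthonormalization
        # Overlap with the teacher subspace: entries of order 1 signal weak recovery,
        # entries of order d^{-1/2} are indistinguishable from a random guess.
        M = W.T @ W_star.T / np.sqrt(d)
        if (t + 1) % 25 == 0:
            print(f"iteration {t + 1}, overlap matrix:\n{np.round(M, 3)}")
    return W

# Toy usage with a 2-sparse parity teacher (same illustrative conventions as above).
# For this link the naive placeholder non-linearity is expected to stall near zero
# overlap; the point is only to exhibit the skeleton and the overlap diagnostic.
d, alpha, k = 500, 4.0, 2
rng = np.random.default_rng(0)
W_star = np.linalg.qr(rng.standard_normal((d, k)))[0].T * np.sqrt(d)
X = rng.standard_normal((int(alpha * d), d))
Z_star = X @ W_star.T / np.sqrt(d)
y = np.sign(Z_star[:, 0] * Z_star[:, 1])
first_order_iteration(X, y, W_star, k=k)
```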

State Evolution and Optimality

The state evolution (SE) equations are employed to trace the performance of AMP in high dimensions. This exact asymptotic characterization makes AMP a valuable tool for theoretical analysis, not just for estimating the marginal distributions but also for understanding the convergence properties in the high-dimensional limit. Given an initial overlap matrix $M^{(0)}$, the iterative update of this overlap through the SE equations quantifies the progression of learning the subspace spanned by the target directions.
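
The bookkeeping behind the SE recursion can be illustrated by a short fixed-point iteration: starting from an initial overlap matrix $M^{(0)}$, apply the SE map repeatedly (with optional damping) and monitor convergence. The map passed in below is again a hypothetical scalar stand-in rather than the paper's actual SE update, which depends on the link function and the Bayes denoisers.

```python
import numpy as np

def run_state_evolution(se_update, M0, alpha, T=500, damping=0.5, tol=1e-9):
    """Iterate the overlap matrix, M <- (1 - damping) * M + damping * F_alpha(M),
    starting from M0, until it stops moving or T iterations are reached."""
    M = np.array(M0, dtype=float)
    for t in range(T):
        M_new = (1.0 - damping) * M + damping * se_update(M, alpha)
        if np.max(np.abs(M_new - M)) < tol:
            return M_new, t + 1
        M = M_new
    return M, T

# Hypothetical scalar stand-in map (not the paper's SE update): F(m) = tanh(alpha * m).
se_map = lambda M, alpha: np.tanh(alpha * M)
for alpha in (0.8, 1.5):
    M_inf, iters = run_state_evolution(se_map, M0=np.array([[1e-3]]), alpha=alpha)
    print(f"alpha = {alpha}: limiting overlap ~ {M_inf.item():.4f} after {iters} iterations")
```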

Numerical Results and Illustrations

The paper provides compelling numerical results that align with theoretical predictions. For instance, in the case of the 2-sparse parity function $g(z_1, z_2) = \text{sign}(z_1 z_2)$, a computational phase transition is observed at $\alpha_c = \pi^2/4 \approx 2.467$. Finite-size simulations of AMP show good agreement with the theoretical $\alpha_c$. Additionally, models involving hierarchical learning, such as $g(z_1, z_2, z_3) = z_1^2 + \text{sign}(z_1 z_2 z_3)$, illustrate the complex interactions between different subspaces, giving rise to sequential learning dynamics.
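
In finite-size experiments of this kind, a convenient diagnostic is the overlap of the estimate with each individual teacher direction, which indicates which parts of the subspace have been weakly recovered at a given sample ratio or iteration. The helper below is a small sketch of such a diagnostic; its normalization conventions follow the earlier sketches and are assumptions rather than the paper's definitions.

```python
import numpy as np

def per_direction_overlaps(W_hat, W_star):
    """Overlap of each teacher direction with the estimated subspace.

    W_hat:  d x k_hat estimate with orthonormal columns.
    W_star: k x d teacher matrix whose rows have squared norm d
            (same convention as in the sketches above).

    Returns a length-k vector whose entry b is the norm of the projection of
    the normalized teacher direction b onto the estimated subspace: values
    near 1 mean that direction has been recovered, values near 0 mean it has not.
    """
    d = W_hat.shape[0]
    P = W_hat @ W_hat.T                 # projector onto the estimated subspace
    W_unit = W_star / np.sqrt(d)        # unit-norm teacher directions, k x d
    return np.linalg.norm(W_unit @ P, axis=1)

# Example: an estimate spanning only the first of three teacher directions.
rng = np.random.default_rng(2)
d, k = 200, 3
W_star = np.linalg.qr(rng.standard_normal((d, k)))[0].T * np.sqrt(d)
W_hat = (W_star[:1] / np.sqrt(d)).T     # d x 1, perfectly aligned with direction 1
print(np.round(per_direction_overlaps(W_hat, W_star), 3))   # [1. 0. 0.]
```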

Implications and Future Directions

This research has both practical and theoretical implications. From a practical standpoint, understanding the limits of weak learnability helps in designing more efficient neural network training algorithms under resource constraints. Theoretically, it opens avenues for exploring other high-dimensional statistical models, potentially leading to advancements in understanding the learnability of more complex systems.

Future Work

Future research may extend these findings to multi-index models with non-Gaussian covariates or other iterative algorithms beyond AMP. Additionally, investigating the robustness of these results under different noise models and exploring heterogeneous data distributions can provide a more comprehensive understanding of weak learnability limits.

Conclusion

This paper presents a meticulous investigation into the weak learnability of high-dimensional multi-index models, leveraging the optimality of AMP algorithms among first-order methods and their state evolution equations. The classification into trivial, easy, and hard subspaces, along with the derivation of a critical sample complexity $\alpha_c$, contributes significantly to our understanding of feature learning in neural networks. The insights gained from this research provide a solid foundation for future explorations in statistical learning and high-dimensional data analysis.