Generalization error theory for infinite-width nonlinear networks in the mean-field regime

Develop a rigorous theory that characterizes the generalization error of infinite-width nonlinear neural networks operating in the mean-field regime when trained on finite datasets, for example, two-layer ReLU networks with Gaussian inputs trained by gradient flow or gradient descent. Specifically, derive expressions for the expected test error as a function of sample size and the relevant model and task parameters, so that transferability in this nonlinear setting can be analyzed analytically rather than empirically.
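For concreteness, the sketch below simulates the setting described above: a wide two-layer ReLU student in mean-field parameterization, trained by full-batch gradient descent on data generated by a fixed ReLU teacher with Gaussian inputs, with the test error estimated by Monte Carlo at several sample sizes. It illustrates the quantity the theory should predict, not the paper's exact experimental protocol; the input dimension, widths, step size, and training length are arbitrary choices made here.

```python
# Minimal, self-contained simulation of the setting in question (illustrative
# hyperparameters, not the paper's exact protocol): a mean-field two-layer ReLU
# student trained by full-batch gradient descent on P samples from a fixed ReLU
# teacher with standard Gaussian inputs; test error estimated by Monte Carlo.
import numpy as np

rng = np.random.default_rng(0)
d, N, M = 16, 512, 4                 # input dim, student width, teacher width (assumed)

# Fixed teacher: f*(x) = (1/M) * sum_k b_k * relu(v_k . x)
V = rng.normal(size=(M, d)) / np.sqrt(d)
b = rng.normal(size=M)
teacher = lambda X: (np.maximum(X @ V.T, 0.0) @ b) / M

def train_student(P, steps=5000, lr=0.02):
    """Train a mean-field student f(x) = (1/N) sum_i a_i relu(w_i . x) on P samples."""
    X = rng.normal(size=(P, d))
    y = teacher(X)
    W = rng.normal(size=(N, d)) / np.sqrt(d)
    a = rng.normal(size=N)
    for _ in range(steps):
        pre = X @ W.T                          # (P, N) pre-activations
        h = np.maximum(pre, 0.0)               # ReLU features
        err = h @ a / N - y                    # residuals on the training set
        grad_a = h.T @ err / (N * P)           # gradient of L = (1/2P) sum (f - y)^2
        grad_W = ((err[:, None] * (pre > 0)) * a).T @ X / (N * P)
        # In the mean-field regime the per-neuron step is usually taken O(N) so that
        # the function moves at an O(1) rate; scaling lr by N reflects that convention.
        a -= lr * N * grad_a
        W -= lr * N * grad_W
    return W, a

def test_mse(W, a, n_test=20000):
    """Monte Carlo estimate of the generalization error under Gaussian inputs."""
    Xt = rng.normal(size=(n_test, d))
    pred = np.maximum(Xt @ W.T, 0.0) @ a / N
    return float(np.mean((pred - teacher(Xt)) ** 2))

# Sweeping the sample size P traces the empirical error curve whose analytical
# form (e.g. a power law in P) is what the open problem asks to derive.
for P in (64, 256, 1024):
    W, a = train_student(P)
    print(f"P = {P:5d}   test MSE = {test_mse(W, a):.4e}")
```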

Background

The paper provides exact analytical results for transfer learning in deep linear networks, including closed-form expressions for the generalization error under scratch training, linear transfer, and fine-tuning. These results yield a detailed transferability phase diagram as a function of dataset size and feature overlap.

To assess whether analogous results hold for nonlinear networks, the authors study student–teacher two-layer ReLU networks in the mean-field, feature-learning regime. While they empirically observe power-law scaling of the scratch-trained generalization error and can heuristically predict the boundaries between positive and negative transfer, they explicitly state that a theoretical framework for the generalization error in this setting is lacking, which prevents a fully rigorous transferability analysis for nonlinear models.
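To fix what such a theory would need to deliver, the schematic below writes out the relevant quantities. The mean-field parameterization and the Gaussian-input test error are standard; the power-law form restates the empirical observation above with a fitted exponent, and the transferability measure is one common convention assumed here rather than necessarily the paper's exact definition. The symbols N, P, ρ, α, and C are generic notation introduced for this sketch.

```latex
% Mean-field two-layer network: an average over N neurons; as N -> infinity the
% network is described by a distribution \rho over neuron parameters (a, w).
f(x;\rho) = \frac{1}{N}\sum_{i=1}^{N} a_i\,\sigma(w_i^\top x)
\ \xrightarrow[N\to\infty]{}\ \int a\,\sigma(w^\top x)\,\mathrm{d}\rho(a,w),
\qquad \sigma(u)=\max(u,0).

% Generalization error of the measure \rho_P reached by training on P samples,
% for a teacher f^* and standard Gaussian inputs:
\varepsilon(P) = \mathbb{E}_{x\sim\mathcal{N}(0,I_d)}
\!\left[\big(f(x;\rho_P)-f^{*}(x)\big)^{2}\right].

% Empirically, scratch training follows a power law whose exponent is fitted,
% not derived; obtaining \alpha and C analytically is the open problem:
\varepsilon_{\mathrm{scratch}}(P) \approx C\,P^{-\alpha}.

% One common way to quantify transfer (a convention assumed here): positive
% transfer corresponds to T(P) > 0, which is why a closed form for
% \varepsilon_{\mathrm{scratch}}(P) is a prerequisite for the analysis.
T(P) = \varepsilon_{\mathrm{scratch}}(P) - \varepsilon_{\mathrm{transfer}}(P).
```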

References

However, an expression for the generalization error of the scratch-trained model is also needed to derive the transferability. We are not aware of a theory of generalization error for infinite-width nonlinear networks trained on finite data in the mean-field regime.

Features are fate: a theory of transfer learning in high-dimensional regression (arXiv:2410.08194, Tahir et al., 2024), Section "Student-teacher ReLU networks"