Generalization error theory for infinite-width nonlinear networks in the mean-field regime

Develop a rigorous theory that characterizes the generalization error of infinite-width nonlinear neural networks operating in the mean-field regime when trained on finite datasets, for example two-layer ReLU networks with Gaussian inputs trained by gradient flow or gradient descent. Specifically, derive expressions for the expected test error as a function of sample size and relevant model/task parameters so that transferability in this nonlinear setting can be analyzed analytically rather than empirically.
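To make the setting concrete, the sketch below instantiates one version of it: a two-layer ReLU student in mean-field parameterization (output averaged over hidden units) trained by full-batch gradient descent on Gaussian inputs labeled by a fixed random teacher, with the test error measured as a function of the sample size n. The widths, learning rate, and step count are illustrative placeholders rather than values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m_t, m_s = 20, 5, 256          # input dim, teacher width, student width (illustrative)

def relu(z):
    return np.maximum(z, 0.0)

# Fixed random teacher: f*(x) = (1/m_t) * a_t . relu(W_t x)
W_t = rng.normal(size=(m_t, d)) / np.sqrt(d)
a_t = rng.normal(size=m_t)

def teacher(X):
    return relu(X @ W_t.T) @ a_t / m_t

def train_scratch(n, steps=5000, lr=0.3):
    """Full-batch GD on n Gaussian samples; mean-field (1/m) output scaling."""
    X = rng.normal(size=(n, d))
    y = teacher(X)
    W = rng.normal(size=(m_s, d)) / np.sqrt(d)
    a = rng.normal(size=m_s)
    for _ in range(steps):
        pre = X @ W.T                       # (n, m) pre-activations
        H = relu(pre)
        r = H @ a / m_s - y                 # residual on the training set
        grad_a = H.T @ r / (n * m_s)
        grad_W = ((r[:, None] * (pre > 0)) * a).T @ X / (n * m_s)
        # mean-field convention: learning rate scaled by width so neurons move at O(1) speed
        a -= lr * m_s * grad_a
        W -= lr * m_s * grad_W
    return W, a

def test_error(W, a, n_test=20000):
    """Expected squared error on fresh Gaussian inputs (Monte Carlo estimate)."""
    X = rng.normal(size=(n_test, d))
    pred = relu(X @ W.T) @ a / m_s
    return np.mean((pred - teacher(X)) ** 2)

ns = [64, 128, 256, 512, 1024]
errs = [test_error(*train_scratch(n)) for n in ns]
for n, e in zip(ns, errs):
    print(f"n = {n:5d}   test MSE = {e:.4e}")
```

The open problem asks for an analytical expression for this test-error curve; the simulation only produces the empirical curve that such a theory would have to explain.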

Background

The paper provides exact analytical results for transfer learning in deep linear networks, including closed-form expressions for generalization errors under scratch training, linear transfer, and fine-tuning. These results enable a detailed transferability phase diagram based on dataset size and feature overlap.

To assess whether analogous results hold for nonlinear networks, the authors study student–teacher two-layer ReLU networks in the mean-field, feature-learning regime. While they empirically observe power-law scaling of the scratch-trained generalization error and can heuristically predict the boundaries between positive and negative transfer, they explicitly state that a theoretical framework for the generalization error in this setting is lacking, which prevents a fully rigorous transferability analysis for nonlinear models.
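The empirical power-law behavior, err(n) ≈ C n^(-α), is typically quantified by a least-squares fit in log-log space; the helper below is a generic version of that procedure (it does not reproduce the exponent values reported in the paper) and can be applied to the (ns, errs) arrays produced by the sketch above.

```python
import numpy as np

def fit_power_law(ns, errs):
    """Estimate alpha and C in err(n) ~ C * n**(-alpha) via log-log least squares."""
    slope, intercept = np.polyfit(np.log(ns), np.log(errs), 1)
    return -slope, np.exp(intercept)

# Example usage with the measurements from the simulation sketch:
# alpha, C = fit_power_law(ns, errs)
# print(f"fitted exponent alpha ~ {alpha:.2f}")
```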

References

However, an expression for the generalization error of the scratch-trained model is also needed to derive the transferability. We are not aware of a theory of generalization error for infinite-width nonlinear networks trained on finite data in the mean-field regime.

Features are fate: a theory of transfer learning in high-dimensional regression (arXiv:2410.08194, Tahir et al., 10 Oct 2024), Section "Student-teacher ReLU networks"