Ultimate extrapolation boundaries of Fitting and Transfer paradigms

Determine the maximum scale at which learning-rate predictions derived from the Fitting paradigm and from μTransfer-based hyperparameter transfer remain accurate in large-scale pre-training, i.e., the ultimate extrapolation boundary of each approach.

Background

The study proposes and validates approaches for predicting optimal learning rates at substantially larger scales by extrapolating from smaller experiments, including a power-law fit of validation loss against training data volume and a scaling law for the optimal learning rate. Training was extended to large token counts and parameter scales, but the authors explicitly state that they did not investigate the ultimate limits of extrapolation accuracy.
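The Fitting paradigm described above can be sketched as a log-log regression: fit a power law eta*(N) = c * N^(-alpha) to optimal learning rates measured at small scales, then extrapolate to a larger scale. This is a minimal illustration only; the measurements below are invented, and the exact functional form and variables used in the paper may differ.

```python
import numpy as np

# Hypothetical small-scale measurements (illustrative values, not from the
# paper): model size in non-embedding parameters vs. the empirically-found
# optimal learning rate at that size.
sizes = np.array([1e7, 3e7, 1e8, 3e8, 1e9])
opt_lrs = np.array([1.6e-3, 1.1e-3, 7.1e-4, 4.8e-4, 3.1e-4])

# Assume eta*(N) = c * N^(-alpha); fitting it is linear regression in
# log-log space: log(eta*) = log(c) - alpha * log(N).
slope, intercept = np.polyfit(np.log(sizes), np.log(opt_lrs), 1)
alpha, c = -slope, np.exp(intercept)

def predict_lr(n_params: float) -> float:
    """Extrapolate the fitted power law to a larger (unseen) scale."""
    return c * n_params ** (-alpha)

print(f"alpha = {alpha:.3f}")
print(f"predicted optimal LR at 10B params: {predict_lr(1e10):.2e}")
```

The open question is precisely how far such an extrapolation (here, from 1B-parameter fits out to 10B and beyond) can be pushed before the prediction becomes unreliable.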

Determining these boundaries would clarify the safe operating regime for scaling predictions and hyperparameter transfer, directly impacting practical deployment at even larger scales.

References

Due to computational resource constraints, this study did not investigate the ultimate extrapolation boundaries (i.e., the maximum scale at which these predictions remain accurate) for either the Fitting or the Transfer paradigm.

How to Set the Learning Rate for Large-Scale Pre-training?  (2601.05049 - Zhou et al., 8 Jan 2026) in Section: Limitations