Cause of diminishing per-module hyperparameter speed-ups at scale
Ascertain whether the observed diminishing speed-ups from per-module hyperparameter configurations at larger model and data scales are primarily due to imperfect hyperparameter transfer in the non-asymptotic regime or instead reflect an inherent asymptotic property of infinite-scale models.
References
However, we do not know whether that is mostly to be explained by imperfect hyperparameter transfer in the non-asymptotic regime, or whether this is due to an asymptotic property of the infinite-scale models.
— Completed Hyperparameter Transfer across Modules, Width, Depth, Batch and Duration
(2512.22382 - Mlodozeniec et al., 26 Dec 2025) in Section: Discussion and Conclusion (Limitations and Future Work)