Cause of diminishing per-module hyperparameter speed-ups at scale

Determine whether the diminishing speed-ups observed from per-module hyperparameter configurations at larger model and data scales are primarily due to imperfect hyperparameter transfer in the non-asymptotic regime or instead reflect an inherent asymptotic property of infinite-scale models.

Background

The authors find that the speed-ups from per-module hyperparameters identified at small scale appear to diminish slowly as models and datasets grow. They are uncertain whether this pattern results from imperfect hyperparameter transfer in the non-asymptotic regime or is an asymptotic property of the infinite-scale models.

They suggest that future work should develop computationally feasible methods to disentangle these explanations and determine the true cause of the diminishing returns.

References

However, we do not know whether that is mostly to be explained by imperfect hyperparameter transfer in the non-asymptotic regime, or whether this is due to an asymptotic property of the infinite-scale models.

Completed Hyperparameter Transfer across Modules, Width, Depth, Batch and Duration (2512.22382 - Mlodozeniec et al., 26 Dec 2025) in Section: Discussion and Conclusion (Limitations and Future Work)