Reliable hyperparameter transfer across scales
Develop reliable methods for transferring optimal training hyperparameters, such as learning rate and initialization, from small-scale proxy models to large-scale language models while guaranteeing stable training dynamics.
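To make the problem concrete, the sketch below shows one width-based scaling heuristic, in the spirit of maximal update parametrization for Adam-style optimizers. It is an illustrative assumption, not a method taken from the cited paper, and it does not by itself guarantee stable training. The names `HiddenLayerHParams` and `transfer_hidden_hparams`, and the widths and base learning rate used, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class HiddenLayerHParams:
    """Hyperparameters for one hidden weight matrix (hypothetical container)."""
    learning_rate: float
    init_std: float


def transfer_hidden_hparams(base_lr: float, base_width: int, target_width: int) -> HiddenLayerHParams:
    """Rescale hidden-layer hyperparameters tuned at base_width for target_width.

    Assumed heuristic (not from the source): the learning rate shrinks in
    proportion to the width ratio, and weights are initialized with
    standard deviation 1 / sqrt(fan_in), taking fan_in equal to the width.
    """
    if base_width <= 0 or target_width <= 0:
        raise ValueError("widths must be positive")
    width_ratio = target_width / base_width
    return HiddenLayerHParams(
        learning_rate=base_lr / width_ratio,
        init_std=1.0 / math.sqrt(target_width),
    )


# Learning rate tuned on a small proxy model, then rescaled for a wider target.
proxy = transfer_hidden_hparams(base_lr=1e-3, base_width=256, target_width=256)
target = transfer_hidden_hparams(base_lr=1e-3, base_width=256, target_width=4096)
print(proxy)
print(target)
```

A rule of this kind covers only width scaling of hidden layers. Whether comparable rules hold across depth, batch size, data, and training duration is part of what remains open.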
References
Consequently, a critical open question is how to reliably transfer optimal hyperparameters (e.g., learning rate, initialization) found on small-scale proxy models to large-scale target models.
— Beyond the Black Box: Theory and Mechanism of Large Language Models
(2601.02907 - Gan et al., 6 Jan 2026) in Subsubsection Hyperparameter Transfer, Section 4: Training Stage (Advanced Topics and Open Questions)