Explain the mechanism of optimal norm transfer and characterize the constant-norm manifold
Determine the mechanism that causes optimal learning rate and batch size configurations to exhibit norm transfer—i.e., a constant RMS-to-infinity operator norm of the output layer under Scion across model and dataset scaling—and characterize the structure of the corresponding constant-norm manifold along the scaling axes.
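For concreteness, the quantity that stays constant can be computed directly from a weight matrix. The sketch below is a minimal illustration and is not taken from the paper's code. It assumes the output-layer weight is stored with one row per output unit (shape d_out x d_in), and it uses the closed form that follows from the definition of the induced norm: the supremum over inputs with unit RMS norm of the largest absolute output equals sqrt(d_in) times the largest row-wise Euclidean norm. The function name `rms_to_inf_norm` is hypothetical.

```python
import math


def rms_to_inf_norm(weight):
    """Induced operator norm from RMS-normed inputs to max-abs outputs.

    weight: list of rows, one row per output unit (assumed layout d_out x d_in).
    """
    d_in = len(weight[0])
    # For a single row w, the sup of |w . x| over ||x||_2 <= sqrt(d_in)
    # (i.e. ||x||_RMS <= 1) is sqrt(d_in) * ||w||_2; then take the max over rows.
    largest_row_l2 = max(math.sqrt(sum(v * v for v in row)) for row in weight)
    return math.sqrt(d_in) * largest_row_l2


# Toy 2 x 4 output layer (values chosen only for illustration).
W = [
    [0.5, -0.5, 0.5, -0.5],
    [0.1, 0.0, 0.0, 0.0],
]
print(rms_to_inf_norm(W))  # 2.0
```

In this toy case the input x = (1, -1, 1, -1) has RMS norm 1 and yields a first output of 2.0, which matches the computed norm. Tracking this scalar for the output layer across runs at different model and dataset sizes is one way to check whether tuned configurations sit at a common value.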
References
Why does optimal norm transfer? It is puzzling what makes the optimal scaling trajectory remain on the constant norm manifold, as well as what defines its structure. We don't yet have answers to those questions, but we believe our study scratches the surface of exciting phenomena that remain to be fully understood.
— Optimal Scaling Needs Optimal Norm
(2510.03871 - Filatov et al., 4 Oct 2025) in Section 6 Conclusion and Discussion