Connect the scaling rules to the constant-norm manifold
Establish the relationship between the optimal hyperparameter scaling rules for Scion and the constant operator-norm condition (norm transfer) observed for the output layer across model and dataset scaling, characterizing whether and how the optimal scaling trajectory remains on a constant-norm manifold.
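One way to make the question concrete is to measure the output-layer norm at the tuned optimum for several (model, data) scales and test whether that norm stays approximately constant. The sketch below rests on several assumptions. The output-layer norm is taken to be the RMS→∞ operator norm, which is not confirmed by the text above. The helper names `rms_to_inf_norm` and `on_constant_norm_manifold` are hypothetical. The relative tolerance is an arbitrary illustrative choice. The matrices are toy values, not measured weights.

```python
import math
from typing import Sequence

# A weight matrix as a list of rows, shape (d_out, d_in).
Matrix = Sequence[Sequence[float]]


def rms_to_inf_norm(w: Matrix) -> float:
    """Hypothetical helper (assumed norm choice): operator norm of W from
    (R^d_in, RMS norm) to (R^d_out, max norm).

    sup_x ||W x||_inf / ||x||_rms = sqrt(d_in) * max_i ||w_i||_2,
    where w_i are the rows of W; the sup is attained at x proportional
    to the row with the largest Euclidean norm.
    """
    d_in = len(w[0])
    largest_row = max(math.sqrt(sum(v * v for v in row)) for row in w)
    return math.sqrt(d_in) * largest_row


def on_constant_norm_manifold(norms: Sequence[float], rel_tol: float = 0.1) -> bool:
    """Hypothetical check: do all norms agree up to a max/min ratio of 1 + rel_tol?

    This only tests constancy of one scalar along the trajectory; it is a crude
    proxy for 'remaining on a constant-norm manifold', not a geometric test.
    """
    if not norms:
        raise ValueError("need at least one norm")
    return max(norms) / min(norms) <= 1.0 + rel_tol


if __name__ == "__main__":
    # Toy output-layer weights standing in for tuned runs at three (model, data) scales.
    # Widths differ, but each matrix is built so that its RMS->inf norm equals 2.
    runs = {
        "small (d_in=2)": [[1.0, 1.0]] * 2,
        "medium (d_in=4)": [[0.5] * 4] * 3,
        "large (d_in=8)": [[0.25] * 8] * 4,
    }
    norms = {name: rms_to_inf_norm(w) for name, w in runs.items()}
    for name, value in norms.items():
        print(f"{name}: {value:.3f}")
    print("constant-norm trajectory:", on_constant_norm_manifold(list(norms.values())))
```

Each toy matrix is built to have norm 2, so the check passes. For real runs, you would substitute the final output-layer weights of each tuned (model, data) configuration. The tolerance would then need to reflect run-to-run noise rather than a fixed 10%.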
References
Moreover, how do these rules connect with our main finding, a necessary condition of scaling trajectory in (data, model) axes to have the same constant value — or one might say, to remain on a manifold \citep{bernstein2025manifolds}. We don't yet have answers to those questions, but we believe our study scratches the surface of exciting phenomena that remain to be fully understood.
— Optimal Scaling Needs Optimal Norm
(2510.03871 - Filatov et al., 4 Oct 2025) in Section 6, Conclusion and Discussion