Connect the scaling rules to the constant-norm manifold

Establish how the optimal hyperparameter scaling rules for Scion relate to the constant operator-norm condition (norm transfer) observed for the output layer under model and dataset scaling. In particular, characterize whether, and how, the optimal scaling trajectory remains on a constant-norm manifold.

Background

A central finding of the paper is norm transfer: across both model-width/depth scaling and dataset-size scaling, the operator norm of the output layer remains approximately constant (around 27) at optimal configurations. This constant-norm condition is presented as a necessary (though not sufficient) invariant for optimality.
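As a rough illustration of what checking norm transfer could look like in practice, the sketch below estimates an operator norm of an output-layer weight matrix and tests whether the values measured at several configurations agree within a tolerance. The text above does not say which operator norm the paper uses, so the spectral norm (largest singular value) here is an assumption made for illustration only. The helper names `spectral_norm` and `is_norm_transferred` and the 10% tolerance are likewise hypothetical.

```python
import math
import random


def spectral_norm(W, iters=200, seed=0):
    """Estimate the largest singular value of W (a list of rows)
    by power iteration on W^T W. Assumption: the spectral norm stands in
    for whichever operator norm is actually tracked in the paper."""
    rng = random.Random(seed)
    n_rows, n_cols = len(W), len(W[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
    for _ in range(iters):
        # u = W v
        u = [sum(w * x for w, x in zip(row, v)) for row in W]
        # v = W^T u
        v = [sum(W[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm_v = math.sqrt(sum(x * x for x in v))
        if norm_v == 0.0:
            return 0.0  # zero matrix (or degenerate start vector)
        v = [x / norm_v for x in v]
    # With v normalized, ||W v|| approximates the largest singular value.
    u = [sum(w * x for w, x in zip(row, v)) for row in W]
    return math.sqrt(sum(x * x for x in u))


def is_norm_transferred(norms, rel_tol=0.1):
    """Return True if every norm lies within rel_tol of the mean norm.
    The tolerance is a hypothetical choice, not a value from the paper."""
    mean = sum(norms) / len(norms)
    return all(abs(n - mean) <= rel_tol * mean for n in norms)
```

In use, one would compute `spectral_norm` on the output-layer weights of the best run at each model or dataset scale and pass the resulting list to `is_norm_transferred`. Passing the check would be consistent with the necessary condition described above, but it would not by itself show that those runs are optimal.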

The authors ask how this invariant connects to the measured optimal scaling rules in (η, B, D), that is, learning rate, batch size, and dataset size. They suggest that the optimal trajectory may lie on a manifold defined by constant norms, but explicitly state that they do not yet have an answer.

References

Moreover, how do these rules connect with our main finding, a necessary condition of scaling trajectory in (data, model) axes to have the same constant value — or one might say, to remain on a manifold \citep{bernstein2025manifolds}. We don't yet have answers to those questions, but we believe our study scratches the surface of exciting phenomena that remain to be fully understood.

Optimal Scaling Needs Optimal Norm (2510.03871 - Filatov et al., 4 Oct 2025) in Section 6 Conclusion and Discussion