Formal uniqueness of the Frobenius-norm hypersphere for first-order weight-decay cancellation

Establish formally whether the Frobenius-norm hypersphere is unique among matrix-norm constraints in the following sense: projecting the updated weights back onto a fixed-norm sphere preserves only the tangent component of the update to first order, which makes weight decay a first-order no-op under hypersphere optimization.

Background

The paper proves that under a Frobenius-sphere constraint, projecting the updated weights back to a fixed Frobenius norm removes the radial component of the update to first order. Because the weight-decay term is proportional to the weights themselves and therefore acts purely along the radial direction, weight decay becomes a first-order no-op. This simplifies optimization by eliminating weight decay as a hyperparameter.
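The first-order cancellation can be checked numerically. The following is a minimal sketch, not the paper's implementation: it assumes a plain gradient step with coupled weight decay, W - lr * (G + wd * W), followed by rescaling to unit Frobenius norm, rather than the actual MuonH update. The helper names (`fro_norm`, `project`, `step`, `decay_gap`) and the random test matrices are hypothetical and chosen only for illustration.

```python
import math
import random


def fro_norm(W):
    """Frobenius norm of a matrix stored as a list of rows."""
    return math.sqrt(sum(x * x for row in W for x in row))


def project(W, radius):
    """Rescale W so that its Frobenius norm equals `radius`."""
    scale = radius / fro_norm(W)
    return [[scale * x for x in row] for row in W]


def step(W, G, lr, wd, radius):
    """Assumed update: W - lr * (G + wd * W), then projection onto the sphere."""
    updated = [
        [w - lr * (g + wd * w) for w, g in zip(row_w, row_g)]
        for row_w, row_g in zip(W, G)
    ]
    return project(updated, radius)


def decay_gap(lr, wd=0.1, seed=0, shape=(4, 3)):
    """Frobenius distance between projected steps with and without weight decay."""
    rng = random.Random(seed)
    rows, cols = shape
    # Random starting point on the unit Frobenius sphere and a random "gradient".
    W = project([[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)], 1.0)
    G = [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]
    with_decay = step(W, G, lr, wd, 1.0)
    without_decay = step(W, G, lr, 0.0, 1.0)
    diff = [
        [a - b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(with_decay, without_decay)
    ]
    return fro_norm(diff)


if __name__ == "__main__":
    # If weight decay only enters at second order, shrinking lr by 10x
    # should shrink the gap by roughly 100x.
    for lr in (1e-1, 1e-2, 1e-3):
        print(f"lr={lr:g}  gap={decay_gap(lr):.3e}")
```

In this sketch the gap between the two projected steps should shrink roughly quadratically in the learning rate, which is what a first-order no-op predicts. The check covers only the Frobenius case described above and says nothing about the uniqueness question itself.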

However, the authors do not establish whether this first-order tangent-preservation property uniquely characterizes the Frobenius norm among the matrix norms that could be used for hypersphere optimization. This gap motivates a theoretical uniqueness analysis.

References

Therefore, in this work, we only study the MuonH optimizer, which is based on the Frobenius norm, and we leave the formal uniqueness characterization as a future work.

Rethinking Language Model Scaling under Transferable Hypersphere Optimization (2603.28743 - Ren et al., 30 Mar 2026) in Section 3.1 Elimination of Weight Decay