Formal uniqueness of the Frobenius-norm hypersphere for first-order weight-decay cancellation
Establish a formal uniqueness characterization of the Frobenius-norm hypersphere among matrix-norm constraints: show that it is the constraint for which projecting the updated weights back onto the fixed-norm sphere preserves, to first order, only the tangent component of the update, which in turn renders weight decay a first-order no-op under hypersphere optimization.
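The Frobenius case of this property can be checked numerically. The sketch below assumes the retraction is the plain rescaling W -> R·W/‖W‖_F after each step (MuonH's actual update rule may differ) and uses a random matrix U as a stand-in for the optimizer's update direction. All function names (retract, tangent_projection, decay_gap, linearization_gap) are hypothetical. When ‖W‖_F = R, the derivative of η -> R(W − ηV)/‖W − ηV‖_F at η = 0 is −(V − (⟨W, V⟩_F/‖W‖_F²) W), which is minus the Frobenius-orthogonal projection of V onto the tangent space. A weight-decay term proportional to W therefore lies in the kernel. The code checks three consequences: pure decay is undone exactly, the first-order step matches the tangent projection, and adding decay changes the retracted iterate only at O(η²).

```python
import math
import random

# Minimal sketch under stated assumptions (not the paper's method): matrices are
# lists of rows, the hypersphere is {W : ||W||_F = R}, and the retraction is the
# plain rescaling W -> R * W / ||W||_F. U is a random stand-in update direction.
# All function names here are hypothetical.


def frob_inner(A, B):
    """Frobenius inner product <A, B>_F = sum_ij A_ij * B_ij."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def frob_norm(A):
    """Frobenius norm ||A||_F = sqrt(<A, A>_F)."""
    return math.sqrt(frob_inner(A, A))


def scale(c, X):
    """Entrywise scalar multiple c * X."""
    return [[c * x for x in row] for row in X]


def lincomb(a, X, b, Y):
    """Entrywise linear combination a * X + b * Y of same-shape matrices."""
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def max_abs_diff(A, B):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def retract(W, R):
    """Rescale W back onto the Frobenius sphere of radius R (assumed retraction)."""
    return scale(R / frob_norm(W), W)


def tangent_projection(W, V):
    """Frobenius-orthogonal projection of V onto the tangent space {V : <W, V>_F = 0}."""
    return lincomb(1.0, V, -frob_inner(W, V) / frob_inner(W, W), W)


def setup(R=1.0, n=4, m=3, seed=0):
    """Random weight on the radius-R sphere plus a random update direction."""
    rng = random.Random(seed)
    W = retract([[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n)], R)
    U = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n)]
    return W, U


def decay_gap(eta, lam=0.1, R=1.0, seed=0):
    """Gap between retracted steps taken with and without weight decay."""
    W, U = setup(R, seed=seed)
    plain = retract(lincomb(1.0, W, -eta, U), R)  # retract(W - eta*U)
    # W - eta*(U + lam*W) == (1 - eta*lam)*W - eta*U
    decayed = retract(lincomb(1.0 - eta * lam, W, -eta, U), R)
    return max_abs_diff(plain, decayed)


def linearization_gap(eta, R=1.0, seed=0):
    """Gap between (retract(W - eta*U) - W) / eta and the first-order prediction -P_T(U)."""
    W, U = setup(R, seed=seed)
    step = lincomb(1.0 / eta, retract(lincomb(1.0, W, -eta, U), R), -1.0 / eta, W)
    return max_abs_diff(step, scale(-1.0, tangent_projection(W, U)))


if __name__ == "__main__":
    W, _ = setup()
    # Pure weight decay is radial, so the retraction undoes it exactly.
    print(max_abs_diff(retract(scale(1.0 - 1e-3, W), 1.0), W))
    # The radial direction W lies in the kernel of the tangent projection.
    print(max_abs_diff(tangent_projection(W, W), scale(0.0, W)))
    for eta in (1e-2, 1e-3, 1e-4):
        print(eta, decay_gap(eta), linearization_gap(eta))
```

When the script is run, decay_gap should shrink roughly 100-fold for each 10-fold decrease in eta, consistent with a second-order effect. linearization_gap should shrink roughly 10-fold, since the tangent-projection prediction is exact up to O(eta). The sketch only exercises the Frobenius case. It does not test other matrix norms.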
References
Therefore, in this work, we only study the MuonH optimizer, which is based on the Frobenius norm, and we leave the formal uniqueness characterization as a future work.
— Rethinking Language Model Scaling under Transferable Hypersphere Optimization
(2603.28743 - Ren et al., 30 Mar 2026) in Section 3.1 Elimination of Weight Decay