Clarify Muon’s mechanisms and its relationship to Adam
Determine the underlying optimization mechanisms of the Muon optimizer, which orthogonalizes matrix-shaped gradient updates via spectral normalization, and rigorously characterize its relationship to adaptive optimizers such as Adam, which scale each coordinate of the update by the root of a running mean of its squared gradients.
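To make the contrast concrete, here is a minimal, dependency-free sketch of the two normalizations applied to the same toy gradient. It is an illustration under stated assumptions, not the method of the cited paper. The helper names `orthogonalize` and `adam_direction` and the matrix `G` are hypothetical. The plain cubic Newton-Schulz iteration stands in for whatever orthogonalization routine a real Muon implementation uses, and momentum is omitted. Adam is shown with both averaging coefficients set to zero, where its direction reduces to the elementwise sign of the gradient.

```python
import math

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def orthogonalize(G, steps=30):
    """Approximate the orthogonal polar factor U V^T of G.

    Assumption: plain cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X,
    which drives every nonzero singular value toward 1. G is assumed to be
    full rank and reasonably well conditioned.
    """
    fro = math.sqrt(sum(x * x for row in G for x in row))
    X = [[x / fro for x in row] for row in G]  # singular values now lie in (0, 1]
    for _ in range(steps):
        XXtX = matmul(matmul(X, transpose(X)), X)
        X = [[1.5 * x - 0.5 * y for x, y in zip(rx, ry)] for rx, ry in zip(X, XXtX)]
    return X

def adam_direction(G, eps=1e-12):
    """Adam's direction with beta1 = beta2 = 0: g / (sqrt(g^2) + eps), per entry."""
    return [[g / (math.sqrt(g * g) + eps) for g in row] for row in G]

# Toy 2x3 "gradient" matrix (hypothetical values).
G = [[3.0, 1.0, 0.5],
     [-1.0, 2.0, 1.0]]

O = orthogonalize(G)    # matrix-level normalization: all singular values near 1
S = adam_direction(G)   # entry-level normalization: all magnitudes near 1

OOt = matmul(O, transpose(O))  # close to the 2x2 identity for a wide, full-rank G
print("O O^T =", OOt)
print("elementwise direction =", S)
```

The orthogonalized update equalizes the singular values of the whole matrix, while the elementwise update equalizes the magnitude of each entry independently. How these two normalizations relate once momentum and second-moment averaging are included is the open question stated above.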
References
The Muon optimizer has recently attracted considerable attention for its strong empirical performance and use of orthogonalized updates on matrix-shaped parameters, yet its underlying mechanisms and relationship to adaptive optimizers such as Adam remain insufficiently understood.
— Delving into Muon and Beyond: Deep Analysis and Extensions
(2602.04669 - Qi et al., 4 Feb 2026) in Abstract