Clarify Muon’s mechanisms and its relationship to Adam

Determine the optimization mechanisms underlying the Muon optimizer, which orthogonalizes matrix-shaped gradient updates through a spectral transformation, and rigorously characterize its relationship to adaptive optimizers such as Adam, which normalize updates elementwise by the root mean square of the gradient (a second-moment estimate).

Background

The paper investigates Muon, a matrix-based optimizer that orthogonalizes gradient updates via spectral transformations, and compares it to adaptive methods like Adam. Despite growing empirical interest, the authors note that Muon’s underlying mechanisms and its relation to Adam remain insufficiently understood. This gap motivates a unified spectral framework and controlled experiments that analyze Muon alongside Adam, covering both first-moment (momentum) updates and second-moment-normalized (RMS) updates.
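
To make the contrast between the two update rules concrete, here is a minimal sketch in plain Python with no dependencies. It is an illustration under stated assumptions, not the paper's implementation. The function `orthogonalize` uses the classical cubic Newton-Schulz iteration as a stand-in for Muon's orthogonalization step (practical Muon implementations typically use a tuned polynomial variant). The function `adam_first_step_direction` shows what Adam's bias-corrected update reduces to on its very first step, namely g / (|g| + eps) elementwise. All function names and the example matrix are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def orthogonalize(g, steps=30):
    """Approximate the orthogonal polar factor U V^T of g.

    Assumption: classical cubic Newton-Schulz, X <- 1.5 X - 0.5 X X^T X,
    used here only as a simple stand-in for Muon's orthogonalization.
    """
    fro = math.sqrt(sum(v * v for row in g for v in row))
    # Dividing by the Frobenius norm puts every singular value in [0, 1],
    # which is inside the convergence region of the cubic iteration.
    x = [[v / fro for v in row] for row in g]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * p - 0.5 * q for p, q in zip(rp, rq)] for rp, rq in zip(x, xxt_x)]
    return x


def adam_first_step_direction(g, eps=1e-8):
    """Adam's bias-corrected direction on step 1: g / (sqrt(g^2) + eps), elementwise."""
    return [[v / (math.sqrt(v * v) + eps) for v in row] for row in g]


if __name__ == "__main__":
    g = [[2.0, 1.0], [0.5, 3.0]]  # hypothetical gradient matrix
    o = orthogonalize(g)
    print("orthogonalized update:", o)
    print("O O^T:", matmul(o, transpose(o)))
    print("Adam-style direction:", adam_first_step_direction(g))
```

In this example every entry of g is positive, so the Adam-style direction is approximately the all-ones matrix, which has rank one. The orthogonalized update instead has all singular values near 1, so O O^T is close to the identity. The first rule normalizes each coordinate independently, while the second normalizes the spectrum of the whole matrix.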

References

The Muon optimizer has recently attracted considerable attention for its strong empirical performance and use of orthogonalized updates on matrix-shaped parameters, yet its underlying mechanisms and relationship to adaptive optimizers such as Adam remain insufficiently understood.

Delving into Muon and Beyond: Deep Analysis and Extensions (2602.04669 - Qi et al., 4 Feb 2026) in Abstract