Characterize Muon’s inductive biases and training trajectory
Determine the inductive biases of the Muon optimizer when training deep neural networks, characterize the trajectory through the loss landscape that gives it its rapid convergence, and establish what that trajectory implies for the properties of the solutions to which Muon-optimized models converge.
References
But we still don't know Muon's biases, we don't know which trajectory in the loss landscape Muon takes to be so quick, and we don't know implications of that to the solution Muon-optimized model converges to.
— To Use or not to Use Muon: How Simplicity Bias in Optimizers Matters
(2603.00742 - Dragutinović et al., 28 Feb 2026) in Section 1 (Introduction)