Conjectured mechanism behind Muon optimizer improvements

Determine whether the slight improvement observed when training spectral mean flows with the Muon optimizer arises from its second-order nature and, specifically, whether that property improves learning of the model’s multiplicative matrix states in the tensor-network parameterization.

Background

In the long time-series experiments, the authors report that using the Muon optimizer for spectral parameters slightly improved performance. They hypothesize that second-order optimization can better train multiplicative matrix states, a core component of their tensor-network decomposition.

Confirming this conjecture would inform optimizer selection and potentially lead to more reliable training regimes for spectral mean flows, especially in larger models and longer sequences.
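For readers unfamiliar with Muon, the sketch below illustrates the step that distinguishes it from plain momentum SGD: as publicly described, Muon replaces the momentum matrix with an approximately orthogonalized version computed by a Newton-Schulz iteration. This is an illustrative simplification and not the authors' training code. It swaps in the classic cubic Newton-Schulz iteration in place of the tuned quintic variant Muon uses, omits Nesterov momentum and shape-dependent scaling, and the names `orthogonalize` and `muon_like_step` and the default `lr` and `beta` values are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def orthogonalize(g, steps=30):
    """Approximate the orthogonal polar factor of g (U V^T from its SVD).

    Uses the cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X.
    Assumption: g is nonzero and full rank, so every singular value
    is driven toward 1.
    """
    # Divide by the Frobenius norm so all singular values lie in (0, 1],
    # which is inside the iteration's region of convergence.
    norm = math.sqrt(sum(v * v for row in g for v in row))
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [
            [1.5 * xi - 0.5 * yi for xi, yi in zip(row_x, row_y)]
            for row_x, row_y in zip(x, xxt_x)
        ]
    return x


def muon_like_step(w, grad, momentum, lr=0.02, beta=0.95):
    """One simplified Muon-style update for a single weight matrix."""
    # Accumulate momentum as in heavy-ball SGD.
    momentum = [
        [beta * m + g for m, g in zip(row_m, row_g)]
        for row_m, row_g in zip(momentum, grad)
    ]
    # Orthogonalize the momentum before applying it to the weights.
    update = orthogonalize(momentum)
    w = [
        [wi - lr * ui for wi, ui in zip(row_w, row_u)]
        for row_w, row_u in zip(w, update)
    ]
    return w, momentum
```

Because the orthogonalized update has all singular values close to 1, every direction of the weight matrix moves by a comparable amount regardless of the raw gradient scale. Whether this behaviour is what helps multiplicative matrix states is the open question stated above; the sketch does not test it.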

References

While we leave a deeper investigation as future work, we conjecture this is because Muon is a second-order optimizer, which are expected to improve the learning of multiplicative matrix states in general.

— Sequence Modeling with Spectral Mean Flows (2510.15366 - Kim et al., 17 Oct 2025) in Appendix: Experiment Details, Long time series
