Conjectured mechanism behind Muon optimizer improvements
Determine whether the improvements observed when training spectral mean flows with the Muon optimizer stem from its second-order nature and, specifically, whether that property improves learning of the multiplicative matrix states in the model’s tensor-network parameterization.
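For context, the property in question comes from how Muon forms its update: it keeps a momentum buffer for each matrix-shaped parameter and replaces that buffer with an approximately orthogonalized version before stepping. The sketch below illustrates the idea on plain Python lists so that it has no dependencies. It is not the paper's training code. The helper names (`matmul`, `transpose`, `orthogonalize`, `muon_like_step`) and the default `lr`, `beta`, and `steps` values are hypothetical, and the classical cubic Newton-Schulz iteration is swapped in for the tuned quintic polynomial that public Muon implementations use.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def orthogonalize(g, steps=40):
    """Approximate the orthogonal polar factor of g.

    Uses the cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X
    (an assumption of this sketch, chosen for simplicity).
    """
    # Dividing by the Frobenius norm puts every singular value in (0, 1],
    # which lies inside the convergence region of the cubic iteration.
    norm = math.sqrt(sum(v * v for row in g for v in row)) + 1e-12
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(r1, r2)]
             for r1, r2 in zip(x, xxt_x)]
    return x


def muon_like_step(weight, grad, momentum, lr=0.02, beta=0.95):
    """One simplified Muon-style update for a single weight matrix.

    Accumulates momentum, orthogonalizes it, and steps along the result.
    lr and beta are illustrative values, not settings from the paper.
    """
    momentum = [[beta * m + g for m, g in zip(rm, rg)]
                for rm, rg in zip(momentum, grad)]
    direction = orthogonalize(momentum)
    weight = [[w - lr * d for w, d in zip(rw, rd)]
              for rw, rd in zip(weight, direction)]
    return weight, momentum
```

Because the orthogonalized direction has all singular values close to one, every direction of the matrix update receives a comparable step size. Testing the conjecture would mean checking whether this equalization is what helps the multiplicative matrix states, for instance by comparing against an optimizer that differs from Muon only in that step.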
References
While we leave a deeper investigation as future work, we conjecture this is because Muon is a second-order optimizer, which are expected to improve the learning of multiplicative matrix states in general.
— Sequence Modeling with Spectral Mean Flows
(2510.15366 - Kim et al., 17 Oct 2025) in Appendix: Experiment Details, Long time series