Optimizer-agnostic theory of spectral gap formation

Develop a general, optimizer-agnostic theory that explains how intra-signal spectral gaps form in the sliding-window trajectory Gram spectrum during neural network training, without assuming that the optimizer preconditioner P commutes with the Hessian H. Characterize the mechanisms by which Hessian eigenvalue outliers and gradient alignment produce a dominant–subdominant separation when preconditioners such as Muon violate [P, H] ≈ 0, and specify the conditions under which gap formation persists across optimizers.
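To make the object of study concrete, here is a minimal sketch of a sliding-window trajectory Gram spectrum and its largest consecutive eigenvalue ratio. The construction is an assumption made for illustration, not the paper's definition: it stacks the W most recent parameter updates as rows of a matrix and uses the eigenvalues of the resulting W x W Gram matrix. The function names `window_gram_spectrum` and `largest_gap`, the toy trajectory, and all numerical values are hypothetical.

```python
import numpy as np

def window_gram_spectrum(updates):
    """Eigenvalues (descending) of the Gram matrix of a window of updates.

    `updates` has shape (W, d): one flattened parameter update per row.
    """
    gram = updates @ updates.T          # (W, W), symmetric positive semidefinite
    return np.linalg.eigvalsh(gram)[::-1]

def largest_gap(spectrum, eps=1e-12):
    """Return (k, ratio): the 1-based index k maximizing spectrum[k-1] / spectrum[k]."""
    ratios = spectrum[:-1] / np.maximum(spectrum[1:], eps)
    k = int(np.argmax(ratios)) + 1
    return k, float(ratios[k - 1])

# Toy trajectory (assumed, not from the paper): two dominant update
# directions with different amplitudes, plus small isotropic noise.
rng = np.random.default_rng(0)
W, d = 8, 50
t = 2 * np.pi * np.arange(W) / W
coeffs = np.stack([3.0 * np.cos(t), 1.0 * np.sin(t)], axis=1)   # (W, 2)
dirs, _ = np.linalg.qr(rng.standard_normal((d, 2)))             # orthonormal columns
updates = coeffs @ dirs.T + 0.01 * rng.standard_normal((W, d))

spectrum = window_gram_spectrum(updates)
k, ratio = largest_gap(spectrum)
print("spectrum:", np.round(spectrum, 4))
print("largest gap after eigenvalue", k, "with ratio", round(ratio, 1))
```

In this toy the largest ratio separates the two planted directions from the noise floor. It says nothing about how such a separation arises under a real optimizer, which is the question the problem statement poses.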

Background

Throughout the paper, gap formation in the trajectory spectrum is linked to the Hessian spectral hierarchy under the assumption that the optimizer preconditioner and the Hessian approximately commute. This mechanism explains how the dominant–subdominant separation forms, but it depends on [P, H] ≈ 0.
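The commutativity assumption can be checked numerically on small examples. The sketch below is an illustration under stated assumptions, not the paper's procedure: it measures the relative Frobenius norm of the commutator [P, H] = PH - HP for a toy symmetric Hessian with two outlier eigenvalues. The function `relative_commutator_norm`, the matrices, and the eigenvalues are hypothetical. Both preconditioners here are fixed linear maps, so the example does not model the nonlinear, state-dependent preconditioning of Muon.

```python
import numpy as np

def relative_commutator_norm(P, H):
    """Return ||PH - HP||_F / (||P||_F * ||H||_F); zero means P and H commute."""
    comm = P @ H - H @ P
    return np.linalg.norm(comm) / (np.linalg.norm(P) * np.linalg.norm(H))

rng = np.random.default_rng(0)
d = 6

# Toy symmetric "Hessian" with two outlier eigenvalues (illustrative values).
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
eigs = np.array([10.0, 5.0, 0.3, 0.2, 0.1, 0.05])
H = Q @ np.diag(eigs) @ Q.T

# A preconditioner built in H's eigenbasis commutes with H.
P_commuting = Q @ np.diag(1.0 / np.sqrt(eigs)) @ Q.T

# A diagonal preconditioner in the coordinate basis generally does not.
P_generic = np.diag(rng.uniform(0.5, 2.0, size=d))

r_commuting = relative_commutator_norm(P_commuting, H)
r_generic = relative_commutator_norm(P_generic, H)
print("shared eigenbasis:", r_commuting)
print("coordinate-diagonal:", r_generic)
```

The first value is zero up to floating-point error, which is the regime the paper's formation argument covers. The second is clearly nonzero, the kind of case an optimizer-agnostic theory would have to handle.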

The authors note that optimizers like Muon employ nonlinear, state-dependent preconditioning that explicitly violates the commutativity assumption, yet empirical results still show spectral gaps and similar capabilities. This motivates a theory that can explain gap formation independent of optimizer-specific preconditioning assumptions.

References

An optimiser-agnostic theory of gap formation is an open problem.

The Spectral Edge Thesis: A Mathematical Framework for Intra-Signal Phase Transitions in Neural Network Training (2603.28964 - Xu, 30 Mar 2026) in Remark “Formation vs. Persistence,” Section 11.2 (Gap Maximality Principle)