Optimizer-agnostic theory of spectral gap formation
Develop a general, optimizer-agnostic theory of how intra-signal spectral gaps form in the sliding-window trajectory Gram spectrum during neural network training, without assuming that the optimizer preconditioner P commutes with the Hessian H. Characterize the mechanisms by which Hessian eigenvalue outliers and gradient alignment produce a dominant–subdominant separation when preconditioners such as Muon violate [P, H] ≈ 0, and specify the conditions under which gap formation persists across optimizers.
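To make the two quantities in this statement concrete, the following is a minimal NumPy sketch, not the paper's construction. It assumes the sliding-window trajectory Gram matrix is built from inner products of flattened parameter updates within one window, and it measures departure from [P, H] ≈ 0 by a Frobenius-normalized commutator. The function names, the ratio-based gap criterion, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def window_gram_spectrum(updates):
    """Eigenvalues (descending) of the Gram matrix of one window of updates.

    updates: array of shape (W, d), one flattened update per training step.
    Assumption: the Gram matrix is the plain inner-product matrix U U^T.
    """
    G = updates @ updates.T           # (W, W), symmetric positive semidefinite
    eigvals = np.linalg.eigvalsh(G)   # real eigenvalues in ascending order
    return eigvals[::-1]

def largest_gap_index(eigvals, eps=1e-12):
    """Return (k, ratios) where k maximises eigvals[k] / eigvals[k + 1]."""
    ratios = eigvals[:-1] / np.maximum(eigvals[1:], eps)
    return int(np.argmax(ratios)), ratios

def relative_commutator(P, H):
    """||PH - HP||_F / (||P||_F ||H||_F); zero exactly when P and H commute."""
    comm = P @ H - H @ P
    return np.linalg.norm(comm) / (np.linalg.norm(P) * np.linalg.norm(H))

# Synthetic window: two strong update directions plus small isotropic noise.
W, d = 10, 50
t = np.linspace(0.0, np.pi, W)
rng = np.random.default_rng(0)
updates = 0.01 * rng.standard_normal((W, d))
updates[:, 0] += 5.0 * np.cos(t)      # dominant direction
updates[:, 1] += 3.0 * np.sin(t)      # subdominant direction

eigvals = window_gram_spectrum(updates)
k, ratios = largest_gap_index(eigvals)

# Toy preconditioner and Hessian that do not commute.
P = np.diag([1.0, 2.0])
H = np.array([[2.0, 1.0], [1.0, 2.0]])
comm_rel = relative_commutator(P, H)
```

In this toy window the largest eigenvalue ratio should fall between the second and third eigenvalues (k equal to 1 with zero-based indexing), which is the kind of signal-versus-rest separation the problem asks a theory to explain. The commutator measure is one simple way to quantify how far a preconditioner is from the commuting regime; other norms or normalizations would serve equally well.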
References
An optimiser-agnostic theory of gap formation is an open problem.
— The Spectral Edge Thesis: A Mathematical Framework for Intra-Signal Phase Transitions in Neural Network Training
(2603.28964 - Xu, 30 Mar 2026) in Remark “Formation vs. Persistence,” Section 11.2 (Gap Maximality Principle)