Proof of small k* at the edge of stability
Prove rigorously that, at the edge of stability in gradient descent, nonlinear interactions among multiple overshooting directions force the dynamics to self-organize, limiting the number of simultaneously active modes to a small constant k*. Quantify the critical threshold k_crit (reported as typically 2–4), characterize how it depends on architecture and optimizer, and show that training dynamics keep only k* modes at the edge.
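To make "modes at the edge" concrete, here is a minimal sketch in Python. It assumes the standard linear stability picture for gradient descent with step size lr: along a Hessian eigendirection with eigenvalue lam, each step multiplies the iterate by (1 - lr * lam), so the mode overshoots and grows once lam exceeds 2 / lr. The function names, the relative tolerance rel_tol, and the example spectrum are hypothetical choices for illustration and are not taken from the cited paper. A quadratic model like this shows only the linear threshold; it does not capture the nonlinear interactions between overshoots that the open question is about.

```python
import numpy as np

def edge_mode_count(hessian_eigs, lr, rel_tol=0.05):
    """Count eigenvalues within rel_tol of, or above, the threshold 2 / lr.

    The tolerance band is a hypothetical criterion for "at the edge".
    """
    threshold = 2.0 / lr
    eigs = np.asarray(hessian_eigs, dtype=float)
    return int(np.sum(eigs >= (1.0 - rel_tol) * threshold))

def gd_on_quadratic(eigs, lr, steps=50):
    """Run GD on L(x) = 0.5 * sum(eigs * x**2) from x = 1.

    Returns the absolute amplitude of each mode after `steps` updates.
    """
    eigs = np.asarray(eigs, dtype=float)
    x = np.ones_like(eigs)
    for _ in range(steps):
        x = x - lr * eigs * x  # each mode is scaled by (1 - lr * eig)
    return np.abs(x)

lr = 0.1                        # stability threshold is 2 / lr = 20
eigs = [21.0, 19.5, 10.0, 1.0]  # illustrative spectrum, not measured data
k = edge_mode_count(eigs, lr)
amps = gd_on_quadratic(eigs, lr)
print("modes at the edge:", k)
print("mode amplitudes after GD:", amps)
```

In this toy spectrum, the mode with eigenvalue 21 lies above 2 / lr and grows, while the mode with eigenvalue 19.5 lies just below it and decays slowly. In a real network the eigenvalues would come from the loss Hessian and would shift during training, which is where the conjectured self-organization enters.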
References
A plausible explanation is that when k > k_{\mathrm{crit}} (typically 2--4), the nonlinear interactions between overshoots destabilise the self-correction mechanism. The system self-organises to keep only a few modes at the edge. We do not have a proof of this; it remains an open question.
— The Spectral Edge Thesis: A Mathematical Framework for Intra-Signal Phase Transitions in Neural Network Training
(2603.28964 - Xu, 30 Mar 2026) in Remark “Empirical Observation: Small k*,” Section 16 (Edge of Stability)