Proof of small k* at the edge of stability

Establish a rigorous proof that, at the edge of stability in gradient descent, nonlinear interactions among multiple overshooting directions force a self-organization that limits the number of simultaneously active modes to a small constant k*. Determine the critical threshold k_crit (typically 2–4) as a function of architecture and optimizer, and show that the training dynamics keep only k* modes at the edge.

Background

The paper observes empirically that the number k* of simultaneously active modes is small (often 2–3). It proposes a plausible mechanism tied to edge-of-stability behavior, in which the self-correction of overshoots constrains the active set.
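To make "active modes at the edge" concrete, here is a minimal sketch under assumptions that are not taken from the paper: the loss is locally a diagonal quadratic 0.5 * sum_i lambda_i * x_i^2 with Hessian eigenvalues lambda_i, the optimizer is plain gradient descent with step size lr, and a mode is counted as "at the edge" when lr * lambda_i lies within a hypothetical tolerance `tol` of 2. On a quadratic, each coordinate is multiplied by 1 - lr * lambda_i per step, so lr * lambda_i > 2 produces a growing, sign-flipping overshoot, lr * lambda_i < 2 decays, and lr * lambda_i = 2 oscillates with constant amplitude. The function names and the spectrum below are illustrative only. Because this model is linear, it has no interactions between modes and cannot exhibit the self-organization the problem asks about; it only fixes the bookkeeping for counting k*.

```python
import numpy as np


def gd_mode_multipliers(eigvals, lr):
    """Per-mode GD multiplier 1 - lr*lambda for the loss 0.5 * sum(lambda_i * x_i**2)."""
    return 1.0 - lr * np.asarray(eigvals, dtype=float)


def count_edge_modes(eigvals, lr, tol=0.05):
    """Hypothetical proxy for k*: number of modes with lr*lambda within tol of the threshold 2."""
    ratio = lr * np.asarray(eigvals, dtype=float)
    return int(np.sum(np.abs(ratio - 2.0) <= tol))


def run_gd(eigvals, lr, x0, steps):
    """Plain GD on the diagonal quadratic; returns the trajectory with shape (steps + 1, d)."""
    lam = np.asarray(eigvals, dtype=float)
    x = np.asarray(x0, dtype=float).copy()
    traj = [x.copy()]
    for _ in range(steps):
        # The gradient of 0.5 * lam * x**2 is lam * x, applied coordinate-wise.
        x = x - lr * lam * x
        traj.append(x.copy())
    return np.array(traj)


if __name__ == "__main__":
    lr = 0.1
    # Illustrative spectrum (not from the paper): lr*lambda = 0.1, 1.5, 1.98, 2.0, 2.03, 3.0.
    eigvals = [1.0, 15.0, 19.8, 20.0, 20.3, 30.0]
    print("multipliers:", gd_mode_multipliers(eigvals, lr))
    print("modes near the edge:", count_edge_modes(eigvals, lr))
    traj = run_gd(eigvals, lr, np.ones(len(eigvals)), steps=50)
    print("final |x_i|:", np.abs(traj[-1]))
```

With this spectrum the three modes with lr * lambda close to 2 are counted as active; the mode with lr * lambda = 3.0 diverges in this linear model, which is exactly the runaway that the nonlinear self-correction mechanism described in the paper would have to suppress.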

A formal proof would clarify whether and why only a few modes can remain at the edge and would link k* to architectural and optimizer properties via a critical threshold k_crit, turning the empirical observation into a theorem.

References

A plausible explanation is that when k > k_{\mathrm{crit}} (typically 2–4), the nonlinear interactions between overshoots destabilise the self-correction mechanism. The system self-organises to keep only a few modes at the edge. We do not have a proof of this; it remains an open question.

The Spectral Edge Thesis: A Mathematical Framework for Intra-Signal Phase Transitions in Neural Network Training (2603.28964 - Xu, 30 Mar 2026), Remark “Empirical Observation: Small k*,” Section 16 (Edge of Stability)