Rigorous convergence rates for norm-based and preconditioned optimizers
Establish rigorous convergence rates for the optimization methods surveyed in this work—including architecture-aware preconditioners and norm-based optimizers such as KFAC, EKFAC, Shampoo, SOAP, SPlus, and Muon—on non-convex deep neural network objectives, and identify assumptions and step-size regimes under which these rates hold.
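For orientation (a generic sketch, not a result from the paper): the kind of guarantee sought here is usually stated in the standard non-convex template, where under L-smoothness of the loss f and bounded gradient-noise variance sigma^2, a stochastic first-order method with step size eta satisfies a bound of the form

$$
\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\big[\|\nabla f(x_t)\|^2\big] \;\le\; \mathcal{O}\!\left(\frac{f(x_0)-f^{\star}}{\eta\,T} \;+\; L\,\eta\,\sigma^2\right),
$$

which becomes $\mathcal{O}(1/\sqrt{T})$ for $\eta \propto 1/\sqrt{T}$ (with $\eta \le 1/L$). The open question posed above is which analogue of this bound holds for preconditioned and norm-based updates such as those of Shampoo or Muon, presumably with the gradient measured in a method-specific (e.g., dual or spectral) norm rather than the Euclidean one, and under what assumptions and step-size regimes.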
References
While this thesis has emphasized practical effectiveness and intuitive understanding, establishing rigorous convergence rates for the methods discussed — particularly in non-convex settings characteristic of deep learning — remains largely open.
— Towards Guided Descent: Optimization Algorithms for Training Neural Networks At Scale
(2512.18373 - Nagwekar, 20 Dec 2025) in Subsection “Theoretical Convergence Guarantees” within Section “Limitations of the Modular Norm Framework”