Asymptotic quadratic behavior as an explanation for large learning rate success
Investigate whether the empirical success of large fixed learning rates in stochastic optimization can be explained by asymptotically quadratic behavior of the training dynamics or of the effective loss landscape, and rigorously establish or refute this conjectured mechanism.
References
In the quadratic case, Bach and Moulines established that large fixed step-sizes give optimal convergence rates, and we conjecture that the success of large learning rates may be attributed to asymptotic quadratic behavior of the learning process.
— The Road Less Scheduled
(arXiv:2405.15682, Defazio et al., 24 May 2024) in Subsection "On Large Learning Rates"
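The quadratic regime invoked above can be illustrated numerically. The following is a minimal sketch, not taken from the cited paper: constant-step SGD on a noisy quadratic, where a large fixed step-size combined with Polyak-Ruppert iterate averaging drives the averaged iterate much closer to the minimizer than the last iterate, which stalls at a noise floor. The problem dimension, noise scale, and step-size choice are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10

# Random positive-definite Hessian A; the minimizer of 0.5 x^T A x is x* = 0.
Q = rng.standard_normal((d, d))
A = Q @ Q.T / d + 0.1 * np.eye(d)
L = np.linalg.eigvalsh(A).max()           # smoothness constant (largest eigenvalue)

x = rng.standard_normal(d)                # initial iterate
avg = np.zeros(d)                         # running Polyak-Ruppert average
n_steps = 20000
step = 1.0 / (2 * L)                      # "large" fixed step-size, no decay

for t in range(1, n_steps + 1):
    noise = 0.1 * rng.standard_normal(d)  # additive stochastic gradient noise
    grad = A @ x + noise                  # noisy gradient of 0.5 x^T A x
    x = x - step * grad                   # constant-step SGD update
    avg += (x - avg) / t                  # online average of iterates

print("last iterate error:", np.linalg.norm(x))
print("averaged iterate error:", np.linalg.norm(avg))
```

The last iterate fluctuates in a ball around the optimum whose radius scales with the step-size and noise level, while averaging cancels the fluctuations, consistent with the fixed-step optimality result the note cites for quadratics.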