Asymptotic quadratic behavior as an explanation for large learning rate success

Investigate whether the empirical success of large fixed learning rates in stochastic optimization is explained by asymptotically quadratic behavior of the training dynamics or of the effective loss landscape, and rigorously establish or refute this conjectured mechanism.

Background

The authors note that for quadratic objectives, prior work has established that large fixed step sizes can achieve optimal convergence rates, aligning with empirical observations in deep learning. Building on this, they conjecture that asymptotic quadratic behavior of the learning process may broadly underlie the strong empirical performance of large learning rates.
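As a concrete illustration of the quadratic setting this conjecture builds on, the sketch below (not from the paper; the problem size, noise level, and step-size choice are illustrative assumptions) runs SGD with a large fixed step size and Polyak-Ruppert iterate averaging on a synthetic least-squares problem, the regime in which constant step sizes are known to yield fast convergence of the averaged iterate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem: f(w) = 0.5 * E[(x^T w - y)^2], a quadratic objective.
d = 20
w_star = rng.normal(size=d)   # ground-truth minimizer
noise_std = 0.5               # observation noise (hypothetical level)

def sample():
    """Draw one (feature, target) pair from the linear model."""
    x = rng.normal(size=d)
    y = x @ w_star + noise_std * rng.normal()
    return x, y

# Large *constant* step size, no decaying schedule. For least-squares, theory
# permits gamma on the order of 1/R^2 with R^2 = E[||x||^2] (= d here).
gamma = 1.0 / (4 * d)

w = np.zeros(d)       # SGD iterate
w_avg = np.zeros(d)   # Polyak-Ruppert running average of the iterates
n_steps = 50_000

for t in range(1, n_steps + 1):
    x, y = sample()
    grad = (x @ w - y) * x    # stochastic gradient of the quadratic loss
    w -= gamma * grad         # fixed step size throughout training
    w_avg += (w - w_avg) / t  # incremental average: mean of w_1..w_t

print(f"last iterate error: {np.linalg.norm(w - w_star):.4f}")
print(f"averaged error:     {np.linalg.norm(w_avg - w_star):.4f}")
```

With these settings the last iterate stalls at a noise floor whose size is set by the step size, while the averaged iterate keeps improving, which is the qualitative content of the cited quadratic result: on quadratics, a large fixed step size combined with averaging needs no schedule to converge at the optimal rate.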

Confirming or refuting this conjecture would clarify the theoretical basis for commonly used large step sizes and guide more principled optimizer design and tuning.

References

In the quadratic case, Bach and Moulines established that large fixed step-sizes give optimal convergence rates, and we conjecture that the success of large learning rates may be attributed to asymptotic quadratic behavior of the learning process.

The Road Less Scheduled (arXiv:2405.15682, Defazio et al., 24 May 2024), Subsection "On Large Learning Rates"