
Conditions for using large learning rates in stochastic optimization

Determine the necessary and sufficient conditions under which large fixed learning rates (for example, a step size γ on the order of D/G) can be safely used in stochastic optimization while maintaining optimal convergence behavior, beyond currently known special cases such as quadratic losses.


Background

The paper observes a theory–practice gap in step-size selection: classical theory prescribes γ = D/(G√T), where D bounds the distance from the initial point to the optimum, G bounds gradient norms, and T is the number of iterations, yet practitioners often achieve superior performance with much larger fixed step sizes (e.g., γ ≈ D/G). The authors provide a special-case regret bound (Theorem "Large Step size convergence") under a verifiable condition, and note that this condition held in all of their experiments. However, they emphasize that the conditions guaranteeing the safety and optimality of large learning rates in stochastic settings are not fully understood, motivating a precise characterization of when such step sizes are theoretically justified.

This open problem aims to formalize and generalize beyond special cases (e.g., quadratic losses) the criteria that ensure large learning rates yield optimal or near-optimal convergence rates in stochastic optimization.
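The contrast between the two step-size regimes can be illustrated with a minimal sketch (not taken from the paper): plain SGD with a fixed step size on a noisy one-dimensional quadratic, the special case where large steps are known to be safe. The objective, noise level, and the bounds D and G below are illustrative assumptions chosen for the example.

```python
# Minimal sketch, assuming f(x) = 0.5 * x**2 with additive gradient noise.
# Compares the classical step size gamma = D/(G*sqrt(T)) with the large
# fixed step size gamma = D/G. Here D bounds the distance from the starting
# point to the optimum and G bounds the gradient norms (illustrative values).
import numpy as np

def sgd(gamma, T, x0=1.0, noise=0.1, seed=0):
    """Run SGD with fixed step size gamma on f(x) = 0.5*x^2 with noisy gradients."""
    rng = np.random.default_rng(seed)
    x, avg = x0, 0.0
    for t in range(1, T + 1):
        g = x + noise * rng.standard_normal()   # stochastic gradient of 0.5*x^2
        x -= gamma * g
        avg += (x - avg) / t                    # running average of the iterates
    return 0.5 * avg**2                         # suboptimality of the averaged iterate

T = 10_000
D, G = 1.0, 1.0  # assumed bounds: |x0 - x*| <= D, |gradient| <= G
print("classical gamma = D/(G*sqrt(T)):", sgd(D / (G * np.sqrt(T)), T))
print("large     gamma = D/G          :", sgd(D / G, T))
```

On this quadratic, both choices drive the averaged iterate toward the optimum; the open problem asks when the large fixed step size remains safe and optimal beyond such special cases.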

References

"More generally, the full conditions under which large learning rates can be used are not yet fully understood for stochastic problems."

Defazio et al., "The Road Less Scheduled," arXiv:2405.15682, 24 May 2024, Subsection "On Large Learning Rates".