Conditions for using large learning rates in stochastic optimization
Determine the necessary and sufficient conditions under which large fixed learning rates (for example, step sizes γ on the order of D/G, where D bounds the distance from the initial point to a solution and G bounds the stochastic gradient norms) can be safely used in stochastic optimization while maintaining optimal convergence behavior, beyond currently known special cases such as quadratic losses.
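To make the setting concrete, below is a minimal illustrative sketch (not from the source paper) of fixed-step SGD with γ on the order of D/G applied to a one-dimensional stochastic quadratic, the special case where large fixed learning rates are known to behave well. All specific values (a, sigma, the bound G = D + sigma) are assumptions chosen for this toy example.

```python
# Toy sketch: fixed-step SGD with a "large" learning rate gamma ~ D / G
# on f(x) = E[(x - a + noise)^2 / 2], a stochastic quadratic loss.
# Hypothetical parameters; for illustration only.
import numpy as np

rng = np.random.default_rng(0)

a = 3.0          # minimizer of the expected loss (assumed)
x0 = 0.0         # initial iterate
D = abs(x0 - a)  # distance from the initial point to the minimizer
sigma = 0.5      # gradient noise scale (assumed)
G = D + sigma    # crude bound on typical stochastic gradient magnitude
gamma = D / G    # large fixed step size, on the order of D / G

x = x0
avg = 0.0
T = 1000
for t in range(1, T + 1):
    g = (x - a) + sigma * rng.standard_normal()  # stochastic gradient
    x = x - gamma * g                            # fixed-step SGD update
    avg += (x - avg) / t                         # running (Polyak) average

print(f"last iterate: {x:.3f}, averaged iterate: {avg:.3f}, minimizer: {a}")
```

For this quadratic, the averaged iterate settles near the minimizer despite the large fixed step; the open question is characterizing exactly when this remains safe beyond the quadratic case.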
References
More generally, the full conditions under which large learning rates can be used are not yet fully understood for stochastic problems.
— The Road Less Scheduled (Defazio et al., arXiv:2405.15682, 24 May 2024), Subsection "On Large Learning Rates"