Optimal gradient-norm rate for gradient descent stepsizes

Ascertain whether O(N^{-log_2(1+√2)}) is the optimal worst-case convergence rate for the terminal gradient norm ∥∇f(x_N)∥ (or equivalently its square) achievable by gradient descent on L-smooth convex functions across all possible stepsize selections.
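
For concreteness, the exponent in the conjectured rate can be evaluated directly (a simple computation, not a statement taken from the paper):

\[
\log_2\bigl(1+\sqrt{2}\bigr) \;=\; \frac{\ln\bigl(1+\sqrt{2}\bigr)}{\ln 2} \;\approx\; 1.2716,
\qquad\text{so}\qquad
N^{-\log_2(1+\sqrt{2})} \;\approx\; N^{-1.27},
\]

i.e., the conjectured worst-case decay is strictly faster than a plain N^{-1} rate.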

Background

Beyond objective-gap convergence, the paper establishes matching O(N^{-log_2(1+√2)}) rates for gradient-norm decrease under appropriately ordered long-step schedules, improving upon the classic O(1/N) rate from short steps.
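
To make the gap between the exponents tangible, the following standalone Python snippet (illustrative arithmetic only, not code from the paper) compares the nominal decays N^{-1} and N^{-log_2(1+√2)} and the corresponding iteration counts needed to reach a target tolerance, with all constants ignored:

import math

# Exponent appearing in the long-step rate: log_2(1 + sqrt(2)) ≈ 1.2716.
p_long = math.log2(1 + math.sqrt(2))
p_short = 1.0  # classic O(1/N) exponent for short (1/L) steps

print(f"long-step exponent log_2(1+sqrt(2)) = {p_long:.4f}")

# Nominal worst-case decay after N steps (constants ignored).
for N in (10, 100, 1000, 10000):
    print(f"N = {N:>6}:  N^-1 = {N ** -p_short:.2e}   N^-{p_long:.4f} = {N ** -p_long:.2e}")

# Iterations needed so that N^(-p) <= eps, i.e. N >= eps^(-1/p).
for eps in (1e-2, 1e-4, 1e-6):
    n_short = math.ceil(eps ** (-1 / p_short))
    n_long = math.ceil(eps ** (-1 / p_long))
    print(f"eps = {eps:.0e}:  short steps need ~{n_short} iterations, long steps ~{n_long}")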

The authors highlight a symmetry between objective-gap and gradient-norm convergence that suggests an analogous optimality exponent for gradient norms. They explicitly conjecture that the same N^{-log_2(1+√2)} exponent is optimal for gradient-norm convergence among all gradient descent stepsize schemes.

References

If such a symmetry is fundamental, conjecturing O(1/N^{log_2(1+√2)}) is also the optimal rate for gradient norm convergence is well motivated.

Accelerated Objective Gap and Gradient Norm Convergence for Gradient Descent via Long Steps (arXiv:2403.14045, Grimmer et al., 20 Mar 2024), in Remark (On Optimality of Rates)