
Optimal objective-gap rate for gradient descent stepsizes

Determine whether O(N^{-log_2(1+√2)}) is the optimal worst-case convergence rate for the terminal objective gap f(x_N)−f(x_*) achievable by gradient descent on L-smooth convex functions over all possible choices of stepsizes (including arbitrary iteration-dependent selections).
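For concreteness, the exponent in question evaluates numerically as below; the constant-stepsize O(1/N) baseline shown for comparison is the standard smooth convex bound, not a claim of this paper.

```latex
\[
  f(x_N) - f(x_*) \;=\; O\!\bigl(N^{-\log_2(1+\sqrt{2})}\bigr),
  \qquad \log_2(1+\sqrt{2}) \approx 1.2716,
\]
% versus the classical guarantee for constant stepsizes h_k = 1/L on L-smooth convex f:
\[
  f(x_N) - f(x_*) \;\le\; \frac{L\,\lVert x_0 - x_* \rVert^2}{2N} \;=\; O(N^{-1}).
\]
```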


Background

The paper proves that specific long-step schedules for gradient descent on L-smooth convex functions achieve an objective-gap rate of order N^{-log_2(1+√2)} and improve the constants relative to the silver stepsize schedule. In a broader context, Altschuler and Parrilo’s work suggests that no gradient descent stepsize scheme can surpass this exponent for objective-gap convergence.

The authors explicitly note that this N^{-log_2(1+√2)} rate is conjectured to be optimal among all gradient descent stepsize schemes for objective-gap reduction, motivating a definitive characterization of the best possible exponent independent of the particular stepsize schedule.
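The following is a minimal Python sketch, not taken from the paper, of gradient descent with an iteration-dependent stepsize schedule on an L-smooth convex quadratic. The schedules shown (constant 1 and a constant 1.5) are illustrative placeholders only; the certified long-step and silver schedules are constructed in the cited works and would be supplied in their place.

```python
import numpy as np

def gradient_descent(grad, x0, L, schedule):
    """Gradient descent with an iteration-dependent stepsize schedule.

    Iterates x_{k+1} = x_k - (h_k / L) * grad(x_k), where h_k is the k-th
    dimensionless stepsize from `schedule`. Constant h_k = 1 recovers the
    classical 1/L method; the long-step schedules studied in the cited
    papers would be plugged in here instead.
    """
    x = np.asarray(x0, dtype=float)
    for h in schedule:
        x = x - (h / L) * grad(x)
    return x

# Toy L-smooth convex test problem: f(x) = 0.5 * x^T A x with A positive definite,
# so x_* = 0 and f(x_*) = 0.
rng = np.random.default_rng(0)
M = rng.standard_normal((20, 10))
A = M.T @ M
L = np.linalg.eigvalsh(A).max()      # smoothness constant of f
grad = lambda x: A @ x
f = lambda x: 0.5 * x @ A @ x
x0 = rng.standard_normal(10)

N = 31                                # number of gradient steps
constant = np.ones(N)                 # classical schedule h_k = 1 (stepsize 1/L)
placeholder = np.full(N, 1.5)         # illustrative schedule only; NOT a silver/long-step schedule

print("objective gap, constant 1/L steps   :", f(gradient_descent(grad, x0, L, constant)))
print("objective gap, placeholder schedule :", f(gradient_descent(grad, x0, L, placeholder)))
```

The point of the sketch is only that the schedule is an arbitrary iteration-dependent sequence handed to the method; the open question concerns the best exponent achievable over all such sequences.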

References

Note that although O(1/N^{log_2(1+√2)}) is conjectured to be the optimal rate for gradient descent among all possible stepsize selections, it is not optimal among all gradient methods.

Accelerated Objective Gap and Gradient Norm Convergence for Gradient Descent via Long Steps (2403.14045 - Grimmer et al., 20 Mar 2024) in Remark (On Optimality of Rates)