
Optimal objective-gap rate for gradient descent stepsizes

Determine whether O(N^{-log_2(1+√2)}) is the optimal worst-case convergence rate for the terminal objective gap f(x_N)−f(x_*) achievable by gradient descent on L-smooth convex functions over all possible choices of stepsizes (including arbitrary iteration-dependent selections).
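For concreteness, the exponent in question evaluates numerically as below; the constant-stepsize O(1/N) baseline shown for comparison is the standard smooth convex bound, not a claim of this paper.

```latex
\[
  f(x_N) - f(x_*) \;=\; O\!\bigl(N^{-\log_2(1+\sqrt{2})}\bigr),
  \qquad \log_2(1+\sqrt{2}) \approx 1.2716,
\]
% versus the classical guarantee for constant stepsizes h_k = 1/L on L-smooth convex f:
\[
  f(x_N) - f(x_*) \;\le\; \frac{L\,\lVert x_0 - x_* \rVert^2}{2N} \;=\; O(N^{-1}).
\]
```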


Background

The paper proves that specific long-step schedules for gradient descent on L-smooth convex functions achieve an objective-gap rate of order N^{-log_2(1+√2)} and improve the constants relative to the silver stepsize schedule. In a broader context, Altschuler and Parrilo’s work suggests that no gradient descent stepsize scheme can surpass this exponent for objective-gap convergence.

The authors explicitly note that this N^{-log_2(1+√2)} rate is conjectured to be optimal among all gradient descent stepsize schemes for objective-gap reduction, motivating a definitive characterization of the best possible exponent independent of the particular stepsize schedule.
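The following is a minimal Python sketch, not taken from the paper, of gradient descent with an iteration-dependent stepsize schedule on an L-smooth convex quadratic. The schedules shown (constant 1 and a constant 1.5) are illustrative placeholders only; the certified long-step and silver schedules are constructed in the cited works and would be supplied in their place.

```python
import numpy as np

def gradient_descent(grad, x0, L, schedule):
    """Gradient descent with an iteration-dependent stepsize schedule.

    Iterates x_{k+1} = x_k - (h_k / L) * grad(x_k), where h_k is the k-th
    dimensionless stepsize from `schedule`. Constant h_k = 1 recovers the
    classical 1/L method; the long-step schedules studied in the cited
    papers would be plugged in here instead.
    """
    x = np.asarray(x0, dtype=float)
    for h in schedule:
        x = x - (h / L) * grad(x)
    return x

# Toy L-smooth convex test problem: f(x) = 0.5 * x^T A x with A positive definite,
# so x_* = 0 and f(x_*) = 0.
rng = np.random.default_rng(0)
M = rng.standard_normal((20, 10))
A = M.T @ M
L = np.linalg.eigvalsh(A).max()      # smoothness constant of f
grad = lambda x: A @ x
f = lambda x: 0.5 * x @ A @ x
x0 = rng.standard_normal(10)

N = 31                                # number of gradient steps
constant = np.ones(N)                 # classical schedule h_k = 1 (stepsize 1/L)
placeholder = np.full(N, 1.5)         # illustrative schedule only; NOT a silver/long-step schedule

print("objective gap, constant 1/L steps   :", f(gradient_descent(grad, x0, L, constant)))
print("objective gap, placeholder schedule :", f(gradient_descent(grad, x0, L, placeholder)))
```

The point of the sketch is only that the schedule is an arbitrary iteration-dependent sequence handed to the method; the open question concerns the best exponent achievable over all such sequences.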

References

Note that although O(1/N^{log_2(1+√2)}) is conjectured to be the optimal rate for gradient descent among all possible stepsize selections, it is not optimal among all gradient methods.

Accelerated Objective Gap and Gradient Norm Convergence for Gradient Descent via Long Steps (2403.14045 - Grimmer et al., 20 Mar 2024) in Remark (On Optimality of Rates)