Determine the minimax-optimal constant stepsize for gradient descent

For a given number of iterations N and parameters L > 0 and D > 0, determine the constant stepsize α(N) that minimizes the worst-case final objective gap of gradient descent over the class of L-smooth convex functions. Gradient descent runs x_{k+1} = x_k − (α/L) ∇f(x_k) for k = 0,…,N−1 from an initial point x_0 with ∥x_0 − x_⋆∥ ≤ D, where x_⋆ is a minimizer of f. The sought α(N) minimizes the supremum of f(x_N) − inf f over all such pairs (f, x_0).
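As a rough illustration of the iteration in the statement, the sketch below runs N steps with the normalized stepsize α/L and evaluates the final objective gap on one member of the class, the one-dimensional quadratic f(x) = (L/2) x^2 started at x_0 = D. The function names and the choice of test function are assumptions made for this example and do not come from the paper. A single function only gives a lower bound on the worst case, not the supremum itself.

```python
def gradient_descent(grad, x0, alpha, L, N):
    """Run N steps of x_{k+1} = x_k - (alpha / L) * grad(x_k) and return x_N."""
    x = x0
    for _ in range(N):
        x = x - (alpha / L) * grad(x)
    return x


def final_gap_quadratic(alpha, N, L=1.0, D=1.0):
    """Final gap f(x_N) - inf f for the illustrative choice f(x) = (L/2) x^2.

    The minimizer is x_star = 0 and inf f = 0, so starting at x_0 = D
    satisfies |x_0 - x_star| <= D with equality.
    """
    x_N = gradient_descent(lambda x: L * x, D, alpha, L, N)
    return 0.5 * L * x_N ** 2


if __name__ == "__main__":
    # Compare a few normalized stepsizes for N = 2 iterations.
    for alpha in (0.5, 1.0, 1.5):
        print(alpha, final_gap_quadratic(alpha, N=2))
```

On this particular quadratic, α = 1 reaches the minimizer in one step, which is why the worst case has to be taken over the whole function class rather than over a single example.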

Background

The paper studies gradient descent on the class of L-smooth convex functions, run for a fixed number N of iterations with a constant stepsize. The authors formalize the design objective as a minimax problem: find the constant stepsize that minimizes the worst-case final objective gap over all L-smooth convex objectives and all initializations within distance D of a minimizer.

This problem was previously investigated via performance estimation (PEP) and semidefinite programming. Drori and Teboulle (2012) conjectured that the optimal stepsize is the one that balances worst-case performance on quadratic and Huber functions, but a proof remains open. The present note addresses this overarching open problem by proposing and numerically validating a stronger structured-certificate conjecture that would imply the optimality claim.
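The idea of balancing performance on a quadratic and a Huber function can be explored numerically. The sketch below simulates gradient descent on a one-dimensional quadratic and on a one-dimensional Huber function, then bisects over α in (1, 2) for the stepsize at which the two final gaps coincide. The Huber threshold tau = D / (2Nα + 1) is an assumption of this example (it maximizes the final gap while the iterates stay in the linear region), and all function names are hypothetical. This illustrates the conjectured stepsize only; it is not the paper's certificate construction and proves nothing about optimality.

```python
def run_gd(grad, x0, alpha, L, N):
    """N steps of x_{k+1} = x_k - (alpha / L) * grad(x_k)."""
    x = x0
    for _ in range(N):
        x -= (alpha / L) * grad(x)
    return x


def quadratic_gap(alpha, N, L=1.0, D=1.0):
    """Final gap on f(x) = (L/2) x^2 from x_0 = D (inf f = 0)."""
    x = run_gd(lambda x: L * x, D, alpha, L, N)
    return 0.5 * L * x * x


def huber_gap(alpha, N, L=1.0, D=1.0):
    """Final gap on a Huber function from x_0 = D (inf f = 0)."""
    # Assumed threshold: chosen so the final gap is as large as possible
    # while every iterate stays in the linear part of the function.
    tau = D / (2 * N * alpha + 1)

    def f(x):
        if abs(x) <= tau:
            return 0.5 * L * x * x
        return L * tau * abs(x) - 0.5 * L * tau * tau

    def grad(x):
        if abs(x) <= tau:
            return L * x
        return L * tau if x > 0 else -L * tau

    return f(run_gd(grad, D, alpha, L, N))


def balancing_stepsize(N, tol=1e-12):
    """Bisect for the alpha in (1, 2) where the two gaps are equal."""
    lo, hi = 1.0, 2.0  # quadratic gap is smaller at 1 and larger at 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if quadratic_gap(mid, N) < huber_gap(mid, N):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for N in (1, 2, 5, 10):
        print(N, balancing_stepsize(N))
```

Bisection is adequate here because, on (1, 2), the quadratic gap increases with α while the Huber gap decreases, so the two curves cross exactly once.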

References

This note addresses the open problem of determining the minimax optimal constant stepsize for gradient descent: Given N, identify the stepsize α(N) ∈ ℝ minimizing the worst-case final objective gap
\begin{equation} \label{eq:minimax-design}
\min_{\tilde\alpha\in\mathbb{R}} \max_{(f,x_0)\in\mathcal{F}_{L,D}} f(x_N) - \inf f,
\end{equation}
where \mathcal{F}_{L,D} denotes the set of (f,x_0) such that f is an L-smooth convex function with a minimizer x_\star satisfying \|x_0-x_\star\|\leq D and x_N is the output of N steps of gradient descent with constant stepsize h_k=\tilde\alpha.

A Strengthened Conjecture on the Minimax Optimal Constant Stepsize for Gradient Descent (2407.11739 - Grimmer et al., 16 Jul 2024) in Section 1 (Introduction)