
Minimax optimality of basic g-composable schedules

Establish that for every number of iterations n ≥ 1, every minimax optimal stepsize schedule for gradient descent minimizing the final gradient norm under an upper bound δ on the initial suboptimality (i.e., every solution to min_{h ∈ R^n} max_{(f,x_0) ∈ 𝔽_{L,δ}} (1/2)∥∇f(x_n)∥^2) is a basic, g-composable stepsize schedule constructed via the composition operations.
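To make the objective concrete, here is a minimal Python sketch that runs gradient descent with a stepsize schedule h and evaluates the final-gradient-norm objective (1/2)∥∇f(x_n)∥². The quadratic test function and the schedule values are illustrative placeholders, not the minimax-optimal instance or schedule the conjecture concerns, and the sketch assumes the common normalization x_{k+1} = x_k − (h_k/L)∇f(x_k).

```python
import numpy as np

def gradient_descent(grad, x0, h, L=1.0):
    """Run gradient descent with stepsize schedule h = (h_1, ..., h_n),
    using the normalized convention x_{k+1} = x_k - (h_k / L) grad(x_k)."""
    x = np.asarray(x0, dtype=float)
    for hk in h:
        x = x - (hk / L) * grad(x)
    return x

# Illustrative instance: an L-smooth convex quadratic f(x) = (1/2) x^T A x
# with 0 < eigenvalues <= L, so grad f(x) = A x. Both the instance and the
# schedule below are hypothetical placeholders for illustration only.
L = 1.0
A = np.diag([0.1, 0.5, 1.0])
grad = lambda x: A @ x
x0 = np.array([1.0, 1.0, 1.0])

h = [1.5, 1.0, 1.5]  # placeholder schedule, length n = 3
xn = gradient_descent(grad, x0, h, L)
print(0.5 * np.linalg.norm(grad(xn)) ** 2)  # the objective (1/2)||grad f(x_n)||^2
```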


Background

By H-duality, reversing an optimized basic f-composable schedule yields a g-composable schedule with the same rate. The authors propose optimized basic g-composable schedules (OBS-G) by reversing OBS-F and prove nearly tight rate bounds.
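The reversal underlying OBS-G is simple to state in code. The following sketch forms a g-composable schedule as the H-dual (reversal) of an f-composable one, per the construction described above; the numeric values are hypothetical placeholders, not the actual OBS-F steps.

```python
def h_dual(schedule):
    """H-dual of a stepsize schedule: the same steps in reverse order.
    Reversing an optimized basic f-composable schedule (OBS-F) yields
    the g-composable schedule OBS-G with a matching rate."""
    return list(reversed(schedule))

obs_f = [1.0, 2.0, 1.0, 4.0]  # hypothetical OBS-F values, for illustration
obs_g = h_dual(obs_f)         # [4.0, 1.0, 2.0, 1.0]
```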

They conjecture that true minimax-optimal schedules for minimizing the final gradient norm are also basic and g-composable, which—if established—would imply information-theoretic optimality of OBS-G and a fundamental separation between gradient descent and accelerated methods for gradient norm reduction.

References

Complementary to our Conjecture~\ref{conj:strong-f-minimax-descripiton}, we expect that the minimax optimal stepsizes for minimizing the final gradient norm are basic $g$-composable schedules.

Conjecture. For each $n$, every minimax optimal stepsize schedule solving
$$ \min_{h \in \mathbb{R}^n} \max_{(f,x_0)\in \mathfrak{F}_{L,\delta}} \frac{1}{2}\|\nabla f(x_n)\|^2 $$
is basic and $g$-composable, where $\mathfrak{F}_{L,\delta}$ is the set of all problem instances $(f,x_0)$ defined by an $L$-smooth convex $f$ and an initialization $x_0$ having suboptimality $f(x_0)-f(x_\star)$ at most $\delta$.

Composing Optimized Stepsize Schedules for Gradient Descent (2410.16249 - Grimmer et al., 21 Oct 2024) in Conjecture (Conjecture~\ref{conj:strong-g-minimax-descripiton}), Section “The H-dual Optimized Basic g-composable Schedule (OBS-G)”