
Minimax optimality of basic f-composable schedules

Establish that, for every number of iterations n ≥ 1, every minimax optimal stepsize schedule for gradient descent is a basic schedule (one constructed via the f-join, g-join, and s-join composition operations) and is f-composable. Here a schedule h ∈ R^n is minimax optimal if it minimizes the worst-case final objective gap over all L-smooth convex functions f and initializations x_0 within distance D of a minimizer, that is, if it solves min_{h ∈ R^n} max_{(f, x_0) ∈ F_{L,D}} f(x_n) − inf f.
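The inner maximum in this problem runs over all of F_{L,D}, so any single pair (f, x_0) yields a lower bound on a schedule's worst-case gap. The Python sketch below evaluates two classic one-dimensional instances, the quadratic f(x) = (L/2)x² and a Huber function, assuming the normalized update x_{k+1} = x_k − (h_k/L)∇f(x_k) and nonnegative stepsizes. The helper names (run_gd, quadratic_gap, huber_gap, lower_bound) are hypothetical and not taken from the paper. The result is only a lower bound: for longer schedules the true worst case can be strictly larger.

```python
from typing import Callable, Sequence


def run_gd(grad: Callable[[float], float], x0: float,
           steps: Sequence[float], L: float) -> float:
    """Run x_{k+1} = x_k - (h_k / L) * grad(x_k) (assumed normalization); return x_n."""
    x = x0
    for h in steps:
        x = x - (h / L) * grad(x)
    return x


def quadratic_gap(steps: Sequence[float], L: float = 1.0, D: float = 1.0) -> float:
    """Final gap on f(x) = (L/2) x^2 from x0 = D; inf f = 0 at x = 0."""
    x_n = run_gd(lambda x: L * x, D, steps, L)
    return 0.5 * L * x_n ** 2


def huber_gap(steps: Sequence[float], L: float = 1.0, D: float = 1.0) -> float:
    """Final gap on a Huber function from x0 = D.

    The threshold eps = D / (2H + 1), with H = sum(steps), maximizes the gap
    when every iterate stays in the linear region, which holds for
    nonnegative steps and gives the closed form L D^2 / (4H + 2).
    """
    if any(h < 0 for h in steps):
        raise ValueError("this sketch assumes nonnegative stepsizes")
    H = sum(steps)
    eps = D / (2.0 * H + 1.0)

    def f(x: float) -> float:
        # Quadratic near the minimizer, linear with slope L * eps outside.
        if abs(x) <= eps:
            return 0.5 * L * x ** 2
        return L * eps * abs(x) - 0.5 * L * eps ** 2

    def grad(x: float) -> float:
        if abs(x) <= eps:
            return L * x
        return L * eps * (1.0 if x > 0 else -1.0)

    x_n = run_gd(grad, D, steps, L)
    return f(x_n)  # inf f = 0 at x = 0


def lower_bound(steps: Sequence[float], L: float = 1.0, D: float = 1.0) -> float:
    """Max of the two instance gaps: a lower bound on the worst case over F_{L,D}."""
    return max(quadratic_gap(steps, L, D), huber_gap(steps, L, D))


if __name__ == "__main__":
    # Grid search over a single stepsize h in (0, 3] for n = 1.
    grid = [i / 1000 for i in range(1, 3001)]
    best = min(grid, key=lambda h: lower_bound([h]))
    print(best, lower_bound([best]))
```

With L = D = 1 and n = 1, the two bounds are (1 − h)²/2 and 1/(4h + 2), and the grid search finds that they balance at h = 3/2, where both equal 1/8. For longer schedules the Huber bound depends only on the sum of the stepsizes, while the quadratic bound depends on the product of the factors (1 − h_k).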


Background

The paper introduces three composable families of stepsize schedules (f-, g-, and s-composable) and corresponding composition operations (f-join, g-join, s-join) that allow building complex schedules from simpler ones while preserving tight convergence guarantees.

Using these operations, the authors show that all numerically computed minimax-optimal schedules for n ≤ 25 from Gupta et al. can be represented (up to small numerical error) as basic schedules. Motivated by this empirical finding, they pose a conjecture asserting that true minimax-optimal schedules are basic and f-composable. Since their dynamic programming procedure produces the best schedules within the basic f-composable class (the optimized basic f-composable schedules, OBS-F), the conjecture would imply that OBS-F is exactly minimax optimal among all gradient descent stepsize schedules.

References

This strong relation between every numerically identified minimax optimal pattern and basic patterns motivates the following natural conjecture. Conjecture: For each $n$, every minimax optimal stepsize schedule, solving the minimax problem stated above, is basic and $f$-composable.

Composing Optimized Stepsize Schedules for Gradient Descent (2410.16249 - Grimmer et al., 21 Oct 2024), Conjecture in Section “Numerically Minimax Optimal Stepsizes for n=1,…,25”