Acceleration by Stepsize Hedging I: Multi-Step Descent and the Silver Stepsize Schedule (2309.07879v1)
Abstract: Can we accelerate convergence of gradient descent without changing the algorithm -- just by carefully choosing stepsizes? Surprisingly, we show that the answer is yes. Our proposed Silver Stepsize Schedule optimizes strongly convex functions in $k^{\log_{\rho} 2} \approx k^{0.7864}$ iterations, where $\rho = 1+\sqrt{2}$ is the silver ratio and $k$ is the condition number. This is intermediate between the textbook unaccelerated rate $k$ and the accelerated rate $\sqrt{k}$ due to Nesterov in 1983. The non-strongly convex setting is conceptually identical, and standard black-box reductions imply an analogous accelerated rate $\varepsilon^{-\log_{\rho} 2} \approx \varepsilon^{-0.7864}$. We conjecture and provide partial evidence that these rates are optimal among all possible stepsize schedules. The Silver Stepsize Schedule is constructed recursively in a fully explicit way. It is non-monotonic, fractal-like, and approximately periodic of period $k^{\log_{\rho} 2}$. This leads to a phase transition in the convergence rate: initially super-exponential (acceleration regime), then exponential (saturation regime).
- Jason M. Altschuler (27 papers)
- Pablo A. Parrilo (66 papers)
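The recursive, fractal-like structure described in the abstract can be illustrated concretely. Below is a minimal Python sketch, assuming the companion convex-case recursion $h^{(1)} = [\sqrt{2}]$, $h^{(j+1)} = [h^{(j)},\, 1+\rho^{\,j-1},\, h^{(j)}]$ (the strongly convex schedule in this paper additionally depends on the condition number $k$, which this sketch does not model); it also verifies the rate exponent $\log_{\rho} 2 \approx 0.7864$ quoted above.

```python
import math

# Silver ratio and the rate exponent from the abstract:
# rho = 1 + sqrt(2), log_rho(2) = ln 2 / ln rho ~= 0.7864.
rho = 1 + math.sqrt(2)
exponent = math.log(2) / math.log(rho)
print(f"rho = {rho:.6f}, log_rho(2) = {exponent:.4f}")  # ~0.7864

def silver_schedule(j: int) -> list[float]:
    """Length-(2^j - 1) silver stepsize schedule, assuming the
    convex-case doubling recursion (an assumption for illustration;
    the paper's strongly convex schedule also depends on k)."""
    if j == 1:
        return [math.sqrt(2)]
    prev = silver_schedule(j - 1)
    # Concatenate two copies of the previous schedule around a new,
    # larger middle stepsize 1 + rho^(j-2); this doubling produces
    # the non-monotonic, fractal-like pattern noted in the abstract.
    return prev + [1 + rho ** (j - 2)] + prev

print(silver_schedule(3))
# [1.414..., 2.0, 1.414..., 3.414..., 1.414..., 2.0, 1.414...]
```

Note how occasional long steps (here $1+\rho \approx 3.414$) are interleaved with short ones: no single stepsize is safe and fast at once, so the schedule hedges across iterations.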