Accelerated Objective Gap and Gradient Norm Convergence for Gradient Descent via Long Steps (2403.14045v4)
Abstract: This work considers gradient descent for $L$-smooth convex optimization with stepsizes larger than the classic regime where descent can be ensured. The stepsize schedules considered are similar to, but differ slightly from, the recent silver stepsizes of Altschuler and Parrilo. For one of our stepsize sequences, we prove an $O\left(N^{-1.2716\dots}\right)$ convergence rate for the objective gap, and for the other, we show the same rate of decrease for the squared gradient norm. The first result improves on the recent result of Altschuler and Parrilo by a constant factor, while the second improves on the exponent of the prior best squared-gradient-norm convergence guarantee of $O\left(N^{-1}\right)$.
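The abstract only describes the stepsize schedules at a high level, so the following is a minimal, hedged sketch: plain gradient descent driven by a recursively built long-step schedule following the silver-ratio pattern ($\rho = 1 + \sqrt{2}$) from the Altschuler–Parrilo line of work, which this paper's schedules resemble but do not exactly match. The helper names `silver_like_schedule` and `gradient_descent`, the schedule values, and the quadratic test problem are all illustrative assumptions, not the paper's construction.

```python
# Hypothetical sketch: gradient descent with a "silver-like" long-step schedule.
# The stepsizes follow the recursive doubling pattern associated with the
# Altschuler-Parrilo silver stepsizes (rho = 1 + sqrt(2)); the schedules
# analyzed in the paper differ slightly, so treat this as an illustration only.
import numpy as np


def silver_like_schedule(k: int) -> list[float]:
    """Return a schedule of length 2**k - 1 built by the recursive pattern
    h^(1) = [sqrt(2)],  h^(j+1) = h^(j) + [1 + rho**(j-1)] + h^(j),
    with rho = 1 + sqrt(2) (the silver ratio)."""
    rho = 1.0 + np.sqrt(2.0)
    sched = [np.sqrt(2.0)]
    for j in range(1, k):
        sched = sched + [1.0 + rho ** (j - 1)] + sched
    return sched


def gradient_descent(grad, x0, L, schedule):
    """Plain gradient descent x_{t+1} = x_t - (h_t / L) * grad(x_t);
    stepsizes h_t above the classic descent regime are allowed."""
    x = np.asarray(x0, dtype=float)
    for h in schedule:
        x = x - (h / L) * grad(x)
    return x


if __name__ == "__main__":
    # Toy L-smooth convex objective: f(x) = 0.5 * x^T A x with L = lambda_max(A),
    # minimized at x* = 0 with f* = 0.
    rng = np.random.default_rng(0)
    M = rng.standard_normal((20, 20))
    A = M.T @ M
    L = np.linalg.eigvalsh(A).max()
    grad = lambda x: A @ x

    x0 = rng.standard_normal(20)
    for k in range(1, 8):
        sched = silver_like_schedule(k)          # N = 2**k - 1 steps
        xN = gradient_descent(grad, x0, L, sched)
        fN = 0.5 * xN @ A @ xN                   # objective gap, since f* = 0
        print(f"N = {len(sched):4d}   f(x_N) - f* = {fN:.3e}")
```

For orientation, the exponent $1.2716\dots$ in the abstract is $\log_2 \rho$ with $\rho = 1 + \sqrt{2}$, reflecting the doubling structure of such schedules; under the pattern above, a schedule of length $N = 2^k - 1$ contains steps as large as $1 + \rho^{k-2}$.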
- Accelerated gradient descent via long steps, 2023.
- Performance of first-order methods for smooth convex minimization: a novel approach. Mathematical Programming, 145:451–482, 2012.
- Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Mathematical Programming, 161:307–345, 2017.
- Exact worst-case performance of first-order methods for composite convex optimization. SIAM Journal on Optimization, 27(3):1283–1313, 2017.
- An elementary approach to tight worst case complexity analysis of gradient based methods. Mathematical Programming, 201(1–2):63–96, 2022.
- Antoine Daccache. Performance estimation of the gradient method with fixed arbitrary step sizes. Master’s thesis, Université Catholique de Louvain, 2019.
- Diego Eloi. Worst-case functions for the gradient method with fixed variable step sizes. Master’s thesis, Université Catholique de Louvain, 2022.
- Branch-and-bound performance estimation programming: A unified methodology for constructing optimal optimization methods. Mathematical Programming, 2023.
- Jason Altschuler. Greed, hedging, and acceleration in convex optimization. Master’s thesis, Massachusetts Institute of Technology, 2018.
- B. Grimmer. Provably Faster Gradient Descent via Long Steps. arXiv:2307.06324, 2023.
- Jason M. Altschuler and Pablo A. Parrilo. Acceleration by stepsize hedging I: Multi-step descent and the silver stepsize schedule, 2023.
- Jason M. Altschuler and Pablo A. Parrilo. Acceleration by stepsize hedging II: Silver stepsize schedule for smooth convex optimization, 2023.
- Time-reversed dissipation induces duality between minimizing gradient norm and function value, 2023.
- Yurii Nesterov. A method for solving the convex programming problem with convergence rate $O(1/k^2)$. Proceedings of the USSR Academy of Sciences, 269:543–547, 1983.
- A. Nemirovski and D. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley-Interscience, 1983.
- Optimizing the efficiency of first-order methods for decreasing the gradient of smooth convex functions. Journal of Optimization Theory and Applications, 188(1):192–219, 2021.
- On averaging and extrapolation for gradient descent, 2024.
- Yurii Nesterov. Lectures on Convex Optimization. Springer Publishing Company, Incorporated, 2nd edition, 2018.
- Wolfram Research, Inc. Mathematica, Version 13.3. Champaign, IL, 2023.