A novel interpretation of Nesterov's acceleration via variable step-size linear multistep methods (2404.10238v1)

Published 16 Apr 2024 in math.NA and cs.NA

Abstract: Nesterov's acceleration in continuous optimization can be understood in a novel way when Nesterov's accelerated gradient (NAG) method is considered as a linear multistep (LM) method for the gradient flow. Although the NAG method for strongly convex functions (NAG-sc) has been fully discussed, the NAG method for $L$-smooth convex functions (NAG-c) has not. To fill this gap, we show that the existing NAG-c method can be interpreted as a variable step-size LM (VLM) method for the gradient flow. Surprisingly, the VLM allows linearly increasing step sizes, which explains the acceleration in the convex case. Here, we introduce a novel technique for analyzing the absolute stability of VLMs. Subsequently, we prove that NAG-c is optimal within a certain natural class of VLMs. Finally, we construct a new, broader class of VLMs by optimizing the parameters of the VLM for ill-conditioned problems. According to numerical experiments, the proposed method outperforms the NAG-c method in ill-conditioned cases. These results suggest that the numerical-analysis perspective on NAG is a promising framework, and that considering a broader class of VLMs could reveal further novel methods.
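
For orientation, a minimal sketch follows. It is not the paper's VLM construction or its proposed optimized scheme; it only runs the textbook NAG-c iteration, x_k = y_{k-1} - (1/L) ∇f(y_{k-1}), y_k = x_k + ((k-1)/(k+2)) (x_k - x_{k-1}), alongside plain gradient descent (explicit Euler for the gradient flow x'(t) = -∇f(x(t))) on a synthetic ill-conditioned quadratic, so that the acceleration discussed in the abstract is visible numerically. The problem size, eigenvalue range, and iteration counts are illustrative choices, not the paper's experimental setup.

```python
# Sketch (assumed setup, not the paper's experiments): textbook NAG-c vs.
# gradient descent on an ill-conditioned quadratic f(x) = 0.5 x^T A x - b^T x.

import numpy as np

def grad(A, b, x):
    """Gradient of f(x) = 0.5 * x^T A x - b^T x."""
    return A @ x - b

def nag_c(A, b, x0, L, iters):
    """Standard NAG-c:
       x_k = y_{k-1} - (1/L) grad f(y_{k-1}),
       y_k = x_k + (k-1)/(k+2) * (x_k - x_{k-1})."""
    x_prev = x0.copy()
    y = x0.copy()
    for k in range(1, iters + 1):
        x = y - grad(A, b, y) / L
        y = x + (k - 1) / (k + 2) * (x - x_prev)
        x_prev = x
    return x_prev

def gradient_descent(A, b, x0, L, iters):
    """Explicit Euler for the gradient flow x' = -grad f(x) with step 1/L."""
    x = x0.copy()
    for _ in range(iters):
        x = x - grad(A, b, x) / L
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Ill-conditioned SPD matrix: eigenvalues log-spaced between 1e-3 and 1.
    eigs = np.logspace(-3, 0, 50)
    Q, _ = np.linalg.qr(rng.standard_normal((50, 50)))
    A = Q @ np.diag(eigs) @ Q.T
    b = rng.standard_normal(50)
    x_star = np.linalg.solve(A, b)   # exact minimizer
    L = eigs.max()                   # smoothness constant

    x0 = np.zeros(50)
    for iters in (100, 1000):
        err_gd = np.linalg.norm(gradient_descent(A, b, x0, L, iters) - x_star)
        err_nag = np.linalg.norm(nag_c(A, b, x0, L, iters) - x_star)
        print(f"iters={iters:5d}  GD error={err_gd:.3e}  NAG-c error={err_nag:.3e}")
```

At a fixed iteration budget the NAG-c error is markedly smaller, reflecting the well-known O(1/k^2) versus O(1/k) gap; the paper's contribution is to explain this gap, and to improve on NAG-c for ill-conditioned problems, through the variable step-size LM interpretation.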

