On the connections between optimization algorithms, Lyapunov functions, and differential equations: theory and insights (2305.08658v3)

Published 15 May 2023 in math.OC, cs.NA, math.NA, and stat.ML

Abstract: We revisit the general framework introduced by Fazlyab et al. (SIAM J. Optim. 28, 2018) to construct Lyapunov functions for optimization algorithms in discrete and continuous time. For smooth, strongly convex objective functions, we relax the requirements necessary for such a construction. As a result, for Polyak's ordinary differential equations and for a two-parameter family of Nesterov algorithms, we are able to prove rates of convergence that improve on those available in the literature. We analyse the interpretation of Nesterov algorithms as discretizations of the Polyak equation. We show that the algorithms are instances of Additive Runge-Kutta integrators and discuss the reasons why most discretizations of the differential equation do not result in optimization algorithms with acceleration. We also introduce a modification of Polyak's equation and study its convergence properties. Finally, we extend the general framework to the stochastic scenario and consider an application to random algorithms with acceleration for overparameterized models; again we are able to prove convergence rates that improve on those in the literature.
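For orientation, here is a minimal sketch of the objects the abstract refers to, written in the conventional forms found in the cited literature (Polyak 1964; Nesterov 1983). These are not necessarily the exact parameterizations or Lyapunov functions used in the paper, and the symbols gamma, h, beta, c and lambda below are introduced here only for illustration.

% Polyak's (heavy-ball) ODE for a smooth, mu-strongly convex objective f;
% gamma > 0 is a damping coefficient (a common choice is gamma = 2*sqrt(mu)).
\[
  \ddot{X}(t) + \gamma\,\dot{X}(t) + \nabla f\bigl(X(t)\bigr) = 0 .
\]

% A two-parameter family of Nesterov iterations with step size h and momentum beta,
% which may be read as a discretization of the ODE above. A classical choice for
% mu-strongly convex f is beta = (1 - sqrt(mu h)) / (1 + sqrt(mu h)).
\begin{align*}
  x_{k+1} &= y_k - h\,\nabla f(y_k), \\
  y_{k+1} &= x_{k+1} + \beta\,(x_{k+1} - x_k).
\end{align*}

% A typical discrete Lyapunov function certifying a linear rate rho in (0,1),
% i.e. V_{k+1} <= rho * V_k, for a minimizer x^star of f:
\[
  V_k = f(x_k) - f(x^\star)
        + \tfrac{c}{2}\,\bigl\| x_k - x^\star + \lambda\,(x_k - x_{k-1}) \bigr\|^2 ,
  \qquad c, \lambda > 0 .
\]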

References (34)
  1. On symplectic optimization. arXiv:1802.03653, 2018.
  2. Optimization algorithms inspired by the geometry of dissipative systems. arXiv:1912.02928, 2019.
  3. G. J. Cooper and A. Sayfy. Additive methods for the numerical solution of ordinary differential equations. Mathematics of Computation, 35, 1980.
  4. G. J. Cooper and A. Sayfy. Additive Runge-Kutta methods for stiff ordinary differential equations. Mathematics of Computation, 40, 1983.
  5. Using perturbed underdamped Langevin dynamics to efficiently sample from probability distributions. Journal of Statistical Physics, 169(6), 12 2017.
  6. A geometric integration approach to smooth optimisation: Foundations of the discrete gradient method. arXiv:1805.06444, 2018.
  7. Analysis of optimization algorithms via integral quadratic constraints: nonstrongly convex problems. SIAM Journal on Optimization, 28(3):2654–2689, 2018.
  8. On dissipative symplectic integration with applications to gradient-based optimization. Journal of Statistical Mechanics: Theory and Experiment, 2021(4):043402, 2021.
  9. Accelerated diffusion-based sampling by the non-reversible dynamics with skew-symmetric matrices. Entropy, 23(8), 2021.
  10. Skew-symmetrically perturbed gradient flow for convex optimization. volume 157 of Proceedings of Machine Learning Research, pages 721–736. PMLR, 2021.
  11. Accelerating diffusions. The Annals of Applied Probability, 15(2):1433–1444, 2005.
  12. Variance reduction for diffusions. Stochastic Processes and their Applications, 125(9):3522–3540, 2015.
  13. Accelerated mirror descent in continuous and discrete time. In Advances in Neural Information Processing Systems 28, pages 2845–2853. 2015.
  14. M. Laborde and A. Oberman. A Lyapunov analysis for accelerated gradient methods: from deterministic to stochastic case. volume 108 of Proceedings of Machine Learning Research, pages 602–612. PMLR, 2020.
  15. Optimal non-reversible linear drift for the convergence to equilibrium of a diffusion. Journal of Statistical Physics, 152(2):237–274, 2013.
  16. Analysis and design of optimization algorithms via integral quadratic constraints. SIAM Journal on Optimization, 26(1):57–95, 2016.
  17. The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. volume 80 of Proceedings of Machine Learning Research, pages 3325–3334. PMLR, 2018.
  18. A. Megretski and A. Rantzer. System analysis via integral quadratic constraints. IEEE Transactions on Automatic Control, 42(6):819–830, 1997.
  19. M. Muehlebach and M. I. Jordan. A dynamical systems perspective on Nesterov acceleration. volume 97 of Proceedings of Machine Learning Research, pages 4656–4662. PMLR, 2019.
  20. M. Muehlebach and M. I. Jordan. Optimization with momentum: Dynamical, control-theoretic, and symplectic perspectives. Journal of Machine Learning Research, 22(1), 2021.
  21. Y. Nesterov. A method for solving the convex programming problem with convergence rate $\mathcal{O}(1/k^{2})$. Proceedings of the USSR Academy of Sciences, 269:543–547, 1983.
  22. Y. Nesterov. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization, 22(2):341–362, 2012.
  23. Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Springer Publishing Company, Incorporated, 1 edition, 2014.
  24. A. Orvieto and A. Lucchi. Shadowing properties of optimization algorithms. In Advances in Neural Information Processing Systems 32, pages 12692–12703. 2019.
  25. B. Polyak. Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics, 4(5):1–17, 1964.
  26. B. T. Polyak and P. Shcherbakov. Lyapunov functions: An optimization theory perspective. IFAC-PapersOnLine, 50(1):7456–7461, 2017. 20th IFAC World Congress.
  27. The connections between Lyapunov functions for some optimization algorithms and differential equations. SIAM Journal on Numerical Analysis, 59(3):1542–1565, 2021.
  28. Integration methods and optimization algorithms. In Advances in Neural Information Processing Systems 30, pages 1109–1118, 2017.
  29. Acceleration via symplectic discretization of high-resolution differential equations. In Advances in Neural Information Processing Systems, volume 32, pages 5744–5752, 2019.
  30. A differential equation for modeling Nesterov’s accelerated gradient method: Theory and insights. Journal of Machine Learning Research, 17(153):1–43, 2016.
  31. Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. volume 89 of Proceedings of Machine Learning Research, pages 1195–1204. PMLR, 2019.
  32. A variational perspective on accelerated methods in optimization. Proceedings of the National Academy of Sciences, 113(47):E7351–E7358, 2016.
  33. A Lyapunov analysis of accelerated methods in optimization. Journal of Machine Learning Research, 22(113):1–34, 2021.
  34. Understanding deep learning requires rethinking generalization. In International Conference on Learning Representations, 2017.