
Learning to optimize with convergence guarantees using nonlinear system theory (2403.09389v2)

Published 14 Mar 2024 in eess.SY, cs.LG, and cs.SY

Abstract: The increasing reliance on numerical methods for controlling dynamical systems and training machine learning models underscores the need for algorithms that dependably and efficiently navigate complex optimization landscapes. Classical gradient descent methods offer strong theoretical guarantees for convex problems; however, they demand meticulous hyperparameter tuning for non-convex ones. The emerging paradigm of learning to optimize (L2O) automates the discovery of algorithms with optimized performance by leveraging learning models and data; yet, it lacks a theoretical framework to analyze the convergence of the learned algorithms. In this paper, we fill this gap by harnessing nonlinear system theory. Specifically, we propose an unconstrained parametrization of all convergent algorithms for smooth non-convex objective functions. Notably, our framework is directly compatible with automatic differentiation tools, ensuring convergence by design while learning to optimize.
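To make the L2O idea concrete, here is a minimal sketch of an optimizer whose update combines a gradient step with a learned component, where convergence is preserved by construction. All names (`learned_term`, `safeguarded_step`) and the clipping rule are illustrative assumptions, not the paper's actual parametrization, which is more general and builds on nonlinear system theory:

```python
# Sketch: a "convergence by design" L2O-style update (illustrative only).
# The learned term is clipped so that the overall step remains a descent
# direction; the paper's parametrization is more general than this.

def grad(x):
    # Gradient of the toy objective f(x) = (x - 3)^2.
    return 2.0 * (x - 3.0)

def learned_term(x, g):
    # Stand-in for a trainable model (e.g. a small neural network).
    # Here it is a fixed momentum-like heuristic, purely for illustration.
    return 0.9 * g

def safeguarded_step(x, lr=0.1, c=0.5):
    g = grad(x)
    d = learned_term(x, g)
    # Enforce |d| <= c * |g| with c < 1, so the update -lr * (g + d)
    # stays a descent step regardless of what the learned model outputs.
    if abs(d) > c * abs(g):
        d = c * abs(g) * (1.0 if d > 0 else -1.0)
    return x - lr * (g + d)

x = 10.0
for _ in range(100):
    x = safeguarded_step(x)
print(round(x, 4))  # -> 3.0, the minimizer of f
```

Because the safeguard is a smooth, differentiable operation in the typical case, such a scheme remains compatible with automatic differentiation, which is the property the abstract emphasizes.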


