OPTAMI: Global Superlinear Convergence of High-order Methods (2410.04083v2)
Abstract: Second-order methods for convex optimization outperform first-order methods in terms of theoretical iteration complexity, achieving convergence rates up to $O(k^{-5})$ for highly smooth functions. However, their practical performance and applications are limited by their multi-level structure and implementation complexity. In this paper, we present new results on high-order optimization methods, supported by their practical performance. First, we show that basic high-order methods, such as the Cubic Regularized Newton Method, exhibit global superlinear convergence for $\mu$-strongly star-convex functions, a class that includes $\mu$-strongly convex functions and some non-convex functions. The theoretical convergence results are both inspired and supported by the practical performance of these methods. Second, we propose a practical version of the Nesterov Accelerated Tensor method, called NATA, which significantly outperforms the classical variant and other high-order acceleration techniques in practice; its convergence is also supported by theoretical results. Finally, we introduce OPTAMI, an open-source computational library for high-order methods. The library includes various methods, acceleration techniques, and subproblem solvers, all implemented as PyTorch optimizers, thereby facilitating the practical application of high-order methods to a wide range of optimization problems. We hope this library will simplify research on, and practical comparison of, methods beyond first order.
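The abstract states that OPTAMI wraps high-order methods as PyTorch optimizers. To make the idea concrete, below is a minimal, self-contained sketch of one such method, a cubic-regularized Newton step (the Nesterov–Polyak model), packaged as a `torch.optim.Optimizer`. The class name `CubicNewtonSketch`, the dense Hessian construction, and the bisection subproblem solver are illustrative assumptions for this sketch and are not OPTAMI's actual API or solvers.

```python
import torch
from torch.optim import Optimizer


class CubicNewtonSketch(Optimizer):
    """Illustrative cubic-regularized Newton step (dense Hessian, small problems only)."""

    def __init__(self, params, M=1.0):
        super().__init__(params, dict(M=M))

    @torch.no_grad()
    def step(self, closure):
        # closure() must recompute and return the loss (no .backward() needed).
        group = self.param_groups[0]
        M = group["M"]
        params = [p for p in group["params"] if p.requires_grad]

        # Loss, flat gradient g, and dense Hessian H of the flattened parameters.
        with torch.enable_grad():
            loss = closure()
            grads = torch.autograd.grad(loss, params, create_graph=True)
            g = torch.cat([gr.reshape(-1) for gr in grads])
            n = g.numel()
            H = torch.stack([
                torch.cat([r.reshape(-1) for r in
                           torch.autograd.grad(g[i], params, retain_graph=True)])
                for i in range(n)
            ])

        eye = torch.eye(n, dtype=g.dtype, device=g.device)

        # The cubic model step h(r) solves (H + (M r / 2) I) h = -g; the subproblem
        #   min_h <g, h> + 1/2 <H h, h> + (M/6) ||h||^3
        # is solved (for convex f) by finding r with ||h(r)|| = r via bisection.
        def h_of(r):
            return torch.linalg.solve(H + 0.5 * M * r * eye, -g)

        lo, hi = 0.0, 1.0
        while h_of(hi).norm() > hi:          # expand until the root is bracketed
            hi *= 2.0
        for _ in range(50):                  # bisection on r = ||h||
            mid = 0.5 * (lo + hi)
            if h_of(mid).norm() > mid:
                lo = mid
            else:
                hi = mid
        h = h_of(hi)

        # Scatter the flat step back into the parameter tensors.
        offset = 0
        for p in params:
            k = p.numel()
            p.add_(h[offset:offset + k].view_as(p))
            offset += k
        return loss


# Example: a few cubic Newton steps on a tiny least-squares problem.
torch.manual_seed(0)
X, y = torch.randn(20, 5), torch.randn(20)
w = torch.zeros(5, requires_grad=True)
opt = CubicNewtonSketch([w], M=1.0)
for _ in range(5):
    loss = opt.step(lambda: 0.5 * ((X @ w - y) ** 2).mean())
```

The dense Hessian and the O(n^3) linear solves keep the sketch short but restrict it to small problems; the library described in the paper ships dedicated subproblem solvers and acceleration schemes for exactly this kind of step.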