Local Curvature Descent: Squeezing More Curvature out of Standard and Polyak Gradient Descent
Abstract: We contribute to the growing body of knowledge on more powerful and adaptive stepsizes for convex optimization, empowered by local curvature information. We do not go the route of fully-fledged second-order methods, which require the expensive computation of the Hessian. Instead, our key observation is that, for some problems (e.g., when minimizing the sum of squares of absolutely convex functions), certain local curvature information is readily available, and can be used to obtain surprisingly powerful matrix-valued stepsizes and meaningful theory. In particular, we develop three new methods, LCD1, LCD2, and LCD3, where the abbreviation stands for local curvature descent. While LCD1 generalizes gradient descent with a fixed stepsize, LCD2 generalizes gradient descent with the Polyak stepsize. Our methods enhance these classical gradient descent baselines with local curvature information, and our theory recovers the known rates in the special case when no curvature information is used. Our last method, LCD3, is a variable metric version of LCD2; this feature leads to a closed-form expression for the iterates. Our empirical results are encouraging, and show that local curvature descent improves upon gradient descent.
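For context, below is a minimal sketch of the two classical baselines the abstract refers to: gradient descent with a fixed scalar stepsize (which LCD1 generalizes) and gradient descent with the Polyak stepsize (which LCD2 generalizes). It does not implement the LCD methods themselves, whose distinguishing feature is replacing the scalar stepsize with a matrix-valued, curvature-informed one; the least-squares objective, function names, and constants are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gd_fixed(grad, x0, gamma, n_iters=200):
    """Gradient descent with a fixed scalar stepsize gamma (the LCD1 baseline)."""
    x = x0.copy()
    for _ in range(n_iters):
        x -= gamma * grad(x)
    return x

def gd_polyak(f, grad, x0, f_star=0.0, n_iters=200, eps=1e-12):
    """Gradient descent with the Polyak stepsize (the LCD2 baseline):
    gamma_k = (f(x_k) - f*) / ||grad f(x_k)||^2, requiring knowledge of f*."""
    x = x0.copy()
    for _ in range(n_iters):
        g = grad(x)
        gamma = (f(x) - f_star) / (np.dot(g, g) + eps)
        x -= gamma * g
    return x

# Illustrative consistent least-squares problem: f(x) = 0.5 * ||Ax - b||^2, so f* = 0.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
b = A @ rng.standard_normal(10)
f = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2
grad = lambda x: A.T @ (A @ x - b)

L = np.linalg.norm(A, 2) ** 2  # smoothness constant of f; fixed stepsize 1/L
print(f(gd_fixed(grad, np.zeros(10), 1.0 / L)))
print(f(gd_polyak(f, grad, np.zeros(10))))
```

Here the Polyak stepsize adapts automatically to the local landscape but uses only scalar information; the paper's LCD methods aim to go further by exploiting readily available local curvature through matrix-valued stepsizes.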