Improving Stochastic Cubic Newton with Momentum (2410.19644v2)
Abstract: We study stochastic second-order methods for solving general non-convex optimization problems. We propose using a special version of momentum to stabilize the stochastic gradient and Hessian estimates in Newton's method. We show that momentum provably reduces the variance of the stochastic estimates and allows the method to converge for any noise level. Using the cubic regularization technique, we prove a global convergence rate for our method on general non-convex problems to a second-order stationary point, even when only a single stochastic data sample is used per iteration. This contrasts sharply with existing stochastic second-order methods for non-convex problems, which typically require large batches. We are therefore the first to demonstrate global convergence of Stochastic Cubic Newton for batches of arbitrary size in the non-convex case. Additionally, we show improved convergence speed on convex stochastic problems for our regularized Newton methods with momentum.
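Below is a minimal sketch of the idea described in the abstract, not the authors' exact algorithm: exponential-moving-average ("momentum") estimates of the stochastic gradient and Hessian are maintained from a single sample per iteration and fed into a cubic-regularized Newton step. The eigendecomposition-based subproblem solver, the momentum weights `alpha` and `beta`, the regularization constant `M`, and the helper names `sample_grad` and `sample_hess` are all illustrative assumptions rather than the paper's prescribed choices.

```python
import numpy as np

def cubic_newton_step(g, H, M, tol=1e-10, max_iter=200):
    """Approximately solve  min_s  g^T s + 0.5 s^T H s + (M/6)||s||^3.

    Uses the standard characterization (H + (M/2)||s|| I) s = -g and finds
    r = ||s|| by bisection in the eigenbasis of H. The "hard case"
    (g orthogonal to the bottom eigenspace of an indefinite H) is ignored,
    so this is only an illustrative solver."""
    eigvals, Q = np.linalg.eigh(H)
    g_eig = Q.T @ g

    def step_norm(r):
        # ||s(r)|| where s(r) = -(H + (M*r/2) I)^{-1} g, in the eigenbasis of H
        denom = eigvals + 0.5 * M * r
        return np.sqrt(np.sum((g_eig / denom) ** 2))

    # bracket the root of  step_norm(r) - r = 0  and bisect
    r_lo = max(0.0, -2.0 * eigvals.min() / M) + 1e-12
    r_hi = max(2.0 * r_lo, 1.0)
    while step_norm(r_hi) > r_hi:
        r_hi *= 2.0
    for _ in range(max_iter):
        r = 0.5 * (r_lo + r_hi)
        if step_norm(r) > r:
            r_lo = r
        else:
            r_hi = r
        if r_hi - r_lo < tol:
            break
    r = 0.5 * (r_lo + r_hi)
    return -(Q @ (g_eig / (eigvals + 0.5 * M * r)))


def stochastic_cubic_newton_momentum(x0, sample_grad, sample_hess, M,
                                     alpha, beta, n_iters, rng):
    """Sketch of Stochastic Cubic Newton with momentum on the estimates:
    one stochastic gradient and Hessian sample per iteration, averaged into
    running estimates, followed by a cubic-regularized Newton step."""
    x = np.array(x0, dtype=float)
    g_bar = sample_grad(x, rng)          # running gradient estimate
    H_bar = sample_hess(x, rng)          # running Hessian estimate
    for _ in range(n_iters):
        g_bar = (1 - alpha) * g_bar + alpha * sample_grad(x, rng)  # gradient momentum
        H_bar = (1 - beta) * H_bar + beta * sample_hess(x, rng)    # Hessian momentum
        x = x + cubic_newton_step(g_bar, H_bar, M)
    return x
```

In practice the cubic subproblem would be handled with a more careful (e.g., Krylov-based) solver, and the momentum weights and regularization parameter would follow the schedules analyzed in the paper; the sketch above only illustrates the structure of the update.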