Krylov Cubic Regularized Newton: A Subspace Second-Order Method with Dimension-Free Convergence Rate (2401.03058v1)
Abstract: Second-order optimization methods, such as cubic regularized Newton methods, are known for their rapid convergence rates; nevertheless, they become impractical in high-dimensional problems due to their substantial memory requirements and computational costs. One promising approach is to execute second-order updates within a lower-dimensional subspace, giving rise to subspace second-order methods. However, most existing subspace second-order methods select their subspaces randomly, which leads to slower convergence rates that depend on the problem's dimension $d$. In this paper, we introduce a novel subspace cubic regularized Newton method that achieves a dimension-independent global convergence rate of $O\left(\frac{1}{mk}+\frac{1}{k^{2}}\right)$ for solving convex optimization problems. Here, $m$ denotes the subspace dimension, which can be significantly smaller than $d$. Instead of adopting a random subspace, our primary innovation is to perform the cubic regularized Newton update within the Krylov subspace associated with the Hessian and the gradient of the objective function. This result marks the first dimension-independent convergence rate for a subspace second-order method. Furthermore, when specific spectral conditions of the Hessian are met, our method recovers the convergence rate of the full-dimensional cubic regularized Newton method. Numerical experiments show that our method converges faster than existing random subspace methods, especially for high-dimensional problems.
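To make the mechanism concrete, the sketch below (a minimal NumPy illustration under stated assumptions, not the authors' implementation) builds the Krylov subspace $\mathcal{K}_m(H, g) = \mathrm{span}\{g, Hg, \dots, H^{m-1}g\}$ with the Lanczos process and then minimizes the cubic regularized model restricted to that subspace. Because the Lanczos basis $Q$ has orthonormal columns and $Q^\top H Q$ is tridiagonal, one update needs only $m$ Hessian-vector products and an $m$-dimensional subproblem. The regularization constant `M`, the bisection-based subproblem solver, and the helper names `lanczos_basis`, `solve_cubic_subproblem`, and `krylov_cubic_newton_step` are choices made here for illustration.

```python
import numpy as np


def lanczos_basis(hvp, g, m):
    """Orthonormal basis Q of K_m(H, g) = span{g, Hg, ..., H^{m-1} g} via Lanczos;
    T = Q^T H Q is the tridiagonal projected Hessian.
    Only Hessian-vector products hvp(v) are required, never the full Hessian."""
    Q = [g / np.linalg.norm(g)]
    alphas, betas = [], []
    for j in range(m):
        w = hvp(Q[j])
        alphas.append(Q[j] @ w)
        w = w - alphas[j] * Q[j]
        if j > 0:
            w = w - betas[j - 1] * Q[j - 1]
        for q in Q:                      # full re-orthogonalization for stability
            w = w - (q @ w) * q
        beta = np.linalg.norm(w)
        if j == m - 1 or beta < 1e-12:   # basis complete or subspace exhausted
            break
        betas.append(beta)
        Q.append(w / beta)
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    return np.stack(Q, axis=1), T


def solve_cubic_subproblem(T, gnorm, M):
    """Minimize  gnorm*z_1 + 0.5*z^T T z + (M/6)*||z||^3  over z in R^m.
    Uses the stationarity condition (T + (M*r/2) I) z = -gnorm*e_1 with r = ||z||
    and finds r by bisection; assumes T is positive semidefinite (convex f)."""
    m = T.shape[0]
    rhs = np.zeros(m)
    rhs[0] = gnorm

    def z_of(r):
        return -np.linalg.solve(T + 0.5 * M * r * np.eye(m), rhs)

    lo, hi = 0.0, 1.0
    while np.linalg.norm(z_of(hi)) > hi:   # grow the bracket until ||z(r)|| <= r
        hi *= 2.0
    for _ in range(100):                   # bisect on the fixed point ||z(r)|| = r
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(z_of(mid)) > mid:
            lo = mid
        else:
            hi = mid
    return z_of(hi)


def krylov_cubic_newton_step(x, grad, hvp, m=10, M=1.0):
    """One subspace step: build K_m(H(x), grad(x)), solve the cubic model there,
    and move to x + Q z.  hvp(x, v) should return H(x) @ v."""
    g = grad(x)
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:
        return x
    Q, T = lanczos_basis(lambda v: hvp(x, v), g, m)
    z = solve_cubic_subproblem(T, gnorm, M)
    return x + Q @ z
```

In this sketch the per-iteration cost is $m$ Hessian-vector products (which can be formed without materializing the $d \times d$ Hessian) plus work on an $m \times m$ tridiagonal subproblem, which reflects the memory and compute savings the abstract describes when $m \ll d$.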