Krylov Cubic Regularized Newton: A Subspace Second-Order Method with Dimension-Free Convergence Rate (2401.03058v1)

Published 5 Jan 2024 in math.OC, cs.LG, and stat.ML

Abstract: Second-order optimization methods, such as cubic regularized Newton methods, are known for their rapid convergence rates; nevertheless, they become impractical in high-dimensional problems due to their substantial memory requirements and computational costs. One promising approach is to execute second-order updates within a lower-dimensional subspace, giving rise to subspace second-order methods. However, the majority of existing subspace second-order methods randomly select subspaces, consequently resulting in slower convergence rates depending on the problem's dimension $d$. In this paper, we introduce a novel subspace cubic regularized Newton method that achieves a dimension-independent global convergence rate of $O\left(\frac{1}{mk}+\frac{1}{k^{2}}\right)$ for solving convex optimization problems. Here, $m$ represents the subspace dimension, which can be significantly smaller than $d$. Instead of adopting a random subspace, our primary innovation involves performing the cubic regularized Newton update within the Krylov subspace associated with the Hessian and the gradient of the objective function. This result marks the first instance of a dimension-independent convergence rate for a subspace second-order method. Furthermore, when specific spectral conditions of the Hessian are met, our method recovers the convergence rate of a full-dimensional cubic regularized Newton method. Numerical experiments show our method converges faster than existing random subspace methods, especially for high-dimensional problems.
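
To make the core idea concrete, the sketch below illustrates one step of a Krylov-subspace cubic regularized Newton method in NumPy. It is an illustrative reading of the abstract, not the authors' implementation: the names (`krylov_crn_step`, `grad_f`, `hvp_f`), the fully reorthogonalized Lanczos routine, the bisection-based subproblem solver, and the fixed regularization parameter `M` (standing in for the Hessian Lipschitz constant, with no adaptive updating) are all assumptions made here for clarity.

```python
import numpy as np


def lanczos(hvp, g, m):
    """Run Lanczos on the Hessian (accessed only via Hessian-vector products),
    started from the gradient.  Returns an orthonormal basis Q of the Krylov
    subspace span{g, Hg, ..., H^{m-1} g} and the tridiagonal projection
    T = Q^T H Q."""
    d = g.shape[0]
    Q = np.zeros((d, m))
    alphas, betas = [], []
    q, q_prev, beta = g / np.linalg.norm(g), np.zeros(d), 0.0
    for j in range(m):
        Q[:, j] = q
        w = hvp(q) - beta * q_prev
        alpha = q @ w
        w = w - alpha * q
        w = w - Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)  # full reorthogonalization
        alphas.append(alpha)
        beta = np.linalg.norm(w)
        if beta < 1e-12 or j == m - 1:  # invariant subspace reached, or done
            break
        betas.append(beta)
        q_prev, q = q, w / beta
    k = len(alphas)
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    return Q[:, :k], T


def cubic_subproblem(T, g_norm, M, iters=200):
    """Minimize  g_norm * s[0] + 0.5 s^T T s + (M/6) ||s||^3  over R^k.
    In the Lanczos coordinates the projected gradient is g_norm * e_1.
    Stationarity gives (T + (M/2) r I) s = -g_m with r = ||s||; we find r
    by bisection, which is cheap because T is only k x k."""
    k = T.shape[0]
    g_m = np.zeros(k)
    g_m[0] = g_norm

    def s_of(r):
        return np.linalg.solve(T + 0.5 * M * r * np.eye(k), -g_m)

    lam_min = np.linalg.eigvalsh(T)[0]
    lo = max(0.0, -2.0 * lam_min / M)      # keeps T + (M/2) r I positive definite
    hi = lo + 1.0
    while np.linalg.norm(s_of(hi)) > hi:   # expand until the root is bracketed
        hi *= 2.0
    for _ in range(iters):                 # bisection on r = ||s||
        r = 0.5 * (lo + hi)
        if np.linalg.norm(s_of(r)) > r:
            lo = r
        else:
            hi = r
    return s_of(hi)


def krylov_crn_step(grad_f, hvp_f, x, m, M):
    """One Krylov-subspace cubic regularized Newton step at x (sketch)."""
    g = grad_f(x)
    Q, T = lanczos(lambda v: hvp_f(x, v), g, m)
    s = cubic_subproblem(T, np.linalg.norm(g), M)
    return x + Q @ s


if __name__ == "__main__":
    # Toy usage on a synthetic convex quadratic in d = 1000 dimensions;
    # in practice grad_f / hvp_f would come from autodiff or Pearlmutter's trick.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((1000, 200))
    A = A @ A.T / 200 + 1e-3 * np.eye(1000)
    b = rng.standard_normal(1000)
    grad_f = lambda x: A @ x - b
    hvp_f = lambda x, v: A @ v
    x = np.zeros(1000)
    for _ in range(20):
        x = krylov_crn_step(grad_f, hvp_f, x, m=10, M=1.0)
    print(np.linalg.norm(grad_f(x)))  # gradient norm should shrink
```

Under these assumptions the per-iteration cost is m Hessian-vector products to build the basis plus O(m^3) work on the small projected subproblem, so nothing scales with d^2 or d^3, which is the point of working in the Krylov subspace rather than the full space.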
