DRSOM: A Dimension Reduced Second-Order Method (2208.00208v3)
Abstract: In this paper, we propose a Dimension-Reduced Second-Order Method (DRSOM) for convex and nonconvex (unconstrained) optimization. Under a trust-region-like framework, our method preserves the convergence of second-order methods while using curvature information in only a few directions. Consequently, the computational overhead of our method remains comparable to that of first-order methods such as gradient descent. Theoretically, we show that the method enjoys local quadratic convergence and a global rate of $O(\epsilon^{-3/2})$ for satisfying approximate first-order and second-order conditions, provided the subspace satisfies a commonly adopted approximate-Hessian assumption. We further show that this assumption can be removed if a corrector step based on a Krylov-like method is performed periodically in the final stage of the algorithm. The applicability and performance of DRSOM are exhibited by various computational experiments, including $L_2$-$L_p$ minimization, CUTEst problems, and sensor network localization.
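To make the mechanics concrete, here is a minimal sketch of one dimension-reduced trust-region step, assuming the two-dimensional subspace spanned by the negative gradient and the previous step that the paper's construction centers on. Everything below is illustrative rather than the authors' implementation: the names `drsom_step` and `solve_2d_tr`, the bisection-based subproblem solver, the fixed trust-region radius (standing in for an adaptive update), and the quadratic test problem are all assumptions made for the sketch.

```python
import numpy as np

def solve_2d_tr(Q, c, radius, iters=60):
    """Solve the 2x2 trust-region subproblem
        min_a  c @ a + 0.5 * a @ Q @ a   s.t.  ||a|| <= radius
    by bisection on the multiplier lam in (Q + lam*I) a = -c.
    Standard technique; the degenerate 'hard case' is ignored here."""
    lam_min = np.linalg.eigvalsh(Q)[0]       # smallest eigenvalue
    if lam_min > 0:
        a = np.linalg.solve(Q, -c)           # try the interior Newton step
        if np.linalg.norm(a) <= radius:
            return a
    lo = max(0.0, -lam_min) + 1e-12          # lam must exceed -lam_min
    hi = lo + np.linalg.norm(c) / radius + 1.0
    a = np.zeros(2)
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        a = np.linalg.solve(Q + lam * np.eye(2), -c)
        if np.linalg.norm(a) > radius:
            lo = lam                         # step too long: increase lam
        else:
            hi = lam                         # feasible: shrink toward boundary
    return a

def drsom_step(grad_f, hvp, x, d_prev, radius=1.0):
    """One dimension-reduced step: restrict the quadratic model to
    span{-gradient, previous step}, which needs only two
    Hessian-vector products, then take the 2D trust-region minimizer."""
    g = grad_f(x)
    V = np.stack([-g, d_prev], axis=1)       # n x 2 subspace basis
    HV = np.stack([hvp(x, V[:, 0]), hvp(x, V[:, 1])], axis=1)
    Q = V.T @ HV                             # 2x2 reduced Hessian
    c = V.T @ g                              # reduced gradient
    a = solve_2d_tr(Q, c, radius)
    d = V @ a                                # lift back to the full space
    return x + d, d

# Usage: a few steps on an ill-conditioned convex quadratic.
A = np.diag(np.linspace(1.0, 100.0, 50))
grad_f = lambda x: A @ x
hvp = lambda x, v: A @ v                     # Hessian-vector product
x = np.ones(50)
d = -1e-3 * grad_f(x)                        # seed the momentum direction
for _ in range(30):
    x, d = drsom_step(grad_f, hvp, x, d)
print("final gradient norm:", np.linalg.norm(grad_f(x)))
```

Each iteration touches the Hessian only through two Hessian-vector products and otherwise does $O(n)$ work plus a 2x2 solve, which is why the per-iteration overhead stays comparable to gradient descent.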
- Convergence rates for the heavy-ball continuous dynamics for non-convex optimization, under Polyak–Łojasiewicz condition. Journal of Global Optimization, 84(3):563–589, November 2022.
- An investigation of Newton-Sketch and subsampled Newton methods. Optimization Methods and Software, 35(4):661–680, July 2020.
- Semidefinite programming for ad hoc wireless sensor network localization. In Third International Symposium on Information Processing in Sensor Networks, IPSN 2004, pages 46–54, 2004.
- Convex Until Proven Guilty: Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions, May 2017.
- Accelerated Methods for Nonconvex Optimization. SIAM Journal on Optimization, 28(2):1751–1772, January 2018.
- On the Complexity of Steepest Descent, Newton’s and Regularized Newton’s Methods for Nonconvex Unconstrained Optimization Problems. SIAM Journal on Optimization, 20(6):2833–2852, January 2010.
- Adaptive cubic regularisation methods for unconstrained optimization. Part II: worst-case function- and derivative-evaluation complexity. Mathematical Programming, 130(2):295–319, December 2011a.
- Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results. Mathematical Programming, 127(2):245–295, April 2011b.
- Xiaojun Chen. Smoothing methods for nonsmooth, nonconvex minimization. Mathematical Programming, 134(1):71–99, August 2012.
- Complexity of unconstrained $L_2$-$L_p$ minimization. Mathematical Programming, 143(1-2):371–383, February 2014.
- Introduction to derivative-free optimization. Number 8 in MPS-SIAM series on optimization. Society for Industrial and Applied Mathematics/Mathematical Programming Society, Philadelphia, 2009a.
- Trust region methods. SIAM, 2000.
- Geometry of interpolation sets in derivative free optimization. Mathematical Programming, 111(1):141–172, 2008.
- Global convergence of general derivative-free trust-region algorithms to first- and second-order critical points. SIAM Journal on Optimization, 20(1):387–415, 2009b.
- A trust region algorithm with a worst-case iteration complexity of $\mathcal{O}(\epsilon^{-3/2})$ for nonconvex optimization. Mathematical Programming, 162(1):1–32, March 2017.
- Worst-Case Complexity of an SQP Method for Nonlinear Equality Constrained Stochastic Optimization, January 2022.
- Inexact Newton Methods. SIAM Journal on Numerical Analysis, 19(2):400–408, April 1982.
- Quasi-Newton methods, motivation and theory. SIAM Review, 19(1):46–89, 1977.
- A note on the complexity of Lp minimization. Mathematical Programming, 129(2):285–299, 2011.
- Global convergence of the Heavy-ball method for convex optimization. In 2015 European Control Conference (ECC), pages 310–315, Linz, Austria, July 2015.
- CUTEst: a Constrained and Unconstrained Testing Environment with safe threads for mathematical optimization. Computational Optimization and Applications, 60(3):545–557, April 2015.
- Error estimates for iterative algorithms for minimizing regularized quadratic subproblems. Optimization Methods and Software, 35(2):304–328, 2020.
- Algorithm 851: CG_descent, a conjugate gradient method with guaranteed descent. ACM Transactions on Mathematical Software (TOMS), 32(1):113–137, 2006.
- Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Adaptive Newton sketch: Linear-time optimization with quadratic convergence and effective Hessian dimensionality. In International Conference on Machine Learning, pages 5926–5936, 2021.
- Subspace Methods for Nonlinear Optimization, 2021.
- Linear and Nonlinear Programming, volume 228 of International Series in Operations Research & Management Science. Springer International Publishing, Cham, 2021.
- Yurii Nesterov. Lectures on convex optimization, volume 137. Springer, 2018.
- Cubic regularization of Newton method and its global performance. Mathematical Programming, 108(1):177–205, August 2006.
- Numerical optimization. Springer Science & Business Media, 2006.
- iPiano: Inertial proximal algorithm for nonconvex optimization. SIAM Journal on Imaging Sciences, 7(2):1388–1419, 2014.
- JuliaSmoothOptimizers, April 2019.
- Newton sketch: A near linear-time optimization algorithm with linear-quadratic convergence. SIAM Journal on Optimization, 27(1):205–245, 2017.
- B. T. Polyak. Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics, 4(5):1–17, January 1964.
- Heavy-ball algorithms always escape saddle points. arXiv preprint arXiv:1907.09697, 2019a.
- Non-Ergodic Convergence Analysis of Heavy-Ball Algorithms. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01):5033–5040, July 2019b.
- A subspace implementation of quasi-Newton trust region methods for unconstrained optimization. Numerische Mathematik, 104(2):241–269, August 2006.
- Further relaxations of the semidefinite programming approach to sensor network localization. SIAM Journal on Optimization, 19(2):655–673, 2008.
- David P. Woodruff. Sketching as a tool for numerical linear algebra. Foundations and Trends® in Theoretical Computer Science, 10(1–2):1–157, 2014.
- Newton-type methods for non-convex optimization under inexact Hessian information. Mathematical Programming, 184(1-2):35–70, November 2020.
- NG+: A Multi-Step Matrix-Product Natural Gradient Method for Deep Learning, June 2021.
- Sketch-Based Empirical Natural Gradient Methods for Deep Learning. Journal of Scientific Computing, 92(3):94, September 2022.
- Yinyu Ye. A New Complexity Result on Minimization of a Quadratic Function with a Sphere Constraint. In Recent Advances in Global Optimization, volume 176, pages 19–31. Princeton University Press, 1991.
- Yinyu Ye. Second Order Optimization Algorithms I, 2005.
- Y.-X. Yuan and J. Stoer. A subspace study on conjugate gradient algorithms. ZAMM-Journal of Applied Mathematics and Mechanics/Zeitschrift für Angewandte Mathematik und Mechanik, 75(1):69–77, 1995.
- Ya-xiang Yuan. A review on subspace methods for nonlinear optimization. In Proceedings of the International Congress of Mathematicians, pages 807–827, 2014.
- Ya-xiang Yuan. Recent advances in trust region algorithms. Mathematical Programming, 151(1):249–281, 2015.
- SK Zavriev and FV Kostyuk. Heavy-ball method in nonconvex optimization problems. Computational Mathematics and Modeling, 4(4):336–341, 1993.
- A nonmonotone line search technique and its application to unconstrained optimization. SIAM Journal on Optimization, 14(4):1043–1056, 2004.