A randomized algorithm for nonconvex minimization with inexact evaluations and complexity guarantees (2310.18841v2)
Abstract: We consider minimization of a smooth nonconvex function with inexact oracle access to the gradient and Hessian (but without access to the function value) to achieve approximate second-order optimality. A novel feature of our method is that if an approximate direction of negative curvature is chosen as the step, its sense is chosen to be positive or negative with equal probability. We allow gradients to be inexact in a relative sense and relax the coupling between the inexactness thresholds for the first- and second-order optimality conditions. Our convergence analysis includes both an expectation bound based on martingale analysis and a high-probability bound based on concentration inequalities. We apply our algorithm to empirical risk minimization problems and obtain improved gradient sample complexity over existing work.
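The sketch below illustrates the randomized feature highlighted in the abstract: when the inexact Hessian exposes an approximate direction of negative curvature, the step is taken along +d or -d with equal probability, since without function values the algorithm cannot test which sense gives descent. This is a minimal illustration, not the paper's algorithm; the function names, the thresholds `eps_g` and `eps_H`, and the step-length rules are assumptions made for the example.

```python
import numpy as np

def smallest_eigpair(H):
    # Dense eigendecomposition; a Lanczos/Krylov solver would be used at scale.
    w, V = np.linalg.eigh(H)
    return w[0], V[:, 0]

def one_step(x, grad_fn, hess_fn, eps_g=1e-4, eps_H=1e-2, rng=None):
    """One iteration of a schematic inexact second-order method.

    grad_fn, hess_fn are the inexact gradient/Hessian oracles; eps_g, eps_H
    are illustrative first- and second-order tolerances (not the paper's).
    Returns the new iterate and a flag indicating approximate 2nd-order stationarity.
    """
    rng = np.random.default_rng() if rng is None else rng
    g = grad_fn(x)                      # inexact gradient oracle
    H = hess_fn(x)                      # inexact Hessian oracle
    lam_min, v = smallest_eigpair(H)

    if np.linalg.norm(g) > eps_g:
        # First-order step along the negative (inexact) gradient.
        d = -g
    elif lam_min < -eps_H:
        # Negative-curvature step: sign chosen uniformly at random,
        # because no function values are available to compare f(x + d) and f(x - d).
        sigma = rng.choice([-1.0, 1.0])
        d = sigma * abs(lam_min) * (v / np.linalg.norm(v))
    else:
        return x, True                  # approximate second-order stationary point

    return x + d, False
```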