Generalized Optimistic Methods for Convex-Concave Saddle Point Problems (2202.09674v2)
Abstract: The optimistic gradient method has seen increasing popularity for solving convex-concave saddle point problems. To analyze its iteration complexity, a recent work [arXiv:1906.01115] proposed an interesting perspective that interprets this method as an approximation to the proximal point method. In this paper, we follow this approach and distill the underlying idea of optimism to propose a generalized optimistic method, which includes the optimistic gradient method as a special case. Our general framework can handle constrained saddle point problems with composite objective functions and can work with arbitrary norms using Bregman distances. Moreover, we develop a backtracking line search scheme to select the step sizes without knowledge of the smoothness coefficients. We instantiate our method with first-, second- and higher-order oracles and give best-known global iteration complexity bounds. For our first-order method, we show that the averaged iterates converge at a rate of $O(1/N)$ when the objective function is convex-concave, and the method achieves linear convergence when the objective is strongly-convex-strongly-concave. For our second- and higher-order methods, under the additional assumption that the distance-generating function has Lipschitz gradient, we prove a complexity bound of $O(1/\epsilon^{\frac{2}{p+1}})$ in the convex-concave setting and a complexity bound of $O((L_p D^{\frac{p-1}{2}}/\mu)^{\frac{2}{p+1}}+\log\log\frac{1}{\epsilon})$ in the strongly-convex-strongly-concave setting, where $L_p$ ($p\geq 2$) is the Lipschitz constant of the $p$-th-order derivative, $\mu$ is the strong convexity parameter, and $D$ is the initial Bregman distance to the saddle point. Moreover, our line search scheme provably requires only a constant number of calls to a subproblem solver per iteration on average, making our first- and second-order methods particularly amenable to implementation.
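For intuition, the simplest first-order instance (unconstrained, Euclidean, with a fixed step size rather than the paper's backtracking line search) is the classical optimistic gradient update $z_{k+1} = z_k - \eta\,(2F(z_k) - F(z_{k-1}))$, where $F$ stacks $\nabla_x f$ and $-\nabla_y f$: the previous gradient serves as a cheap prediction of the next one, approximating an implicit proximal point step. The sketch below runs this update on a hypothetical strongly-convex-strongly-concave quadratic; the test problem, step size, and iteration budget are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy strongly-convex-strongly-concave quadratic (illustrative assumptions,
# not the paper's general Bregman/composite setting):
#   f(x, y) = (mu/2)||x||^2 + x^T A y - (mu/2)||y||^2
rng = np.random.default_rng(0)
n, mu = 5, 0.1
A = rng.standard_normal((n, n))

def F(z):
    # Saddle point operator: stacks grad_x f and -grad_y f.
    x, y = z[:n], z[n:]
    return np.concatenate([mu * x + A @ y, mu * y - A.T @ x])

# Fixed step size tied to the spectral norm of A; the paper instead selects
# step sizes by backtracking line search, without knowing this constant.
eta = 0.25 / np.linalg.norm(A, 2)

z = np.ones(2 * n)
g_prev = F(z)  # convention: z_{-1} = z_0, so the first step is a plain gradient step
for _ in range(2000):
    g = F(z)
    # Optimistic update: z_{k+1} = z_k - eta * (2 F(z_k) - F(z_{k-1}))
    z = z - eta * (2 * g - g_prev)
    g_prev = g

print("residual ||F(z)||:", np.linalg.norm(F(z)))  # decreases toward 0 at the saddle point
```

Informally, the second- and higher-order instances replace this stale-gradient prediction with a $p$-th-order Taylor model of $F$, which is what yields the improved $O(1/\epsilon^{\frac{2}{p+1}})$ bounds stated above.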
- Deeksha Adil, Brian Bullins, Arun Jambulapati and Sushant Sachdeva “Optimal methods for higher-order smooth monotone variational inequalities” In arXiv preprint arXiv:2205.06167, 2022
- M. Marques Alves and Benar F. Svaiter “A search-free $O(1/k^{3/2})$ homotopy inexact proximal-Newton extragradient algorithm for monotone variational inequalities” In arXiv preprint arXiv:2308.05887, 2023
- K.J. Arrow, L. Hurwicz and H. Uzawa “Studies in Linear and Non-Linear Programming”, Stanford Mathematical Studies in the Social Sciences Stanford, CA: Stanford University Press, 1958
- Alfred Auslender and Marc Teboulle “Interior Projection-Like Methods for Monotone Variational Inequalities” In Mathematical Programming 104.1 Springer Science+Business Media LLC, 2005, pp. 39–68
- Waïss Azizian, Ioannis Mitliagkas, Simon Lacoste-Julien and Gabriel Gidel “A Tight and Unified Analysis of Gradient-Based Methods for a Whole Spectrum of Differentiable Games” In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS) 108 PMLR, 2020, pp. 2863–2873
- Tamer Başar and Geert Jan Olsder “Dynamic Noncooperative Game Theory, 2nd Edition” SIAM, 1998
- Heinz H. Bauschke, Jérôme Bolte and Marc Teboulle “A Descent Lemma beyond Lipschitz Gradient Continuity: First-Order Methods Revisited and Applications” In Mathematics of Operations Research 42.2 Institute for Operations Research and the Management Sciences (INFORMS), 2017, pp. 330–348
- Amir Beck “First-Order Methods in Optimization” SIAM, 2017
- Brian Bullins and Kevin A Lai “Higher-order methods for convex-concave min-max optimization and monotone variational inequalities” In SIAM Journal on Optimization 32.3 SIAM, 2022, pp. 2208–2229
- Coralia Cartis, Nicholas I.M. Gould and Philippe L. Toint “Adaptive Cubic Regularisation Methods for Unconstrained Optimization. Part I: Motivation, Convergence and Numerical Results” In Mathematical Programming 127.2 Springer Science+Business Media LLC, 2011, pp. 245–295
- Antonin Chambolle and Thomas Pock “A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging” In Journal of Mathematical Imaging and Vision 40.1 Springer Science+Business Media LLC, 2011, pp. 120–145
- Antonin Chambolle and Thomas Pock “On the Ergodic Convergence Rates of a First-Order Primal–Dual Algorithm” In Mathematical Programming 159.1-2 Springer Science+Business Media LLC, 2016, pp. 253–287
- Gong Chen and Marc Teboulle “Convergence Analysis of a Proximal-Like Minimization Algorithm Using Bregman Functions” In SIAM Journal on Optimization 3.3 Society for Industrial & Applied Mathematics (SIAM), 1993, pp. 538–543
- Yunmei Chen, Guanghui Lan and Yuyuan Ouyang “Optimal Primal-Dual Methods for a Class of Saddle Point Problems” In SIAM Journal on Optimization 24.4, 2014, pp. 1779–1814
- Chao-Kai Chiang, Tianbao Yang, Chia-Jung Lee, Mehrdad Mahdavi, Chi-Jen Lu, Rong Jin and Shenghuo Zhu “Online Optimization with Gradual Variations” In Proceedings of the 25th Annual Conference on Learning Theory (COLT) 23, 2012, pp. 6.1–6.20
- Laurent Condat “A Primal-Dual Splitting Method for Convex Optimization Involving Lipschitzian, Proximable and Linear Composite Terms” In Journal of Optimization Theory and Applications 158.2, 2013, pp. 460–479
- Constantinos Daskalakis, Andrew Ilyas, Vasilis Syrgkanis and Haoyang Zeng “Training GANs with Optimism” In Proceedings of International Conference on Learning Representations (ICLR), 2018
- Jonathan Eckstein “Nonlinear Proximal Point Algorithms Using Bregman Functions, with Applications to Convex Programming” In Mathematics of Operations Research 18.1 Institute for Operations Research and the Management Sciences (INFORMS), 1993, pp. 202–226
- Francisco Facchinei and Jong-Shi Pang “Finite-Dimensional Variational Inequalities and Complementarity Problems” New York: Springer-Verlag, 2003
- Alireza Fallah, Asuman Ozdaglar and Sarath Pattathil “An Optimal Multistage Stochastic Gradient Method for Minimax Problems” In Proceedings of the 59th IEEE Conference on Decision and Control (CDC) IEEE, 2020
- Gauthier Gidel, Hugo Berard, Gaëtan Vignoud, Pascal Vincent and Simon Lacoste-Julien “A Variational Inequality Perspective on Generative Adversarial Networks” In Proceedings of International Conference on Learning Representations (ICLR), 2019
- Erfan Yazdandoost Hamedani and Necdet Serhat Aybat “A Primal-Dual Algorithm with Line Search for General Convex-Concave Saddle Point Problems” In SIAM Journal on Optimization 31.2, 2021, pp. 1299–1329
- Niao He, Anatoli Juditsky and Arkadi Nemirovski “Mirror Prox Algorithm for Multi-Term Composite Minimization and Semi-Separable Problems” In Computational Optimization and Applications 61.2 Springer Science+Business Media LLC, 2015, pp. 275–319
- Yu-Guan Hsieh, Franck Iutzeler, Jérôme Malick and Panayotis Mertikopoulos “On the Convergence of Single-Call Stochastic Extra-Gradient Methods” In Advances in Neural Information Processing Systems 32, 2019
- Kevin Huang, Junyu Zhang and Shuzhong Zhang “Cubic regularized Newton method for the saddle point models: A global and local convergence analysis” In Journal of Scientific Computing 91.60 Springer, 2022, pp. 1–31
- Kevin Huang and Shuzhong Zhang “An approximation-based regularized extra-gradient method for monotone variational inequalities” In arXiv preprint arXiv:2210.04440, 2022
- Pooria Joulani, András György and Csaba Szepesvári “A modular analysis of adaptive (non-)convex optimization: Optimism, composite objectives, variance reduction, and variational bounds” In Theoretical Computer Science 808 Elsevier, 2020, pp. 108–138
- G.M. Korpelevich “The Extragradient Method for Finding Saddle Points and Other Problems” In Ekonomika i Matematicheskie Metody 12, 1976, pp. 747–756. In Russian; English translation in Matekon
- Georgios Kotsalis, Guanghui Lan and Tianjiao Li “Simple and optimal methods for stochastic variational inequalities, I: operator extrapolation” In SIAM Journal on Optimization 32.3 SIAM, 2022, pp. 2041–2073
- Tengyuan Liang and James Stokes “Interaction Matters: A Note on Non-Asymptotic Local Convergence of Generative Adversarial Networks” In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS) 89 PMLR, 2019, pp. 907–915
- Tianyi Lin and Michael I. Jordan “Perseus: A simple high-order regularization method for variational inequalities” In arXiv preprint arXiv:2205.03202, 2022
- Tianyi Lin, Panayotis Mertikopoulos and Michael Jordan “Explicit second-order min-max optimization methods with optimal convergence guarantee” In arXiv preprint arXiv:2210.12860, 2022
- Haihao Lu, Robert M. Freund and Yurii Nesterov “Relatively Smooth Convex Optimization by First-Order Methods, and Applications” In SIAM Journal on Optimization 28.1 Society for Industrial & Applied Mathematics (SIAM), 2018, pp. 333–354
- Yu Malitsky “Proximal Extrapolated Gradient Methods for Variational Inequalities” In Optimization Methods and Software 33.1 Informa UK Limited, 2017, pp. 140–164
- Yu. Malitsky “Projected Reflected Gradient Methods for Monotone Variational Inequalities” In SIAM Journal on Optimization 25.1 Society for Industrial & Applied Mathematics (SIAM), 2015, pp. 502–520
- Yura Malitsky and Thomas Pock “A First-Order Primal-Dual Algorithm with Linesearch” In SIAM Journal on Optimization 28.1, 2018, pp. 411–432
- Yura Malitsky and Matthew K. Tam “A Forward-Backward Splitting Method for Monotone Inclusions without Cocoercivity” In SIAM Journal on Optimization 30.2 Society for Industrial & Applied Mathematics (SIAM), 2020, pp. 1451–1472
- B. Martinet “Brève Communication. Régularisation D’inéquations Variationnelles Par Approximations Successives” In ESAIM: Mathematical Modelling and Numerical Analysis 4.R3 EDP Sciences, 1970, pp. 154–158
- Panayotis Mertikopoulos, Bruno Lecouat, Houssam Zenati, Chuan-Sheng Foo, Vijay Chandrasekhar and Georgios Piliouras “Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile” In International Conference on Learning Representations (ICLR), 2018
- Aryan Mokhtari, Asuman Ozdaglar and Sarath Pattathil “A Unified Analysis of Extra-Gradient and Optimistic Gradient Methods for Saddle Point Problems: Proximal Point Approach” In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS) 108 PMLR, 2020, pp. 1497–1507
- Aryan Mokhtari, Asuman E. Ozdaglar and Sarath Pattathil “Convergence Rate of $\mathcal{O}(1/k)$ for Optimistic Gradient and Extragradient Methods in Smooth Convex-Concave Saddle Point Problems” In SIAM Journal on Optimization 30.4 Society for Industrial & Applied Mathematics (SIAM), 2020, pp. 3230–3251
- Renato D.C. Monteiro and B.F. Svaiter “Complexity of Variants of Tseng’s Modified F-B Splitting and Korpelevich’s Methods for Hemivariational Inequalities with Applications to Saddle-Point and Convex Optimization Problems” In SIAM Journal on Optimization 21.4 Society for Industrial & Applied Mathematics (SIAM), 2011, pp. 1688–1720
- Renato D.C. Monteiro and B.F. Svaiter “On the Complexity of the Hybrid Proximal Extragradient Method for the Iterates and the Ergodic Mean” In SIAM Journal on Optimization 20.6 Society for Industrial & Applied Mathematics (SIAM), 2010, pp. 2755–2787
- Renato D.C. Monteiro and Benar F. Svaiter “Iteration-Complexity of a Newton Proximal Extragradient Method for Monotone Variational Inequalities and Inclusion Problems” In SIAM Journal on Optimization 22.3 Society for Industrial & Applied Mathematics (SIAM), 2012, pp. 914–935
- Arkadi Nemirovski “Prox-Method with Rate of Convergence $O(1/t)$ for Variational Inequalities with Lipschitz Continuous Monotone Operators and Smooth Convex-Concave Saddle Point Problems” In SIAM Journal on Optimization 15.1 Society for Industrial & Applied Mathematics (SIAM), 2004, pp. 229–251
- Yu. Nesterov “Accelerating the Cubic Regularization of Newton’s Method on Convex Problems” In Mathematical Programming 112.1 Springer Science+Business Media LLC, 2008, pp. 159–181
- Yurii Nesterov “Cubic Regularization of Newton’s Method for Convex Problems with Constraints” In CORE Discussion Paper No. 2006/39, 2006
- Yurii Nesterov “Dual Extrapolation and Its Applications to Solving Variational Inequalities and Related Problems” In Mathematical Programming 109.2-3 Springer Science+Business Media LLC, 2007, pp. 319–344
- Yurii Nesterov “Implementable Tensor Methods in Unconstrained Convex Optimization” In Mathematical Programming Springer Science+Business Media LLC, 2019
- Yurii Nesterov and Boris T. Polyak “Cubic Regularization of Newton Method and Its Global Performance” In Mathematical Programming 108.1 Springer Science+Business Media LLC, 2006, pp. 177–205
- Yurii Nesterov and Laura Scrimali “Solving Strongly Monotone Variational and Quasi-Variational Inequalities” In Discrete & Continuous Dynamical Systems - A 31.4 American Institute of Mathematical Sciences (AIMS), 2011, pp. 1383–1396
- Petr Ostroukhov, Rinat Kamalov, Pavel Dvurechensky and Alexander Gasnikov “Tensor Methods for Strongly Convex Strongly Concave Saddle Point Problems and Strongly Monotone Variational Inequalities” In arXiv preprint arXiv:2012.15595, 2020
- Wei Peng, Yu-Hong Dai, Hui Zhang and Lizhi Cheng “Training GANs with Centripetal Acceleration” In Optimization Methods and Software 35.5 Informa UK Limited, 2020, pp. 955–973
- L.D. Popov “A Modification of the Arrow-Hurwicz Method for Search of Saddle Points” In Mathematical Notes of the Academy of Sciences of the USSR 28.5 Springer Science+Business Media LLC, 1980, pp. 845–848
- Alexander Rakhlin and Karthik Sridharan “Online Learning with Predictable Sequences” In Proceedings of the 26th Annual Conference on Learning Theory (COLT) 30 PMLR, 2013, pp. 993–1019
- Alexander Rakhlin and Karthik Sridharan “Optimization, Learning, and Games with Predictable Sequences” In Advances in Neural Information Processing Systems 26, 2013
- R. Tyrrell Rockafellar “Monotone Operators and the Proximal Point Algorithm” In SIAM Journal on Control and Optimization 14.5 Society for Industrial & Applied Mathematics (SIAM), 1976, pp. 877–898
- Ernest K. Ryu and Wotao Yin “Large-Scale Convex Optimization: Algorithms & Analyses via Monotone Operators” Cambridge University Press, 2022
- M.V. Solodov and B.F. Svaiter “A Hybrid Approximate Extragradient–Proximal Point Algorithm Using the Enlargement of a Maximal Monotone Operator” In Set-Valued Analysis 7.4 Springer Science+Business Media LLC, 1999, pp. 323–345
- Benar Fux Svaiter “Complexity of the relaxed hybrid proximal-extragradient method under the large-step condition” In arXiv preprint arXiv:2303.04972, 2023
- Paul Tseng “A Modified Forward-Backward Splitting Method for Maximal Monotone Mappings” In SIAM Journal on Control and Optimization 38.2 Society for Industrial & Applied Mathematics (SIAM), 2000, pp. 431–446
- Paul Tseng “On Accelerated Proximal Gradient Methods for Convex-Concave Optimization” Submitted to SIAM Journal on Optimization, 2008
- Paul Tseng “On Linear Convergence of Iterative Methods for the Variational Inequality Problem” In Journal of Computational and Applied Mathematics 60.1-2 Elsevier BV, 1995, pp. 237–252
- Yaodong Yu, Tianyi Lin, Eric Mazumdar and Michael I. Jordan “Fast Distributionally Robust Learning with Variance-Reduced Min-Max Optimization” In Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS) 151 PMLR, 2022, pp. 1219–1250
- Yuchen Zhang and Lin Xiao “Stochastic Primal-Dual Coordinate Method for Regularized Empirical Risk Minimization” In International Conference on Machine Learning (ICML) 37 PMLR, 2015, pp. 353–361