Diffusion Stochastic Optimization for Min-Max Problems (2401.14585v1)
Abstract: The optimistic gradient method is useful in addressing minimax optimization problems. Motivated by the observation that the conventional stochastic version suffers from the need for a large batch size on the order of $\mathcal{O}(\varepsilon^{-2})$ to achieve an $\varepsilon$-stationary solution, we introduce and analyze a new formulation termed Diffusion Stochastic Same-Sample Optimistic Gradient (DSS-OG). We prove its convergence and resolve the large-batch issue by establishing a tighter upper bound, under the more general setting of nonconvex Polyak-Łojasiewicz (PL) risk functions. We also extend the applicability of the proposed method to the distributed scenario, where agents communicate with their neighbors via a left-stochastic protocol. To implement DSS-OG, we can query the stochastic gradient oracles in parallel with some extra memory overhead, resulting in a complexity comparable to its conventional counterpart. To demonstrate the efficacy of the proposed algorithm, we conduct tests by training generative adversarial networks.
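The sketch below is a minimal, illustrative rendering of the two ideas named in the abstract, not the paper's exact DSS-OG recursions: (i) a same-sample optimistic-gradient update, in which the gradients at the current and previous iterates are evaluated on the same stochastic sample (the two oracle queries can run in parallel at the cost of storing the previous iterate), and (ii) a diffusion "combine" step over a network using a left-stochastic combination matrix. The toy bilinear objective, the ring topology, `grad_oracle`, the step size, and the update ordering are all assumptions made for illustration.

```python
import numpy as np

# Toy stochastic min-max problem shared by K agents:
#   min_x max_y  f(x, y) = x^T M y,
# with additive noise standing in for a stochastic gradient oracle.
rng = np.random.default_rng(0)
d, K = 5, 4
M = rng.standard_normal((d, d)) / np.sqrt(d)

def grad_oracle(x, y, sample):
    """Noisy (grad_x, grad_y) at (x, y), evaluated on one stochastic 'sample'."""
    return M @ y + 0.1 * sample[0], M.T @ x + 0.1 * sample[1]

# Left-stochastic combination matrix A (columns sum to one): a simple ring
# topology with uniform weights, chosen only for illustration.
A = np.zeros((K, K))
for k in range(K):
    for j in (k - 1, k, (k + 1) % K):
        A[j % K, k] = 1.0
A /= A.sum(axis=0, keepdims=True)  # columns sum to 1 => left-stochastic

mu = 0.05
x = rng.standard_normal((K, d))
y = rng.standard_normal((K, d))
x_prev, y_prev = x.copy(), y.copy()

for t in range(2000):
    x_new, y_new = np.empty_like(x), np.empty_like(y)
    for k in range(K):
        # Same-sample idea: both oracle queries below reuse the SAME sample,
        # so they can be issued in parallel, with extra memory for (x_prev, y_prev).
        sample = (rng.standard_normal(d), rng.standard_normal(d))
        gx_cur, gy_cur = grad_oracle(x[k], y[k], sample)
        gx_old, gy_old = grad_oracle(x_prev[k], y_prev[k], sample)
        # Optimistic-gradient adaptation step: descent in x, ascent in y.
        x_new[k] = x[k] - mu * (2.0 * gx_cur - gx_old)
        y_new[k] = y[k] + mu * (2.0 * gy_cur - gy_old)
    x_prev, y_prev = x, y
    # Diffusion combine step: agent k averages its neighbors' updated iterates
    # using the weights in column k of A (adapt-then-combine form).
    x, y = A.T @ x_new, A.T @ y_new

print("||x||, ||y|| after training:", np.linalg.norm(x), np.linalg.norm(y))
```

On this bilinear toy problem the iterates of all agents shrink toward the saddle point at the origin; the noise level, step size, and topology only affect how tightly they concentrate around it.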