
Diffusion Stochastic Optimization for Min-Max Problems (2401.14585v1)

Published 26 Jan 2024 in cs.LG and math.OC

Abstract: The optimistic gradient method is useful in addressing minimax optimization problems. Motivated by the observation that the conventional stochastic version suffers from the need for a large batch size on the order of $\mathcal{O}(\varepsilon^{-2})$ to achieve an $\varepsilon$-stationary solution, we introduce and analyze a new formulation termed Diffusion Stochastic Same-Sample Optimistic Gradient (DSS-OG). We prove its convergence and resolve the large batch issue by establishing a tighter upper bound, under the more general setting of nonconvex Polyak-Łojasiewicz (PL) risk functions. We also extend the applicability of the proposed method to the distributed scenario, where agents communicate with their neighbors via a left-stochastic protocol. To implement DSS-OG, we can query the stochastic gradient oracles in parallel with some extra memory overhead, resulting in a complexity comparable to its conventional counterpart. To demonstrate the efficacy of the proposed algorithm, we conduct tests by training generative adversarial networks.
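The sketch below is a minimal illustration of the two ingredients the abstract describes: a same-sample stochastic optimistic gradient update (the same noise sample is used for two oracle calls, one at the current and one at the stored previous iterate, which can be issued in parallel at the cost of extra memory) and a diffusion-style combine step over a left-stochastic network. It is not the paper's exact DSS-OG recursion: the toy bilinear game, noise model, step size, network size, and combination matrix are assumptions chosen only for this example, and the single-call form of the optimistic update shown here is a standard textbook variant.

```python
# Illustrative sketch (not the paper's exact DSS-OG recursion) of a
# same-sample stochastic optimistic gradient step with a diffusion combine
# over a left-stochastic network. All problem data below are assumptions.
import numpy as np

rng = np.random.default_rng(0)

d, K, alpha, T = 4, 3, 0.05, 3000          # dims, agents, step size, iterations
M = rng.standard_normal((d, d))            # coupling matrix of f(x, y) = x^T M y

# Left-stochastic combination matrix: each column sums to one.
A = np.array([[0.6, 0.2, 0.2],
              [0.2, 0.6, 0.2],
              [0.2, 0.2, 0.6]])
assert np.allclose(A.sum(axis=0), 1.0)

def game_operator(w, xi):
    """Stochastic operator G(w) = (grad_x f, -grad_y f) for f(x, y) = x^T M y."""
    x, y = w[:d], w[d:]
    gx = M @ y + xi[:d]
    gy = M.T @ x + xi[d:]
    return np.concatenate([gx, -gy])       # descend in x, ascend in y

w = np.zeros((K, 2 * d))                   # current iterates, one row per agent
w_prev = np.zeros_like(w)                  # previous iterates (extra memory)

for t in range(T):
    phi = np.empty_like(w)
    for k in range(K):
        xi = 0.1 * rng.standard_normal(2 * d)   # one fresh sample per agent
        # Same-sample idea: the SAME sample xi drives both oracle calls,
        # at the current and at the stored previous iterate; the two calls
        # are independent and could run in parallel.
        g_curr = game_operator(w[k], xi)
        g_prev = game_operator(w_prev[k], xi)
        # Single-call optimistic correction: 2*g_curr - g_prev.
        phi[k] = w[k] - alpha * (2.0 * g_curr - g_prev)
    w_prev = w.copy()
    # Diffusion combine: agent k mixes neighbors' intermediate iterates with
    # the weights in column k of the left-stochastic matrix A.
    w = A.T @ phi

w_avg = w.mean(axis=0)
print("operator norm at network average:",
      np.linalg.norm(game_operator(w_avg, np.zeros(2 * d))))
```

With a small enough step size, the optimistic correction keeps the iterates from cycling on this bilinear toy game, whereas plain stochastic gradient descent-ascent would; the printed operator norm settles at a noise-level value rather than diverging.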
