High-probability complexity guarantees for nonconvex minimax problems (2405.14130v3)

Published 23 May 2024 in math.OC

Abstract: Stochastic smooth nonconvex minimax problems are prevalent in machine learning, e.g., GAN training, fair classification, and distributionally robust learning. Stochastic gradient descent ascent (GDA)-type methods are popular in practice due to their simplicity and single-loop nature. However, there is a significant gap between theory and practice regarding high-probability complexity guarantees for these methods on stochastic nonconvex minimax problems. Existing high-probability bounds for GDA-type single-loop methods apply only to convex/concave minimax problems and to particular non-monotone variational inequality problems under some restrictive assumptions. In this work, we address this gap by providing the first high-probability complexity guarantees for nonconvex/PL minimax problems, i.e., problems with a smooth objective that satisfies the Polyak-Łojasiewicz (PL) condition in the dual variable. Specifically, we show that when the stochastic gradients are light-tailed, the smoothed alternating GDA method can compute an $\varepsilon$-stationary point within $O\big(\frac{\ell \kappa^2 \delta^2}{\varepsilon^4} + \frac{\kappa}{\varepsilon^2}\big(\ell+\delta^2\log(1/\bar{q})\big)\big)$ stochastic gradient calls with probability at least $1-\bar{q}$ for any $\bar{q}\in(0,1)$, where $\mu$ is the PL constant, $\ell$ is the Lipschitz constant of the gradient, $\kappa=\ell/\mu$ is the condition number, and $\delta^2$ denotes a bound on the variance of the stochastic gradients. We also present numerical results on a nonconvex/PL problem with synthetic data and on distributionally robust optimization problems with real data, illustrating our theoretical findings.
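The abstract analyzes the smoothed alternating GDA method, a single-loop scheme in which the primal variable descends on a smoothed surrogate $f(x,y) + \frac{p}{2}\|x - z\|^2$, the dual variable ascends using the freshly updated primal iterate, and an anchor point $z$ is averaged toward $x$. The sketch below is a minimal illustration of that update structure only; the toy quadratic objective, the noise model, and all step-size, smoothing, and averaging parameters (tau1, tau2, p, beta) are assumptions chosen for readability, not the paper's algorithm parameters or experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy saddle problem f(x, y) = 0.5*||A x||^2 + y^T B x - 0.5*mu*||y||^2,
# used only so the update rule below is runnable end to end.
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))
mu = 0.5  # concavity/PL constant of the toy dual problem

def stoch_grads(x, y, noise=0.1):
    """Unbiased stochastic gradients with light-tailed (Gaussian) noise."""
    gx = A.T @ (A @ x) + B.T @ y + noise * rng.standard_normal(x.shape)
    gy = B @ x - mu * y + noise * rng.standard_normal(y.shape)
    return gx, gy

def smoothed_alt_gda(x0, y0, iters=2000, tau1=1e-2, tau2=5e-2, p=1.0, beta=0.5):
    """One possible smoothed alternating GDA loop (illustrative sketch):
    descent on the smoothed primal, ascent at the updated x, anchor averaging."""
    x, y, z = x0.copy(), y0.copy(), x0.copy()
    for _ in range(iters):
        gx, _ = stoch_grads(x, y)
        x = x - tau1 * (gx + p * (x - z))   # descent on f(x, y) + (p/2)||x - z||^2
        _, gy = stoch_grads(x, y)           # alternating: re-sample at the new x
        y = y + tau2 * gy                   # stochastic ascent step in y
        z = z + beta * (x - z)              # move the anchor toward x
    return x, y

x_out, y_out = smoothed_alt_gda(rng.standard_normal(5), rng.standard_normal(5))
print("||x||:", np.linalg.norm(x_out), "||y||:", np.linalg.norm(y_out))
```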

