High-probability complexity guarantees for nonconvex minimax problems (2405.14130v3)
Abstract: Stochastic smooth nonconvex minimax problems are prevalent in machine learning, e.g., GAN training, fair classification, and distributionally robust learning. Stochastic gradient descent ascent (GDA)-type methods are popular in practice due to their simplicity and single-loop nature. However, there is a significant gap between the theory and practice regarding high-probability complexity guarantees for these methods on stochastic nonconvex minimax problems. Existing high-probability bounds for GDA-type single-loop methods only apply to convex/concave minimax problems and to particular non-monotone variational inequality problems under some restrictive assumptions. In this work, we address this gap by providing the first high-probability complexity guarantees for nonconvex/PL minimax problems corresponding to a smooth function that satisfies the PL-condition in the dual variable. Specifically, we show that when the stochastic gradients are light-tailed, the smoothed alternating GDA method can compute an $\varepsilon$-stationary point within $O\big(\frac{\ell \kappa^2 \delta^2}{\varepsilon^4} + \frac{\kappa}{\varepsilon^2}(\ell+\delta^2\log(1/\bar{q}))\big)$ stochastic gradient calls with probability at least $1-\bar{q}$ for any $\bar{q}\in(0,1)$, where $\mu$ is the PL constant, $\ell$ is the Lipschitz constant of the gradient, $\kappa=\ell/\mu$ is the condition number, and $\delta^2$ denotes a bound on the variance of stochastic gradients. We also present numerical results on a nonconvex/PL problem with synthetic data and on distributionally robust optimization problems with real data, illustrating our theoretical findings.
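To make the single-loop structure concrete, here is a minimal sketch of the smoothed alternating GDA updates on a toy nonconvex/strongly concave problem (strong concavity in the dual variable implies the PL condition). The toy objective, the Gaussian noise model, and all constants (step sizes `tau_x`, `tau_y`, smoothing weight `p`, averaging weight `beta`, noise level `sigma`) are illustrative assumptions, not the step-size choices prescribed by the paper's analysis.

```python
# Sketch: smoothed alternating GDA (sm-AGDA) on a toy nonconvex / strongly
# concave problem. All constants below are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)
d_x, d_y = 5, 3
B = rng.standard_normal((d_x, d_y))
mu = 1.0  # strong-concavity (hence PL) constant of f(x, .) in y

def stoch_grads(x, y, sigma=0.1):
    """Unbiased stochastic gradients of the toy objective
    f(x, y) = sum(cos(x)) + x^T B y - 0.5 * mu * ||y||^2,
    with additive light-tailed (Gaussian) noise."""
    gx = -np.sin(x) + B @ y + sigma * rng.standard_normal(d_x)
    gy = B.T @ x - mu * y + sigma * rng.standard_normal(d_y)
    return gx, gy

# sm-AGDA state: primal x, dual y, and an auxiliary smoothing center z.
x = rng.standard_normal(d_x)
y = rng.standard_normal(d_y)
z = x.copy()

tau_x, tau_y = 0.05, 0.1  # primal / dual step sizes (placeholders)
p, beta = 1.0, 0.1        # smoothing weight and averaging weight (placeholders)

for t in range(2000):
    gx, _ = stoch_grads(x, y)
    # Primal descent step on the smoothed objective
    # K(x, y; z) = f(x, y) + (p/2) * ||x - z||^2.
    x = x - tau_x * (gx + p * (x - z))
    # Alternating dual ascent step, evaluated at the updated x.
    _, gy = stoch_grads(x, y)
    y = y + tau_y * gy
    # Slow update of the smoothing center toward the current primal iterate.
    z = z + beta * (x - z)

print("final x:", np.round(x, 3), "final y:", np.round(y, 3))
```

In the abstract's bound, the light-tailed (e.g., sub-Gaussian) noise assumption is what drives the $\log(1/\bar{q})$ dependence on the failure probability; the Gaussian perturbations above are only one example of such noise.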