Nonconvex-Concave Minimax Optimization via Two-Timescale GDA
The paper explores the gradient descent ascent (GDA) algorithm for nonconvex-concave minimax problems, i.e., problems of the form min_x max_{y∈Y} f(x, y) where f is nonconvex in the minimization variable but concave in the maximization variable, a formulation pervasive in areas like machine learning and economics. The paper investigates GDA's performance when run with unequal stepsizes, referred to as two-timescale GDA, and provides a nonasymptotic analysis highlighting its effectiveness, particularly in the context of training generative adversarial networks (GANs).
Key Contributions
- Nonconvex-Concave Minimax Framework: The work addresses problems where the objective f is nonconvex in one variable and concave in the other, with the maximization variable constrained to a convex set. The authors focus on the regime where traditional equal-stepsize GDA can diverge or fall into limit cycles.
- Two-Timescale Algorithm: By allowing the stepsizes for the gradient descent and ascent steps to differ, two-timescale GDA stabilizes the convergence process. The paper shows that this variant efficiently finds a stationary point of the surrogate function Φ(x) = max_{y∈Y} f(x, y); a minimal sketch of the update appears after this list.
- Complexity Analysis: The authors provide substantial theoretical guarantees, showing that two-timescale GDA requires O(κ²ϵ⁻²) gradient evaluations to reach an ϵ-stationary point of Φ (a point x with ‖∇Φ(x)‖ ≤ ϵ) in the nonconvex-strongly-concave setting, where κ is the condition number. These findings extend to the stochastic setting, where two-timescale SGDA requires O(κ³ϵ⁻⁴) stochastic gradient evaluations.
- Addressing the Nonconvex-Concave Case: For general nonconvex-concave problems, where Φ is only weakly convex, the paper establishes gradient complexities of O(ϵ⁻⁶) in the deterministic setting and O(ϵ⁻⁸) in the stochastic setting, with stationarity measured through the Moreau envelope of Φ. This provides evidence for the efficacy of two-timescale dynamics beyond convex-concave problems.
- Technical Rigor: The analysis includes developing estimates for error terms and demonstrating how they can be effectively managed through proper stepsize choices and recursive formulations. The authors use a blend of convex analysis and perturbation techniques to establish these results.
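To make the two-timescale update concrete, below is a minimal NumPy sketch of simultaneous GDA with a much smaller descent stepsize than ascent stepsize. The toy objective, stepsizes, and iteration count are illustrative choices for a smooth nonconvex-strongly-concave example, not values taken from the paper.

```python
import numpy as np

# Toy objective (illustrative, not from the paper):
#   f(x, y) = cos(2x) + x*y - y^2/2
# It is nonconvex in x and 1-strongly concave in y, with
#   Phi(x) = max_y f(x, y) = cos(2x) + x^2/2   (maximizer y*(x) = x).
def grad_x(x, y):
    return -2.0 * np.sin(2.0 * x) + y

def grad_y(x, y):
    return x - y

def two_timescale_gda(x0, y0, eta_x=1e-3, eta_y=1e-1, iters=20_000):
    """Simultaneous GDA: slow descent on x, fast ascent on y (eta_x << eta_y)."""
    x, y = x0, y0
    for _ in range(iters):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x = x - eta_x * gx  # descent step on the minimization variable
        y = y + eta_y * gy  # ascent step on the maximization variable (unconstrained here)
    return x, y

x, y = two_timescale_gda(x0=1.5, y0=0.0)
# Stationarity of the surrogate: grad Phi(x) = -2*sin(2x) + x
print(f"x = {x:.4f}, |grad Phi(x)| = {abs(-2.0 * np.sin(2.0 * x) + x):.2e}")
```

The design choice mirrored here is the stepsize ratio: the ascent variable is updated on a fast timescale so that y approximately tracks the maximizer y*(x), letting the slow descent on x behave like gradient descent on Φ. In the paper's analysis, the descent stepsize must be smaller than the ascent stepsize by a factor that scales with the condition number.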
Implications and Future Directions
The implications of this research are twofold. Practically, it offers an enhanced tool for training GANs—a pivotal area in deep learning. Theoretically, this paper enriches the understanding of gradient-based methods for nonconvex-concave scenarios, challenging the assumption that such problems are intractable without nested loops or modified objectives.
Future explorations could delve into optimizing the ratios of stepsizes further, or extending the approach to more complex, dynamic game-theoretic contexts. Moreover, while the paper focuses on a particular subset of nonconvex-nonconcave problems, its techniques could inspire broader applications within multi-agent systems and adversarial training paradigms.
In essence, this paper makes a compelling case for two-timescale GDA as a viable strategy for nonconvex-concave minimax optimization, combining flexibility and efficiency in settings that demand robust solutions to competing objectives.