Alternating-Extrapolation GDA (Alex-GDA)
- Alex-GDA is a novel minimax optimization framework that alternates updates on primal and dual variables using extrapolated gradients.
- It achieves global linear convergence on SCSC and bilinear objectives with iteration complexity matching extragradient methods but using only two gradient evaluations per iteration.
- The framework unifies simultaneous and alternating GDA approaches, offering rigorous theoretical guarantees alongside practical efficiency in optimization tasks.
Alternating-Extrapolation Gradient Descent-Ascent (Alex-GDA) is a general algorithmic framework for minimax optimization that extends and interpolates between standard simultaneous and alternating gradient schemes. Its essential innovation is to alternately update the primal ($x$) and dual ($y$) variables using gradients evaluated at extrapolated points. This gives it the iteration complexity and convergence properties of extragradient methods with half as many gradient evaluations per iteration. Alex-GDA is specifically formulated to bridge the theory-practice divide in minimax optimization, providing rigorous guarantees and empirical efficacy on smooth, strongly-convex–strongly-concave (SCSC) and bilinear objectives (Lee et al., 2024).
1. Problem Setting and Function Classes
Alex-GDA is designed for the unconstrained minimax optimization problem
$$\min_{x \in \mathbb{R}^{d_x}} \max_{y \in \mathbb{R}^{d_y}} f(x, y).$$
Two canonical objective classes are considered:
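- Strongly-Convex–Strongly-Concave (SCSC) Objectives: $f$ is $\mu_x$-strongly convex in $x$ and $\mu_y$-strongly concave in $y$, with Lipschitz gradient bounds and cross-term smoothness
$$\|\nabla_x f(x_1, y) - \nabla_x f(x_2, y)\| \le L_x \|x_1 - x_2\|, \qquad \|\nabla_x f(x, y_1) - \nabla_x f(x, y_2)\| \le L_{xy} \|y_1 - y_2\|,$$
with analogous bounds for $\nabla_y f$ in $(x, y)$ (constants $L_{xy}$ and $L_y$). Associated condition numbers: $\kappa_x = L_x/\mu_x$, $\kappa_y = L_y/\mu_y$, $\kappa_{xy} = L_{xy}/\sqrt{\mu_x \mu_y}$.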
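- Bilinear Objectives: $f(x, y) = x^\top B y$ for $B \in \mathbb{R}^{d_x \times d_y}$, with nonzero singular values in $[\mu_{xy}, L_{xy}]$. The Nash equilibrium is generally non-unique: any $(x^*, y^*)$ with $B y^* = 0$ and $B^\top x^* = 0$ is optimal.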
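To make the three condition numbers concrete, here is a small NumPy sketch for a hypothetical quadratic test game $f(x, y) = \tfrac12 x^\top A x + x^\top B y - \tfrac12 y^\top C y$ with symmetric positive definite $A$ and $C$ (this test game is not from the source). For it, $\mu_x, L_x$ are the extreme eigenvalues of $A$, $\mu_y, L_y$ those of $C$, and $L_{xy}$ is the largest singular value of $B$. The helper name `condition_numbers` and the convention $\kappa_{xy} = L_{xy}/\sqrt{\mu_x \mu_y}$ are assumptions made for illustration.

```python
import numpy as np


def condition_numbers(A, B, C):
    """Condition numbers of the hypothetical quadratic game
    f(x, y) = 0.5 x^T A x + x^T B y - 0.5 y^T C y  (A, C symmetric positive definite).
    Assumes kappa_xy = L_xy / sqrt(mu_x * mu_y)."""
    eig_A = np.linalg.eigvalsh(A)        # eigenvalues of A in ascending order
    eig_C = np.linalg.eigvalsh(C)        # eigenvalues of C in ascending order
    mu_x, L_x = eig_A[0], eig_A[-1]      # strong convexity / smoothness in x
    mu_y, L_y = eig_C[0], eig_C[-1]      # strong concavity / smoothness in y
    L_xy = np.linalg.norm(B, 2)          # spectral norm = largest singular value of B
    return {
        "kappa_x": L_x / mu_x,
        "kappa_y": L_y / mu_y,
        "kappa_xy": L_xy / np.sqrt(mu_x * mu_y),
    }


A = np.diag([1.0, 10.0])
C = np.diag([2.0, 4.0])
B = np.array([[3.0, 0.0], [0.0, 0.0]])
print(condition_numbers(A, B, C))
```

Here $\kappa_x = 10$, $\kappa_y = 2$, and $\kappa_{xy} = 3/\sqrt{2}$; a rank-deficient $B$ like this one is also the situation in which the bilinear equilibrium set contains more than one point.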
2. Algorithmic Structure of Alex-GDA
Alex-GDA generalizes both simultaneous (Sim-GDA) and alternating (Alt-GDA) GDA methods by utilizing extrapolation in both the $x$ and $y$ updates. The algorithm employs step sizes $\alpha, \beta > 0$ and extrapolation parameters $\gamma, \delta$. The iterative updates are:
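$$
\begin{aligned}
x_{t+1} &= x_t - \alpha \nabla_x f(x_t, \tilde y_t), \\
\tilde x_{t+1} &= (1 - \gamma)\, x_t + \gamma\, x_{t+1}, \\
y_{t+1} &= y_t + \beta \nabla_y f(\tilde x_{t+1}, y_t), \\
\tilde y_{t+1} &= (1 - \delta)\, y_t + \delta\, y_{t+1},
\end{aligned}
$$
where $\tilde y_0 = y_0$ and the initial point $(x_0, y_0)$ is given. The extrapolation factors $\gamma$ and $\delta$ let Alex-GDA interpolate between Sim-GDA ($\gamma = 0$, $\delta = 1$), Alt-GDA ($\gamma = \delta = 1$), and an extrapolated, extragradient-like regime ($\gamma, \delta > 1$).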
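A minimal NumPy sketch of the loop, assuming the update takes the form $x_{t+1} = x_t - \alpha \nabla_x f(x_t, \tilde y_t)$, $\tilde x_{t+1} = (1-\gamma)x_t + \gamma x_{t+1}$, $y_{t+1} = y_t + \beta \nabla_y f(\tilde x_{t+1}, y_t)$, $\tilde y_{t+1} = (1-\delta)y_t + \delta y_{t+1}$ with $\tilde y_0 = y_0$. The function name `alex_gda`, its signature, and the toy objective $f(x, y) = xy$ are hypothetical. The usage lines check that $(\gamma, \delta) = (0, 1)$ and $(1, 1)$ reproduce Sim-GDA and Alt-GDA.

```python
import numpy as np


def alex_gda(grad_x, grad_y, x0, y0, alpha, beta, gamma, delta, num_iters):
    """Sketch of Alex-GDA under the assumed update form (y_tilde_0 = y_0).

    grad_x(x, y), grad_y(x, y): partial gradients of f.
    Returns the final iterate (x, y).
    """
    x = np.asarray(x0, dtype=float)
    y = np.asarray(y0, dtype=float)
    y_tilde = y.copy()
    for _ in range(num_iters):
        # Descent in x, using the extrapolated dual point y_tilde_t.
        x_next = x - alpha * grad_x(x, y_tilde)
        # Extrapolate x: gamma = 0 keeps x_t, gamma = 1 uses x_{t+1}, gamma > 1 overshoots.
        x_tilde = (1.0 - gamma) * x + gamma * x_next
        # Ascent in y, using the extrapolated primal point x_tilde_{t+1}.
        y_next = y + beta * grad_y(x_tilde, y)
        # Extrapolate y; this point feeds the next x update.
        y_tilde = (1.0 - delta) * y + delta * y_next
        x, y = x_next, y_next
    return x, y


# Hypothetical toy objective f(x, y) = x * y.
def grad_x(x, y):
    return y


def grad_y(x, y):
    return x


alpha, beta = 0.1, 0.2
# (gamma, delta) = (0, 1): one step coincides with simultaneous GDA.
x_sim, y_sim = alex_gda(grad_x, grad_y, 1.0, 0.5, alpha, beta, 0.0, 1.0, num_iters=1)
# (gamma, delta) = (1, 1): one step coincides with alternating GDA.
x_alt, y_alt = alex_gda(grad_x, grad_y, 1.0, 0.5, alpha, beta, 1.0, 1.0, num_iters=1)
```

Only one $\nabla_x f$ and one $\nabla_y f$ evaluation occur per loop pass; the extrapolated points are plain linear combinations of iterates already in memory.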
3. Convergence Theory: SCSC Objectives
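For smooth SCSC objectives, Alex-GDA achieves global linear convergence with optimal rate scaling. Suitable choices of the extrapolation parameters $\gamma, \delta$ and step sizes $\alpha, \beta$ guarantee geometric decay of a Lyapunov function $\Psi_t$, which controls the squared distance from $(x_t, y_t)$ to the saddle point $(x^*, y^*)$:
$$\Psi_{t+1} \le (1 - r)\,\Psi_t, \qquad r \in (0, 1).$$
The iteration complexity to reach $\epsilon$-accuracy is
$$T = O\big(\kappa \log(1/\epsilon)\big),$$
where $\kappa$ denotes the overall condition number formed from $\kappa_x$, $\kappa_y$, and $\kappa_{xy}$.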
This matches the optimal complexity of extragradient (EG) methods but at half the gradient evaluation cost (Lee et al., 2024).
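The step from a per-iteration contraction to an iteration count is standard. Assuming a contraction $\Psi_{t+1} \le (1 - r)\Psi_t$ and using $1 - r \le e^{-r}$, unrolling gives the following (the rate $r = c/\kappa$ with a constant $c > 0$ is an illustrative assumption, not the paper's exact factor):

```latex
\Psi_T \le (1 - r)^T \Psi_0 \le e^{-rT}\,\Psi_0
\quad\Longrightarrow\quad
\Psi_T \le \epsilon \ \text{ whenever } \ T \ge \frac{1}{r}\,\log\frac{\Psi_0}{\epsilon},
\qquad r = \frac{c}{\kappa} \ \Longrightarrow\ T = O\!\big(\kappa \log(1/\epsilon)\big).
```

At two partial-gradient evaluations per iteration, this means about $2T$ gradient evaluations in total, while EG, with the same order of $T$ but four evaluations per iteration, needs about $4T$.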
4. Performance Comparison: Rates and Gradient Costs
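The iteration complexities and per-iteration gradient costs of the principal algorithms compare as follows ($\kappa$ is the condition number, $\tilde O$ hides $\log(1/\epsilon)$ factors, and costs count partial-gradient evaluations):
| Algorithm | Iteration complexity | Partial gradients per iteration |
|---|---|---|
| Simultaneous GDA | $\tilde O(\kappa^2)$ | 2 |
| Alternating GDA | $\tilde O(\kappa^{3/2})$ | 2 |
| Alex-GDA | $\tilde O(\kappa)$ | 2 |
| Extragradient (EG) | $\tilde O(\kappa)$ | 4 |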
Among the methods listed, only Alex-GDA matches the optimal iteration complexity of extragradient methods while using two gradient evaluations per iteration, half of the four required by EG. It therefore combines an optimal iteration count with a low per-iteration cost.
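The per-iteration gradient counts in the table can be checked mechanically. The sketch below wraps the partial gradients of a hypothetical bilinear test objective $f(x, y) = x^\top B y$ in a counter (`CountingOracle` is an illustrative name) and runs one standard extragradient step and one Alex-GDA step. The Alex-GDA step assumes the update form $x_{t+1} = x_t - \alpha \nabla_x f(x_t, \tilde y_t)$, $\tilde x_{t+1} = (1-\gamma)x_t + \gamma x_{t+1}$, $y_{t+1} = y_t + \beta \nabla_y f(\tilde x_{t+1}, y_t)$, $\tilde y_{t+1} = (1-\delta)y_t + \delta y_{t+1}$.

```python
import numpy as np


class CountingOracle:
    """Partial gradients of a hypothetical test objective f(x, y) = x^T B y,
    counting how many partial-gradient evaluations are requested."""

    def __init__(self, B):
        self.B = B
        self.calls = 0

    def grad_x(self, x, y):
        self.calls += 1
        return self.B @ y          # gradient of x^T B y with respect to x

    def grad_y(self, x, y):
        self.calls += 1
        return self.B.T @ x        # gradient of x^T B y with respect to y


def extragradient_step(oracle, x, y, eta):
    """One standard extragradient step: extrapolate, then update from (x, y)."""
    x_half = x - eta * oracle.grad_x(x, y)
    y_half = y + eta * oracle.grad_y(x, y)
    return (x - eta * oracle.grad_x(x_half, y_half),
            y + eta * oracle.grad_y(x_half, y_half))


def alex_gda_step(oracle, x, y, y_tilde, alpha, beta, gamma, delta):
    """One Alex-GDA step under the assumed update form; returns (x, y, y_tilde)."""
    x_next = x - alpha * oracle.grad_x(x, y_tilde)
    x_tilde = (1 - gamma) * x + gamma * x_next
    y_next = y + beta * oracle.grad_y(x_tilde, y)
    return x_next, y_next, (1 - delta) * y + delta * y_next


B = np.array([[1.0, 2.0], [0.0, 1.0]])
x0, y0 = np.ones(2), np.ones(2)

eg_oracle = CountingOracle(B)
extragradient_step(eg_oracle, x0, y0, eta=0.1)

alex_oracle = CountingOracle(B)
alex_gda_step(alex_oracle, x0, y0, y0.copy(), alpha=0.1, beta=0.1, gamma=2.0, delta=2.0)

print(eg_oracle.calls, alex_oracle.calls)  # prints: 4 2
```

EG spends two evaluations on the extrapolation point and two on the update, whereas the extrapolation in Alex-GDA reuses iterates it already has.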
5. Linear Convergence in Bilinear Games
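For bilinear objectives $f(x, y) = x^\top B y$ whose nonzero singular values lie in $[\mu_{xy}, L_{xy}]$, Alex-GDA converges linearly, which neither Sim-GDA (which diverges) nor Alt-GDA (which cycles) achieves. Lee et al. (2024) give explicit conditions on the extrapolation parameters and step sizes, split into two cases. Under these conditions, every eigenvalue of the iteration matrix associated with a nonzero singular value of $B$ lies strictly inside the unit disk, so the iterates converge linearly:
$$\|(x_t, y_t) - (x^*, y^*)\| \le C\,\rho^{t}\,\|(x_0, y_0) - (x^*, y^*)\|, \qquad \rho < 1,$$
for a constant $C$ and an equilibrium $(x^*, y^*)$.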
This geometric convergence is established via analysis of the linear iteration's spectral radius, employing SVD reduction and Routh-Hurwitz conditions.
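The SVD reduction can be illustrated numerically. Writing $B = U \Sigma V^\top$ splits $f(x, y) = x^\top B y$ into independent scalar games $s\,u\,v$, one per singular value $s$, because the updates are linear and commute with orthogonal changes of variables. The sketch below builds the $3 \times 3$ iteration matrix for one such game in the state $(x_t, y_t, \tilde y_t)$. It assumes the update form $x_{t+1} = x_t - \alpha \nabla_x f(x_t, \tilde y_t)$, $\tilde x_{t+1} = (1-\gamma)x_t + \gamma x_{t+1}$, $y_{t+1} = y_t + \beta \nabla_y f(\tilde x_{t+1}, y_t)$, $\tilde y_{t+1} = (1-\delta)y_t + \delta y_{t+1}$, and compares spectral radii. The values $\alpha s = \beta s = 0.3$ and $\gamma = \delta = 2$ are illustrative, not the step-size conditions from Lee et al. (2024).

```python
import numpy as np


def bilinear_iteration_matrix(a, b, gamma, delta):
    """Alex-GDA iteration matrix (assumed update form) for the scalar bilinear
    game f(x, y) = s*x*y, acting on the state (x_t, y_t, y_tilde_t).
    Here a = alpha*s and b = beta*s for a nonzero singular value s of B."""
    p = a * b
    return np.array([
        [1.0,       0.0, -a],                   # x_{t+1} = x_t - a*y_tilde_t
        [b,         1.0, -gamma * p],           # y_{t+1} = y_t + b*x_tilde_{t+1}
        [delta * b, 1.0, -gamma * delta * p],   # y_tilde_{t+1} = y_t + delta*b*x_tilde_{t+1}
    ])


def spectral_radius(M):
    return float(np.max(np.abs(np.linalg.eigvals(M))))


a = b = 0.3  # illustrative scaled step sizes alpha*s and beta*s
rho_sim = spectral_radius(bilinear_iteration_matrix(a, b, gamma=0.0, delta=1.0))
rho_alt = spectral_radius(bilinear_iteration_matrix(a, b, gamma=1.0, delta=1.0))
rho_alex = spectral_radius(bilinear_iteration_matrix(a, b, gamma=2.0, delta=2.0))
print(f"Sim-GDA {rho_sim:.4f}, Alt-GDA {rho_alt:.4f}, Alex-GDA {rho_alex:.4f}")
```

With these values, the Sim-GDA radius is $\sqrt{1 + \alpha\beta s^2} > 1$ (divergence), the Alt-GDA radius equals 1 up to rounding (cycling), and the Alex-GDA radius is below 1 (linear convergence).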
6. Practical Considerations and Empirical Behavior
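- Gradient and Memory Cost: Alex-GDA requires two gradient evaluations per iteration, one of $\nabla_x f$ and one of $\nabla_y f$, which is half of the four needed by EG. Optimistic GDA (OGD) also needs only two fresh evaluations per iteration, but it must store the previous iteration's gradients.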
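- Step Size Selection: The SCSC guarantee prescribes specific extrapolation parameters $\gamma, \delta$ together with step sizes $\alpha, \beta$ set from the problem constants. In the bilinear regime, for a fixed extrapolation factor, the analysis also identifies optimal choices of $\alpha$ and $\beta$ (Lee et al., 2024).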
- Empirical Results: On a synthetic SCSC quadratic game, Alex-GDA outperforms Sim-GDA and Alt-GDA, matches EG and OGD, and can be tuned to slightly outperform EG. In bilinear games, Sim-GDA diverges and Alt-GDA cycles, while Alex-GDA converges linearly and performs comparably to EG and OGD when compared per gradient evaluation.
- Broader Significance: Alex-GDA pairs the empirical benefits of alternating updates, commonly observed in minimax optimization and adversarial learning, with global convergence guarantees, and does so at a cost of only two gradient evaluations per iteration (Lee et al., 2024).
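A toy version of the SCSC comparison can be run in a few lines. The sketch computes the spectral radius of each method's iteration matrix on a hypothetical scalar quadratic game $f(x, y) = \tfrac{\mu_x}{2}x^2 + c\,xy - \tfrac{\mu_y}{2}y^2$. It again assumes the update form $x_{t+1} = x_t - \alpha \nabla_x f(x_t, \tilde y_t)$, $\tilde x_{t+1} = (1-\gamma)x_t + \gamma x_{t+1}$, $y_{t+1} = y_t + \beta \nabla_y f(\tilde x_{t+1}, y_t)$, $\tilde y_{t+1} = (1-\delta)y_t + \delta y_{t+1}$. All parameter values ($\mu_x = \mu_y = 1$, $c = 3$, $\alpha = \beta = 0.01$, $\gamma = \delta = 2$) are illustrative and do not reproduce the paper's experiment. A smaller radius means faster linear convergence.

```python
import numpy as np


def scsc_iteration_matrix(mu_x, mu_y, c, alpha, beta, gamma, delta):
    """Alex-GDA iteration matrix (assumed update form) for the hypothetical
    scalar game f(x, y) = mu_x/2 * x**2 + c*x*y - mu_y/2 * y**2,
    acting on the state (x_t, y_t, y_tilde_t)."""
    s = 1.0 - gamma * alpha * mu_x        # coefficient of x_t in x_tilde_{t+1}
    g = gamma * alpha * beta * c**2       # coefficient of y_tilde_t in y_{t+1} (with minus sign)
    return np.array([
        [1.0 - alpha * mu_x,   0.0,                       -alpha * c],
        [beta * c * s,         1.0 - beta * mu_y,         -g],
        [delta * beta * c * s, 1.0 - delta * beta * mu_y, -delta * g],
    ])


def spectral_radius(M):
    return float(np.max(np.abs(np.linalg.eigvals(M))))


params = dict(mu_x=1.0, mu_y=1.0, c=3.0, alpha=0.01, beta=0.01)
rho = {
    "Sim-GDA": spectral_radius(scsc_iteration_matrix(**params, gamma=0.0, delta=1.0)),
    "Alt-GDA": spectral_radius(scsc_iteration_matrix(**params, gamma=1.0, delta=1.0)),
    "Alex-GDA": spectral_radius(scsc_iteration_matrix(**params, gamma=2.0, delta=2.0)),
}
for name, r in rho.items():
    # Iterations needed to shrink the error by a factor of 1e6 at this linear rate.
    print(f"{name}: rho = {r:.5f}, ~{np.log(1e6) / -np.log(r):.0f} iterations")
```

For these values, the radii order as Alex-GDA < Alt-GDA < Sim-GDA, all below 1; the gaps are modest at such small step sizes.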
7. Summary and Impact
Alternating-Extrapolation GDA (Alex-GDA) constitutes a general framework that unifies and accelerates minimax optimization methods. It achieves the $\tilde O(\kappa)$ iteration complexity of extragradient techniques through judicious use of alternation and extrapolation, while keeping gradient and memory costs low. Alex-GDA applies to a broad range of settings, most notably those where existing alternating or simultaneous GDA variants are provably suboptimal or divergent, and thus provides both a theoretical and a practical cornerstone for future development in minimax optimization (Lee et al., 2024).