
Alternating-Extrapolation GDA (Alex-GDA)

Updated 13 April 2026
  • Alex-GDA is a novel minimax optimization framework that alternates updates on primal and dual variables using extrapolated gradients.
  • It achieves global linear convergence on SCSC and bilinear objectives with iteration complexity matching extragradient methods but using only two gradient evaluations per iteration.
  • The framework unifies simultaneous and alternating GDA approaches, offering rigorous theoretical guarantees alongside practical efficiency in optimization tasks.

Alternating-Extrapolation Gradient Descent-Ascent (Alex-GDA) is a general algorithmic framework for minimax optimization that subsumes the standard simultaneous and alternating gradient descent-ascent schemes. Its central idea is to update the primal ($x$) and dual ($y$) variables alternately, using gradients evaluated at extrapolated points. This yields the iteration complexity and convergence properties of extragradient methods with fewer gradient evaluations per iteration. Alex-GDA is formulated to address the theory-practice gap in minimax optimization, with convergence guarantees and empirical support on smooth strongly-convex–strongly-concave (SCSC) and bilinear objectives (Lee et al., 2024).

1. Problem Setting and Function Classes

Alex-GDA is designed for the unconstrained minimax optimization problem

$$\min_{x\in\mathbb{R}^{d_x}}\,\max_{y\in\mathbb{R}^{d_y}}\,f(x,y).$$

Two canonical objective classes are considered:

  • Strongly-Convex–Strongly-Concave (SCSC) Objectives: $f\in C^2$ is $\mu_x$-strongly convex in $x$ and $\mu_y$-strongly concave in $y$, with Lipschitz gradient bounds

$$\|\nabla_x f(x',y) - \nabla_x f(x,y)\| \leq L_x \|x'-x\|,$$

$$\|\nabla_y f(x,y') - \nabla_y f(x,y)\| \leq L_y \|y'-y\|,$$

and cross-term smoothness

$$\|\nabla_x f(x,y') - \nabla_x f(x,y)\| \leq L_{xy} \|y'-y\|,$$

with the analogous bound for $\nabla_y f$ in $x$. Associated condition numbers: $\kappa_x = L_x/\mu_x$, $\kappa_y = L_y/\mu_y$, $\kappa_{xy} = L_{xy}/\sqrt{\mu_x\mu_y}$.

  • Bilinear Objectives: $f(x,y) = x^\top B y$ for $B\in\mathbb{R}^{d_x\times d_y}$, with nonzero singular values in $[\mu_{xy}, L_{xy}]$. The Nash equilibrium is non-unique, and any $(x^\star, y^\star)$ with $B^\top x^\star = 0$, $B y^\star = 0$ is optimal.
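As a small worked instance of these definitions, consider the scalar quadratic $f(x,y) = \tfrac{a}{2}x^2 + bxy - \tfrac{c}{2}y^2$ with $a, c > 0$. This example is hypothetical and not taken from the source. For it, $\mu_x = L_x = a$, $\mu_y = L_y = c$, and $L_{xy} = |b|$. The sketch below computes the three condition numbers, assuming the definitions $\kappa_x = L_x/\mu_x$, $\kappa_y = L_y/\mu_y$, $\kappa_{xy} = L_{xy}/\sqrt{\mu_x\mu_y}$; the function name `condition_numbers` is illustrative.

```python
import math


def condition_numbers(mu_x, mu_y, L_x, L_y, L_xy):
    """Return (kappa_x, kappa_y, kappa_xy) under the assumed definitions."""
    if mu_x <= 0 or mu_y <= 0:
        raise ValueError("strong convexity/concavity constants must be positive")
    kappa_x = L_x / mu_x
    kappa_y = L_y / mu_y
    kappa_xy = L_xy / math.sqrt(mu_x * mu_y)
    return kappa_x, kappa_y, kappa_xy


# Hypothetical scalar quadratic f(x, y) = (a/2) x^2 + b x y - (c/2) y^2.
a, b, c = 1.0, 6.0, 4.0
kappas = condition_numbers(mu_x=a, mu_y=c, L_x=a, L_y=c, L_xy=abs(b))
print(kappas)
```

In this instance the coupling term dominates: $\kappa_x = \kappa_y = 1$ while $\kappa_{xy} = 3$, which is the regime where the dependence on $\kappa_{xy}$ separates the methods.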

2. Algorithmic Structure of Alex-GDA

Alex-GDA generalizes both simultaneous (Sim-GDA) and alternating (Alt-GDA) GDA methods by using extrapolation in both the $x$ and $y$ updates. The algorithm employs step sizes $\alpha, \beta > 0$ and extrapolation parameters $\gamma, \delta \geq 0$. The iterative updates are:

$$\begin{aligned}
x_{k+1} &= x_k - \alpha \nabla_x f(x_k, \tilde y_k), \\
\tilde x_{k+1} &= x_{k+1} + (\gamma - 1)(x_{k+1} - x_k), \\
y_{k+1} &= y_k + \beta \nabla_y f(\tilde x_{k+1}, y_k), \\
\tilde y_{k+1} &= y_{k+1} + (\delta - 1)(y_{k+1} - y_k),
\end{aligned}$$

where $\tilde x_k$ and $\tilde y_k$ denote the extrapolated points, and initial points $x_0$, $y_0$, $\tilde y_0$ are provided. The extrapolation factors $\gamma$, $\delta$ allow Alex-GDA to interpolate between Sim-GDA ($\gamma = 0$, $\delta = 1$), Alt-GDA ($\gamma = \delta = 1$), and the extragradient-like regime in which the gradient is taken beyond the new iterate ($\gamma, \delta > 1$).
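The update scheme can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation. It assumes the update form $x_{k+1} = x_k - \alpha\nabla_x f(x_k,\tilde y_k)$, $\tilde x_{k+1} = x_k - \gamma\alpha\nabla_x f(x_k,\tilde y_k)$, $y_{k+1} = y_k + \beta\nabla_y f(\tilde x_{k+1}, y_k)$, $\tilde y_{k+1} = y_k + \delta\beta\nabla_y f(\tilde x_{k+1}, y_k)$, scalar variables, and the initialization $\tilde y_0 = y_0$. The function name `alex_gda`, the toy objective $f(x,y) = xy$, and all parameter values are illustrative choices.

```python
import math


def alex_gda(grad_x, grad_y, x0, y0, alpha, beta, gamma, delta, num_iters):
    """Run the assumed Alex-GDA recursion on scalar x, y; return the last iterate."""
    x, y = x0, y0
    y_tilde = y0  # assumption: extrapolated dual point starts at y0
    for _ in range(num_iters):
        gx = grad_x(x, y_tilde)             # one x-gradient per iteration
        x_next = x - alpha * gx
        x_tilde = x - gamma * alpha * gx    # equals x_next + (gamma - 1)(x_next - x)
        gy = grad_y(x_tilde, y)             # one y-gradient per iteration
        y_next = y + beta * gy
        y_tilde = y + delta * beta * gy     # equals y_next + (delta - 1)(y_next - y)
        x, y = x_next, y_next
    return x, y


def final_norm(gamma, delta):
    # Toy bilinear game f(x, y) = x * y with equilibrium (0, 0).
    x, y = alex_gda(lambda x, y: y, lambda x, y: x,
                    x0=1.0, y0=1.0, alpha=0.3, beta=0.3,
                    gamma=gamma, delta=delta, num_iters=300)
    return math.hypot(x, y)


norm_sim = final_norm(0.0, 1.0)   # reduces to simultaneous GDA
norm_alt = final_norm(1.0, 1.0)   # reduces to alternating GDA
norm_alex = final_norm(2.0, 2.0)  # extrapolation in both updates
print(norm_sim, norm_alt, norm_alex)
```

On this toy problem the three settings behave differently: the simultaneous setting grows without bound, the alternating setting stays on a bounded orbit, and the extrapolated setting contracts toward the equilibrium.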

3. Convergence Theory: SCSC Objectives

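For smooth SCSC objectives, Alex-GDA achieves global linear convergence. With the extrapolation parameters and step sizes chosen according to the smoothness and strong-convexity constants, a Lyapunov function $\Psi_k$ decays geometrically,

$$\Psi_{k+1} \leq r\,\Psi_k, \qquad 0 < r < 1.$$

In terms of the condition numbers $\kappa_x$, $\kappa_y$, $\kappa_{xy}$ of Section 1, the iteration complexity to reach an $\epsilon$-accurate solution is

$$\mathcal{O}\big((\kappa_x + \kappa_y + \kappa_{xy})\log(1/\epsilon)\big).$$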

This matches the optimal complexity of extragradient (EG) methods but at half the gradient evaluation cost (Lee et al., 2024).
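The geometric decay can be observed numerically on a toy problem. The sketch below is illustrative only: it uses a hypothetical scalar SCSC quadratic $f(x,y) = \tfrac{\mu}{2}x^2 + bxy - \tfrac{\mu}{2}y^2$ with $\mu = 1$, $b = 2$, a common step size $\eta = 0.1$, and $\gamma = \delta = 2$. These values were picked for the demonstration and are not the step sizes prescribed by the theory. The update form is the same assumed recursion, with $\tilde y_0 = y_0$.

```python
import math


def run(gamma, delta, num_iters=200, eta=0.1, mu=1.0, b=2.0):
    """Distance to the saddle point (0, 0) along the assumed recursion."""
    # f(x, y) = (mu/2) x^2 + b x y - (mu/2) y^2
    x, y, y_tilde = 1.0, 1.0, 1.0
    errors = []
    for _ in range(num_iters):
        gx = mu * x + b * y_tilde          # grad_x f at (x_k, y_tilde_k)
        x_next = x - eta * gx
        x_tilde = x - gamma * eta * gx     # extrapolated primal point
        gy = b * x_tilde - mu * y          # grad_y f at (x_tilde_{k+1}, y_k)
        y_next = y + eta * gy
        y_tilde = y + delta * eta * gy     # extrapolated dual point
        x, y = x_next, y_next
        errors.append(math.hypot(x, y))
    return errors


err_sim = run(0.0, 1.0)    # simultaneous GDA as a special case
err_alt = run(1.0, 1.0)    # alternating GDA as a special case
err_alex = run(2.0, 2.0)   # extrapolated variant

for name, err in [("sim", err_sim), ("alt", err_alt), ("alex", err_alex)]:
    print(name, err[49], err[99], err[199])
```

All three runs contract linearly on this instance, and with the same step size the extrapolated setting reaches a smaller error than the alternating one, which in turn beats the simultaneous one. This is a single hand-picked example, not evidence for the general bound.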

4. Performance Comparison: Rates and Gradient Costs

A comparative summary of iteration complexities and per-iteration gradient costs among principal algorithms is:

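Algorithm          | Iteration Complexity (up to a $\log(1/\epsilon)$ factor)                          | Gradients/Iteration
Simultaneous GDA   | $\Theta(\kappa_x + \kappa_y + \kappa_{xy}^2)$                                     | 2
Alternating GDA    | $\mathcal{O}(\kappa_x + \kappa_y + \kappa_{xy}(\sqrt{\kappa_x} + \sqrt{\kappa_y}))$ | 2
Alex-GDA           | $\mathcal{O}(\kappa_x + \kappa_y + \kappa_{xy})$                                  | 2
Extragradient (EG) | $\mathcal{O}(\kappa_x + \kappa_y + \kappa_{xy})$                                  | 4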

Alex-GDA thus uniquely matches the optimal iteration complexity of extragradient methods, but with only two gradient evaluations per iteration—half of what is required for EG. This provides both computational efficiency and theoretical optimality in iteration count.
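The per-iteration gradient counts can be checked by instrumenting the gradient oracles. The sketch below is a hedged illustration: it counts each call to a partial gradient ($\nabla_x f$ or $\nabla_y f$) as one evaluation, uses a hypothetical quadratic objective, and implements the textbook extragradient step alongside the assumed Alex-GDA step. All function names and parameter values are illustrative.

```python
def make_counted_gradients():
    """Partial gradients of f(x, y) = 0.5 x^2 + 2 x y - 0.5 y^2 with a call counter."""
    counter = {"calls": 0}

    def grad_x(x, y):
        counter["calls"] += 1
        return x + 2.0 * y

    def grad_y(x, y):
        counter["calls"] += 1
        return 2.0 * x - y

    return grad_x, grad_y, counter


def alex_gda_step(grad_x, grad_y, x, y, y_tilde, alpha, beta, gamma, delta):
    # Assumed Alex-GDA step: one x-gradient and one y-gradient.
    gx = grad_x(x, y_tilde)
    x_tilde = x - gamma * alpha * gx
    gy = grad_y(x_tilde, y)
    return x - alpha * gx, y + beta * gy, y + delta * beta * gy


def extragradient_step(grad_x, grad_y, x, y, alpha, beta):
    # Standard extragradient: both partial gradients at two different points.
    x_half = x - alpha * grad_x(x, y)
    y_half = y + beta * grad_y(x, y)
    x_new = x - alpha * grad_x(x_half, y_half)
    y_new = y + beta * grad_y(x_half, y_half)
    return x_new, y_new


num_iters = 50

gx, gy, alex_counter = make_counted_gradients()
x, y, y_tilde = 1.0, 1.0, 1.0
for _ in range(num_iters):
    x, y, y_tilde = alex_gda_step(gx, gy, x, y, y_tilde, 0.1, 0.1, 2.0, 2.0)

gx, gy, eg_counter = make_counted_gradients()
x, y = 1.0, 1.0
for _ in range(num_iters):
    x, y = extragradient_step(gx, gy, x, y, 0.1, 0.1)

print(alex_counter["calls"] / num_iters, eg_counter["calls"] / num_iters)
```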

5. Linear Convergence in Bilinear Games

For bilinear objectives $f(x,y) = x^\top B y$ with coupling matrix $B$ (nonzero singular values in a bounded positive interval), Alex-GDA converges linearly. This is unattainable with Sim-GDA, which diverges, or Alt-GDA, which cycles. For suitable extrapolation parameters, and step sizes satisfying one of two conditions depending on those parameters, all eigenvalues of the iteration matrix lie strictly inside the unit disk, so the iterates converge linearly to an equilibrium.

This geometric convergence is established via analysis of the linear iteration's spectral radius, employing SVD reduction and Routh-Hurwitz conditions.
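The spectral argument can be made concrete in the scalar case. The derivation below is a sketch under stated assumptions, not the proof from the source: it takes $f(x,y) = bxy$, the assumed update form $x_{k+1} = x_k - \alpha b\tilde y_k$, $\tilde x_{k+1} = x_k - \gamma\alpha b\tilde y_k$, $y_{k+1} = y_k + \beta b\tilde x_{k+1}$, $\tilde y_{k+1} = y_k + \delta\beta b\tilde x_{k+1}$, and writes $s = \alpha\beta b^2$.

```latex
% State (x_k, y_k, \tilde y_k) evolves linearly under the assumed recursion.
\begin{aligned}
\begin{pmatrix} x_{k+1} \\ y_{k+1} \\ \tilde y_{k+1} \end{pmatrix}
&=
\underbrace{\begin{pmatrix}
1 & 0 & -\alpha b \\
\beta b & 1 & -\gamma s \\
\delta\beta b & 1 & -\gamma\delta s
\end{pmatrix}}_{M}
\begin{pmatrix} x_k \\ y_k \\ \tilde y_k \end{pmatrix},
\qquad s = \alpha\beta b^2, \\[4pt]
\det(\lambda I - M)
&= \mu^3 + (1 + \gamma\delta s)\,\mu^2 + (\gamma + \delta)\,s\,\mu + s,
\qquad \mu = \lambda - 1, \\[4pt]
(\gamma,\delta) = (0,1):\quad
& (\mu + 1)(\mu^2 + s) = 0
\;\Rightarrow\; \lambda \in \{0,\ 1 \pm i\sqrt{s}\},\quad |\lambda| = \sqrt{1+s} > 1, \\[4pt]
(\gamma,\delta) = (1,1):\quad
& \lambda\big(\lambda^2 - (2 - s)\lambda + 1\big) = 0
\;\Rightarrow\; |\lambda| = 1 \text{ for the complex pair when } 0 < s < 4, \\[4pt]
\text{small } s:\quad
& |\lambda|^2 = 1 - (\gamma + \delta - 2)\,s + O(s^2) \text{ for the complex pair.}
\end{aligned}
```

In this scalar model the simultaneous setting has eigenvalues outside the unit circle, the alternating setting has them on it, and to first order in $s$ the complex pair moves strictly inside once $\gamma + \delta > 2$. This is consistent with the divergence, cycling, and linear convergence described above.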

6. Practical Considerations and Empirical Behavior

  • Gradient and Memory Cost: Alex-GDA requires two gradient evaluations per iteration, one of $\nabla_x f$ and one of $\nabla_y f$. This is half the per-iteration cost of EG, which evaluates both partial gradients at two points.
  • Step Size Selection: For SCSC objectives, the step sizes $\alpha$ and $\beta$ are set from the smoothness and strong-convexity constants, as in Section 3. In the bilinear regime, they are set from the singular-value bounds of the coupling matrix, as in Section 5.
  • Empirical Results: On an SCSC quadratic game, Alex-GDA outperforms Sim-GDA and Alt-GDA, matches EG and OGD, and can be tuned to marginally exceed EG performance. In bilinear games, Sim-GDA diverges and Alt-GDA cycles, while Alex-GDA converges linearly and is comparable to EG and OGD at an equal gradient budget.
  • Broader Significance: Alex-GDA bridges the empirical benefits of alternating updates commonly observed in minimax optimization and adversarial learning with global convergence guarantees, and does so using the most economical alternating update strategy documented (Lee et al., 2024).

7. Summary and Impact

Alternating-Extrapolation GDA (Alex-GDA) is a general framework that unifies and accelerates gradient descent-ascent methods for minimax optimization. By combining alternation with extrapolation, it achieves the iteration complexity of extragradient techniques, $\mathcal{O}\big((\kappa_x + \kappa_y + \kappa_{xy})\log(1/\epsilon)\big)$ on SCSC objectives, while keeping gradient and memory costs low. It also covers settings where simultaneous or alternating GDA are provably slower or fail to converge, notably bilinear games, which makes it a useful theoretical and practical basis for further work in minimax optimization (Lee et al., 2024).

References (1)
