
Anchored GDA: Stable Min-Max Optimization

Updated 7 April 2026
  • Anchored Gradient Descent Ascent (Anchored GDA) is a first-order algorithm that introduces an anchoring term to stabilize iterates in smooth convex-concave min-max problems.
  • It achieves an optimal O(1/t) last-iterate convergence rate under monotonicity and K-smoothness, ensuring bounded iterates and improved stability.
  • Practical guidelines for parameter selection are available, along with robust variants applicable to GAN training, adversarial robustness, and fair machine learning.

Anchored Gradient Descent Ascent (Anchored GDA) denotes a family of first-order algorithms for min-max optimization, especially in smooth convex-concave and suitably monotone (or weakly monotone) settings, distinguished by the addition of an explicit anchoring term. This term, which pulls iterates toward a reference point (typically the initial iterate or a prox-center), is designed to stabilize the algorithm, ensure boundedness of iterates, and yield faster last-iterate convergence rates than ordinary Gradient Descent Ascent (GDA) or extragradient-type methods. Anchored GDA is theoretically distinguished by its optimal $O(1/t)$ last-iterate convergence of the squared gradient norm under monotonicity and smoothness, and it admits robust variants and extensions to semianchored or composite cases, including non-Euclidean geometries.

1. Formal Problem Setting

Anchored GDA is principally analyzed in the context of the standard smooth convex-concave min-max problem $$\min_{x\in\mathbb{R}^n}\;\max_{y\in\mathbb{R}^m} L(x, y)$$ where $x \mapsto L(x, y)$ is convex for each fixed $y$, and $y \mapsto L(x, y)$ is concave for each fixed $x$ (Surina et al., 4 Apr 2026).

The associated operator is

$$G(z) = \begin{pmatrix} \nabla_x L(x, y) \\ -\nabla_y L(x, y) \end{pmatrix}$$

for $z = (x, y) \in \mathbb{R}^{n+m}$. Standard assumptions are:

  • Monotonicity: $\langle G(z) - G(w),\, z - w \rangle \ge 0$ for all $z, w$.
  • $K$-smoothness: $\|G(z) - G(w)\| \le K \|z - w\|$ for all $z, w$.
  • Existence of a saddle point: there exists $z^* = (x^*, y^*)$ with $G(z^*) = 0$.
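To make the operator concrete, the sketch below (using a bilinear toy objective $L(x,y) = x^\top A y$, an illustrative assumption rather than an example from the source) builds $G$ and numerically checks the monotonicity inequality; in the bilinear case the inner product is exactly zero, since the operator is skew-symmetric.

```python
import numpy as np

# Sketch: the saddle operator G(z) = (grad_x L, -grad_y L) for the
# bilinear toy problem L(x, y) = x^T A y (an illustrative assumption;
# any smooth convex-concave L is handled the same way).
rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))

def G(z):
    x, y = z[:n], z[n:]
    return np.concatenate([A @ y, -(A.T @ x)])  # (grad_x L, -grad_y L)

# Monotonicity check: <G(z) - G(w), z - w> >= 0 at random points.
z, w = rng.standard_normal(2 * n), rng.standard_normal(2 * n)
gap = np.dot(G(z) - G(w), z - w)
assert abs(gap) < 1e-9  # bilinear case: exactly zero (skew operator)
```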

For composite minimax problems (as in structured nonconvex-nonconcave cases), this setup generalizes to

$$\min_{x\in\mathbb{R}^n}\;\max_{y\in\mathbb{R}^m}\; f(x) + \Phi(x, y) - g(y)$$

where $f$ and $g$ are proper convex (possibly nonsmooth), and $\Phi$ is smooth with Lipschitz-continuous gradient (Lee et al., 2021). The critical object is the saddle-subdifferential operator $\partial L(z) = \big(\partial f(x) + \nabla_x \Phi(x, y)\big) \times \big(\partial g(y) - \nabla_y \Phi(x, y)\big)$.

2. Algorithmic Structure and Parameter Choices

Anchored GDA modifies standard GDA by introducing a contractive "anchor" term that pulls each iterate back toward a fixed reference point $z_0$ (typically the initial point):

$$z_{k+1} = z_k - \eta_k G(z_k) + \beta_k (z_0 - z_k)$$

In components, $x$ takes a gradient-descent step on $x \mapsto L(x, y_k)$ and $y$ takes a gradient-ascent step on $y \mapsto L(x_k, y)$, each blended with the anchor. For smooth monotone problems, the parameter schedule of (Surina et al., 4 Apr 2026) keeps the step sizes $\eta_k$ on the order of $1/K$ while the anchoring weights $\beta_k$ decay on the order of $1/k$.
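A minimal numerical sketch of such an anchored iteration follows. The toy objective (a strongly monotone regularized bilinear problem, chosen so convergence is easy to observe), the $1/(k+2)$ anchoring schedule, and the step size $\mu/K^2$ are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

# Sketch of the anchored iteration z_{k+1} = z_k - eta*G(z_k) + beta_k*(z0 - z_k)
# on the mu-strongly monotone toy problem
#   L(x, y) = (mu/2)||x||^2 + x^T A y - (mu/2)||y||^2.
# Schedule and step size are illustrative assumptions.
rng = np.random.default_rng(0)
n, mu = 3, 1.0
A = rng.standard_normal((n, n))
K = np.sqrt(mu**2 + np.linalg.norm(A, 2)**2)   # Lipschitz constant of G

def G(z):
    x, y = z[:n], z[n:]
    return np.concatenate([mu * x + A @ y, -(A.T @ x) + mu * y])

z0 = rng.standard_normal(2 * n)
z, eta = z0.copy(), mu / K**2
for k in range(5000):
    beta = 1.0 / (k + 2)                       # anchoring weight, decaying ~ 1/k
    z = z - eta * G(z) + beta * (z0 - z)       # gradient step plus anchor pull

assert np.linalg.norm(G(z)) < 0.1              # gradient norm driven toward 0
```

Note the anchor weight must vanish for the iterates to reach the saddle point: a constant $\beta$ would leave a residual bias toward $z_0$.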

Anchoring can take additional forms. In composite/semianchored settings, only the maximization variable is anchored at a reference point (semi-anchoring the $y$-update), and the resulting subproblem is then solved via a Bregman or quadratic proximal step (Lee et al., 2021).

3. Theoretical Guarantees and Rates

Anchored GDA achieves the following last-iterate rate under monotonicity and $K$-smoothness, with scheduled parameters as above:

Theorem (Last-iterate $O(1/t)$ rate) (Surina et al., 4 Apr 2026): Under the standard assumptions and the parameter schedule above, for all $t \ge 1$,

$$\|G(z_t)\|^2 \le \frac{C\, D^2}{t}$$

where $D = \|z_0 - z^*\|$ and $C$ depends on $K$ and on the step-size and anchoring schedules.

In semianchored/proximal-point variants under the Minty variational inequality (MVI, possibly with a negative comonotonicity parameter to accommodate weak monotonicity), SA-MGDA obtains an $O(1/k)$ rate on the squared fixed-point residual and, under strong MVI, a linear convergence rate for the Bregman divergence to the solution (Lee et al., 2021).

The anchoring term is crucial for:

  • Ensuring overall boundedness of iterates (each iterate remains in a ball around the anchor $z_0$).
  • Inducing a contraction in the consecutive differences $\|z_{k+1} - z_k\|$, yielding the $O(1/t)$ rate in place of the weaker guarantees from prior ODE-based GDA analyses.

4. Comparison with Related Methods

A comparative summary of anchoring and proximal approaches:

Method | Anchor Mechanism | Rate | Structural Assumptions
Anchored GDA (Surina et al., 4 Apr 2026) | Explicit anchor at z_0 | O(1/t) | Monotone, smooth
Ryu–Yuan–Yin GDA | ODE-inspired, K-dependent parameters | Slower last-iterate rate | Monotone, smooth
SA-MGDA (Lee et al., 2021) | Anchors y via a first-order step | O(1/k) (fixed-point residual) | Weak or strong MVI
Multi-step GDA (MGDA) | No stable anchor | No rate / poor stability | Fails for weak MVI/bilinear
PDHG | Two-block Bregman prox-point | Optimal for bilinear | Bilinear structure
Extragradient | No anchor, extra gradient steps | Requires additional assumptions | Weak monotonicity (no composite terms)
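The instability that anchoring is designed to suppress is easy to reproduce: plain simultaneous GDA on the simplest bilinear objective $L(x, y) = xy$ spirals outward, with the iterate norm growing by a factor $\sqrt{1 + \eta^2}$ per step.

```python
import math

# Plain simultaneous GDA on the bilinear toy L(x, y) = x*y cycles
# outward: ||z_k|| grows every step. This is the failure mode the
# anchor term is designed to counteract.
x, y, eta = 1.0, 1.0, 0.1
norms = []
for _ in range(100):
    x, y = x - eta * y, y + eta * x   # descend in x, ascend in y
    norms.append(math.hypot(x, y))

assert norms[-1] > norms[0]           # iterates drift away from the saddle (0, 0)
```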

Anchored GDA leverages an anchor term to achieve a sharp rate without additional compactness or strong convexity/concavity assumptions, resolving a previous open problem (Surina et al., 4 Apr 2026). SA-MGDA generalizes the proximal point and PDHG frameworks to the non-Euclidean composite setting (Lee et al., 2021).

5. Practical Guidelines and Implementation

For monotone smooth min-max problems, set the step size on the order of $1/K$ and let the anchoring weights decay on the order of $1/k$, as in Section 2. Choosing the smallest admissible smoothness estimate $K$ tightens the leading constant.

In semi-anchored variants, the reference anchor is updated each iteration via a first-order step. Proximal or Bregman distances may be used to match the geometry:

  • For composite terms $f$ and $g$, efficient computation of the proximal mappings is essential.
  • If $\Phi$ is smooth-adaptable with respect to a Legendre function $h$, non-Euclidean mirror descent can be employed, replacing Euclidean proximity by the relative (Bregman) distance $D_h$ (Lee et al., 2021).
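As a sketch of the non-Euclidean case, assume the negative-entropy Legendre function $h(u) = \sum_i u_i \log u_i$ on the simplex (chosen here purely for illustration); the relative distance $D_h$ is then the KL divergence, and the Bregman proximal step reduces to a multiplicative (softmax-style) update:

```python
import numpy as np

# Bregman (mirror) proximal step under the negative-entropy Legendre
# function on the simplex: argmin_v <lam*grad, v> + KL(v, u) has the
# closed-form multiplicative update below.
def mirror_step(u, grad, lam):
    w = u * np.exp(-lam * grad)
    return w / w.sum()

u = np.array([0.2, 0.3, 0.5])
g = np.array([1.0, 0.0, -1.0])
v = mirror_step(u, g, lam=0.5)
assert abs(v.sum() - 1.0) < 1e-12 and (v > 0).all()  # stays on the simplex
```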

The proximal stepsize $\lambda$ is set inversely proportional to the Lipschitz constant of $\nabla\Phi$. When this constant is unknown, backtracking line search (reducing $\lambda$ by a constant factor until a Bregman non-expansiveness criterion holds) ensures convergence without knowledge of it.
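A generic backtracking template along these lines is sketched below; the acceptance test shown is a simple local-Lipschitz surrogate standing in for Lee et al.'s exact Bregman non-expansiveness criterion, and the function names are hypothetical.

```python
# Backtracking for an unknown Lipschitz constant: shrink the stepsize
# lam by a fixed factor until the gradient step passes an acceptance
# test, here the local-Lipschitz check ||g(z+) - g(z)|| <= ||z+ - z||/lam.
def backtrack(grad, z, lam0=1.0, shrink=0.5, max_iter=50):
    lam, g = lam0, grad(z)
    for _ in range(max_iter):
        z_new = [zi - lam * gi for zi, gi in zip(z, g)]
        g_new = grad(z_new)
        lhs = sum((a - b) ** 2 for a, b in zip(g_new, g)) ** 0.5
        rhs = sum((a - b) ** 2 for a, b in zip(z_new, z)) ** 0.5 / lam
        if lhs <= rhs:          # accepted: local Lipschitz estimate <= 1/lam
            return lam, z_new
        lam *= shrink           # rejected: try a smaller stepsize
    return lam, z_new

# For grad(z) = 3z (Lipschitz constant 3), backtracking from lam0 = 1
# halves twice and accepts at lam = 0.25 <= 1/3... no: 0.25 <= 1/3 holds.
lam, _ = backtrack(lambda z: [3.0 * zi for zi in z], [1.0, 2.0])
assert lam == 0.25
```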

For inner-loop subproblems (e.g., the ascent in $y$), the number $N$ of inner steps is chosen as a function of the target accuracy $\epsilon$, so that an $\epsilon$-accurate solution is reached with a controlled total number of gradient evaluations.

6. Limitations, Open Directions, and Extensions

The constants in the last-iterate bound $\|G(z_t)\|^2 = O(1/t)$ depend on the smoothness constant $K$ and the initial distance $\|z_0 - z^*\|$, which may be a priori unknown (Surina et al., 4 Apr 2026).

Known limitations and avenues for further research:

  • Extending anchored GDA and SA-MGDA to stochastic gradient settings, or to non-monotone and nonconvex–nonconcave minimax problems, remains open.
  • Further acceleration (e.g., achieving $O(1/t^2)$ rates for the squared gradient norm) likely requires multi-call, extragradient, or additional regularization steps.
  • In practice, initialization away from an optimal anchor or incorrect step-size selection can degrade observed rates.
  • For composite settings, efficient proximal oracles may not always be available, constraining practical deployment.

A plausible implication is that anchoring, in both explicit and semi-anchored forms, constitutes a robust and unifying principle underlying a spectrum of monotone min-max first-order algorithms, bridging extragradient, PDHG, and Bregman-proximal approaches.

7. Empirical Results and Applications

Empirical studies demonstrate that SA-MGDA stabilizes Multi-step GDA on toy bilinear instances and outperforms vanilla MGDA (with or without strong-concavity regularization) in fair-classification benchmarks (Lee et al., 2021). The approach is applicable to training generative adversarial networks, adversarial robustness, and fair machine learning. Anchored GDA has not yet been widely applied to large-scale stochastic or deep learning settings; extensions in this direction remain important future work.
