
Anchored GDA: Stable Min-Max Optimization

Updated 7 April 2026
  • Anchored Gradient Descent Ascent (Anchored GDA) is a first-order algorithm that introduces an anchoring term to stabilize iterates in smooth convex-concave min-max problems.
  • It achieves an optimal O(1/t) last-iterate convergence rate under monotonicity and K-smoothness, ensuring bounded iterates and improved stability.
  • Practical guidelines for parameter selection are available, along with robust variants applicable to GAN training, adversarial robustness, and fair machine learning.

Anchored Gradient Descent Ascent (Anchored GDA) denotes a family of first-order algorithms for min-max optimization, especially in smooth convex-concave and suitably monotone (or weakly monotone) settings, distinguished by the addition of an explicit anchoring term. This term, which pulls iterates toward a reference point (typically the initial iterate or a prox-center), is designed to stabilize the algorithm, ensure boundedness of iterates, and yield faster last-iterate convergence rates than ordinary Gradient Descent Ascent (GDA) or extragradient-type methods. Anchored GDA is theoretically distinguished by its optimal $O(1/t)$ last-iterate convergence of the squared gradient norm under monotonicity and smoothness, and it admits robust variants and extensions to semianchored or composite cases, including non-Euclidean geometries.

1. Formal Problem Setting

Anchored GDA is principally analyzed in the context of the standard smooth convex-concave min-max problem $$\min_{x\in\mathbb{R}^n}\;\max_{y\in\mathbb{R}^m} L(x, y)$$ where $x \mapsto L(x, y)$ is convex for each fixed $y$, and $y \mapsto L(x, y)$ is concave for each fixed $x$ (Surina et al., 4 Apr 2026).

The associated operator is

$$G(z) = \begin{pmatrix} \nabla_x L(x, y) \\ -\nabla_y L(x, y) \end{pmatrix}$$

for $z = (x, y) \in \mathbb{R}^{n+m}$. Standard assumptions are:

  • Monotonicity: $\langle G(z) - G(w),\, z - w \rangle \ge 0$ for all $z, w$.
  • $K$-smoothness: $\|G(z) - G(w)\| \le K \|z - w\|$ for all $z, w$.
  • Existence of a saddle point: there exists $z^* = (x^*, y^*)$ with $G(z^*) = 0$.
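To make the operator concrete, the sketch below (using a bilinear toy objective $L(x,y) = x^\top A y$, an illustrative assumption rather than an example from the source) builds $G$ and numerically checks the monotonicity inequality; in the bilinear case the inner product is exactly zero, since the operator is skew-symmetric.

```python
import numpy as np

# Sketch: the saddle operator G(z) = (grad_x L, -grad_y L) for the
# bilinear toy problem L(x, y) = x^T A y (an illustrative assumption;
# any smooth convex-concave L is handled the same way).
rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))

def G(z):
    x, y = z[:n], z[n:]
    return np.concatenate([A @ y, -(A.T @ x)])  # (grad_x L, -grad_y L)

# Monotonicity check: <G(z) - G(w), z - w> >= 0 at random points.
z, w = rng.standard_normal(2 * n), rng.standard_normal(2 * n)
gap = np.dot(G(z) - G(w), z - w)
assert abs(gap) < 1e-9  # bilinear case: exactly zero (skew operator)
```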

For composite minimax problems (as in structured nonconvex-nonconcave cases), this setup generalizes to

$$\min_{x\in\mathbb{R}^n}\;\max_{y\in\mathbb{R}^m}\; f(x) + \Phi(x, y) - g(y)$$

where $f$ and $g$ are proper convex (possibly nonsmooth), and $\Phi$ is smooth with Lipschitz-continuous gradient (Lee et al., 2021). The critical object is the saddle-subdifferential operator $\partial L(z) = \big(\partial f(x) + \nabla_x \Phi(x, y)\big) \times \big(\partial g(y) - \nabla_y \Phi(x, y)\big)$.

2. Algorithmic Structure and Parameter Choices

Anchored GDA modifies standard GDA by introducing a contractive "anchor" term that pulls each iterate back toward a fixed reference point $z_0$ (typically the initial point):

$$z_{k+1} = z_k - \eta_k G(z_k) + \beta_k (z_0 - z_k)$$

In components, $x$ takes a gradient-descent step on $x \mapsto L(x, y_k)$ and $y$ takes a gradient-ascent step on $y \mapsto L(x_k, y)$, each blended with the anchor. For smooth monotone problems, the parameter schedule of (Surina et al., 4 Apr 2026) keeps the step sizes $\eta_k$ on the order of $1/K$ while the anchoring weights $\beta_k$ decay on the order of $1/k$.
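A minimal numerical sketch of such an anchored iteration follows. The toy objective (a strongly monotone regularized bilinear problem, chosen so convergence is easy to observe), the $1/(k+2)$ anchoring schedule, and the step size $\mu/K^2$ are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

# Sketch of the anchored iteration z_{k+1} = z_k - eta*G(z_k) + beta_k*(z0 - z_k)
# on the mu-strongly monotone toy problem
#   L(x, y) = (mu/2)||x||^2 + x^T A y - (mu/2)||y||^2.
# Schedule and step size are illustrative assumptions.
rng = np.random.default_rng(0)
n, mu = 3, 1.0
A = rng.standard_normal((n, n))
K = np.sqrt(mu**2 + np.linalg.norm(A, 2)**2)   # Lipschitz constant of G

def G(z):
    x, y = z[:n], z[n:]
    return np.concatenate([mu * x + A @ y, -(A.T @ x) + mu * y])

z0 = rng.standard_normal(2 * n)
z, eta = z0.copy(), mu / K**2
for k in range(5000):
    beta = 1.0 / (k + 2)                       # anchoring weight, decaying ~ 1/k
    z = z - eta * G(z) + beta * (z0 - z)       # gradient step plus anchor pull

assert np.linalg.norm(G(z)) < 0.1              # gradient norm driven toward 0
```

Note the anchor weight must vanish for the iterates to reach the saddle point: a constant $\beta$ would leave a residual bias toward $z_0$.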

Anchoring can take additional forms. In composite/semianchored settings, only the maximization variable is anchored at a reference point (semi-anchoring the $y$-update), and the resulting subproblem is then solved via a Bregman or quadratic proximal step (Lee et al., 2021).

3. Theoretical Guarantees and Rates

Anchored GDA achieves the following last-iterate rate under monotonicity and $K$-smoothness, with scheduled parameters as above:

Theorem (Last-iterate $O(1/t)$ rate) (Surina et al., 4 Apr 2026): Under the standard assumptions and the parameter schedule above, for all $t \ge 1$,

$$\|G(z_t)\|^2 \le \frac{C\, D^2}{t}$$

where $D = \|z_0 - z^*\|$ and $C$ depends on $K$ and on the step-size and anchoring schedules.

In semianchored/proximal-point variants under the Minty variational inequality (MVI, possibly with a negative comonotonicity parameter to accommodate weak monotonicity), SA-MGDA obtains an $O(1/k)$ rate on the squared fixed-point residual and, under strong MVI, a linear convergence rate for the Bregman divergence to the solution (Lee et al., 2021).

The anchoring term is crucial for:

  • Ensuring overall boundedness of iterates (each iterate remains in a ball around the anchor $z_0$).
  • Inducing a contraction in the consecutive differences $\|z_{k+1} - z_k\|$, yielding the $O(1/t)$ rate in place of the weaker guarantees from prior ODE-based GDA analyses.

4. Comparison with Related Methods

A comparative summary of anchoring and proximal approaches:

Method | Anchor Mechanism | Rate | Structural Assumptions
Anchored GDA (Surina et al., 4 Apr 2026) | Explicit anchor at z_0 | O(1/t) | Monotone, smooth
Ryu–Yuan–Yin GDA | ODE-inspired, K-dependent parameters | Slower last-iterate rate | Monotone, smooth
SA-MGDA (Lee et al., 2021) | Anchors y via a first-order step | O(1/k) (fixed-point residual) | Weak or strong MVI
Multi-step GDA (MGDA) | No stable anchor | No rate / poor stability | Fails for weak MVI/bilinear
PDHG | Two-block Bregman prox-point | Optimal for bilinear | Bilinear structure
Extragradient | No anchor, extra gradient steps | Requires additional assumptions | Weak monotonicity (no composite terms)
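The instability that anchoring is designed to suppress is easy to reproduce: plain simultaneous GDA on the simplest bilinear objective $L(x, y) = xy$ spirals outward, with the iterate norm growing by a factor $\sqrt{1 + \eta^2}$ per step.

```python
import math

# Plain simultaneous GDA on the bilinear toy L(x, y) = x*y cycles
# outward: ||z_k|| grows every step. This is the failure mode the
# anchor term is designed to counteract.
x, y, eta = 1.0, 1.0, 0.1
norms = []
for _ in range(100):
    x, y = x - eta * y, y + eta * x   # descend in x, ascend in y
    norms.append(math.hypot(x, y))

assert norms[-1] > norms[0]           # iterates drift away from the saddle (0, 0)
```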

Anchored GDA leverages an anchor term to achieve a sharp rate without additional compactness or strong convexity/concavity assumptions, resolving a previous open problem (Surina et al., 4 Apr 2026). SA-MGDA generalizes the proximal point and PDHG frameworks to the non-Euclidean composite setting (Lee et al., 2021).

5. Practical Guidelines and Implementation

For monotone smooth min-max problems, set the step size on the order of $1/K$ and let the anchoring weights decay on the order of $1/k$, as in Section 2. Choosing the smallest admissible smoothness estimate $K$ tightens the leading constant.

In semi-anchored variants, the reference anchor is updated each iteration via a first-order step. Proximal or Bregman distances may be used to match the geometry:

  • For composite terms $f$ and $g$, efficient computation of the proximal mappings is essential.
  • If $\Phi$ is smooth-adaptable with respect to a Legendre function $h$, non-Euclidean mirror descent can be employed, replacing Euclidean proximity by the relative (Bregman) distance $D_h$ (Lee et al., 2021).
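As a sketch of the non-Euclidean case, assume the negative-entropy Legendre function $h(u) = \sum_i u_i \log u_i$ on the simplex (chosen here purely for illustration); the relative distance $D_h$ is then the KL divergence, and the Bregman proximal step reduces to a multiplicative (softmax-style) update:

```python
import numpy as np

# Bregman (mirror) proximal step under the negative-entropy Legendre
# function on the simplex: argmin_v <lam*grad, v> + KL(v, u) has the
# closed-form multiplicative update below.
def mirror_step(u, grad, lam):
    w = u * np.exp(-lam * grad)
    return w / w.sum()

u = np.array([0.2, 0.3, 0.5])
g = np.array([1.0, 0.0, -1.0])
v = mirror_step(u, g, lam=0.5)
assert abs(v.sum() - 1.0) < 1e-12 and (v > 0).all()  # stays on the simplex
```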

The proximal stepsize $\lambda$ is set inversely proportional to the Lipschitz constant of $\nabla\Phi$. When this constant is unknown, backtracking line search (reducing $\lambda$ by a constant factor until a Bregman non-expansiveness criterion holds) ensures convergence without knowledge of it.
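A generic backtracking template along these lines is sketched below; the acceptance test shown is a simple local-Lipschitz surrogate standing in for Lee et al.'s exact Bregman non-expansiveness criterion, and the function names are hypothetical.

```python
# Backtracking for an unknown Lipschitz constant: shrink the stepsize
# lam by a fixed factor until the gradient step passes an acceptance
# test, here the local-Lipschitz check ||g(z+) - g(z)|| <= ||z+ - z||/lam.
def backtrack(grad, z, lam0=1.0, shrink=0.5, max_iter=50):
    lam, g = lam0, grad(z)
    for _ in range(max_iter):
        z_new = [zi - lam * gi for zi, gi in zip(z, g)]
        g_new = grad(z_new)
        lhs = sum((a - b) ** 2 for a, b in zip(g_new, g)) ** 0.5
        rhs = sum((a - b) ** 2 for a, b in zip(z_new, z)) ** 0.5 / lam
        if lhs <= rhs:          # accepted: local Lipschitz estimate <= 1/lam
            return lam, z_new
        lam *= shrink           # rejected: try a smaller stepsize
    return lam, z_new

# For grad(z) = 3z (Lipschitz constant 3), backtracking from lam0 = 1
# halves twice and accepts at lam = 0.25 <= 1/3... no: 0.25 <= 1/3 holds.
lam, _ = backtrack(lambda z: [3.0 * zi for zi in z], [1.0, 2.0])
assert lam == 0.25
```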

For inner-loop subproblems (e.g., the ascent in $y$), the number $N$ of inner steps is chosen as a function of the target accuracy $\epsilon$, so that an $\epsilon$-accurate solution is reached with a controlled total number of gradient evaluations.

6. Limitations, Open Directions, and Extensions

The constants in the last-iterate bound $\|G(z_t)\|^2 = O(1/t)$ depend on the smoothness constant $K$ and the initial distance $\|z_0 - z^*\|$, which may be a priori unknown (Surina et al., 4 Apr 2026).

Known limitations and avenues for further research:

  • Extending anchored GDA and SA-MGDA to stochastic gradient settings, or to non-monotone and nonconvex–nonconcave minimax problems, remains open.
  • Further acceleration (e.g., achieving $O(1/t^2)$ rates for the squared gradient norm) likely requires multi-call, extragradient, or additional regularization steps.
  • In practice, initialization away from an optimal anchor or incorrect step-size selection can degrade observed rates.
  • For composite settings, efficient proximal oracles may not always be available, constraining practical deployment.

A plausible implication is that anchoring, in both explicit and semi-anchored forms, constitutes a robust and unifying principle underlying a spectrum of monotone min-max first-order algorithms, bridging extragradient, PDHG, and Bregman-proximal approaches.

7. Empirical Results and Applications

Empirical studies demonstrate that SA-MGDA stabilizes Multi-step GDA on toy bilinear instances and outperforms vanilla MGDA (with or without strong-concavity regularization) in fair-classification benchmarks (Lee et al., 2021). The approach is applicable to training generative adversarial networks, adversarial robustness, and fair machine learning. Anchored GDA has not yet been widely applied to large-scale stochastic or deep learning settings; extensions in this direction remain important future work.
