An Improved Last-Iterate Convergence Rate for Anchored Gradient Descent Ascent

Published 4 Apr 2026 in math.OC and cs.AI | (2604.03782v1)

Abstract: We analyze the last-iterate convergence of the Anchored Gradient Descent Ascent algorithm for smooth convex-concave min-max problems. While previous work established a last-iterate rate of $\mathcal{O}(1/t^{2-2p})$ for the squared gradient norm, where $p \in (1/2, 1)$, it remained an open problem whether the improved exact $\mathcal{O}(1/t)$ rate is achievable. In this work, we resolve this question in the affirmative. This result was discovered autonomously by an AI system capable of writing formal proofs in Lean. The Lean proof can be accessed at https://github.com/google-deepmind/formal-conjectures/pull/3675/commits/a13226b49fd3b897f4c409194f3bcbeb96a08515

Summary

  • The paper introduces novel step-size schedules for Anchored GDA, achieving an optimal O(1/t) convergence rate for the squared gradient norm.
  • It employs a discrete, non-ergodic analysis with formal machine verification using Lean 4 to rigorously confirm the convergence proof.
  • The work offers practical benefits in robust and stochastic optimization settings by reducing variance and improving algorithm stability.

Improved Last-Iterate Rate for Anchored Gradient Descent Ascent in Convex-Concave Min-Max Problems

Problem Setting and Context

The paper addresses the last-iterate convergence behavior of the Anchored Gradient Descent Ascent (Anchored GDA) algorithm when applied to smooth convex-concave min-max objectives. Such problems, characterized by $\min_x \max_y L(x, y)$ with $L$ convex in $x$ and concave in $y$, appear pervasively in adversarial learning, domain adaptation, optimal transport, robust training, and fairness-aware learning. Traditional GDA suffers from inherent oscillations in this regime. Numerous alternatives (e.g., Extragradient [korpelevich1976extragradient], Optimistic GDA [popov1980modification], and Halpern iteration [Halpern1967FixedPO]) have been explored both empirically and theoretically, but robust step-size design for efficient last-iterate convergence with a single gradient call remains a challenge, especially under stochastic gradients.
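The oscillation problem is easy to see on the classic bilinear objective $L(x, y) = xy$, whose unique saddle point is the origin. The following minimal sketch (not from the paper) runs plain simultaneous GDA on this objective; the iterates spiral outward rather than converge:

```python
# Plain simultaneous GDA on the bilinear toy problem L(x, y) = x * y,
# with saddle point (0, 0). Each step applies the linear map
# (x, y) -> (x - eta*y, y + eta*x), which has spectral radius
# sqrt(1 + eta^2) > 1, so the distance to the saddle grows every step.
# The step size eta = 0.1 is an arbitrary illustrative choice.

def gda_step(x, y, eta):
    """One simultaneous GDA step: gradient descent in x, ascent in y."""
    return x - eta * y, y + eta * x

x, y, eta = 1.0, 0.0, 0.1
norms = []
for _ in range(100):
    x, y = gda_step(x, y, eta)
    norms.append((x * x + y * y) ** 0.5)

# The iterates spiral outward; final distance ~ 1.645 after 100 steps.
print(f"final distance from saddle: {norms[-1]:.3f}")
```

Since the update is an exact rotation-plus-scaling, the divergence is monotone rather than an artifact of the step size; shrinking $\eta$ only slows it down.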

Anchored GDA introduces an anchoring term in the gradient-based update, pulling iterates towards a fixed point—typically the initialization. Prior theoretical results [ryu2019ode] established an $\mathcal{O}(1/t^{2-2p})$ rate (for $p \in (1/2, 1)$) on the squared gradient norm at the last iterate, but the optimal non-ergodic $\mathcal{O}(1/t)$ rate remained unresolved. The current work settles this question affirmatively, demonstrating that Anchored GDA achieves the $\mathcal{O}(1/t)$ last-iterate rate under standard monotonicity and Lipschitz assumptions.

Algorithmic Framework and Theoretical Advances

Let $z_t = (x_t, y_t)$ denote the joint iterate at time $t$, $z_0$ the anchor (typically the initialization), and $F(z) = (\nabla_x L(x, y), -\nabla_y L(x, y))$ the saddle gradient operator. Anchored GDA updates take the form

$$z_{t+1} = z_t - \eta_t F(z_t) + \beta_t (z_0 - z_t),$$

with time-dependent step-sizes $\eta_t$ and anchoring parameters $\beta_t$.
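As a concrete (hypothetical) instantiation, the sketch below runs an anchored update of this form on the bilinear toy problem $L(x, y) = xy$, with a constant step size and a polynomially decaying anchor weight $\beta_t = (t+2)^{-3/4}$ in the style of the prior-work schedules; the exact schedules analyzed in the paper differ and are given there. Unlike plain GDA, the gradient norm at the last iterate decays well below its initial value of 1:

```python
# Anchored GDA sketch on L(x, y) = x * y (saddle point at the origin),
# using the generic update z_{t+1} = z_t - eta*F(z_t) + beta_t*(z0 - z_t).
# The schedules below (constant eta, beta_t = (t + 2)**-0.75, i.e. p = 3/4)
# are illustrative assumptions, not the schedules proposed in the paper.

def F(x, y):
    """Saddle gradient operator (dL/dx, -dL/dy) for L(x, y) = x * y."""
    return y, -x

def anchored_gda(z0, eta, steps):
    x0, y0 = z0
    x, y = x0, y0
    for t in range(steps):
        beta = (t + 2) ** -0.75            # anchoring weight (assumed schedule)
        gx, gy = F(x, y)
        x += -eta * gx + beta * (x0 - x)   # descent in x, pulled toward anchor
        y += -eta * gy + beta * (y0 - y)   # ascent in y, pulled toward anchor
    return x, y

x, y = anchored_gda((1.0, 0.0), eta=0.05, steps=2000)
grad_norm = (x * x + y * y) ** 0.5         # for this L, ||F(z)|| equals ||z||
print(f"gradient norm at last iterate: {grad_norm:.4f}")
```

The anchoring weight must decay slowly enough to keep damping the rotational dynamics but fast enough that the pull toward $z_0$ vanishes in the limit, which is exactly the trade-off the schedule design addresses.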

Key assumptions for the analysis are:

  • Monotonicity of the saddle operator $F$;
  • Lipschitz continuity of $F$;
  • Existence of at least one saddle point $z^*$ (equivalently, $F(z^*) = 0$).

Previous analyses (notably [ryu2019ode]) utilized polynomially decaying anchoring schedules, $\beta_t \propto t^{-p}$ with $p \in (1/2, 1)$, yielding a last-iterate convergence rate of $\mathcal{O}(1/t^{2-2p})$ on the squared gradient norm. This left unresolved whether the theoretically optimal $\mathcal{O}(1/t)$ rate is attainable.

This work proposes revised step-size and anchoring schedules that eliminate the $p$-dependent loss in the rate exponent.

Detailed recurrence analysis in the discrete setting—eschewing continuous-time ODE analogies—yields the main result:

Theorem: Under the assumptions above, Anchored GDA with the revised schedules satisfies, for all $t \geq 1$,

$$\|F(z_t)\|^2 \leq \frac{C}{t},$$

where $C$ depends on the Lipschitz constant of $F$, the initial distance $\|z_0 - z^*\|$, and the schedule parameters.

The proof structure is:

  1. Boundedness of iterates: establish a global bound on the distance $\|z_t - z^*\|$;
  2. Iterate stability: control the successive-iterate norm $\|z_{t+1} - z_t\|$ by exploiting the strong contraction induced by the anchoring term;
  3. Non-ergodic convergence: transfer these bounds to the gradient operator, yielding last-iterate guarantees.
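To illustrate the boundedness step, a standard opening move for anchored schemes (sketched here under the assumed generic update $z_{t+1} = z_t - \eta_t F(z_t) + \beta_t(z_0 - z_t)$; the paper's own recursion may differ) is to write $z_{t+1} - z^* = (1-\beta_t)(z_t - z^*) + \beta_t(z_0 - z^*) - \eta_t F(z_t)$, expand the square, and apply monotonicity together with $F(z^*) = 0$, which gives $\langle F(z_t), z_t - z^* \rangle \ge 0$:

```latex
\|z_{t+1} - z^*\|^2
  \le (1-\beta_t)\,\|z_t - z^*\|^2
    + \beta_t\,\|z_0 - z^*\|^2
    - 2\eta_t\beta_t\,\langle F(z_t),\, z_0 - z^* \rangle
    + \eta_t^2\,\|F(z_t)\|^2 .
```

Bounding the remaining inner product via Cauchy–Schwarz and inducting over $t$ then yields a uniform bound on $\|z_t - z^*\|$, provided the schedules keep the $\eta_t^2\|F(z_t)\|^2$ term under control relative to the $\beta_t$ decay.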

Notably, the results do not require uniqueness of the saddle point, nor do they rely on averaged/ergodic iterates, which are less useful in non-convex landscapes and in stochastic optimization settings.

Numerical and Analytical Implications

The established $\mathcal{O}(1/t)$ rate for the squared gradient norm is tight for the class of monotone variational inequalities with Lipschitz continuous operators and is competitive with the best known rates for single-call algorithms in deterministic settings. The result also demonstrates that commonly adopted, intuitively motivated schedules (polynomial or sublinear) may attain strictly suboptimal asymptotics.

Anchored GDA further offers practical advantages in stochastic scenarios: the anchoring term serves as a variance reduction mechanism, mitigating the instability and noise amplification inherent in methods requiring double-sampling (e.g., Extragradient). This makes the approach particularly compelling for applications with high-variance gradient oracles, including GAN training, robust optimization, and reinforcement learning [goodfellow2014generative, madry2018towards, du2017stochastic].
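A small, hypothetical experiment in this spirit: on the bilinear toy problem $L(x, y) = xy$ with additive Gaussian gradient noise, an anchored update (assumed form and schedule, not the paper's) stays near the saddle point while plain stochastic GDA drifts away:

```python
import random

# Stochastic bilinear toy: L(x, y) = x * y with additive Gaussian noise
# on the gradients. Compares plain stochastic GDA against an anchored
# update of the assumed form z_{t+1} = z_t - eta*F(z_t) + beta_t*(z0 - z_t);
# eta and beta_t are illustrative choices, not the paper's schedules.

def noisy_F(x, y, rng, sigma=0.1):
    """Noisy saddle gradient oracle for L(x, y) = x * y."""
    return y + rng.gauss(0, sigma), -x + rng.gauss(0, sigma)

def run(anchored, steps=2000, eta=0.05, seed=0):
    rng = random.Random(seed)
    x0, y0 = 1.0, 0.0
    x, y = x0, y0
    for t in range(steps):
        gx, gy = noisy_F(x, y, rng)
        beta = (t + 2) ** -0.75 if anchored else 0.0
        x += -eta * gx + beta * (x0 - x)
        y += -eta * gy + beta * (y0 - y)
    return (x * x + y * y) ** 0.5  # distance to the saddle point (0, 0)

plain = run(anchored=False)
anch = run(anchored=True)
print(f"plain GDA distance to saddle:    {plain:.3f}")
print(f"anchored GDA distance to saddle: {anch:.3f}")
```

Without the anchor the rotational dynamics amplify both the iterates and the injected noise, while the anchoring term damps them together; this is only a toy illustration of the variance-reduction claim, not a reproduction of the paper's experiments.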

Formal Verification and The Role of AI

A distinct aspect of this work is the formalization and machine-verification of the convergence proof using Lean 4, with the proof autonomously generated by a formal-mathematics AI agent developed at Google DeepMind. The analytic derivations (e.g., contraction lemmas, parameter schedule asymptotics) and non-asymptotic technical bounds are rigorously checked, advancing the state-of-the-art in verified optimization for non-ergodic convergence analysis.

This direction is likely to affect future theoretical work, both by setting higher standards for correctness in mathematical optimization, and by accelerating the discovery of intricate convergence results for new algorithmic schemes. The modularity of the analysis paves the way for extensions to structured monotone inclusions, tighter parameter tuning, and the formal exploration of stochastic rate results.

Conclusion

This work resolves the open question regarding the non-ergodic convergence rate of Anchored GDA for smooth convex-concave min-max objectives, establishing that the scheme attains the optimal $\mathcal{O}(1/t)$ rate on the squared norm of the gradient operator at the last iterate. The proof is notable for its direct, discrete analysis and for being fully machine-verified by an autonomous agent, highlighting both the enduring utility of anchoring and the emerging role of AI in formal mathematical discovery. Implications extend to the principled design of single-call algorithms for robust, efficient equilibrium computation in game-theoretic machine learning problems. Future research will likely extend these guarantees to stochastic, non-monotone, or structured settings, facilitated by formal methods and AI-driven proof assistants.
