- The paper introduces novel step-size schedules for Anchored GDA, achieving an optimal O(1/t) convergence rate for the squared gradient norm.
- It employs a discrete, non-ergodic analysis with formal machine verification using Lean 4 to rigorously confirm the convergence proof.
- The work offers practical benefits in robust and stochastic optimization settings by reducing variance and improving algorithm stability.
Improved Last-Iterate Rate for Anchored Gradient Descent Ascent in Convex-Concave Min-Max Problems
Problem Setting and Context
The paper addresses the last-iterate convergence behavior of the Anchored Gradient Descent Ascent (Anchored GDA) algorithm when applied to smooth convex-concave min-max objectives. Such problems, characterized by min_x max_y L(x, y) with L convex in x and concave in y, appear pervasively in adversarial learning, domain adaptation, optimal transport, robust training, and fairness-aware learning. Traditional GDA suffers from inherent oscillations in this regime. Numerous alternatives (e.g., Extragradient [korpelevich1976extragradient], Optimistic GDA [popov1980modification], and Halpern iteration [Halpern1967FixedPO]) have been empirically and theoretically explored, but robust step-size design for efficient last-iterate convergence with a single gradient call remains a challenge, especially under stochastic gradients.
Anchored GDA introduces an anchoring term in the gradient-based update, pulling iterates towards a fixed point—typically the initialization. Prior theoretical results [ryu2019ode] established an O(1/t^{2−2p}) rate (for p ∈ (1/2, 1)) on the squared gradient norm at the last iterate, but the optimal non-ergodic O(1/t) rate remained unresolved. The current work settles this question affirmatively, demonstrating that Anchored GDA achieves the O(1/t) last-iterate rate under standard monotonicity and Lipschitz assumptions.
Algorithmic Framework and Theoretical Advances
Let z_t = (x_t, y_t) denote the joint iterate at time t, z_0 the anchor, and F the saddle-point gradient operator, F(z) = (∇_x L(x, y), −∇_y L(x, y)). Anchored GDA updates are given by

z_{t+1} = z_t − η_t F(z_t) − β_t (z_t − z_0),

with time-dependent step-sizes η_t and anchoring parameters β_t.
Key assumptions for the analysis are:
- Monotonicity of F;
- Lipschitz continuity of F (with constant L);
- Existence of at least one saddle point z⋆ (F(z⋆) = 0).
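As a rough numerical illustration of why the anchoring term matters, the sketch below runs plain GDA and Anchored GDA on the bilinear toy problem L(x, y) = xy, which satisfies all three assumptions. The schedules η_t = c/√(t+1) and β_t = p/(t+1) are hypothetical illustrative choices, not the paper's exact schedules:

```python
import math

def F(z):
    # Saddle operator of the bilinear toy objective L(x, y) = x * y:
    # F(x, y) = (dL/dx, -dL/dy) = (y, -x). It is monotone and 1-Lipschitz,
    # with unique saddle point z* = (0, 0).
    x, y = z
    return (y, -x)

def norm(z):
    return math.hypot(z[0], z[1])

def anchored_gda(z0, steps, p=0.7, c=0.5):
    # z_{t+1} = z_t - eta_t * F(z_t) - beta_t * (z_t - z0), with the
    # illustrative schedules eta_t = c / sqrt(t + 1), beta_t = p / (t + 1).
    x, y = z0
    for t in range(steps):
        eta = c / math.sqrt(t + 1)
        beta = p / (t + 1)
        gx, gy = F((x, y))
        x = x - eta * gx - beta * (x - z0[0])
        y = y - eta * gy - beta * (y - z0[1])
    return (x, y)

z0 = (1.0, 1.0)

# Plain GDA with a constant step spirals outward on the bilinear problem.
zx, zy = z0
for _ in range(2000):
    gx, gy = F((zx, zy))
    zx, zy = zx - 0.1 * gx, zy - 0.1 * gy
z_plain = (zx, zy)

# The anchor term keeps the iterates bounded near the saddle point.
z_anchored = anchored_gda(z0, 2000)
print(norm(F(z_plain)), norm(F(z_anchored)))
```

On this example the plain-GDA gradient norm blows up by several orders of magnitude, while the anchored iterates remain in a small neighborhood of the saddle point.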
Previous analyses (notably [ryu2019ode]) utilized anchoring schedules of the form β_t = p/t with a constant step-size η (for p ∈ (1/2, 1)), yielding a last-iterate convergence rate of O(1/t^{2−2p}). This left the boundary regime p = 1/2 unresolved for the theoretically optimal O(1/t).
This work proposes revised step-size and anchoring schedules that close this gap.
Detailed recurrence analysis in the discrete setting—eschewing continuous-time ODE analogies—yields the main result:
Theorem: Under the standard assumptions above, Anchored GDA with the revised schedules satisfies, for all t ≥ 1,

‖F(z_t)‖² ≤ C/t,

where C depends on the Lipschitz constant L, the initial distance ‖z_0 − z⋆‖, and the schedule parameters.
The proof structure is:
- Boundedness of iterates: establish a uniform bound on ‖z_t − z⋆‖ over all t;
- Iterate stability: control the successive-iterate norm ‖z_{t+1} − z_t‖ by exploiting the strong contraction induced by the anchoring;
- Non-ergodic convergence: transfer these bounds to the gradient operator F, yielding last-iterate guarantees on ‖F(z_t)‖².
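To make the transfer step concrete: writing the anchored update as z_{t+1} = z_t − η_t F(z_t) − β_t (z_t − z_0) (a standard anchoring form; the paper's exact parameterization may differ), rearranging isolates the operator value, and a triangle inequality gives

```latex
% eta_t F(z_t) = (z_t - z_{t+1}) - beta_t (z_t - z_0), hence
\[
  \|F(z_t)\| \;\le\; \frac{\|z_{t+1} - z_t\| + \beta_t\,\|z_t - z_0\|}{\eta_t}.
\]
% Step 1 bounds \|z_t - z_0\| (via the uniform bound on \|z_t - z^\star\|),
% step 2 bounds \|z_{t+1} - z_t\|, and the schedules are tuned so that the
% right-hand side squares to O(1/t).
```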
Notably, the results do not require uniqueness of the saddle point, nor do they rely on averaged/ergodic iterates, which are less useful in non-convex landscapes and in stochastic optimization settings.
Numerical and Analytical Implications
The established O(1/t) rate for the squared gradient norm is tight for the class of monotone variational inequalities with Lipschitz continuous operators and is competitive with the best known rates for single-call algorithms in deterministic settings. The result also demonstrates that commonly adopted, intuitively motivated schedules (polynomial or sublinear) may attain strictly suboptimal asymptotics.
Anchored GDA further offers practical advantages in stochastic scenarios: the anchoring term serves as a variance reduction mechanism, mitigating the instability and noise amplification inherent in methods requiring double-sampling (e.g., Extragradient). This makes the approach particularly compelling for applications with high-variance gradient oracles, including GAN training, robust optimization, and reinforcement learning [goodfellow2014generative, madry2018towards, du2017stochastic].
A distinct aspect of this work is the formalization and machine-verification of the convergence proof using Lean 4, with the proof autonomously generated by a formal-mathematics AI agent developed at Google DeepMind. The analytic derivations (e.g., contraction lemmas, parameter schedule asymptotics) and non-asymptotic technical bounds are rigorously checked, advancing the state of the art in machine-verified optimization theory for non-ergodic convergence analyses.
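For flavor, here is a toy Lean 4 statement of the kind of non-asymptotic packaging such a verification involves. The snippet is purely illustrative; the actual formalization, its statements, and lemma names are not shown in this summary:

```lean
import Mathlib

/- Illustrative only: if a certified invariant gives `g t * (t + 1) ≤ C`
   for a quantity `g` (e.g. the squared gradient norm at iterate `t`),
   then the O(1/t)-style bound `g t ≤ C / (t + 1)` follows. -/
example (C : ℝ) (g : ℕ → ℝ)
    (h : ∀ t : ℕ, g t * ((t : ℝ) + 1) ≤ C) (t : ℕ) :
    g t ≤ C / ((t : ℝ) + 1) := by
  have ht : (0 : ℝ) < (t : ℝ) + 1 := by positivity
  rw [le_div_iff₀ ht]
  exact h t
```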
This direction is likely to affect future theoretical work, both by setting higher standards for correctness in mathematical optimization, and by accelerating the discovery of intricate convergence results for new algorithmic schemes. The modularity of the analysis paves the way for extensions to structured monotone inclusions, tighter parameter tuning, and the formal exploration of stochastic rate results.
Conclusion
This work resolves the open question regarding the non-ergodic convergence rate of Anchored GDA for smooth convex-concave min-max objectives, establishing that the scheme attains the optimal O(1/t) rate on the squared norm of the gradient operator at the last iterate. The proof is notable for its direct, discrete analysis and for being fully machine-verified by an autonomous agent, highlighting both the enduring utility of anchoring and the emerging role of AI in formal mathematical discovery. Implications extend to the principled design of single-call algorithms for robust, efficient equilibrium computation in game-theoretic machine learning problems. Future research will likely extend these guarantees to stochastic, non-monotone, or structured settings, facilitated by formal methods and AI-driven proof assistants.