
Extragradient Method: Convergence Guarantees

Updated 11 November 2025
  • Extragradient Method is a two-step prediction-correction algorithm designed to solve monotone variational inequalities and saddle-point problems.
  • It guarantees an O(1/K) convergence rate for the squared operator norm at the last iterate, without extra assumptions such as strong monotonicity or Jacobian smoothness.
  • Comparative analysis reveals that while EG retains robust convergence in general settings, alternative methods like optimistic gradient may lack the necessary cocoercivity properties.

The extragradient method is a foundational first-order algorithm for solving monotone variational inequalities, saddle-point problems, and root-finding problems involving monotone and Lipschitz operators. Its main appeal stems from robustness and the ability to achieve sharp convergence guarantees even when classical conditions like strong monotonicity or Jacobian smoothness are absent.

1. Variational Inequality Framework and the Extragradient Iteration

Consider the unconstrained variational inequality problem: find $x^* \in \mathbb{R}^d$ such that $F(x^*) = 0$, where $F: \mathbb{R}^d \rightarrow \mathbb{R}^d$ is monotone and $L$-Lipschitz:

$$\langle F(x) - F(y), x - y \rangle \geq 0 \quad \forall x, y \in \mathbb{R}^d, \qquad \|F(x) - F(y)\| \leq L\|x - y\|.$$

The classical extragradient method (EG), introduced by Korpelevich (1976), proceeds at iteration $k$ with a step size $\eta > 0$ as:

$$\begin{aligned} x_{k+\frac{1}{2}} &= x_k - \eta F(x_k), \\ x_{k+1} &= x_k - \eta F(x_{k+\frac{1}{2}}). \end{aligned}$$

This two-step scheme computes a forward step (prediction) followed by a correction at the extrapolated point, effectively mitigating the cycling and instability endemic to gradient descent when $F$ is monotone but not strongly so.
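
As a point of reference, the iteration takes only a few lines of code. The following is a minimal sketch for the unconstrained Euclidean setting above, with the operator $F$, starting point, step size, and iteration count treated as user-supplied inputs:

```python
import numpy as np

def extragradient(F, x0, eta, K):
    """Run K extragradient steps and return the last iterate x_K."""
    x = np.asarray(x0, dtype=float)
    for _ in range(K):
        x_half = x - eta * F(x)   # prediction: forward step at x_k
        x = x - eta * F(x_half)   # correction: step using F at the extrapolated point
    return x
```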

2. Last-Iterate $O(1/K)$ Rate and Analysis under Minimal Smoothness

A central contribution is the establishment of an $O(1/K)$ rate for the squared norm of the operator at the last iterate (not just the averaged or best iterate), without additional Jacobian smoothness or cocoercivity assumptions (Gorbunov et al., 2021).

Main result: Let $F$ be monotone and $L$-Lipschitz. For $\eta \in (0, 1/(2L))$ and all $K \geq 0$,

$$\|F(x_K)\|^2 \leq \frac{\|x_0 - x^*\|^2}{\eta^2(1 - L^2\eta^2)(K+1)},$$

and the gap function satisfies

$$\operatorname{Gap}_F(x_K) = \sup_{y \in \mathbb{R}^d} \langle F(y), x_K - y \rangle \leq \frac{\|x_0 - x^*\|}{\eta(1 - L^2\eta^2)^{1/2}(K+1)}.$$

Proof strategy:

  • Non-increasing operator norm: Lemma 3.2 shows $\|F(x_{k+1})\| \leq \|F(x_k)\|$ for $\eta \leq 1/(2L)$, obtained by a performance-estimation argument that sums weighted monotonicity and Lipschitz inequalities.
  • Descent in solution distance: For all $k$,

$$\|x_k - x^*\|^2 - \|x_{k+1} - x^*\|^2 \geq \eta^2(1 - L^2\eta^2)\|F(x_k)\|^2.$$

  • Telescoping: Summing over $k$ yields a global bound on $\sum_k \|F(x_k)\|^2$, and since the operator norms are non-increasing, the last iterate achieves the $O(1/K)$ rate (the step is spelled out below).
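
Spelled out, the telescoping step combines the two inequalities above:

$$\eta^2(1 - L^2\eta^2)\sum_{k=0}^{K}\|F(x_k)\|^2 \leq \sum_{k=0}^{K}\left(\|x_k - x^*\|^2 - \|x_{k+1} - x^*\|^2\right) \leq \|x_0 - x^*\|^2,$$

and since $\|F(x_K)\| \leq \|F(x_k)\|$ for all $k \leq K$, the left-hand side is at least $(K+1)\,\eta^2(1 - L^2\eta^2)\,\|F(x_K)\|^2$; dividing through gives the stated last-iterate bound.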

Crucially, no assumption involving the Jacobian of $F$ or additional smoothness is needed beyond monotonicity and $L$-Lipschitz continuity.
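
As a quick numerical sanity check (an illustration, not taken from the referenced analysis), the last-iterate quantity $\|F(x_K)\|^2$ can be compared against the bound above on a bilinear saddle-point problem, a standard monotone but not strongly monotone test case; the matrix, dimensions, and iteration count below are arbitrary choices:

```python
import numpy as np

# Bilinear saddle point min_x max_y x^T A y: the associated operator
# F(x, y) = (A y, -A^T x) is monotone and L-Lipschitz with L = ||A||_2,
# and its unique zero is (x*, y*) = (0, 0) when A is invertible.
rng = np.random.default_rng(0)
d = 20
A = rng.standard_normal((d, d))

def F(z):
    x, y = z[:d], z[d:]
    return np.concatenate([A @ y, -A.T @ x])

L = np.linalg.norm(A, 2)          # spectral norm = Lipschitz constant of F
eta = 1.0 / (2.0 * L)

z0 = rng.standard_normal(2 * d)
z = z0.copy()
K = 1000
for _ in range(K):
    z_half = z - eta * F(z)       # prediction step
    z = z - eta * F(z_half)       # correction step

bound = np.linalg.norm(z0) ** 2 / (eta ** 2 * (1 - (L * eta) ** 2) * (K + 1))
print("||F(z_K)||^2     :", np.linalg.norm(F(z)) ** 2)
print("theoretical bound:", bound)   # the measured value should sit below this
```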

3. Cocoercivity Structure and Limits

The convergence properties of extragradient-type methods are often linked to cocoercivity, a property that ensures nonexpansiveness of the operator $I - \ell F$. The analysis in this context reveals several sharp distinctions:

  • Cocoercivity for Affine Operators: For affine $F(x) = Ax + b$ that is monotone and $L$-Lipschitz, the EG update operator $F_{EG,\eta}(x) := F(x - \eta F(x))$ is $2/\eta$-cocoercive for $\eta < 1/L$ (a numerical check appears after the summary table below).
  • Non-cocoercivity in General: For generic monotone $L$-Lipschitz $F$, $F_{EG,\eta}$ is not $\ell$-cocoercive for any $\ell > 0$ and any $\eta > 0$. Explicit counterexamples show the induced mapping can be expansive, which sharply delineates the limits of operator-theoretic interpretations.
  • Star-cocoercivity: When $F$ is merely star-monotone (i.e., $\langle F(x), x - x^* \rangle \geq 0$ for all $x$), then $F_{EG,\eta}$ is $2/\eta$-star-cocoercive around $x^*$. This property is sufficient for best-iterate $O(1/K)$ bounds via uniform random-iterate arguments but is generally weaker than full cocoercivity.

Summary table of operator properties:

| Setting | Cocoercivity of $F_{EG,\eta}$ | Consequence for rates |
|---|---|---|
| Affine $F$ | Yes ($2/\eta$) | Deterministic $O(1/K)$ last-iterate |
| General monotone $F$ | No | $O(1/K)$ via star-cocoercivity at $x^*$ |
| Star-monotone $F$ | Star-cocoercive ($2/\eta$) | Best-iterate $O(1/K)$ |
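
The affine row can be checked numerically. The sketch below (illustrative, not from the source) samples random point pairs and verifies the cocoercivity inequality for $F_{EG,\eta}$, using the convention that an $\ell$-cocoercive operator $G$ satisfies $\langle G(u) - G(v), u - v \rangle \geq \frac{1}{\ell}\|G(u) - G(v)\|^2$; the particular operator and constants are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 10

# Monotone affine operator F(x) = A x + b (monotone because A + A^T is PSD).
S = rng.standard_normal((d, d)); S = S - S.T           # skew-symmetric part
P = rng.standard_normal((d, d)); P = 0.1 * (P @ P.T)   # PSD part
A = S + P
b = rng.standard_normal(d)

F = lambda x: A @ x + b
L = np.linalg.norm(A, 2)
eta = 0.9 / L                                          # eta < 1/L

F_eg = lambda x: F(x - eta * F(x))                     # EG update operator F_{EG,eta}

# Check <F_eg(u) - F_eg(v), u - v> >= (eta/2) * ||F_eg(u) - F_eg(v)||^2,
# i.e. 2/eta-cocoercivity in the convention stated above.
worst_slack = np.inf
for _ in range(10_000):
    u, v = rng.standard_normal(d), rng.standard_normal(d)
    g = F_eg(u) - F_eg(v)
    worst_slack = min(worst_slack, g @ (u - v) - (eta / 2.0) * (g @ g))
print("smallest slack over sampled pairs:", worst_slack)   # should be >= 0 up to rounding
```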

4. EG, Optimistic Gradient, and Hamiltonian Gradient: Comparative Behaviors

The operator-theoretic relationships between EG and other first-order methods are clarified:

  • Optimistic Gradient (OG)/Past-Extragradient: The update $x_{k+1} = x_k - 2\eta F(x_k) + \eta F(x_{k-1})$ can be represented as an update $z_{k+1} = z_k - F_{OG,\eta}(z_k)$. Even for linear monotone $F$, $F_{OG,\eta}$ is neither cocoercive nor star-cocoercive for any $\eta > 0$, due to its non-dissipative spectral structure.
  • Hamiltonian Gradient Method (HGM): This applies gradient descent to the merit function $H(x) = \frac{1}{2}\|F(x)\|^2$. For affine $F$, this yields a cocoercive gradient operator and $O(1/K)$ convergence. For general non-affine monotone $F$, however, $H(x)$ may fail to be convex, and the resulting algorithm can lose its convergence guarantees (minimal update sketches for both methods follow this list).
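
For concreteness, here is a minimal sketch of both updates on an affine operator (an illustrative example, not from the source); for affine $F(x) = Ax + b$ the Hamiltonian gradient is available in closed form as $\nabla H(x) = A^\top F(x)$, which is what the sketch uses:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 10
S = rng.standard_normal((d, d))
A = S - S.T                                  # skew-symmetric: monotone, not strongly monotone
b = rng.standard_normal(d)

F = lambda x: A @ x + b
L = np.linalg.norm(A, 2)
eta = 1.0 / (2.0 * L)

# Optimistic gradient / past-extragradient: x_{k+1} = x_k - 2*eta*F(x_k) + eta*F(x_{k-1}).
x_prev = rng.standard_normal(d)
x = x_prev.copy()
for _ in range(2000):
    x, x_prev = x - 2 * eta * F(x) + eta * F(x_prev), x

# Hamiltonian gradient method: gradient descent on H(x) = 0.5 * ||F(x)||^2.
# For affine F, grad H(x) = A^T F(x), which is L^2-Lipschitz.
y = rng.standard_normal(d)
for _ in range(2000):
    y = y - (1.0 / L ** 2) * (A.T @ F(y))

print("OG  ||F(x)||:", np.linalg.norm(F(x)))
print("HGM ||F(y)||:", np.linalg.norm(F(y)))
```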

These distinctions explain the superior stability and convergence robustness of EG compared to OG and HGM in monotone settings. The lack of full cocoercivity in general also precludes interpreting EG as a simple gradient descent on a "proximal-point" surrogate.

5. Practical Implications and Guidance

  • Step-size selection: The analysis establishes that in monotone $L$-Lipschitz settings, taking $\eta \approx 1/(2L)$ is both safe and rate-optimal for the decay of $\|F(x_k)\|^2$ (see the sketch after this list).
  • Best-iterate vs. last-iterate: Previous results for EG relied on averaging or random selection among iterates to obtain $O(1/K)$ rates; the present result guarantees this rate for the natural last iterate, aligning the theoretical guarantees with how the algorithm is typically used.
  • Connecting with empirical tuning: The results suggest that the classical tuning heuristic for EG (a step size inversely proportional to the Lipschitz constant) is theoretically justified, and that no additional smoothness parameters are needed for robust last-iterate rates.
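
As a practical illustration of these points (a heuristic sketch, not part of the referenced analysis), one can pair the $\eta = 1/(2L)$ rule with a safeguard based on the non-increase of $\|F(x_k)\|$ established in Section 2: if the operator norm ever increases, the Lipschitz estimate was evidently too small and the step size is halved. The function name and the halving rule are illustrative choices:

```python
import numpy as np

def extragradient_adaptive(F, x0, L_hat, K=1000):
    """Extragradient with eta = 1/(2*L_hat) plus a heuristic safeguard:
    the analysis guarantees ||F(x_{k+1})|| <= ||F(x_k)|| when eta <= 1/(2L),
    so an observed increase signals that L_hat underestimates L, and eta is halved."""
    eta = 1.0 / (2.0 * L_hat)
    x = np.asarray(x0, dtype=float)
    norm_prev = np.linalg.norm(F(x))
    for _ in range(K):
        x_half = x - eta * F(x)
        x_next = x - eta * F(x_half)
        norm_next = np.linalg.norm(F(x_next))
        if norm_next > norm_prev * (1 + 1e-12):   # monotone-norm property violated
            eta *= 0.5                            # step too large: shrink and retry
            continue
        x, norm_prev = x_next, norm_next
    return x, eta

# Example usage on a small monotone affine problem (illustrative):
rng = np.random.default_rng(4)
S = rng.standard_normal((8, 8)); S = S - S.T
P = rng.standard_normal((8, 8)); P = 0.1 * (P @ P.T)
M = S + P
F = lambda x: M @ x
x_fin, eta_fin = extragradient_adaptive(F, np.ones(8),
                                         L_hat=0.25 * np.linalg.norm(M, 2), K=2000)
print("final ||F(x)||:", np.linalg.norm(F(x_fin)), " final eta:", eta_fin)
```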

6. Impact and Theoretical Significance

The last-iterate $O(1/K)$ guarantee for EG under monotonicity and Lipschitz continuity (Gorbunov et al., 2021) addresses a persistent theoretical gap, aligning the method's strong empirical performance with rigorous convergence rates. The sharp dichotomy with cocoercivity structures further clarifies why EG, not OG or naive gradient-based methods, exhibits stability and reliable convergence in complex monotone game-theoretic and saddle-point formulations. By demonstrating that no extra smoothness beyond $L$-Lipschitz continuity is necessary, this work provides a definitive convergence characterization for EG and guides the tuning and comparison of modern extragradient-type algorithms in practical large-scale optimization and machine learning applications.
