
Extragradient Method: Convergence Guarantees

Updated 11 November 2025
  • Extragradient Method is a two-step prediction-correction algorithm designed to solve monotone variational inequalities and saddle-point problems.
  • It guarantees an O(1/K) convergence rate for the squared operator norm at the last iterate, without extra assumptions such as strong monotonicity or Jacobian smoothness.
  • Comparative analysis reveals that while EG retains robust convergence in general settings, alternative methods like optimistic gradient may lack the necessary cocoercivity properties.

The extragradient method is a foundational first-order algorithm for solving monotone variational inequalities, saddle-point problems, and root-finding problems involving monotone and Lipschitz operators. Its main appeal stems from robustness and the ability to achieve sharp convergence guarantees even when classical conditions like strong monotonicity or Jacobian smoothness are absent.

1. Variational Inequality Framework and the Extragradient Iteration

Consider the unconstrained variational inequality problem: find $x^* \in \mathbb{R}^d$ such that $F(x^*) = 0$, where $F: \mathbb{R}^d \rightarrow \mathbb{R}^d$ is monotone and $L$-Lipschitz:

$$\langle F(x) - F(y), x - y \rangle \geq 0 \quad \forall x, y \in \mathbb{R}^d, \qquad \|F(x) - F(y)\| \leq L\|x - y\|.$$

The classical extragradient method (EG), introduced by Korpelevich (1976), proceeds at iteration $k$ with a step size $\eta > 0$ as:

$$\begin{aligned} x_{k+\frac{1}{2}} &= x_k - \eta F(x_k), \\ x_{k+1} &= x_k - \eta F(x_{k+\frac{1}{2}}). \end{aligned}$$

This two-step scheme computes a forward step (prediction) followed by a correction at the extrapolated point, effectively mitigating the cycling and instability endemic to gradient descent when $F$ is monotone but not strongly so.
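
As a point of reference, the iteration takes only a few lines of code. The following is a minimal sketch for the unconstrained Euclidean setting above, with the operator $F$, starting point, step size, and iteration count treated as user-supplied inputs:

```python
import numpy as np

def extragradient(F, x0, eta, K):
    """Run K extragradient steps and return the last iterate x_K."""
    x = np.asarray(x0, dtype=float)
    for _ in range(K):
        x_half = x - eta * F(x)   # prediction: forward step at x_k
        x = x - eta * F(x_half)   # correction: step using F at the extrapolated point
    return x
```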

2. Last-Iterate $O(1/K)$ Rate and Analysis under Minimal Smoothness

A central contribution is the establishment of an $O(1/K)$ rate for the squared norm of the operator at the last iterate (not just the averaged or best iterate), without additional Jacobian smoothness or cocoercivity assumptions (Gorbunov et al., 2021).

Main result: Let $F$ be monotone and $L$-Lipschitz. For $\eta \in (0, 1/(2L))$ and all $K \geq 0$,

$$\|F(x_K)\|^2 \leq \frac{\|x_0 - x^*\|^2}{\eta^2(1 - L^2\eta^2)(K+1)},$$

and the gap function satisfies

$$\operatorname{Gap}_F(x_K) = \sup_{y \in \mathbb{R}^d} \langle F(y), x_K - y \rangle \leq \frac{\|x_0 - x^*\|}{\eta(1 - L^2\eta^2)^{1/2}(K+1)}.$$

Proof strategy:

  • Non-increasing operator norm: Lemma 3.2 shows $\|F(x_{k+1})\| \leq \|F(x_k)\|$ for $\eta \leq 1/(2L)$, obtained by a performance-estimation argument that sums weighted monotonicity and Lipschitz inequalities.
  • Descent in solution distance: For all $k$,

$$\|x_k - x^*\|^2 - \|x_{k+1} - x^*\|^2 \geq \eta^2(1 - L^2\eta^2)\|F(x_k)\|^2.$$

  • Telescoping: Summing over $k$ yields a global bound on $\sum_k \|F(x_k)\|^2$, and since the operator norms are non-increasing, the last iterate achieves the $O(1/K)$ rate (the step is spelled out below).
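
Spelled out, the telescoping step combines the two inequalities above:

$$\eta^2(1 - L^2\eta^2)\sum_{k=0}^{K}\|F(x_k)\|^2 \leq \sum_{k=0}^{K}\left(\|x_k - x^*\|^2 - \|x_{k+1} - x^*\|^2\right) \leq \|x_0 - x^*\|^2,$$

and since $\|F(x_K)\| \leq \|F(x_k)\|$ for all $k \leq K$, the left-hand side is at least $(K+1)\,\eta^2(1 - L^2\eta^2)\,\|F(x_K)\|^2$; dividing through gives the stated last-iterate bound.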

Crucially, no assumption involving the Jacobian of $F$ or additional smoothness is needed beyond monotonicity and $L$-Lipschitz continuity.
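
As a quick numerical sanity check (an illustration, not taken from the referenced analysis), the last-iterate quantity $\|F(x_K)\|^2$ can be compared against the bound above on a bilinear saddle-point problem, a standard monotone but not strongly monotone test case; the matrix, dimensions, and iteration count below are arbitrary choices:

```python
import numpy as np

# Bilinear saddle point min_x max_y x^T A y: the associated operator
# F(x, y) = (A y, -A^T x) is monotone and L-Lipschitz with L = ||A||_2,
# and its unique zero is (x*, y*) = (0, 0) when A is invertible.
rng = np.random.default_rng(0)
d = 20
A = rng.standard_normal((d, d))

def F(z):
    x, y = z[:d], z[d:]
    return np.concatenate([A @ y, -A.T @ x])

L = np.linalg.norm(A, 2)          # spectral norm = Lipschitz constant of F
eta = 1.0 / (2.0 * L)

z0 = rng.standard_normal(2 * d)
z = z0.copy()
K = 1000
for _ in range(K):
    z_half = z - eta * F(z)       # prediction step
    z = z - eta * F(z_half)       # correction step

bound = np.linalg.norm(z0) ** 2 / (eta ** 2 * (1 - (L * eta) ** 2) * (K + 1))
print("||F(z_K)||^2     :", np.linalg.norm(F(z)) ** 2)
print("theoretical bound:", bound)   # the measured value should sit below this
```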

3. Cocoercivity Structure and Limits

The convergence properties of extragradient-type methods are often linked to cocoercivity, a property that ensures nonexpansiveness of the operator $I - \ell F$. The analysis in this context reveals several sharp distinctions:

  • Cocoercivity for Affine Operators: For affine $F(x) = Ax + b$ that is monotone and $L$-Lipschitz, the EG update operator $F_{EG,\eta}(x) := F(x - \eta F(x))$ is $2/\eta$-cocoercive for $\eta < 1/L$ (a numerical check appears after the summary table below).
  • Non-cocoercivity in General: For generic monotone $L$-Lipschitz $F$, $F_{EG,\eta}$ is not $\ell$-cocoercive for any $\ell > 0$ and any $\eta > 0$. Explicit counterexamples show the induced mapping can be expansive, which sharply delineates the limits of operator-theoretic interpretations.
  • Star-cocoercivity: When $F$ is merely star-monotone (i.e., $\langle F(x), x - x^* \rangle \geq 0$ for all $x$), then $F_{EG,\eta}$ is $2/\eta$-star-cocoercive around $x^*$. This property is sufficient for best-iterate $O(1/K)$ bounds via uniform random-iterate arguments but is generally weaker than full cocoercivity.

Summary table of operator properties:

| Setting | Cocoercivity of $F_{EG,\eta}$ | Consequence for rates |
|---|---|---|
| Affine $F$ | Yes ($2/\eta$) | Deterministic $O(1/K)$ last-iterate |
| General monotone $F$ | No | $O(1/K)$ via star-cocoercivity at $x^*$ |
| Star-monotone $F$ | Star-cocoercive ($2/\eta$) | Best-iterate $O(1/K)$ |
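
The affine row can be checked numerically. The sketch below (illustrative, not from the source) samples random point pairs and verifies the cocoercivity inequality for $F_{EG,\eta}$, using the convention that an $\ell$-cocoercive operator $G$ satisfies $\langle G(u) - G(v), u - v \rangle \geq \frac{1}{\ell}\|G(u) - G(v)\|^2$; the particular operator and constants are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 10

# Monotone affine operator F(x) = A x + b (monotone because A + A^T is PSD).
S = rng.standard_normal((d, d)); S = S - S.T           # skew-symmetric part
P = rng.standard_normal((d, d)); P = 0.1 * (P @ P.T)   # PSD part
A = S + P
b = rng.standard_normal(d)

F = lambda x: A @ x + b
L = np.linalg.norm(A, 2)
eta = 0.9 / L                                          # eta < 1/L

F_eg = lambda x: F(x - eta * F(x))                     # EG update operator F_{EG,eta}

# Check <F_eg(u) - F_eg(v), u - v> >= (eta/2) * ||F_eg(u) - F_eg(v)||^2,
# i.e. 2/eta-cocoercivity in the convention stated above.
worst_slack = np.inf
for _ in range(10_000):
    u, v = rng.standard_normal(d), rng.standard_normal(d)
    g = F_eg(u) - F_eg(v)
    worst_slack = min(worst_slack, g @ (u - v) - (eta / 2.0) * (g @ g))
print("smallest slack over sampled pairs:", worst_slack)   # should be >= 0 up to rounding
```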

4. EG, Optimistic Gradient, and Hamiltonian Gradient: Comparative Behaviors

The operator-theoretic relationships between EG and other first-order methods are clarified:

  • Optimistic Gradient (OG)/Past-Extragradient: The update $x_{k+1} = x_k - 2\eta F(x_k) + \eta F(x_{k-1})$ can be represented as an update $z_{k+1} = z_k - F_{OG,\eta}(z_k)$. Even for linear monotone $F$, $F_{OG,\eta}$ is neither cocoercive nor star-cocoercive for any $\eta > 0$, due to its non-dissipative spectral structure.
  • Hamiltonian Gradient Method (HGM): This applies gradient descent to the merit function $H(x) = \frac{1}{2}\|F(x)\|^2$. For affine $F$, this yields a cocoercive gradient operator and $O(1/K)$ convergence. For general non-affine monotone $F$, however, $H(x)$ may fail to be convex, and the resulting algorithm can lose its convergence guarantees (minimal update sketches for both methods follow this list).
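
For concreteness, here is a minimal sketch of both updates on an affine operator (an illustrative example, not from the source); for affine $F(x) = Ax + b$ the Hamiltonian gradient is available in closed form as $\nabla H(x) = A^\top F(x)$, which is what the sketch uses:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 10
S = rng.standard_normal((d, d))
A = S - S.T                                  # skew-symmetric: monotone, not strongly monotone
b = rng.standard_normal(d)

F = lambda x: A @ x + b
L = np.linalg.norm(A, 2)
eta = 1.0 / (2.0 * L)

# Optimistic gradient / past-extragradient: x_{k+1} = x_k - 2*eta*F(x_k) + eta*F(x_{k-1}).
x_prev = rng.standard_normal(d)
x = x_prev.copy()
for _ in range(2000):
    x, x_prev = x - 2 * eta * F(x) + eta * F(x_prev), x

# Hamiltonian gradient method: gradient descent on H(x) = 0.5 * ||F(x)||^2.
# For affine F, grad H(x) = A^T F(x), which is L^2-Lipschitz.
y = rng.standard_normal(d)
for _ in range(2000):
    y = y - (1.0 / L ** 2) * (A.T @ F(y))

print("OG  ||F(x)||:", np.linalg.norm(F(x)))
print("HGM ||F(y)||:", np.linalg.norm(F(y)))
```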

These distinctions explain the superior stability and convergence robustness of EG compared to OG and HGM in monotone settings. The lack of full cocoercivity in general also precludes interpreting EG as a simple gradient descent on a "proximal-point" surrogate.

5. Practical Implications and Guidance

  • Step-size selection: The analysis establishes that in monotone $L$-Lipschitz settings, taking $\eta \approx 1/(2L)$ is both safe and rate-optimal for the decay of $\|F(x_k)\|^2$ (see the sketch after this list).
  • Best-iterate vs. last-iterate: Previous results for EG relied on averaging or random selection among iterates to obtain $O(1/K)$ rates; the present result guarantees this rate for the natural last iterate, aligning the theoretical guarantees with how the algorithm is typically used.
  • Connecting with empirical tuning: The results suggest that the classical tuning heuristic for EG (a step size inversely proportional to the Lipschitz constant) is theoretically justified, and that no additional smoothness parameters are needed for robust last-iterate rates.
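
As a practical illustration of these points (a heuristic sketch, not part of the referenced analysis), one can pair the $\eta = 1/(2L)$ rule with a safeguard based on the non-increase of $\|F(x_k)\|$ established in Section 2: if the operator norm ever increases, the Lipschitz estimate was evidently too small and the step size is halved. The function name and the halving rule are illustrative choices:

```python
import numpy as np

def extragradient_adaptive(F, x0, L_hat, K=1000):
    """Extragradient with eta = 1/(2*L_hat) plus a heuristic safeguard:
    the analysis guarantees ||F(x_{k+1})|| <= ||F(x_k)|| when eta <= 1/(2L),
    so an observed increase signals that L_hat underestimates L, and eta is halved."""
    eta = 1.0 / (2.0 * L_hat)
    x = np.asarray(x0, dtype=float)
    norm_prev = np.linalg.norm(F(x))
    for _ in range(K):
        x_half = x - eta * F(x)
        x_next = x - eta * F(x_half)
        norm_next = np.linalg.norm(F(x_next))
        if norm_next > norm_prev * (1 + 1e-12):   # monotone-norm property violated
            eta *= 0.5                            # step too large: shrink and retry
            continue
        x, norm_prev = x_next, norm_next
    return x, eta

# Example usage on a small monotone affine problem (illustrative):
rng = np.random.default_rng(4)
S = rng.standard_normal((8, 8)); S = S - S.T
P = rng.standard_normal((8, 8)); P = 0.1 * (P @ P.T)
M = S + P
F = lambda x: M @ x
x_fin, eta_fin = extragradient_adaptive(F, np.ones(8),
                                         L_hat=0.25 * np.linalg.norm(M, 2), K=2000)
print("final ||F(x)||:", np.linalg.norm(F(x_fin)), " final eta:", eta_fin)
```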

6. Impact and Theoretical Significance

The last-iterate $O(1/K)$ guarantee for EG under monotonicity and Lipschitz continuity (Gorbunov et al., 2021) addresses a persistent theoretical gap, aligning the method's strong empirical performance with rigorous convergence rates. The sharp dichotomy with cocoercivity structures further clarifies why EG, not OG or naive gradient-based methods, exhibits stability and reliable convergence in complex monotone game-theoretic and saddle-point formulations. By demonstrating that no extra smoothness beyond $L$-Lipschitz continuity is necessary, this work provides a definitive convergence characterization for EG and guides the tuning and comparison of modern extragradient-type algorithms in practical large-scale optimization and machine learning applications.
