Last-Iterate Convergence in Optimization

Updated 4 July 2026

Last-iterate convergence is the property that an algorithm's actual iterate, without any averaging, converges to a solution or equilibrium, or drives a residual to zero.
It is analyzed in contexts including monotone variational inequalities, convex-concave saddle-point problems, and game-theoretic learning, with rates such as O(1/N) under suitable conditions.
Recent studies extend these results to stochastic optimization, momentum methods, and mean field games, emphasizing their practical impact on algorithm design and performance guarantees.

Last-iterate convergence is the property that the actual iterate produced by an algorithm at time $t$, rather than an average, a best iterate, or a randomly selected output, converges to a solution or equilibrium, or drives a residual to zero. In modern optimization and game-theoretic learning, this notion describes the behavior of the deployed decision $x_t$, $z_t$, or $p_t$ itself. The literature treats it as substantially stronger than ergodic convergence because many classical no-regret and saddle-point methods converge only after averaging, while the last iterate may oscillate, diverge, or converge at a slower rate (Gorbunov et al., 2022, Lee et al., 2021, Golowich et al., 2020).

1. Concept, criteria, and residuals

The meaning of last-iterate convergence depends on the ambient problem class. In monotone variational inequalities, the basic problem is to find $x^\star\in X$ such that

$\langle F(x^\star),x-x^\star\rangle\ge 0,\qquad \forall x\in X,$

with the unconstrained case $X=\mathbb{R}^d$ reducing to $F(x^\star)=0$. In that setting, the natural last-iterate residual is problem-dependent: for unconstrained problems it is $\|F(x_N)\|^2$, while in constrained problems it may be a step or projection residual such as $\|x_N-x_{N-1}\|^2$ (Gorbunov et al., 2022).

In saddle-point and game formulations, the last iterate is often measured by a gap function, Euclidean distance to the equilibrium set, or a Bregman divergence. In extensive-form games, the literature uses quantities such as the squared distance $\operatorname{dist}(z_t,\mathcal Z^\star)^2$ or the divergence $\mathrm{KL}(z^\star,z_t)$. Here $z_t=(x_t,y_t)$ denotes the actual strategy pair at time $t$ and $\mathcal Z^\star$ the set of Nash equilibria (Lee et al., 2021). In regularized or mirror-descent settings, KL and generalized KL divergences are especially common because they align with simplex and treeplex geometry.

In stochastic bandits, last-iterate quality is identified with simple regret rather than cumulative regret. Consider a sampling distribution $p_t$ over $K$ arms with mean rewards $\mu_1,\dots,\mu_K$, optimal mean $\mu^\star=\max_i\mu_i$, and gaps $\Delta_i=\mu^\star-\mu_i$. The relevant quantity is

$r_t=\mu^\star-\langle p_t,\mu\rangle=\sum_{i=1}^{K}p_{t,i}\Delta_i,$

so the issue is whether the current distribution $p_t$ concentrates on the optimal arm (Zhan et al., 26 Oct 2025). In stochastic and composite optimization, the criterion is typically one of three quantities: an objective gap $F(x_T)-F(x^\star)$, a gradient norm $\|\nabla F(x_T)\|$, or a Bregman divergence generated by a mirror map (Liu et al., 2023).
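To make the distinction concrete, the Python sketch below computes two quantities. The first is the simple regret $r_t=\sum_i p_{t,i}\Delta_i$ of one sampling distribution, with gaps $\Delta_i=\mu^\star-\mu_i$. The second is the pseudo-regret accumulated by a whole trajectory of distributions. The arm means and distributions are made-up illustrative values, rewards (not losses) are assumed, and the function names are hypothetical.

```python
from typing import Sequence


def simple_regret(p: Sequence[float], mu: Sequence[float]) -> float:
    """Simple regret of one sampling distribution: mu_star - <p, mu>."""
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be a probability vector")
    mu_star = max(mu)
    return sum(p_i * (mu_star - mu_i) for p_i, mu_i in zip(p, mu))


def cumulative_regret(ps: Sequence[Sequence[float]], mu: Sequence[float]) -> float:
    """Pseudo-regret of a trajectory: the sum of its per-round simple regrets."""
    return sum(simple_regret(p, mu) for p in ps)


mu = [0.9, 0.5, 0.2]  # illustrative arm means; arm 0 is optimal
trajectory = [[1 / 3, 1 / 3, 1 / 3], [0.6, 0.3, 0.1], [0.9, 0.08, 0.02]]
print("last-iterate simple regret:", simple_regret(trajectory[-1], mu))
print("cumulative regret:         ", cumulative_regret(trajectory, mu))
```

A regret bound controls the second number. A last-iterate guarantee controls the first.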

A persistent theme is that last-iterate convergence is not merely an alternative norm of analysis. It changes what counts as success. In counterfactual-regret-based game solving, for example, CFR-style guarantees are fundamentally ergodic, and the last iterate can diverge even when the average converges (Lee et al., 2021). This distinction recurs across optimization, online learning, and reinforcement-learning-adjacent models.

2. Monotone variational inequalities and convex-concave saddle problems

A central deterministic result is the last-iterate theory for the Past Extragradient method, also known as Optimistic Gradient, on monotone Lipschitz variational inequalities. For

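$\tilde x_k=x_k-\gamma F(\tilde x_{k-1}),\qquad x_{k+1}=x_k-\gamma F(\tilde x_k),$

the unconstrained recursion is equivalent to

$\tilde x_{k+1}=\tilde x_k-2\gamma F(\tilde x_k)+\gamma F(\tilde x_{k-1}).$

Under monotonicity and $L$-Lipschitz continuity alone, the method satisfies an explicit last-iterate rate. In the unconstrained case, for a constant stepsize $0<\gamma\le 1/(2L)$,

$\|F(x_N)\|^2\le\frac{C\,\|x_0-x^\star\|^2}{\gamma^2N}$

for an absolute constant $C$, so $\|F(x_N)\|^2=O(1/N)$. In the constrained case, for projected PEG with a sufficiently small constant stepsize of order $1/L$, the residual $\|x_N-x_{N-1}\|^2$ also decays as $O(1/N)$. The analysis proceeds by explicit decreasing potentials rather than by comparing the last iterate to a best iterate, and it removes the Lipschitz Jacobian assumption used in earlier unconstrained work (Gorbunov et al., 2022).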
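A minimal numerical sketch follows. It uses the toy operator $F(x,y)=(y,-x)$ of the bilinear game $\min_x\max_y xy$, which is monotone and 1-Lipschitz, and the stepsize $\gamma=1/(4L)$. It runs PEG in the form $\tilde x_k=x_k-\gamma F(\tilde x_{k-1})$, $x_{k+1}=x_k-\gamma F(\tilde x_k)$. For contrast it also runs plain simultaneous gradient descent-ascent, whose iterates spiral outward on this problem. The operator, initialization, and function names are illustrative choices.

```python
from typing import Tuple

Vec = Tuple[float, float]


def F(z: Vec) -> Vec:
    """Operator of the bilinear game min_x max_y x*y: monotone and 1-Lipschitz."""
    x, y = z
    return (y, -x)


def sq_norm(v: Vec) -> float:
    return v[0] ** 2 + v[1] ** 2


def peg(z0: Vec, gamma: float, n_steps: int) -> Vec:
    """Past Extragradient; returns the last iterate x_N."""
    x = z0
    f_prev = F(z0)  # assumed initialization: x~_{-1} = x_0
    for _ in range(n_steps):
        x_tilde = (x[0] - gamma * f_prev[0], x[1] - gamma * f_prev[1])
        f_prev = F(x_tilde)  # the only new operator call in this iteration
        x = (x[0] - gamma * f_prev[0], x[1] - gamma * f_prev[1])
    return x


def gda(z0: Vec, gamma: float, n_steps: int) -> Vec:
    """Simultaneous gradient descent-ascent, for comparison."""
    x = z0
    for _ in range(n_steps):
        f = F(x)
        x = (x[0] - gamma * f[0], x[1] - gamma * f[1])
    return x


gamma = 1.0 / (4 * 1.0)  # gamma = 1/(4L) with L = 1
print("PEG ||F(x_N)||^2:", sq_norm(F(peg((1.0, 1.0), gamma, 200))))
print("GDA ||F(x_N)||^2:", sq_norm(F(gda((1.0, 1.0), gamma, 200))))
```

On this nondegenerate bilinear instance the PEG residual actually decays geometrically, which is faster than the worst-case $O(1/N)$ guarantee. GDA's squared norm grows by the factor $1+\gamma^2$ at every step.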

The extragradient method exhibits a different last-iterate profile in smooth convex-concave saddle-point problems. For

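$\min_{x}\max_{y}f(x,y),\qquad z_{t+1/2}=z_t-\eta F(z_t),\quad z_{t+1}=z_t-\eta F(z_{t+1/2}),\quad F(z)=\big(\nabla_x f(x,y),-\nabla_y f(x,y)\big),$

the ergodic primal-dual gap is known to converge at rate $O(1/T)$, but the last iterate is provably slower. The upper bound

$\operatorname{Gap}_f(z_T)=O\!\left(1/\sqrt{T}\right)$

is matched by a lower bound of $\Omega(1/\sqrt{T})$ for 1-stationary canonical linear iterative (1-SCLI) methods. This yields a quadratic separation between averaged and last-iterate behavior in smooth convex-concave saddle-point problems (Golowich et al., 2020). The separation is one of the clearest formal demonstrations that last-iterate analysis is not a cosmetic refinement of ergodic theory.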
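Translating both rates into iteration counts makes the quadratic separation concrete. The back-of-the-envelope arithmetic below ignores constants beyond naming them. It writes $\bar z_T$ for the averaged iterate and $z_T$ for the last iterate, and asks for a gap of at most $\epsilon$:

$\operatorname{Gap}(\bar z_T)\le\frac{C_1}{T}\le\epsilon\ \text{ once }\ T\ge\frac{C_1}{\epsilon},\qquad \operatorname{Gap}(z_T)\ge\frac{c_2}{\sqrt{T}}\ \text{ forces }\ T\ge\frac{c_2^{2}}{\epsilon^{2}}\ \text{ before }\ \operatorname{Gap}(z_T)\le\epsilon.$

Here $C_1,c_2>0$ are problem-dependent constants, and the second inequality is the worst-case lower bound for 1-SCLI methods. For instance, with both constants equal to one and $\epsilon=10^{-3}$, the averaged iterate needs about $10^{3}$ iterations, while the last iterate needs at least $10^{6}$.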

Other saddle-point methods obtain stronger rates under stronger structure. Hamiltonian Gradient Descent minimizes

$\mathcal H(z)=\tfrac12\|\xi(z)\|^2=\tfrac12\|\nabla_x f(x,y)\|^2+\tfrac12\|\nabla_y f(x,y)\|^2,$

with $\xi(z)=\big(\nabla_x f(x,y),-\nabla_y f(x,y)\big)$, using gradient steps $z_{k+1}=z_k-\eta\nabla\mathcal H(z_k)$. It achieves linear convergence when $\mathcal H$ satisfies a Polyak–Łojasiewicz condition induced by spectral bounds on the Jacobian $\nabla\xi$. The paper establishes such rates not only in strongly convex-strongly concave problems but also under a “sufficiently bilinear” condition (Abernethy et al., 2019). In constrained simplex min-max problems, Optimistic Multiplicative Weights Update is shown to converge locally near a strict KKT equilibrium for sufficiently small stepsize. The proof is a Jacobian spectral-radius argument on a lifted dynamical system (Lei et al., 2020).
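For the Hamiltonian route, the PL mechanism can be checked in one line. Write $\xi(z)=(\nabla_x f,-\nabla_y f)$ and $\mathcal H=\tfrac12\|\xi\|^2$. As a simplified stand-in for the paper's spectral conditions, assume the Jacobian $J(z)=\nabla\xi(z)$ is square with smallest singular value at least $\sigma>0$. Then

$\|\nabla\mathcal H(z)\|^2=\|J(z)^\top\xi(z)\|^2\ge\sigma^2\|\xi(z)\|^2=2\sigma^2\,\mathcal H(z),$

which is a PL inequality with constant $\sigma^2$. Assume further that $\xi$ has a zero, so that $\min\mathcal H=0$. Gradient descent on $\mathcal H$ with stepsize $1/\beta$, where $\beta$ is the smoothness constant of $\mathcal H$, then gives $\mathcal H(z_k)\le(1-\sigma^2/\beta)^k\,\mathcal H(z_0)$.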

Taken together, these results delineate three distinct mechanisms for last-iterate convergence in deterministic saddle structures: direct Lyapunov descent, spectral stability of the update map, and PL-type descent on an auxiliary Hamiltonian.

3. Games, regret minimization, and sequence-form dynamics

In two-player zero-sum extensive-form games with perfect recall, last-iterate convergence has been developed through optimistic mirror-descent methods over treeplexes. The sequence-form saddle problem

$\min_{x\in\mathcal X}\max_{y\in\mathcal Y}x^\top G y,$

Here $\mathcal X$ and $\mathcal Y$ are the players' treeplexes and $G$ is the sequence-form payoff matrix. This problem admits several optimistic algorithms:

- VOGDA, based on the squared Euclidean regularizer, enjoys linear or exponential last-iterate convergence.
- VOMWU, based on the vanilla entropy regularizer, converges at a sublinear rate under uniqueness of the Nash equilibrium.
- DOMWU, based on the dilated entropy regularizer, again achieves linear or exponential last-iterate convergence.
- DOGDA is shown only to converge asymptotically, without an explicit rate.

The same work sharply contrasts these global optimistic methods with CFR and CFR+, whose guarantees are ergodic and whose last iterates may diverge, even in rock-paper-scissors (Lee et al., 2021).
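The contrast between optimistic and non-optimistic updates already appears on a single simplex, which is the treeplex of a game with one decision point. The sketch below runs two dynamics on rock-paper-scissors with stepsize $\eta=0.1$ from an arbitrary non-equilibrium start. The first is simultaneous MWU, a stand-in for non-optimistic no-regret dynamics (it is not CFR). The second is OMWU in the single-call form $x_{t+1}\propto x_t\exp(-\eta(2g_t-g_{t-1}))$. Function names and parameter values are illustrative.

```python
import math
from typing import List, Sequence, Tuple

# Rock-paper-scissors as a loss matrix for the minimizing row player (loss x^T A y).
A = [[0.0, 1.0, -1.0],
     [-1.0, 0.0, 1.0],
     [1.0, -1.0, 0.0]]


def row_grad(y: Sequence[float]) -> List[float]:
    """Gradient of x^T A y in x, i.e. A y."""
    return [sum(a * v for a, v in zip(row, y)) for row in A]


def col_grad(x: Sequence[float]) -> List[float]:
    """Loss gradient for the maximizing column player, i.e. -(A^T x)."""
    return [-sum(A[i][j] * x[i] for i in range(3)) for j in range(3)]


def mw_step(p: Sequence[float], d: Sequence[float], eta: float) -> List[float]:
    """Multiplicative-weights step p_i <- p_i * exp(-eta * d_i), renormalized."""
    w = [p_i * math.exp(-eta * d_i) for p_i, d_i in zip(p, d)]
    s = sum(w)
    return [w_i / s for w_i in w]


def run(x, y, eta: float, n_steps: int, optimistic: bool) -> Tuple[List[float], List[float]]:
    gx_prev, gy_prev = row_grad(y), col_grad(x)  # first step is a plain MWU step
    for _ in range(n_steps):
        gx, gy = row_grad(y), col_grad(x)
        if optimistic:  # OMWU: use the prediction 2 g_t - g_{t-1}
            dx = [2 * a - b for a, b in zip(gx, gx_prev)]
            dy = [2 * a - b for a, b in zip(gy, gy_prev)]
        else:
            dx, dy = gx, gy
        x, y = mw_step(x, dx, eta), mw_step(y, dy, eta)
        gx_prev, gy_prev = gx, gy
    return x, y


def dist_to_equilibrium(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance of (x, y) to the unique equilibrium (uniform, uniform)."""
    return math.sqrt(sum((v - 1 / 3) ** 2 for v in list(x) + list(y)))


x0, y0 = [0.5, 0.25, 0.25], [0.25, 0.5, 0.25]
for optimistic in (False, True):
    x, y = run(x0, y0, eta=0.1, n_steps=5000, optimistic=optimistic)
    print("OMWU" if optimistic else "MWU ", dist_to_equilibrium(x, y))
```

The last OMWU iterate ends close to the uniform equilibrium, while the last MWU iterate stays far from it even though its time average would approach equilibrium.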

Regret-matching dynamics present a different obstacle: the underlying regret operator is neither monotone nor pseudo-monotone. Numerical evidence shows that RM$^+$, alternating RM$^+$, and predictive RM$^+$ lack last-iterate convergence guarantees even on a simple $3\times 3$ matrix game. Positive results are obtained for extragradient-style smoothed variants, ExRM$^+$ and SPRM$^+$:

- they converge asymptotically in the last iterate;
- they achieve $O(1/\sqrt{T})$ best-iterate duality-gap guarantees;
- they attain linear last-iterate convergence when combined with restarting.

The analysis is built on a Minty condition and a geometric characterization of limit points rather than standard monotonicity arguments (Cai et al., 2023).
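For reference, here is a sketch of the basic RM$^+$ update that these variants modify. It assumes that one player observes a utility (reward) vector $u_t$ against the opponents' current play. ExRM$^+$ and SPRM$^+$ operate on the same kind of nonnegative cumulative-regret vector, but add extrapolation or prediction steps. The function name and signature are illustrative.

```python
from typing import List, Sequence, Tuple


def rm_plus_step(q: Sequence[float], x: Sequence[float],
                 u: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One Regret Matching+ step for a single player.

    q: nonnegative cumulative-regret vector, x: current strategy,
    u: utility of each action against the opponents' current play.
    Returns the updated (q, x).
    """
    expected = sum(x_i * u_i for x_i, u_i in zip(x, u))
    # Add each action's instantaneous regret, then truncate at zero (the "+").
    q_new = [max(0.0, q_i + u_i - expected) for q_i, u_i in zip(q, u)]
    total = sum(q_new)
    if total > 0.0:
        x_new = [q_i / total for q_i in q_new]
    else:
        x_new = [1.0 / len(q_new)] * len(q_new)  # no positive regret: play uniformly
    return q_new, x_new


q, x = rm_plus_step([0.0, 0.0, 0.0], [1 / 3, 1 / 3, 1 / 3], u=[1.0, 0.0, -1.0])
print(q, x)  # all mass moves to the action with positive regret
```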

The scope of last-iterate theory extends beyond zero-sum normal-form games. For optimistic mirror descent in classes satisfying a nonnegative sum of regrets condition, including constant-sum polymatrix and strategically zero-sum games, the dynamics have bounded second-order path length,

$\sum_{t=1}^{T}\sum_{i}\big\|x_i^{(t)}-x_i^{(t-1)}\big\|^2=O(1),$

with a bound independent of $T$. This yields $O(1)$ individual regret and convergence to an $\epsilon$-approximate Nash equilibrium within $O(1/\epsilon^2)$ iterations. The same framework also covers potential and near-potential games. For smooth games it gives an alternative guarantee: the dynamics are either close to Nash or better than the robust price of anarchy (Anagnostides et al., 2022).
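The step from bounded path length to an iteration count is a pigeonhole argument. Let $C$ be the $T$-independent bound on the second-order path length. Then at least one round $t\le T$ satisfies

$\sum_{i}\|x_i^{(t)}-x_i^{(t-1)}\|^2\le\frac{1}{T}\sum_{s=1}^{T}\sum_{i}\|x_i^{(s)}-x_i^{(s-1)}\|^2\le\frac{C}{T}.$

Standard optimistic-mirror-descent analyses with a fixed stepsize show that an iterate whose movement is $\delta$ is an $O(\delta)$-approximate Nash equilibrium. Assuming that property, this round is an $O(\sqrt{C/T})$-approximate equilibrium. Hence $T=O(1/\epsilon^{2})$ rounds suffice for accuracy $\epsilon$.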

A distinct line of work concerns games in which utilities are linear in each player's own strategy and in the joint strategy of the opponents. In such games, average-iterate convergence can be converted into last-iterate convergence by a black-box transformation. The reduction constructs a new uncoupled dynamics whose played strategy at time $t$ is exactly the running average of an internal online learner. Applied to OMWU in zero-sum polymatrix games, it yields two rates, where $d$ is the maximum number of actions available to a player (Cai et al., 4 Jun 2025):

- an $O(\log d/T)$ last-iterate convergence rate under gradient feedback;
- an $\tilde O(d^{1/5}T^{-1/5})$ rate under bandit feedback.

This shows that in some game classes the gap between average and last iterates can be removed algorithmically rather than analytically.
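The linearity assumption is what makes such a reduction possible. Suppose an opponent plays the running average $\bar y_t=\frac1t\sum_{s\le t}\hat y_s$ of internal iterates $\hat y_s$, and the gradient is linear in the opponent's strategy. Then $A\hat y_t=t\,A\bar y_t-(t-1)\,A\bar y_{t-1}$, so the gradient at the opponent's internal iterate can be recovered from gradients observed at played strategies. The sketch below checks this identity numerically. It illustrates the mechanism under that assumption and is not claimed to be the paper's exact construction. The matrix, iterates, and function names are hypothetical.

```python
import random
from typing import List, Sequence


def mat_vec(M: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def running_averages(points: Sequence[Sequence[float]]) -> List[List[float]]:
    """Played strategies: bar_y_t = (1/t) * sum_{s<=t} hat_y_s for t = 1, 2, ..."""
    avgs, total = [], [0.0] * len(points[0])
    for t, p in enumerate(points, start=1):
        total = [a + b for a, b in zip(total, p)]
        avgs.append([a / t for a in total])
    return avgs


def recovered_gradient(M, avgs: Sequence[Sequence[float]], t: int) -> List[float]:
    """M @ hat_y_t computed only from played averages: t*M bar_y_t - (t-1)*M bar_y_{t-1}."""
    cur = mat_vec(M, avgs[t - 1])
    if t == 1:
        return cur
    prev = mat_vec(M, avgs[t - 2])
    return [t * c - (t - 1) * p for c, p in zip(cur, prev)]


rng = random.Random(0)
M = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(2)]  # made-up payoffs
internal = []  # hypothetical internal-learner iterates hat_y_1, ..., hat_y_10
for _ in range(10):
    w = [rng.random() for _ in range(3)]
    s = sum(w)
    internal.append([v / s for v in w])
played = running_averages(internal)
exact = mat_vec(M, internal[-1])
err = max(abs(a - b) for a, b in zip(recovered_gradient(M, played, 10), exact))
print(err)  # zero up to floating-point rounding
```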

4. Time-varying games and partial-information separations

The static-game intuition that extragradient and optimistic methods behave similarly fails in time-varying environments. In unconstrained periodic bilinear zero-sum games,

$\min_{x}\max_{y}\;x^\top A_t y,\qquad A_{t+P}=A_t\ \text{for all } t,$

extra-gradient converges to the common Nash equilibrium shared by the stage games, under suitable conditions on the stepsize and the stage matrices. By contrast, OGDA and negative momentum diverge exponentially on an explicit period-2 example. The paper also studies convergent perturbed games $A_t=A+B_t$ with $B_t\to 0$. There, all three methods converge if the perturbation vanishes faster than $1/t$, and extra-gradient still converges under a weaker decay assumption on $B_t$ (Feng et al., 2023).
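The separation is easy to reproduce on a toy instance. The sketch below uses the scalar period-2 game $f_t(x,y)=a_t\,xy$ with $a_t$ alternating between $+1$ and $-1$, so $(0,0)$ is a common equilibrium, and stepsize $\eta=0.1$. This instance is an illustrative choice, not necessarily the example used in the paper, and both extra-gradient half-steps are assumed to use the current stage game. A short calculation on this instance gives two facts:

- each extra-gradient step shrinks the norm by exactly $\sqrt{1-\eta^2+\eta^4}<1$;
- the two-step OGDA map has spectral radius about $1.03$.

```python
from typing import Tuple

Vec = Tuple[float, float]


def F(t: int, z: Vec) -> Vec:
    """Operator of the stage game f_t(x, y) = a_t * x * y, with a_t = +1, -1, +1, ..."""
    a = 1.0 if t % 2 == 0 else -1.0
    x, y = z
    return (a * y, -a * x)


def norm(z: Vec) -> float:
    return (z[0] ** 2 + z[1] ** 2) ** 0.5


def extragradient(z: Vec, eta: float, n_steps: int) -> Vec:
    for t in range(n_steps):
        g = F(t, z)
        half = (z[0] - eta * g[0], z[1] - eta * g[1])  # extrapolation step
        g_half = F(t, half)                              # same stage game t
        z = (z[0] - eta * g_half[0], z[1] - eta * g_half[1])
    return z


def ogda(z: Vec, eta: float, n_steps: int) -> Vec:
    g_prev = F(0, z)  # assumed initialization: the first step is plain GDA
    for t in range(n_steps):
        g = F(t, z)
        z = (z[0] - eta * (2 * g[0] - g_prev[0]), z[1] - eta * (2 * g[1] - g_prev[1]))
        g_prev = g
    return z


print("EG   distance to (0, 0):", norm(extragradient((1.0, 1.0), 0.1, 1000)))
print("OGDA distance to (0, 0):", norm(ogda((1.0, 1.0), 0.1, 1000)))
```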

The same separation persists in constrained periodic games on simplices. Consider a 2-periodic zero-sum game with a common fully mixed equilibrium. There Optimistic MWU fails dramatically: from any initial condition not exactly at equilibrium, its iterates approach the boundary of the strategy space instead of the equilibrium.

By contrast, Extra-gradient MWU converges globally to the unique common fully mixed equilibrium whenever the stepsize is sufficiently small.

The constrained analysis shows that periodic variation can create attracting boundary fixed points for optimism that are not equilibria, while KL descent remains available for the extragradient method (Feng et al., 2024).

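In stochastic bandits, last-iterate analysis has recently been developed for Follow-the-Regularized-Leader. Consider $\tfrac12$-Tsallis-INF:

$p_t=\arg\min_{p\in\Delta_K}\Big\{\langle p,\hat L_{t-1}\rangle+\frac{\psi(p)}{\eta_t}\Big\},\qquad \psi(p)=-2\sum_{i=1}^{K}\sqrt{p_i},\qquad \eta_t=\Theta(1/\sqrt{t}),$

Here $\hat L_{t-1}$ is the vector of importance-weighted cumulative loss estimates. For this method, the expected Bregman divergence $\mathbb E[D_\psi(e_{i^\star},p_t)]$ between the optimal point mass $e_{i^\star}$ and the current distribution admits an explicit upper bound that vanishes as $t\to\infty$.

This is presented as the first last-iterate convergence result for FTRL in stochastic bandits. The same work shows that the bound implies an explicit decay of $\mathbb E[p_{t,i}]$ for each suboptimal arm $i$, and hence vanishing expected simple regret. It does not, however, prove the heuristic $O(1/t)$ simple-regret rate suggested by logarithmic regret. That heuristic runs as follows: if cumulative regret grows like $\log t$, the per-round simple regret would be expected to behave like $1/t$ (Zhan et al., 26 Oct 2025).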
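The FTRL step itself has a one-dimensional structure. Assume the regularizer $-\frac{2}{\eta}\sum_i\sqrt{p_i}$ over the simplex. The first-order conditions then give $p_i=\big(\eta(\hat L_i+\lambda)\big)^{-2}$, where the multiplier $\lambda$ is chosen so that the probabilities sum to one. A bisection sketch follows. The loss estimates and stepsizes in the usage lines are made-up values, and the importance-weighted estimation of $\hat L$ is omitted.

```python
from typing import List, Sequence


def tsallis_inf_distribution(cum_loss: Sequence[float], eta: float,
                             iters: int = 200) -> List[float]:
    """FTRL step with regularizer -(2/eta) * sum_i sqrt(p_i) over the simplex.

    First-order conditions give p_i = 1 / (eta * (L_i + lam))**2; the multiplier
    lam is found by bisection so that the probabilities sum to one.
    """
    k = len(cum_loss)
    l_min = min(cum_loss)

    def total(lam: float) -> float:
        return sum(1.0 / (eta * (loss + lam)) ** 2 for loss in cum_loss)

    # At lam = -l_min + 1/eta the best arm alone has mass 1, so total(lo) >= 1;
    # at lam = -l_min + sqrt(k)/eta every term is <= 1/k, so total(hi) <= 1.
    lo, hi = -l_min + 1.0 / eta, -l_min + k ** 0.5 / eta
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:  # total is decreasing in lam
            lo = mid
        else:
            hi = mid
    p = [1.0 / (eta * (loss + hi)) ** 2 for loss in cum_loss]
    s = sum(p)
    return [v / s for v in p]  # absorb the remaining bisection error


print(tsallis_inf_distribution([0.0, 5.0, 5.0], eta=1.0))  # mass concentrates on arm 0
print(tsallis_inf_distribution([0.0, 5.0, 5.0], eta=0.1))  # smaller eta: more exploration
```

A larger product of the stepsize and the loss-estimate gap pushes mass onto the empirically best arm. This is how the current distribution can concentrate as $t$ grows.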

These results establish two cautions. First, last-iterate convergence in static games does not transfer automatically to time-varying games. Second, strong cumulative-regret guarantees do not by themselves determine the asymptotic rate of the current iterate.

5. Stochastic optimization, momentum, and finite-sum methods

For composite stochastic mirror descent on

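$\min_{x\in\mathcal X}F(x)=f(x)+h(x),$

where $f$ is convex and accessed through stochastic (sub)gradients and $h$ is a simple convex function, a unified last-iterate framework now covers general domains, composite objectives, non-Euclidean norms, smoothness, strong convexity, and high-probability analysis. The method

$x_{t+1}=\arg\min_{x\in\mathcal X}\Big\{\eta_t\big(\langle g_t,x\rangle+h(x)\big)+D_\psi(x,x_t)\Big\},\qquad \mathbb E[g_t\mid x_t]\in\partial f(x_t),$

admits expected and sub-Gaussian high-probability last-iterate bounds without compactness assumptions or almost surely bounded noise. The rates are as follows, and the same framework is extended to heavy-tailed and sub-Weibull noise (Liu et al., 2023):

- in general convex settings, of order $1/\sqrt{T}$ up to logarithmic factors, with improved logarithmic dependence in the nonsmooth known-horizon case;
- in strongly convex settings, of order $1/T$ up to logarithmic factors.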
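Take the Euclidean special case $D_\psi(x,x_t)=\tfrac12\|x-x_t\|^2$ with $h=\lambda|\cdot|$. The composite mirror-descent update is then a stochastic proximal gradient step with soft-thresholding. The sketch runs it in one dimension on the made-up objective $F(x)=\mathbb{E}\,\tfrac12(x-\xi)^2+\lambda|x|$, with $\xi\sim\mathcal N(2,1)$ and $\lambda=0.5$. This objective is 1-strongly convex with minimizer $x^\star=1.5$. The sketch uses stepsizes $\eta_t=1/t$ and returns the last iterate without averaging.

```python
import random


def soft_threshold(v: float, tau: float) -> float:
    """Proximal operator of tau * |x|."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def prox_sgd_last_iterate(n_steps: int, lam: float = 0.5, mean: float = 2.0,
                          seed: int = 0) -> float:
    rng = random.Random(seed)
    x = 0.0
    for t in range(1, n_steps + 1):
        xi = rng.gauss(mean, 1.0)   # one noisy sample
        g = x - xi                  # unbiased gradient of E[(x - xi)^2] / 2
        eta = 1.0 / t               # 1/(mu t) with strong-convexity constant mu = 1
        x = soft_threshold(x - eta * g, eta * lam)
    return x                        # last iterate, no averaging


print(prox_sgd_last_iterate(10_000))  # close to x* = 1.5
```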

Momentum methods display both positive and negative last-iterate phenomena. In nonconvex stochastic optimization for neural networks, the SUM framework

$y_{t+1}=x_t-\alpha_t g_t,\qquad y^s_{t+1}=x_t-s\,\alpha_t g_t,\qquad x_{t+1}=y_{t+1}+\beta\big(y^s_{t+1}-y^s_t\big),$

Here $g_t$ is a stochastic gradient at $x_t$. The framework covers stochastic heavy ball ($s=0$) and stochastic Nesterov momentum ($s=1$). Under standard smoothness, bounded second moments, and Robbins–Monro stepsizes, it yields

$\nabla f(x_t)\to 0$

for the last iterate, with constant momentum $\beta\in[0,1)$ (Xu et al., 2022). By contrast, in convex stochastic optimization, standard SGDM with constant momentum is provably suboptimal. There exists a Lipschitz convex function on which its last iterate suffers the lower bound $\Omega(\log T/\sqrt{T})$. An FTRL-based SGDM with increasing momentum and shrinking updates restores the optimal $O(1/\sqrt{T})$ last-iterate rate without projections onto bounded domains (Li et al., 2021).
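Below is a sketch of a unified-momentum recursion of this kind. It assumes the form $y_{t+1}=x_t-\alpha_t g_t$, $y^s_{t+1}=x_t-s\,\alpha_t g_t$, $x_{t+1}=y_{t+1}+\beta(y^s_{t+1}-y^s_t)$ with initialization $y^s_0=x_0$ (our choice). It runs on the illustrative objective $f(x)=\tfrac12x^2$ with Gaussian gradient noise and Robbins–Monro stepsizes $\alpha_t=0.5/(t+1)^{0.75}$. Setting $s=0$ gives stochastic heavy ball and $s=1$ stochastic Nesterov momentum. The function name and constants are hypothetical.

```python
import random


def sum_momentum(s: float, beta: float = 0.5, n_steps: int = 5000,
                 noise: float = 0.1, seed: int = 0) -> float:
    """Unified stochastic momentum on f(x) = x^2 / 2; returns the last iterate."""
    rng = random.Random(seed)
    x = 1.0
    ys_prev = x  # y^s_0 := x_0 (assumed initialization)
    for t in range(n_steps):
        alpha = 0.5 / (t + 1) ** 0.75          # sum alpha = inf, sum alpha^2 < inf
        g = x + noise * rng.gauss(0.0, 1.0)    # noisy gradient of x^2 / 2
        y_next = x - alpha * g
        ys_next = x - s * alpha * g
        x = y_next + beta * (ys_next - ys_prev)
        ys_prev = ys_next
    return x


print("heavy ball (s=0):", sum_momentum(s=0.0))
print("Nesterov   (s=1):", sum_momentum(s=1.0))
```

With $s=0$ the recursion reduces to $x_{t+1}=x_t-\alpha_t g_t+\beta(x_t-x_{t-1})$. With $s=1$ it reduces to $x_{t+1}=y_{t+1}+\beta(y_{t+1}-y_t)$.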

Least-squares regression in the interpolation regime is an archetypal setting in which constant-stepsize SGD has explicit final-iterate guarantees despite the absence of strong convexity. The last iterate admits explicit non-asymptotic rates that sharpen as the spectral assumptions on the feature covariance are refined. Under capacity and source conditions it attains polynomial rates faster than $O(1/T)$ (Varre et al., 2021). This is a representative example in which interpolation removes the persistent variance floor that normally obstructs last-iterate analysis.
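The following is a toy version of the interpolation regime. It assumes three-dimensional features drawn uniformly from $[-1,1]^3$, noiseless labels $y=\langle\theta^\star,x\rangle$ from a made-up $\theta^\star$, and constant stepsize $\gamma=0.2$. Because the labels are exactly realizable, every stochastic gradient vanishes at $\theta^\star$. There is therefore no variance floor, and the last iterate itself converges. In this finite-dimensional example the feature covariance is full rank, so the problem is in fact strongly convex. It illustrates the absence of a noise floor, not the non-strongly-convex regime studied in the paper.

```python
import random


def sgd_interpolation(n_steps: int = 2000, gamma: float = 0.2, seed: int = 0):
    rng = random.Random(seed)
    theta_star = [1.0, -2.0, 0.5]   # made-up ground truth
    theta = [0.0, 0.0, 0.0]
    for _ in range(n_steps):
        x = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        y = sum(a * b for a, b in zip(theta_star, x))  # noiseless label
        residual = sum(a * b for a, b in zip(theta, x)) - y
        # Constant-stepsize SGD step on the squared loss (<theta, x> - y)^2 / 2.
        theta = [th - gamma * residual * xi for th, xi in zip(theta, x)]
    err = sum((a - b) ** 2 for a, b in zip(theta, theta_star)) ** 0.5
    return theta, err


theta, err = sgd_interpolation()
print(theta, err)  # last iterate, constant step, no averaging
```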

For finite-sum methods, the literature now includes nonasymptotic last-iterate results for both cyclic and shuffled updates. Incremental gradient and incremental proximal methods on

$\min_{x}f(x)=\frac1n\sum_{i=1}^{n}f_i(x)$

admit the first nonasymptotic last-iterate guarantees in general convex smooth and convex Lipschitz settings. These nearly match the best known average-iterate oracle complexities up to square-root-logarithmic or logarithmic factors, and come with an explicit continual-learning interpretation through regularized task updates (Cai et al., 2024). For shuffling methods (Random Reshuffle, Shuffle Once, and Incremental Gradient), the last iterate converges in objective value even without strong convexity (Liu et al., 2024):

- in the smooth convex setting, the rate matches the known average-iterate rates up to logarithmic factors;
- in the smooth strongly convex setting, RR and SO achieve a nearly tight objective-gap rate;
- in the Lipschitz convex setting, the last iterate attains the classical subgradient-method rate.
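The three shuffling schemes differ only in how the component order is chosen for each epoch. A minimal sketch follows, assuming the made-up components $f_i(x)=\tfrac12(x-b_i)^2$ with $b=(1,2,3,4,5)$, whose average is minimized at the mean $3$, and a constant stepsize $0.01$. The function and mode names are illustrative.

```python
import random
from typing import List


def shuffled_gd(b: List[float], mode: str, step: float = 0.01,
                epochs: int = 1000, seed: int = 0) -> float:
    """Last iterate of IG / SO / RR on f(x) = (1/n) * sum_i (x - b_i)^2 / 2."""
    if mode not in ("IG", "SO", "RR"):
        raise ValueError(f"unknown mode {mode!r}")
    rng = random.Random(seed)
    order = list(range(len(b)))      # IG keeps this fixed cyclic order
    if mode == "SO":
        rng.shuffle(order)           # Shuffle Once: one permutation, reused every epoch
    x = 0.0
    for _ in range(epochs):
        if mode == "RR":
            rng.shuffle(order)       # Random Reshuffle: fresh permutation each epoch
        for i in order:
            x -= step * (x - b[i])   # gradient step on the single component f_i
    return x


b = [1.0, 2.0, 3.0, 4.0, 5.0]
for mode in ("IG", "SO", "RR"):
    print(mode, shuffled_gd(b, mode))
```

With a constant stepsize, each scheme settles within a small stepsize-dependent bias of the minimizer. Here that bias is at most about $0.02$ for every ordering.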

A recurrent structural lesson is that momentum, shuffling, and stochasticity do not have a uniform effect on the last iterate. The decisive issue is the compatibility between noise geometry, memory, and the Lyapunov or regret structure used in the proof.

6. Mean field and graphon mean field games

In large-population control models, last-iterate convergence has recently been established for both mean field games and graphon mean field games. For regularized graphon mean field games, mirror descent is analyzed through a weighted KL divergence between the regularized equilibrium policy and the current policy.

Under monotonicity and regularization, this criterion decays at an explicit nonasymptotic last-iterate rate in the tabular full-information case. Under bandit feedback, and in linear GMFGs, the guaranteed rates are slower because estimation error enters the analysis (Dong et al., 2024). The decisive mechanism is a KL recursion plus estimation-error terms. The recursion has a contraction factor strictly less than one, governed by the stepsize and the regularization strength.
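The role of the contraction factor can be seen by unrolling a generic recursion of this type. For illustration, assume a criterion $D_t$ that satisfies $D_{t+1}\le\rho\,D_t+\varepsilon_t$ with $\rho\in(0,1)$ and estimation errors bounded by $\varepsilon_t\le\varepsilon$. In entropy-regularized mirror descent, for example, $\rho$ takes the form $1-\eta\lambda$ for stepsize $\eta$ and regularization strength $\lambda$. Induction gives

$D_T\le\rho^{T}D_0+\sum_{t=0}^{T-1}\rho^{\,T-1-t}\varepsilon_t\le\rho^{T}D_0+\frac{\varepsilon}{1-\rho}.$

When $\varepsilon_t=0$, this recursion alone gives geometric decay. Otherwise the achievable rate hinges on how quickly the error terms $\varepsilon_t$ can be driven down.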

For finite-horizon monotone mean field games, a proximal-point-type method provides a direct last-iterate convergence theorem under a Lasry–Lions-type weak monotonicity condition. Suppose each new policy $\pi^{k+1}$ is defined from $\pi^k$ by a KL-regularized policy improvement step, that is, a proximal step centered at $\pi^k$. Then the sequence $\pi^k$ converges to an equilibrium of the mean field game.

The same work interprets each proximal step as solving a regularized mean field game and approximates it by regularized mirror descent. The inner mirror-descent dynamics converge exponentially to the regularized equilibrium: for a sufficiently small stepsize, the divergence to it decays geometrically in the number of inner iterations (Isobe et al., 2024).
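The exponential inner convergence has a simple single-agent analogue, sketched below. It assumes a fixed cost vector $c$, standing in for the costs induced by a frozen mean field, and entropy regularization of strength $\lambda$. Mirror descent on $\langle c,p\rangle+\lambda\sum_i p_i\log p_i$ over the simplex then reads $p_{t+1}\propto p_t^{\,1-\eta\lambda}e^{-\eta c}$ and converges to $p^\star\propto e^{-c/\lambda}$. Up to normalization, $\log p_t-\log p^\star$ contracts by the factor $1-\eta\lambda$ per step. This illustrates the mechanism only and is not the mean-field algorithm of the paper. Names and values are made up.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    w = [math.exp(v - m) for v in logits]
    s = sum(w)
    return [v / s for v in w]


def regularized_md(c: Sequence[float], lam: float, eta: float, n_steps: int) -> List[float]:
    """Entropic mirror descent on <c, p> + lam * sum_i p_i log p_i over the simplex."""
    p = [1.0 / len(c)] * len(c)
    for _ in range(n_steps):
        # p_{t+1} proportional to p_t^(1 - eta*lam) * exp(-eta * c), computed in log space
        logits = [(1 - eta * lam) * math.log(p_i) - eta * c_i for p_i, c_i in zip(p, c)]
        p = softmax(logits)
    return p


c, lam, eta = [0.3, 1.0, 0.0, 2.0], 0.5, 0.2   # eta * lam = 0.1
p_star = softmax([-c_i / lam for c_i in c])
for n in (10, 50, 100):
    p = regularized_md(c, lam, eta, n)
    print(n, max(abs(a - b) for a, b in zip(p, p_star)))
```

The printed errors shrink roughly by a factor $0.9$ per iteration, matching the contraction factor $1-\eta\lambda$.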

These results place mean field models within the same general template seen elsewhere in last-iterate theory: monotonicity or regularization yields a one-step descent in a Bregman geometry, and the main challenge is identifying the residual that is both mathematically tractable and operationally meaningful.

Last-iterate convergence is therefore not a single theorem but a family of problem-specific phenomena. In some regimes it is global and nonasymptotic, as for PEG on monotone Lipschitz variational inequalities and for several regularized mean field models. In others it is only local, as for OMWU in constrained convex-concave optimization. In still others it separates sharply from average-iterate behavior, as in extragradient for smooth convex-concave saddle problems, CFR-style methods in extensive-form games, or optimistic methods in periodic games. The common thread is that the last iterate reflects the actual state of the learning dynamics, and its analysis typically requires structure beyond what is needed for regret or ergodic convergence alone.