Anytime Last-Iterate Guarantee

Updated 4 July 2026

An anytime last-iterate guarantee provides explicit finite-time bounds on the current output of an algorithm, rather than on averaged iterates.
Such guarantees appear in settings ranging from smooth convex SGD and bandits to saddle-point problems, where they control metrics such as objective suboptimality and residual norms.
The attainable rates depend on structural conditions such as smoothness, convexity, and noise level, often falling between O(1/√T) and O(T^(–3/4)), and the emphasis throughout is on deployment-friendly performance.

An anytime last-iterate guarantee is a non-asymptotic guarantee on the quality of the current iterate, current policy, current action, or current released model at every time index, rather than on an ergodic average, a suffix average, or a horizon-specific output selected after the run. Across recent work, the object being controlled varies—objective suboptimality, residual norms, primal–dual gaps, policy suboptimality, action gaps, fairness deficits, or Rényi divergence—but the unifying requirement is prefix-wise validity: one may stop at the present time and retain an explicit guarantee on the present output (Attia et al., 15 Jul 2025, Liu et al., 2024, Lu et al., 12 May 2026, Kong et al., 2024).

1. Conceptual scope and formalizations

In optimization, the canonical form is a bound such as

$f(x_K)-f(x_*)\le \Phi(K),$

holding for every iteration index $K$, not only for an average $\bar x_K=\tfrac1K\sum_{k=1}^K x_k$ (Cai et al., 2024). In smooth interpolation SGD, the guarantee is placed directly on the raw last iterate $x_t$, with no special averaging or output selection (Attia et al., 15 Jul 2025). In monotone inclusion and saddle-point problems, the controlled quantity is often a residual, such as $\operatorname{Res}_{F,A}(z_T)$ or $\|F(z^T)\|$ (Cai et al., 14 Apr 2026, Golowich et al., 2020). In bandits and reinforcement learning, the notion is formalized as Uniform Last-Iterate (ULI):

$\Pr\bigl[\forall t\ge1:\;\Delta_{A_t}\le f(\delta,t)\bigr]\ge 1-\delta,$

where $\Delta_{A_t}$ is the suboptimality gap of the action played at round $t$ (Liu et al., 2024).

A central distinction is between prefix-wise control and cumulative or ergodic control. Regret, PAC bounds, uniform-PAC, and averaged-iterate analyses may permit arbitrarily poor finite-time behavior of the currently deployed action or iterate, even when cumulative performance is near-optimal. Several papers therefore treat anytime last-iterate guarantees as strictly stronger than standard cumulative criteria, especially in settings where deployment must use the current policy or model rather than a mixture or an average (Liu et al., 2024, Lu et al., 12 May 2026).
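The gap between cumulative and prefix-wise control can be seen in a toy two-armed example. The sketch below is illustrative only: the exploration schedule `plays_bad_arm`, the gap value 0.5, and the helper names are assumptions introduced here, not constructions from the cited papers. The policy plays the suboptimal arm only at rounds that are powers of two, so its average regret vanishes, yet no bound on the gap of the current action that decreases in $t$ can hold at every round.

```python
def plays_bad_arm(t: int) -> bool:
    """Hypothetical schedule: pull the suboptimal arm when t is a power of two."""
    return t >= 1 and (t & (t - 1)) == 0


def run(horizon: int, gap: float = 0.5):
    """Return per-round action gaps and cumulative regret for rounds 1..horizon."""
    gaps, regret, total = [], [], 0.0
    for t in range(1, horizon + 1):
        g = gap if plays_bad_arm(t) else 0.0
        total += g
        gaps.append(g)
        regret.append(total)
    return gaps, regret


gaps, regret = run(1 << 16)
avg_regret = regret[-1] / len(regret)
worst_late_gap = max(gaps[len(gaps) // 2:])  # worst action gap in the second half
print(f"average regret per round: {avg_regret:.6f}")
print(f"worst action gap in rounds T/2..T: {worst_late_gap}")
```

Cumulative regret grows only logarithmically in this run, while the worst action gap in the second half of the horizon is still the full gap, which is the failure mode that a ULI-type requirement rules out.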

The usage of “anytime” is not fully uniform across the literature. Some papers reserve it for horizon-free procedures that do not require prior knowledge of $T$ and whose bounds are uniform over prefixes (Attia et al., 15 Jul 2025). Others emphasize that a deterministic last-iterate bound is available for every finite horizon, even when the tuning contains the horizon explicitly (Preobrazhenskaia et al., 12 Apr 2026). This suggests that the invariant notion is prefix-wise validity, while horizon-freeness is an additional algorithmic property rather than the sole defining feature.

Setting	Controlled quantity	Representative prefix-wise guarantee
Smooth convex SGD	$K$ 0	$K$ 1 (Attia et al., 15 Jul 2025)
Bandits / RL	$\Delta_{A_t}$	$\Pr\bigl[\forall t\ge1:\;\Delta_{A_t}\le f(\delta,t)\bigr]\ge 1-\delta$ (Liu et al., 2024)
Monotone inclusion	$K$ 4	$K$ 5 (Cai et al., 14 Apr 2026)
CMDPs	reward gap and constraint violation	$K$ 6 on the current policy (Lu et al., 12 May 2026)
Cyclic DP-SGD	$K$ 7	explicit RDP upper bound for any $K$ 8 (Kong et al., 2024)
Online fairness	maximum deficit	$K$ 9 (Kahana et al., 19 May 2026)

2. Horizon-aware optimality and the horizon-free barrier in convex optimization

A decisive early result in convex Lipschitz optimization showed that the last iterate can match the information-theoretically optimal rate if the step-size schedule is redesigned specifically for the final time $\bar x_K=\tfrac1K\sum_{k=1}^K x_k$ 0. For general convex problems, Jain et al. use the standard suffix-averaged schedule $\bar x_K=\tfrac1K\sum_{k=1}^K x_k$ 1 and replace it by the dyadically modified last-iterate schedule

$\bar x_K=\tfrac1K\sum_{k=1}^K x_k$ 2

while in the strongly convex case they replace $\bar x_K=\tfrac1K\sum_{k=1}^K x_k$ 3 by

$\bar x_K=\tfrac1K\sum_{k=1}^K x_k$ 4

Under Lipschitz $\bar x_K=\tfrac1K\sum_{k=1}^K x_k$ 5 and diameter $\bar x_K=\tfrac1K\sum_{k=1}^K x_k$ 6, this yields

$\bar x_K=\tfrac1K\sum_{k=1}^K x_k$ 7

and with probability at least $\bar x_K=\tfrac1K\sum_{k=1}^K x_k$ 8,

$\bar x_K=\tfrac1K\sum_{k=1}^K x_k$ 9

Under $x_t$ 0-strong convexity, the corresponding bound is

$x_t$ 1

with high-probability rate

$x_t$ 2

These bounds remove the classical extra $\log T$ factor for the last iterate, but the construction is explicitly non-anytime because the dyadic breakpoints depend on the total horizon $T$ (Jain et al., 2019).

The proof mechanism is a modification scheme that transfers suffix-average guarantees under the original schedule to last-iterate guarantees under the modified schedule. Its key device is a one-shot “look-ahead” lemma,

$x_t$ 5

which prevents significant post hoc deterioration in function value after a sufficiently good point has been reached. Intuitively, the dyadic halving “slows down” the method exactly where suffix averaging would have concentrated the mass (Jain et al., 2019).
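Why a dyadic schedule cannot be anytime is easiest to see by generating one. The following is a minimal sketch under stated assumptions: the breakpoints $T_i = T - \lceil T/2^i \rceil$, the per-phase step $c\,2^{-i}/\sqrt{T}$, and the function name `dyadic_schedule` are illustrative choices in the spirit of the construction described above, not a verified transcription of the schedule in Jain et al.

```python
import math


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def dyadic_schedule(T: int, c: float = 1.0) -> list:
    """Step sizes eta_1..eta_T that halve on dyadically shrinking phases.

    Assumed form (illustrative): breakpoints T_i = T - ceil(T / 2**i), and
    eta_t = c * 2**(-i) / sqrt(T) for T_i < t <= T_{i+1}.
    """
    k = (T - 1).bit_length()  # equals ceil(log2(T)) for T >= 1
    breakpoints = [T - ceil_div(T, 2**i) for i in range(k + 1)] + [T]
    etas = []
    for i in range(k + 1):
        phase_len = breakpoints[i + 1] - breakpoints[i]
        etas.extend([c * 2.0 ** (-i) / math.sqrt(T)] * phase_len)
    return etas


s64 = dyadic_schedule(64)
s128 = dyadic_schedule(128)
print("first step, T=64: ", s64[0])
print("first step, T=128:", s128[0])
print("last step,  T=64: ", s64[-1])
```

Even the first step size changes when the horizon changes, so a run tuned for one horizon cannot be stopped early or extended while keeping the schedule it was analyzed with.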

A later impossibility result shows that this horizon dependence is not an artifact of the proof. For convex $x_t$ 6-Lipschitz optimization over a domain of diameter $x_t$ 7, if a deterministic step-size sequence is truly anytime, then its worst-case last-iterate error must obey

$x_t$ 8

and more precisely, if $x_t$ 9 for all $\Res_{F,A}(z_T)$0 and some non-decreasing $\Res_{F,A}(z_T)$1, then

$\Res_{F,A}(z_T)$2

This resolves the conjecture that no truly anytime schedule can attain the exact $O(1/\sqrt{T})$ last-iterate rate in this setting, and it does so even in the noiseless GD case (Kornowski et al., 15 Apr 2026).

The resulting picture is sharply bifurcated. Horizon-aware schedules can close the last-iterate gap to the lower bound, whereas horizon-free schedules must pay at least a polylogarithmic penalty. The classical textbook rule $\eta_t\propto 1/\sqrt{t}$ remains anytime, but only with the familiar $O(\log T/\sqrt{T})$ last-iterate rate (Kornowski et al., 15 Apr 2026). A plausible implication is that, in convex Lipschitz optimization, the exact optimal last-iterate rate is not compatible with genuine horizon-free operation.
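A horizon-free run can be sketched in a few lines. The example below assumes the decaying step size $\eta_t = 1/\sqrt{t}$ and the one-dimensional objective $f(x)=|x|$; both are illustrative choices made here, and the function names are hypothetical. It records the suboptimality of the raw iterate and of the running average after every step, so any prefix can be inspected without re-tuning.

```python
import math


def subgradient_abs(x: float) -> float:
    """A subgradient of f(x) = |x| (0 is a valid choice at x = 0)."""
    return (x > 0) - (x < 0)


def run_anytime(x0: float, steps: int):
    """Subgradient method on f(x) = |x| with eta_t = 1 / sqrt(t); no horizon used."""
    x, running_sum = x0, 0.0
    last_err, avg_err = [], []
    for t in range(1, steps + 1):
        x -= subgradient_abs(x) / math.sqrt(t)
        running_sum += x
        last_err.append(abs(x))               # gap f(x) - f(x_*) of the raw iterate
        avg_err.append(abs(running_sum / t))  # same gap for the running average
    return last_err, avg_err


last_err, avg_err = run_anytime(0.9, 10_000)
for t in (10, 100, 1_000, 10_000):
    print(t, f"last={last_err[t - 1]:.4f}", f"avg={avg_err[t - 1]:.4f}")
```

In this toy run the last-iterate gap after $t$ steps stays below $1/\sqrt{t}$. That is a property of this particular one-dimensional instance and says nothing about worst-case behavior, which is what the lower bound above concerns.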

3. Positive anytime guarantees for stochastic first-order methods

The smooth interpolation regime admits a qualitatively different phenomenon. For convex $\Res_{F,A}(z_T)$6-smooth $\Res_{F,A}(z_T)$7 and fixed constant step-size $\Res_{F,A}(z_T)$8, SGD satisfies

$\Res_{F,A}(z_T)$9

where $\|F(z^T)\|$ 0. The bound is established directly for the raw last iterate, uses no step decay, requires no knowledge of $\|F(z^T)\|$ 1, and in fact holds uniformly for every prefix $\|F(z^T)\|$ 2. Balancing the bias and variance terms yields a near-optimal $\|F(z^T)\|$ 3 rate in the low-noise regime, while in pure interpolation $\|F(z^T)\|$ 4 the greedy choice $\|F(z^T)\|$ 5 gives

$\|F(z^T)\|$ 6

The analysis relies on a regret-style decomposition with non-uniform weights $\|F(z^T)\|$ 7, a smoothness lower bound converting inner products into function-value gaps and gradient-difference terms, and a step-dependent Young inequality that isolates the coefficient of the final iterate (Attia et al., 15 Jul 2025).

For smooth quadratics in the interpolation regime, including randomized Kaczmarz, a different anytime guarantee is available under the greedy step-size $\|F(z^T)\|$ 8. The analysis introduces stochastic contraction processes $\|F(z^T)\|$ 9 with common mean $\Pr\bigl[\forall t\ge1:\;\Delta_{A_t}\le f(\delta,t)\bigr]\ge 1-\delta,$ 0, and proves

$\Pr\bigl[\forall t\ge1:\;\Delta_{A_t}\le f(\delta,t)\bigr]\ge 1-\delta,$ 1

Consequently,

$\Pr\bigl[\forall t\ge1:\;\Delta_{A_t}\le f(\delta,t)\bigr]\ge 1-\delta,$ 2

and for randomized Kaczmarz,

$\Pr\bigl[\forall t\ge1:\;\Delta_{A_t}\le f(\delta,t)\bigr]\ge 1-\delta,$ 3

This improves the previously known $\Pr\bigl[\forall t\ge1:\;\Delta_{A_t}\le f(\delta,t)\bigr]\ge 1-\delta,$ 4 guarantee in that regime (Dereziński et al., 10 Apr 2026).
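For concreteness, randomized Kaczmarz on a consistent linear system can be sketched as follows. This is a standard textbook form of the method (rows sampled proportionally to their squared norms, each step projecting onto one row's hyperplane), not the specific analysis of the cited paper; the small system `A`, the solution `x_star`, and the function name are hypothetical.

```python
import random


def kaczmarz(A, b, steps, seed=0):
    """Randomized Kaczmarz on a consistent system A x = b.

    Rows are sampled with probability proportional to their squared norm.
    Returns the iterate after every step so that any prefix can be inspected.
    """
    rng = random.Random(seed)
    sq_norms = [sum(a * a for a in row) for row in A]
    x = [0.0] * len(A[0])
    history = []
    for _ in range(steps):
        i = rng.choices(range(len(A)), weights=sq_norms)[0]
        residual = b[i] - sum(a * xj for a, xj in zip(A[i], x))
        scale = residual / sq_norms[i]
        x = [xj + scale * a for xj, a in zip(x, A[i])]  # project onto row i's hyperplane
        history.append(list(x))
    return history


# Hypothetical consistent (interpolating) system built from a known solution.
x_star = [1.0, -2.0, 0.5]
A = [[2.0, 1.0, 0.0], [0.0, 1.0, 3.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
b = [sum(a * s for a, s in zip(row, x_star)) for row in A]

history = kaczmarz(A, b, steps=200)
errors = [sum((xj - s) ** 2 for xj, s in zip(x, x_star)) for x in history]
print(f"squared error after 10 steps: {errors[9]:.3e}")
print(f"squared error after 200 steps: {errors[-1]:.3e}")
```

Because every step is an orthogonal projection onto a hyperplane that contains the solution, the distance to the solution never increases along the run, which is what makes a statement about the current iterate natural in this regime.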

Adaptive methods do not automatically inherit strong last-iterate guarantees. For scalar AdaGrad-Norm in convex non-smooth optimization, the deterministic final-iterate bound depends on an exponent parameter

$\Pr\bigl[\forall t\ge1:\;\Delta_{A_t}\le f(\delta,t)\bigr]\ge 1-\delta,$ 5

which captures the growth of the cumulative squared subgradients. Optimizing the base parameter $\Pr\bigl[\forall t\ge1:\;\Delta_{A_t}\le f(\delta,t)\bigr]\ge 1-\delta,$ 6 over the worst case leads to

$\Pr\bigl[\forall t\ge1:\;\Delta_{A_t}\le f(\delta,t)\bigr]\ge 1-\delta,$ 7

and matching lower bounds show that this rate is tight. The proof is built around a weighted last-iterate inequality of Zamani and Glineur and a backward choice of weights that annihilates all coefficients except the one attached to the final iterate (Preobrazhenskaia et al., 12 Apr 2026).

These results show that positive anytime guarantees arise from different structural mechanisms: low noise at the optimum, interpolation, stochastic contraction, or finely tuned weighted inequalities. They also show that there is no universal last-iterate rate even within first-order convex optimization: rates from $\Pr\bigl[\forall t\ge1:\;\Delta_{A_t}\le f(\delta,t)\bigr]\ge 1-\delta,$ 8 to $\Pr\bigl[\forall t\ge1:\;\Delta_{A_t}\le f(\delta,t)\bigr]\ge 1-\delta,$ 9 to $\Delta_{A_t}$ 0 all occur, depending on geometry, noise, and algorithm class (Attia et al., 15 Jul 2025, Dereziński et al., 10 Apr 2026, Preobrazhenskaia et al., 12 Apr 2026).

4. Variational inequalities, saddle-point problems, and games

In smooth convex–concave saddle-point problems, the last iterate can be provably slower than the averaged iterate. For Extragradient (EG) with constant step-size $\Delta_{A_t}$ 1, one has for every $\Delta_{A_t}$ 2

$\Delta_{A_t}$ 3

Choosing $\Delta_{A_t}$ 4 gives $\Delta_{A_t}$ 5 and $\Delta_{A_t}$ 6, whereas the classical ergodic EG guarantee is $\Delta_{A_t}$ 7. A matching lower bound of $\Delta_{A_t}$ 8 for the last iterate establishes a quadratic separation between ergodic and last-iterate convergence rates (Golowich et al., 2020).
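Tracking the residual of the raw last iterate is straightforward to sketch. The example below uses the scalar bilinear problem $f(x,y)=xy$, the step size 0.5, and hypothetical helper names, all chosen here for illustration. On this easy instance Extragradient contracts geometrically, so it shows how the quantity is measured rather than the worst-case rate discussed above; plain simultaneous gradient descent-ascent is included as a contrast.

```python
import math


def field(x, y):
    """Saddle operator F(x, y) = (df/dx, -df/dy) for f(x, y) = x * y."""
    return y, -x


def gda_step(x, y, eta):
    gx, gy = field(x, y)
    return x - eta * gx, y - eta * gy


def extragradient_step(x, y, eta):
    gx, gy = field(x, y)
    xh, yh = x - eta * gx, y - eta * gy  # extrapolation (look-ahead) point
    hx, hy = field(xh, yh)
    return x - eta * hx, y - eta * hy    # update taken from the original point


def residuals(step, steps=50, eta=0.5, start=(1.0, 1.0)):
    """Norm of F at the current iterate after each step."""
    x, y = start
    out = []
    for _ in range(steps):
        x, y = step(x, y, eta)
        out.append(math.hypot(*field(x, y)))
    return out


eg = residuals(extragradient_step)
gda = residuals(gda_step)
print(f"EG  residual after 50 steps: {eg[-1]:.4e}")
print(f"GDA residual after 50 steps: {gda[-1]:.4e}")
```

For this problem each Extragradient step multiplies the squared residual by $1-\eta^2+\eta^4$, while each descent-ascent step multiplies it by $1+\eta^2$, so the first sequence shrinks and the second grows.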

Monotone inclusion problems admit anytime last-iterate bounds for anchoring-only schemes. Proximal Anchored Gradient Descent (P-AGD),

$\Delta_{A_t}$ 9

with

$t$ 0

satisfies

$t$ 1

hence $t$ 2 for every $t$ 3. The proof combines a resolvent-based rewrite with a boundedness argument and a decay estimate for consecutive differences $t$ 4 (Cai et al., 14 Apr 2026).

Other operator-splitting and game-theoretic settings exhibit linear anytime behavior. For min–max optimization, Hamiltonian Gradient Descent (HGD) minimizes $t$ 5 and, under a PL inequality for $t$ 6, yields

$t$ 7

The same paper proves analogous linear last-iterate convergence for Consensus Optimization under a suitable parameter choice, and extends the theory beyond strongly convex–strongly concave problems to sufficiently bilinear regimes (Abernethy et al., 2019).

Under noisy feedback in $t$ 8-co-coercive games, vanilla stochastic gradient ascent with

$t$ 9

admits a piecewise anytime bound on the gradient residual:

$T$ 0

Choosing $T$ 1 gives the first last-iterate bound under non-vanishing affine-growth noise in this class:

$T$ 2

The same analysis also gives almost sure convergence of the iterates to the Nash equilibrium set and time-average bounds (Chandak et al., 21 Apr 2026).

Monotone mean field games present a split picture. Exact proximal-point iterations converge in the last iterate asymptotically under Lasry–Lions monotonicity:

$T$ 3

For the regularized mirror-descent subroutine used to approximate each proximal step, however, there is a genuine anytime exponential bound:

$T$ 4

Thus exact outer convergence is asymptotic, while the inner approximate solver is uniformly contractive at every finite time (Isobe et al., 2024).

5. Finite-sum, continual, and policy-learning settings

Incremental and shuffled methods were long known mainly through ergodic guarantees, but recent analyses move them into the anytime last-iterate regime. For finite-sum optimization

$T$ 5

Cai and Diakonikolas obtain the first last-iterate guarantees for incremental gradient and incremental proximal methods in general convex smooth settings, and for incremental proximal methods also in convex Lipschitz settings. Their bounds hold for every epoch $T$ 6, and the resulting oracle complexities nearly match the best known average-iterate guarantees up to a square-root-log or log factor. In the continual-learning interpretation, they also argue that a large amount of regularization is crucial to preventing catastrophic forgetting (Cai et al., 2024).

For shuffling-based gradient methods—Random Reshuffle, Shuffle Once, and Incremental Gradient—objective-value last-iterate guarantees are available even without strong convexity. In the Lipschitz convex case, appropriate step-size schedules give the standard subgradient last-iterate rate

$T$ 7

valid for any permutation sequence. In the smooth strongly convex regime, the last iterate attains a nearly sharp $T$ 8 bound for RR/SO, matching known lower bounds up to logarithms (Liu et al., 2024).
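A Random Reshuffle loop that reports the objective at the end of every epoch can be sketched as below. The scalar least-squares data, the epoch step size `eta0 / k`, and the function names are assumptions made for illustration; the step-size schedules analyzed in the cited work are not reproduced here.

```python
import random


def objective(x, data):
    """Finite-sum objective: average of 0.5 * (a * x - b)**2 over the data."""
    return sum(0.5 * (a * x - b) ** 2 for a, b in data) / len(data)


def random_reshuffle(data, epochs, eta0=0.1, seed=0):
    """One pass per epoch over a fresh permutation; returns f at each epoch end."""
    rng = random.Random(seed)
    x, values = 0.0, []
    for k in range(1, epochs + 1):
        order = list(range(len(data)))
        rng.shuffle(order)
        eta = eta0 / k                  # assumed decaying epoch step size
        for i in order:
            a, b = data[i]
            x -= eta * a * (a * x - b)  # gradient of 0.5 * (a * x - b)**2
        values.append(objective(x, data))
    return x, values


data = [(1.0, 2.0), (2.0, 3.0), (0.5, 1.5), (1.5, 2.5)]
x_last, values = random_reshuffle(data, epochs=200)
print(f"objective after epoch 1:   {values[0]:.4f}")
print(f"objective after epoch 200: {values[-1]:.4f}")
print(f"last iterate: {x_last:.4f}")
```

The values list is indexed by epoch, so it is the last-iterate objective at every epoch boundary rather than the objective of an averaged point. Replacing `rng.shuffle(order)` with a single shuffle before the loop, or with no shuffle at all, gives Shuffle Once and Incremental Gradient variants of the same sketch.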

Bandits and reinforcement learning formulate the concept most explicitly. ULI requires a high-probability event on which every action played is near-optimal:

$T$ 9

with near-optimal ULI corresponding to $K$ 00 in

$K$ 01

The paper shows that ULI implies near-optimal uniform-PAC and regret guarantees, but not conversely. Elimination-based finite-arm algorithms and a meta-algorithm wrapping a high-probability adversarial learner achieve near-optimal ULI, and an oracle-efficient linear-bandit algorithm obtains

$K$ 02

By contrast, optimistic algorithms such as lil’UCB do not achieve near-optimal ULI (Liu et al., 2024).

In constrained MDPs, the motivation is explicitly deployment-oriented: mixture-policy guarantees are theoretically standard but practically mismatched when a single policy must be deployed. An inexact augmented Lagrangian method therefore targets last-policy convergence. If the augmented-Lagrangian subproblem at outer iteration $K$ 03 is solved to accuracy $K$ 04, then after $K$ 05 outer iterations the final policy satisfies

$K$ 06

and, crucially, the same form holds for the current policy at every outer iteration $K$ 07. With Projected Q-Ascent as the primal oracle, the total number of policy-gradient evaluations is $K$ 08 (Lu et al., 12 May 2026).

Approximate last-iterate convergence also appears in overparameterized GANs. For the Implicit Update dynamics, one has

$K$ 09

and for the Predictive Method,

$K$ 10

In both cases the neighborhood radius shrinks with width as

$K$ 11

so the dynamics exhibit an anytime exponential-plus-bias guarantee rather than exact asymptotic convergence of the raw last iterate (Du, 2021).

6. Privacy, fairness, and recurring limitations

The last-iterate viewpoint extends beyond optimization error. In cyclically sampled DP-SGD on nonconvex composite losses, the released object is only the final or current model, so privacy accounting for the last iterate is the relevant quantity. Under weak-convexity/upper-curvature assumptions and step-size $K$ 12, the Rényi divergence between neighboring runs satisfies, for any stopping time $K$ 13,

$K$ 14

where $K$ 15, $K$ 16, and

$K$ 17

Thus the privacy cost of releasing the current iterate is controlled prefix-wise, without relying on subsampling amplification (Kong et al., 2024).

An analogous prefix-wise perspective appears in perpetual online fairness. In the deficit framework, each round produces deficits $K$ 18 for tracked quality variables, and the goal is to keep all deficits below a slowly growing threshold at every prefix. The $K$ 19-potential rule chooses the action minimizing the next-round potential

$K$ 20

Under the moment conditions in the paper, with fixed $K$ 21 and

$K$ 22

every round $K$ 23 is $K$ 24-fair. In particular,

$K$ 25

A matching lower bound shows that any perpetual guarantee must satisfy

$K$ 26

so $K$ 27 growth is unavoidable in general (Kahana et al., 19 May 2026).

Several recurring misconceptions are therefore incorrect. First, a good averaged iterate does not imply a good current iterate; the smooth saddle-point case gives a sharp $O(1/\sqrt{T})$ versus $O(1/T)$ separation (Golowich et al., 2020). Second, horizon-aware optimality does not imply horizon-free optimality; the convex Lipschitz lower bound rules out exact $O(1/\sqrt{T})$ last-iterate performance for deterministic anytime schedules (Kornowski et al., 15 Apr 2026). Third, “anytime” does not fix a unique metric: different papers control function values, residuals, action gaps, privacy loss, fairness deficits, or neighborhood radii (Attia et al., 15 Jul 2025, Liu et al., 2024, Kong et al., 2024, Kahana et al., 19 May 2026).

This suggests that anytime last-iterate guarantees are best understood as a family of deployment-aligned finite-time guarantees rather than a single theorem schema. Their feasibility and rate depend sharply on structural conditions—interpolation, smoothness, co-coercivity, monotonicity, contraction, regularization, or low effective noise—and on the metric that defines “current performance.”