Anytime Last-Iterate Guarantee

Updated 4 July 2026
  • An anytime last-iterate guarantee gives an explicit finite-time bound on the current output at every time step, rather than on an averaged iterate.
  • It applies across various settings—from smooth convex SGD and bandits to saddle-point problems—by providing guarantees on metrics like objective suboptimality and residual norms.
  • The approach adapts to structural conditions such as smoothness, convexity, and noise levels, often yielding rates from O(1/√T) to O(T^(–3/4)) while emphasizing deployment-friendly performance.

An anytime last-iterate guarantee is a non-asymptotic guarantee on the quality of the current iterate, current policy, current action, or current released model at every time index, rather than on an ergodic average, a suffix average, or a horizon-specific output selected after the run. Across recent work, the object being controlled varies—objective suboptimality, residual norms, primal–dual gaps, policy suboptimality, action gaps, fairness deficits, or Rényi divergence—but the unifying requirement is prefix-wise validity: one may stop at the present time and retain an explicit guarantee on the present output (Attia et al., 15 Jul 2025, Liu et al., 2024, Lu et al., 12 May 2026, Kong et al., 2024).

1. Conceptual scope and formalizations

In optimization, the canonical form is a bound such as

$$f(x_K)-f(x_*)\le \Phi(K),$$

holding for every iterate index $K$, not only for an average $\bar x_K=\tfrac1K\sum_{k=1}^K x_k$ (Cai et al., 2024). In smooth interpolation SGD, the guarantee is placed directly on the raw last iterate $x_t$, with no special averaging or output selection (Attia et al., 15 Jul 2025). In monotone inclusion and saddle-point problems, the controlled quantity is often a residual, such as $\mathrm{Res}_{F,A}(z_T)$ or $\|F(z^T)\|$ (Cai et al., 14 Apr 2026, Golowich et al., 2020). In bandits and reinforcement learning, the notion is formalized as Uniform Last-Iterate (ULI):

$$\Pr\bigl[\forall t\ge1:\;\Delta_{A_t}\le f(\delta,t)\bigr]\ge 1-\delta,$$

where $\Delta_{A_t}$ is the suboptimality gap of the action played at round $t$ (Liu et al., 2024).
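Returning to the optimization form of the guarantee, the following minimal sketch checks a bound of the shape $f(x_K)-f(x_*)\le\Phi(K)$ at every prefix. It assumes plain gradient descent with step $1/L$ on a small convex quadratic and uses the textbook bound $\Phi(K)=L\|x_0-x_*\|^2/(2K)$. The quadratic, the starting point, and this choice of $\Phi$ are illustrative assumptions and are not taken from the cited papers.

```python
# Toy check of a prefix-wise last-iterate bound f(x_K) - f(x_*) <= Phi(K).
# Assumptions (not from the source): gradient descent with step 1/L on the
# quadratic f(x) = 0.5 * (x0^2 + 10 * x1^2), whose minimizer is the origin.

L_SMOOTH = 10.0          # smoothness constant of the quadratic
X_START = [3.0, -2.0]    # hypothetical starting point
DIST_SQ = sum(v * v for v in X_START)  # ||x_0 - x_*||^2 with x_* = 0


def f(x):
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def grad(x):
    return [x[0], 10.0 * x[1]]


def phi(K):
    # Standard bound for gradient descent on an L-smooth convex function.
    return L_SMOOTH * DIST_SQ / (2.0 * K)


def run_gd(num_steps):
    x = list(X_START)
    gaps = []
    for _ in range(num_steps):
        g = grad(x)
        x = [xi - gi / L_SMOOTH for xi, gi in zip(x, g)]
        gaps.append(f(x))  # f(x_*) = 0, so the gap is f(x_K) itself
    return gaps


gaps = run_gd(200)
# The bound is checked for the raw iterate at every K, not for an average.
prefix_ok = all(gaps[K - 1] <= phi(K) for K in range(1, len(gaps) + 1))
print(prefix_ok)
```

The point of the check is that it never looks at an average or at a horizon fixed in advance: stopping at any `K` leaves a valid bound on the iterate in hand.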

A central distinction is between prefix-wise control and cumulative or ergodic control. Regret, PAC bounds, uniform-PAC, and averaged-iterate analyses may permit arbitrarily poor finite-time behavior of the currently deployed action or iterate, even when cumulative performance is near-optimal. Several papers therefore treat anytime last-iterate guarantees as strictly stronger than standard cumulative criteria, especially in settings where deployment must use the current policy or model rather than a mixture or an average (Liu et al., 2024, Lu et al., 12 May 2026).
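The gap between cumulative and prefix-wise control can be seen in a small deterministic sketch. It assumes a hypothetical two-armed schedule (not an algorithm from the cited papers) that plays a suboptimal arm with gap 0.5 exactly at rounds that are powers of two and the optimal arm otherwise. Cumulative regret grows only logarithmically, yet the gap of the action actually played never shrinks along the subsequence of exploration rounds, so no vanishing bound on $\Delta_{A_t}$ can hold for all $t$.

```python
# Hypothetical schedule: explore the bad arm at rounds 1, 2, 4, 8, ...
# All names and the gap value are illustrative assumptions.

GAP = 0.5  # suboptimality gap of the bad arm


def is_power_of_two(t):
    return t >= 1 and (t & (t - 1)) == 0


def played_gap(t):
    # Gap of the action played at round t under the schedule above.
    return GAP if is_power_of_two(t) else 0.0


T = 1024
round_gaps = [played_gap(t) for t in range(1, T + 1)]

# Cumulative (pseudo-)regret: one bad pull per power of two up to T.
cumulative_regret = sum(round_gaps)

# Worst gap of the played action over the second half of the run
# (rounds T/2 + 1 through T): still the full gap, because round T explores.
worst_late_gap = max(round_gaps[T // 2:])

print(cumulative_regret, worst_late_gap)
```

Under this schedule the regret after 1024 rounds is 11 bad pulls times 0.5, while the worst late-round gap remains 0.5.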

The usage of “anytime” is not fully uniform across the literature. Some papers reserve it for horizon-free procedures that do not require prior knowledge of $T$ and whose bounds are uniform over prefixes (Attia et al., 15 Jul 2025). Others emphasize that a deterministic last-iterate bound is available for every finite horizon, even when the tuning contains the horizon explicitly (Preobrazhenskaia et al., 12 Apr 2026). This suggests that the invariant notion is prefix-wise validity, while horizon-freeness is an additional algorithmic property rather than the sole defining feature.

Setting | Controlled quantity | Representative prefix-wise guarantee
Smooth convex SGD | KK0 | KK1 (Attia et al., 15 Jul 2025)
Bandits / RL | KK2 | KK3 (Liu et al., 2024)
Monotone inclusion | KK4 | KK5 (Cai et al., 14 Apr 2026)
CMDPs | reward gap and constraint violation | KK6 on the current policy (Lu et al., 12 May 2026)
Cyclic DP-SGD | KK7 | explicit RDP upper bound for any KK8 (Kong et al., 2024)
Online fairness | maximum deficit | KK9 (Kahana et al., 19 May 2026)

2. Horizon-aware optimality and the horizon-free barrier in convex optimization

A decisive early result in convex Lipschitz optimization showed that the last iterate can match the information-theoretically optimal rate if the step-size schedule is redesigned specifically for the final time xˉK=1Kk=1Kxk\bar x_K=\tfrac1K\sum_{k=1}^K x_k0. For general convex problems, Jain et al. use the standard suffix-averaged schedule xˉK=1Kk=1Kxk\bar x_K=\tfrac1K\sum_{k=1}^K x_k1 and replace it by the dyadically modified last-iterate schedule

xˉK=1Kk=1Kxk\bar x_K=\tfrac1K\sum_{k=1}^K x_k2

while in the strongly convex case they replace xˉK=1Kk=1Kxk\bar x_K=\tfrac1K\sum_{k=1}^K x_k3 by

xˉK=1Kk=1Kxk\bar x_K=\tfrac1K\sum_{k=1}^K x_k4

Under Lipschitz xˉK=1Kk=1Kxk\bar x_K=\tfrac1K\sum_{k=1}^K x_k5 and diameter xˉK=1Kk=1Kxk\bar x_K=\tfrac1K\sum_{k=1}^K x_k6, this yields

xˉK=1Kk=1Kxk\bar x_K=\tfrac1K\sum_{k=1}^K x_k7

and with probability at least xˉK=1Kk=1Kxk\bar x_K=\tfrac1K\sum_{k=1}^K x_k8,

xˉK=1Kk=1Kxk\bar x_K=\tfrac1K\sum_{k=1}^K x_k9

Under xtx_t0-strong convexity, the corresponding bound is

xtx_t1

with high-probability rate

xtx_t2

These bounds remove the classical extra $\log T$ factor for the last iterate, but the construction is explicitly non-anytime because the dyadic breakpoints depend on the total horizon $T$ (Jain et al., 2019).
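The horizon dependence of a dyadically halving schedule can be made concrete with a short sketch. The breakpoints $T_i = T-\lceil T\,2^{-i}\rceil$ and the step $\text{base}\cdot 2^{-i}/\sqrt{T}$ on phase $i$ used below are one plausible shape for such a schedule, assumed here for illustration; the constants and exact breakpoints are not taken from the text above, and the function names are hypothetical.

```python
import math


def dyadic_breakpoints(T):
    # Assumed breakpoints T_i = T - ceil(T * 2^{-i}) for i = 0..k, then T.
    k = math.ceil(math.log2(T)) if T > 1 else 0
    return [T - math.ceil(T * 2.0 ** (-i)) for i in range(k + 1)] + [T]


def dyadic_schedule(T, base=1.0):
    # Step for round t is base * 2^{-i} / sqrt(T) when T_i < t <= T_{i+1}.
    bps = dyadic_breakpoints(T)
    steps = []
    phase = 0
    for t in range(1, T + 1):
        while t > bps[phase + 1]:
            phase += 1  # move to the next (halved) phase
        steps.append(base * 2.0 ** (-phase) / math.sqrt(T))
    return steps


steps_8 = dyadic_schedule(8)
steps_16 = dyadic_schedule(16)

# Breakpoints for T = 8: phases cover rounds 1-4, 5-6, 7, and 8.
print(dyadic_breakpoints(8))
# The very first step already depends on T, so a run tuned for T = 8
# cannot simply be continued to T = 16 with the same guarantee.
print(steps_8[0], steps_16[0])
```

Because every step size, including the first one, is a function of `T`, the schedule has to be fixed before the run starts. This is the sense in which such a construction is horizon-aware rather than anytime.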

The proof mechanism is a modification scheme that transfers suffix-average guarantees under the original schedule to last-iterate guarantees under the modified schedule. Its key device is a one-shot “look-ahead” lemma,

xtx_t5

which prevents significant post hoc deterioration in function value after a sufficiently good point has been reached. Intuitively, the dyadic halving “slows down” the method exactly where suffix averaging would have concentrated the mass (Jain et al., 2019).

A later impossibility result shows that this horizon dependence is not an artifact of the proof. For convex xtx_t6-Lipschitz optimization over a domain of diameter xtx_t7, if a deterministic step-size sequence is truly anytime, then its worst-case last-iterate error must obey

xtx_t8

and more precisely, if xtx_t9 for all $\Res_{F,A}(z_T)$0 and some non-decreasing $\Res_{F,A}(z_T)$1, then

$\Res_{F,A}(z_T)$2

This resolves the conjecture that no truly anytime schedule can attain the exact $\Res_{F,A}(z_T)$3 last-iterate rate in this setting, and it does so even in the noiseless GD case (Kornowski et al., 15 Apr 2026).

The resulting picture is sharply bifurcated. Horizon-aware schedules can close the last-iterate gap to the lower bound, whereas horizon-free schedules must pay at least a polylogarithmic penalty. The classical textbook rule $\Res_{F,A}(z_T)$4 remains anytime, but only with the familiar $\Res_{F,A}(z_T)$5 last-iterate rate (Kornowski et al., 15 Apr 2026). A plausible implication is that, in convex Lipschitz optimization, the exact optimal last-iterate rate is not compatible with genuine horizon-free operation.
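For contrast, a horizon-free schedule can be run without ever fixing a stopping time. The sketch below assumes the subgradient method on the toy function $f(x)=|x|$ with step $\eta_t = 1/\sqrt{t}$ (a common textbook form, chosen here as an assumption) and a starting point with $|x_1|\le 1$. For this particular toy problem one can verify by induction that $|x_{t+1}|\le \eta_t$ for every $t$; this is a property of the example, not the general worst-case rate discussed above.

```python
import math


def sign(v):
    return (v > 0) - (v < 0)


def subgradient_run(x1, num_steps):
    # Iterates x_{t+1} = x_t - eta_t * sign(x_t) on f(x) = |x|,
    # with the horizon-free step eta_t = 1 / sqrt(t).
    x = x1
    last_gaps = []  # entry t-1 holds f(x_{t+1}) - f(x_*) = |x_{t+1}|
    for t in range(1, num_steps + 1):
        eta = 1.0 / math.sqrt(t)
        x = x - eta * sign(x)
        last_gaps.append(abs(x))
    return last_gaps


last_gaps = subgradient_run(0.3, 1000)

# Prefix-wise check on the raw last iterate: |x_{t+1}| <= 1 / sqrt(t).
# A small tolerance absorbs floating-point rounding.
anytime_ok = all(
    gap <= 1.0 / math.sqrt(t) + 1e-12
    for t, gap in enumerate(last_gaps, start=1)
)
print(anytime_ok)
```

The step rule never references the total number of rounds, so the same run can be stopped at any prefix and the check still applies to the iterate in hand.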

3. Positive anytime guarantees for stochastic first-order methods

The smooth interpolation regime admits a qualitatively different phenomenon. For convex $\Res_{F,A}(z_T)$6-smooth $\Res_{F,A}(z_T)$7 and fixed constant step-size $\Res_{F,A}(z_T)$8, SGD satisfies

$\Res_{F,A}(z_T)$9

where F(zT)\|F(z^T)\|0. The bound is established directly for the raw last iterate, uses no step decay, requires no knowledge of F(zT)\|F(z^T)\|1, and in fact holds uniformly for every prefix F(zT)\|F(z^T)\|2. Balancing the bias and variance terms yields a near-optimal F(zT)\|F(z^T)\|3 rate in the low-noise regime, while in pure interpolation F(zT)\|F(z^T)\|4 the greedy choice F(zT)\|F(z^T)\|5 gives

F(zT)\|F(z^T)\|6

The analysis relies on a regret-style decomposition with non-uniform weights F(zT)\|F(z^T)\|7, a smoothness lower bound converting inner products into function-value gaps and gradient-difference terms, and a step-dependent Young inequality that isolates the coefficient of the final iterate (Attia et al., 15 Jul 2025).

For smooth quadratics in the interpolation regime, including randomized Kaczmarz, a different anytime guarantee is available under the greedy step-size F(zT)\|F(z^T)\|8. The analysis introduces stochastic contraction processes F(zT)\|F(z^T)\|9 with common mean Pr[t1:  ΔAtf(δ,t)]1δ,\Pr\bigl[\forall t\ge1:\;\Delta_{A_t}\le f(\delta,t)\bigr]\ge 1-\delta,0, and proves

Pr[t1:  ΔAtf(δ,t)]1δ,\Pr\bigl[\forall t\ge1:\;\Delta_{A_t}\le f(\delta,t)\bigr]\ge 1-\delta,1

Consequently,

Pr[t1:  ΔAtf(δ,t)]1δ,\Pr\bigl[\forall t\ge1:\;\Delta_{A_t}\le f(\delta,t)\bigr]\ge 1-\delta,2

and for randomized Kaczmarz,

Pr[t1:  ΔAtf(δ,t)]1δ,\Pr\bigl[\forall t\ge1:\;\Delta_{A_t}\le f(\delta,t)\bigr]\ge 1-\delta,3

This improves the previously known Pr[t1:  ΔAtf(δ,t)]1δ,\Pr\bigl[\forall t\ge1:\;\Delta_{A_t}\le f(\delta,t)\bigr]\ge 1-\delta,4 guarantee in that regime (Dereziński et al., 10 Apr 2026).
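As a concrete reference point for the Kaczmarz setting, here is a minimal sketch of standard randomized Kaczmarz on a small consistent linear system, with rows sampled in proportion to their squared norms. The matrix, the solution, and the sampling rule are assumptions made for illustration; the sketch only tracks the raw last iterate and does not reproduce the analysis or the rates from the cited paper.

```python
import random


def randomized_kaczmarz(A, b, x_star, num_steps, seed=0):
    # Each step projects the current iterate onto the hyperplane
    # {x : <a_i, x> = b_i} of one randomly chosen row i.
    rng = random.Random(seed)
    row_norms_sq = [sum(a * a for a in row) for row in A]
    x = [0.0] * len(A[0])
    errors = []  # distance of the raw last iterate to x_star after each step
    for _ in range(num_steps):
        i = rng.choices(range(len(A)), weights=row_norms_sq)[0]
        residual = b[i] - sum(a * xj for a, xj in zip(A[i], x))
        scale = residual / row_norms_sq[i]
        x = [xj + scale * a for xj, a in zip(x, A[i])]
        errors.append(sum((xj - sj) ** 2 for xj, sj in zip(x, x_star)) ** 0.5)
    return x, errors


# Hypothetical consistent system: b is built from a known solution.
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
x_star = [1.0, -2.0]
b = [sum(a * s for a, s in zip(row, x_star)) for row in A]

x_last, errors = randomized_kaczmarz(A, b, x_star, 200)

# Every hyperplane contains x_star, so a projection never increases the
# distance to it: the last-iterate error is non-increasing at every step.
monotone = all(e2 <= e1 + 1e-12 for e1, e2 in zip(errors, errors[1:]))
print(monotone)
```

Since the system is consistent (the interpolation regime), no averaging or step decay is needed for the current iterate to be the best point seen so far.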

Adaptive methods do not automatically inherit strong last-iterate guarantees. For scalar AdaGrad-Norm in convex non-smooth optimization, the deterministic final-iterate bound depends on an exponent parameter

Pr[t1:  ΔAtf(δ,t)]1δ,\Pr\bigl[\forall t\ge1:\;\Delta_{A_t}\le f(\delta,t)\bigr]\ge 1-\delta,5

which captures the growth of the cumulative squared subgradients. Optimizing the base parameter Pr[t1:  ΔAtf(δ,t)]1δ,\Pr\bigl[\forall t\ge1:\;\Delta_{A_t}\le f(\delta,t)\bigr]\ge 1-\delta,6 over the worst case leads to

Pr[t1:  ΔAtf(δ,t)]1δ,\Pr\bigl[\forall t\ge1:\;\Delta_{A_t}\le f(\delta,t)\bigr]\ge 1-\delta,7

and matching lower bounds show that this rate is tight. The proof is built around a weighted last-iterate inequality of Zamani and Glineur and a backward choice of weights that annihilates all coefficients except the one attached to the final iterate (Preobrazhenskaia et al., 12 Apr 2026).

These results show that positive anytime guarantees arise from different structural mechanisms: low noise at the optimum, interpolation, stochastic contraction, or finely tuned weighted inequalities. They also show that there is no universal last-iterate rate even within first-order convex optimization: rates from Pr[t1:  ΔAtf(δ,t)]1δ,\Pr\bigl[\forall t\ge1:\;\Delta_{A_t}\le f(\delta,t)\bigr]\ge 1-\delta,8 to Pr[t1:  ΔAtf(δ,t)]1δ,\Pr\bigl[\forall t\ge1:\;\Delta_{A_t}\le f(\delta,t)\bigr]\ge 1-\delta,9 to ΔAt\Delta_{A_t}0 all occur, depending on geometry, noise, and algorithm class (Attia et al., 15 Jul 2025, Dereziński et al., 10 Apr 2026, Preobrazhenskaia et al., 12 Apr 2026).

4. Variational inequalities, saddle-point problems, and games

In smooth convex–concave saddle-point problems, the last iterate can be provably slower than the averaged iterate. For Extragradient (EG) with constant step-size ΔAt\Delta_{A_t}1, one has for every ΔAt\Delta_{A_t}2

ΔAt\Delta_{A_t}3

Choosing ΔAt\Delta_{A_t}4 gives ΔAt\Delta_{A_t}5 and ΔAt\Delta_{A_t}6, whereas the classical ergodic EG guarantee is ΔAt\Delta_{A_t}7. A matching lower bound of ΔAt\Delta_{A_t}8 for the last iterate establishes a quadratic separation between ergodic and last-iterate convergence rates (Golowich et al., 2020).
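To show what tracking the last-iterate residual looks like, the sketch below runs Extragradient and plain gradient descent-ascent on the toy bilinear problem $\min_x\max_y xy$, whose operator is $F(x,y)=(y,-x)$, and records $\|F(z_T)\|$ at every step. The problem, the step-size 0.5, and the starting point are assumptions for illustration. On this special instance the EG residual happens to contract geometrically, so the example does not exhibit the slower general last-iterate rate described above.

```python
def F(z):
    # Operator of min_x max_y x*y: (df/dx, -df/dy) = (y, -x).
    x, y = z
    return (y, -x)


def norm(z):
    return (z[0] ** 2 + z[1] ** 2) ** 0.5


def gda_step(z, eta):
    g = F(z)
    return (z[0] - eta * g[0], z[1] - eta * g[1])


def eg_step(z, eta):
    # Extrapolate to a midpoint, then update from z using the midpoint operator.
    g = F(z)
    z_half = (z[0] - eta * g[0], z[1] - eta * g[1])
    g_half = F(z_half)
    return (z[0] - eta * g_half[0], z[1] - eta * g_half[1])


def residuals(step, z0, eta, num_steps):
    z = z0
    out = []
    for _ in range(num_steps):
        z = step(z, eta)
        out.append(norm(F(z)))  # residual of the current (last) iterate
    return out


eg_res = residuals(eg_step, (1.0, 1.0), 0.5, 100)
gda_res = residuals(gda_step, (1.0, 1.0), 0.5, 100)

# EG: the residual of the raw iterate shrinks at every step.
# GDA: the raw iterate spirals outward, although its average is better behaved.
print(eg_res[-1], gda_res[-1])
```

For this operator each EG step multiplies the squared norm by $1-\eta^2+\eta^4$, while each descent-ascent step multiplies it by $1+\eta^2$, which is why the two last iterates behave so differently.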

Monotone inclusion problems admit anytime last-iterate bounds for anchoring-only schemes. Proximal Anchored Gradient Descent (P-AGD),

ΔAt\Delta_{A_t}9

with

tt0

satisfies

tt1

hence tt2 for every tt3. The proof combines a resolvent-based rewrite with a boundedness argument and a decay estimate for consecutive differences tt4 (Cai et al., 14 Apr 2026).

Other operator-splitting and game-theoretic settings exhibit linear anytime behavior. For min–max optimization, Hamiltonian Gradient Descent (HGD) minimizes tt5 and, under a PL inequality for tt6, yields

tt7

The same paper proves analogous linear last-iterate convergence for Consensus Optimization under a suitable parameter choice, and extends the theory beyond strongly convex–strongly concave problems to sufficiently bilinear regimes (Abernethy et al., 2019).

Under noisy feedback in tt8-co-coercive games, vanilla stochastic gradient ascent with

tt9

admits a piecewise anytime bound on the gradient residual:

TT0

Choosing TT1 gives the first last-iterate bound under non-vanishing affine-growth noise in this class:

TT2

The same analysis also gives almost sure convergence of the iterates to the Nash equilibrium set and time-average bounds (Chandak et al., 21 Apr 2026).

Monotone mean field games present a split picture. Exact proximal-point iterations converge in the last iterate asymptotically under Lasry–Lions monotonicity:

TT3

For the regularized mirror-descent subroutine used to approximate each proximal step, however, there is a genuine anytime exponential bound:

TT4

Thus exact outer convergence is asymptotic, while the inner approximate solver is uniformly contractive at every finite time (Isobe et al., 2024).

5. Finite-sum, continual, and policy-learning settings

Incremental and shuffled methods were long known mainly through ergodic guarantees, but recent analyses move them into the anytime last-iterate regime. For finite-sum optimization

TT5

Cai and Diakonikolas obtain the first last-iterate guarantees for incremental gradient and incremental proximal methods in general convex smooth settings, and for incremental proximal methods also in convex Lipschitz settings. Their bounds hold for every epoch TT6, and the resulting oracle complexities nearly match the best known average-iterate guarantees up to a square-root-log or log factor. In the continual-learning interpretation, they also argue that a large amount of regularization is crucial to preventing catastrophic forgetting (Cai et al., 2024).

For shuffling-based gradient methods—Random Reshuffle, Shuffle Once, and Incremental Gradient—objective-value last-iterate guarantees are available even without strong convexity. In the Lipschitz convex case, appropriate step-size schedules give the standard subgradient last-iterate rate

TT7

valid for any permutation sequence. In the smooth strongly convex regime, the last iterate attains a nearly sharp TT8 bound for RR/SO, matching known lower bounds up to logarithms (Liu et al., 2024).

Bandits and reinforcement learning formulate the concept most explicitly. ULI requires a high-probability event on which every action played is near-optimal:

TT9

with near-optimal ULI corresponding to KK00 in

KK01

The paper shows that ULI implies near-optimal uniform-PAC and regret guarantees, but not conversely. Elimination-based finite-arm algorithms and a meta-algorithm wrapping a high-probability adversarial learner achieve near-optimal ULI, and an oracle-efficient linear-bandit algorithm obtains

KK02

By contrast, optimistic algorithms such as lil’UCB do not achieve near-optimal ULI (Liu et al., 2024).

In constrained MDPs, the motivation is explicitly deployment-oriented: mixture-policy guarantees are theoretically standard but practically mismatched when a single policy must be deployed. An inexact augmented Lagrangian method therefore targets last-policy convergence. If the augmented-Lagrangian subproblem at outer iteration KK03 is solved to accuracy KK04, then after KK05 outer iterations the final policy satisfies

KK06

and, crucially, the same form holds for the current policy at every outer iteration KK07. With Projected Q-Ascent as the primal oracle, the total number of policy-gradient evaluations is KK08 (Lu et al., 12 May 2026).

Approximate last-iterate convergence also appears in overparameterized GANs. For the Implicit Update dynamics, one has

KK09

and for the Predictive Method,

KK10

In both cases the neighborhood radius shrinks with width as

KK11

so the dynamics exhibit an anytime exponential-plus-bias guarantee rather than exact asymptotic convergence of the raw last iterate (Du, 2021).

6. Privacy, fairness, and recurring limitations

The last-iterate viewpoint extends beyond optimization error. In cyclically sampled DP-SGD on nonconvex composite losses, the released object is only the final or current model, so privacy accounting for the last iterate is the relevant quantity. Under weak-convexity/upper-curvature assumptions and step-size KK12, the Rényi divergence between neighboring runs satisfies, for any stopping time KK13,

KK14

where KK15, KK16, and

KK17

Thus the privacy cost of releasing the current iterate is controlled prefix-wise, without relying on subsampling amplification (Kong et al., 2024).

An analogous prefix-wise perspective appears in perpetual online fairness. In the deficit framework, each round produces deficits KK18 for tracked quality variables, and the goal is to keep all deficits below a slowly growing threshold at every prefix. The KK19-potential rule chooses the action minimizing the next-round potential

KK20

Under the moment conditions in the paper, with fixed KK21 and

KK22

every round KK23 is KK24-fair. In particular,

KK25

A matching lower bound shows that any perpetual guarantee must satisfy

KK26

so KK27 growth is unavoidable in general (Kahana et al., 19 May 2026).

Several recurring misconceptions are therefore incorrect. First, a good averaged iterate does not imply a good current iterate; the smooth saddle-point case gives a sharp KK28 versus KK29 separation (Golowich et al., 2020). Second, horizon-aware optimality does not imply horizon-free optimality; the convex Lipschitz lower bound rules out exact KK30 last-iterate performance for deterministic anytime schedules (Kornowski et al., 15 Apr 2026). Third, “anytime” does not fix a unique metric: different papers control function values, residuals, action gaps, privacy loss, fairness deficits, or neighborhood radii (Attia et al., 15 Jul 2025, Liu et al., 2024, Kong et al., 2024, Kahana et al., 19 May 2026).

This suggests that anytime last-iterate guarantees are best understood as a family of deployment-aligned finite-time guarantees rather than a single theorem schema. Their feasibility and rate depend sharply on structural conditions—interpolation, smoothness, co-coercivity, monotonicity, contraction, regularization, or low effective noise—and on the metric that defines “current performance.”
