Non-Asymptotic Convergence Rate
- Non-asymptotic convergence rate is a finite-sample bound that explicitly quantifies error metrics using defined constants and dependencies on sample size and problem parameters.
- It underpins methodologies in statistical estimation, optimization, and generative modeling, enabling adaptive stopping and precise hyperparameter tuning.
- These explicit bounds offer actionable insights by revealing phase transitions, bias-variance trade-offs, and minimax optimality within practical computational regimes.
A non-asymptotic convergence rate is a quantitative, finite-sample bound on the error or discrepancy between an algorithmic iterate (e.g., estimator, sampled distribution, iterate in optimization) and its statistical or mathematical target. Unlike asymptotic rates, which describe the leading-order scaling as the sample size or number of iterations grows (or as the time step vanishes), non-asymptotic rates provide explicit, fully finite-sample inequalities that specify all constants and dependencies on model parameters and cover the regime of practical computational and statistical interest. These rates are central in modern statistics, machine learning, optimization, high-dimensional probability, and stochastic differential equations, where practitioners demand explicit confidence, risk, or approximation guarantees at the scale realized in experiments and deployments.
1. Formal Definition and Fundamental Role
Given an iterative algorithm producing an output $\hat{\theta}_n$ after $n$ steps or samples (e.g., an estimator, optimizer, or Markov chain), a non-asymptotic convergence rate is a bound of the form
$\mathcal{E}(\hat{\theta}_n) \leq C\, r(n) \quad \text{for all } n \geq 1,$
where $\mathcal{E}$ is an error metric (e.g., excess risk, Wasserstein distance, KL or JS divergence, normed distance to the optimum), $C$ is a constant possibly depending on the dimension $d$, regularity parameters, and problem-specific quantities, and $r(n)$ is a known, typically monotonically decreasing, function of $n$ (e.g., $n^{-1/2}$, $n^{-1}$, $e^{-cn}$). Critically, these bounds are valid for all finite $n$, not just in the limit as $n \to \infty$.
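As a minimal worked instance of such a bound (a standard textbook fact rather than a result from the papers cited below), Hoeffding's inequality already supplies a fully explicit non-asymptotic rate for the sample mean of bounded i.i.d. variables:

```latex
% Hoeffding's inequality as a prototypical non-asymptotic bound:
% for i.i.d. X_1, ..., X_n taking values in [0,1] with mean \mu and any \delta \in (0,1),
\[
  \Pr\!\left( \Big| \tfrac{1}{n}\sum_{i=1}^{n} X_i - \mu \Big|
      \;\ge\; \sqrt{\tfrac{\log(2/\delta)}{2n}} \right) \;\le\; \delta
  \qquad \text{for every } n \ge 1,
\]
% i.e. \mathcal{E}(\hat{\theta}_n) = |\bar{X}_n - \mu| \le C\, r(n) holds with probability
% at least 1 - \delta, with C = \sqrt{\log(2/\delta)/2} fully explicit and r(n) = n^{-1/2}.
```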
The non-asymptotic perspective calibrates algorithmic and statistical decisions in terms of finite resources, exposing phase transitions, the effect of dimension, regularity, model approximation, and stochasticity on practical performance. It is essential for optimal hyperparameter selection, adaptive stopping, and understanding sample or iteration complexity trade-offs.
2. Non-Asymptotic Rates in Statistical Estimation
Non-asymptotic rates are foundational in empirical process theory and nonparametric statistics. For example, in density estimation with vanilla generative adversarial networks (GANs) (Puchkin et al., 2021), under $\beta$-Hölder smoothness of the target density and proper network class choices, the JS divergence between the GAN estimate and the true density is bounded, with probability at least $1-\delta$, by $C\, n^{-2\beta/(2\beta+d)}$ up to logarithmic factors, with explicit constants. This matches the minimax non-asymptotic lower bound for the JS risk, up to logarithmic terms, with all smoothness, dimension, and sample size dependencies completely specified (see Theorem 2 and Theorem 3 in (Puchkin et al., 2021)). Likewise, for the empirical measure $\mu_N$ based on $N$ i.i.d. samples from $\mu$ on $\mathbb{R}^d$, (Fournier, 2022) gives dimension- and moment-explicit, non-asymptotic Wasserstein bounds: $\E[W_p^p(\mu_N,\mu)]^{1/p} \leq \begin{cases} O(N^{-1/(2p)}), & p > d/2,\\ O(N^{-1/(2p)}(\log N)^{1/p}), & p = d/2,\\ O(N^{-1/d}), & p < d/2, \end{cases}$ with all constants made explicit, including for the unbounded-support case with only finite $q$-th moments.
Non-asymptotic rates are thus crucial for benchmarking estimators, adaptive sample-size allocation, and understanding information-theoretic limits outside the asymptotic regime.
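For intuition, the $p > d/2$ regime above can be checked numerically; the following toy sketch (assuming NumPy and SciPy, and using a large auxiliary sample as a stand-in for $\mu$; none of this comes from the cited papers) estimates $\E[W_1(\mu_N,\mu)]$ for a one-dimensional Gaussian and compares it with the predicted $N^{-1/2}$ decay.

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Monte Carlo sketch of E[W_1(mu_N, mu)] for mu = N(0, 1) in dimension d = 1,
# where p = 1 > d/2, so the bound above predicts roughly N^(-1/2) decay.
# A large auxiliary sample serves as a stand-in for the true measure mu.
rng = np.random.default_rng(0)
reference = rng.normal(size=200_000)

for N in (100, 400, 1_600, 6_400):
    dists = [wasserstein_distance(rng.normal(size=N), reference) for _ in range(50)]
    print(f"N = {N:5d}: estimated E[W_1] = {np.mean(dists):.4f}, N^(-1/2) = {N ** -0.5:.4f}")
```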
3. Non-Asymptotic Analysis in Optimization
In optimization, non-asymptotic convergence rates characterize the decrease of error metrics (function gap, gradient norm, proximity to optimal set) in terms of iteration number. For the stochastic proximal point (SPP) method applied to convex objectives under "weak linear regularity," (Patrascu, 2019) shows
$\E[\mathrm{dist}_{X^*}^2(x^k)] \leq O(1/k),$
with all structural constants spelled out, even when the objective is not strongly convex. In the interpolation regime, linear (geometric) rates are achieved: $\E[\mathrm{dist}_{X^*}^2(x^k)] \leq (1-\mu \sigma_{F,\mu})^k\,\mathrm{dist}_{X^*}^2(x^0),$ where all quantities are non-asymptotically explicit.
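For concreteness, the following toy sketch (a hypothetical least-squares instance, not the construction analyzed in (Patrascu, 2019)) runs the stochastic proximal point iteration, whose per-component proximal step has a closed form, on a consistent linear system and prints the squared distance to the solution.

```python
import numpy as np

# Toy stochastic proximal point (SPP) sketch on a consistent (interpolated)
# least-squares problem: f_i(x) = 0.5*(a_i^T x - b_i)^2 with b_i = a_i^T x*.
# The proximal step for a single component has the closed form used below;
# the squared distance to the solution shrinks geometrically, illustrating
# (in this toy setting, not the paper's) the linear rates quoted above.
rng = np.random.default_rng(2)
m, d = 200, 20
A = rng.normal(size=(m, d))
x_star = rng.normal(size=d)
b = A @ x_star                      # interpolation: every f_i is minimized at x_star

x = np.zeros(d)
gamma = 1.0                         # proximal step size
for k in range(1, 2001):
    i = rng.integers(m)
    a_i, b_i = A[i], b[i]
    # prox_{gamma f_i}(x) = x - gamma*(a_i^T x - b_i)/(1 + gamma*||a_i||^2) * a_i
    x = x - gamma * (a_i @ x - b_i) / (1.0 + gamma * a_i @ a_i) * a_i
    if k % 500 == 0:
        print(k, "dist^2 to x*:", np.sum((x - x_star) ** 2))
```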
For quasi-Newton methods, prior theory provided only asymptotic rates or local non-asymptotic rates requiring an initial phase inside a small neighborhood. Recent works break this limitation:
- (Jin et al., 2020): For BFGS/DFP under strong convexity and Lipschitz-continuous Hessians, a local non-asymptotic superlinear contraction of the iterates is established on explicitly characterized neighborhoods, with all constants given.
- (Jin et al., 25 Apr 2024): For BFGS with Armijo-Wolfe line search, the first global non-asymptotic rates are proven, including both linear and superlinear ($O((1/t)^t)$) phases, with transition thresholds explicitly quantified by problem parameters (dimension, condition number, initial Hessian choice).
These results provide strong finite-iteration guarantees, replacing vague asymptotics with concrete complexity bounds.
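The flavor of these guarantees can be observed empirically; the sketch below (an illustrative experiment using SciPy's BFGS implementation, not a reproduction of the cited analyses) minimizes a strongly convex, smooth function and records the gradient norm after each iteration, whose tail decays superlinearly once the iterates enter the local regime.

```python
import numpy as np
from scipy.optimize import minimize

# Run BFGS on a strongly convex objective with Lipschitz-continuous Hessian and
# log the gradient norm per iteration to observe the superlinear tail.
def f(x):
    return 0.5 * x @ x + np.log(1 + np.exp(x[0]))  # strongly convex, smooth

def grad_f(x):
    g = x.copy()
    g[0] += 1.0 / (1.0 + np.exp(-x[0]))            # d/dx log(1+exp(x)) = sigmoid(x)
    return g

grad_norms = []
res = minimize(
    f, x0=np.full(5, 3.0), jac=grad_f, method="BFGS",
    callback=lambda xk: grad_norms.append(np.linalg.norm(grad_f(xk))),
    options={"gtol": 1e-10},
)
print(res.nit, "iterations; gradient norms per iteration:")
print(np.array(grad_norms))
```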
4. Non-Asymptotic Stochastic Approximation and Learning
In stochastic approximation, non-asymptotic rates govern the finite-sample performance of online algorithms:
- In stochastic gradient descent (SGD), non-asymptotic rates for Polyak-Ruppert averaging (Anastasiou et al., 2019) demonstrate that, for strongly convex quadratic objectives, the law of the normalized averaged iterate converges to its Gaussian limit at an explicitly quantified, dimension-dependent rate, with all constants specified for practical confidence-interval construction.
- In streaming and mini-batch SGD (Godichon-Baggioni et al., 2021), non-asymptotic rates interpolate between the slower rates of pure SGD and the optimal $O(1/n)$ rate, whose leading constant is given by the Cramér-Rao bound and is attained upon Polyak-Ruppert averaging, for arbitrary batch-growth schedules and under minimal convexity assumptions (see the averaging sketch below).
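The effect of averaging can be seen in a toy simulation; the sketch below (a hypothetical strongly convex quadratic example assuming only NumPy, not the streaming setting of the cited works) compares the last SGD iterate with its Polyak-Ruppert average.

```python
import numpy as np

# SGD with Polyak-Ruppert averaging on the strongly convex quadratic
# 0.5*E[(a^T theta - b)^2]: the averaged iterate typically attains a smaller
# error than the raw iterate with a slowly decaying step size, consistent with
# the O(1/n) averaged risk discussed above (toy illustration only).
rng = np.random.default_rng(1)
d, n_steps, noise = 5, 200_000, 0.5
theta_star = rng.normal(size=d)

theta = np.zeros(d)
theta_bar = np.zeros(d)
for n in range(1, n_steps + 1):
    a = rng.normal(size=d)
    b = a @ theta_star + noise * rng.normal()
    grad = (a @ theta - b) * a                # stochastic gradient of 0.5*(a^T theta - b)^2
    theta -= 0.2 / (1 + n) ** 0.75 * grad     # slowly decaying Robbins-Monro step size
    theta_bar += (theta - theta_bar) / n      # running Polyak-Ruppert average

print("raw iterate error     :", np.linalg.norm(theta - theta_star))
print("averaged iterate error:", np.linalg.norm(theta_bar - theta_star))
```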
Adaptive algorithms such as AdaGrad (Jin et al., 8 Sep 2024, Liu et al., 2022) are analyzed via stopping-time techniques and division-by-adaptive step-size arguments, yielding near-optimal rates
$\frac{1}{T} \sum_{n=1}^T \E\|\nabla g(\theta_n)\|^2 \leq O\left(\frac{\log T}{\sqrt{T}}\right)$
in the general nonconvex smooth stochastic setting, with correspondingly faster rates in convex and quasar-convex settings, again with universal (not merely asymptotic) validity.
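A minimal AdaGrad-norm sketch (an illustrative variant assuming only NumPy; the cited papers analyze more general adaptive schemes) shows the quantity bounded above, the running average of squared gradient norms, alongside the $\log T/\sqrt{T}$ reference scale.

```python
import numpy as np

# AdaGrad-norm on a smooth nonconvex test function with noisy gradients,
# tracking the running average of squared gradient norms as in the
# O(log T / sqrt(T)) bounds quoted above (toy illustration only).
rng = np.random.default_rng(3)

def grad(theta, noise=0.1):
    """Stochastic gradient of the nonconvex test function sum(theta^2 / (1 + theta^2))."""
    g = 2 * theta / (1 + theta**2) ** 2
    return g + noise * rng.normal(size=theta.shape)

theta = rng.normal(size=10)
eta, b2 = 1.0, 1e-8                           # base step size and initial accumulator
avg_sq_grad_norm, T = 0.0, 10_000
for t in range(1, T + 1):
    g = grad(theta)
    b2 += np.sum(g**2)                        # AdaGrad-norm accumulator
    theta -= eta / np.sqrt(b2) * g            # adaptive step
    avg_sq_grad_norm += (np.sum(g**2) - avg_sq_grad_norm) / t

print("average squared gradient norm over T steps:", avg_sq_grad_norm)
print("reference scale log(T)/sqrt(T):", np.log(T) / np.sqrt(T))
```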
5. Non-Asymptotic Rates for Generative Modeling and Sampling
Non-asymptotic error control for MCMC, diffusion-based generative models, and Langevin Monte Carlo underpins the design of accurate samplers in high dimensions. Recent research gives fully explicit, dimension-dependent finite-sample bounds:
- For vanilla Euler schemes with non-log-concave targets, (Majka et al., 2018) establishes explicit Wasserstein bounds that combine a geometrically decaying initialization term with a discretization bias term polynomial in the step size, under contractivity at infinity, with constants analytic in the problem parameters.
- Accelerated high-order Langevin algorithms (Neufeld et al., 9 May 2024) achieve, for gradients of Hölder regularity exponent $q$, Wasserstein bounds whose step-size dependence improves over the traditional bias barrier of first-order discretizations. All constants and their high-degree dimension dependencies are explicit.
- In discrete-time diffusion models, non-asymptotic bounds in total variation scale as $O(1/T)$ for deterministic samplers and $O(1/\sqrt{T})$ for stochastic samplers, with higher-order accelerations yielding $O(1/T^2)$ and $O(1/T)$, respectively (Li et al., 2023).
Such results are fundamental to sample-complexity analysis and to tuning step sizes, iteration counts, and discretization levels, and they reveal when additional regularity or algorithmic structure yields polynomial improvements.
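The role of the step size in such discretization bounds can be illustrated with the simplest possible example; the sketch below (a one-dimensional Gaussian toy, not one of the cited algorithms) runs the unadjusted Langevin scheme and compares the sampled variance with the exactly computable biased value $1/(1-h/2)$.

```python
import numpy as np

# Unadjusted Langevin (ULA) sketch for a 1D standard Gaussian target.
# For U(x) = x^2/2 the chain x_{k+1} = x_k - h*U'(x_k) + sqrt(2h)*xi_k is an
# AR(1) process whose stationary variance is 1/(1 - h/2), so the bias in the
# variance is O(h): halving the step size roughly halves the error.
rng = np.random.default_rng(4)

def ula_variance(h, n_steps=200_000, burn_in=10_000):
    x, samples = 0.0, []
    for k in range(n_steps):
        x = x - h * x + np.sqrt(2 * h) * rng.normal()
        if k >= burn_in:
            samples.append(x)
    return np.var(samples)

for h in (0.2, 0.1, 0.05):
    print(f"step size {h}: sampled variance {ula_variance(h):.3f} "
          f"(target 1.0, predicted {1 / (1 - h / 2):.3f})")
```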
6. Near-Optimality, Minimaxity, and Limitations
A principal benchmark for non-asymptotic rates is statistical optimality—matching known minimax lower bounds (often up to logarithmic factors or constants). Examples include:
- GAN density estimation with the JS divergence achieving the classical minimax rate $n^{-2\beta/(2\beta+d)}$ for $\beta$-smooth densities (Puchkin et al., 2021).
- Empirical measure Wasserstein convergence achieving the Dudley/Ajtai-Komlós-Tusnády/Dobrić-Yukich lower bounds (Fournier, 2022).
- Contrastive Divergence for exponential families attaining the parametric $O(n^{-1/2})$ risk rate (rather than a slower, suboptimal rate), with iterate-averaged CD saturating the asymptotic Cramér-Rao bound up to a factor of 4 (Glaser et al., 15 Oct 2025).
In algorithmic settings, non-asymptotic rates inform both theoretical and practical design, revealing the exact conditions under which optimality is achieved, and where gaps (bias, variance, constants, dimension) may persist.
The main limitations in current work are often the size of the explicit constants (sometimes exponential in the dimension $d$), possible conservatism in allowable step sizes, and the required regularity (e.g., Hölder smoothness assumptions in sampling). Recent theory continues to close these gaps, with increasing use of probabilistic comparison principles, coupling, and empirical process theory.
7. Typical Methodological Ingredients and Proof Structure
Non-asymptotic analysis requires:
- Construction of Lyapunov/potential functions guaranteeing uniform moment or risk control,
- Explicit martingale and empirical process inequalities (Bernstein, Talagrand, Lindeberg–Stein),
- Tight control of bias-variance trade-off and dependencies on initialization or parameterization,
- Discretization error bounds in numerical methods (often via interpolation and tailored error decomposition),
- Metric contraction and stability analysis under weak regularity assumptions.
A striking feature of recent advances is the explicit, end-to-end path from model assumptions through algorithmic structure to finite-sample inequalities, making the entire analytic pipeline transparent and actionable for researchers.
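Schematically (an illustrative template rather than a theorem from any one cited paper), the ingredients listed above are assembled into a finite-sample decomposition of the following kind:

```latex
% A typical target of the proof pipeline described above: a finite-n error
% decomposition in which each term is controlled by one of the listed ingredients.
\[
  \mathcal{E}(\hat{\theta}_n)
  \;\le\;
  \underbrace{C_0\,\rho^{\,n}}_{\text{initialization (Lyapunov contraction)}}
  \;+\;
  \underbrace{C_1\, h^{\alpha}}_{\text{discretization / approximation bias}}
  \;+\;
  \underbrace{C_2\, \sqrt{\tfrac{\log(1/\delta)}{n}}}_{\text{stochastic fluctuation (concentration)}},
  \qquad \text{with probability at least } 1-\delta,
\]
% where \rho \in (0,1), the step size h, the exponent \alpha, and the constants
% C_0, C_1, C_2 are all made explicit in terms of the problem parameters.
```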
Table: Non-Asymptotic Rates in Selected Domains
| Context | Error Metric / Rate | Reference |
|---|---|---|
| GAN density estimation, $\beta$-smooth density | JS divergence $\lesssim n^{-2\beta/(2\beta+d)}$ (up to log factors) | (Puchkin et al., 2021) |
| Wasserstein empirical convergence | $\E[W_p^p(\mu_N,\mu)]^{1/p}$: $O(N^{-1/(2p)})$ to $O(N^{-1/d})$ (explicit in $d$, $p$, moments) | (Fournier, 2022) |
| SPP in convex optimization | $\E[\mathrm{dist}_{X^*}^2(x^k)] \leq O(1/k)$ (distance to solution set) | (Patrascu, 2019) |
| Polyak-Ruppert averaged SGD | explicit, dimension-dependent normal-approximation error (CI construction) | (Anastasiou et al., 2019) |
| BFGS/Quasi-Newton (local superlinear) | explicit superlinear contraction on explicit neighborhoods | (Jin et al., 2020) |
| BFGS (global, Armijo-Wolfe) | linear phase, then superlinear $O((1/t)^t)$ after an explicit threshold | (Jin et al., 25 Apr 2024) |
| AdaGrad for non-convex SGD | $O(\log T/\sqrt{T})$ (avg. squared gradient norm) | (Jin et al., 8 Sep 2024) |
| Diffusion-based models | $O(1/T)$ (deterministic), $O(1/\sqrt{T})$ (stochastic); $O(1/T^2)$, $O(1/T)$ accelerated | (Li et al., 2023) |
| aHOLA (Wasserstein-1, nonconvex) | improved step-size dependence under Hölder regularity, explicit constants | (Neufeld et al., 9 May 2024) |
| Contrastive Divergence (CD) | $O(n^{-1/2})$ (parametric rate, near-optimal) | (Glaser et al., 15 Oct 2025) |
These results reflect the breadth and depth of non-asymptotic convergence theory in modern mathematical data science.