
Wasserstein Convergence Guarantees

Updated 23 November 2025
  • Wasserstein convergence guarantees are quantitative bounds on the rate at which empirical measures, suitably smoothed and adapted, converge to their target laws.
  • The framework leverages bi-causal couplings and adapted projections to achieve nonasymptotic, dimension-sensitive convergence rates with exponential deviation bounds.
  • These guarantees underpin robust statistical estimation and high-dimensional stochastic modeling, enhancing optimal transport in path-dependent settings.

Wasserstein convergence guarantees refer to quantitative, rate-optimal bounds in the Wasserstein metric between probability measures arising in statistical estimation, stochastic processes, sampling algorithms, and robust learning. These guarantees underpin both classical and cutting-edge probabilistic models, establishing precise rates at which empirical distributions, statistical estimators, or generated samples approach target distributions under various regularity regimes. The following presents the central definitions, convergence theory, metric reductions, and statistical and algorithmic consequences elucidated in recent literature, with particular emphasis on deep path-dependent settings and high-dimensional constructions.

1. Definitions and Core Constructions

The Wasserstein distance of order $p \geq 1$ between probability measures $\mu, \nu$ on a Polish path space (typically $\mathbb{R}^{dT}$ for path-dependent processes) is

$W_p(\mu, \nu) = \left(\inf_{\pi \in \mathrm{Cpl}(\mu,\nu)} \int \|x-y\|^p \, \pi(dx,dy)\right)^{1/p},$

with $\mathrm{Cpl}(\mu,\nu)$ the set of all couplings. In stochastic optimization and financial applications requiring pathwise information, the adapted Wasserstein distance $AW_p$ restricts attention to bi-causal couplings, denoted $\mathrm{Cpl}^{\mathrm{bc}}(\mu,\nu)$. The adapted Wasserstein distance is thus

$AW_p(\mu, \nu) = \left(\inf_{\pi \in \mathrm{Cpl}^{\mathrm{bc}}(\mu, \nu)} \int_{\mathbb{R}^{dT}\times\mathbb{R}^{dT}} \|x-y\|^p \,\pi(dx,dy) \right)^{1/p}.$

Empirical measure convergence under $AW_p$ fails without smoothing due to the discontinuity of optimal transport maps in the path-adapted setting (Hou, 26 Jan 2024).
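As a concrete baseline, the classical (non-adapted) $W_1$ between two empirical measures admits a closed form in one dimension via the monotone coupling. The following pure-Python sketch (a hypothetical helper, not from the paper) illustrates the definition that the adapted distance then restricts:

```python
import random

def w1_empirical_1d(xs, ys):
    """W_1 between two empirical measures with equal sample counts.

    In one dimension the optimal coupling sorts both samples and pairs
    them in order (monotone rearrangement), so W_1 is the mean absolute
    difference of the order statistics."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

random.seed(0)
mu_samples = [random.gauss(0.0, 1.0) for _ in range(2000)]
nu_samples = [random.gauss(0.5, 1.0) for _ in range(2000)]
print(w1_empirical_1d(mu_samples, nu_samples))  # close to the mean shift 0.5
```

In the adapted setting, restricting to bi-causal couplings can only increase this value, since the infimum runs over a smaller set.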

Remedies involve two fundamental smoothing and discretization strategies:

  • Kernel-smoothed empirical measures: For $K$ a smooth density and $h > 0$,

$\hat\mu_n^{(h)} = \frac{1}{n}\sum_{i=1}^{n} \delta_{X^{(i)}} * K_h = \mu^n * K_h,$

where $K_h(dx) = h^{-dT} K(x/h)\,dx$ and $\mu^n$ is the empirical measure.

  • Adapted smoothed empirical measures: To preserve discreteness, project i.i.d. noise-shifted samples onto a grid with vanishing mesh $\Delta_n$: $\hat\mu_n^{\mathrm{ad},(h)} = \frac{1}{n} \sum_{i=1}^{n} \delta_{\hat\varphi^n(X^{(i)} + h \epsilon^{(i)})},$ with $\hat\varphi^n$ the grid projection and $\epsilon^{(i)}$ i.i.d. Gaussian noise.
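A minimal one-dimensional sketch of the two constructions ($d = T = 1$, Gaussian kernel, and nearest-grid-point rounding standing in for the projection $\hat\varphi^n$; all helper names are illustrative, not from the paper):

```python
import random

def kernel_smoothed_samples(samples, h, n_draws):
    """Draw from the kernel-smoothed empirical measure mu^n * K_h with a
    Gaussian kernel: pick a data point uniformly at random, then add
    h-scaled Gaussian noise."""
    return [random.choice(samples) + h * random.gauss(0.0, 1.0)
            for _ in range(n_draws)]

def adapted_smoothed_measure(samples, h, mesh):
    """Adapted smoothed empirical measure: shift every sample by h-scaled
    Gaussian noise, then snap it to a grid of width `mesh` (nearest-point
    rounding), so the support stays discrete."""
    return [mesh * round((x + h * random.gauss(0.0, 1.0)) / mesh)
            for x in samples]

random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(5)]
print(adapted_smoothed_measure(data, h=0.1, mesh=0.25))  # grid points only
```

The kernel-smoothed measure has continuous support, while the adapted variant keeps an atomic support on the grid, which is what makes bi-causal couplings tractable.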

2. Main Wasserstein Convergence Theorems: Nonasymptotic Rates

For general empirical processes or estimation settings, the following results state dimension-dependent convergence rates in the adapted Wasserstein distance under mild moment and smoothness conditions (Hou, 26 Jan 2024):

(A) Kernel-Smoothed Measures

For $h_n = n^{-r}$ with $r = (dT+2)^{-1}$,

  • $\mathbb{E}\left[AW_1(\mu, \hat\mu_n^{(h_n)})\right] \leq C n^{-r}$,
  • $\mathbb{P}\left(AW_1(\mu, \hat\mu_n^{(h_n)}) \geq x + C n^{-r}\right) \leq \exp(-c n^{1-2r} x^2)$,
  • $AW_1(\mu, \hat\mu_n^{(h_n)}) \to 0$ almost surely.

(B) Adapted Smoothed Empirical Measures

With the mesh $h_n = n^{-r'}$, where $r' = 1/(dT)$ for $dT \geq 3$ and $r' = 1/((d+1)T)$ for $dT = 1, 2$,

  • $\mathbb{E}\left[AW_1(\mu, \hat\mu_n^{\mathrm{ad},(h_n)})\right] \leq C n^{-r'}$,
  • $\mathbb{P}\left(AW_1(\mu, \hat\mu_n^{\mathrm{ad},(h_n)}) \geq x\right) \leq C M \exp(-c n x^2)$,
  • $AW_1(\mu, \hat\mu_n^{\mathrm{ad},(h_n)}) \to 0$ almost surely.

The bandwidth choices $h_n \asymp n^{-r}$ (resp. $n^{-r'}$) optimize the bias-variance tradeoff: the smoothing bias $O(h_n)$ and the statistical error of the smoothed empirical measure intersect at these rates. The resulting exponents are governed by the joint path dimension $dT$; a fully dimension-free $O(n^{-1/2})$ rate is recovered only under fixed-bandwidth smoothing, as discussed below.
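The rate exponents above are simple functions of $d$ and $T$; a small helper (illustrative, not from the paper) makes the regimes explicit:

```python
def kernel_rate_exponent(d, T):
    """Exponent r in h_n = n^{-r} for the kernel-smoothed estimator:
    r = 1/(dT + 2)."""
    return 1.0 / (d * T + 2)

def adapted_rate_exponent(d, T):
    """Exponent r' for the adapted-projected estimator:
    1/(dT) when dT >= 3, and 1/((d+1)T) when dT = 1 or 2."""
    dT = d * T
    return 1.0 / dT if dT >= 3 else 1.0 / ((d + 1) * T)

# Rates slow down as the joint path dimension dT grows.
for d, T in [(1, 1), (1, 3), (2, 5)]:
    print(d, T, kernel_rate_exponent(d, T), adapted_rate_exponent(d, T))
```

For instance, a scalar process over $T = 3$ time steps gives kernel-smoothed rate $n^{-1/5}$ and adapted-projected rate $n^{-1/3}$.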

3. Metric Domination and Total Variation Reduction

A central technical tool is metric domination, which bounds the adapted Wasserstein distance in terms of a weighted total variation: $AW_1(\mu, \nu) \leq \bigl((3+4\alpha)^T - 1\bigr)\, TV_1(\mu, \nu),$ where $TV_1(\mu, \nu) = \int (\|x\| + \tfrac12)\, |\mu-\nu|(dx)$ and $\alpha$ is a higher conditional moment bound (Hou, 26 Jan 2024).

Applying smoothing with a fixed kernel $K_\sigma$ ($\sigma > 0$),

$\mathbb{E}[TV_1(\mu*K_\sigma, \mu^n*K_\sigma)] = O(n^{-1/2}),$

$\mathbb{E}[AW_1(\mu*K_\sigma, \mu^n*K_\sigma)] = O(n^{-1/2}).$
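Note that the domination constant $(3+4\alpha)^T - 1$ grows exponentially in the horizon $T$, so the bound is most informative for short horizons; a quick numeric check (with illustrative values of $\alpha$):

```python
def domination_constant(T, alpha):
    """Constant ((3 + 4*alpha)^T - 1) multiplying TV_1 in the metric
    domination bound AW_1(mu, nu) <= ((3 + 4*alpha)^T - 1) * TV_1(mu, nu)."""
    return (3 + 4 * alpha) ** T - 1

# Exponential growth in the horizon T (here with alpha = 1):
for T in (1, 2, 5):
    print(T, domination_constant(T, alpha=1))  # 6, 48, 16806
```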

This metric reduction is key for concentration inequalities and enables coupling arguments at each path time-layer, leveraging McDiarmid’s inequality and concentration of measure results for empirical processes.
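A heuristic Monte Carlo illustration of the fixed-bandwidth $O(n^{-1/2})$ behavior in $d = 1$: rather than comparing against the true smoothed law, the sketch below compares two independent smoothed samples of size $n$, which serves as a proxy up to a constant factor (all names and parameter choices are illustrative, not from the paper):

```python
import random

def w1_sorted(xs, ys):
    # 1D W_1 between equal-size empirical measures (monotone coupling).
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

def smoothed_error(n, sigma, trials=30):
    """Average W_1 distance between two independent size-n samples from
    mu * K_sigma with mu = N(0, 1): a proxy (up to roughly a factor of 2)
    for E[W_1(mu * K_sigma, mu^n * K_sigma)]."""
    total = 0.0
    for _ in range(trials):
        a = [random.gauss(0.0, 1.0) + sigma * random.gauss(0.0, 1.0)
             for _ in range(n)]
        b = [random.gauss(0.0, 1.0) + sigma * random.gauss(0.0, 1.0)
             for _ in range(n)]
        total += w1_sorted(a, b)
    return total / trials

random.seed(2)
err_small_n = smoothed_error(50, sigma=0.5)
err_large_n = smoothed_error(3200, sigma=0.5)
print(err_small_n, err_large_n)  # the error shrinks as n grows
```

With $n$ multiplied by 64, an $n^{-1/2}$ rate predicts the error shrinking by roughly a factor of 8.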

4. Proof Architecture: Bandwidth Tradeoff and Bi-causal Projections

Convergence analysis proceeds by:

  1. Bandwidth stability: Under a Lipschitz kernel, $AW_1(\mu, \mu*K_h) \leq C_L h$ with $C_L$ independent of dimension, and general kernels yield $AW_1(\mu, \mu*K_h) \to 0$ as $h \to 0$ (Hou, 26 Jan 2024).
  2. Adapted projection: Adding noise then projecting onto a finely spaced grid ensures the support of the measure remains discrete while maintaining bi-causality and convexity of the set of adapted measures. Broader averaging over independent grid shifts eliminates support collisions.
  3. Almost-sure convergence: Exponential deviation bounds, with exponents scaling as $n^{1-2r}$ (or $n$), yield summable tails and almost-sure rates via Borel–Cantelli.
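Step 3 hinges on summability: for $r < 1/2$ the exponent $n^{1-2r}$ grows with $n$, so the partial sums of $\exp(-c\, n^{1-2r} x^2)$ stabilize quickly and Borel–Cantelli applies. A numeric sanity check with illustrative constants:

```python
import math

def tail_sum(c, x, r, n_max=10000):
    """Partial sum of exp(-c * n^(1-2r) * x^2) over n = 1..n_max.
    For r < 1/2 the exponent grows in n, so the series converges."""
    return sum(math.exp(-c * n ** (1 - 2 * r) * x * x)
               for n in range(1, n_max + 1))

# Partial sums stabilize: the series of deviation probabilities is summable.
print(tail_sum(c=1.0, x=0.5, r=0.2))
```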

The resulting rates generalize empirical-Wasserstein theory to adapted, path-dependent problems, with convergence orders controlled by the joint path dimension $dT$.

5. Statistical Implications and Connections to Broader Theory

These results establish sharp Wasserstein convergence guarantees for (i) stochastic optimization, (ii) pricing and hedging under uncertainty, and (iii) sequential learning, where path-dependent structures and information constraints are critical. Empirical measures without smoothing admit no general convergence under $AW_1$, but the smoothed/bi-causal procedures restore classical empirical process rates with explicit, nonasymptotic dimension dependence.

Compared to classical $W_p$ bounds (see, e.g., Goldfeld et al., 2020; Nietert et al., 2022), which in high dimensions suffer curse-of-dimensionality rates $n^{-1/d}$ unless smoothed, these results trade ambient-dimension dependence for explicit dependence on the path dimension $dT$, and recover dimension-free $O(n^{-1/2})$ rates under fixed-bandwidth smoothing, leveraging the structure of adaptedness.

Furthermore, the metric domination bridge allows adaptation of total variation and entropy-based statistical machinery to pathwise optimal transport, opening new avenues for quantitative analysis of robust and sequential models that rely on adapted couplings.

6. Adapted Wasserstein Convergence in Context

Method | Convergence rate | Regularity requirements
Kernel-smoothed empirical $AW_1$ | $O(n^{-1/(dT+2)})$ | Finite moments, Lipschitz kernel
Adapted-projected smoothed empirical | $O(n^{-1/(dT)})$ for $dT \geq 3$ | Compactness, exponential moments, smoothing
Unsmoothed empirical (pathwise) $AW_1$ | No general convergence | n/a

The approach generalizes:

  • Classic $W_1$ empirical measure convergence ($O(n^{-1/d})$) to path-dependent and bi-causal settings.
  • Sliced/smoothed Wasserstein and robust estimation regimes, as in (Nietert et al., 2022), by quantifying the smoothing-variance interplay and restoring high-dimensional reliability.

7. Broader Significance and Future Directions

The analysis established in (Hou, 26 Jan 2024) resolves longstanding bottlenecks related to statistical non-convergence of empirical measures in the adapted Wasserstein framework and rigorously quantifies the effectiveness of kernel smoothing and adapted discretizations. The results:

  • Enable robust quantitative calibration and uncertainty quantification in time-adapted stochastic control, optimal stopping, and financial risk assessment.
  • Provide a blueprint for extending empirical process theory to increasingly complex, high-dimensional, and non-Markovian dynamical systems.
  • Suggest future research directions in adaptivity-aware optimal transport, including adaptive grid construction, bandwidth selection, and sequential empirical process control.
  • Generalize to scalability contexts relevant for high-dimensional generative modeling, distributional reinforcement learning, and robust statistics.

These developments establish foundational, nonasymptotic, and algorithmically meaningful rates for Wasserstein convergence in adapted and smoothed empirical analysis, integrating classical statistical convergence, modern pathwise transport, and high-dimensional probability.
