
Wasserstein Convergence Guarantees

Updated 23 November 2025
  • Wasserstein convergence guarantees are quantitative bounds on the rate at which empirical measures, suitably smoothed and adapted, converge to their target laws.
  • The framework leverages bi-causal couplings and adapted projections to achieve nonasymptotic, dimension-sensitive convergence rates with exponential deviation bounds.
  • These guarantees underpin robust statistical estimation and high-dimensional stochastic modeling, enhancing optimal transport in path-dependent settings.

Wasserstein convergence guarantees refer to quantitative, rate-optimal bounds in the Wasserstein metric between probability measures arising in statistical estimation, stochastic processes, sampling algorithms, and robust learning. These guarantees underpin both classical and cutting-edge probabilistic models, establishing precise rates at which empirical distributions, statistical estimators, or generated samples approach target distributions under various regularity regimes. The following presents the central definitions, convergence theory, metric reductions, and statistical and algorithmic consequences elucidated in recent literature, with particular emphasis on deep path-dependent settings and high-dimensional constructions.

1. Definitions and Core Constructions

The Wasserstein distance of order $p \geq 1$ between probability measures $\mu, \nu$ on a Polish path space (typically $\mathbb{R}^{dT}$ for path-dependent processes) is

$W_p(\mu, \nu) = \left(\inf_{\pi \in \mathrm{Cpl}(\mu,\nu)} \int \|x-y\|^p \, \pi(dx,dy)\right)^{1/p},$

with $\mathrm{Cpl}(\mu,\nu)$ the set of all couplings. In stochastic optimization and financial applications requiring pathwise information, the adapted Wasserstein distance $AW_p$ restricts attention to bi-causal couplings, denoted $\mathrm{Cpl}^{\mathrm{bc}}(\mu,\nu)$. The adapted Wasserstein distance is thus

$AW_p(\mu, \nu) = \left(\inf_{\pi \in \mathrm{Cpl}^{\mathrm{bc}}(\mu, \nu)} \int_{\mathbb{R}^{dT}\times\mathbb{R}^{dT}} \|x-y\|^p \,\pi(dx,dy) \right)^{1/p}.$

Empirical measure convergence under $AW_p$ fails without smoothing due to the discontinuity of optimal transport maps in the path-adapted setting (Hou, 26 Jan 2024).
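As a concrete baseline, the classical (non-adapted) $W_1$ between two empirical measures admits a closed form in one dimension via the monotone coupling. The following pure-Python sketch (a hypothetical helper, not from the paper) illustrates the definition that the adapted distance then restricts:

```python
import random

def w1_empirical_1d(xs, ys):
    """W_1 between two empirical measures with equal sample counts.

    In one dimension the optimal coupling sorts both samples and pairs
    them in order (monotone rearrangement), so W_1 is the mean absolute
    difference of the order statistics."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

random.seed(0)
mu_samples = [random.gauss(0.0, 1.0) for _ in range(2000)]
nu_samples = [random.gauss(0.5, 1.0) for _ in range(2000)]
print(w1_empirical_1d(mu_samples, nu_samples))  # close to the mean shift 0.5
```

In the adapted setting, restricting to bi-causal couplings can only increase this value, since the infimum runs over a smaller set.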

Remedies involve two fundamental smoothing and discretization strategies:

  • Kernel-smoothed empirical measures: For $K$ a smooth density and $h > 0$,

$\hat\mu_n^{(h)} = \frac{1}{n}\sum_{i=1}^{n} \delta_{X^{(i)}} * K_h = \mu^n * K_h,$

where $K_h(dx) = h^{-dT} K(x/h)\,dx$ and $\mu^n$ is the empirical measure.

  • Adapted smoothed empirical measures: To preserve discreteness, project i.i.d. noise-shifted samples onto a grid with vanishing mesh $\Delta_n$: $\hat\mu_n^{\mathrm{ad},(h)} = \frac{1}{n} \sum_{i=1}^{n} \delta_{\hat\varphi^n(X^{(i)} + h \epsilon^{(i)})},$ with $\hat\varphi^n$ the grid projection and $\epsilon^{(i)}$ i.i.d. Gaussian noise.
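A minimal one-dimensional sketch of the two constructions ($d = T = 1$, Gaussian kernel, and nearest-grid-point rounding standing in for the projection $\hat\varphi^n$; all helper names are illustrative, not from the paper):

```python
import random

def kernel_smoothed_samples(samples, h, n_draws):
    """Draw from the kernel-smoothed empirical measure mu^n * K_h with a
    Gaussian kernel: pick a data point uniformly at random, then add
    h-scaled Gaussian noise."""
    return [random.choice(samples) + h * random.gauss(0.0, 1.0)
            for _ in range(n_draws)]

def adapted_smoothed_measure(samples, h, mesh):
    """Adapted smoothed empirical measure: shift every sample by h-scaled
    Gaussian noise, then snap it to a grid of width `mesh` (nearest-point
    rounding), so the support stays discrete."""
    return [mesh * round((x + h * random.gauss(0.0, 1.0)) / mesh)
            for x in samples]

random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(5)]
print(adapted_smoothed_measure(data, h=0.1, mesh=0.25))  # grid points only
```

The kernel-smoothed measure has continuous support, while the adapted variant keeps an atomic support on the grid, which is what makes bi-causal couplings tractable.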

2. Main Wasserstein Convergence Theorems: Nonasymptotic Rates

For general empirical processes or estimation settings, the following results state dimension-dependent convergence rates in the adapted Wasserstein distance under mild moment and smoothness conditions (Hou, 26 Jan 2024):

(A) Kernel-Smoothed Measures

For $h_n = n^{-r}$ with $r = (dT+2)^{-1}$,

  • $\mathbb{E}\left[AW_1(\mu, \hat\mu_n^{(h_n)})\right] \leq C n^{-r}$,
  • $\mathbb{P}\left(AW_1(\mu, \hat\mu_n^{(h_n)}) \geq x + C n^{-r}\right) \leq \exp(-c n^{1-2r} x^2)$,
  • $AW_1(\mu, \hat\mu_n^{(h_n)}) \to 0$ almost surely.

(B) Adapted Smoothed Empirical Measures

With the mesh $h_n = n^{-r'}$, where $r' = 1/(dT)$ for $dT \geq 3$ and $r' = 1/((d+1)T)$ for $dT = 1, 2$,

  • $\mathbb{E}\left[AW_1(\mu, \hat\mu_n^{\mathrm{ad},(h_n)})\right] \leq C n^{-r'}$,
  • $\mathbb{P}\left(AW_1(\mu, \hat\mu_n^{\mathrm{ad},(h_n)}) \geq x\right) \leq C M \exp(-c n x^2)$,
  • $AW_1(\mu, \hat\mu_n^{\mathrm{ad},(h_n)}) \to 0$ almost surely.

The bandwidth choices $h_n \asymp n^{-r}$ (resp. $n^{-r'}$) optimize the bias-variance tradeoff: the smoothing bias $O(h_n)$ and the statistical error of the smoothed empirical measure intersect at these rates. The resulting exponents are governed by the joint path dimension $dT$; a fully dimension-free $O(n^{-1/2})$ rate is recovered only under fixed-bandwidth smoothing, as discussed below.
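The rate exponents above are simple functions of $d$ and $T$; a small helper (illustrative, not from the paper) makes the regimes explicit:

```python
def kernel_rate_exponent(d, T):
    """Exponent r in h_n = n^{-r} for the kernel-smoothed estimator:
    r = 1/(dT + 2)."""
    return 1.0 / (d * T + 2)

def adapted_rate_exponent(d, T):
    """Exponent r' for the adapted-projected estimator:
    1/(dT) when dT >= 3, and 1/((d+1)T) when dT = 1 or 2."""
    dT = d * T
    return 1.0 / dT if dT >= 3 else 1.0 / ((d + 1) * T)

# Rates slow down as the joint path dimension dT grows.
for d, T in [(1, 1), (1, 3), (2, 5)]:
    print(d, T, kernel_rate_exponent(d, T), adapted_rate_exponent(d, T))
```

For instance, a scalar process over $T = 3$ time steps gives kernel-smoothed rate $n^{-1/5}$ and adapted-projected rate $n^{-1/3}$.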

3. Metric Domination and Total Variation Reduction

A central technical tool is metric domination, which bounds the adapted Wasserstein distance in terms of a weighted total variation: $AW_1(\mu, \nu) \leq \bigl((3+4\alpha)^T - 1\bigr)\, TV_1(\mu, \nu),$ where $TV_1(\mu, \nu) = \int (\|x\| + \tfrac12)\, |\mu-\nu|(dx)$ and $\alpha$ is a higher conditional moment bound (Hou, 26 Jan 2024).

Applying smoothing with a fixed kernel $K_\sigma$ ($\sigma > 0$),

$\mathbb{E}[TV_1(\mu*K_\sigma, \mu^n*K_\sigma)] = O(n^{-1/2}),$

$\mathbb{E}[AW_1(\mu*K_\sigma, \mu^n*K_\sigma)] = O(n^{-1/2}).$
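Note that the domination constant $(3+4\alpha)^T - 1$ grows exponentially in the horizon $T$, so the bound is most informative for short horizons; a quick numeric check (with illustrative values of $\alpha$):

```python
def domination_constant(T, alpha):
    """Constant ((3 + 4*alpha)^T - 1) multiplying TV_1 in the metric
    domination bound AW_1(mu, nu) <= ((3 + 4*alpha)^T - 1) * TV_1(mu, nu)."""
    return (3 + 4 * alpha) ** T - 1

# Exponential growth in the horizon T (here with alpha = 1):
for T in (1, 2, 5):
    print(T, domination_constant(T, alpha=1))  # 6, 48, 16806
```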

This metric reduction is key for concentration inequalities and enables coupling arguments at each path time-layer, leveraging McDiarmid’s inequality and concentration of measure results for empirical processes.
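A heuristic Monte Carlo illustration of the fixed-bandwidth $O(n^{-1/2})$ behavior in $d = 1$: rather than comparing against the true smoothed law, the sketch below compares two independent smoothed samples of size $n$, which serves as a proxy up to a constant factor (all names and parameter choices are illustrative, not from the paper):

```python
import random

def w1_sorted(xs, ys):
    # 1D W_1 between equal-size empirical measures (monotone coupling).
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

def smoothed_error(n, sigma, trials=30):
    """Average W_1 distance between two independent size-n samples from
    mu * K_sigma with mu = N(0, 1): a proxy (up to roughly a factor of 2)
    for E[W_1(mu * K_sigma, mu^n * K_sigma)]."""
    total = 0.0
    for _ in range(trials):
        a = [random.gauss(0.0, 1.0) + sigma * random.gauss(0.0, 1.0)
             for _ in range(n)]
        b = [random.gauss(0.0, 1.0) + sigma * random.gauss(0.0, 1.0)
             for _ in range(n)]
        total += w1_sorted(a, b)
    return total / trials

random.seed(2)
err_small_n = smoothed_error(50, sigma=0.5)
err_large_n = smoothed_error(3200, sigma=0.5)
print(err_small_n, err_large_n)  # the error shrinks as n grows
```

With $n$ multiplied by 64, an $n^{-1/2}$ rate predicts the error shrinking by roughly a factor of 8.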

4. Proof Architecture: Bandwidth Tradeoff and Bi-causal Projections

Convergence analysis proceeds by:

  1. Bandwidth stability: Under a Lipschitz kernel, $AW_1(\mu, \mu*K_h) \leq C_L h$ with $C_L$ independent of dimension, and general kernels yield $AW_1(\mu, \mu*K_h) \to 0$ as $h \to 0$ (Hou, 26 Jan 2024).
  2. Adapted projection: Adding noise then projecting onto a finely spaced grid ensures the support of the measure remains discrete while maintaining bi-causality and convexity of the set of adapted measures. Broader averaging over independent grid shifts eliminates support collisions.
  3. Almost-sure convergence: Exponential deviation bounds, with exponents scaling as $n^{1-2r}$ (or $n$), yield summable tails and almost-sure rates via Borel–Cantelli.
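Step 3 hinges on summability: for $r < 1/2$ the exponent $n^{1-2r}$ grows with $n$, so the partial sums of $\exp(-c\, n^{1-2r} x^2)$ stabilize quickly and Borel–Cantelli applies. A numeric sanity check with illustrative constants:

```python
import math

def tail_sum(c, x, r, n_max=10000):
    """Partial sum of exp(-c * n^(1-2r) * x^2) over n = 1..n_max.
    For r < 1/2 the exponent grows in n, so the series converges."""
    return sum(math.exp(-c * n ** (1 - 2 * r) * x * x)
               for n in range(1, n_max + 1))

# Partial sums stabilize: the series of deviation probabilities is summable.
print(tail_sum(c=1.0, x=0.5, r=0.2))
```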

The resulting rates generalize empirical-Wasserstein theory to adapted, path-dependent problems, with convergence orders controlled by the joint path dimension $dT$.

5. Statistical Implications and Connections to Broader Theory

These results establish sharp Wasserstein convergence guarantees for (i) stochastic optimization, (ii) pricing and hedging under uncertainty, and (iii) sequential learning, where path-dependent structures and information constraints are critical. Empirical measures without smoothing admit no general convergence under $AW_1$, but the smoothed/bi-causal procedures restore classical empirical process rates with explicit, nonasymptotic dimension dependence.

Compared to classical $W_p$ bounds (see, e.g., Goldfeld et al., 2020; Nietert et al., 2022), which in high dimensions suffer curse-of-dimensionality rates $n^{-1/d}$ unless smoothed, these results trade ambient-dimension dependence for explicit dependence on the path dimension $dT$, and recover dimension-free $O(n^{-1/2})$ rates under fixed-bandwidth smoothing, leveraging the structure of adaptedness.

Furthermore, the metric domination bridge allows adaptation of total variation and entropy-based statistical machinery to pathwise optimal transport, opening new avenues for quantitative analysis of robust and sequential models that rely on adapted couplings.

6. Adapted Wasserstein Convergence in Context

Method | Convergence rate | Regularity requirements
Kernel-smoothed empirical $AW_1$ | $O(n^{-1/(dT+2)})$ | Finite moments, Lipschitz kernel
Adapted-projected smoothed empirical | $O(n^{-1/(dT)})$ for $dT \geq 3$ | Compactness, exponential moments, smoothing
Unsmoothed empirical (pathwise) $AW_1$ | No general convergence | n/a

The approach generalizes:

  • Classic $W_1$ empirical measure convergence ($O(n^{-1/d})$) to path-dependent and bi-causal settings.
  • Sliced/smoothed Wasserstein and robust estimation regimes, as in (Nietert et al., 2022), by quantifying the smoothing-variance interplay and restoring high-dimensional reliability.

7. Broader Significance and Future Directions

The analysis established in (Hou, 26 Jan 2024) resolves longstanding bottlenecks related to statistical non-convergence of empirical measures in the adapted Wasserstein framework and rigorously quantifies the effectiveness of kernel smoothing and adapted discretizations. The results:

  • Enable robust quantitative calibration and uncertainty quantification in time-adapted stochastic control, optimal stopping, and financial risk assessment.
  • Provide a blueprint for extending empirical process theory to increasingly complex, high-dimensional, and non-Markovian dynamical systems.
  • Suggest future research directions in adaptivity-aware optimal transport, including adaptive grid construction, bandwidth selection, and sequential empirical process control.
  • Generalize to scalability contexts relevant for high-dimensional generative modeling, distributional reinforcement learning, and robust statistics.

These developments establish foundational, nonasymptotic, and algorithmically meaningful rates for Wasserstein convergence in adapted and smoothed empirical analysis, integrating classical statistical convergence, modern pathwise transport, and high-dimensional probability.
