
Concentration Inequalities in Wasserstein Distance

Updated 23 January 2026
  • Wasserstein concentration inequalities are statistical bounds that quantify how empirical distributions deviate from underlying laws using transport metrics.
  • They leverage transport–entropy and metric entropy techniques to derive sub-Gaussian tail bounds and optimal finite-sample rates across various regimes.
  • Applications span statistical learning, random matrix theory, quantum information, and stochastic processes, providing rigorous error guarantees in complex systems.

Concentration inequalities in Wasserstein distance quantify the probability that random probability measures—typically empirical distributions arising from finite samples—deviate from their underlying population law, with the deviation measured in the Wasserstein metric. This framework, originating in the study of the “concentration of measure” phenomenon, underpins theoretical guarantees and finite-sample error control across probability, statistics, learning theory, random matrix theory, quantum information, and statistical physics. Multiple regimes, model classes, and metrics (including $W_1$, $W_p$ for $p \geq 1$, sliced/projected Wasserstein, and quantum generalizations) exhibit sharp non-asymptotic bounds, which can depend on ambient or intrinsic dimension, support geometry, tail behavior, and process structure.

1. Definitions and Frameworks

The $p$-Wasserstein distance between probability measures $\mu$ and $\nu$ on a Polish metric space $(E, d)$ is defined as

$$W_p(\mu, \nu) := \left( \inf_{\pi \in \mathcal{C}(\mu, \nu)} \int_{E \times E} d(x, y)^p\, \pi(dx, dy) \right)^{1/p}$$

for $p \geq 1$, where $\mathcal{C}(\mu, \nu)$ denotes the set of couplings of $\mu$ and $\nu$ (Fournier et al., 2013, Dedecker et al., 16 Jan 2026, Chafai et al., 2016). The $1$-Wasserstein distance admits a dual formulation via the Kantorovich–Rubinstein theorem, expressing it as a supremum of integral differences over $1$-Lipschitz test functions.
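In one dimension the optimal coupling is monotone, so for two equal-size empirical measures $W_p$ reduces to a comparison of order statistics. A minimal sketch of this special case (the helper `empirical_wp` is ours, not from the cited papers):

```python
import numpy as np

def empirical_wp(x, y, p=1):
    """1-D p-Wasserstein distance between two equal-size empirical measures.
    The optimal coupling pairs order statistics, so W_p^p is the mean of
    |x_(i) - y_(i)|^p over sorted samples."""
    x, y = np.sort(x), np.sort(y)
    return np.mean(np.abs(x - y) ** p) ** (1.0 / p)

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
print(empirical_wp(x, x + 0.5, p=1))  # a pure shift by 0.5 gives W_1 ≈ 0.5
print(empirical_wp(x, x + 0.5, p=2))  # ≈ 0.5 as well: shifts are rigid for every p
```

For unequal sample sizes or higher dimensions one would instead solve the coupling problem directly (e.g., via a linear program), which is exactly where the inequalities surveyed here take over from explicit computation.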

Transport–entropy inequalities, such as the $T_p(C)$ (Talagrand) inequality

$$W_p(\nu, \mu) \leq \sqrt{2C\, H(\nu \mid \mu)},$$

where $H(\nu \mid \mu)$ denotes relative entropy, serve as the pivot for deriving measure concentration—sub-Gaussian tail bounds for Lipschitz observables under $\mu$ (Khoshnevisan et al., 2017, Boissard, 2011, Park, 25 Jul 2025).
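Two standard instances (classical facts, not specific to the cited papers) make the mechanism concrete: the Gaussian measure $\gamma = N(0, \sigma^2 I_d)$ satisfies $T_2(\sigma^2)$, and Marton's argument converts any $T_1(C)$ inequality into sub-Gaussian concentration for Lipschitz observables:

```latex
% Talagrand (1996): the Gaussian measure satisfies T_2 with constant sigma^2
W_2(\nu, \gamma) \le \sqrt{2\sigma^2\, H(\nu \mid \gamma)}
   \qquad \text{for all } \nu \ll \gamma,
% Marton/Herbst: if mu satisfies T_1(C), then for every 1-Lipschitz f,
\mu\Big( f \ge \int f\, d\mu + t \Big) \le \exp\!\big(-t^2/(2C)\big),
   \qquad t \ge 0.
```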

2. Classical Concentration Bounds: Rates, Regimes, and Dimensionality

For the empirical measure $L_n = n^{-1} \sum_{i=1}^n \delta_{X_i}$ from i.i.d. samples $X_i \sim \mu$, concentration inequalities in $W_p$ fall into several regimes, with rates determined by moment conditions and (ambient or intrinsic) dimension (Fournier et al., 2013, Dedecker et al., 16 Jan 2026, Lei, 2018):

  • Sub-Gaussian regime ($p > d/2$): $\mathbb{E}\, W_p(L_n, \mu) = O(n^{-1/2})$ up to logarithmic factors.
  • Critical regime ($p = d/2$): $\mathbb{E}\, W_p(L_n, \mu) = O(n^{-1/2} \log n)$.
  • Curse-of-dimensionality regime ($p < d/2$): $\mathbb{E}\, W_p(L_n, \mu) = O(n^{-p/d})$ (the classical quantization rate).

High-probability (tail) inequalities mirror these rates, yielding

$$\mathbb{P}\big(W_p(L_n, \mu) \geq t\big) \leq \exp(-c\, n t^{\beta}),$$

where the exponent $\beta$ interpolates between $2$ and $d/p$ depending on $p$ and $d$ (Fournier et al., 2013, Chafai et al., 2016).

By refining metric entropy arguments, these bounds extend to intrinsic (covering/Hausdorff) dimension $\alpha$, so that—for empirical measures supported on sets with covering numbers $N(S, \delta) \leq \beta\, (\Delta/\delta)^{\alpha}$—the same $n^{-1/\alpha}$ rates and corresponding concentration inequalities hold (Dedecker et al., 16 Jan 2026).
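The sub-Gaussian regime is easy to observe numerically in the simplest case $d = 1$, $p = 1$ (so $p > d/2$), using the one-dimensional identity $W_1(L_n, \mu) = \int_0^1 |F_n(t) - t|\, dt$ for $\mu = \mathrm{Unif}[0,1]$. A small Monte Carlo sketch (the helper `w1_to_uniform` is ours):

```python
import numpy as np

def w1_to_uniform(sample, grid=2000):
    """W_1(L_n, Unif[0,1]) via the 1-D CDF formula
    W_1 = int_0^1 |F_n(t) - t| dt, approximated on a uniform grid."""
    t = np.linspace(0.0, 1.0, grid)
    Fn = np.searchsorted(np.sort(sample), t, side="right") / len(sample)
    return np.mean(np.abs(Fn - t))  # Riemann sum over [0, 1]

rng = np.random.default_rng(1)
for n in (100, 400, 1600):
    mean_w1 = np.mean([w1_to_uniform(rng.uniform(size=n)) for _ in range(200)])
    print(n, round(mean_w1, 4))  # roughly halves each time n quadruples: n^{-1/2}
```

Quadrupling $n$ roughly halves the average distance, matching the $O(n^{-1/2})$ rate; repeating the experiment in higher dimension $d$ with $p = 1 < d/2$ would instead show the slower $n^{-1/d}$ decay.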

3. Concentration Under Functional and Transport-Entropy Inequalities

When the law $\mu$ satisfies a transport–entropy inequality (classically $T_1(C)$ or $T_2(C)$), exponential concentration of $W_1(L_n, \mu)$ arises. For example, Boissard (Boissard, 2011) proves:

$$\mathbb{P}\big(W_1(L_n, \mu) \geq t\big) \leq C(t) \exp\big(-K n t^2\big)$$

assuming only $T_1(C)$ and exponential integrability of $\mu$. The constant $K$ is explicitly related to the transport–entropy constant.

For measures on bounded domains or with strong exponential tails, the bound holds globally; for more general settings, additional double-exponential prefactors may enter, but the sub-Gaussian exponent in $n t^2$ persists.

Tensorization arguments and Laplace functional techniques (Herbst's argument) link $T_p$ inequalities to concentration of $1$-Lipschitz functionals and empirical measures. Such strategies underpin many advanced bounds, including those for Gaussian, product, and Markov chain measures (Park, 25 Jul 2025, Boissard, 2011, Barbour et al., 2019).

4. Advanced Variants: High Dimension, Intrinsic Geometry, and Non-Euclidean/Wasserstein Variants

Several refinements and variants address settings where classical bounds are suboptimal.

  • Intrinsic Dimension: For measures supported on lower-dimensional structures (e.g., an $m$-dimensional Riemannian manifold, a fractal, or a set of covering dimension $\alpha$), the $n^{-1/\alpha}$ rates are sharp, and all concentration regimes (moderate and large deviations, almost-sure convergence) persist with $\alpha$ replacing $d$ (Dedecker et al., 16 Jan 2026).
  • Projected/Sliced Wasserstein: Projected or sliced Wasserstein distances bypass curse-of-dimensionality rates; for instance, the sliced $W_1$ enjoys $O_p(n^{-1/2})$ concentration with dimension-independent exponents under second-moment assumptions (Xu et al., 2022, Wang et al., 2020). Projected Wasserstein distances over $k$-dimensional subspaces interpolate between high-dimensional and low-dimensional rates, with explicit trade-offs between $k$ and $n$.
  • Occupation Measures and Markov Chains: For ergodic Markov chains that are contractive in Wasserstein distance, empirical occupation measures concentrate sharply around the invariant distribution, with the contraction rate propagating directly into sub-Gaussian/Poissonian tail bounds (Barbour et al., 2019, Boissard, 2011).
  • Infinite-Dimensional/Functional Data: Extensions to Banach/Hilbert space-valued data proceed via telescoping block decompositions and hierarchical coupling, yielding tight rates for functional classes with polynomial or exponential coordinate decay (Lei, 2018). For Gaussian processes, ellipsoidal moment classes, and their empirical measures, the mean $W_p$ bias decays at rates determined by the coordinate decay.
  • Quantum Wasserstein: Quantum Markov semigroups admit analogues of the classical $TC_1$, $TC_2$, logarithmic Sobolev, and Poincaré inequalities in the quantum setting, with the corresponding concentration bounds for quantum states (e.g., under the depolarizing semigroup) controlled via quantum Wasserstein metrics and noncommutative Lipschitz norms (Rouzé et al., 2017).
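The sliced construction above is simple to sketch: average one-dimensional $W_1$ distances over random projection directions, so only the 1-D rate enters. A minimal Monte Carlo version (the helper `sliced_w1` is illustrative, not from the cited papers):

```python
import numpy as np

def sliced_w1(x, y, n_proj=200, rng=None):
    """Monte Carlo sliced W_1: average the 1-D W_1 distances between the
    projections of x and y onto random directions uniform on the sphere."""
    rng = np.random.default_rng(rng)
    theta = rng.normal(size=(n_proj, x.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # unit directions
    total = 0.0
    for t in theta:
        px, py = np.sort(x @ t), np.sort(y @ t)  # 1-D W_1 via order statistics
        total += np.mean(np.abs(px - py))
    return total / n_proj

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 20))                 # d = 20, but the rate is driven by n
y = rng.normal(size=(500, 20)) + 0.3
print(sliced_w1(x, y, rng=1))
```

Because each term is a one-dimensional $W_1$, the estimator inherits the dimension-free $O_p(n^{-1/2})$ concentration noted above, at the cost of averaging over projections.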

5. Functional Inequalities, Stein Discrepancy, and Information Geometry

Improved concentration inequalities relating entropy, Fisher information, and Stein discrepancy to the Wasserstein distance have been established—e.g., the HSI (entropy–Stein discrepancy–information) and WS (Wasserstein–Stein) inequalities (Cheng et al., 2021). These can yield strictly sharper bounds than the classical Talagrand/log-Sobolev inequalities:

  • For a measure $\mu = e^{-V}\, \mathrm{vol}$ on a Riemannian manifold $(M, g)$,

$$W_2(\nu, \mu) \leq S(\nu \mid \mu) \int_0^\infty \sqrt{\Psi(t)}\, dt,$$

where $S$ is the Stein discrepancy and $\Psi$ encodes curvature. Additional HWSI inequalities improve upon the Talagrand bound $W_2^2 \leq 2H/K$ by exploiting the nontrivial geometry of $\mu$ and $M$ (Cheng et al., 2021).

6. Applications in Statistical Learning, High-Dimensional Inference, and Random Matrices

  • Statistical Learning: Concentration in Wasserstein is foundational to the statistical consistency and finite-sample precision of learning algorithms based on integral probability metrics (e.g., WGANs), with generalization bounds scaling as $n^{-1/d}$ (bounded metric entropy) or $n^{-1/2}$ (finite moments), and associated exponential tails controlled by Rademacher complexity (Birrell, 2024).
  • Gaussian Approximation: Recent advances use Stein's method and exchangeable pairs to produce computable, non-asymptotic $W_1$ bounds between a sample mean and its Gaussian target, achieving sub-Gaussian tails and the optimal $O(1/\sqrt{n})$ rate uniformly in $n$ (Austern et al., 2022).
  • Random Matrix Theory and Coulomb Gases: In multi-particle Coulomb systems, the empirical spectral law exhibits sub-Gaussian concentration in $W_1$ at rate $e^{-c N^2 r^2}$, matching non-asymptotic large deviations and improving earlier results with $e^{-O(N) r}$ scaling (Chafai et al., 2016).
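The $O(1/\sqrt{n})$ Gaussian-approximation rate can be checked empirically by measuring the $W_1$ distance between standardized sample means and their Gaussian limit; the sketch below only illustrates the rate and does not implement the exchangeable-pairs machinery of the cited work (the helper `w1_to_gaussian` is ours):

```python
import numpy as np
from math import erf, sqrt

def w1_to_gaussian(samples, lo=-6.0, hi=6.0, grid=2000):
    """W_1 between the empirical law of `samples` and N(0,1), via the
    1-D CDF formula W_1 = int |F_emp(t) - Phi(t)| dt on a truncated grid."""
    t = np.linspace(lo, hi, grid)
    Femp = np.searchsorted(np.sort(samples), t, side="right") / len(samples)
    Phi = np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in t])
    return np.sum(np.abs(Femp - Phi)) * (t[1] - t[0])  # Riemann sum

rng = np.random.default_rng(3)
for n in (4, 64):
    # standardized means of centered Exp(1) variables: sqrt(n) * (mean - 1)
    s = (rng.exponential(size=(20000, n)).mean(axis=1) - 1.0) * np.sqrt(n)
    print(n, round(w1_to_gaussian(s), 3))  # shrinks roughly like n^{-1/2}
```

Increasing $n$ by a factor of 16 shrinks the measured $W_1$ distance by roughly a factor of 4, consistent with the $O(1/\sqrt{n})$ Berry–Esseen-type rate (up to the Monte Carlo floor from the finite number of replicates).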

7. Extensions, Limitations, and Future Directions

  • Infinite-Dimensional Processes and SPDEs: Extension of $T_2$ inequalities and concentration in Wasserstein to measure-valued laws of SPDEs is established for the 1D parabolic case with space-time white noise. Coupling and Girsanov techniques replace classical log-Sobolev functional arguments (Khoshnevisan et al., 2017).
  • Quantum and Noncommutative Regimes: Quantum analogues of Wasserstein distance, transport, and concentration inequalities rely on recent noncommutative metric and entropy constructs, and are active areas of investigation for quantum state tomography and parameter estimation (Rouzé et al., 2017).
  • Adapted Wasserstein and Stochastic Processes: For discrete-time stochastic processes, adapted Wasserstein distances and their transport–entropy inequalities extend concentration to path space, with process-level causal constraints yielding the optimal $O(\sqrt{T})$ dependence on the time horizon (Park, 25 Jul 2025).

A plausible implication is that concentration in Wasserstein distance—when properly localized to intrinsic geometry, support regularity, or process structure—achieves sub-Gaussian or optimal sample-complexity rates across a diverse array of models, spanning classical, high-dimensional, quantum, and infinite-dimensional regimes. The machinery is thus central to understanding the behavior of empirical measures, statistical estimators, Markov processes, and many-body systems across mathematical and applied disciplines.
