Concentration Inequalities in Wasserstein Distance
- Wasserstein concentration inequalities are statistical bounds that quantify how empirical distributions deviate from underlying laws using transport metrics.
- They leverage transport–entropy and metric entropy techniques to derive sub-Gaussian tail bounds and optimal finite-sample rates across various regimes.
- Applications span statistical learning, random matrix theory, quantum information, and stochastic processes, providing rigorous error guarantees in complex systems.
Concentration inequalities in Wasserstein distance quantify the probability that random probability measures (typically empirical distributions arising from finite samples) deviate from their underlying population law, with the deviation measured in the Wasserstein metric. This framework, originating in the study of the “concentration of measure” phenomenon, underpins theoretical guarantees and finite-sample error control across probability, statistics, learning theory, random matrix theory, quantum information, and statistical physics. Multiple regimes, model classes, and metrics (including $W_p$ for $p \ge 1$, sliced/projected Wasserstein, and quantum generalizations) exhibit sharp non-asymptotic bounds, which can depend on ambient or intrinsic dimension, support geometry, tail behavior, and process structure.
1. Definitions and Frameworks
The $p$-Wasserstein distance between probability measures $\mu$ and $\nu$ on a Polish metric space $(E, d)$ is defined as
$$W_p(\mu, \nu) \;=\; \Big( \inf_{\pi \in \Pi(\mu,\nu)} \int_{E \times E} d(x,y)^p \, \pi(dx, dy) \Big)^{1/p}$$
for $p \ge 1$, where $\Pi(\mu,\nu)$ denotes the set of couplings of $\mu$ and $\nu$ (Fournier et al., 2013, Dedecker et al., 16 Jan 2026, Chafai et al., 2016). The $1$-Wasserstein distance admits a dual formulation via the Kantorovich–Rubinstein theorem, making it an integral probability metric over 1-Lipschitz functions.
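For equal-size samples with uniform weights, the infimum over couplings reduces to a minimum over permutations, so the empirical distance can be computed exactly as a linear assignment problem. A minimal sketch in Python (the helper name `empirical_wasserstein` is an illustrative choice; NumPy and SciPy are assumed), cross-checked in one dimension against SciPy's closed-form $W_1$:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist
from scipy.stats import wasserstein_distance

def empirical_wasserstein(x, y, p=1):
    """Exact W_p between two uniform empirical measures with n atoms each.

    With equal sizes and uniform weights, an optimal coupling is a
    permutation, so W_p^p reduces to a linear assignment problem.
    """
    cost = cdist(x, y) ** p                   # cost[i, j] = d(x_i, y_j)^p
    rows, cols = linear_sum_assignment(cost)  # optimal matching
    return cost[rows, cols].mean() ** (1.0 / p)

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 1))
y = rng.normal(loc=0.5, size=(500, 1))

# In one dimension, W_1 has a quantile representation consistent with the
# Kantorovich-Rubinstein dual; both computations should agree.
print(empirical_wasserstein(x, y, p=1))            # assignment-based value
print(wasserstein_distance(x.ravel(), y.ravel()))  # SciPy's closed-form 1-D W_1
```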
Transport–entropy inequalities, such as the $T_2$ (Talagrand) inequality
$$W_2(\nu, \mu) \;\le\; \sqrt{2C \, H(\nu \,|\, \mu)} \qquad \text{for all } \nu \ll \mu,$$
where $H(\nu \,|\, \mu)$ denotes relative entropy, serve as the pivot for deriving measure concentration: sub-Gaussian tail bounds for Lipschitz observables under $\mu$ (Khoshnevisan et al., 2017, Boissard, 2011, Park, 25 Jul 2025).
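As a concrete instance, Talagrand's theorem states that the standard Gaussian satisfies $T_2$ with a dimension-free constant; the following display (a standard statement, not specific to the cited works) records it together with the Lipschitz concentration it implies:

```latex
% Talagrand's T_2 inequality for the standard Gaussian \gamma_d on R^d;
% the constant is dimension-free, which is why T_2 tensorizes.
W_2(\nu, \gamma_d)^2 \;\le\; 2 \, H(\nu \,|\, \gamma_d)
  \qquad \text{for all } \nu \ll \gamma_d .
% The resulting sub-Gaussian bound for any 1-Lipschitz observable f:
\gamma_d\Big( f \ge \int f \, d\gamma_d + t \Big) \;\le\; e^{-t^2/2},
  \qquad t \ge 0 .
```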
2. Classical Concentration Bounds: Rates, Regimes, and Dimensionality
For the empirical measure $\mu_n = n^{-1} \sum_{i=1}^n \delta_{X_i}$ from i.i.d. samples $X_1, \dots, X_n \sim \mu$ on $\mathbb{R}^d$, concentration inequalities in $W_p$ fall into several regimes, with rates determined by moment conditions and (ambient or intrinsic) dimension (Fournier et al., 2013, Dedecker et al., 16 Jan 2026, Lei, 2018); a numerical sketch of the dimension dependence appears after this list:
- Sub-Gaussian regime ($d < 2p$): $\mathbb{E}[W_p(\mu_n, \mu)] \lesssim n^{-1/(2p)}$, up to logarithmic factors.
- Critical regime ($d = 2p$): $\mathbb{E}[W_p(\mu_n, \mu)] \lesssim \big( n^{-1/2} \log(1+n) \big)^{1/p}$.
- Curse-of-dimensionality regime ($d > 2p$): $\mathbb{E}[W_p(\mu_n, \mu)] \lesssim n^{-1/d}$ (the classical quantization rate).
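These rates can be observed numerically. A rough Monte Carlo sketch (assumptions: $\mu$ uniform on $[0,1]^d$, $p = 1$, and the $W_1$ distance between two independent empirical measures used as the standard proxy, of the same order up to constants, for $\mathbb{E}[W_1(\mu_n, \mu)]$):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def w1_empirical(x, y):
    # Exact W_1 between equal-size uniform empirical measures (assignment).
    cost = cdist(x, y)
    r, c = linear_sum_assignment(cost)
    return cost[r, c].mean()

rng = np.random.default_rng(1)
for d in (1, 2, 10):
    ns = np.array([100, 200, 400, 800])
    means = []
    for n in ns:
        # Two independent samples from mu = Unif([0,1]^d); their W_1 distance
        # decays at the same rate as E[W_1(mu_n, mu)] up to constants.
        trials = [w1_empirical(rng.random((n, d)), rng.random((n, d)))
                  for _ in range(20)]
        means.append(np.mean(trials))
    slope = np.polyfit(np.log(ns), np.log(means), 1)[0]
    theory = max(-0.5, -1.0 / d)  # -1/2 for d <= 2 (logs at d = 2), -1/d beyond
    print(f"d={d:2d}: fitted exponent {slope:+.2f}, theory {theory:+.2f}")
```

The fitted log-log slopes should sit near $-1/2$ for $d = 1, 2$ (with logarithmic corrections at the critical dimension) and near $-1/d$ for $d = 10$.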
High-probability (tail) inequalities mirror these rates, yielding bounds of the form
$$\mathbb{P}\big( W_p(\mu_n, \mu) \ge x \big) \;\le\; C \exp\big( -c \, n \, x^{\alpha} \big),$$
where the exponent $\alpha$ interpolates between $2$ and $d/p$ depending on $d$ and $p$ (Fournier et al., 2013, Chafai et al., 2016).
By refining metric entropy arguments, these bounds extend to intrinsic (covering/Hausdorff) dimension $d^*$, so that, for empirical measures supported on sets whose covering numbers scale as $\varepsilon^{-d^*}$, the same rates and corresponding concentration inequalities hold with $d^*$ in place of $d$ (Dedecker et al., 16 Jan 2026).
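A quick check of the intrinsic-dimension phenomenon (assumption: data on a unit circle embedded in $\mathbb{R}^{20}$, so $d^* = 1$ while $d = 20$; the two-sample empirical $W_1$ is again used as a proxy):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def w1_empirical(x, y):
    # Exact W_1 between equal-size uniform empirical measures (assignment).
    cost = cdist(x, y)
    r, c = linear_sum_assignment(cost)
    return cost[r, c].mean()

def circle_sample(rng, n, ambient_dim=20):
    # Uniform points on a unit circle in the first two of 20 coordinates:
    # ambient dimension is 20, but the covering dimension of the support is 1.
    t = rng.uniform(0, 2 * np.pi, size=n)
    x = np.zeros((n, ambient_dim))
    x[:, 0], x[:, 1] = np.cos(t), np.sin(t)
    return x

rng = np.random.default_rng(3)
ns = np.array([100, 200, 400, 800])
means = [np.mean([w1_empirical(circle_sample(rng, n), circle_sample(rng, n))
                  for _ in range(20)]) for n in ns]
slope = np.polyfit(np.log(ns), np.log(means), 1)[0]
print(f"fitted exponent {slope:+.2f}  (intrinsic d* = 1 predicts -1/2, not -1/20)")
```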
3. Concentration Under Functional and Transport-Entropy Inequalities
When the law $\mu$ satisfies a transport–entropy inequality (classically $T_1$ or $T_2$), exponential concentration of $W_1(\mu_n, \mu)$ arises. For example, Boissard (Boissard, 2011) proves a deviation bound of the form
$$\mathbb{P}\big( W_1(\mu_n, \mu) \ge \mathbb{E}[W_1(\mu_n, \mu)] + t \big) \;\le\; \exp\big( -\lambda n t^2 \big),$$
assuming only a $T_1$ inequality and exponential integrability of $d(x_0, \cdot)$ under $\mu$ for a fixed base point $x_0$. The constant $\lambda$ is explicitly related to the transport–entropy constant.
For measures on bounded domains or with strong exponential tails, the bound holds globally; for more general settings, additional double-exponential prefactors may enter, but the sub-Gaussian exponent in $t$ persists.
Tensorization arguments and Laplace functional techniques (Herbst's argument) link transport–entropy inequalities to concentration of 1-Lipschitz functionals and empirical measures. Such strategies underpin many advanced bounds, including those for Gaussian, product, and Markov chain measures (Park, 25 Jul 2025, Boissard, 2011, Barbour et al., 2019).
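Compressing the standard chain of reasoning (Bobkov–Götze duality plus Herbst's Laplace-transform step) for a $T_1$ inequality with constant $C$:

```latex
% Step 1 (Kantorovich-Rubinstein + T_1): for 1-Lipschitz f and \nu \ll \mu,
\int f \, d\nu - \int f \, d\mu \;\le\; W_1(\nu, \mu) \;\le\; \sqrt{2C \, H(\nu \,|\, \mu)} .
% Step 2 (Gibbs variational principle applied to the Laplace functional):
\log \int e^{\lambda (f - \int f \, d\mu)} \, d\mu
  \;=\; \sup_{\nu \ll \mu} \Big[ \lambda \Big( \int f \, d\nu - \int f \, d\mu \Big)
        - H(\nu \,|\, \mu) \Big]
  \;\le\; \sup_{h \ge 0} \big[ \lambda \sqrt{2 C h} - h \big]
  \;=\; \frac{C \lambda^2}{2} .
% Step 3 (Chernoff bound, optimized at \lambda = t / C):
\mu\Big( f \ge \int f \, d\mu + t \Big) \;\le\; e^{-t^2 / (2C)} .
```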
4. Advanced Variants: High Dimension, Intrinsic Geometry, and Non-Euclidean/Wasserstein Variants
Several refinements and variants address settings where classical bounds are suboptimal.
- Intrinsic Dimension: For measures supported on lower-dimensional structures (e.g., a $d^*$-dimensional Riemannian manifold, a fractal set, or a support with covering dimension $d^*$), rates depending only on $d^*$ (e.g., $n^{-1/d^*}$ when $d^* > 2p$) are sharp, and all concentration regimes (small, moderate, and large deviations, and almost-sure convergence) persist with $d^*$ replacing $d$ (Dedecker et al., 16 Jan 2026).
- Projected/Sliced Wasserstein: Projected or sliced Wasserstein distances bypass curse-of-dimensionality rates; for instance, the sliced $W_2$ concentrates with dimension-independent exponents under second-moment assumptions (Xu et al., 2022, Wang et al., 2020). Projected Wasserstein distances in $k$-dimensional subspaces allow interpolation between high-dimensional and low-dimensional rates, with explicit trade-offs between $k$ and $d$; a minimal Monte Carlo estimator is sketched after this list.
- Occupation Measures and Markov Chains: For ergodic Markov chains that are contractive in Wasserstein distance, empirical laws concentrate sharply about the invariant distribution, with the contraction rate propagating directly into sub-Gaussian/Poissonian tail bounds (Barbour et al., 2019, Boissard, 2011).
- Infinite-Dimensional/Functional Data: Extension to Banach/Hilbert space-valued data is accomplished via telescoping block decompositions and hierarchical coupling, yielding tight rates for functional classes with polynomial or exponential coordinate decay (Lei, 2018). For Gaussian processes/ellipsoidal moment classes and their empirical measures, the mean bias decays at rates determined by the coordinate decay.
- Quantum Wasserstein: Quantum Markov semigroups admit analogues of classical $T_1$, $T_2$, logarithmic Sobolev, and Poincaré inequalities in the quantum setting, with the corresponding concentration bounds for quantum states (e.g., under the depolarizing semigroup) controlled via quantum Wasserstein metrics and noncommutative Lipschitz norms (Rouzé et al., 2017).
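To make the sliced construction concrete: a Monte Carlo estimator only ever solves one-dimensional transport problems along random directions, which is the source of the dimension-independent behavior. A minimal sketch (the function name and the sampling scheme are illustrative choices, not an API from the cited papers):

```python
import numpy as np

def sliced_wasserstein2(x, y, n_projections=200, rng=None):
    """Monte Carlo estimate of the sliced 2-Wasserstein distance.

    Each random direction reduces the problem to a 1-D W_2 between
    equal-size empirical measures, computed by sorting (quantile coupling).
    """
    if rng is None:
        rng = np.random.default_rng()
    d = x.shape[1]
    total = 0.0
    for _ in range(n_projections):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)    # uniform direction on the sphere
        px, py = np.sort(x @ theta), np.sort(y @ theta)
        total += np.mean((px - py) ** 2)  # 1-D W_2^2 of the projections
    return np.sqrt(total / n_projections)

rng = np.random.default_rng(2)
x = rng.normal(size=(1000, 50))
y = rng.normal(loc=0.1, size=(1000, 50))
print(sliced_wasserstein2(x, y, rng=rng))  # stable even with d = 50
```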
5. Functional Inequalities, Stein Discrepancy, and Information Geometry
Improved concentration inequalities, which relate entropy, Fisher information, and Stein discrepancy to Wasserstein distance, have been established, e.g., the HSI (entropy–Stein discrepancy–information) and WS (Wasserstein–Stein) inequalities (Cheng et al., 2021). These can yield strictly sharper bounds than the classical Talagrand/log-Sobolev inequalities:
- For a measure $\mu$ on a Riemannian manifold $M$, a Wasserstein–Stein inequality of the form
$$W_2(\nu, \mu) \;\le\; S(\nu \,|\, \mu)\, \arccos\!\Big( \exp\big( -H(\nu \,|\, \mu) / S(\nu \,|\, \mu)^2 \big) \Big)$$
holds, where $S(\nu \,|\, \mu)$ is the Stein discrepancy and the admissible constants encode the curvature of $M$. Additional HWSI inequalities improve upon $T_2$ (Talagrand) by exploiting the nontrivial geometry of $M$ and $\mu$ (Cheng et al., 2021).
6. Applications in Statistical Learning, High-Dimensional Inference, and Random Matrices
- Statistical Learning: Concentration in Wasserstein distance is foundational to the statistical consistency and finite-sample precision of learning algorithms based on integral probability metrics (e.g., WGANs), with generalization bounds scaling as $n^{-1/2}$ under bounded metric entropy or at slower, moment- and dimension-dependent rates under finite-moment assumptions, and with associated exponential tails controlled by Rademacher complexity (Birrell, 2024).
- Gaussian Approximation: Recent advances use Stein’s method and exchangeable pairs to produce computable, non-asymptotic Wasserstein bounds between the law of a sample mean and its Gaussian target, achieving sub-Gaussian tails and optimal rates (Austern et al., 2022).
- Random Matrix Theory and Coulomb Gases: In multi-particle Coulomb systems, the empirical spectral law exhibits sub-Gaussian concentration in $W_1$ at speed $n^2$ in the particle number, matching nonasymptotic large deviations and improving earlier results with suboptimal scaling (Chafai et al., 2016).
7. Extensions, Limitations, and Future Directions
- Infinite-Dimensional Processes and SPDEs: Extension of transport–entropy inequalities and Wasserstein concentration to the laws of SPDEs is established for the 1D parabolic case with space-time white noise; coupling and Girsanov techniques replace classical log-Sobolev functional arguments (Khoshnevisan et al., 2017).
- Quantum and Noncommutative Regimes: Quantum analogues of Wasserstein distance, transport, and concentration inequalities rely on recent noncommutative metric and entropy constructs, and are active areas of investigation for quantum state tomography and parameter estimation (Rouzé et al., 2017).
- Adapted Wasserstein and Stochastic Processes: For discrete-time stochastic processes, adapted Wasserstein distances and their transport–entropy inequalities extend concentration to path space, with process-level causal constraints resulting in optimal dependence on the time horizon (Park, 25 Jul 2025).
A plausible implication is that concentration in Wasserstein distance, when properly localized to intrinsic geometry, support regularity, or process structure, achieves sub-Gaussian or optimal sample-complexity rates in a diverse array of models, spanning classical, high-dimensional, quantum, and infinite-dimensional regimes. The machinery is thus central to understanding the behavior of empirical measures, statistical estimators, Markov processes, and many-body systems across mathematical and applied disciplines.