Wasserstein Law of Large Numbers

Updated 22 February 2026

Wasserstein Law of Large Numbers is a framework that utilizes Wasserstein distances and optimal transport to rigorously measure convergence in empirical processes across various settings.
It provides quantitative convergence rates for time averages in Markov processes and for empirical measures in interacting particle systems, with exponential mixing rates in the former case and $O(N^{-1/2})$ rates in the latter.
The theory extends to infinite-dimensional and non-Euclidean spaces, offering insights into barycenters, operator means, and graphon particle systems via geometric convexity and contraction properties.

The Wasserstein Law of Large Numbers (LLN) encompasses a body of results establishing strong and weak laws of large numbers, as well as propagation of chaos phenomena, for stochastic processes, random measures, empirical barycenters, and interacting particle systems, all phrased in spaces of probability measures endowed with Wasserstein distances. These results employ optimal transport, contraction properties, and geometric convexity in Wasserstein spaces to quantitatively control the convergence of empirical spectral measures, time averages of Markov processes, or barycenters, often enabling sharp convergence rates and central limit theorems in non-Euclidean and infinite-dimensional settings.

1. Wasserstein Distances and Probabilistic Metric Geometry

For $p \ge 1$, the Wasserstein distance $W_p$ is a metric on the space $\mathcal{P}_p(E)$ of Borel probability measures with finite $p$-th moment over a Polish metric space $(E,d)$:

$W_p(\mu,\nu) = \left( \inf_{\pi \in \Pi(\mu,\nu)} \int_{E \times E} d(x,y)^p \, \pi(dx,dy) \right)^{1/p}$

with $\Pi(\mu,\nu)$ the set of couplings of $\mu$ and $\nu$. Critical specializations include $W_1$ (Kantorovich–Rubinstein), which admits a dual formulation in terms of Lipschitz functions, and $W_2$, whose geodesic-convexity properties are crucial for barycenter theory and gradient flows.
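As a concrete illustration of the definition, the following minimal sketch computes $W_p$ between two empirical measures on the real line with the same number of atoms. It relies on the standard fact that in one dimension the monotone (sorted) coupling is optimal for every $p \ge 1$; the function name `wasserstein_1d` and the equal-sample-size restriction are assumptions made for this example only.

```python
import numpy as np

def wasserstein_1d(x, y, p=1.0):
    """W_p between two equal-size empirical measures on the real line."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    # In 1D, pairing the i-th smallest atoms is an optimal coupling,
    # so the infimum over couplings reduces to a sorted comparison.
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.standard_normal(1000)
    # A pure shift by 0.5 has distance 0.5 (up to rounding) for every p.
    print(wasserstein_1d(sample, sample + 0.5, p=2))
```

In higher dimensions no such closed form is available and the coupling problem has to be solved as a linear program or approximated.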

In operator-theoretic contexts, one employs the Bures–Wasserstein distance for (possibly infinite-dimensional) positive definite operators, and in non-Euclidean metric geometry, the Wasserstein framework can be extended to CAT(0) and Gromov hyperbolic spaces, with appropriately generalized barycenters and contraction properties (Santoro et al., 2023, Ohta, 2022, Lim et al., 2019).

2. Law of Large Numbers via Contraction in Wasserstein Spaces

A canonical setting is the convergence of time averages for Markov processes. Let $(X_t)_{t \ge 0}$ be a Markov process on $(E,d)$, with Feller transition semigroup $\{P_t\}_{t\ge0}$ and initial law $\mu_0 \in \mathcal{P}_1(E)$. Assume exponential contraction in $W_1$, for some constants $c, \gamma > 0$:

$W_1(\mu P_t, \nu P_t) \le c e^{-\gamma t} W_1(\mu, \nu) \qquad\forall \mu,\nu \in \mathcal{P}_1(E).$

Then any Lipschitz observable $\psi: E \to \mathbb{R}$ has time average

$A_T = \frac{1}{T} \int_0^T \psi(X_s) ds$

satisfying

$A_T \xrightarrow[T\to\infty]{\ \mathbb{P}\ }\int_E \psi\, d\mu_*$

where $\mu_*$ is the unique invariant measure; existence and uniqueness follow from the Banach fixed-point theorem in $(\mathcal{P}_1(E),W_1)$, applied to $P_t$ for $t$ large enough that $c e^{-\gamma t} < 1$. Variances decay as $\operatorname{Var}(A_T)=O(T^{-1})$. Full proofs and exponential mixing rates appear in (Komorowski et al., 2011).
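The convergence of $A_T$ and the $O(T^{-1})$ variance decay can be observed numerically on a toy model. The sketch below is an assumption-laden illustration, not the setting of the cited paper: it uses a one-dimensional Ornstein-Uhlenbeck process $dX_t = -\gamma X_t\,dt + \sigma\,dB_t$, which contracts in $W_1$ at rate $e^{-\gamma t}$ under synchronous coupling, the Lipschitz observable $\psi(x) = x$ (so that $\int \psi\, d\mu_* = 0$), and an Euler-Maruyama discretization. The function name `ou_time_averages` and all parameter values are hypothetical.

```python
import numpy as np

def ou_time_averages(T, dt=0.01, gamma=1.0, sigma=1.0, x0=2.0,
                     n_paths=200, seed=0):
    """Time averages A_T of psi(x) = x along independent OU paths."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    x = np.full(n_paths, float(x0))   # all paths start out of equilibrium
    running = np.zeros(n_paths)
    for _ in range(n_steps):
        running += x * dt             # left Riemann sum of psi(X_s)
        noise = rng.standard_normal(n_paths)
        x += -gamma * x * dt + sigma * np.sqrt(dt) * noise  # Euler-Maruyama
    return running / (n_steps * dt)

if __name__ == "__main__":
    for horizon in (20.0, 200.0):
        a_T = ou_time_averages(horizon)
        # The sample variance across paths should shrink roughly like 1/T.
        print(horizon, a_T.mean(), a_T.var())
```

Starting every path at `x0 = 2.0` rather than in stationarity is deliberate: the contraction estimate makes the influence of the initial law on $A_T$ vanish as $T$ grows.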

This framework extends to functionals of non-stationary Markov processes and underpins ergodic theorems in Wasserstein geometry, replacing the $L^2$-spectral gap with $W_1$-contraction in the absence of reversibility or additive structure.

3. Propagation of Chaos and Empirical Laws for Interacting Particles

For systems of $N$ interacting particles (e.g., generalized Dyson Brownian Motion, GDBM), empirical measures

$\mu_t^N = \frac{1}{N} \sum_{i=1}^N \delta_{\lambda_i(t)}$

under suitably regular SDEs and convex potentials $V$ are shown to satisfy a Wasserstein LLN:

$\lim_{N\to\infty} \sup_{t\le T} W_p\left( \mathbb{E} \mu_t^N, \mu_t \right) = 0 \qquad (1 \le p < 2),$

where $(\mu_t)_{t \in [0,T]}$ solves a nonlinear McKean–Vlasov PDE, interpreted as the $W_2$-gradient flow of the Voiculescu free entropy. For quadratic $V$, the rate $W_2(\mathbb{E}\mu_t^N, \mu_t) = O(N^{-1/2})$ holds. Displacement convexity yields uniqueness and contractivity, which ensure well-posedness of the limiting dynamics (Li et al., 2014).
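A rough numerical illustration of this kind of convergence, under several simplifying assumptions: instead of simulating the particle SDE, the sketch samples a single symmetric Gaussian (GOE-type) matrix, whose eigenvalues correspond to a stationary quadratic-potential ensemble, and compares its empirical spectral measure with the semicircle law on $[-2,2]$. The distance is approximated by matching sorted eigenvalues to mid-point quantiles of the semicircle law, and it is computed for one realization rather than for $\mathbb{E}\mu^N$, so it does not reproduce the stated rate. The function names are hypothetical.

```python
import numpy as np

def semicircle_quantiles(probs, grid_size=20001):
    """Quantiles of the semicircle law on [-2, 2] by inverting its CDF on a grid."""
    x = np.linspace(-2.0, 2.0, grid_size)
    cdf = 0.5 + x * np.sqrt(4.0 - x**2) / (4.0 * np.pi) + np.arcsin(x / 2.0) / np.pi
    return np.interp(probs, cdf, x)

def goe_w2_to_semicircle(n, seed=0):
    """Approximate W_2 between one empirical spectrum and the semicircle law."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n, n))
    a = (g + g.T) / np.sqrt(2.0 * n)   # off-diagonal variance 1/n
    eig = np.linalg.eigvalsh(a)        # returned in ascending order
    probs = (np.arange(n) + 0.5) / n   # mid-point quantile levels
    q = semicircle_quantiles(probs)
    # Sorted matching is the optimal coupling in one dimension.
    return float(np.sqrt(np.mean((eig - q) ** 2)))

if __name__ == "__main__":
    for n in (50, 200, 800):
        print(n, goe_w2_to_semicircle(n))
```

The printed distances should decrease as `n` grows, which is the qualitative content of the law of large numbers for the empirical spectral measure.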

This setting subsumes mean-field particle systems, propagation of chaos, and the rigorous justification of hydrodynamic limits, with convergence in path-space $W_p$ , and applies in both spatially homogeneous and inhomogeneous (networked/graphon) interactions (Chen et al., 2024).

4. Barycenters, Operator Means, and Large-Sample Theory

Wasserstein spaces support well-posed barycenter (Fréchet mean) problems: for a random measure $\mu$ taking values in $\mathcal{P}_p(E)$,

$\bar{\mu} \in \arg\min_{\nu \in \mathcal{P}_p(E)} \mathbb{E} [ W_p^2(\mu, \nu) ].$

For Bures–Wasserstein barycenters of covariance operators in a separable Hilbert space, strong laws hold:

Empirical barycenters $\hat{\Sigma}_n$ are almost surely relatively compact;
Any subsequential limit is a population barycenter;
Under regularity and strict geodesic convexity, the empirical barycenter converges almost surely to the unique population barycenter (Santoro et al., 2023).

Key technicalities include compactness via the Loewner order in non-locally-compact operator spaces, strong operator convergence of optimal maps, and Fréchet/tangent-linearization for CLT-type results.
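In finite dimensions, an empirical Bures–Wasserstein barycenter can be computed with a well-known fixed-point iteration, $S \leftarrow S^{-1/2}\big(\tfrac{1}{n}\sum_i (S^{1/2}\Sigma_i S^{1/2})^{1/2}\big)^2 S^{-1/2}$. The sketch below is a minimal matrix version under the assumption that all inputs are symmetric positive definite; it is not the infinite-dimensional construction of the cited work, and the function names are chosen for this example.

```python
import numpy as np

def sqrtm_psd(a):
    """Symmetric square root of a positive semidefinite matrix."""
    w, v = np.linalg.eigh((a + a.T) / 2.0)
    w = np.clip(w, 0.0, None)          # guard against tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T

def bures_wasserstein_barycenter(covs, n_iter=100, tol=1e-12):
    """Fixed-point iteration for the barycenter of SPD matrices (equal weights)."""
    covs = [np.asarray(c, dtype=float) for c in covs]
    s = np.mean(covs, axis=0)          # positive definite starting point
    for _ in range(n_iter):
        root = sqrtm_psd(s)
        inv_root = np.linalg.inv(root)
        # Average of the matrices (S^{1/2} Sigma_i S^{1/2})^{1/2}
        m = np.mean([sqrtm_psd(root @ c @ root) for c in covs], axis=0)
        s_new = inv_root @ m @ m @ inv_root
        if np.linalg.norm(s_new - s) < tol:
            return s_new
        s = s_new
    return s

if __name__ == "__main__":
    covs = [np.diag([1.0, 4.0]), np.diag([9.0, 16.0])]
    # For commuting matrices the barycenter averages the square roots.
    print(bures_wasserstein_barycenter(covs))
```

For commuting inputs the iteration stabilizes after one step, which makes diagonal matrices a convenient sanity check; for general inputs the stopping tolerance and iteration cap are tuning choices.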

Parallel results are established in non-Euclidean and infinite-dimensional settings, notably the strong LLN for Karcher means of positive-definite operators in Banach–Finsler geometry, with almost sure and $\mathbb{L}^1$-rate convergence derived via Wasserstein contraction of the operator-valued resolvent/proximal flow (Lim et al., 2019).

5. Wasserstein LLN for Heterogeneous and Structured Systems

Graphon particle systems model large-scale weakly interacting dynamics on nonlinear network structures. Given a graphon $W:[0,1]^2\to [0,1]$ (a symbol not to be confused with the Wasserstein distance $W_2$), the particle SDEs yield empirical measures that converge in $W_2$ to the law of a continuum graphon-driven McKean–Vlasov SDE. The limiting law serves as a spatio-temporal mean-field approximation for distributed stochastic learning over large heterogeneous networks. Explicit error estimates combine time-discretization, graphon-approximation, and particle-number effects, characterizing the LLN for networked stochastic systems (Chen et al., 2024).
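To make the structure of such a system concrete, here is a minimal sketch of an Euler-Maruyama simulation in which particle $i$ carries a label $u_i \in [0,1]$ and interacts with particle $j$ with weight $W(u_i,u_j)$. The linear attraction drift $-\frac{1}{N}\sum_j W(u_i,u_j)(X_i - X_j)$, the evenly spaced labels, and the name `simulate_graphon_particles` are assumptions made for illustration and are not taken from the cited paper.

```python
import numpy as np

def simulate_graphon_particles(graphon, x0, T=1.0, dt=0.01, sigma=0.5, seed=0):
    """Euler-Maruyama for N particles with graphon-weighted linear attraction."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    n = x.size
    u = (np.arange(n) + 0.5) / n             # latent labels in [0, 1]
    w = graphon(u[:, None], u[None, :])       # n x n matrix W(u_i, u_j)
    for _ in range(int(round(T / dt))):
        # drift_i = -(1/n) * sum_j W(u_i, u_j) * (x_i - x_j)
        drift = -(w.sum(axis=1) * x - w @ x) / n
        x = x + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(n)
    return u, x

if __name__ == "__main__":
    # Hypothetical graphon; it must broadcast over arrays of labels.
    graphon = lambda a, b: 1.0 - np.maximum(a, b)
    u, x = simulate_graphon_particles(graphon, np.linspace(-1.0, 1.0, 200))
    print(x.mean(), x.std())
```

With `sigma=0.0` and a symmetric graphon the particle mean is conserved and the spread can only shrink, which gives a quick deterministic check of the drift implementation.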

6. Extensions: CAT(0), Gromov Hyperbolic Spaces, and Non-Euclidean Contexts

In CAT(0) and, more generally, Gromov hyperbolic spaces, barycentric laws of large numbers adapt to nonpositively curved or negatively curved geometric settings. The barycenter map $\mu \mapsto \operatorname{bar}(\mu)$ is 1-Lipschitz relative to $W_1$ in CAT(0) spaces, and 1-Lipschitz up to a $\delta^{1/4}$ additive error in $\delta$-hyperbolic settings. Recursive stochastic proximal algorithms and deterministic "no-dice" approximations converge to barycenters with sample-path error controlled via Wasserstein distances and the space's curvature (Ohta, 2022).

Empirical laws of large numbers for barycenters, deterministic approximations, and recursive updates (mirroring Karcher means and proximal point algorithms) come with precise Wasserstein and metric error bounds depending on curvature and dimension, unifying and generalizing classical results from Euclidean and Hilbertian settings.
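One classical recursive update of this type is the inductive mean: set $S_1 = x_1$ and let $S_k$ be the point at fraction $1/k$ along the geodesic from $S_{k-1}$ to $x_k$. The sketch below instantiates it on symmetric positive-definite matrices with the affine-invariant metric, a standard finite-dimensional example of a nonpositively curved space, where the geodesic is $A \#_t B = A^{1/2}(A^{-1/2} B A^{-1/2})^t A^{1/2}$. This is an illustrative assumption rather than the algorithm of the cited papers, and the helper names are hypothetical.

```python
import numpy as np

def _sym_fun(a, f):
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    w, v = np.linalg.eigh((a + a.T) / 2.0)
    return (v * f(w)) @ v.T

def geodesic(a, b, t):
    """Point a #_t b on the affine-invariant geodesic from a (t=0) to b (t=1)."""
    r = _sym_fun(a, np.sqrt)
    r_inv = _sym_fun(a, lambda w: 1.0 / np.sqrt(w))
    mid = _sym_fun(r_inv @ b @ r_inv, lambda w: w ** t)
    return r @ mid @ r

def inductive_mean(samples):
    """S_1 = x_1, S_k = S_{k-1} #_{1/k} x_k: a recursive barycenter estimate."""
    s = np.asarray(samples[0], dtype=float)
    for k, x in enumerate(samples[1:], start=2):
        s = geodesic(s, np.asarray(x, dtype=float), 1.0 / k)
    return s

if __name__ == "__main__":
    samples = [np.diag([1.0, 1.0]), np.diag([4.0, 8.0]), np.diag([16.0, 27.0])]
    # For commuting matrices the result is the entrywise geometric mean.
    print(inductive_mean(samples))
```

For non-commuting samples the output depends on the order in which samples arrive, so it should be read as an estimate that approaches the barycenter only as the sample size grows.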

7. Significance, Key Properties, and Quantitative Rates

The Wasserstein Law of Large Numbers unifies diverse probabilistic limit theorems via the geometry of optimal transport:

It accommodates non-linear Markov semigroups, noncommutative operator means, interacting particle ensembles, and random processes on complex networks.
Contraction properties (spectral gap, displacement convexity) provide uniqueness and quantitative rates ($O(T^{-1})$ variances for time averages, $O(N^{-1/2})$ convergence for empirical measures, $O(n^{-1})$ rates for operator mean iterates), often with explicit central limit theorems; see (Li et al., 2014, Santoro et al., 2023, Lim et al., 2019).
The generality extends to non-locally-compact and non-Euclidean spaces, with careful geometric control replacing classical Hilbert space tools.
These results underlie contemporary analysis in random matrix theory, stochastic process ergodicity, convex optimization in metric geometry, and mean-field approximation for high-dimensional stochastic systems.

A plausible implication is that convergence in Wasserstein spaces, controlled either via spectral gaps or gradient flows of displacement-convex functionals, acts as a universal mechanism for establishing quantitative LLNs and CLTs for empirical measures, averages, or barycenters in a wide range of settings, including infinite-dimensional and nonlinear geometric contexts (Komorowski et al., 2011, Li et al., 2014, Chen et al., 2024, Santoro et al., 2023, Lim et al., 2019, Ohta, 2022).