Empirical Wasserstein Distance
- Empirical Wasserstein distance is a metric that measures discrepancies between probability distributions using optimal transport techniques on both discrete and continuous spaces.
- Recent research establishes explicit distributional theorems, finite-sample rates, and consistent bootstrap methodologies to support rigorous statistical inference.
- Its practical applications span fingerprint verification, metagenomic analysis, and distributional robustness, making it a valuable tool in high-dimensional data analysis.
The empirical Wasserstein distance quantifies the discrepancy between probability distributions using observed data, forming a pivotal object for statistical inference, quantization, probabilistic modeling, machine learning, and distributional robustness. On a finite or discrete sample space, as well as for general continuous distributions, its definition, limiting behavior, and inferential framework rely on optimal transport theory and the structure of empirical processes. Recent research has established explicit distributional theorems, finite-sample rates, confidence interval constructions, and connections between empirical Wasserstein distances and linear programming, Hadamard differentiability, and alternative resampling methods.
1. Definition and Linear Programming Structure
On a finite metric space $(\mathcal{X}, d)$ with $\mathcal{X} = \{x_1, \dots, x_N\}$, any probability measure is identified with a mass vector $r \in \Delta_N$, $r_i \geq 0$, $\sum_{i=1}^N r_i = 1$. The Wasserstein distance of order $p \geq 1$ between two such vectors $r$ and $s$ is
$$W_p(r, s) = \Big( \min_{w \in \Pi(r, s)} \sum_{i,j=1}^N d(x_i, x_j)^p\, w_{ij} \Big)^{1/p},$$
where $\Pi(r, s)$ is the set of joint distributions (couplings) with the prescribed marginals: $w_{ij} \geq 0$, $\sum_j w_{ij} = r_i$, $\sum_i w_{ij} = s_j$.
This optimization is formulated as a linear program (LP), whose dual exposes the intrinsic geometry of Wasserstein balls:
- Primal (transport plan): minimizes the total transport cost under marginal constraints.
- Dual: maximizes the affine objective $\sum_i \lambda_i r_i + \sum_j \mu_j s_j$ over dual variables $(\lambda, \mu) \in \mathbb{R}^N \times \mathbb{R}^N$, subject to $\lambda_i + \mu_j \leq d(x_i, x_j)^p$ for all $i, j$.
The dual solution set, i.e., the set of all optimal dual pairs $(\lambda, \mu)$, is critical for sensitivity analysis and for the formulation of limit distributions (Sommerfeld et al., 2016).
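The primal LP above can be solved directly with a generic solver. Below is a minimal sketch in Python using `scipy.optimize.linprog`; the function name `wasserstein_lp` and the dense constraint construction are illustrative, not taken from the cited work:

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein_lp(r, s, D, p=1):
    """Order-p Wasserstein distance between mass vectors r and s on a
    finite space with pairwise distance matrix D, via the primal LP."""
    N = len(r)
    cost = (D ** p).ravel()                    # c_{ij} = d(x_i, x_j)^p, row-major
    A_rows = np.kron(np.eye(N), np.ones(N))    # row marginals: sum_j w_ij = r_i
    A_cols = np.kron(np.ones(N), np.eye(N))    # column marginals: sum_i w_ij = s_j
    A_eq = np.vstack([A_rows, A_cols])
    b_eq = np.concatenate([r, s])
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun ** (1.0 / p)
```

The transport plan itself is available as `res.x.reshape(N, N)`, which is how the mass-shift visualizations mentioned in the applications below are produced.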
2. Asymptotic Distribution and Limit Theorems
Suppose $X_1, \dots, X_n$ are sampled i.i.d. from $r$ and $Y_1, \dots, Y_m$ i.i.d. from $s$, giving empirical laws $\hat r_n$ and $\hat s_m$. Define the multinomial covariance matrix $\Sigma(r) = \operatorname{diag}(r) - r r^{\top}$ (and analogously $\Sigma(s)$) and centered Gaussian random vectors $G \sim \mathcal{N}(0, \Sigma(r))$, $H \sim \mathcal{N}(0, \Sigma(s))$. The main theorem (Sommerfeld et al., 2016) characterizes weak convergence for all $p \geq 1$:
- One-sample null ($s = r$ known): $n^{1/(2p)}\, W_p(\hat r_n, r) \Rightarrow \big( \max_{u \in \Phi_p} \langle G, u \rangle \big)^{1/p}$, where $\Phi_p = \{u \in \mathbb{R}^N : u_i - u_j \leq d(x_i, x_j)^p\}$.
- One-sample alternative ($s \neq r$ known): $\sqrt{n}\, \big( W_p(\hat r_n, s) - W_p(r, s) \big) \Rightarrow \tfrac{1}{p}\, W_p(r, s)^{1-p} \max_{(\lambda, \mu) \in \Phi_p^*(r, s)} \langle G, \lambda \rangle$, the maximum running over the set $\Phi_p^*(r, s)$ of optimal dual pairs.
- Two-sample null ($r = s$): with $n, m \to \infty$ and $m/(n+m) \to \alpha \in (0, 1)$, $\big( \tfrac{nm}{n+m} \big)^{1/(2p)} W_p(\hat r_n, \hat s_m) \Rightarrow \big( \max_{u \in \Phi_p} \langle G, u \rangle \big)^{1/p}$.
- Two-sample alternative ($r \neq s$): $\sqrt{\tfrac{nm}{n+m}}\, \big( W_p(\hat r_n, \hat s_m) - W_p(r, s) \big) \Rightarrow \tfrac{1}{p}\, W_p(r, s)^{1-p} \max_{(\lambda, \mu) \in \Phi_p^*(r, s)} \big( \sqrt{\alpha}\, \langle G, \lambda \rangle + \sqrt{1-\alpha}\, \langle H, \mu \rangle \big)$.
This framework generalizes classical Donsker theory and reveals the optimal linear programming structure behind Wasserstein empirical processes.
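The $n^{1/(2p)}$ scaling in the null regime can be checked empirically. The sketch below embeds a small finite space in the real line so that `scipy.stats.wasserstein_distance` applies for $p = 1$; the support points, mass vector, and replication counts are illustrative choices, not from the cited work:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
support = np.array([0.0, 1.0, 2.0])   # finite space embedded in R; p = 1
r = np.array([0.3, 0.4, 0.3])         # hypothetical population mass vector

def scaled_null_stat(n, reps=2000):
    """Draw sqrt(n) * W_1(hat r_n, r) under the one-sample null (s = r)."""
    out = np.empty(reps)
    for k in range(reps):
        counts = rng.multinomial(n, r)            # empirical frequencies
        out[k] = np.sqrt(n) * wasserstein_distance(
            support, support, counts / n, r)
    return out

# The law of the rescaled statistic stabilizes as n grows, illustrating
# the n^{1/(2p)} rate (sqrt(n) for p = 1) of the distributional limit.
q200 = np.quantile(scaled_null_stat(200), 0.9)
q5000 = np.quantile(scaled_null_stat(5000), 0.9)
```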
3. Statistical Inference and Bootstrap Methodology
The supremum-type Hadamard differentiability of $(r, s) \mapsto W_p(r, s)$ (directional, but not fully linear/Fréchet) demands specialized approaches to produce valid inferential results:
- Classical ("n-out-of-n") bootstrap fails: because the Hadamard derivative is non-linear, the naive bootstrap does not reproduce the limiting law (Sommerfeld et al., 2016).
- Consistent alternatives:
- m-out-of-n bootstrap: draws subsamples of size $m$ with $m \to \infty$ and $m/n \to 0$; consistent, but requires tuning the subsample size.
- Directional bootstrap (Fang–Santos): Uses the empirical process vector, plugged into the explicit Hadamard derivative, yielding consistent inference.
This theoretical apparatus permits the construction of confidence intervals and hypothesis tests for empirical Wasserstein distances in various regimes.
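A minimal sketch of the m-out-of-n bootstrap for a two-sample $W_1$ statistic on real-valued data, assuming equal sample sizes, the alternative regime ($W_1 > 0$), and the $\sqrt{nm/(n+m)}$ rate from the limit theorems; the helper name, the centering scheme, and the choice $m = 60$ are illustrative, not prescribed by the cited work:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(1)

def m_out_of_n_ci(x, y, m, B=1000, level=0.95):
    """m-out-of-n bootstrap CI for W_1 between two equal-size real samples."""
    n = len(x)
    rate_n = np.sqrt(n * n / (n + n))   # sqrt(nm/(n+m)) with both sizes = n
    rate_m = np.sqrt(m * m / (m + m))   # same rate at subsample size m
    stat = wasserstein_distance(x, y)
    boot = np.empty(B)
    for b in range(B):
        xb = rng.choice(x, size=m, replace=True)
        yb = rng.choice(y, size=m, replace=True)
        # rescaled, centered replicate of the limiting fluctuation
        boot[b] = rate_m * (wasserstein_distance(xb, yb) - stat)
    lo_q, hi_q = np.quantile(boot, [(1 - level) / 2, (1 + level) / 2])
    return stat, (stat - hi_q / rate_n, stat - lo_q / rate_n)

x = rng.normal(0.0, 1.0, 500)
y = rng.normal(0.7, 1.0, 500)
stat, (lo, hi) = m_out_of_n_ci(x, y, m=60)
```

The subsample size $m$ trades off fidelity to the limit (small $m/n$) against Monte Carlo noise (large $m$), which is the tuning burden noted above.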
4. Finite-Sample Confidence Regions and Quantile Calculations
Explicit formulas for finite sample inference are developed as follows (Sommerfeld et al., 2016, Cohen et al., 2019):
- For null hypotheses ($r = s$), compute quantiles of the limit law via simulation:
- Estimate the population vector $\hat r_n$ and the covariance $\Sigma(\hat r_n)$ from the data.
- Generate Gaussian samples $G^{(1)}, \dots, G^{(B)} \sim \mathcal{N}(0, \Sigma(\hat r_n))$, solve the dual LP for each replicate, and apply the power transform $x \mapsto x^{1/p}$ as needed.
- Empirical quantiles determine confidence intervals.
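The simulation recipe above can be sketched as follows for the one-sample null: draw Gaussians with the multinomial covariance, solve the dual LP $\max \langle G, u \rangle$ over $\{u : u_i - u_j \leq d(x_i, x_j)^p\}$ for each replicate, and take empirical quantiles. The function name and defaults are illustrative; one coordinate of $u$ is pinned to zero, which is harmless because $\sum_i G_i = 0$ makes the objective shift-invariant:

```python
import numpy as np
from scipy.optimize import linprog

def null_limit_quantile(r, D, p=1, alpha=0.95, B=500, seed=0):
    """Simulate the one-sample null limit law
    ( max { <G, u> : u_i - u_j <= d(x_i, x_j)^p } )^(1/p),
    with G ~ N(0, Sigma(r)), and return its alpha-quantile."""
    rng = np.random.default_rng(seed)
    N = len(r)
    Sigma = np.diag(r) - np.outer(r, r)   # multinomial covariance Sigma(r)
    Dp = D ** p
    rows, b_ub = [], []
    for i in range(N):
        for j in range(N):
            if i != j:                    # constraint u_i - u_j <= d_ij^p
                row = np.zeros(N)
                row[i], row[j] = 1.0, -1.0
                rows.append(row)
                b_ub.append(Dp[i, j])
    A_ub, b_ub = np.array(rows), np.array(b_ub)
    # Pin u_N = 0; valid since the objective is shift-invariant when sum(G) = 0.
    bounds = [(None, None)] * (N - 1) + [(0, 0)]
    draws = np.empty(B)
    for b in range(B):
        G = rng.multivariate_normal(np.zeros(N), Sigma, check_valid="ignore")
        G -= G.mean()                     # project onto the sum-zero support
        res = linprog(-G, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
        draws[b] = max(-res.fun, 0.0) ** (1.0 / p)
    return np.quantile(draws, alpha)
```

Rejecting the null when $n^{1/(2p)} W_p(\hat r_n, r)$ exceeds this quantile gives an asymptotic level-$(1-\alpha)$ test; inverting it yields the confidence intervals described above.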
For one-dimensional cases, tight bounds on the normalized quantiles have been established (functional CLT; Brownian bridge), and explicit confidence regions are constructed via extremal distributions—mixtures of two-point and uniform distributions on the unit interval (Cohen et al., 2019). Extensions to higher dimensions are conjectured, supported by numerical evidence.
5. Examples and Applications
Empirical Wasserstein distance underpins applied analyses across multiple domains:
- Fingerprint Verification: Empirical minutiae histograms compared via the Wasserstein distance reveal statistically significant differences between real and synthetic datasets. Optimal transport plans visualize mass shifts among feature bins.
- Metagenomic Communities: Wasserstein-based pairwise sample distances, along with confidence intervals, quantitatively distinguish intra- and inter-personal differences in gut microbiome composition.
- Distributional Robustness and Goodness-of-Fit: The limit laws for empirical Wasserstein distances provide finite-sample guarantees for robust optimization, model validation, and two-sample or one-sample tests in generative and biological modeling.
- Quantization: The empirical Wasserstein distance matches optimal rates of finite quantization in Euclidean and infinite-dimensional Gaussian spaces, demonstrating rate-optimality for empirical processes (Boissard et al., 2011).
6. Technical Underpinnings: Hadamard Differentiability and LP Sensitivity
Central to the limiting theory is the directional Hadamard differentiability of the Wasserstein cost functional on finite spaces. Sensitivity analysis of the parametric LP yields the precise form of the derivative: the optimal value $(r, s) \mapsto \min_{w \in \Pi(r, s)} \sum_{i,j} d(x_i, x_j)^p\, w_{ij}$ is directionally Hadamard differentiable with derivative $(h_1, h_2) \mapsto \max_{(\lambda, \mu) \in \Phi_p^*(r, s)} \big( \langle \lambda, h_1 \rangle + \langle \mu, h_2 \rangle \big)$, the maximum taken over the set of optimal dual pairs. This sensitivity analysis enables direct application of the functional delta method for limiting distributions. For continuous underlying distributions (e.g., Gaussian), explicit closed-form derivatives and CLTs for the empirical Wasserstein distance have been derived, leveraging spectral calculus and Fréchet differentiability (Rippl et al., 2015).
7. Computational Implementation and Moderate Sample Validity
All inferential statements and confidence region calculations can be performed using standard LP solvers and multivariate normal samplers. Theoretical quantile formulas and simulation-based approximations support valid inference even for moderate sample sizes, with empirical evidence demonstrating the effectiveness and sharpness of derived finite-sample intervals (Sommerfeld et al., 2016).
In summary, the empirical Wasserstein distance is characterized, on finite metric spaces, by an explicit LP-optimization structure. Its distributional and inferential properties are governed by the geometry of the transport polyhedron and the Hadamard differentiability of the transport cost. The combination of theoretical insights, dual LP sensitivity, and computational tractability enables rigorous hypothesis testing, confidence interval construction, and broad application in complex, high-dimensional data analysis.