Empirical Wasserstein-1 Distance Overview
- Empirical Wasserstein-1 distance is a metric measuring discrepancies between distributions via optimal transport with linear cost.
- It provides precise asymptotic and finite-sample convergence rates that depend on support geometry and moment conditions.
- Practical computation leverages 1D sorting, multivariate linear programming, and approximations like tree-based methods and neural networks.
The empirical Wasserstein-1 distance, also known as the empirical earth mover's distance, quantifies the discrepancy between empirical and population distributions, or between two empirical distributions, via optimal transport with linear cost. This metric plays a central role in probability, statistical inference, machine learning, and high-dimensional data analysis due to its mathematical tractability and direct connection to geometry and coupling. Both the theory and the practice of the empirical Wasserstein-1 distance are well developed, with precise asymptotics, non-asymptotic deviation bounds, and efficient exact or approximate computation available in a wide range of settings.
1. Formal Definition and One-Dimensional Characterization
For probability measures $\mu, \nu$ on a Polish metric space $(X, d)$, the 1-Wasserstein distance is defined as
$$W_1(\mu,\nu) = \inf_{\pi \in \Pi(\mu,\nu)} \int_{X \times X} d(x,y)\, d\pi(x,y),$$
where $\Pi(\mu,\nu)$ is the set of all couplings of $\mu$ and $\nu$. The Kantorovich-Rubinstein duality gives
$$W_1(\mu,\nu) = \sup_{\|f\|_{\mathrm{Lip}} \le 1} \left( \int f\, d\mu - \int f\, d\nu \right).$$
For empirical measures $\mu_n = \frac{1}{n}\sum_{i=1}^n \delta_{X_i}$, with $X_1,\dots,X_n$ i.i.d. from $\mu$, $W_1(\mu_n,\mu)$ captures the optimal cost of transporting the empirical distribution to the true law.
On $\mathbb{R}$, a fundamental identity is
$$W_1(\mu,\nu) = \int_{-\infty}^{\infty} |F_\mu(t) - F_\nu(t)|\, dt = \int_0^1 |F_\mu^{-1}(u) - F_\nu^{-1}(u)|\, du$$
for cumulative distribution functions $F_\mu, F_\nu$ and quantile functions $F_\mu^{-1}, F_\nu^{-1}$ (Angelis et al., 2021). This quantile formula underpins both practical computation (requiring only sorting) and theoretical analysis.
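As a concrete illustration, here is a minimal NumPy sketch of the CDF-area formula (the helper name `w1_area` and the toy data are ours), cross-checked against SciPy's `wasserstein_distance`:

```python
import numpy as np
from scipy.stats import wasserstein_distance  # reference implementation

def w1_area(x, y):
    """Empirical W1 as the area between the two empirical CDFs.

    On the sorted pooled support, both CDFs are constant between
    consecutive points, so the integral of |F_x - F_y| is a finite sum.
    """
    x, y = np.sort(x), np.sort(y)
    grid = np.sort(np.concatenate([x, y]))
    fx = np.searchsorted(x, grid[:-1], side="right") / len(x)
    fy = np.searchsorted(y, grid[:-1], side="right") / len(y)
    return np.sum(np.abs(fx - fy) * np.diff(grid))

rng = np.random.default_rng(0)
a, b = rng.normal(0, 1, 500), rng.normal(0.3, 1.2, 700)
print(w1_area(a, b), wasserstein_distance(a, b))  # the two values agree
```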
2. Asymptotic and Finite-Sample Rates of Convergence
The rate at which $\mathbb{E}\, W_1(\mu_n,\mu) \to 0$ as $n \to \infty$ depends on the geometry of the support and on moment conditions.
- General compact metric spaces: if the support has dimension $d$ (in a suitable metric-entropy sense), then (Weed et al., 2017, Boedihardjo, 4 Aug 2025, Fournier et al., 2013)
$$\mathbb{E}\, W_1(\mu_n,\mu) \asymp \begin{cases} n^{-1/2}, & d < 2,\\ n^{-1/2}\log(1+n), & d = 2,\\ n^{-1/d}, & d > 2, \end{cases}$$
with the $d = 2$ case sharp up to the power of the logarithm; a small simulation sketch appears after this list.
The optimality of these rates is established via dyadic partition coupling and metric entropy arguments.
- Moment conditions: with a finite moment of order $q > 1$, these rates persist, with an additional term of order $n^{-(q-1)/q}$ when $q$ is not large enough relative to the dimension (Fournier et al., 2013, Dedecker et al., 2018).
- Sub-Gaussian and exponential tails: these yield exponential deviation inequalities and matching (up to constants) moment bounds (A. et al., 2019, Fournier et al., 2013).
- Singular measures or low-dimensional support: the intrinsic dimension determines the empirical convergence rate. The theory covers pre-asymptotic regimes where the effective dimensionality is lower than the ambient dimension (Weed et al., 2017, Boedihardjo, 4 Aug 2025).
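A small simulation makes the dimension dependence concrete (a minimal sketch; the helper `w1_two_sample` and all tuning constants are ours). For equal sample sizes with uniform weights the Kantorovich LP reduces to an assignment problem, and the two-sample quantity $\mathbb{E}\,W_1(\mu_n,\mu_n')$ has the same rate as $\mathbb{E}\,W_1(\mu_n,\mu)$ up to constants, by Jensen's inequality (lower bound) and the triangle inequality (upper bound):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def w1_two_sample(x, y):
    """Exact W1 between equal-size empirical measures: with uniform
    weights the optimal coupling is a permutation (assignment problem)."""
    cost = cdist(x, y)                  # pairwise Euclidean ground costs
    r, c = linear_sum_assignment(cost)  # Hungarian-type solver
    return cost[r, c].mean()

rng = np.random.default_rng(0)
for d in (1, 2, 3):
    for n in (64, 256, 1024):
        est = np.mean([w1_two_sample(rng.uniform(size=(n, d)),
                                     rng.uniform(size=(n, d)))
                       for _ in range(5)])
        # Rescale by n^{1/2} for d <= 2 (log factor at d = 2),
        # and by n^{1/3} for d = 3: the column stays roughly flat.
        print(f"d={d}  n={n:4d}  W1={est:.4f}  "
              f"rescaled={est * n ** (1 / max(d, 2)):.3f}")
```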
3. Limit Distributions and Weak Convergence
In dimension one, with a smooth density and regular tails, the plug-in statistics $W_1(\mu_n, \nu_m)$ and $W_1(\mu_n, \mu)$ satisfy distributional limit theorems:
- Two-sample case: under regularity conditions, (Berthet et al., 2019) proves that the suitably normalized statistic converges in distribution,
$$a_{n,m}\big(W_1(\mu_n, \nu_m) - W_1(\mu, \nu)\big) \xrightarrow{d} Q,$$
where $a_{n,m}$ is an explicit rate sequence and $Q$ is an explicit quadratic form involving Brownian bridges and the quantile process.
- Goodness-of-fit case ($H_0\colon \mu = \nu$): the standard $\sqrt{n}$-CLT for smooth functionals fails. The limiting distribution is non-Gaussian, and for general costs the scaling rate is controlled by the regular variation at zero of the cost function; for the linear cost, specifically
$$\sqrt{n}\, W_1(\mu_n, \mu) \xrightarrow{d} \int_{\mathbb{R}} |B(F(t))|\, dt,$$
where $B$ is a standard Brownian bridge and $F$ is the CDF of $\mu$ (Berthet et al., 2019); a simulation sketch follows this list.
- Finite metric spaces: for discrete support, $W_1(\mu_n, \mu)$ is the value of a random linear program. The asymptotic distribution is the maximum of linear forms in a Gaussian random vector over the dual constraint set, yielding a non-classical limit; the naive bootstrap fails, and valid alternatives are derived (Sommerfeld et al., 2016).
- General dimensions: the functional delta method and empirical process theory remain technically challenging; for $d \ge 2$, strong regularity is often required for CLTs.
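The goodness-of-fit limit above is easy to check numerically in the uniform case, where $F$ is the identity and the limit reduces to $\int_0^1 |B(u)|\,du$ with mean $\sqrt{2\pi}/8 \approx 0.313$. The following is a minimal Monte Carlo sketch (the helper `w1_unif` and all tuning constants are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def w1_unif(n):
    """Exact W1 between n i.i.d. U[0,1] samples and U[0,1] itself, via
    the quantile formula: sum_i int_{(i-1)/n}^{i/n} |X_(i) - u| du."""
    s = np.sort(rng.uniform(size=n))
    a = np.arange(n) / n
    b = a + 1.0 / n
    m = np.clip(s, a, b)  # kink of |X_(i) - u| clipped into [a, b]
    return np.sum((m - a) * s - (m**2 - a**2) / 2
                  + (b**2 - m**2) / 2 - (b - m) * s)

n, reps, grid = 2000, 400, 1000
stats = np.sqrt(n) * np.array([w1_unif(n) for _ in range(reps)])

# Limit law for U[0,1]: integral_0^1 |B(u)| du, B a standard Brownian bridge.
u = np.arange(1, grid + 1) / grid
incs = rng.normal(size=(reps, grid)) / np.sqrt(grid)
w = np.cumsum(incs, axis=1)          # Brownian motion on the grid
bridge = w - u * w[:, -1:]           # pin to 0 at u = 1
limit = np.abs(bridge).mean(axis=1)  # approximates the integral of |B|

print(stats.mean(), limit.mean())    # both near sqrt(2*pi)/8 = 0.313
```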
4. Non-Asymptotic Deviation, Concentration, and Sample Complexity
Sharp deviation inequalities for $W_1(\mu_n, \mu)$ are available under moment or transport-entropy assumptions:
- Deviation bounds: if $\mu$ is sub-Gaussian with variance proxy $\sigma^2$, then for any $t > 0$,
$$\mathbb{P}\big(W_1(\mu_n,\mu) \ge \mathbb{E}\,W_1(\mu_n,\mu) + t\big) \le \exp\!\left(-\frac{c\, n t^2}{\sigma^2}\right),$$
with explicit constants. For sub-exponential and heavy-tailed distributions, similar, though possibly polynomial, concentration rates hold (A. et al., 2019, Boissard, 2011, Fournier et al., 2013).
- Transport-entropy inequalities: if $\mu$ satisfies a $T_1(C)$ inequality (Gaussian-type concentration in $W_1$), McDiarmid-type bounded-differences arguments yield, for all $t > 0$,
$$\mathbb{P}\big(W_1(\mu_n,\mu) \ge \mathbb{E}\,W_1(\mu_n,\mu) + t\big) \le \exp\!\left(-\frac{n t^2}{2C}\right).$$
- Sample complexity: achieving $W_1(\mu_n,\mu) \le \varepsilon$ with probability at least $1 - \delta$ requires $n = O\big(\varepsilon^{-2}\log(1/\delta)\big)$ samples in one-dimensional sub-Gaussian settings (A. et al., 2019).
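To see where this scaling comes from, here is a short derivation sketch under the one-dimensional sub-Gaussian assumptions above (the constants $C_1$ and $c$ are generic): combine the mean bound $\mathbb{E}\,W_1(\mu_n,\mu) \le C_1 \sigma n^{-1/2}$ with the deviation inequality and require
$$\frac{C_1\sigma}{\sqrt{n}} + \sigma\sqrt{\frac{\log(1/\delta)}{c\,n}} \le \varepsilon,$$
which is satisfied once $n \gtrsim \varepsilon^{-2}\sigma^2\log(1/\delta)$.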
5. Computational Methods and Approximations
Efficient exact and approximate computation of the empirical $W_1$ distance is critical in large-scale applications:
- 1D exact computation: $O(n \log n)$ time via sorting, using either the empirical CDF (area) formula or the quantile formula (Angelis et al., 2021); see the NumPy sketch in Section 1.
- Multivariate exact computation: reduces to a linear program over couplings of the two samples (a min-cost network flow with $O(n^2)$ variables); complexity is typically $O(n^3 \log n)$. A sketch follows the table below.
- Tree-based Approximation (TWD): embeds the data in a tree metric and solves a convex Lasso problem (nonnegative $\ell_1$-regularized regression) for optimal edge weights, yielding linear-time approximate computation with quantifiable accuracy. Variance is reduced via tree-slicing (averaging over trees) (Yamada et al., 2022).
- Deep Network Approximation: In high-dimensional settings, the Lipschitz function class is approximated by 1-Lipschitz neural networks; the supremum in the dual representation is optimized over networks, enabling scalable hypothesis tests and confidence intervals via Gaussian multiplier bootstrap (Imaizumi et al., 2019).
| Method | Dimension | Computational Cost |
|---|---|---|
| 1D Sort+Pairing | $d = 1$ | $O(n \log n)$ |
| LP Solver | any $d$ | $O(n^3 \log n)$ (network flow) |
| Tree-Wasserstein | any $d$ | $O(N)$ for $N$ tree nodes |
| ReLU Network Dual | high $d$ | iterative (SGD/ADAM; bootstrap replications) |
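For unequal sample sizes or non-uniform weights, the full Kantorovich LP applies. Here is a minimal sketch using the POT package (`pip install pot`; the toy data are ours), whose `ot.emd2` solves the LP exactly with a network-flow solver:

```python
import numpy as np
import ot  # Python Optimal Transport (POT)

rng = np.random.default_rng(0)
x = rng.normal(size=(300, 5))            # n = 300 points in R^5
y = rng.normal(0.4, 1.0, size=(500, 5))  # m = 500 points, shifted mean

M = ot.dist(x, y, metric="euclidean")    # linear (W1) ground cost matrix
a, b = ot.unif(len(x)), ot.unif(len(y))  # uniform empirical weights
w1 = ot.emd2(a, b, M)                    # exact network-flow LP solve
print(w1)
```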
6. Statistical Inference, Hypothesis Testing, and Confidence Bands
Empirical Wasserstein-1 distance underpins a variety of inference schemes:
- Hypothesis testing: empirical $W_1$-based one- and two-sample tests with Gaussian process bootstrap calibration have correct Type I error and comparable or superior performance to alternatives, even on singular supports (Imaizumi et al., 2019); a generic permutation-calibrated variant is sketched after this list.
- Confidence intervals: bootstrap quantiles of the supremum of Gaussian processes (approximating the empirical process indexed by 1-Lipschitz functions) yield valid CIs for $W_1$ and for functionals that are Lipschitz with respect to $W_1$ (Imaizumi et al., 2019, Sommerfeld et al., 2016, A. et al., 2019).
- Applications: Used to rigorously quantify inter/intra-group distances in metagenomics and other high-dimensional histogram data, providing interpretable intervals and robust significance estimates even in challenging regimes (e.g., partially overlapping supports) (Sommerfeld et al., 2016).
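To make the testing recipe concrete in one dimension, here is a minimal sketch (helper names and toy data are ours). Calibration here is by permutation of the pooled sample, a generic alternative to the Gaussian multiplier bootstrap of Imaizumi et al. (2019):

```python
import numpy as np
from scipy.stats import wasserstein_distance  # 1-D empirical W1

def w1_permutation_test(x, y, n_perm=2000, seed=0):
    """Two-sample test with the empirical W1 statistic, calibrated by
    random permutations (exact under exchangeability of the pooled data)."""
    rng = np.random.default_rng(seed)
    obs = wasserstein_distance(x, y)
    pooled = np.concatenate([x, y])
    n, count = len(x), 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        count += wasserstein_distance(pooled[:n], pooled[n:]) >= obs
    return obs, (1 + count) / (1 + n_perm)  # statistic, p-value

rng = np.random.default_rng(1)
x, y = rng.normal(0.0, 1.0, 300), rng.normal(0.25, 1.0, 300)
print(w1_permutation_test(x, y))  # small p-value: the means differ
```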
7. Optimality, Quantization, and Theoretical Extensions
Empirical measures are, up to polylogarithmic factors, as effective as optimal uniform quantizers for 1-Wasserstein approximation:
- Quantization error: the expected empirical Wasserstein-1 distance nearly matches the minimal error over all $n$-point uniform quantizers, up to a polylogarithmic factor in $n$; the gap is sharply characterized via multiscale decomposition, chaining arguments, and metric entropy bounds (Boedihardjo, 4 Aug 2025).
- Non-uniform quantizers: in many settings (e.g., absolutely continuous measures on $\mathbb{R}^d$), the empirical quantization rate matches that of optimal non-uniform quantizers. However, for measures with fine structure carrying small mass, polynomial factors may appear (Boedihardjo, 4 Aug 2025).
- Transport-entropy connections: $W_1$-concentration unifies the analysis of risk measures, generalizing CVaR and other quantile-related risk bounds to arbitrary functionals that are Lipschitz with respect to $W_1$ (A. et al., 2019).
- Open questions: the necessity of polylogarithmic rate gaps in general spaces, a precise characterization for $p$-Wasserstein distances with $p > 1$, and performance for strongly singular or heavy-tailed distributions remain subjects of active research (Boedihardjo, 4 Aug 2025).
References
- (Angelis et al., 2021) Why the 1-Wasserstein distance is the area between the two marginal CDFs.
- (Berthet et al., 2019) Weak convergence of empirical Wasserstein type distances.
- (Weed et al., 2017) Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance.
- (Fournier et al., 2013) On the rate of convergence in Wasserstein distance of the empirical measure.
- (Boedihardjo, 4 Aug 2025) Optimality of empirical measures as quantizers.
- (A. et al., 2019) A Wasserstein distance approach for concentration of empirical risk estimates.
- (Boissard, 2011) Simple bounds for the convergence of empirical and occupation measures in 1-Wasserstein distance.
- (Yamada et al., 2022) Approximating 1-Wasserstein Distance with Trees.
- (Imaizumi et al., 2019) Hypothesis Test and Confidence Analysis with Wasserstein Distance on General Dimension.
- (Sommerfeld et al., 2016) Inference for Empirical Wasserstein Distances on Finite Spaces.
- (Dedecker et al., 2018) Behavior of the empirical Wasserstein distance in $\mathbb{R}^k$ under moment conditions.
- (Divol, 2021) A short proof on the rate of convergence of the empirical measure for the Wasserstein distance.