Wasserstein-1 Distance Overview

Updated 10 April 2026
  • Wasserstein-1 Distance is a metric that measures the minimal linear cost to transport probability mass between distributions, using both coupling and Lipschitz dual formulations.
  • It enables efficient computation through methods like tree and sliced approximations, as well as parallel GPU solvers, making it practical for high-dimensional data analysis.
  • The metric underpins rigorous statistical inference, convergence results, and extensions in quantum and topological data analysis, highlighting its broad research impact.

The 1-Wasserstein distance, also known as the Earth Mover’s Distance (EMD), is a fundamental metric on the space of probability measures that quantifies the minimal cost required to transport mass from one distribution to another when the cost is measured linearly with respect to the distance. It is central in optimal transport theory and has widespread applications in probability, statistics, machine learning, signal processing, and quantum information. The mathematical structure of $W_1$ enables both primal (coupling-based) and dual (Lipschitz-test-function-based) characterizations, facilitates efficient computations in specific cases, and supports generalizations to structured data and quantum settings.

1. Fundamental Definitions and Duality

Let $(X,d)$ be a Polish metric space and $\mu,\nu$ Borel probability measures on $X$. The 1-Wasserstein distance is defined by the optimal transport formulation: $$W_1(\mu,\nu) = \inf_{\pi\in\Pi(\mu,\nu)} \int_{X \times X} d(x,y)\, d\pi(x,y),$$ where $\Pi(\mu,\nu)$ is the set of all couplings of $\mu$ and $\nu$. The Kantorovich–Rubinstein duality gives

$$W_1(\mu,\nu) = \sup_{f\in\mathrm{Lip}_1(X)} \left\{ \int f\,d\mu - \int f\,d\nu \right\},$$

where $\mathrm{Lip}_1(X)$ consists of all real-valued functions with Lipschitz constant at most 1 with respect to $d$ (Angelis et al., 2021, Stéphanovitch et al., 2022, Coutin et al., 2019, Imaizumi et al., 2019).
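As an illustration of the primal formulation, the coupling problem for small discrete measures can be solved directly as a linear program. The sketch below (an illustration assuming SciPy's `linprog` is available, not code from the cited papers) recovers $W_1 = 1$ for two three-point measures on the line.

```python
import numpy as np
from scipy.optimize import linprog

# Two discrete measures supported on the points x = 0, 1, 2.
x = np.array([0.0, 1.0, 2.0])
mu = np.array([0.5, 0.5, 0.0])
nu = np.array([0.0, 0.5, 0.5])
n = len(x)

# Linear cost d(x_i, x_j) = |x_i - x_j|, flattened over coupling entries pi_ij.
cost = np.abs(x[:, None] - x[None, :]).ravel()

# Marginal constraints: row sums of pi equal mu, column sums equal nu.
A_eq, b_eq = [], []
for i in range(n):
    row = np.zeros((n, n)); row[i, :] = 1.0
    A_eq.append(row.ravel()); b_eq.append(mu[i])
for j in range(n):
    col = np.zeros((n, n)); col[:, j] = 1.0
    A_eq.append(col.ravel()); b_eq.append(nu[j])

res = linprog(cost, A_eq=np.array(A_eq), b_eq=np.array(b_eq),
              bounds=(0, None), method="highs")
w1 = res.fun  # optimal transport cost: 0.5*1 + 0.5*1 = 1.0
```

The optimal plan shifts half the mass from 0 to 1 and half from 1 to 2, for total cost 1.0; this matches the area between the two CDFs.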

On $\mathbb{R}$, $W_1$ admits equivalent expressions:

  • Area between CDFs: $W_1(\mu,\nu) = \int_{\mathbb{R}} |F_\mu(t) - F_\nu(t)|\, dt$,
  • Quantile formulation: $W_1(\mu,\nu) = \int_0^1 |F_\mu^{-1}(u) - F_\nu^{-1}(u)|\, du$, where $F_\mu$ is the cumulative distribution function (CDF) of $\mu$ and $F_\mu^{-1}$ its quantile function (Angelis et al., 2021, Chhachhi et al., 2023).
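Both one-dimensional expressions are easy to verify numerically. The sketch below (pure NumPy, equal-size samples) computes the empirical $W_1$ via the quantile formulation and via the area between empirical CDFs, and the two values coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=1000)
b = rng.normal(0.5, 1.0, size=1000)

# Quantile formulation: for equal-size samples, coupling sorted values is optimal.
w_quantile = np.mean(np.abs(np.sort(a) - np.sort(b)))

# CDF formulation: integrate |F_a - F_b| over the merged support,
# where both empirical CDFs are piecewise constant.
grid = np.sort(np.concatenate([a, b]))
F_a = np.searchsorted(np.sort(a), grid, side="right") / a.size
F_b = np.searchsorted(np.sort(b), grid, side="right") / b.size
w_cdf = np.sum(np.abs(F_a - F_b)[:-1] * np.diff(grid))
```

Both quantities equal the exact $W_1$ between the two empirical measures, so they agree up to floating-point error.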

2. Properties, Metric Structure, and Geometric Interpretation

$W_1$ is a true metric on $\mathcal{P}_1(X)$, the space of probability measures with finite first moment. The basic properties include:

  • Metric axioms: non-negativity, identity of indiscernibles, symmetry, and triangle inequality (Stéphanovitch et al., 2022, Duvenhage et al., 2022).
  • Topological implications: $W_1$ metrizes weak convergence of probability measures together with convergence of first moments.
  • Geometric intuition: On $\mathbb{R}$, $W_1$ is the area between the CDFs; the optimal transport plan is realized by coupling quantiles, i.e., matching each quantile level $u \in (0,1)$ between $F_\mu^{-1}(u)$ and $F_\nu^{-1}(u)$ (Angelis et al., 2021).
  • Explicit forms for location-scale families: For random variables in the same location-scale family, $X = \mu_1 + \sigma_1 Z$ and $Y = \mu_2 + \sigma_2 Z$, $W_1 = \mathbb{E}\,|(\mu_1 - \mu_2) + (\sigma_1 - \sigma_2) Z|$, specializing to explicit folded-distribution means for Gaussians (Chhachhi et al., 2023).
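The location-scale identity can be sanity-checked by simulation. The sketch below (an illustration, not the paper's code) compares the comonotone-coupling expression $\mathbb{E}|(\mu_1-\mu_2) + (\sigma_1-\sigma_2)Z|$ for two Gaussians with the empirical $W_1$ computed from independent samples via sorted (quantile) coupling.

```python
import numpy as np

rng = np.random.default_rng(1)
m1, s1, m2, s2 = 0.0, 1.0, 1.0, 2.0  # two Gaussian location-scale parameters
n = 200_000

# Comonotone coupling: drive both variables with the same standard normal Z.
z = rng.normal(size=n)
w_closed = np.mean(np.abs((m1 + s1 * z) - (m2 + s2 * z)))

# Empirical W1 from independent samples: sort both and average the gaps.
xs = np.sort(m1 + s1 * rng.normal(size=n))
ys = np.sort(m2 + s2 * rng.normal(size=n))
w_emp = np.mean(np.abs(xs - ys))
```

Both estimates converge to the same value (the mean of a folded normal with parameters $\mu_1-\mu_2$ and $|\sigma_1-\sigma_2|$), so they agree up to Monte Carlo error.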

3. Algorithmic Aspects: Efficient Computation and Approximations

Computing $W_1$ exactly is tractable for small, low-dimensional discrete problems—typically as a linear program scaling cubically in the number of support points. For high-dimensional or large-scale applications, efficient approximations are essential:

  • Tree-Wasserstein approximation: The 1-Wasserstein distance is approximated via shortest-path metrics on tree structures, with the tree-Wasserstein distance providing closed-form, linear-time computation once edge weights are learned via convex L1-regularized regression (Yamada et al., 2022).
  • Randomly-shifted quadtree methods: For persistence diagrams, the 1-Wasserstein distance is approximated in near-linear time using quadtree-based OT-sketches, providing logarithmic approximation guarantees in the spread of the data (Chen et al., 2021).
  • Sliced and max-Sliced $W_1$: The Sliced $W_1$ is the average of projected one-dimensional $W_1$ distances over random directions, retaining a dimension-free sample complexity and permitting fast Monte Carlo evaluation with explicit convergence guarantees (Xu et al., 2022).
  • Parallel and GPU-based flow solvers: For large-scale bipartite matching problems in topological data analysis, graph sparsification and parallelism are combined to scale $W_1$ computation to persistence diagrams with tens of thousands of points (Dey et al., 2021).
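A minimal Monte Carlo version of the Sliced $W_1$ is straightforward: project both point clouds onto random directions, compute the one-dimensional $W_1$ in closed form on each projection, and average. The sketch below assumes equal-size samples; names and defaults are illustrative.

```python
import numpy as np

def sliced_w1(X, Y, n_proj=200, seed=0):
    """Monte Carlo Sliced 1-Wasserstein between equal-size point clouds."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)     # uniform random direction on the sphere
        px = np.sort(X @ theta)            # 1D projections of each cloud
        py = np.sort(Y @ theta)
        total += np.mean(np.abs(px - py))  # closed-form 1D W1 via sorted coupling
    return total / n_proj

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 10))
Y = X + 1.0                                # same cloud shifted by the all-ones vector
d_same = sliced_w1(X, X)                   # identical clouds: distance 0
d_shift = sliced_w1(X, Y)                  # shifted clouds: strictly positive
```

Each projection costs only a sort, so the total cost is $O(n_{\text{proj}} \cdot n \log n)$ regardless of the ambient dimension.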

4. Limit Theorems, Statistical Inference, and Sample Complexity

The Wasserstein-1 distance supports a growing theory of limit results and statistical inference:

  • Empirical convergence: The central limit theorem holds under finite moment conditions for the Sliced $W_1$ and max-Sliced $W_1$; empirical rates are of order $n^{-1/2}$ for Sliced $W_1$ in any dimension $d$, whereas the classical (non-sliced) $W_1$ is subject to the curse of dimensionality (Xu et al., 2022, Stéphanovitch et al., 2022, Jalowy, 2021).
  • Gaussian approximation for $W_1$-statistics: Statistical hypothesis tests and confidence intervals for $W_1$ can be constructed using DNN-approximated Lipschitz function classes and non-asymptotic Gaussian coupling, balancing approximation bias and variance to achieve near-optimal rates for multivariate empirical measures (Imaizumi et al., 2019).
  • Distributional limits in stochastic processes: The $W_1$ metric serves as a tool to quantify rates in functional limit theorems beyond the Kolmogorov–Smirnov setting, such as in pathwise Donsker-type theorems for random walks approximating Brownian motion in strong topologies (Coutin et al., 2019).
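The one-dimensional empirical behavior can be observed directly: averaging the empirical $W_1$ between pairs of independent samples over repetitions shows the distance shrinking as the sample size grows (a simple illustration, not a result from the cited papers).

```python
import numpy as np

rng = np.random.default_rng(3)

def mean_empirical_w1(n, reps=20):
    """Average empirical W1 between two independent uniform n-samples."""
    vals = []
    for _ in range(reps):
        a = np.sort(rng.uniform(size=n))
        b = np.sort(rng.uniform(size=n))
        vals.append(np.mean(np.abs(a - b)))  # sorted coupling: exact 1D W1
    return float(np.mean(vals))

w_small = mean_empirical_w1(100)     # larger distance at small n
w_large = mean_empirical_w1(10_000)  # roughly 10x smaller at 100x the samples
```

In one dimension the distance decays at the parametric $n^{-1/2}$ rate, which is why a hundredfold increase in sample size shrinks it by roughly a factor of ten.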

5. Generalizations and Quantum Extensions

The 1-Wasserstein distance admits natural generalizations:

  • Persistence diagrams and combinatorial structures: The $W_1$ metric serves as the canonical distance between persistence diagrams, crucial in topological data analysis, where it is computed via matchings of points in the plane that may also match points to the diagonal, at linear cost per matched pair (Chen et al., 2021).
  • Matrix-valued and quantum analogues: The matricial $W_1$ extends optimal transport to Hermitian matrix-valued densities using operator-norm and nuclear-norm formulations and gradient/divergence operators defined via commutators, with dual and dual-of-dual (flux) formulations providing computationally efficient convex programs (Chen et al., 2017).
  • Quantum channels: In the operator-algebraic context, a quantum $W_1$ is defined on the space of unital completely positive (UCP) maps (channels) via a noncommutative gauge construction that reduces to the trace norm in the single-system case. The metric is additive, stable, and compatible with marginal reductions, enabling robust comparison of quantum channels (Duvenhage et al., 2022).
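The single-system reduction is simple to compute. The sketch below (a generic NumPy illustration, not code from the cited papers) evaluates the trace norm $\|\rho - \sigma\|_1$ for two commuting density matrices, where it coincides with twice the total variation distance between the eigenvalue distributions.

```python
import numpy as np

# Two commuting (diagonal) single-qubit density matrices.
rho = np.diag([0.7, 0.3])
sigma = np.diag([0.4, 0.6])

# Trace norm of the difference: sum of absolute eigenvalues
# (the difference is Hermitian, so its eigenvalues are real).
trace_norm = np.sum(np.abs(np.linalg.eigvalsh(rho - sigma)))  # = 0.6

# For diagonal states this equals twice the total variation distance
# between the classical distributions (0.7, 0.3) and (0.4, 0.6).
tv = 0.5 * (abs(0.7 - 0.4) + abs(0.3 - 0.6))  # = 0.3
```

For non-commuting states the same trace-norm expression applies, but it no longer reduces to a classical distance between fixed eigenvalue distributions.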

6. Asymptotics, Bounds, and Practical Implications

Several sharp quantitative results and bounds are established:

  • Rate of convergence: For empirical measures, $n^{-1/d}$ rates for $W_1$ convergence hold in dimension $d > 2$; the convergence rate for the empirical spectral distribution of Ginibre matrices to the circular law in $W_1$ is of order $n^{-1/2}$ (Jalowy, 2021, Stéphanovitch et al., 2022).
  • Parameter-based bounds: For location-scale distributions, $W_1$ is bounded above by $|\mu_1 - \mu_2| + |\sigma_1 - \sigma_2|\,\mathbb{E}|Z|$; for Gaussians, this specializes to $|\mu_1 - \mu_2| + \sqrt{2/\pi}\,|\sigma_1 - \sigma_2|$ (Chhachhi et al., 2023).
  • Differential privacy impact: Gaussian or Laplace noise mechanisms increase $W_1$ by at most the expected norm of the added noise, providing explicit formulas for privacy-preserving data releases (Chhachhi et al., 2023).
  • Robustness comparisons: In high-dimensional limit settings, $W_1$ avoids the logarithmic factors present in i.i.d. matching problems, due to repulsion phenomena in random matrix eigenvalue distributions (Jalowy, 2021).

7. Applications and Significance in Contemporary Research

$W_1$ and its variants permeate diverse areas:

  • Generative modeling: The geometry of $W_1$ underlies Wasserstein GANs, where optimization in the space of 1-Lipschitz discriminators enables stable learning and captures geometry between data and generative distributions (Stéphanovitch et al., 2022).
  • Statistical methodology: $W_1$-based tests and confidence sets exploit the dual structure for robust, interpretable analysis of high-dimensional and structured data (Imaizumi et al., 2019).
  • Random matrix theory: $W_1$ quantitatively captures convergence to universal spectral laws beyond total variation or KL divergence (Jalowy, 2021).
  • Topological data analysis: As the canonical metric between persistence diagrams, $W_1$ enables scalable computational pipelines for understanding shape in data (Chen et al., 2021).
  • Quantum information: Noncommutative analogues of $W_1$ provide tools for channel discrimination and quantum resource quantification, reflecting structural properties absent in scalar distances (Duvenhage et al., 2022, Chen et al., 2017).

The 1-Wasserstein distance thus functions as a central object in modern mathematical, statistical, and computational sciences, balancing structural rigor, metric interpretability, and versatility of application.
