Wasserstein Distance: Theory & Applications
- Wasserstein distance is a metric that quantifies discrepancies between probability measures by solving an optimal mass transport problem.
- It encompasses both deterministic (Monge) and relaxed (Kantorovich) formulations, providing a geometric framework for comparing distributions.
- Its applications span imaging, machine learning, PDE analysis, and data science, addressing issues like mass fluctuations and stability.
The Wasserstein distance, also known as the optimal transport (OT) distance or Earth Mover's Distance (EMD) in specific cases, is a fundamental metric that quantifies the discrepancy between probability measures by solving a mass transportation problem. It provides a geometric framework for comparing distributions and has found wide application across mathematics, probability theory, statistics, machine learning, computer vision, and the analysis of partial differential equations.
1. Mathematical Definition and Foundational Principles
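Let $(X, d)$ be a complete separable metric space, and let $\mathcal{P}_p(X)$ denote the set of Borel probability measures on $X$ with finite $p$-th moment. For $\mu, \nu \in \mathcal{P}_p(X)$ and $p \ge 1$, the $p$-Wasserstein distance is defined as
$$W_p(\mu, \nu) = \left( \inf_{\pi \in \Pi(\mu, \nu)} \int_{X \times X} d(x, y)^p \, d\pi(x, y) \right)^{1/p},$$
where $\Pi(\mu, \nu)$ is the set of all transport plans (couplings) with marginals $\mu$ and $\nu$.
The case $p = 1$ admits the Kantorovich–Rubinstein dual representation:
$$W_1(\mu, \nu) = \sup \left\{ \int_X f \, d(\mu - \nu) \;:\; \|f\|_{\mathrm{Lip}} \le 1 \right\},$$
where $\|f\|_{\mathrm{Lip}}$ denotes the Lipschitz constant of $f$ (Piccoli et al., 2013).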
The Wasserstein distance is a bona fide metric on $\mathcal{P}_p(X)$, satisfying nonnegativity, symmetry, identity of indiscernibles, and the triangle inequality (Panaretos et al., 2018). Finite $p$-th moments of both measures guarantee that $W_p(\mu, \nu)$ is finite.
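As an illustration of the definition, the following minimal sketch computes $W_p$ between two empirical measures on the real line. It assumes both samples have the same size and uniform weights, in which case the optimal coupling is known to pair the sorted samples. The function name `wasserstein_1d` is a hypothetical helper, not taken from the cited papers.

```python
def wasserstein_1d(xs, ys, p=1.0):
    """p-Wasserstein distance between two empirical measures on the real line.

    Assumption: both samples have the same size n and uniform weights 1/n,
    so the optimal coupling matches the i-th smallest x to the i-th smallest y.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    if p < 1:
        raise ValueError("p must be at least 1")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    # Average transport cost |x - y|^p under the monotone (sorted) coupling.
    cost = sum(abs(x - y) ** p for x, y in zip(xs_sorted, ys_sorted)) / len(xs)
    return cost ** (1.0 / p)


if __name__ == "__main__":
    mu = [0.0, 1.0, 3.0]
    nu = [5.0, 1.0, 2.0]
    print(wasserstein_1d(mu, nu, p=1))
    print(wasserstein_1d(mu, nu, p=2))
```

For these samples the sorted pairs are (0, 1), (1, 2), (3, 5), so the two calls return 4/3 and $\sqrt{2}$ up to floating-point rounding.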
2. Interpretation, Variants, and Duality
Monge and Kantorovich Formulations
The OT formulation seeks the least-cost way of transporting one distribution to another. The Monge problem requires a deterministic transport map $T$ with $T_\# \mu = \nu$ minimizing $\int_X d(x, T(x))^p \, d\mu(x)$, while Kantorovich's relaxation allows for couplings $\pi \in \Pi(\mu, \nu)$ and always achieves a minimum (Piccoli et al., 2013).
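For two uniform point clouds of equal size, every transport map is a permutation, and the Kantorovich minimum is attained at a permutation (a consequence of the Birkhoff–von Neumann theorem), so the Monge and Kantorovich values agree. The sketch below brute-forces the Monge problem in that setting. The name `monge_assignment` is hypothetical, and the approach is only practical for very small clouds.

```python
from itertools import permutations


def monge_assignment(xs, ys, p=2.0):
    """Brute-force Monge problem between two uniform point clouds on the line.

    Each candidate transport map is a permutation sigma sending xs[i] to
    ys[sigma[i]]. The search visits n! maps, so keep n small.
    Returns (distance, best_permutation).
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("point clouds must be non-empty and of equal size")
    n = len(xs)
    best_cost, best_map = None, None
    for sigma in permutations(range(n)):
        # Average cost of moving each xs[i] to its image ys[sigma[i]].
        cost = sum(abs(xs[i] - ys[sigma[i]]) ** p for i in range(n)) / n
        if best_cost is None or cost < best_cost:
            best_cost, best_map = cost, sigma
    return best_cost ** (1.0 / p), best_map


if __name__ == "__main__":
    distance, sigma = monge_assignment([0.0, 1.0, 3.0], [5.0, 1.0, 2.0])
    print(distance, sigma)
```

With $p = 2$ the optimal map found here is the monotone one (0 to 1, 1 to 2, 3 to 5), in line with the one-dimensional theory.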
Benamou–Brenier Dynamical Characterization
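For $p = 2$ on $\mathbb{R}^d$, there is a dynamic fluid-mechanical representation:
$$W_2^2(\mu, \nu) = \inf_{(\mu_t, v_t)} \int_0^1 \int_{\mathbb{R}^d} |v_t(x)|^2 \, d\mu_t(x) \, dt, \qquad \mu_0 = \mu, \ \mu_1 = \nu,$$
subject to the continuity equation
$$\partial_t \mu_t + \nabla \cdot (v_t \mu_t) = 0.$$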
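The dynamic formulation can be checked by hand for a single unit point mass moving from $x_0$ to $x_1$: the constant-speed path has kinetic action $|x_1 - x_0|^2$, and any other time schedule costs more. The sketch below approximates the action by finite differences. The function `kinetic_action` and its `schedule` argument are illustrative assumptions, not part of the cited work.

```python
def kinetic_action(schedule, x0, x1, steps=10_000):
    """Approximate int_0^1 |v_t|^2 dt for a unit point mass moving from x0 to x1.

    `schedule` maps [0, 1] to [0, 1] with schedule(0) = 0 and schedule(1) = 1;
    it gives the fraction of the displacement covered at time t. The velocity
    on each time step is approximated by a finite difference.
    """
    dt = 1.0 / steps
    action = 0.0
    for k in range(steps):
        t0, t1 = k * dt, (k + 1) * dt
        velocity = (schedule(t1) - schedule(t0)) * (x1 - x0) / dt
        action += velocity ** 2 * dt
    return action


if __name__ == "__main__":
    # Constant speed: action equals |x1 - x0|^2 = 9, i.e. W_2^2 between the Diracs.
    print(kinetic_action(lambda t: t, 0.0, 3.0))
    # Accelerating schedule t^2: action is about (4/3) * 9 = 12, strictly larger.
    print(kinetic_action(lambda t: t * t, 0.0, 3.0))
```

The gap between the two values reflects Jensen's inequality: among all schedules with the same endpoints, constant speed minimizes the time integral of the squared velocity.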
The Flat Metric
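For arbitrary (possibly unequal mass) Radon measures, the generalized Wasserstein distance $W_1^{1,1}$ (defined in Section 3) coincides with the flat (bounded-Lipschitz) metric:
$$W_1^{1,1}(\mu, \nu) = \sup \left\{ \int f \, d(\mu - \nu) \;:\; \|f\|_\infty \le 1, \ \|f\|_{\mathrm{Lip}} \le 1 \right\}$$
(Piccoli et al., 2013).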
3. Extensions and Computational Methods
Generalized Wasserstein Distance
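For measures $\mu, \nu$ of possibly differing total mass and parameters $a, b > 0$, $p \ge 1$, the generalized Wasserstein distance is defined as
$$W_p^{a,b}(\mu, \nu) = \inf_{|\tilde\mu| = |\tilde\nu|} \left( a^p \left( |\mu - \tilde\mu| + |\nu - \tilde\nu| \right)^p + b^p \, W_p^p(\tilde\mu, \tilde\nu) \right)^{1/p},$$
where $|\mu - \tilde\mu|$ is the total variation of the "removed" mass, and the infimum is over pairs $(\tilde\mu, \tilde\nu)$ with equal total mass. The parameter $a$ penalizes creation/removal, $b$ the transport, and $p$ controls how the two costs are aggregated (Piccoli et al., 2013).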
Generalized Benamou–Brenier Formula
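A dynamic formulation extends to $W_2^{a,b}$:
$$\left( W_2^{a,b}(\mu, \nu) \right)^2 = \inf_{(\mu_t, v_t, h_t)} \left\{ a^2 \left( \int_0^1 |h_t| \, dt \right)^2 + b^2 \int_0^1 \int |v_t|^2 \, d\mu_t \, dt \right\},$$
where $h_t$ encodes sources/sinks and the continuity equation has a source term: $\partial_t \mu_t + \nabla \cdot (v_t \mu_t) = h_t$. This subsumes pure mass transport and allows for creation/removal (Piccoli et al., 2013).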
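A short worked example may help make the role of the source term concrete. It assumes the generalized action has the form $a^2 \left( \int_0^1 |h_t| \, dt \right)^2 + b^2 \int_0^1 \int |v_t|^2 \, d\mu_t \, dt$ (a reconstruction, not a quotation from the cited paper) and considers the path that removes a measure $\mu$ linearly in time without moving it.

```latex
% Pure-removal path from \mu to the zero measure (illustrative assumption).
\begin{align*}
\mu_t &= (1 - t)\,\mu, \qquad v_t = 0, \qquad h_t = \partial_t \mu_t = -\mu, \\
a^2 \Big( \int_0^1 |h_t| \, dt \Big)^2
  + b^2 \int_0^1 \!\! \int |v_t|^2 \, d\mu_t \, dt
  &= a^2 |\mu|^2 + 0 .
\end{align*}
% Any path from \mu to 0 must remove total mass |\mu|, so \int_0^1 |h_t| dt >= |\mu|
% and the value a^2 |\mu|^2 cannot be improved; hence W_2^{a,b}(\mu, 0) = a |\mu|.
```

Under this assumption the dynamic value $a|\mu|$ agrees with the static cost of removing all of $\mu$, which is what the equivalence between the two formulations predicts in this simple case.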
Existence and Homogeneity
$W_p^{a,b}$ is a metric on the cone of nonnegative Radon measures with finite mass, is homogeneous under rescaling of both measures by a common factor $k \ge 0$, and attains its infimum for each pair of measures (Piccoli et al., 2013).
4. Analytical and Practical Properties
Mass Mismatch and Total Variation
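When $|\mu| = |\nu|$ and $a \to \infty$, $W_p^{a,b}$ reduces to the pure transport cost $b \, W_p$, and when $b \to \infty$, it reduces to the total variation norm $a \, |\mu - \nu|$. For $p = 1$, $a = b = 1$, the equality with the flat metric holds (Piccoli et al., 2013). Explicitly, for $p = 1$, $a = b = 1$,
$$W_1^{1,1}(\mu, \nu) = \inf_{|\tilde\mu| = |\tilde\nu|} \left( |\mu - \tilde\mu| + |\nu - \tilde\nu| + W_1(\tilde\mu, \tilde\nu) \right),$$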
exhibiting the tradeoff between removal/addition and transportation costs.
Connection to Partial Differential Equations
Wasserstein distances and their generalizations are especially relevant for evolution equations such as the continuity equation with source, where one typically needs to compare measures of variable mass. The $W_p^{a,b}$ framework is adapted to these contexts and yields contraction or stability estimates even for solutions that do not preserve total mass (Piccoli et al., 2013).
Limits and Interpolations
$W_p^{a,b}$ provides a continuous interpolation between the total variation ($L^1$) distance (as $b \to \infty$, penalizing all transport) and the classical Wasserstein distance (as $a \to \infty$ for measures of equal mass, when creation/removal becomes prohibitively expensive). This is particularly valuable in applications such as comparing histograms of unequal mass, which are common in imaging and statistical data analysis.
5. Theoretical and Algorithmic Framework
Fenchel–Legendre Duality
The proof of the equivalence between $W_1^{1,1}$ and the flat metric relies on convex analysis and Fenchel–Legendre duality: the sum of convex indicators for the constraints $\|f\|_\infty \le 1$ and $\|f\|_{\mathrm{Lip}} \le 1$ leads, via a theorem of Rockafellar, to a dual representation that exactly matches the primal definition (Piccoli et al., 2013).
Algorithmic Considerations
For $p = 2$, the dynamic Benamou–Brenier formulation yields an explicit minimization over velocity fields and source terms. The infimum is realized, and the action can be constructed explicitly through "sample-and-hold" schemes that alternate between mass removal, transport, and creation in small time intervals. Convexity and stability under flow are key technical lemmas supporting these constructions.
Examples of Computation
For measures concentrated on points with different masses, the optimal decomposition may entail only mass removal/addition, only transport, or a mixture, determined by the ratio $a/b$. For $p = 1$, if the distance between the two points is at least $2a/b$, it is optimal to remove/add all the mass; otherwise, it pays to transport the common part of the mass and remove/add only the excess.
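The two-point case can be sketched in a few lines for $p = 1$. The sketch assumes the measures are $m\,\delta_x$ and $n\,\delta_y$ on the real line, that the infimum may be restricted to sub-measures $t\,\delta_x$ and $t\,\delta_y$ with $0 \le t \le \min(m, n)$, and that $W_1(t\,\delta_x, t\,\delta_y) = t\,|x - y|$. The function name `generalized_w1_diracs` is hypothetical.

```python
def generalized_w1_diracs(m, x, n, y, a=1.0, b=1.0):
    """Generalized distance W_1^{a,b} between m*delta_x and n*delta_y (p = 1).

    Keeping mass t in [0, min(m, n)] at both points costs
        a * (m - t) + a * (n - t) + b * t * |x - y|,
    which is linear in t, so the optimum sits at an endpoint of the interval.
    Returns (distance, transported_mass).
    """
    if m < 0 or n < 0:
        raise ValueError("masses must be nonnegative")
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    dist = abs(x - y)
    # Transporting one unit costs b * dist; removing it and recreating it costs 2 * a.
    transported = min(m, n) if b * dist < 2 * a else 0.0
    cost = a * (m - transported) + a * (n - transported) + b * transported * dist
    return cost, transported


if __name__ == "__main__":
    print(generalized_w1_diracs(1.0, 0.0, 1.0, 1.0))  # close points: transport
    print(generalized_w1_diracs(1.0, 0.0, 1.0, 5.0))  # distant points: remove/add
    print(generalized_w1_diracs(2.0, 0.0, 1.0, 1.0))  # unequal masses: mixture
```

For unit masses and $a = b = 1$ the returned distance equals $\min(|x - y|, 2)$, which is the value of the flat metric between two Dirac masses.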
6. Applications and Implications
Imaging, Data Analysis, and Beyond
The generalized Wasserstein metric allows meaningful comparison of data distributions (histograms, point clouds) with mass fluctuations. This is essential in image processing and vision, where illumination or occlusion can alter total mass, and in statistical analysis of data sets with missing data or over-sampling.
PDE Theory and Contractivity
$W_p^{a,b}$ has enabled new existence and stability results for evolution equations with source terms, accommodating solutions where total mass is not preserved, and guaranteeing meaningful contractivity in this extended framework (Piccoli et al., 2013).
Hierarchical Relation to $W_p$
$W_p^{a,b}$ recovers $W_p$ and the total variation metric in limits and thus underlies a unifying theory for purely geometric transport and purely mass error terms.
References:
- Piccoli, B. & Rossi, F. "On properties of the Generalized Wasserstein distance" (Piccoli et al., 2013)
- Benamou, J.-D. & Brenier, Y. "A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem"
- Villani, C. "Optimal Transport: Old and New," Springer
This summary covers the structure, properties, dualities, analytical formulations, and key application domains of the classical and generalized Wasserstein distances as presented in (Piccoli et al., 2013).