Generalized Wasserstein Distance
- Generalized Wasserstein Distance is a metric extension that relaxes the constraints of classical optimal transport by allowing mass variation and accommodating complex data types.
- It introduces costs for mass creation and removal while incorporating dynamic formulations such as the generalized Benamou–Brenier scheme to model transport with source terms.
- Efficient computational methods like linear programming, sliced Wasserstein, and kernel approaches are developed to enhance modeling fidelity and scalability in applications.
A generalized Wasserstein distance is any distance-like functional that extends the classical optimal transport (OT) distance to broader settings: measures of unequal mass, measures on complex structures, sets of measures, mixed or structured data, noncommutative probability, or problems in which geometric or dual constraints are relaxed. Its development is motivated by foundational, computational, and modeling needs in analysis, machine learning, and physics. The term covers a range of metrics and pseudo-metrics, each with its own analytic, geometric, and algorithmic framework.
1. Classical and Generalized Wasserstein Distance: Definitions and Core Properties
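The classical $p$-Wasserstein distance between measures $\mu$ and $\nu$ of equal mass on $\mathbb{R}^d$ is
$$W_p(\mu,\nu) = \left( \inf_{\pi \in \Pi(\mu,\nu)} \int_{\mathbb{R}^d \times \mathbb{R}^d} |x-y|^p \, d\pi(x,y) \right)^{1/p},$$
where $\Pi(\mu,\nu)$ is the set of transport plans with marginals $\mu$ and $\nu$.
A major limitation is the requirement $|\mu| = |\nu|$. The generalized Wasserstein distance $W_p^{a,b}$ resolves this by introducing a cost $a$ for mass addition/removal and a cost $b$ for transport:
$$W_p^{a,b}(\mu,\nu) = \inf_{\tilde\mu \le \mu,\ \tilde\nu \le \nu,\ |\tilde\mu| = |\tilde\nu|} \Big( a\,|\mu - \tilde\mu| + a\,|\nu - \tilde\nu| + b\,W_p(\tilde\mu,\tilde\nu) \Big),$$
where $\tilde\mu \le \mu$ means $\tilde\mu$ is absolutely continuous with respect to $\mu$, with density in $[0,1]$. Intuitively, an optimal plan may erase a portion of each measure at cost $a$ per unit mass, then transport the matched mass at cost $b$ (Piccoli et al., 2013, Piccoli et al., 2012).
Key metric properties include:
- Non-negativity, symmetry, triangle inequality
- Scaling: $W_p^{a,b}(k\mu, k\nu) \le k\,W_p^{a,b}(\mu,\nu)$ for $k \ge 1$
- Completeness: the metric space $(\mathcal{M}, W_p^{a,b})$ is complete, where $\mathcal{M}$ denotes the finite-mass Borel measures on $\mathbb{R}^d$
- Characterization of convergence: $W_p^{a,b}(\mu_n,\mu) \to 0$ iff $\mu_n$ converges weakly to $\mu$ and $\{\mu_n\}$ is tight
In the special case $p = 1$, $a = b = 1$, the dual representation recovers the flat (bounded-Lipschitz) metric:
$$W_1^{1,1}(\mu,\nu) = \sup\left\{ \int \varphi \, d(\mu - \nu) \ :\ \|\varphi\|_\infty \le 1,\ \mathrm{Lip}(\varphi) \le 1 \right\}$$
(Piccoli et al., 2013).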
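For $p = 1$ and finitely supported measures, the generalized distance reduces to a small linear program, which makes the trade-off between deleting and moving mass concrete. The sketch below is an illustration under stated assumptions, not the cited authors' implementation: the function name `generalized_w1` is hypothetical, the measures live on the real line, and the unknown is a partial plan $\pi_{ij} \ge 0$ whose row sums are at most $\mu_i$ and whose column sums are at most $\nu_j$. Unmatched mass costs $a$ per unit, and transported mass costs $b\,|x_i - y_j|$ per unit.

```python
import numpy as np
from scipy.optimize import linprog


def generalized_w1(x, mu, y, nu, a=1.0, b=1.0):
    """Generalized W_1^{a,b} between sum_i mu[i]*delta_{x[i]} and sum_j nu[j]*delta_{y[j]}.

    Illustrative helper (hypothetical name); assumes 1-D supports and p = 1.
    """
    x, mu = np.asarray(x, float), np.asarray(mu, float)
    y, nu = np.asarray(y, float), np.asarray(nu, float)
    n, m = len(x), len(y)
    dist = np.abs(x[:, None] - y[None, :])  # ground cost |x_i - y_j|

    # Total cost = a*(|mu| - sum(pi)) + a*(|nu| - sum(pi)) + b*sum(pi*dist)
    #            = a*(|mu| + |nu|) + sum(pi * (b*dist - 2a)).
    c = (b * dist - 2.0 * a).ravel()  # pi is flattened row-major: index i*m + j

    # Sub-marginal constraints: row sums <= mu, column sums <= nu.
    row_sums = np.kron(np.eye(n), np.ones((1, m)))
    col_sums = np.kron(np.ones((1, n)), np.eye(m))
    res = linprog(
        c,
        A_ub=np.vstack([row_sums, col_sums]),
        b_ub=np.concatenate([mu, nu]),
        bounds=(0, None),
        method="highs",
    )
    if not res.success:
        raise RuntimeError(res.message)
    return a * (mu.sum() + nu.sum()) + res.fun
```

With $a = b = 1$, two unit masses at distance 3 are cheaper to delete (cost 2) than to move (cost 3), while two unit masses at distance 0.5 are cheaper to move. A pair $(i, j)$ is worth matching only when $b\,|x_i - y_j| < 2a$, which is visible directly in the sign of the objective coefficients.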
2. Dynamic Formulations: Generalized Benamou–Brenier and Source Terms
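The classical Benamou–Brenier formula connects $W_2$ to minimal kinetic energy subject to the continuity equation. For generalized distances, the dynamic interpretation is extended to include source terms, i.e., mass creation and annihilation:
$$\partial_t \mu_t + \nabla \cdot (v_t \mu_t) = h_t,$$
with an extended action functional
$$\mathcal{B}^{a,b}[\mu, v, h] = a^2 \int_0^1 \left( \int_{\mathbb{R}^d} d|h_t| \right)^2 dt + b^2 \int_0^1 \int_{\mathbb{R}^d} |v_t|^2 \, d\mu_t \, dt,$$
and the minimal action over paths from $\mu_0$ to $\mu_1$ coincides with the squared distance $W_2^{a,b}(\mu_0,\mu_1)^2$, where for this result the $p = 2$ distance is taken in the form $\left( \inf \; a^2 (|\mu_0 - \tilde\mu_0| + |\mu_1 - \tilde\mu_1|)^2 + b^2 W_2(\tilde\mu_0,\tilde\mu_1)^2 \right)^{1/2}$ (Piccoli et al., 2013).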
Unbalanced OT generalizations for matrix-valued measures (e.g., the weighted Wasserstein–Bures distance) fit this framework via matrix-valued continuity equations (for measures taking values in positive semidefinite matrices) and convex action functionals, further generalizing OT to quantum/probabilistic matrix flows (Li et al., 2020).
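As a worked illustration of the dynamic picture, consider two Dirac masses $\mu_0 = m\,\delta_{x_0}$ and $\mu_1 = m\,\delta_{x_1}$ with $x_0 \ne x_1$. The computation below is a sketch that assumes the action is $a^2$ times the time integral of the squared total source mass plus $b^2$ times the kinetic energy; that form is a reconstruction rather than a quotation from the cited paper. Two admissible paths give two upper bounds on the minimal action.

Pure transport, with $\mu_t = m\,\delta_{x_0 + t(x_1 - x_0)}$, $v_t = x_1 - x_0$, $h_t = 0$:
$$\text{action} = b^2 \int_0^1 m\,|x_1 - x_0|^2 \, dt = b^2\, m\, |x_1 - x_0|^2 .$$

Pure source, with $\mu_t = (1-t)\,m\,\delta_{x_0} + t\,m\,\delta_{x_1}$, $v_t = 0$, $h_t = m\,(\delta_{x_1} - \delta_{x_0})$, so that $\int d|h_t| = 2m$:
$$\text{action} = a^2 \int_0^1 (2m)^2 \, dt = 4\,a^2 m^2 .$$

Comparing the two, pure transport is the cheaper of these two paths exactly when $|x_1 - x_0| < 2a\sqrt{m}/b$. Mixed paths that move part of the mass while creating and destroying the rest can do better than either, so both values are only upper bounds on the minimal action.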
3. Extensions: Measures of Sets, Networks, and Complex Data Structures
For robust applications and uncertainty quantification, generalized Wasserstein metrics for sets of measures, mixed-type data, or structured outputs have been developed.
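- Wasserstein Distance for Sets: For $\mathcal{P}_1, \mathcal{P}_2$ (weakly compact, convex subsets of the space of probability measures), define
$$\mathcal{W}_p(\mathcal{P}_1, \mathcal{P}_2) = \max\left\{ \sup_{\mu \in \mathcal{P}_1} \inf_{\nu \in \mathcal{P}_2} W_p(\mu,\nu),\ \sup_{\nu \in \mathcal{P}_2} \inf_{\mu \in \mathcal{P}_1} W_p(\mu,\nu) \right\}.$$
This Hausdorff-type distance metrizes weak convergence of sublinear expectations and is relevant to robust probability and financial mathematics (Li et al., 2015).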
- Hybrid Norms for Mixed (Continuous-Categorical) Spaces: For random vectors with continuous and categorical components, a hybrid norm quantifies discrepancies, and OT is solved over joint couplings in the product space (Xia et al., 7 Jul 2025).
- $Z$-Gromov–Wasserstein ($Z$-GW) Distance: For "networks" with a general $Z$-valued kernel (e.g., graphs, shape graphs), the $Z$-GW framework yields a metric that subsumes many variants of GW by choice of $Z$, extending the notion of pairwise-structured optimal transport (Bauer et al., 2024).
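The Hausdorff-type construction for sets of measures can be sketched numerically for finite families of one-dimensional empirical measures. This is a simplified illustration only: finite families are not convex, so it approximates the set distance by working with a few representative members, it fixes $p = 1$, and the function name `set_wasserstein` is hypothetical. It relies on `scipy.stats.wasserstein_distance`, which computes $W_1$ between 1-D samples.

```python
import numpy as np
from scipy.stats import wasserstein_distance


def set_wasserstein(family_1, family_2):
    """Hausdorff-type W_1 distance between two finite families of 1-D samples.

    Each family is a list of 1-D sample arrays; each array is one empirical measure.
    """
    # Pairwise W_1 between every member of family_1 and every member of family_2.
    d = np.array(
        [[wasserstein_distance(u, v) for v in family_2] for u in family_1]
    )
    forward = d.min(axis=1).max()   # sup over family_1 of inf over family_2
    backward = d.min(axis=0).max()  # sup over family_2 of inf over family_1
    return max(forward, backward)
```

The two directed terms generally differ: if the first family is contained in the second, the forward term is zero and the distance is driven entirely by the members of the second family that have no close counterpart in the first.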
4. Algorithmic and Computational Approaches
Different generalizations necessitate advanced solvers:
- Linear Programming (LP): After discretization, unbalanced OT variants (e.g., Piccoli–Rossi, Figalli–Gigli, Hellinger–Kantorovich) are formulated as finite-dimensional LPs or convex programs that can be solved with simplex, interior-point, or dual methods, for which convergence rates are known (Briani et al., 2024).
- Sliced and Generalized Sliced Wasserstein: These methods reduce high-dimensional OT to repeated 1D problems along projections (linear or nonlinear). The generalized sliced Wasserstein (GSW) distance applies the Radon or generalized Radon transform $\mathcal{G}$:
$$GSW_p(\mu,\nu) = \left( \int_{\Omega_\theta} W_p^p\big( \mathcal{G}\mu(\cdot,\theta),\ \mathcal{G}\nu(\cdot,\theta) \big) \, d\theta \right)^{1/p},$$
where $\theta \in \Omega_\theta$ parameterizes the family of projections, possibly nonlinear. Max-GSW takes the supremum over $\theta$ instead of the average, facilitating data-adaptive slicing (Kolouri et al., 2019). Efficient approximation exploits random projections, moment methods, and high-dimensional concentration (Le et al., 2022).
- Kernel Wasserstein: OT computations in RKHS via kernel-lifting, with closed-form solutions for Gaussian measures in Hilbert space, and empirical Gram matrix algorithms; used for nonlinear data in imaging and structured domains (Oh et al., 2019).
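The slicing idea can be sketched as a Monte Carlo estimate: draw random parameters $\theta$, push both samples through a slice function, and average (or maximize) the resulting 1D costs. The code below is a minimal sketch under stated assumptions, not the reference implementation of the cited papers. It assumes both point clouds have the same number of samples so that sorting solves each 1D problem, and the names `sliced_wasserstein`, `linear_slice`, and the `slice_fn` hook are hypothetical. Taking the maximum over random draws gives only a crude lower bound on a max-sliced distance, which is normally obtained by optimizing over $\theta$.

```python
import numpy as np


def wasserstein_1d(u, v, p=2):
    """W_p^p between two equal-size 1-D empirical measures (sorting solves 1-D OT)."""
    return np.mean(np.abs(np.sort(u) - np.sort(v)) ** p)


def linear_slice(x, theta):
    """Classical Radon slice: project the points onto the direction theta."""
    return x @ theta


def sliced_wasserstein(x, y, n_proj=200, p=2, slice_fn=linear_slice,
                       reduce="mean", seed=0):
    """Monte Carlo sliced distance between point clouds x, y of shape (n, d).

    slice_fn(x, theta) maps the points to 1-D values; a nonlinear slice_fn
    gives a generalized-sliced variant. reduce is "mean" or "max".
    """
    rng = np.random.default_rng(seed)
    thetas = rng.normal(size=(n_proj, x.shape[1]))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)  # uniform on the sphere
    costs = np.array(
        [wasserstein_1d(slice_fn(x, t), slice_fn(y, t), p) for t in thetas]
    )
    agg = costs.mean() if reduce == "mean" else costs.max()
    return agg ** (1.0 / p)
```

A nonlinear slice can be passed in directly, for example `slice_fn=lambda x, theta: (x @ theta) ** 3`, an odd polynomial used here purely for illustration. For a pure translation `y = x + s` with linear slices, every per-slice cost equals $|\theta \cdot s|^p$, so the estimate never exceeds $|s|$.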
5. Theoretical Developments: Dualities, Metric Geometry, and Generalization
- Duality: For $W_1^{1,1}$, the dual problem recovers the flat metric, with constraints expressed via bounded Lipschitz test functions. More generally, dualities such as the Sobolev dual relax the Lipschitz requirement to integral constraints, broadening admissible test functions while maintaining metric and optimization properties, with implications for GANs and learning theory (Xu et al., 2020).
- Geodesic and Cone Structures: Generalized metrics often retain or extend geodesic, contractibility, or conic properties of classical OT. The weighted Wasserstein–Bures space for matrix measures, for instance, is a metric cone over the unit-trace submanifold, reflecting non-linear and non-uniform growth of distance with additive mixture (Li et al., 2020).
- Metrization of Topologies: Many generalizations preserve the ability to metrize weak convergence (possibly together with tightness or moment conditions), ensuring that convergence in the metric matches probabilistic convergence of sequences—crucial for applications in analysis and stochastic modeling (Piccoli et al., 2012, Li et al., 2015, Afham et al., 2024).
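The bounded-Lipschitz dual can be checked numerically on a finite support. The sketch below maximizes $\sum_i \varphi_i (\mu_i - \nu_i)$ over test values $\varphi_i$ subject to $|\varphi_i| \le 1$ and $|\varphi_i - \varphi_j| \le |x_i - x_j|$. It is an illustration under stated assumptions: both measures are given as weight vectors on a common set of 1-D points, the function name `flat_metric_dual` is hypothetical, and restricting the constraints to the support points is assumed sufficient because a function that is bounded by 1 and 1-Lipschitz on the support extends to the whole line with the same bounds.

```python
import numpy as np
from scipy.optimize import linprog


def flat_metric_dual(points, mu, nu):
    """Flat (bounded-Lipschitz) distance between two weight vectors on common 1-D points."""
    x = np.asarray(points, float)
    diff = np.asarray(mu, float) - np.asarray(nu, float)
    n = len(x)

    # Lipschitz constraints phi_i - phi_j <= |x_i - x_j| for all ordered pairs.
    rows, rhs = [], []
    for i in range(n):
        for j in range(n):
            if i != j:
                r = np.zeros(n)
                r[i], r[j] = 1.0, -1.0
                rows.append(r)
                rhs.append(abs(x[i] - x[j]))

    # linprog minimizes, so negate the objective; bounds enforce |phi_i| <= 1.
    res = linprog(
        -diff,
        A_ub=np.array(rows) if rows else None,
        b_ub=np.array(rhs) if rows else None,
        bounds=(-1, 1),
        method="highs",
    )
    if not res.success:
        raise RuntimeError(res.message)
    return -res.fun
```

For unit masses at 0 and 3 the optimal test function takes the values $+1$ and $-1$ and the distance is 2, which agrees with the primal picture in which deleting both masses beats moving one of them.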
6. Special Forms: Generalized Bures, Earth-Mover, and Pivot-based Distances
- Generalized Bures–Wasserstein: On the manifold of positive-definite matrices, the Bures–Wasserstein structure is generalized via Riemannian geometry, defining distances via linearization (Riemannian log) at varying basepoints, and inducing a family of generalized fidelities. These recover and unify Uhlmann, Holevo, and Matsumoto fidelities in quantum information (Afham et al., 2024).
- Generalized Earth Mover's Distance (EMD): For multiple distributions (e.g., measures on the simplex), the "generalized EMD" computes minimal transport cost over joint couplings such that each marginal is respected, with cost per sample vector given by a combinatorial dispersion formula. Explicit formulas for the expectation and generalizations of the Cayley–Menger determinant relate EMD to multi-variate barycentric geometry (Erickson, 2024).
- Generalized Geodesics and Sliced Proxies: Generalized OT using pivot (base) measures creates Wasserstein geodesics in "hybrid" or non-Euclidean spaces. Sliced-Wasserstein generalized geodesic (SWGG) frameworks define distances as the minimum (over projections) of 1D OT costs, providing explicit transport plans and proxies for computational efficiency (Mahey et al., 2023).
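For orientation, the classical Bures–Wasserstein distance that the generalized Bures constructions start from has the closed form $d(A,B)^2 = \mathrm{tr}\,A + \mathrm{tr}\,B - 2\,\mathrm{tr}\big(A^{1/2} B A^{1/2}\big)^{1/2}$, which is also the 2-Wasserstein distance between centered Gaussians with covariances $A$ and $B$. The sketch below implements only this classical case, not the generalized family of the cited work. It assumes real symmetric positive semidefinite inputs, and the helper names `psd_sqrt` and `bures_wasserstein` are hypothetical.

```python
import numpy as np


def psd_sqrt(a):
    """Symmetric square root of a real PSD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    # Clip tiny negative eigenvalues caused by round-off before taking roots.
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T


def bures_wasserstein(a, b):
    """Classical Bures-Wasserstein distance between real symmetric PSD matrices."""
    root_a = psd_sqrt(a)
    inner = root_a @ b @ root_a
    inner = 0.5 * (inner + inner.T)  # symmetrize against round-off
    d2 = np.trace(a) + np.trace(b) - 2.0 * np.trace(psd_sqrt(inner))
    return np.sqrt(max(d2, 0.0))
```

For commuting (e.g., diagonal) matrices the formula reduces to the Euclidean distance between the vectors of square-rooted eigenvalues, which gives a quick sanity check. The quantity $\big(\mathrm{tr}(A^{1/2} B A^{1/2})^{1/2}\big)^2$ appearing inside is the Uhlmann fidelity.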
7. Applications and Empirical Performance
Generalized Wasserstein distances underpin sensitivity analysis for dynamical models, random field reconstruction, shape analysis, robust deep learning (e.g., GANs), graph matching, quantum information, and uncertainty quantification in hybrid data settings.
- Traffic Modeling: Figalli–Gigli, Piccoli–Rossi, and Hellinger–Kantorovich distances capture and quantify boundary-driven or model-induced changes in macroscopic densities in traffic flow and other PDE-based models (Briani et al., 2024).
- Random Field Models and Mixed Data: A generalized Wasserstein distance with a hybrid norm enables direct training of stochastic neural networks that conform to mixed-type structure, and is reported to outperform non-OT baselines for uncertainty quantification and dynamical system emulation (Xia et al., 7 Jul 2025).
- Machine Learning and Imaging: Sliced, GSW, and their fast deterministic counterparts enable scalable OT computations in generative modeling, autoencoders, and artifact/structure detection (Kolouri et al., 2019, Le et al., 2022, Oh et al., 2019).
The cited studies report improved computational efficiency or modeling fidelity relative to classical OT, especially in high-dimensional or structurally complex domains.
Table: Selected Generalized Wasserstein Distances
| Name/Type | Key Definition/Feature | Reference |
|---|---|---|
| $W_p^{a,b}$ | Allows for mass addition/removal at cost $a$, transport at cost $b$ | (Piccoli et al., 2013, Piccoli et al., 2012) |
| GSW, max-GSW | Integrates (or maximizes) $1$D Wasserstein distances over generalized Radon projections | (Kolouri et al., 2019) |
| $Z$-GW | Distance between $Z$-valued kernel networks | (Bauer et al., 2024) |
| Hybrid | OT for mixed continuous-categorical variables | (Xia et al., 7 Jul 2025) |
| Generalized Bures | Riemannian-geometric metric/fidelity for quantum states | (Afham et al., 2024) |
| Kernel Wasserstein | OT in RKHS, leveraging kernel trick | (Oh et al., 2019) |
| SWGG (min-SWGG) | Sliced proxy for generalized Wasserstein geodesics via pivot measure | (Mahey et al., 2023) |
| (Sobolev) Dualities | OT duality with integral (Sobolev) constraints on test functions | (Xu et al., 2020) |
Summary and Outlook
Generalized Wasserstein distances form a broad theoretical and algorithmic framework extending optimal transport to accommodate practical requirements—such as unbalanced mass, structured and mixed data, quantum and noncommutative analysis, robust statistics, and scalable high-dimensional computation. Fundamental properties (metricity, duality, geodesic structure, connection with weak convergence) are preserved or appropriately extended in each setting, underpinning diverse modern applications and continuing theoretical innovations.