
Generalized Wasserstein Distance

Updated 2 March 2026
  • Generalized Wasserstein Distance is a metric extension that relaxes the constraints of classical optimal transport by allowing mass variation and accommodating complex data types.
  • It introduces costs for mass creation and removal while incorporating dynamic formulations such as the generalized Benamou–Brenier scheme to model transport with source terms.
  • Efficient computational methods like linear programming, sliced Wasserstein, and kernel approaches are developed to enhance modeling fidelity and scalability in applications.

A generalized Wasserstein distance is any distance-like functional that extends the classical optimal transport (OT) distance to broader settings, such as measures of unequal mass, measures on complex structures, sets of measures, mixed or structured data, and noncommutative probability, or that relaxes its geometric or dual constraints. Its development is motivated by foundational, computational, and modeling needs in analysis, machine learning, physics, and beyond. The term encompasses a range of metrics and pseudo-metrics, each supported by rigorous analytic, geometric, and algorithmic frameworks.

1. Classical and Generalized Wasserstein Distance: Definitions and Core Properties

The classical $p$-Wasserstein distance between finite positive measures $\tilde\mu, \tilde\nu$ of equal mass $m$ on $\mathbb R^d$ (probability measures when $m = 1$) is

$$W_p(\tilde\mu,\tilde\nu) = \left( \inf_{\pi\in\Pi(\tilde\mu,\tilde\nu)} \int_{\mathbb R^d\times\mathbb R^d} |x-y|^p\, d\pi(x,y) \right)^{1/p}$$

where $\Pi(\tilde\mu, \tilde\nu)$ is the set of transport plans with marginals $\tilde\mu, \tilde\nu$.

A major limitation is the requirement $|\tilde\mu| = |\tilde\nu|$. The generalized Wasserstein distance $W_p^{a,b}$ resolves this by introducing a cost $a$ for mass addition/removal and a cost $b$ for transport:

$$W_p^{a,b}(\mu,\nu) = \inf_{\substack{\tilde\mu\le\mu,\;\tilde\nu\le\nu \\ |\tilde\mu| = |\tilde\nu|}} \left( a^p \left(|\mu-\tilde\mu| + |\nu-\tilde\nu|\right)^p + b^p W_p(\tilde\mu,\tilde\nu)^p \right)^{1/p}$$

where $\tilde\mu\leq\mu$ means $\tilde\mu$ is absolutely continuous with respect to $\mu$, with density in $[0, 1]$. Intuitively, an optimal plan may erase a portion of each measure at cost $a$, then transport the matched mass at cost $b$ (Piccoli et al., 2013, Piccoli et al., 2012).
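To make the trade-off between removal and transport concrete, here is a minimal sketch for the simplest possible case: two weighted Dirac masses on the real line. The function name `generalized_wp_diracs` and the grid search are illustrative choices, not taken from the cited papers. The sketch assumes the unnormalized convention of the definition above, so that two Diracs of common mass $t$ at $x$ and $y$ satisfy $W_p(\tilde\mu,\tilde\nu)^p = t\,|x-y|^p$.

```python
def generalized_wp_diracs(x, y, m_mu, m_nu, a, b, p, steps=10_000):
    """Approximate W_p^{a,b} between m_mu*delta_x and m_nu*delta_y on the real line.

    Hypothetical helper for illustration. Admissible sub-measures are
    t*delta_x and t*delta_y with 0 <= t <= min(m_mu, m_nu), so the infimum
    reduces to a one-dimensional search over the matched mass t.
    """
    t_max = min(m_mu, m_nu)
    best = float("inf")
    for i in range(steps + 1):
        t = t_max * i / steps                  # matched (transported) mass
        removed = (m_mu - t) + (m_nu - t)      # |mu - mu~| + |nu - nu~|
        transport_p = t * abs(x - y) ** p      # W_p(mu~, nu~)^p for two Diracs of mass t
        cost = (a**p * removed**p + b**p * transport_p) ** (1.0 / p)
        best = min(best, cost)
    return best


if __name__ == "__main__":
    # Nearby points: transporting is cheaper than removing and re-creating.
    print(generalized_wp_diracs(0.0, 0.5, 1.0, 1.0, a=1.0, b=1.0, p=1))
    # Distant points: removing both masses is cheaper than transporting.
    print(generalized_wp_diracs(0.0, 3.0, 1.0, 1.0, a=1.0, b=1.0, p=1))
```

For $p = 1$ and unit masses the cost is linear in $t$, so the minimum sits at an endpoint and equals $\min(2a,\, b\,|x-y|)$. For $p > 1$ the optimum can be a mixed strategy that removes part of the mass and transports the rest.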

Key metric properties include:

  • Non-negativity, symmetry, triangle inequality
  • Scaling: $W_p^{a,b}(c\mu, c\nu) = c^{1/p} W_p^{a,b}(\mu, \nu)$
  • Completeness: the metric space $(\mathcal M(\mathbb R^d), W_p^{a,b})$ is complete
  • Characterization of convergence: $W_p^{a,b}(\mu_n, \mu) \to 0$ iff $\mu_n$ converges weakly to $\mu$ and is tight

In the special case $(a,b,p)=(1,1,1)$, the dual representation recovers the flat (bounded-Lipschitz) metric:

$$W_1^{1,1}(\mu,\nu) = \sup \left\{ \int \varphi\,d(\mu-\nu): \|\varphi\|_\infty \leq 1,\ |\varphi|_{\text{Lip}} \leq 1 \right\}$$

(Piccoli et al., 2013).
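As a sanity check of this duality, the following worked example compares the primal and dual values for two unit Dirac masses $\delta_x, \delta_y$. It is an illustrative calculation, not taken from the cited papers, and it uses the unnormalized convention $W_1(t\delta_x, t\delta_y) = t\,|x-y|$.

```latex
\begin{aligned}
\text{Primal: } & W_1^{1,1}(\delta_x,\delta_y)
   = \min_{t\in[0,1]} \big( 2(1-t) + t\,|x-y| \big) = \min(2,\ |x-y|). \\
\text{Dual: } & \varphi(x)-\varphi(y)
   \le \min\big( 2\|\varphi\|_\infty,\ |\varphi|_{\mathrm{Lip}}\,|x-y| \big)
   \le \min(2,\ |x-y|), \\
 & \text{with equality for } \varphi(z) = \max\big(-r,\ r-|z-x|\big),
   \quad r = \tfrac12 \min(2,\ |x-y|).
\end{aligned}
```

Both sides agree: for nearby points the Lipschitz constraint is active and the value is $|x-y|$, while for distant points the sup-norm constraint is active and the value saturates at $2$.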

2. Dynamic Formulations: Generalized Benamou–Brenier and Source Terms

The classical Benamou–Brenier formula connects $W_2$ to minimal kinetic energy subject to the continuity equation. For generalized distances, the dynamic interpretation is extended to include source terms, i.e., mass creation and annihilation:

$$\partial_t \mu_t + \nabla\cdot(v_t \mu_t) = h_t,\quad \mu_0 = \mu,\; \mu_1 = \nu$$

with an extended action functional

$$\mathcal{B}^{a,b}[\mu, v, h] = a^2\left(\int_0^1\!\int_{\mathbb R^d} d|h_t|\,dt\right)^2 + b^2 \int_0^1\!\int_{\mathbb R^d} |v_t(x)|^2\, d\mu_t(x)\, dt$$

and the minimal action coincides with $W_2^{a,b}(\mu, \nu)^2$ (Piccoli et al., 2013).
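To illustrate how the two terms of the action compete, consider $\mu = \delta_x$ and $\nu = \delta_y$ with $x \neq y$. The two paths below are admissible candidates chosen for illustration (they are not claimed to be optimal), so each gives an upper bound on the minimal action.

```latex
\begin{aligned}
\text{Pure transport: } & \mu_t = \delta_{x+t(y-x)},\quad v_t \equiv y-x,\quad h_t \equiv 0
   &&\Rightarrow\quad \mathcal{B}^{a,b} = b^2\,|x-y|^2, \\
\text{Pure source: } & \mu_t = (1-t)\,\delta_x + t\,\delta_y,\quad v_t \equiv 0,\quad h_t \equiv \delta_y - \delta_x
   &&\Rightarrow\quad \mathcal{B}^{a,b} = a^2 \Big( \int_0^1 2\,dt \Big)^2 = 4a^2.
\end{aligned}
```

Hence $W_2^{a,b}(\delta_x,\delta_y)^2 \le \min(4a^2,\ b^2|x-y|^2)$. These two values match the static formula with all mass matched and with no mass matched, respectively. An optimal path may combine both mechanisms and do strictly better.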

Unbalanced OT generalizations for matrix-valued measures (e.g., the weighted Wasserstein–Bures distance) fit this framework via matrix-valued continuity equations (e.g., for measures in $\mathcal M(\Omega, S_+^n)$) and convex action functionals, further generalizing OT to quantum/probabilistic matrix flows (Li et al., 2020).

3. Extensions: Measures of Sets, Networks, and Complex Data Structures

For robust applications and uncertainty quantification, generalized Wasserstein metrics for sets of measures, mixed-type data, or structured outputs have been developed.

  • Wasserstein Distance for Sets: For $\mathcal{P}_1, \mathcal{P}_2$ (weakly compact, convex subsets of $\mathcal{P}(\Omega)$), define

$$W_p(\mathcal{P}_1, \mathcal{P}_2) = \max \left\{ \sup_{\mu\in\mathcal{P}_1} \inf_{\nu\in\mathcal{P}_2} W_p(\mu, \nu),\; \sup_{\nu\in\mathcal{P}_2} \inf_{\mu\in\mathcal{P}_1} W_p(\mu, \nu) \right\}$$

This Hausdorff-type distance metrizes weak convergence of sublinear expectations and is relevant to robust probability and financial mathematics (Li et al., 2015).

  • Hybrid Norms for Mixed (Continuous-Categorical) Spaces: For random vectors $y\in \mathbb{R}^{d_1} \times S_{d-d_1}$ with continuous and categorical components, a hybrid norm quantifies discrepancies, and OT is solved over joint couplings in the product space (Xia et al., 7 Jul 2025).
  • $Z$-Gromov–Wasserstein ($Z$-GW) Distance: For "networks" with a general $Z$-valued kernel (e.g., graphs, shape graphs), the $Z$-GW framework yields a metric that subsumes many variants of GW by choice of $Z$, extending the notion of pairwise-structured optimal transport (Bauer et al., 2024).
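The Hausdorff-type distance between sets of measures from the first item can be sketched in a few lines. This is a deliberately simplified illustration: it replaces the weakly compact convex sets of the definition with finite families of 1D empirical measures, so the suprema and infima become plain `max` and `min`, and it assumes all samples have the same size with equal weights so that 1D $W_p$ reduces to sorting. The function names are hypothetical.

```python
def wasserstein_1d(xs, ys, p=1):
    """W_p between two 1D empirical measures with equally many, equally weighted atoms."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    # In 1D the optimal coupling matches sorted samples in order.
    pairs = zip(sorted(xs), sorted(ys))
    return (sum(abs(u - v) ** p for u, v in pairs) / len(xs)) ** (1.0 / p)


def hausdorff_wasserstein(family_1, family_2, p=1):
    """Hausdorff-type distance between two finite families of 1D empirical measures."""

    def directed(family_a, family_b):
        # sup over family_a of the distance to the closest member of family_b
        return max(
            min(wasserstein_1d(mu, nu, p) for nu in family_b) for mu in family_a
        )

    return max(directed(family_1, family_2), directed(family_2, family_1))


if __name__ == "__main__":
    family_1 = [[0.0, 1.0]]
    family_2 = [[0.0, 1.0], [2.0, 3.0]]
    print(hausdorff_wasserstein(family_1, family_2))
```

In the usage example the first directed term is zero, because the single measure in `family_1` also appears in `family_2`, while the second term picks up the measure in `family_2` that has no close counterpart. This asymmetry is why the definition takes the maximum of both directions.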

4. Algorithmic and Computational Approaches

Different generalizations necessitate advanced solvers:

  • Linear Programming (LP): Unbalanced OT variants (e.g., Piccoli–Rossi, Figalli–Gigli, Hellinger–Kantorovich) are formulated as finite-dimensional LPs or convex programs amenable to efficient solution via simplex, interior-point, or dual methods with known convergence rates (Briani et al., 2024).
  • Sliced and Generalized Sliced Wasserstein: These methods reduce high-dimensional OT to repeated 1D problems along projections (linear or nonlinear). The generalized sliced Wasserstein (GSW) distance applies the Radon or generalized Radon transform:

$$\mathrm{GSW}_p(\mu, \nu) = \left( \int_{\Omega} W_p^p(\mu_\omega, \nu_\omega)\, d\omega \right)^{1/p}$$

where $\omega$ parameterizes the family of projections, possibly nonlinear, and $\mu_\omega, \nu_\omega$ denote the corresponding 1D projections of $\mu, \nu$. Max-GSW takes the supremum over $\omega$ instead of the average, facilitating data-adaptive slicing (Kolouri et al., 2019). Efficient approximation exploits random projections, moment methods, and high-dimensional concentration (Le et al., 2022).

  • Kernel Wasserstein: OT computations in RKHS via kernel-lifting, with closed-form solutions for Gaussian measures in Hilbert space, and empirical Gram matrix algorithms; used for nonlinear data in imaging and structured domains (Oh et al., 2019).
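The slicing idea from the second item can be sketched with a Monte Carlo estimate. The code below covers only the linear special case (ordinary Radon projections onto random unit directions), not the generalized Radon transform, and it assumes both point clouds have the same number of equally weighted points. The function names and the `use_max` flag are illustrative. With `use_max=True` the result is the maximum over the sampled directions, which is only a lower bound on the true max-sliced distance.

```python
import math
import random


def wasserstein_1d_pp(xs, ys, p):
    """W_p^p between two 1D empirical measures with equally many, equally weighted atoms."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(u - v) ** p for u, v in zip(sorted(xs), sorted(ys))) / len(xs)


def sliced_wasserstein(X, Y, p=2, n_proj=200, use_max=False, seed=0):
    """Monte Carlo sliced Wasserstein distance with random linear projections."""
    rng = random.Random(seed)
    dim = len(X[0])
    values = []
    for _ in range(n_proj):
        # A normalized Gaussian vector is uniformly distributed on the unit sphere.
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in g))
        theta = [c / norm for c in g]
        proj_x = [sum(t * c for t, c in zip(theta, x)) for x in X]
        proj_y = [sum(t * c for t, c in zip(theta, y)) for y in Y]
        values.append(wasserstein_1d_pp(proj_x, proj_y, p))
    # Average over slices (sliced) or keep the worst sampled slice (max-sliced).
    aggregate = max(values) if use_max else sum(values) / len(values)
    return aggregate ** (1.0 / p)


if __name__ == "__main__":
    X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    Y = [(x + 3.0, y + 4.0) for x, y in X]  # X translated by the vector (3, 4)
    print(sliced_wasserstein(X, Y))
    print(sliced_wasserstein(X, Y, use_max=True))
```

Each slice costs only a sort, so the total work is roughly `n_proj` sorts of the projected samples. For a pure translation by a vector $s$, every slice contributes $|\theta\cdot s|^p$, so both estimates are bounded above by $|s|$.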

5. Theoretical Developments: Dualities, Metric Geometry, and Generalization

  • Duality: For $W_1^{1,1}$, the dual problem recovers the flat metric, with constraints expressed via bounded Lipschitz test functions. More generally, dualities such as the Sobolev dual relax the Lipschitz requirement to integral constraints, broadening admissible test functions while maintaining metric and optimization properties, with implications for GANs and learning theory (Xu et al., 2020).
  • Geodesic and Cone Structures: Generalized metrics often retain or extend geodesic, contractibility, or conic properties of classical OT. The weighted Wasserstein–Bures space for matrix measures, for instance, is a metric cone over the unit-trace submanifold, reflecting non-linear and non-uniform growth of distance with additive mixture (Li et al., 2020).
  • Metrization of Topologies: Many generalizations preserve the ability to metrize weak convergence (possibly together with tightness or moment conditions), ensuring that convergence in the metric matches probabilistic convergence of sequences—crucial for applications in analysis and stochastic modeling (Piccoli et al., 2012, Li et al., 2015, Afham et al., 2024).

6. Special Forms: Generalized Bures, Earth-Mover, and Pivot-based Distances

  • Generalized Bures–Wasserstein: On the manifold $\mathbb P_d$ of positive-definite matrices, the Bures–Wasserstein structure is generalized via Riemannian geometry, defining distances via linearization (Riemannian log) at varying basepoints, and inducing a family of generalized fidelities. These recover and unify Uhlmann, Holevo, and Matsumoto fidelities in quantum information (Afham et al., 2024).
  • Generalized Earth Mover's Distance (EMD): For multiple distributions (e.g., $d$ measures on the simplex), the "generalized EMD" computes minimal transport cost over joint couplings such that each marginal is respected, with cost per sample vector given by a combinatorial dispersion formula. Explicit formulas for the expectation and generalizations of the Cayley–Menger determinant relate EMD to multi-variate barycentric geometry (Erickson, 2024).
  • Generalized Geodesics and Sliced Proxies: Generalized OT using pivot (base) measures creates Wasserstein geodesics in "hybrid" or non-Euclidean spaces. Sliced-Wasserstein generalized geodesic (SWGG) frameworks define distances as the minimum (over projections) of 1D OT costs, providing explicit transport plans and proxies for computational efficiency (Mahey et al., 2023).
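For orientation, the classical Bures–Wasserstein distance that the first item takes as its starting point has the following standard closed form. It is stated here as background, not as the generalized construction of Afham et al. The symbols $A, B \in \mathbb P_d$ and the eigenvalue notation $\alpha_i, \beta_i$ are introduced only for this illustration.

```latex
\begin{aligned}
d_{\mathrm{BW}}(A,B)^2
  &= \operatorname{tr} A + \operatorname{tr} B
     - 2\,\operatorname{tr}\!\big( A^{1/2} B A^{1/2} \big)^{1/2}
   = W_2\big( \mathcal N(0,A),\ \mathcal N(0,B) \big)^2, \\
d_{\mathrm{BW}}(A,B)^2
  &= \sum_{i=1}^{d} \big( \sqrt{\alpha_i} - \sqrt{\beta_i} \big)^2
  \quad \text{if } A = \operatorname{diag}(\alpha_i),\ B = \operatorname{diag}(\beta_i).
\end{aligned}
```

The trace term $\operatorname{tr}(A^{1/2} B A^{1/2})^{1/2}$ is the square root of the Uhlmann fidelity, which is one way to see why fidelities appear naturally in this setting.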

7. Applications and Empirical Performance

Generalized Wasserstein distances underpin sensitivity analysis for dynamical models, random field reconstruction, shape analysis, robust deep learning (e.g., GANs), graph matching, quantum information, and uncertainty quantification in hybrid data settings.

  • Traffic Modeling: Figalli–Gigli, Piccoli–Rossi, and Hellinger–Kantorovich distances capture and quantify boundary-driven or model-induced changes in macroscopic densities in traffic flow and other PDE-based models (Briani et al., 2024).
  • Random Field Models and Mixed Data: Generalized $W_2$ enables direct learning of stochastic neural networks conforming to mixed-type structure, outperforming non-OT baselines for uncertainty quantification and dynamical system emulation (Xia et al., 7 Jul 2025).
  • Machine Learning and Imaging: Sliced, GSW, and their fast deterministic counterparts enable scalable OT computations in generative modeling, autoencoders, and artifact/structure detection (Kolouri et al., 2019, Le et al., 2022, Oh et al., 2019).

The cited works report improved computational efficiency or modeling fidelity relative to classical OT in their respective settings, particularly in high-dimensional or structurally complex domains.


Table: Selected Generalized Wasserstein Distances

| Name/Type | Key Definition/Feature | Reference |
|---|---|---|
| $W_p^{a,b}$ | Allows for mass addition/removal at cost $a$, transport at cost $b$ | (Piccoli et al., 2013, Piccoli et al., 2012) |
| GSW, max-GSW | Integrates (or maximizes) 1D Wasserstein distances over generalized Radon projections | (Kolouri et al., 2019) |
| $Z$-GW | Distance between $Z$-valued kernel networks | (Bauer et al., 2024) |
| Hybrid $W_2$ | OT for mixed continuous-categorical variables | (Xia et al., 7 Jul 2025) |
| Generalized Bures | Riemannian-geometric metric/fidelity for quantum states | (Afham et al., 2024) |
| Kernel Wasserstein | OT in RKHS, leveraging kernel trick | (Oh et al., 2019) |
| SWGG (min-SWGG) | Sliced proxy for generalized Wasserstein geodesics via pivot measure | (Mahey et al., 2023) |
| (Sobolev) Dualities | OT duality with integral (Sobolev) constraints on test functions | (Xu et al., 2020) |

Summary and Outlook

Generalized Wasserstein distances form a broad theoretical and algorithmic framework extending optimal transport to accommodate practical requirements—such as unbalanced mass, structured and mixed data, quantum and noncommutative analysis, robust statistics, and scalable high-dimensional computation. Fundamental properties (metricity, duality, geodesic structure, connection with weak convergence) are preserved or appropriately extended in each setting, underpinning diverse modern applications and continuing theoretical innovations.
