
Generalized Wasserstein Geometries

Updated 2 April 2026
  • Generalized Wasserstein Geometries are a family of metric structures that expand classical optimal transport to include unbalanced measures, nonlinear projections, and operator-valued data.
  • They leverage innovative approaches such as slicing, Gromov–Wasserstein, and Bregman divergences to compute distances efficiently in complex and non-Euclidean settings.
  • This framework enables robust variational analysis and scalable algorithm design for applications in machine learning, functional data analysis, and statistical inference.

Generalized Wasserstein geometries consist of a broad family of metric structures and optimization frameworks that extend the classical Wasserstein space of probability measures. These generalizations encompass unbalanced transport (allowing for mass creation/destruction), Bregman divergences, slicing-based metrics that exploit nonlinear projections, and Gromov–Wasserstein frameworks that are invariant to isometries and reflect structural information. They also include geometric analysis on spaces of SPD matrices and operators, synthetic and barycentric curvature-dimension conditions on abstract metric-measure spaces (including infinite-dimensional and non-smooth settings), and iterated and hierarchical constructions. This article surveys major constructions, geometric and variational properties, and recent advances in such generalized optimal transport geometries.

1. Unbalanced and Source-Regularized Wasserstein Geometries

A primary generalization of classical Wasserstein space addresses the limitation that $W_p$ is defined only between measures of equal mass. The Piccoli–Rossi metric $W_p^{a,b}$ on finite measures introduces two nonnegative weights, $a$ (mass creation/removal) and $b$ (transport), and, for $\mu,\nu \in \mathcal M(\mathbb{R}^d)$,

$$W_p^{a,b}(\mu,\nu) = \left[ \inf_{\substack{\tilde\mu \le \mu,\ \tilde\nu \le \nu \\ |\tilde\mu| = |\tilde\nu|}} a^p \big( \|\mu - \tilde\mu\|_{\mathrm{TV}} + \|\nu - \tilde\nu\|_{\mathrm{TV}} \big)^p + b^p\, W_p(\tilde\mu, \tilde\nu)^p \right]^{1/p}.$$

This metric interpolates between strict mass-conserving transport ($a \to \infty$) and total variation ($b \to \infty$), defines a complete metric on the space of finite measures, and admits a generalized Benamou–Brenier formula for $W_2^{a,b}$ in which the continuity equation carries both a velocity field $v_t$ and a signed source measure $s_t$,

$$\partial_t \mu_t + \nabla \cdot (v_t\, \mu_t) = s_t, \qquad \mu_0 = \mu, \quad \mu_1 = \nu.$$

The corresponding action is

$$\mathcal B(\mu_t, v_t, s_t) = \int_0^1 \Big( a^2\, |s_t|(\mathbb{R}^d)^2 + b^2 \int_{\mathbb{R}^d} |v_t|^2\, d\mu_t \Big)^{1/2} dt, \qquad W_2^{a,b}(\mu,\nu) = \inf \mathcal B.$$

For $a = b = 1$ and $p = 1$, $W_1^{1,1}$ coincides with the flat (bounded-Lipschitz) metric, i.e.,

$$W_1^{1,1}(\mu,\nu) = \|\mu - \nu\|_{\flat} = \sup\Big\{ \int_{\mathbb{R}^d} \varphi\, d(\mu - \nu) \;:\; \|\varphi\|_{\infty} \le 1,\ \operatorname{Lip}(\varphi) \le 1 \Big\}.$$

This structure underlies well-posedness for nonlinear transport equations with source terms and extends classical duality and stability: the metric space $(\mathcal M(\mathbb{R}^d), W_p^{a,b})$ is geodesic and stable under Gromov–Hausdorff convergence (Piccoli et al., 2013, Chung et al., 2019).
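For $p = 1$ between discrete measures, the static formulation reduces to a small linear program over sub-couplings, since the total-variation and transport terms simply add. The sketch below is our own illustration, not code from the cited papers; the function name and the discrete 1D setup are assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def generalized_w1(x, mu, y, nu, a=1.0, b=1.0):
    """W_1^{a,b} between discrete measures sum_i mu_i delta_{x_i} and
    sum_j nu_j delta_{y_j} (1D supports), via a linear program: untransported
    mass is paid at rate a (TV term), transported mass at rate b * distance."""
    m, n = len(mu), len(nu)
    D = np.abs(x[:, None] - y[None, :])          # ground distances |x_i - y_j|
    # Objective: a*(|mu| + |nu|) + sum_ij pi_ij * (b*D_ij - 2a), minimized
    # over sub-couplings pi >= 0 with row sums <= mu and column sums <= nu.
    c = (b * D - 2.0 * a).ravel()
    A_rows = np.kron(np.eye(m), np.ones(n))      # row-sum constraints (<= mu)
    A_cols = np.kron(np.ones(m), np.eye(n))      # column-sum constraints (<= nu)
    res = linprog(c, A_ub=np.vstack([A_rows, A_cols]),
                  b_ub=np.concatenate([mu, nu]), bounds=(0, None))
    return a * (mu.sum() + nu.sum()) + res.fun
```

Untransported mass costs $a$ per unit on each side, so mass moves only where $b\,|x_i - y_j| < 2a$, which makes the interpolation between transport and total variation explicit.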

2. Sliced, Generalized Sliced, and Differentiable Sliced Wasserstein Distances

Metrics based on slicing project high-dimensional distributions onto lower-dimensional spaces to exploit the efficiency of 1D transport. The sliced Wasserstein metric,

$$\operatorname{SW}_p(\mu,\nu)^p = \int_{S^{d-1}} W_p^p\big(\theta^*_{\#}\mu,\ \theta^*_{\#}\nu\big)\, d\sigma(\theta),$$

where $\theta^*(x) = \langle \theta, x \rangle$ is the projection onto direction $\theta \in S^{d-1}$, is generalized by replacing the linear projection with nonlinear or learnable defining functions $g_\theta$. The generalized sliced Wasserstein distance (GSW) for nonlinear $g_\theta$ is

$$\operatorname{GSW}_p(\mu,\nu)^p = \int_{\Omega} W_p^p\big((g_\theta)_{\#}\mu,\ (g_\theta)_{\#}\nu\big)\, d\sigma(\theta).$$

Recent developments provide deterministic and learnable function approximations (polynomials, neural networks), exploiting concentration of high-dimensional random projections, yielding scalability in the ambient dimension $d$ and facilitating moment-based approximations. Differentiable generalized sliced Wasserstein plans (DGSWP) employ a bilevel scheme to select optimal nonlinear projections $g_\theta$,

$$\min_{\theta} \int_{\mathbb{R}^d \times \mathbb{R}^d} c(x,y)\, d\pi_{g_\theta}(x,y),$$
where $\pi_{g_\theta}$ is the transport plan lifted from one-dimensional optimal transport between $(g_\theta)_{\#}\mu$ and $(g_\theta)_{\#}\nu$,

with gradients efficiently estimated by Gaussian smoothing over $\theta$ (Le et al., 2022, Chapel et al., 28 May 2025). Both sliced and generalized sliced Wasserstein metrics are true metrics (under injectivity conditions on the defining functions $g_\theta$) and admit low-complexity computation.
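A Monte Carlo estimator of the sliced distance needs only sorting along random directions; swapping the linear projection for a nonlinear `g` gives a crude GSW estimate. This is an illustrative sketch with names of our own choosing, not a reference implementation.

```python
import numpy as np

def sliced_w2(X, Y, n_proj=200, g=None, rng=None):
    """Monte Carlo estimate of SW_2 between equal-size empirical measures.
    X, Y: (n, d) sample arrays.  g(theta, Z) -> (n,) optionally replaces
    the linear projection <theta, z> (the GSW idea); defaults to linear."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    proj = g if g is not None else (lambda th, Z: Z @ th)
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)           # uniform direction on S^{d-1}
        px, py = np.sort(proj(theta, X)), np.sort(proj(theta, Y))
        total += np.mean((px - py) ** 2)         # 1D W_2^2 via sorted quantiles
    return np.sqrt(total / n_proj)
```

Each slice costs $O(n \log n)$, which is the source of the method's scalability relative to solving a full $d$-dimensional transport problem.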

The min-SWGG proxy leverages generalized Wasserstein geodesics with a line-supported pivot, computing

$$\operatorname{min\text{-}SWGG}_2^2(\mu,\nu) = \min_{\theta \in S^{d-1}} \int_{\mathbb{R}^d \times \mathbb{R}^d} \|x - y\|^2\, d\pi_\theta(x,y),$$
where $\pi_\theta$ is the coupling induced by one-dimensional optimal transport along the direction $\theta$,

which provides a metric that metrizes weak convergence and yields efficient computation and explicit couplings (Mahey et al., 2023).
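For equal-size empirical measures the line-induced plan is just the pairing obtained by sorting projections, so the search reduces to argsorts. The sketch below is our own simplification (random search over directions; the paper also proposes smarter optimization).

```python
import numpy as np

def min_swgg2(X, Y, n_theta=100, rng=None):
    """Random-direction search for the plan induced by sorting projections;
    returns the minimal lifted squared cost, an upper bound on W_2^2 that is
    tight whenever an optimal plan is induced by some line."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    best = np.inf
    for _ in range(n_theta):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)
        ix, iy = np.argsort(X @ theta), np.argsort(Y @ theta)
        # Pair k-th smallest projection of X with k-th smallest of Y.
        cost = np.mean(np.sum((X[ix] - Y[iy]) ** 2, axis=1))
        best = min(best, cost)
    return best
```

Unlike the plain sliced distance, this returns an explicit coupling in the ambient space, which is what makes min-SWGG useful when a transport plan (not just a scalar distance) is needed.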

3. Gromov–Wasserstein and Linearized/Inner-Product Generalizations

Gromov–Wasserstein (GW) distances generalize $W_p$ to compare metric-measure spaces up to measure-preserving isometry, defined by

$$\operatorname{GW}_p\big((X,d_X,\mu),(Y,d_Y,\nu)\big)^p = \inf_{\pi \in \Pi(\mu,\nu)} \iint \big| d_X(x,x') - d_Y(y,y') \big|^p \, d\pi(x,y)\, d\pi(x',y').$$

Linearized GW (LGW) and inner-product GW (IGW) geometries are proposed for computational tractability; e.g., LGW leverages barycentric projections and LOT-based (linear optimal transport) tangent embeddings, reducing quadratic complexity while retaining isometry-invariance properties (Beier et al., 2021). The IGW metric,

$$\operatorname{IGW}(\mu,\nu)^2 = \inf_{\pi \in \Pi(\mu,\nu)} \iint \big| \langle x, x' \rangle - \langle y, y' \rangle \big|^2 \, d\pi(x,y)\, d\pi(x',y'),$$

is analyzed with an associated gradient flow and Riemannian structure; the induced mobility operator modifies the local Wasserstein gradients to encode global structure, with a Benamou–Brenier-like dynamic reformulation and an Otto-calculus-type gradient (Zhang et al., 2024).
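For discrete spaces, the quadratic GW objective for a *fixed* coupling can be evaluated without materializing the four-index tensor, by expanding the square. The sketch below (our own; it evaluates, not minimizes, the objective) illustrates the invariance: isometric distance matrices under a matching coupling give cost zero.

```python
import numpy as np

def gw2_objective(DX, DY, pi):
    """Squared GW objective sum_{i,i',j,j'} |DX[i,i'] - DY[j,j']|^2 pi[i,j] pi[i',j']
    for a given coupling pi, using |a-b|^2 = a^2 - 2ab + b^2 to stay O(n^3)."""
    p, q = pi.sum(axis=1), pi.sum(axis=0)        # marginals of the coupling
    t1 = (DX ** 2 @ p) @ p                       # sum DX^2[i,i'] p_i p_{i'}
    t2 = (DY ** 2 @ q) @ q                       # sum DY^2[j,j'] q_j q_{j'}
    cross = np.trace((pi.T @ DX @ pi) @ DY.T)    # sum DX[i,i'] DY[j,j'] pi pi
    return t1 + t2 - 2.0 * cross
```

The GW distance itself is the infimum of this quantity over couplings, typically approached with entropic or conditional-gradient solvers.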

4. Generalized Wasserstein Geometries on Metric-Measure and Infinite-Dimensional Spaces

The classical $W_2$ geometry on probability measures can be generalized to extended metric-measure spaces $(X, \tau, d, \mathfrak m)$, including abstract Wiener spaces and configuration spaces over Riemannian manifolds. A new barycentric curvature-dimension condition $\operatorname{BCD}(K,N)$ is imposed via Jensen-type variational inequalities for the entropy at barycenters,

$$\operatorname{Ent}_{\mathfrak m}\big(\operatorname{bar}(\Omega)\big) \le \int_{\mathcal P_2(X)} \operatorname{Ent}_{\mathfrak m}(\mu)\, d\Omega(\mu) - \frac{K}{2}\, \operatorname{Var}(\Omega),$$

with $K \in \mathbb{R}$ and $N \in [1,\infty]$. This condition encompasses curvature-dimension properties of the Lott–Sturm–Villani and Ambrosio–Gigli–Savaré theories but is designed to handle branching/non-geodesic and infinite-dimensional settings. Stability under measured Gromov–Hausdorff convergence holds, and existence, uniqueness, and absolute continuity of barycenters are established under mild integrability conditions. Geometric and functional inequalities, including multi-marginal Brunn–Minkowski and functional Blaschke–Santaló inequalities, are obtained directly from the barycentric Jensen inequalities (Han et al., 2024).
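Barycenters are concrete in one dimension, where quantile functions linearize $W_2$ transport: the barycenter of empirical measures with equal sample sizes is the weighted average of sorted samples. A minimal sketch of this classical fact (function name ours):

```python
import numpy as np

def w2_barycenter_1d(samples, weights=None):
    """W_2 barycenter of 1D empirical measures with equal sample sizes.
    In 1D the quantile map linearizes optimal transport, so the barycenter's
    sorted support is the weighted average of the inputs' sorted samples."""
    S = np.stack([np.sort(s) for s in samples])        # (k, n) quantile grids
    w = np.full(len(S), 1.0 / len(S)) if weights is None else np.asarray(weights)
    return w @ S                                       # averaged quantiles
```

In higher dimensions or on metric-measure spaces no such closed form exists, which is precisely where the variational barycentric conditions above do their work.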

Variational structures are lifted via category-theoretic functors to iterated Wasserstein spaces $\mathcal P_2(\mathcal P_2(X))$, with velocity plans and geodesics at every level, and gradient flows defined for suitable functionals (Vauthier, 3 Dec 2025). In spaces of all signed Radon measures, the quotient structure under group actions descends naturally to Wasserstein distances and is compatible with Gromov–Hausdorff stability (Chung et al., 2019).

5. Bregman–Wasserstein and Dualistic Information-Geometric Extensions

The Bregman–Wasserstein divergence arises from replacing the quadratic cost in classical OT by a Bregman divergence, generated by a strictly convex function $\phi$, on $\mathcal P(\mathbb{R}^d)$:
$$\mathcal B_\phi(\mu,\nu) = \inf_{\pi \in \Pi(\mu,\nu)} \int_{\mathbb{R}^d \times \mathbb{R}^d} B_\phi(x,y)\, d\pi(x,y),$$
where $B_\phi(x,y) = \phi(x) - \phi(y) - \langle \nabla\phi(y),\, x - y \rangle$ is the canonical Bregman divergence. This framework induces displacement interpolations corresponding to the primal and dual geodesics of information geometry, recovers the classical Wasserstein geometry for $\phi(x) = \tfrac12 \|x\|^2$, and transports the dualistic (Amari) geometric structure to infinite-dimensional statistical manifolds. An associated generalized Pythagorean theorem, dual connections (primal, dual, Levi-Civita), and corresponding JKO gradient flows are established (Kainth et al., 2023).
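A quick numeric check of the recovery statement: with the quadratic generator, the canonical Bregman divergence is exactly half the squared Euclidean cost, so the Bregman–Wasserstein divergence reduces to (half) the classical $W_2$ cost. Names below are our own illustration.

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    """Canonical Bregman divergence B_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

# Quadratic generator: B_phi(x, y) = |x - y|^2 / 2, the classical OT cost.
phi = lambda x: 0.5 * np.dot(x, x)
grad_phi = lambda x: x

# A KL-type generator phi(x) = sum x_i log x_i instead yields an
# entropy-weighted (generalized KL) ground cost on the positive orthant.
```

Plugging such a $B_\phi$ in as the ground cost of any discrete OT solver gives the corresponding Bregman–Wasserstein divergence between empirical measures.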

6. SPD Matrix and Operator-Valued Wasserstein Geometries

On the manifold of symmetric positive-definite (SPD) matrices $\operatorname{SPD}(n)$, the $L^2$-Wasserstein geometry is given by the metric tensor

$$g_\Sigma(X, Y) = \tfrac{1}{2} \operatorname{tr}\big(\Gamma_\Sigma[X]\, Y\big),$$

where $\Gamma_\Sigma[X]$ solves the Lyapunov equation $\Gamma_\Sigma[X]\,\Sigma + \Sigma\,\Gamma_\Sigma[X] = X$. The geodesics, exponential and logarithm maps, and explicit positive curvature properties are available (Luo et al., 2020). The Bures–Wasserstein geometry, further generalized as GBW, introduces a metric tensor parameterized by a fixed SPD matrix $M$, leading to geodesics and distances incorporating a Mahalanobis-type precision weighting. This anisotropic structure permits improved statistical efficiency and conditioning, and is generalized to infinite-dimensional and operator-valued settings, e.g., covariance operators acting on Hilbert spaces, via unitized Hilbert–Schmidt operators and an extended Mahalanobis norm. Operator-valued Procrustes geodesics, learnable regularization parameters, and tractable computational schemes for high-dimensional inference are provided (Han et al., 2021, Goomanee et al., 12 Nov 2025).
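Both the closed-form Bures–Wasserstein distance, $d_{BW}(A,B)^2 = \operatorname{tr} A + \operatorname{tr} B - 2 \operatorname{tr}(A^{1/2} B A^{1/2})^{1/2}$, and the metric tensor via the Lyapunov equation are directly computable with SciPy; a sketch with our own function names:

```python
import numpy as np
from scipy.linalg import sqrtm, solve_continuous_lyapunov

def bures_wasserstein(A, B):
    """d_BW(A, B)^2 = tr A + tr B - 2 tr (A^{1/2} B A^{1/2})^{1/2}."""
    rA = sqrtm(A)
    cross = sqrtm(rA @ B @ rA)
    # sqrtm may return a complex array with tiny imaginary noise; keep real part.
    return float(np.trace(A) + np.trace(B) - 2.0 * np.real(np.trace(cross)))

def bw_metric(Sigma, X, Y):
    """Metric tensor g_Sigma(X, Y) = (1/2) tr(Gamma Y), where Gamma solves
    the Lyapunov equation Gamma Sigma + Sigma Gamma = X."""
    Gamma = solve_continuous_lyapunov(Sigma, X)   # solves Sigma G + G Sigma = X
    return 0.5 * float(np.trace(Gamma @ Y))
```

For covariance matrices of centered Gaussians, `bures_wasserstein` is exactly the squared $W_2$ distance between the Gaussians, which is the standard way this geometry appears in statistics.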

7. Geometry on Special Structures and Embeddings

In ultrametric spaces, the $p$-Wasserstein geometry collapses to an affine form, admitting an isometric embedding into a convex subset of an $\ell^1$ Banach space. Geodesics exist only for $p = 1$; otherwise connectivity is via Hölder arcs of exponent $1/p$ (Kloeckner, 2013).
For time series and signed measures, the generalized Wasserstein geometry leverages Jordan decompositions and signed cumulative distribution transforms (SCDT), embedding signals into a flat Hilbert space $L^2$ and providing interpretability and straight-line geodesics in classes generated by template deformations (Li et al., 2022).
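The Jordan-decomposition idea can be sketched numerically: split a signed 1D signal into positive and negative parts and embed each through its quantile function, so that Euclidean distance between embeddings compares the corresponding signed measures. This is a simplified stand-in for the full SCDT (which also records the parts' total masses); all names are our own.

```python
import numpy as np

def signed_quantile_embed(s, grid, n_q=64):
    """Embed a signed 1D signal s sampled on `grid`: Jordan-decompose into
    positive/negative parts, normalize each, and record its quantile function
    F^{-1} on a fixed probability grid; concatenate the two embeddings."""
    qs = np.linspace(0.01, 0.99, n_q)
    parts = []
    for part in (np.maximum(s, 0.0), np.maximum(-s, 0.0)):
        mass = part.sum()
        if mass == 0:
            parts.append(np.zeros(n_q))          # empty part embeds to zero
            continue
        cdf = np.cumsum(part) / mass
        parts.append(np.interp(qs, cdf, grid))   # quantile function F^{-1}(q)
    return np.concatenate(parts)
```

Because the embedding is linear in the quantile (transport) coordinates, straight lines between embeddings correspond to displacement-style interpolations of each signed part.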

8. Outlook and Applications

Generalized Wasserstein geometries enable the handling of mass transfer beyond strict conservation, greater expressivity in comparing complex distributions (e.g., via nonlinear projections or matrix-valued data), and robust analysis in non-Euclidean, infinite-dimensional, or branching settings. They are now central to algorithm design in scalable transport, functional data analysis, statistical learning with manifold constraints, configuration and Wiener spaces, and information-geometric optimization. Recent advances unify these constructions into a broader synthetic and categorical framework, facilitating further extension to abstract settings while retaining the core variational, geometric, and computational underpinnings of optimal transport (Piccoli et al., 2013, Han et al., 2024, Kainth et al., 2023, Goomanee et al., 12 Nov 2025, Chapel et al., 28 May 2025, Le et al., 2022).
