
Gromov–Wasserstein Optimal Transport

Updated 26 April 2026
  • GWOT is a generalization of optimal transport that compares metric measure spaces by aligning their intrinsic relational geometries.
  • It employs techniques like entropic regularization and Sinkhorn iterations to overcome nonconvex optimization challenges and enhance scalability.
  • GWOT has significant applications in graph matching, shape analysis, computational biology, and optimal control through advanced theoretical and algorithmic developments.

Gromov–Wasserstein Optimal Transport (GWOT) is a generalization of optimal transport (OT) that quantifies the “distance” between two metric measure spaces (mm-spaces), even when the underlying spaces are not directly comparable. Instead of transporting mass according to a ground cost between source and target points, GWOT transports the relational structure by minimizing discrepancies between intra-domain distances. This property grants GWOT invariance to isometries and enables matching of entities such as graphs, point clouds, distributions on manifolds, or heterogeneous datasets lacking shared coordinate systems. The GWOT framework forms the basis for a rich theory that extends to entropic and fused variants, multi-marginal and barycenter problems, algorithmic advances for scalability, and semidefinite relaxations, with impactful applications in shape analysis, graph matching, computational biology, computer vision, and data-driven optimal control.

1. Mathematical Foundations

Let $(X, d_X, \mu_X)$ and $(Y, d_Y, \mu_Y)$ denote two metric measure spaces, where $d_X: X \times X \to \mathbb{R}_+$ and $d_Y: Y \times Y \to \mathbb{R}_+$ are distance functions (or, more generally, symmetric cost functions), and $\mu_X$, $\mu_Y$ are probability measures. GWOT measures the discrepancy by comparing all pairwise distances amongst points in $X$ to those in $Y$ via a coupling $\pi \in \Pi(\mu_X, \mu_Y)$ (a joint probability measure with marginals $\mu_X$, $\mu_Y$):

$$\mathrm{GW}(X, Y)^2 = \min_{\pi \in \Pi(\mu_X, \mu_Y)} \iint_{X \times Y} \iint_{X \times Y} \big| d_X(x, x') - d_Y(y, y') \big|^2 \, d\pi(x, y) \, d\pi(x', y')$$

For discrete measures on finite supports, this becomes a quadratic optimization over the transport plan $\pi$:

$$\min_{\pi \in \Pi(p, q)} \sum_{i,k} \sum_{j,l} \big| D_X(i, k) - D_Y(j, l) \big|^2 \, \pi_{ij} \, \pi_{kl}$$

where $D_X, D_Y$ are distance matrices and $p, q$ are the marginals (Zhang et al., 2024, Tran et al., 13 Feb 2025, Tran et al., 20 Apr 2025).

GWOT is a metric on isomorphism classes of metric–measure spaces, invariant under measure-preserving isometries (Mémoli et al., 2022). The relaxation to couplings instead of deterministic maps distinguishes Kantorovich GW from the Monge-type Gromov–Monge (GM) distance; for non-atomic spaces, GW and GM coincide (Mémoli et al., 2022).
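For a fixed coupling, the discrete objective can be evaluated without materializing the four-index tensor by expanding the square $|a - b|^2 = a^2 + b^2 - 2ab$. A minimal NumPy sketch (function and variable names are ours, not from the cited works), which also illustrates the invariance under measure-preserving isometries using a permuted copy of a point cloud:

```python
import numpy as np

def gw_objective(D_X, D_Y, pi):
    """Discrete GW objective for a given coupling pi:
    sum_{i,j,k,l} |D_X[i,k] - D_Y[j,l]|^2 * pi[i,j] * pi[k,l],
    computed via the expansion |a-b|^2 = a^2 + b^2 - 2ab."""
    p, q = pi.sum(axis=1), pi.sum(axis=0)          # marginals of pi
    const = (D_X**2 @ p) @ p + (D_Y**2 @ q) @ q    # a^2 and b^2 terms
    cross = np.sum((D_X @ pi @ D_Y) * pi)          # -2ab term (D_Y symmetric)
    return const - 2.0 * cross

# Isometry invariance: a relabeled copy of the same space is at GW cost 0.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))
D = np.sqrt(((x[:, None] - x[None, :])**2).sum(-1))  # pairwise distances
perm = rng.permutation(5)
D_perm = D[np.ix_(perm, perm)]                       # isometric relabeling
pi = np.zeros((5, 5))
pi[perm, np.arange(5)] = 1 / 5                       # matching coupling
print(gw_objective(D, D_perm, pi))                   # ~0 up to float error
```

The expansion brings the cost of evaluating the objective down from the naive $O(n^2 m^2)$ to $O(n^2 m + n m^2)$ matrix products.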

2. Computational Algorithms and Scalability

The underlying GW objective is a nonconvex quadratic program, rendering global optimization NP-hard even for finite supports (Tran et al., 13 Feb 2025, Chen et al., 2023). Couplings are typically optimized via iterative solvers with linearization:

  1. Gradient Linearization: Given the current coupling $\pi^{(t)}$, form the linearized cost matrix $C^{(t)}_{ij} = \sum_{k,l} |D_X(i,k) - D_Y(j,l)|^2 \, \pi^{(t)}_{kl}$.
  2. Entropic Regularization: To mitigate nonconvexity and enhance proximity to convex OT, an entropy penalty is included:

$$\min_{\pi \in \Pi(p, q)} \langle C^{(t)}, \pi \rangle - \varepsilon H(\pi)$$

where $H(\pi) = -\sum_{i,j} \pi_{ij} (\log \pi_{ij} - 1)$ (Zhang et al., 2024, Houry et al., 6 Feb 2026, Seyedi et al., 4 Sep 2025).

  3. Sinkhorn-type Inner Steps: Each iteration alternates between (a) updating the cost matrix (gradient step) and (b) solving the regularized OT problem by Sinkhorn scaling (Seyedi et al., 4 Sep 2025).
  4. Complexity: Standard GW solvers have cubic per-iteration complexity due to dense operations on the full distance matrices. Recent advances employ dynamic programming to reduce this cost for 1D grids and certain structured cases, yielding large empirical speed-ups at scale with no loss of accuracy (Zhang et al., 2024). New quadratic-memory, quadratic-time solvers exploit low-rank or feature-space lifting of distortion penalties and scale to hundreds of thousands of points (Houry et al., 6 Feb 2026).
  5. Multi-initialization: Because GW is nonconvex, practical solvers use multiple random restarts (GW-MultiInit) to avoid poor local minima (Seyedi et al., 4 Sep 2025).
  6. Algorithmic Variants: entropic, sliced, linearized, and relaxation-based variants are developed in the sections that follow.
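The linearize-then-Sinkhorn loop described above can be sketched in NumPy (a minimal illustration with fixed $\varepsilon$, fixed iteration counts, and a cost shift for numerical stability; not a production solver, and helper names are ours):

```python
import numpy as np

def sinkhorn(C, p, q, eps, n_iter=200):
    """Entropic OT: min <C, pi> - eps * H(pi) over couplings of (p, q)."""
    K = np.exp(-(C - C.min()) / eps)   # shifting C does not change the argmin
    u, v = np.ones_like(p), np.ones_like(q)
    for _ in range(n_iter):
        v = q / (K.T @ u)
        u = p / (K @ v)
    return u[:, None] * K * v[None, :]

def entropic_gw(D_X, D_Y, p, q, eps=0.1, n_outer=50):
    """Entropic GW (square loss): alternate cost linearization at the
    current coupling with a Sinkhorn solve of the linearized problem."""
    pi = np.outer(p, q)                       # product coupling as init
    c0 = (D_X**2 @ p)[:, None] + (D_Y**2 @ q)[None, :]
    for _ in range(n_outer):
        C = c0 - 2.0 * D_X @ pi @ D_Y         # linearized GW cost at pi
        pi = sinkhorn(C, p, q, eps)
    return pi
```

As the text notes, in practice one would run this from several random initializations and keep the coupling with the lowest GW objective.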

3. Theoretical Advances and Relaxations

GWOT admits several theoretically motivated relaxations and extensions:

  • Semidefinite Programming (SDP) and Sum-of-Squares (SOS) Hierarchies: The quadratic GW objective over the coupling polytope is naturally lifted via moment–SOS relaxations, leading to a sequence of tractable SDPs converging to the GW optimum. The first level matches classical metric relaxations, while higher levels guarantee convergence with explicit rates, and each level defines a genuine pseudo-metric satisfying symmetry and the triangle inequality (Tran et al., 13 Feb 2025, Tran et al., 20 Apr 2025, Chen et al., 2023).
  • Variational Scalings and Robustness: The CGW (Conic GW) metric is robust to total variation perturbations and interpolates between balanced GW and unbalanced OT with provable convergence and scaling properties (Oliver et al., 14 Aug 2025).
  • Linearized GW (LGW): GW can be linearized in the tangent space at a reference space, enabling rapid pairwise computations in large collections of mm-spaces (Beier et al., 2022).
  • Monge-Knothe and Subspace Detours: GW can be efficiently approximated by constructing optimal plans in strategically chosen subspaces and lifting them back to the full space, often yielding remarkably accurate matchings for high-dimensional data (Bonet et al., 2021).

4. Extensions: Barycenters, Multi-marginal, and Sliced GW

Barycenters and Fréchet Means

Given multiple mm-spaces $(X_i, d_i, \mu_i)$, $i = 1, \dots, N$, with weights $\lambda_i \geq 0$ summing to one, a GW barycenter is any minimizer $(X, d, \mu)$ of:

$$\sum_{i=1}^{N} \lambda_i \, \mathrm{GW}^2\big( (X, d, \mu), (X_i, d_i, \mu_i) \big)$$

Computing barycenters induces a multi-marginal GW problem over a common support, with complexity mitigated through geodesic linearization, tangential fixpoint iterations, or multi-marginal Sinkhorn solvers (Beier et al., 2024, Beier et al., 2022). Tangential iterations guarantee monotonic descent and empirically yield accurate barycentric representations and scalable barycenter computation for thousands of points (Beier et al., 2024).

Multi-marginal and Sliced GW

  • Multi-marginal GW supports simultaneous matching over multiple spaces and is extensible via entropic and sliced approaches, achieving tractable higher-order alignments (Beier et al., 2022).
  • Sliced GW leverages 1D projections to obtain closed-form or accelerated solutions for special cases, particularly in 1D or for uniform grids (Zhang et al., 2024).
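In the 1D setting exploited by sliced GW, a common heuristic is to sort both samples and compare only the monotone and anti-monotone alignments. The sketch below (helper names are ours) evaluates both candidate permutation couplings for uniform 1D point sets and returns the cheaper one; it is a heuristic for general costs, not a guaranteed global optimum:

```python
import numpy as np

def gw_cost_perm(x, y, sigma):
    """GW cost of the permutation coupling i -> sigma[i] for two
    uniform 1D point sets of equal size n (weights 1/n each)."""
    n = len(x)
    Dx = np.abs(x[:, None] - x[None, :])
    Dy = np.abs(y[:, None] - y[None, :])
    return np.sum((Dx - Dy[np.ix_(sigma, sigma)])**2) / n**2

def gw_1d_heuristic(x, y):
    """Sort both samples, then compare monotone vs anti-monotone matching."""
    xs, ys = np.sort(x), np.sort(y)
    identity = np.arange(len(xs))
    reverse = identity[::-1]
    costs = {tuple(s): gw_cost_perm(xs, ys, s) for s in (identity, reverse)}
    best = min(costs, key=costs.get)
    return np.array(best), costs[best]

# A reflected copy of a point set is isometric: the anti-monotone
# alignment recovers it at zero cost.
perm, cost = gw_1d_heuristic(np.array([0.0, 1.0, 3.0]),
                             np.array([5.0, 4.0, 2.0]))
```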

Fused Gromov–Wasserstein

FGW interpolates between alignment by features (classical OT) and alignment by structures (GW) via a trade-off weight $\alpha \in [0, 1]$, admits metric and geodesic properties, and is practically effective for mesh, graph, and time series analysis (Vayer et al., 2018).
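For a fixed coupling, the FGW objective is a convex combination of a linear feature-cost term and the quadratic GW structure term. A brief NumPy sketch (the symbol $\alpha$ for the trade-off weight and the helper name are our notation):

```python
import numpy as np

def fgw_cost(M, D_X, D_Y, pi, alpha):
    """Fused GW objective for a coupling pi:
    (1 - alpha) * <M, pi>  +  alpha * (GW quadratic term),
    where M[i, j] is the feature-space cost between x_i and y_j."""
    feat = np.sum(M * pi)                           # linear OT term
    p, q = pi.sum(axis=1), pi.sum(axis=0)           # marginals of pi
    struct = ((D_X**2 @ p) @ p + (D_Y**2 @ q) @ q
              - 2.0 * np.sum((D_X @ pi @ D_Y) * pi))
    return (1 - alpha) * feat + alpha * struct
```

At $\alpha = 0$ this reduces to the classical OT cost of the coupling; at $\alpha = 1$ it reduces to the pure GW objective.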

5. Applications and Empirical Validation

GWOT and its variants underpin state-of-the-art algorithms in a variety of domains:

  • Semantic matching, keypoint correspondence, and shape analysis: Incorporation of GW as a spatial consistency prior in computer vision yields improved semantic correspondences, outperforming ground costs based on features alone and with orders-of-magnitude faster runtime than diffusion-based approaches (Snelgar et al., 3 Feb 2026).
  • Barycenter-based shape interpolation and multi-graph matching: Tangential fixpoint and multi-marginal formulations yield plausible barycenters and superior matching rates in graph-based tasks (Beier et al., 2024).
  • Graph and point cloud alignment: GWOT provides robust structural matching for graphs, including under the quadratic assignment paradigm (Seyedi et al., 4 Sep 2025).
  • Cross-modal and perturbation-response alignment: Extension to labeled GW promotes biologically informed many-to-many mapping in high-throughput biology (Ryu et al., 2024, Oliver et al., 14 Aug 2025).
  • Data-driven optimal control: Embedding closed-form Gaussian GW as a target distribution cost in density steering yields rotation-invariant and shape-aware optimal control policies (Nakashima et al., 8 Aug 2025).

Empirical regimes span hundreds of thousands of points (CNT-EGW (Houry et al., 6 Feb 2026)), dense images, real-world molecular data, and biomedical applications. Quadratic to near-linear memory and computational costs have been reported for large-scale tasks under suitable structural assumptions.

6. Continuous GWOT and Neural Approaches

While discrete GWOT is mature, continuous GWOT remains theoretically and computationally challenging:

  • Sample-based neural GW: Parameterizes the GW map via neural nets and recasts GWOT as a minimax saddle problem, allowing stochastic gradient-based scalability and possibly out-of-sample generalization (Carrasco et al., 2023).
  • Neural entropic GW: Recovers the GW cost and coupling at minimax-optimal parametric rates, using neural dual variables and unrolling Sinkhorn updates, verified with non-asymptotic error guarantees (Wang et al., 2023).
  • Challenges: No known solver exists with guaranteed global convergence or satisfactory performance for general costs, though closed-form solutions exist for special cases (e.g., Gaussian-to-Gaussian inner-product GW) (Carrasco et al., 2023, Wang et al., 2023). Practical applications rely on discrete, finite-support approximations and minibatch alternatives.

7. Practical Recommendations and Limitations

  • Initialization and Regularization: Entropic GW variants smooth nonconvexity but require careful tuning; multi-initialization substantially improves reliability (Seyedi et al., 4 Sep 2025).
  • Complexity and Scalability: Dynamic programming and feature-space lifting break cubic bottlenecks for regular structures or CNT-type costs (Zhang et al., 2024, Houry et al., 6 Feb 2026).
  • SOS and SDP Relaxations: Provide certifiable lower bounds and sometimes global optima for small instances, but scalability remains limited to small problem sizes (Tran et al., 13 Feb 2025, Chen et al., 2023).
  • Robustness and Extensions: Conic GW and unbalanced formulations enhance robustness to mass-mismatched and outlier-rich settings (Oliver et al., 14 Aug 2025).
  • Open Problems: Reliable solvers for continuous GW, rigorous convergence guarantees for neural and minimax approaches, and understanding when convex relaxations are tight in practice remain active areas of research (Tran et al., 20 Apr 2025, Carrasco et al., 2023).

Summary Table: GWOT Algorithmic Complexity

| Method | Per-iteration complexity | Memory | Scalability | Reference |
| --- | --- | --- | --- | --- |
| Classic Entropic GW | cubic in n | quadratic in n | moderate n | (Zhang et al., 2024) |
| Fast Gradient Comp. (FGC) | reduced via dynamic programming (1D, grids) | quadratic in n | large n | (Zhang et al., 2024) |
| Feature-lifted CNT GW | quadratic in n (low feature dimension) | quadratic in n | hundreds of thousands of points (CNT costs) | (Houry et al., 6 Feb 2026) |
| SOS / SDP Hierarchy | one SDP solve per level | high | small n | (Tran et al., 13 Feb 2025; Chen et al., 2023) |
| Neural GW (mini-batch) | linear (batch size, epochs) | model size | high-dimensional, sample-based | (Wang et al., 2023; Carrasco et al., 2023) |
