
Gromov–Wasserstein Optimal Transport

Updated 26 April 2026
  • GWOT is a generalization of optimal transport that compares metric measure spaces by aligning their intrinsic relational geometries.
  • It employs techniques like entropic regularization and Sinkhorn iterations to overcome nonconvex optimization challenges and enhance scalability.
  • GWOT has significant applications in graph matching, shape analysis, computational biology, and optimal control through advanced theoretical and algorithmic developments.

Gromov–Wasserstein Optimal Transport (GWOT) is a generalization of optimal transport (OT) that quantifies the “distance” between two metric measure spaces (mm-spaces), even when the underlying spaces are not directly comparable. Instead of transporting mass according to a ground cost between source and target points, GWOT transports the relational structure by minimizing discrepancies between intra-domain distances. This property grants GWOT invariance to isometries and enables matching of entities such as graphs, point clouds, distributions on manifolds, or heterogeneous datasets lacking shared coordinate systems. The GWOT framework forms the basis for a rich theory that extends to entropic and fused variants, multi-marginal and barycenter problems, algorithmic advances for scalability, and semidefinite relaxations, with impactful applications in shape analysis, graph matching, computational biology, computer vision, and data-driven optimal control.

1. Mathematical Foundations

Let $(X, d_X, \mu_X)$ and $(Y, d_Y, \mu_Y)$ denote two metric measure spaces, where $d_X: X \times X \to \mathbb{R}_+$ and $d_Y: Y \times Y \to \mathbb{R}_+$ are distance functions (or, more generally, symmetric cost functions), and $\mu_X$, $\mu_Y$ are probability measures. GWOT measures the discrepancy by comparing all pairwise distances amongst points in $X$ to those in $Y$ via a coupling $\pi \in \Pi(\mu_X, \mu_Y)$ (a joint probability measure with marginals $\mu_X$, $\mu_Y$):

$$\mathrm{GW}(X, Y)^2 = \min_{\pi \in \Pi(\mu_X, \mu_Y)} \iint_{X \times Y} \iint_{X \times Y} \big| d_X(x, x') - d_Y(y, y') \big|^2 \, d\pi(x, y) \, d\pi(x', y')$$

For discrete measures on finite supports, this becomes a quadratic optimization over the transport plan $\pi$:

$$\min_{\pi \in \Pi(p, q)} \sum_{i,k} \sum_{j,l} \big| D_X(i, k) - D_Y(j, l) \big|^2 \, \pi_{ij} \, \pi_{kl}$$

where $D_X, D_Y$ are distance matrices and $p, q$ are the marginals (Zhang et al., 2024, Tran et al., 13 Feb 2025, Tran et al., 20 Apr 2025).

GWOT is a metric on isomorphism classes of metric–measure spaces, invariant under measure-preserving isometries (Mémoli et al., 2022). The relaxation to couplings instead of deterministic maps distinguishes Kantorovich GW from the Monge-type Gromov–Monge (GM) distance; for non-atomic spaces, GW and GM coincide (Mémoli et al., 2022).
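For a fixed coupling, the discrete objective can be evaluated without materializing the four-index tensor by expanding the square $|a - b|^2 = a^2 + b^2 - 2ab$. A minimal NumPy sketch (function and variable names are ours, not from the cited works), which also illustrates the invariance under measure-preserving isometries using a permuted copy of a point cloud:

```python
import numpy as np

def gw_objective(D_X, D_Y, pi):
    """Discrete GW objective for a given coupling pi:
    sum_{i,j,k,l} |D_X[i,k] - D_Y[j,l]|^2 * pi[i,j] * pi[k,l],
    computed via the expansion |a-b|^2 = a^2 + b^2 - 2ab."""
    p, q = pi.sum(axis=1), pi.sum(axis=0)          # marginals of pi
    const = (D_X**2 @ p) @ p + (D_Y**2 @ q) @ q    # a^2 and b^2 terms
    cross = np.sum((D_X @ pi @ D_Y) * pi)          # -2ab term (D_Y symmetric)
    return const - 2.0 * cross

# Isometry invariance: a relabeled copy of the same space is at GW cost 0.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))
D = np.sqrt(((x[:, None] - x[None, :])**2).sum(-1))  # pairwise distances
perm = rng.permutation(5)
D_perm = D[np.ix_(perm, perm)]                       # isometric relabeling
pi = np.zeros((5, 5))
pi[perm, np.arange(5)] = 1 / 5                       # matching coupling
print(gw_objective(D, D_perm, pi))                   # ~0 up to float error
```

The expansion brings the cost of evaluating the objective down from the naive $O(n^2 m^2)$ to $O(n^2 m + n m^2)$ matrix products.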

2. Computational Algorithms and Scalability

The underlying GW objective is a nonconvex quadratic program, rendering global optimization NP-hard even for finite supports (Tran et al., 13 Feb 2025, Chen et al., 2023). Couplings are typically optimized via iterative solvers with linearization:

  1. Gradient Linearization: Given the current coupling $\pi^{(t)}$, form the linearized cost matrix $C^{(t)}_{ij} = \sum_{k,l} |D_X(i,k) - D_Y(j,l)|^2 \, \pi^{(t)}_{kl}$.
  2. Entropic Regularization: To mitigate nonconvexity and enhance proximity to convex OT, an entropy penalty is included:

$$\min_{\pi \in \Pi(p, q)} \langle C^{(t)}, \pi \rangle - \varepsilon H(\pi)$$

where $H(\pi) = -\sum_{i,j} \pi_{ij} (\log \pi_{ij} - 1)$ (Zhang et al., 2024, Houry et al., 6 Feb 2026, Seyedi et al., 4 Sep 2025).

  3. Sinkhorn-type Inner Steps: Each iteration alternates between (a) updating the cost matrix (gradient step) and (b) solving the regularized OT problem by Sinkhorn scaling (Seyedi et al., 4 Sep 2025).
  4. Complexity: Standard GW solvers have cubic per-iteration complexity due to dense operations on the full distance matrices. Recent advances employ dynamic programming to reduce this cost for 1D grids and certain structured cases, yielding large empirical speed-ups at scale with no loss of accuracy (Zhang et al., 2024). New quadratic-memory, quadratic-time solvers exploit low-rank or feature-space lifting of distortion penalties and scale to hundreds of thousands of points (Houry et al., 6 Feb 2026).
  5. Multi-initialization: Because GW is nonconvex, practical solvers use multiple random restarts (GW-MultiInit) to avoid poor local minima (Seyedi et al., 4 Sep 2025).
  6. Algorithmic Variants: entropic, sliced, linearized, and relaxation-based variants are developed in the sections that follow.
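The linearize-then-Sinkhorn loop described above can be sketched in NumPy (a minimal illustration with fixed $\varepsilon$, fixed iteration counts, and a cost shift for numerical stability; not a production solver, and helper names are ours):

```python
import numpy as np

def sinkhorn(C, p, q, eps, n_iter=200):
    """Entropic OT: min <C, pi> - eps * H(pi) over couplings of (p, q)."""
    K = np.exp(-(C - C.min()) / eps)   # shifting C does not change the argmin
    u, v = np.ones_like(p), np.ones_like(q)
    for _ in range(n_iter):
        v = q / (K.T @ u)
        u = p / (K @ v)
    return u[:, None] * K * v[None, :]

def entropic_gw(D_X, D_Y, p, q, eps=0.1, n_outer=50):
    """Entropic GW (square loss): alternate cost linearization at the
    current coupling with a Sinkhorn solve of the linearized problem."""
    pi = np.outer(p, q)                       # product coupling as init
    c0 = (D_X**2 @ p)[:, None] + (D_Y**2 @ q)[None, :]
    for _ in range(n_outer):
        C = c0 - 2.0 * D_X @ pi @ D_Y         # linearized GW cost at pi
        pi = sinkhorn(C, p, q, eps)
    return pi
```

As the text notes, in practice one would run this from several random initializations and keep the coupling with the lowest GW objective.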

3. Theoretical Advances and Relaxations

GWOT admits several theoretically motivated relaxations and extensions:

  • Semidefinite Programming (SDP) and Sum-of-Squares (SOS) Hierarchies: The quadratic GW objective over the coupling polytope is naturally lifted via moment–SOS relaxations, leading to a sequence of tractable SDPs converging to the GW optimum. The first level matches classical metric relaxations, while higher levels guarantee convergence with explicit rates, and each level defines a genuine pseudo-metric satisfying symmetry and the triangle inequality (Tran et al., 13 Feb 2025, Tran et al., 20 Apr 2025, Chen et al., 2023).
  • Variational Scalings and Robustness: The CGW (Conic GW) metric is robust to total variation perturbations and interpolates between balanced GW and unbalanced OT with provable convergence and scaling properties (Oliver et al., 14 Aug 2025).
  • Linearized GW (LGW): GW can be linearized in the tangent space at a reference space, enabling rapid pairwise computations in large collections of mm-spaces (Beier et al., 2022).
  • Monge-Knothe and Subspace Detours: GW can be efficiently approximated by constructing optimal plans in strategically chosen subspaces and lifting them back to the full space, often yielding remarkably accurate matchings for high-dimensional data (Bonet et al., 2021).

4. Extensions: Barycenters, Multi-marginal, and Sliced GW

Barycenters and Fréchet Means

Given multiple mm-spaces $(X_i, d_i, \mu_i)$, $i = 1, \dots, N$, with weights $\lambda_i \geq 0$ summing to one, a GW barycenter is any minimizer $(X, d, \mu)$ of:

$$\sum_{i=1}^{N} \lambda_i \, \mathrm{GW}^2\big( (X, d, \mu), (X_i, d_i, \mu_i) \big)$$

Computing barycenters induces a multi-marginal GW problem over a common support, with complexity mitigated through geodesic linearization, tangential fixpoint iterations, or multi-marginal Sinkhorn solvers (Beier et al., 2024, Beier et al., 2022). Tangential iterations guarantee monotonic descent and empirically yield accurate barycentric representations and scalable barycenter computation for thousands of points (Beier et al., 2024).

Multi-marginal and Sliced GW

  • Multi-marginal GW supports simultaneous matching over multiple spaces and is extensible via entropic and sliced approaches, achieving tractable higher-order alignments (Beier et al., 2022).
  • Sliced GW leverages 1D projections to obtain closed-form or accelerated solutions for special cases, particularly in 1D or for uniform grids (Zhang et al., 2024).
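In the 1D setting exploited by sliced GW, a common heuristic is to sort both samples and compare only the monotone and anti-monotone alignments. The sketch below (helper names are ours) evaluates both candidate permutation couplings for uniform 1D point sets and returns the cheaper one; it is a heuristic for general costs, not a guaranteed global optimum:

```python
import numpy as np

def gw_cost_perm(x, y, sigma):
    """GW cost of the permutation coupling i -> sigma[i] for two
    uniform 1D point sets of equal size n (weights 1/n each)."""
    n = len(x)
    Dx = np.abs(x[:, None] - x[None, :])
    Dy = np.abs(y[:, None] - y[None, :])
    return np.sum((Dx - Dy[np.ix_(sigma, sigma)])**2) / n**2

def gw_1d_heuristic(x, y):
    """Sort both samples, then compare monotone vs anti-monotone matching."""
    xs, ys = np.sort(x), np.sort(y)
    identity = np.arange(len(xs))
    reverse = identity[::-1]
    costs = {tuple(s): gw_cost_perm(xs, ys, s) for s in (identity, reverse)}
    best = min(costs, key=costs.get)
    return np.array(best), costs[best]

# A reflected copy of a point set is isometric: the anti-monotone
# alignment recovers it at zero cost.
perm, cost = gw_1d_heuristic(np.array([0.0, 1.0, 3.0]),
                             np.array([5.0, 4.0, 2.0]))
```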

Fused Gromov–Wasserstein

FGW interpolates between alignment by features (classical OT) and alignment by structures (GW) via a trade-off weight $\alpha \in [0, 1]$, admits metric and geodesic properties, and is practically effective for mesh, graph, and time series analysis (Vayer et al., 2018).
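For a fixed coupling, the FGW objective is a convex combination of a linear feature-cost term and the quadratic GW structure term. A brief NumPy sketch (the symbol $\alpha$ for the trade-off weight and the helper name are our notation):

```python
import numpy as np

def fgw_cost(M, D_X, D_Y, pi, alpha):
    """Fused GW objective for a coupling pi:
    (1 - alpha) * <M, pi>  +  alpha * (GW quadratic term),
    where M[i, j] is the feature-space cost between x_i and y_j."""
    feat = np.sum(M * pi)                           # linear OT term
    p, q = pi.sum(axis=1), pi.sum(axis=0)           # marginals of pi
    struct = ((D_X**2 @ p) @ p + (D_Y**2 @ q) @ q
              - 2.0 * np.sum((D_X @ pi @ D_Y) * pi))
    return (1 - alpha) * feat + alpha * struct
```

At $\alpha = 0$ this reduces to the classical OT cost of the coupling; at $\alpha = 1$ it reduces to the pure GW objective.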

5. Applications and Empirical Validation

GWOT and its variants underpin state-of-the-art algorithms in a variety of domains:

  • Semantic matching, keypoint correspondence, and shape analysis: Incorporation of GW as a spatial consistency prior in computer vision yields improved semantic correspondences, outperforming ground costs based on features alone and with orders-of-magnitude faster runtime than diffusion-based approaches (Snelgar et al., 3 Feb 2026).
  • Barycenter-based shape interpolation and multi-graph matching: Tangential fixpoint and multi-marginal formulations yield plausible barycenters and superior matching rates in graph-based tasks (Beier et al., 2024).
  • Graph and point cloud alignment: GWOT provides robust structural matching for graphs, including under the quadratic assignment paradigm (Seyedi et al., 4 Sep 2025).
  • Cross-modal and perturbation-response alignment: Extension to labeled GW promotes biologically informed many-to-many mapping in high-throughput biology (Ryu et al., 2024, Oliver et al., 14 Aug 2025).
  • Data-driven optimal control: Embedding closed-form Gaussian GW as a target distribution cost in density steering yields rotation-invariant and shape-aware optimal control policies (Nakashima et al., 8 Aug 2025).

Empirical regimes span hundreds of thousands of points (CNT-EGW (Houry et al., 6 Feb 2026)), dense images, real-world molecular data, and biomedical applications. Quadratic to near-linear memory and computational costs have been reported for large-scale tasks under suitable structural assumptions.

6. Continuous GWOT and Neural Approaches

While discrete GWOT is mature, continuous GWOT remains theoretically and computationally challenging:

  • Sample-based neural GW: Parameterizes the GW map via neural nets and recasts GWOT as a minimax saddle problem, allowing stochastic gradient-based scalability and possibly out-of-sample generalization (Carrasco et al., 2023).
  • Neural entropic GW: Recovers the GW cost and coupling at minimax-optimal parametric rates, using neural dual variables and unrolling Sinkhorn updates, verified with non-asymptotic error guarantees (Wang et al., 2023).
  • Challenges: No known solver exists with guaranteed global convergence or satisfactory performance for general costs, though closed-form solutions exist for special cases (e.g., Gaussian-to-Gaussian inner-product GW) (Carrasco et al., 2023, Wang et al., 2023). Practical applications rely on discrete, finite-support approximations and minibatch alternatives.

7. Practical Recommendations and Limitations

  • Initialization and Regularization: Entropic GW variants smooth nonconvexity but require careful tuning; multi-initialization substantially improves reliability (Seyedi et al., 4 Sep 2025).
  • Complexity and Scalability: Dynamic programming and feature-space lifting break cubic bottlenecks for regular structures or CNT-type costs (Zhang et al., 2024, Houry et al., 6 Feb 2026).
  • SOS and SDP Relaxations: Provide certifiable lower bounds and sometimes global optima for small instances, but scalability remains limited to small problem sizes (Tran et al., 13 Feb 2025, Chen et al., 2023).
  • Robustness and Extensions: Conic GW and unbalanced formulations enhance robustness to mass-mismatched and outlier-rich settings (Oliver et al., 14 Aug 2025).
  • Open Problems: Reliable solvers for continuous GW, rigorous convergence guarantees for neural and minimax approaches, and understanding when convex relaxations are tight in practice remain active areas of research (Tran et al., 20 Apr 2025, Carrasco et al., 2023).

Summary Table: GWOT Algorithmic Complexity

| Method | Per-iteration complexity | Memory | Scalability | Reference |
| --- | --- | --- | --- | --- |
| Classic Entropic GW | cubic in n | quadratic in n | moderate n | (Zhang et al., 2024) |
| Fast Gradient Comp. (FGC) | reduced via dynamic programming (1D, grids) | quadratic in n | large n | (Zhang et al., 2024) |
| Feature-lifted CNT GW | quadratic in n (low feature dimension) | quadratic in n | hundreds of thousands of points (CNT costs) | (Houry et al., 6 Feb 2026) |
| SOS / SDP Hierarchy | one SDP solve per level | high | small n | (Tran et al., 13 Feb 2025; Chen et al., 2023) |
| Neural GW (mini-batch) | linear (batch size, epochs) | model size | high-dimensional, sample-based | (Wang et al., 2023; Carrasco et al., 2023) |
