Riemannian Optimal Transport

Updated 17 April 2026
  • Riemannian Optimal Transport is a framework that generalizes classical OT by incorporating Riemannian geometry to tackle non-Euclidean spaces and structured data.
  • It builds on geometric foundations such as the Benamou–Brenier formulation and coupling matrix manifolds, and employs algorithms such as Riemannian gradient descent and trust-region methods.
  • The approach enables practical applications in machine learning, domain adaptation, and transport of matrix-valued data, offering both theoretical insights and efficient computation.

Riemannian Optimal Transport (OT) is the study and computation of optimal transport problems where the underlying search spaces—such as the set of transport plans, measures, or ground metrics—inherit and exploit a Riemannian manifold structure. This approach generalizes classical OT, particularly for non-Euclidean domains, non-linear costs, structured data (matrix-valued measures), and learning applications. The Riemannian viewpoint endows OT with formal geometric and algorithmic structures that enable both theoretical analysis and efficient computation in domains with intricate or constrained geometry.

1. Geometric Foundations of Riemannian OT

Classical OT seeks the most efficient way to transport mass from a source to a target probability distribution, often under a linear cost. The formalization of OT as a metric geometry on the space of measures is achieved by the Benamou–Brenier dynamical formulation, which endows the space of probability measures $\mathcal{P}_2(M)$ on a Riemannian manifold $(M, g)$ with a (formal) infinite-dimensional Riemannian structure. Specifically, the squared Wasserstein-2 distance is given by

$$W_2^2(\mu_0, \mu_1) = \inf_{\substack{\partial_t \mu_t + \nabla\cdot(\mu_t v_t) = 0 \\ \mu_{t=0} = \mu_0,\ \mu_{t=1} = \mu_1}} \int_0^1 \int_M \frac{1}{2}\,\|v_t(x)\|^2 \, d\mu_t(x)\, dt.$$

The tangent space at a measure $\mu$ consists of vector fields $v$ with zero mean under $\mu$, and the induced metric is $\int_M g_x(v(x), v(x))\, d\mu(x)$ (Sarrazin et al., 2023).
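Concretely, in the standard (formal) Otto-calculus picture (a well-known identification, stated here as background rather than drawn from the cited papers), the minimal-norm velocity field realizing a curve $t \mapsto \mu_t$ is a potential gradient:

$$\partial_t \mu_t = -\nabla\cdot(\mu_t \nabla\phi_t), \qquad \|\partial_t \mu_t\|_{\mu_t}^2 = \int_M g_x\big(\nabla\phi_t(x), \nabla\phi_t(x)\big)\, d\mu_t(x),$$

and Wasserstein geodesics correspond to potentials solving the Hamilton–Jacobi equation $\partial_t \phi_t + \tfrac{1}{2}\|\nabla\phi_t\|^2 = 0$.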

On discrete domains, such as meshes, this structure is preserved by suitable discretization, yielding a finite-dimensional Riemannian metric on the simplex of mesh-supported distributions (Lavenant et al., 2018).

For discrete OT problems (with discrete marginals $\mu_1 \in \mathbb{R}^m_{++}$, $\mu_2 \in \mathbb{R}^n_{++}$), the search space is the interior of the transportation polytope,

$$\mathcal{M}(\mu_1, \mu_2) \triangleq \{ X \in \mathbb{R}_+^{m\times n} : X\mathbf{1}_n = \mu_1,\ X^\top\mathbf{1}_m = \mu_2,\ X_{ij} > 0 \},$$

which forms a smooth open submanifold of $\mathbb{R}^{m\times n}$ of dimension $(m-1)(n-1)$ and is endowed with a Fisher metric (Mishra et al., 2021, Shi et al., 2019).
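To see this space concretely, the independent coupling $\mu_1 \mu_2^\top$ always lies in the interior and satisfies both marginal constraints. A minimal NumPy sketch (the variable names are illustrative, not from the cited papers):

```python
import numpy as np

m, n = 4, 6
rng = np.random.default_rng(0)
mu1 = rng.dirichlet(np.ones(m))          # source marginal (strictly positive)
mu2 = rng.dirichlet(np.ones(n))          # target marginal (strictly positive)

X = np.outer(mu1, mu2)                   # independent coupling, all entries > 0

assert np.allclose(X.sum(axis=1), mu1)   # row sums match mu1
assert np.allclose(X.sum(axis=0), mu2)   # column sums match mu2
# mn entries minus (m + n - 1) independent linear constraints:
print("manifold dimension:", (m - 1) * (n - 1))
```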

2. Manifolds of Coupling Matrices and Structured OT

A major theme is the systematic treatment of the set of admissible transport plans as a Riemannian manifold. The coupling matrix manifold (CMM) (Shi et al., 2019) is defined as

$$\mathcal{M}(\mu_1, \mu_2) = \{ X \in \mathbb{R}^{m\times n} : X_{ij} > 0,\ X\mathbf{1}_n = \mu_1,\ X^\top\mathbf{1}_m = \mu_2 \}$$

for given marginals $\mu_1 \in \mathbb{R}^m_{++}$ and $\mu_2 \in \mathbb{R}^n_{++}$. This is a manifold of dimension $(m-1)(n-1)$. The Fisher–Rao metric,

$$g_X(\xi, \eta) = \sum_{i=1}^m \sum_{j=1}^n \frac{\xi_{ij}\,\eta_{ij}}{X_{ij}},$$

is used throughout practical algorithms as it makes the manifold complete and penalizes approaches to the boundary where entries $X_{ij}$ vanish. The tangent space consists of matrices with vanishing row and column sums.
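To make the tangent structure concrete: double-centering any matrix (subtracting row and column means and adding back the grand mean) produces vanishing row and column sums, hence a valid tangent direction; the Fisher–Rao inner product then weights entries by $1/X_{ij}$. A hedged NumPy sketch (function names are illustrative; note that double-centering is the Euclidean projection onto the tangent space, while the metric-projected gradient uses the Fisher–Rao metric instead):

```python
import numpy as np

def tangent_project(Z):
    # Double-centering: every row sum and column sum of the result is zero.
    return (Z - Z.mean(axis=1, keepdims=True)
              - Z.mean(axis=0, keepdims=True) + Z.mean())

def fisher_inner(X, xi, eta):
    # Fisher-Rao inner product on the coupling matrix manifold at X.
    return np.sum(xi * eta / X)

rng = np.random.default_rng(1)
X = rng.dirichlet(np.ones(12)).reshape(3, 4)       # a positive coupling matrix
xi = tangent_project(rng.standard_normal((3, 4)))  # a tangent vector at X
assert np.allclose(xi.sum(axis=1), 0) and np.allclose(xi.sum(axis=0), 0)
print("squared Fisher-Rao norm:", fisher_inner(X, xi, xi))
```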

Generalizations handle couplings with structured entries, notably the block SPD coupling manifold for matrix-valued marginals (Han et al., 2022). If the marginals are tuples of blocks $P_i, Q_j \in \mathbb{S}_{++}^d$ (the manifold of $d \times d$ symmetric positive definite matrices) and the coupling $\Gamma = (\Gamma_{ij})$ with each block $\Gamma_{ij} \in \mathbb{S}_{++}^d$ satisfies

$$\sum_{j=1}^n \Gamma_{ij} = P_i \quad (i = 1, \dots, m), \qquad \sum_{i=1}^m \Gamma_{ij} = Q_j \quad (j = 1, \dots, n),$$

then the set of such $\Gamma$ forms a smooth submanifold of the product of SPD manifolds under the affine-invariant metric. Projection and retraction operations, as well as Riemannian gradients, are expressed in terms of SPD matrix operations.
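For intuition, a feasible block coupling can be built by spreading each source block across columns with scalar weights; this is only a toy feasible point under assumed notation, not the optimization scheme of Han et al. (2022):

```python
import numpy as np

def random_spd(d, rng):
    A = rng.standard_normal((d, d))
    return A @ A.T + d * np.eye(d)               # well-conditioned SPD block

rng = np.random.default_rng(2)
m, n, d = 3, 4, 2
P = [random_spd(d, rng) for _ in range(m)]       # SPD-valued source marginal
w = rng.dirichlet(np.ones(n))                    # scalar column weights, sum to 1

# Product-style coupling: Gamma[i][j] = w_j * P_i, each block SPD.
Gamma = [[w[j] * P[i] for j in range(n)] for i in range(m)]
Q = [w[j] * sum(P) for j in range(n)]            # induced SPD target marginal

assert all(np.allclose(sum(Gamma[i]), P[i]) for i in range(m))             # row blocks
assert all(np.allclose(sum(G[j] for G in Gamma), Q[j]) for j in range(n))  # column blocks
```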

3. Riemannian Optimization Algorithms for OT

Riemannian OT exploits manifold optimization techniques that respect the geometry of the search space:

  • Riemannian Gradient Descent (RGD): For the manifold $\mathcal{M}(\mu_1, \mu_2)$ of transport plans, the update is

$$X_{k+1} = R_{X_k}\big(-\eta_k\, \operatorname{grad} f(X_k)\big),$$

where $R_{X_k}$ is a retraction (e.g., geometric scaling followed by Sinkhorn normalization), $\eta_k$ is a step size, and $\operatorname{grad} f(X_k)$ is the metric-projected gradient (Mishra et al., 2021, Shi et al., 2019); a minimal sketch follows this list.

  • Riemannian Trust-Region Methods (RTR): These use both the Riemannian gradient and the Hessian (via metric-projected directional derivatives) to define a local quadratic model and solve a constrained subproblem in the tangent space (Mishra et al., 2021).
  • Retraction and Vector Transport: Retraction maps tangent vectors back to the manifold while preserving local geometry. For coupling matrices, retraction may use the multiplicative exponential structure, followed by Sinkhorn projection (Shi et al., 2019). For block SPD couplings, the exponential map on $\mathbb{S}_{++}^d$ and suitable SPD balancing are used (Han et al., 2022).
  • Complexity and Convergence: Per-iteration costs typically scale with the $mn$ entries of the coupling for scalar problems, with additional cubic-in-$d$ SPD block operations in the block SPD case. Convergence rates mirror classical manifold optimization: sublinear for first-order methods, locally superlinear for trust-region methods under standard regularity (Mishra et al., 2021, Shi et al., 2019).
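As referenced in the RGD item above, the following NumPy sketch instantiates RGD for a linear cost $\langle C, X\rangle$, using a multiplicative (geometric-scaling) step with Sinkhorn normalization as the retraction. It is a minimal illustration under those assumptions, not the reference implementation of the cited papers; `sinkhorn_project` and `rgd_ot` are illustrative names.

```python
import numpy as np

def sinkhorn_project(X, mu1, mu2, n_iters=200):
    # Retract onto the coupling manifold: alternately rescale rows and
    # columns until X has (approximately) the prescribed marginals.
    for _ in range(n_iters):
        X = X * (mu1 / X.sum(axis=1))[:, None]
        X = X * (mu2 / X.sum(axis=0))[None, :]
    return X

def rgd_ot(C, mu1, mu2, step=0.1, n_steps=500):
    # Riemannian gradient descent for f(X) = <C, X> on M(mu1, mu2).
    X = np.outer(mu1, mu2)                  # interior starting point
    for _ in range(n_steps):
        egrad = C                           # Euclidean gradient of <C, X>
        # Multiplicative step (geometric scaling), then Sinkhorn retraction.
        X = sinkhorn_project(X * np.exp(-step * egrad), mu1, mu2)
    return X

# Toy usage: transport between two random marginals under a random cost.
rng = np.random.default_rng(3)
mu1, mu2 = rng.dirichlet(np.ones(5)), rng.dirichlet(np.ones(7))
C = rng.random((5, 7))
X = rgd_ot(C, mu1, mu2)
print("transport cost:", np.sum(C * X))
```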

4. Extensions: Nonlinear, Unbalanced, and Constrained OT on Manifolds

Riemannian OT frameworks generalize seamlessly to a variety of settings:

  • Nonlinear OT Costs: The objective $f(X)$ may be non-linear in the plan, as in robust OT, Gromov–Wasserstein distances, co-optimal transport, or regularized models. These cases fit directly into the manifold optimization paradigm by defining a smooth objective on the coupling manifold (Mishra et al., 2021).
  • Unbalanced OT: By relaxing the marginal constraints, for example via KL-divergence penalties on the row/column sums, the feasible set becomes the open positive orthant $\mathbb{R}_{++}^{m\times n}$. The same Riemannian recipe (with a tailored metric) applies (Mishra et al., 2021); a minimal sketch follows this list. For continuous measures, the Wasserstein–Fisher–Rao (WFR) metric provides a cone geometry, unifying balanced and unbalanced OT under the Riemannian framework (Bauer et al., 2024, Sarrazin et al., 2023).
  • Path Constraints and Constrained Geometries: Path-constrained OT imposes additional restrictions such as support constraints, prescribed moments, or fixed intermediate marginals on the interpolating curve of measures. The resulting metric space of constrained measures inherits a formal Riemannian structure, with geodesics given by energy-minimizing flows subject to these constraints (Bauer et al., 2024).
  • High-dimensional and Matrix-valued Measures: The block SPD OT setting treats distributions of SPD matrices; the Riemannian framework allows robust optimization and barycenter computations in machine learning and vision (Han et al., 2022).
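As referenced in the unbalanced item above, here is a hedged NumPy sketch of RGD on the open positive orthant: under the Fisher metric, gradient steps become multiplicative updates, and KL penalties replace the hard marginal constraints. The objective and names are illustrative assumptions, not the exact formulation of the cited papers.

```python
import numpy as np

def rgd_unbalanced(C, mu1, mu2, rho=1.0, step=0.05, n_steps=500):
    # Minimize <C, X> + rho*KL(X 1 || mu1) + rho*KL(X^T 1 || mu2)
    # over the open positive orthant (no hard marginal constraints).
    X = np.outer(mu1, mu2)
    for _ in range(n_steps):
        r, c = X.sum(axis=1), X.sum(axis=0)
        # Euclidean gradient of the KL-relaxed objective.
        egrad = C + rho * (np.log(r / mu1)[:, None] + np.log(c / mu2)[None, :])
        X = X * np.exp(-step * egrad)       # multiplicative (Fisher-metric) step
    return X

rng = np.random.default_rng(4)
mu1 = rng.dirichlet(np.ones(5))
mu2 = 2.0 * rng.dirichlet(np.ones(7))       # marginals with unequal total mass
X = rgd_unbalanced(rng.random((5, 7)), mu1, mu2)
print("max row-sum deviation:", np.abs(X.sum(axis=1) - mu1).max())
```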

5. Learning, Sliced Approximations, and Computational Advances

Riemannian OT has spawned algorithmic and theoretical developments for scalable, data-driven learning:

  • Learning Riemannian Ground Metrics: When the ground cost is parameterized (e.g., Mahalanobis distances via a learnable SPD matrix), the metric structure of the set of ground metrics (the SPD manifold under the affine-invariant metric) is exploited. Joint OT/metric learning alternates Riemannian updates of the metric with Sinkhorn-based OT updates, typically admitting closed-form expressions for the metric update via geometric means (Jawanpuria et al., 2024); a minimal sketch follows this list.
  • Neural Riemannian OT Maps: Riemannian Neural OT constructs continuous neural parameterizations of $c$-concave transport potentials on arbitrary manifolds, thus systematically avoiding the exponential parameter blow-up suffered by discretization-based schemes in high dimensions (the curse of dimensionality). Approximation theorems guarantee sub-exponential complexity in dimension, with the core architecture respecting the manifold structure via exponential/logarithm maps (Micheli et al., 3 Feb 2026).
  • Sliced-Wasserstein Distances on Manifolds: On Cartan–Hadamard manifolds (complete, simply-connected, non-positively curved), the sliced-Wasserstein distance is defined intrinsically by projecting measures along geodesics and integrating 1D Wasserstein distances. This construction admits efficient computation schemes and gradient flows that respect the underlying Riemannian geometry (Bonet et al., 2024).
  • Discrete and Linearized OT: For triangulated surfaces or high-dimensional problems, dynamical and linearized approximations furnish fast, robust computations. These include structure-preserving discretizations that retain all metric and geometric properties of continuous Riemannian OT on meshes (Lavenant et al., 2018, Sarrazin et al., 2023).
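As referenced in the metric-learning item above, a minimal sketch of the inner loop: computing a Mahalanobis ground cost for a candidate SPD matrix $M$, which would then be alternated with Sinkhorn-based OT updates and a Riemannian update of $M$. The function name and alternation outline are illustrative assumptions, not the exact algorithm of Jawanpuria et al. (2024).

```python
import numpy as np

def mahalanobis_cost(X, Y, M):
    # C[i, j] = (x_i - y_j)^T M (x_i - y_j) for a learnable SPD matrix M.
    diff = X[:, None, :] - Y[None, :, :]     # (m, n, d) pairwise differences
    return np.einsum('ijk,kl,ijl->ij', diff, M, diff)

rng = np.random.default_rng(5)
X, Y = rng.standard_normal((5, 3)), rng.standard_normal((7, 3))
M = np.eye(3)                                # start from the Euclidean metric
C = mahalanobis_cost(X, Y, M)
# Feed C to a Sinkhorn OT solver, then update M on the SPD manifold; repeat.
print(C.shape)
```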

6. Implementation, Applications, and Empirical Insights

Practical Riemannian OT algorithms have been implemented in modular libraries for Python and MATLAB (e.g., MOT (Mishra et al., 2021)), exposing generic manifold objects with methods for projection, retraction, Riemannian gradient, and Hessian computation.

Applications and empirical results include:

  • Domain Adaptation and Learning: OT with Riemannian metric-learning improves transfer and classification performance by jointly optimizing the transportation plan and the underlying geometry (Jawanpuria et al., 2024).
  • SPD-valued Data: Block SPD OT enables better matching and averaging of functional descriptors, shape data, and tensor-valued images (Han et al., 2022).
  • Flows and Gradient Interpolation: Riemannian OT yields smooth displacement interpolations, conformal flows, and non-diffusive probability transport on curved or discrete domains (Lavenant et al., 2018, Bonet et al., 2024).
  • Scalable OT on Manifolds: Neural Riemannian OT permits out-of-sample map evaluation and asymptotically avoids the curse of dimensionality seen in discretizations (Micheli et al., 3 Feb 2026).
  • Path-constrained and Unbalanced Models: Population evolution, transport with obstacles, and surface-area–constrained flows are addressed via affine submanifold geometry within the cone-metric picture (Bauer et al., 2024).

These tools are employed in machine learning tasks involving structured data, manifold-valued signals, geometry-aware generative models, and domains in image analysis, vision, and shape correspondence.

7. Summary and Outlook

Riemannian Optimal Transport generalizes OT to domains and models where the geometry of the probability space, cost function, or ground metric is curved, structured, or otherwise non-Euclidean. The Riemannian approach provides both theoretical insight—by revealing the geodesic structure, curvature, and metric geometry of transport problems—and practical benefits—by enabling efficient, convergent algorithms for large-scale, structured OT and learning tasks. The unifying geometric framework allows seamless transitions among balanced, unbalanced, nonlinear, sparse, or matrix-valued OT models and paves the way for further extensions to product manifolds, higher-order tensor spaces, and dynamics on infinite-dimensional quotients (Mishra et al., 2021, Shi et al., 2019, Bauer et al., 2024, Han et al., 2022, Jawanpuria et al., 2024).
