Optimal Transport Theory

Updated 8 December 2025

Optimal Transport Theory is a rigorous framework that quantifies the cost-efficient reallocation of measures via Monge and Kantorovich formulations.
Duality theory underpins key concepts like c-transforms and Wasserstein distances, providing essential metrics and geodesic interpretations.
Efficient computational methods such as the Sinkhorn algorithm and Newton-type solvers enable practical applications in machine learning, geometry, and control.

Optimal transport (OT) theory provides a rigorous mathematical framework to quantify the cost and structure of transporting measures (mass, probability distributions, geometric shapes, etc.) in a manner that minimizes a prescribed cost function. OT underlies a vast range of applications in mathematics, computational science, data science, economics, control, and physics, while offering deep connections to convex analysis, geometry, and the theory of partial differential equations (Levy et al., 2017).

1. Fundamental Problems: Monge and Kantorovich Formulations

The foundational objects are two measures μ, ν on spaces X, Y (often probability measures on compact or Polish spaces), and a prescribed cost function $c:X \times Y \to [0, \infty]$. The goal is to rearrange μ into ν by transporting mass at the least total cost among all admissible transport mechanisms.

Monge’s Problem

Monge (1781) posed the problem as finding a measurable map $T:X \to Y$ such that $T_\#\mu = \nu$ (push-forward of μ by T equals ν), seeking

$\inf_{T: T_\#\mu = \nu} \int_X c(x, T(x)) \, d\mu(x),$

where each $x \in X$ maps to a unique target $T(x)$, so mass is never split. In general, existence is delicate: if μ is atomic and ν is not, or if c is irregular, there may be no solution (Levy et al., 2017, Vandegriffe, 2020).
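A standard small example of non-existence, with the specific points chosen here purely for illustration: take $\mu = \delta_0$ and $\nu = \tfrac12(\delta_{-1} + \delta_{1})$ on $\mathbb{R}$. For any measurable map $T$,

$T_\#\delta_0 = \delta_{T(0)} \neq \tfrac12(\delta_{-1} + \delta_{1}),$

so the admissible set $\{T : T_\#\mu = \nu\}$ is empty: a single atom cannot be sent to two distinct targets without splitting its mass.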

Kantorovich’s Relaxation

Kantorovich (1942) removed this obstruction by optimizing over couplings (joint measures) $\pi$ on $X \times Y$ with prescribed marginals μ and ν, so that mass leaving a point may be split among several targets:

$\inf_{\pi \in \Pi(\mu, \nu)} \int_{X \times Y} c(x, y) \, d\pi(x, y),$

where $\Pi(\mu, \nu) = \{\pi \geq 0 : (\mathrm{P}_X)_\#\pi = \mu,\, (\mathrm{P}_Y)_\#\pi = \nu\}$ and $\mathrm{P}_X$, $\mathrm{P}_Y$ denote the coordinate projections (Levy et al., 2017, Vandegriffe, 2020, Santambrogio, 2010). This is a convex problem over measures, and the minimum is attained under mild tightness and lower-semicontinuity hypotheses.

The Kantorovich program is linear in π and thus admits powerful duality theory and guarantees of minimizer existence via the direct method in the calculus of variations (Vandegriffe, 2020, Gover, 23 Jan 2025).
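For finitely supported measures the Kantorovich problem becomes an ordinary linear program over an n-by-m matrix of transported masses. The following is a minimal sketch, assuming SciPy's `linprog` with the HiGHS backend; the function name `kantorovich_lp` and the toy data are illustrative choices, not taken from the cited references.

```python
import numpy as np
from scipy.optimize import linprog

def kantorovich_lp(a, b, C):
    """Discrete Kantorovich problem: min <C, P> s.t. P 1 = a, P^T 1 = b, P >= 0."""
    n, m = C.shape
    # P is flattened row-major, so P[i, j] sits at index i * m + j.
    A_rows = np.kron(np.eye(n), np.ones((1, m)))   # sum_j P[i, j] = a[i]
    A_cols = np.kron(np.ones((1, n)), np.eye(m))   # sum_i P[i, j] = b[j]
    res = linprog(
        C.ravel(),
        A_eq=np.vstack([A_rows, A_cols]),
        b_eq=np.concatenate([a, b]),
        bounds=(0, None),
        method="highs",
    )
    if not res.success:
        raise RuntimeError(res.message)
    return res.x.reshape(n, m), res.fun

# Toy data (illustrative): three equal atoms transported to two equal atoms.
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.5, 1.5])
a = np.full(3, 1.0 / 3.0)
b = np.full(2, 0.5)
C = (x[:, None] - y[None, :]) ** 2   # squared-distance cost

P, cost = kantorovich_lp(a, b, C)
print(P)
print(cost)
```

In this toy instance no Monge map exists (three atoms of mass 1/3 cannot be mapped onto two atoms of mass 1/2), and the optimal plan splits the mass of the middle atom between the two targets.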

2. Duality Theory and Wasserstein Distances

Formulating and analyzing OT problems relies crucially on convex duality.

Kantorovich Duality

For lower-semicontinuous costs (and suitable integrability), the dual problem is

$\sup_{\substack{\phi \in L^1(\mu) \\ \psi \in L^1(\nu)}} \left[ \int_X \phi \, d\mu + \int_Y \psi \, d\nu \right] \quad\text{s.t.}\quad \phi(x) + \psi(y) \leq c(x, y)\;\forall x, y,$

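where, under suitable regularity (for example, a continuous cost on compact spaces), the supremum is attained by a pair of potentials related via the c-transform, e.g. $\phi = \psi^c$ with $\psi^c(x) = \inf_{y} \left[ c(x, y) - \psi(y) \right]$, and carrying a c-concave structure (Levy et al., 2017, Santambrogio, 2010).

Strong duality holds (there is no duality gap), and every optimal plan π is concentrated on the contact set $\{(x, y) : \phi(x) + \psi(y) = c(x, y)\}$.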
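In the discrete setting, the dual is again a linear program, and the c-transform is a row-wise minimum. The sketch below solves the dual on a small toy instance, assuming SciPy's `linprog`. The data, the variable stacking, and the normalisation that pins the first potential to zero are all choices made here for illustration.

```python
import numpy as np
from scipy.optimize import linprog

# Toy data (illustrative): three equal atoms and two equal atoms on the line.
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.5, 1.5])
a = np.full(3, 1.0 / 3.0)
b = np.full(2, 0.5)
C = (x[:, None] - y[None, :]) ** 2
n, m = C.shape

# Dual LP: maximize a.phi + b.psi subject to phi_i + psi_j <= C_ij.
# Variables are stacked as z = (phi_1..phi_n, psi_1..psi_m).
A_ub = np.zeros((n * m, n + m))
for i in range(n):
    for j in range(m):
        A_ub[i * m + j, i] = 1.0
        A_ub[i * m + j, n + j] = 1.0

# Potentials are defined only up to (phi + t, psi - t); pin phi_1 = 0.
bounds = [(0.0, 0.0)] + [(None, None)] * (n + m - 1)

# linprog minimizes, so the objective is negated.
res = linprog(-np.concatenate([a, b]), A_ub=A_ub, b_ub=C.ravel(),
              bounds=bounds, method="highs")
phi, psi = res.x[:n], res.x[n:]
dual_value = -res.fun

# c-transform of psi: psi^c(x_i) = min_j [C_ij - psi_j].
phi_c = np.min(C - psi[None, :], axis=1)

# Contact set: pairs (i, j) where the dual constraint is tight.
contact = np.isclose(phi[:, None] + psi[None, :], C, atol=1e-8)
print(dual_value, phi, psi, phi_c)
print(contact)
```

For this instance the optimal primal cost is 0.25 (the plan moves all mass along pairs at squared distance 0.25), so the dual value is expected to match it, and the optimal φ coincides with the c-transform of ψ because every source atom carries positive mass.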

p-Wasserstein Distances

For $c(x, y) = \|x - y\|^p$ with $p \geq 1$, the p-th root of the optimal transport cost defines the p-Wasserstein distance:

$W_p(\mu, \nu) = \left( \inf_{\pi \in \Pi(\mu, \nu)} \int \|x - y\|^p \, d\pi(x, y) \right)^{1/p},$

which is a metric on the space of probability measures with finite p-th moment (Levy et al., 2017, Vandegriffe, 2020, Santambrogio, 2010).

Characteristic properties:

$W_p$ metrizes weak convergence (plus moment convergence for non-compact spaces).
$W_p$ induces geodesics: displacement interpolation between μ and ν is itself a geodesic in the metric space of measures (Santambrogio, 2010).
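On the real line, both properties can be checked directly, because for two empirical measures with the same number of equally weighted atoms the monotone (sorted) coupling is optimal for every $p \geq 1$. The sketch below relies on that one-dimensional fact; the function names and the sample data are illustrative.

```python
import numpy as np

def wasserstein_1d(x, y, p=2):
    """W_p between two empirical measures on R with equally many, equally weighted atoms."""
    xs, ys = np.sort(x), np.sort(y)          # monotone coupling: i-th smallest to i-th smallest
    return np.mean(np.abs(xs - ys) ** p) ** (1.0 / p)

def displacement_interpolation(x, y, t):
    """Atoms of the interpolant mu_t, each moving on a straight line from source to target."""
    xs, ys = np.sort(x), np.sort(y)
    return (1.0 - t) * xs + t * ys

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=500)
y = rng.normal(3.0, 0.5, size=500)

w = wasserstein_1d(x, y)
quarter = displacement_interpolation(x, y, 0.25)

# Geodesic property: W_p(mu_0, mu_t) = t * W_p(mu_0, mu_1).
print(w, wasserstein_1d(x, quarter), 0.25 * w)
```

The interpolated atoms stay sorted, so the same sorted coupling remains optimal along the path, which is why the distance scales linearly in t.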

3. Algorithmic and Computational Frameworks

Implementing OT requires discretizing measures, formulating tractable finite-dimensional versions, and exploiting convex structure or geometric properties.

Semi-Discrete and Discrete OT

For “semi-discrete” settings (a continuous source measure transported to a finite sum of Dirac masses), the dual problem reduces to a finite-dimensional concave maximization over a weight vector ψ, structurally linked to power diagrams (generalized Voronoi/Laguerre diagrams) (Levy et al., 2017). A Newton-type algorithm proceeds as follows:

Initialize ψ.
Iterate:
- Construct the power diagram (cells $C_j(\psi)$).
- Compute the cell volumes (masses under the source measure) and the gradient.
- Assemble the Hessian and solve the resulting linear system for a search direction.
- Update ψ.

This framework achieves high efficiency, especially in geometry, physics, and high-dimensional statistics.
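The structure of the iteration can be illustrated in one dimension, where Laguerre cells are intervals. The sketch below is deliberately simplified: it replaces the Newton step with plain gradient ascent on the dual, and it approximates the uniform source on [0, 1] by midpoint quadrature nodes instead of clipping exact cells. The gradient used is $\partial_{\psi_j} = \nu_j - \mu(C_j(\psi))$; the step size, grid size, and target data are assumptions chosen for this toy problem.

```python
import numpy as np

def laguerre_masses(xs, w, y, psi):
    """Source mass in each Laguerre cell C_j(psi) = {x : |x - y_j|^2 - psi_j is minimal}."""
    scores = (xs[:, None] - y[None, :]) ** 2 - psi[None, :]
    owner = np.argmin(scores, axis=1)                 # cell index of each quadrature node
    return np.bincount(owner, weights=w, minlength=len(y))

def semi_discrete_ot(xs, w, y, nu, step=0.1, n_iter=500):
    """Gradient ascent on the semi-discrete dual (a stand-in for the Newton update)."""
    psi = np.zeros(len(y))
    for _ in range(n_iter):
        grad = nu - laguerre_masses(xs, w, y, psi)    # target mass minus current cell mass
        psi += step * grad
    return psi

# Source: Uniform[0, 1], approximated by N midpoint nodes of equal weight.
N = 2000
xs = (np.arange(N) + 0.5) / N
w = np.full(N, 1.0 / N)

# Target: three Dirac masses (illustrative positions and weights).
y = np.array([0.2, 0.5, 0.8])
nu = np.array([0.2, 0.5, 0.3])

psi = semi_discrete_ot(xs, w, y, nu)
masses = laguerre_masses(xs, w, y, psi)
print(psi, masses)
```

At convergence each cell carries approximately the mass of its target atom, here the intervals [0, 0.2], [0.2, 0.7] and [0.7, 1]. A Newton solver would reach the same weights in far fewer iterations by using second-order information.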

Entropic Regularization and Sinkhorn Algorithm

Adding an entropy penalty $-\varepsilon H(\pi)$ yields a strictly convex objective with a unique, smooth solution:

$\min_{\pi \in \Pi(\mu, \nu)} \int c(x, y) \, d\pi(x, y) + \varepsilon \int \log \left( \frac{d\pi}{d(\mu \otimes \nu)} \right) d\pi,$

enabling rapid approximation via Sinkhorn scaling, that is, iterative proportional fitting in the dual variables (Galichon, 2021). Entropic regularization plays a crucial role in large-scale OT, or when differentiability is desired for optimization or machine learning.
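A minimal sketch of Sinkhorn scaling for discrete histograms follows. In the discrete case the relative entropy against the product measure differs from the plain negative entropy only by a constant on the set of couplings, so the usual scaling with the Gibbs kernel $K = e^{-C/\varepsilon}$ gives the same minimizer. The function name, the data, and the value of ε are illustrative, and no log-domain stabilisation is included.

```python
import numpy as np

def sinkhorn(a, b, C, eps, n_iter=2000):
    """Entropic OT between histograms a and b with cost matrix C (basic, unstabilised)."""
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)                   # rescale rows to match a
        v = b / (K.T @ u)                 # rescale columns to match b
    return u[:, None] * K * v[None, :]    # plan P = diag(u) K diag(v)

# Toy data (illustrative): histograms on two small grids in [0, 1].
x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 1.0, 4)
a = np.full(5, 0.2)
b = np.array([0.1, 0.2, 0.3, 0.4])
C = (x[:, None] - y[None, :]) ** 2

P = sinkhorn(a, b, C, eps=0.2)
print(P.sum(axis=1), P.sum(axis=0))
print(np.sum(P * C))
```

Smaller values of ε bring the plan closer to an unregularized optimum but slow convergence and can underflow the kernel, which is the usual motivation for log-domain variants.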

4. Modern Generalizations and Specializations

Constrained and Moment-Constrained OT

Recent developments enable more flexible transport plans with elementwise constraints (blocked mass, supervised OT), moment constraints, or unbalanced masses.

Supervised OT (sOT): Applies when the marginals may have unequal total mass and some entries of the plan are prohibited (the corresponding entries of the cost matrix C are infinite). sOT is equivalent to an $L_1$-penalized problem, admitting fast Sinkhorn-style solvers via KL-Bregman Dykstra projection (Cang et al., 2022).

Moment-constrained OT: Only the first marginal is fixed; the second must satisfy prescribed generalized moment conditions (set intersections with affine spaces). The primal and dual theory adjusts accordingly, and entropic regularization again yields Sinkhorn-type scalable algorithms (Corre et al., 2022).

OT in Control, Differential Equations, and Physics

OT is tightly coupled to control and time-dependent applications. For instance, ensemble controllability (matching output distributions of a dynamical system) is directly reformulated as a sequence of finite-horizon OT linear programs, with costs arising from minimum-energy (LQG or LTV) criteria (Hadadi, 17 Dec 2024).

Kinetic OT: Generalizes W₂ geometry by penalizing squared acceleration instead of velocity, built upon cubic spline interpolation in phase space and leading to a new hypo-Riemannian geometry for probability distributions on cotangent bundles, with applications to kinetic theory and optimal steering (Brigati et al., 21 Feb 2025).

Distorted and Relative OT

“Distorted OT” replaces the expected cost by a nonlinear expectation (distortion function), with applications in decision theory and risk management, yielding new classes of extremal couplings (comonotonic, counter-monotonic, and two-phase) (Liu et al., 2023). “Relative OT” considers measures modulo a mass reservoir, producing a theory of transport with mass inflow/outflow at distinguished subsets, along with relative versions of KR-norms and duality (Bubenik et al., 8 Nov 2024).

5. Applications Across Mathematical and Applied Fields

OT provides a foundational structure for problems in:

Machine Learning: Domain adaptation, generative modeling (Wasserstein GANs), barycenter estimation, and robustification of learning through distributional uncertainty.
Geometry Processing: Mesh remeshing, blue-noise sampling, shape matching, and field interpolation (using displacement interpolation along OT geodesics) (Levy et al., 2017).
Economics: Market equilibrium with transferable utility, random coefficient models (e.g., BLP), optimal allocation with gross substitutes structure, and gravity models in trade (Galichon, 2021).
Control Theory: Mean-field and ensemble control, especially with output-only observations (reconstruction via OT), and optimal steering under PDE-induced constraints (Hadadi, 17 Dec 2024).
Physics (General Relativity): A recent bridge shows Einstein’s equations are equivalent, in local coordinates, to convexity/concavity properties of the relative entropy along Lorentz-Wasserstein OT geodesics (Mondino et al., 2018).

6. Theoretical and Computational Advances

Area	Conceptual Advance	Computational Method
Duality	c-transform, c-concavity	Newton, Sinkhorn, Dykstra
Semi-discrete OT	Power/Laguerre diagrams	Mesh clipping + Newton solver
Entropic regularization	Fast approximate OT	Sinkhorn algorithm
Blocked/Constrained	supervised OT	Generalized Sinkhorn, Dykstra
Moment constraints	One-sided/relaxed OT	Block coordinate ascent
Vector measures	Duality for vector-OT	Functional-analytic LP

Duality and convexity provide geometric and computational structure throughout, with efficient algorithms available for large-scale instances or nonstandard settings. Modern OT increasingly leverages advances in computational geometry, large-scale convex optimization, and probabilistic programming.

7. Summary and Outlook

Optimal transport theory synthesizes probabilistic, geometric, and analytic tools to model the reallocation of mass in cost-minimizing ways, bridging theory and computation. The classical Monge–Kantorovich–Wasserstein framework is extended and adapted to tackle constrained transport, moment optimization, vector-valued mass flows, and non-Euclidean geometries. Across fields, OT reveals structure in fundamental problems—allocation, matching, interpolation, dynamics, and inference—yielding deep theoretical results and practical, scalable algorithms (Levy et al., 2017, Galichon, 2021, Gover, 23 Jan 2025, Hadadi, 17 Dec 2024). Continued development of duality theory, geometric algorithms, and application-specific computational techniques defines the current and future frontiers of optimal transport research.