Optimal Transport: Theory & Applications
- Optimal Transport is a mathematical framework focused on minimizing cost when reallocating mass between probability measures, emphasizing geometric and analytic structure.
- Its formulation leverages Monge’s maps and Kantorovich's duality to define metrics like the Wasserstein distance, with extensive applications in PDEs, machine learning, and economics.
- Numerical methods including linear programming, entropic regularization, and Sinkhorn algorithms enable efficient computation in high-dimensional and discrete settings.
Optimal transport (OT) is the mathematical theory concerned with the least-cost reallocation of mass between two probability measures, typically formulated on geometric domains or abstract measurable spaces. OT formalizes the coupling of probability distributions via the minimization of transport cost functionals, thereby providing a geometric and metric structure to the space of probability measures. The theory underpins a diverse array of applications, ranging from mathematical analysis, PDEs, and geometry to machine learning, image processing, economics, and quantum information.
1. Mathematical Foundations: Monge, Kantorovich, and Wasserstein Structure
Let $(X, \mu)$ and $(Y, \nu)$ be probability spaces, and $c \colon X \times Y \to [0, \infty]$ a Borel-measurable cost. The classical Monge problem seeks a measurable map $T \colon X \to Y$ pushing $\mu$ onto $\nu$ (i.e., $T_{\#}\mu = \nu$) that minimizes the transport cost $\int_X c(x, T(x))\,d\mu(x)$. Except under restrictive geometric and regularity conditions (e.g., atomlessness of $\mu$ and convexity/strict monotonicity of $c$), a minimizer may not exist (McCann, 2012, Levy et al., 2017, Solomon, 2018).
Kantorovich's relaxation broadens the admissible class to all couplings $\pi \in \Pi(\mu, \nu)$, the set of probability measures on $X \times Y$ with marginals $\mu$ and $\nu$, minimizing
$$\inf_{\pi \in \Pi(\mu,\nu)} \int_{X \times Y} c(x, y)\, d\pi(x, y).$$
Duality theory reveals deep structural properties: strong duality holds under measurability and lower semicontinuity of $c$, and the optimal dual potentials $(\varphi, \psi)$, constrained by $\varphi(x) + \psi(y) \le c(x, y)$, satisfy
$$\int_X \varphi\, d\mu + \int_Y \psi\, d\nu \;=\; \min_{\pi \in \Pi(\mu,\nu)} \int_{X \times Y} c\, d\pi$$
(Vandegriffe, 2020, Solomon, 2018, McCann, 2012, Levy et al., 2017).
If $X = Y$ and $c(x, y) = d(x, y)^p$ for a metric $d$ and $p \ge 1$, the $p$-Wasserstein distance is defined as
$$W_p(\mu, \nu) = \left( \inf_{\pi \in \Pi(\mu,\nu)} \int d(x, y)^p\, d\pi(x, y) \right)^{1/p}.$$
This distance metrizes weak convergence together with convergence of $p$-th moments and provides the canonical metric geometry on the space of probability measures (Vandegriffe, 2020, McCann, 2012, Solomon, 2018).
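In one dimension, the optimal coupling for $c = d^p$ with $p \ge 1$ is the monotone (quantile) matching, which gives a closed form for $W_p$ between equal-size empirical measures. A minimal sketch (function name ours):

```python
import numpy as np

def wasserstein_1d(x, y, p=2):
    """p-Wasserstein distance between two equal-size empirical measures on R.

    In 1-D the optimal coupling is the monotone (sorted) matching, so
    W_p^p = (1/n) * sum_i |x_(i) - y_(i)|^p over order statistics.
    """
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    assert x.shape == y.shape, "equal-weight empirical measures of equal size"
    return np.mean(np.abs(x - y) ** p) ** (1.0 / p)

# Translating every atom by c shifts the measure, so W_p = |c| for any p.
x = np.array([0.0, 1.0, 2.0])
print(wasserstein_1d(x, x + 3.0, p=2))  # 3.0
```

The sorted matching is exactly the coupling induced by the quantile functions, which is optimal for any convex ground cost on the line.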
2. Regularity, Structure, and Duality: Geometry of Optimal Plans
Optimal couplings are characterized by geometric and measure-theoretic properties of their support and associated dual potentials. The “cross-difference” $\delta\big((x,y),(x',y')\big) = c(x,y') + c(x',y) - c(x,y) - c(x',y')$ encodes $c$-monotonicity: for an optimal $\pi$, any two points $(x,y), (x',y')$ in its support satisfy $\delta \ge 0$ (McCann, 2012). On smooth manifolds with differentiable cost, dimension and regularity of optimal couplings are controlled by the Hessian of the cross-difference—its signature constrains the support dimension, and Ma-Trudinger-Wang curvature controls regularity. Under twist and nondegeneracy conditions, optimal plans concentrate on graphs of maps, leading to uniqueness and further regularity of Monge solutions (McCann, 2012).
In the quadratic cost case ($c(x,y) = \tfrac{1}{2}|x-y|^2$ on $\mathbb{R}^n$), Brenier's theorem asserts that if $\mu$ is absolutely continuous, the optimal map is the gradient of a convex potential, $T = \nabla\varphi$, and $\varphi$ solves the Monge-Ampère equation
$$\det D^2\varphi(x) = \frac{f(x)}{g(\nabla\varphi(x))},$$
where $f$ and $g$ denote the densities of $\mu$ and $\nu$,
providing a direct link between OT and fully nonlinear elliptic PDEs (Lindsey et al., 2016).
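On the line, Brenier's map reduces to the monotone rearrangement $T = F_\nu^{-1} \circ F_\mu$: monotone maps on $\mathbb{R}$ are exactly the gradients (derivatives) of convex functions. A sketch for equal-size empirical measures (function name ours):

```python
import numpy as np

def monge_map_1d(x_atoms, y_atoms):
    """Monge map between equal-size empirical measures on R, quadratic cost.

    Brenier's map in 1-D is the monotone rearrangement T = F_nu^{-1} o F_mu:
    the atom ranked k-th among the x's is sent to the k-th smallest y.
    """
    order = np.argsort(x_atoms)
    ys = np.sort(np.asarray(y_atoms, float))
    T = np.empty_like(ys)
    T[order] = ys  # the k-th ranked x-atom receives the k-th smallest y
    return T

x = np.array([2.0, 0.0, 1.0])
y = np.array([10.0, 30.0, 20.0])
T = monge_map_1d(x, y)
print(T)  # [30. 10. 20.] : x=0 -> 10, x=1 -> 20, x=2 -> 30
```

The returned array gives the image of each input atom; monotonicity (larger $x$ goes to larger $y$) is what makes this the unique optimal map here.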
3. Algorithmic and Numerical Methods
Discrete and Semidiscrete Formulations
Finite-dimensional analogues reduce OT to linear programming: $\min_{P \in U(a,b)} \langle C, P \rangle$, where $U(a,b) = \{P \ge 0 : P\mathbf{1} = a,\ P^{\top}\mathbf{1} = b\}$ and $C$ is the pairwise cost matrix (Solomon, 2018). For discrete-to-continuous (semidiscrete) problems, Newton-type algorithms exploit power-diagram (Laguerre-cell) geometry, maximizing a concave objective whose gradient and Hessian are expressed in terms of cell measures, enabling efficient solution in practice (Levy et al., 2017).
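The LP reduction can be written out directly with an off-the-shelf solver; a sketch using SciPy (dedicated network-simplex solvers are much faster in practice, and the function name is ours):

```python
import numpy as np
from scipy.optimize import linprog

def ot_lp(a, b, C):
    """Discrete Kantorovich problem  min <C, P>  s.t.  P1 = a, P^T 1 = b, P >= 0,
    solved as a generic linear program (illustrative sketch)."""
    n, m = C.shape
    # Equality constraints: row sums of P equal a, column sums equal b.
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0   # row-sum constraint for row i
    for j in range(m):
        A_eq[n + j, j::m] = 1.0            # column-sum constraint for column j
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return res.x.reshape(n, m), res.fun

a = np.array([0.5, 0.5])
b = np.array([0.5, 0.5])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
P, cost = ot_lp(a, b, C)
print(cost)  # 0.0: the identity coupling is optimal
```

The constraint system is rank-deficient by one (total mass is counted twice), which modern LP solvers such as HiGHS handle without trouble.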
Entropic Regularization and Sinkhorn Algorithms
Computational challenges in large-scale and high-dimensional settings are addressed via entropic regularization,
$$\min_{P \in U(a,b)} \langle C, P \rangle + \varepsilon H(P), \qquad H(P) = \sum_{ij} P_{ij}\big(\log P_{ij} - 1\big),$$
where $H$ is the negative entropy. The resulting problem admits a unique strictly positive solution, efficiently computable via Sinkhorn-Knopp matrix scaling: $P = \operatorname{diag}(u)\, K\, \operatorname{diag}(v)$ with $K = e^{-C/\varepsilon}$, with alternating updates of $u$ and $v$ to enforce the prescribed marginals. Convergence is geometric in the Hilbert metric, and per-iteration cost is $O(n^2)$ (Tupitsa et al., 2022, Solomon, 2018).
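A minimal dense-matrix Sinkhorn in the scaling form just described (function name ours):

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iter=500):
    """Entropic OT via Sinkhorn-Knopp: P = diag(u) K diag(v), K = exp(-C/eps).
    Alternating updates u <- a / (K v), v <- b / (K^T u) enforce the marginals."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]
    return P, np.sum(P * C)

a = np.array([0.5, 0.5])
b = np.array([0.5, 0.5])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
P, cost = sinkhorn(a, b, C)
print(P.sum(axis=0), P.sum(axis=1))  # both marginals ~ (0.5, 0.5)
```

Each iteration costs two matrix-vector products, the $O(n^2)$ per-iteration cost noted above; smaller `eps` tightens the approximation but slows convergence and risks underflow in `K`.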
Accelerated primal-dual algorithms and Nesterov smoothing further improve scaling for high-accuracy demands, with complexities such as $\widetilde{O}(n^{2.5}/\varepsilon)$ for the Kantorovich dual smoothed via log-sum-exp approximations (An et al., 2021, Tupitsa et al., 2022).
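The same log-sum-exp device also stabilizes Sinkhorn itself: carrying the dual potentials in the log domain avoids underflow of $e^{-C/\varepsilon}$ when $\varepsilon$ is small. A sketch (function name ours):

```python
import numpy as np
from scipy.special import logsumexp

def sinkhorn_log(a, b, C, eps=1e-3, n_iter=200):
    """Log-domain Sinkhorn: dual potentials f, g are updated through
    log-sum-exp, which remains stable for small eps where the naive kernel
    exp(-C/eps) would underflow to zero."""
    log_a, log_b = np.log(a), np.log(b)
    f = np.zeros_like(a)
    g = np.zeros_like(b)
    for _ in range(n_iter):
        # Enforce row marginals:    f_i = eps*(log a_i - LSE_j (g_j - C_ij)/eps)
        f = eps * (log_a - logsumexp((g[None, :] - C) / eps, axis=1))
        # Enforce column marginals: g_j = eps*(log b_j - LSE_i (f_i - C_ij)/eps)
        g = eps * (log_b - logsumexp((f[:, None] - C) / eps, axis=0))
    return np.exp((f[:, None] + g[None, :] - C) / eps)

a = np.array([0.5, 0.5])
b = np.array([0.5, 0.5])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
P = sinkhorn_log(a, b, C)
print(P.sum(axis=0))  # ~ (0.5, 0.5) even at eps = 1e-3
```

At this `eps`, the naive scaling form would divide by kernels on the order of $e^{-1000}$; the log-domain updates never leave a numerically safe range.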
Barycenters, Distributed, and Large-Scale Algorithms
Wasserstein barycenter problems, multi-marginal variants, and decentralized algorithms use iterative Bregman projections, primal-dual accelerated methods, and communication-efficient distributed schemes (Tupitsa et al., 2022). Comparative complexities for all mainstream methods are summarized below:
| Problem | Method | Arithmetic Cost |
|---|---|---|
| Classical OT | LP / simplex | $O(n^3 \log n)$ |
| Classical OT | Sinkhorn | $\widetilde{O}(n^2/\varepsilon^2)$ |
| Classical OT | Fast primal–dual | $\widetilde{O}(n^{2.5}/\varepsilon)$ |
| Entropic OT | Sinkhorn | $O(n^2)$ per iteration, geometric convergence |
| Entropic barycenter | IBP | $O(mn^2)$ per iteration ($m$ measures) |
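The barycenter scheme can be sketched with the same kernel machinery as Sinkhorn: each input histogram keeps its own scalings, and the barycenter is the weighted geometric mean of the resulting column marginals (an illustrative version of the iterative-Bregman-projection idea; function name ours):

```python
import numpy as np

def barycenter_ibp(A, C, w, eps=0.5, n_iter=300):
    """Entropic Wasserstein barycenter of the histogram columns of A (shared
    support, cost matrix C) via iterative Bregman projections.  Each input a_k
    keeps Sinkhorn scalings (u_k, v_k); the barycenter is the w-weighted
    geometric mean of the column marginals K^T u_k.  Illustrative sketch."""
    K = np.exp(-C / eps)
    U = np.ones_like(A)
    V = np.ones_like(A)
    for _ in range(n_iter):
        U = A / (K @ V)                          # enforce row marginals a_k
        KtU = K.T @ U
        b = np.prod(KtU ** w[None, :], axis=1)   # geometric mean across inputs
        V = b[:, None] / KtU                     # shared column marginal b
    return b

# Barycenter of point masses at grid positions 1 and 3 (quadratic ground cost
# on the grid 0..4): a blurred bump centered at position 2.
grid = np.arange(5.0)
C = (grid[:, None] - grid[None, :]) ** 2
A = np.zeros((5, 2))
A[1, 0] = 1.0
A[3, 1] = 1.0
b = barycenter_ibp(A, C, w=np.array([0.5, 0.5]))
print(int(np.argmax(b)))  # 2
```

The entropic term blurs the exact barycenter (which here would be a point mass at 2), a bias that vanishes as `eps` decreases.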
4. Extensions: Variants and Generalizations
Folded and Quantum Optimal Transport
Folded optimal transport extends cost functions defined on the extreme boundary of a compact convex set to the whole set via Choquet theory. The folded Kantorovich cost minimizes over all representing measures, leading to the folded Wasserstein metric. When specialized to the simplex, this recovers classical OT; in the quantum setting (density matrices), it leads to a separable quantum Wasserstein distance, unifying classical and separable quantum OT (Borsoni, 1 Dec 2025).
Relative and Unbalanced Transport
Relative OT introduces a reservoir set and defines generalized Wasserstein distances allowing comparison of unbalanced measures by incurring a cost for transferring mass to the reservoir. The associated Kantorovich-Rubinstein norm and Wasserstein metrics are extended, along with duality and existence theorems, to accommodate these reservoir effects (Bubenik et al., 8 Nov 2024).
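One common computational realization of a reservoir is the dummy-point trick: append a reservoir point to each side that absorbs or releases surplus mass at a fixed per-unit cost, then solve the balanced problem. The sketch below is an illustration of that idea, not the exact construction of Bubenik et al. (function name and cost parameter `tau` are ours):

```python
import numpy as np
from scipy.optimize import linprog

def reservoir_ot(a, b, C, tau):
    """Transport between nonnegative vectors a, b of possibly different total
    mass: a reservoir point on each side absorbs/releases surplus at cost tau
    per unit, and the extended balanced problem is solved as an LP."""
    n, m = C.shape
    Ce = np.full((n + 1, m + 1), float(tau))  # to/from the reservoir costs tau
    Ce[:n, :m] = C
    Ce[n, m] = 0.0                            # reservoir-to-reservoir is free
    ae = np.append(a, b.sum())                # both extended marginals now have
    be = np.append(b, a.sum())                # total mass a.sum() + b.sum()
    N, M = n + 1, m + 1
    A_eq = np.zeros((N + M, N * M))
    for i in range(N):
        A_eq[i, i * M:(i + 1) * M] = 1.0      # row sums = ae
    for j in range(M):
        A_eq[N + j, j::M] = 1.0               # column sums = be
    res = linprog(Ce.ravel(), A_eq=A_eq, b_eq=np.concatenate([ae, be]),
                  bounds=(0, None), method="highs")
    return res.fun

print(reservoir_ot(np.array([1.0]), np.array([1.0]),
                   C=np.array([[0.3]]), tau=1.0))  # 0.3: direct transport wins
```

When the masses disagree, the surplus is parked in the reservoir at cost `tau` per unit instead of making the problem infeasible.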
Structured, Constrained, and Supervised OT
Constrained versions, including capacity-limited, moment-constrained, and supervised OT, impose application-specific structural or marginal constraints, sometimes expressed via linear inequalities, entropy penalties, or indicator functions. These include the structured “Latent OT” for robustness (anchor-based), moment-constrained OT for mean-field control (with Lagrange multiplier-based Gibbs kernels), supervised OT for elementwise constraints (blocking prohibited mass transfers), and related applications (Lin et al., 2020, Corre et al., 2022, Cang et al., 2022, Kerrache et al., 2022).
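Elementwise prohibitions of the supervised-OT kind are easy to realize in the entropic setting: zeroing forbidden kernel entries forces the scaled plan to place no mass there. A sketch in that spirit (function name ours; assumes a feasible plan avoiding the forbidden set exists):

```python
import numpy as np

def masked_sinkhorn(a, b, C, forbid, eps=0.1, n_iter=300):
    """Sinkhorn with elementwise prohibitions: forbidden pairs (i, j) get
    kernel entry K_ij = 0, so the plan diag(u) K diag(v) is zero there."""
    K = np.exp(-C / eps)
    K[forbid] = 0.0           # blocked transfers carry no mass
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

a = np.array([0.5, 0.5])
b = np.array([0.5, 0.5])
C = np.zeros((2, 2))
forbid = np.array([[True, False], [False, True]])  # block the diagonal
P = masked_sinkhorn(a, b, C, forbid)
print(P)  # all mass on the anti-diagonal
```

If the prohibitions leave no feasible coupling, the scaling updates divide by zero, so feasibility must be checked upstream in a real implementation.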
Quadratic-Form OT and Beyond
Quadratic-form OT (QOT) replaces the linear objective by a quadratic functional over couplings, yielding new mathematical structures. In the discrete case, QOT reduces to the quadratic assignment problem and admits explicit optimizers (e.g., comonotone, antimonotone, or diamond transport) depending on the cost structure. Applications include variance minimization, Kendall’s tau optimization, and Gromov–Wasserstein metrics (Wang et al., 8 Jan 2025).
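In the permutation regime, the reduction to quadratic assignment can be checked directly by brute force. The cost tensor below is our own construction (a Kendall-tau-style concordance reward), chosen so that the comonotone coupling is optimal:

```python
import itertools
import numpy as np

def qot_brute_force(Q, n):
    """Discrete quadratic-form OT restricted to permutation couplings (the
    quadratic assignment regime): minimize sum_{i,k} Q[i, s(i), k, s(k)] over
    permutations s by exhaustive search.  Illustrative; exponential in n."""
    best_val, best_perm = np.inf, None
    for perm in itertools.permutations(range(n)):
        val = sum(Q[i, perm[i], k, perm[k]]
                  for i in range(n) for k in range(n))
        if val < best_val:
            best_val, best_perm = val, perm
    return best_perm, best_val

n = 3
I = np.arange(n)
# Q[i, j, k, l] = -(i - k)(j - l): concordant pairs are rewarded, so the
# comonotone (identity) assignment should minimize the objective.
Q = -((I[:, None, None, None] - I[None, None, :, None]) *
      (I[None, :, None, None] - I[None, None, None, :])).astype(float)
perm, val = qot_brute_force(Q, n)
print(perm)  # (0, 1, 2): comonotone transport
```

Reversing the sign of `Q` makes the antimonotone permutation optimal instead, mirroring the comonotone/antimonotone dichotomy described above.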
5. Applications and Theoretical Impact
OT has become a cornerstone in fields such as computational geometry, statistical machine learning, computer vision, and economics. In partial differential equations and geometric measure theory, the connection to Monge-Ampère equations and displacement interpolation illuminates deep structural properties (McCann, 2012, Lindsey et al., 2016). In economics, OT frameworks model matching markets, quantile regression, discrete choice models, and trade gravity equations, translating microfoundations into convex optimization over distributions (Galichon, 2021).
In machine learning and data science, OT underlies distributional alignment, domain adaptation, generative modeling, and adversarial regularization, with numerical schemes featuring prominently in large-scale implementations. Neural OT, Meta OT, and graph-based/dynamically perturbed OT further extend applicability to high-dimensional, temporally evolving, and meta-learning settings (Korotin et al., 2022, Amos et al., 2022, Grover et al., 2016).
6. Open Problems and Research Directions
Despite comprehensive theory and practice, active research directions include:
- Combinatorial and structure-preserving discretizations of PDE-based OT formulations (Solomon, 2018).
- Scalability and theoretical guarantees for unbalanced, multi-marginal, or Gromov–Wasserstein OT (Solomon, 2018, Wang et al., 8 Jan 2025).
- Extension of quantum and tensor-valued OT to matrix and operator-valued couplings (Borsoni, 1 Dec 2025, Solomon, 2018).
- Capacity-constrained and congestion-aware models for practical applications (Solomon, 2018, Cang et al., 2022).
- Unified frameworks for integrating machine learning and optimal transport with amortized, neural, or anchor-based architectures (Korotin et al., 2022, Amos et al., 2022, Lin et al., 2020).
- Analytical and computational exploration of QOT, including NP-hard special cases and diamond/X-transport structures (Wang et al., 8 Jan 2025).
These developments continue to expand the breadth and power of optimal transport methodologies across mathematical sciences and applications.