Optimal Transport: Theory & Applications

Updated 15 December 2025
  • Optimal Transport is a mathematical framework focused on minimizing cost when reallocating mass between probability measures, emphasizing geometric and analytic structure.
  • Its formulation builds on Monge maps and Kantorovich duality to define metrics such as the Wasserstein distance, with extensive applications in PDEs, machine learning, and economics.
  • Numerical methods including linear programming, entropic regularization, and Sinkhorn algorithms enable efficient computation in high-dimensional and discrete settings.

Optimal transport (OT) is the mathematical theory concerned with the least-cost reallocation of mass between two probability measures, typically formulated on geometric domains or abstract measurable spaces. OT formalizes the coupling of probability distributions via the minimization of transport cost functionals, thereby providing a geometric and metric structure to the space of probability measures. The theory underpins a diverse array of applications, ranging from mathematical analysis, PDEs, and geometry to machine learning, image processing, economics, and quantum information.

1. Mathematical Foundations: Monge, Kantorovich, and Wasserstein Structure

Let $(X, \mu)$ and $(Y, \nu)$ be probability spaces, and let $c : X \times Y \rightarrow [0, \infty)$ be a Borel-measurable cost. The classical Monge problem seeks a measurable map $T : X \rightarrow Y$ pushing $\mu$ onto $\nu$ (i.e., $T_\# \mu = \nu$) that minimizes the transport cost $\int_X c(x, T(x))\,d\mu(x)$. Except under restrictive geometric and regularity conditions (e.g., atomlessness of $\mu$ and convexity/strict monotonicity of $c$), a minimizer may not exist; for instance, no measurable map can split a single Dirac mass between two distinct target points (McCann, 2012, Levy et al., 2017, Solomon, 2018).

Kantorovich's relaxation broadens the admissible class to all couplings $\pi \in \Pi(\mu, \nu)$, the set of probability measures on $X \times Y$ with marginals $\mu$ and $\nu$, minimizing

$$\min_{\pi \in \Pi(\mu, \nu)} \int_{X \times Y} c(x, y)\,d\pi(x, y).$$

Duality theory reveals deep structural properties: strong duality holds under measurability and lower semicontinuity of $c$, and the dual problem over potentials $(\varphi, \psi)$ is

$$\sup_{\varphi\in L^1(\mu),\,\psi\in L^1(\nu)} \left\{ \int_X \varphi\,d\mu + \int_Y \psi\,d\nu : \varphi(x) + \psi(y) \leq c(x, y) \right\}$$

(Vandegriffe, 2020, Solomon, 2018, McCann, 2012, Levy et al., 2017).

If $X = Y = \mathbb{R}^d$ and $c(x, y) = \|x - y\|^p$ with $p \geq 1$, the $p$-Wasserstein distance is defined as

$$W_p(\mu, \nu) = \left( \inf_{\pi \in \Pi(\mu, \nu)} \int \|x - y\|^p\,d\pi(x, y) \right)^{1/p}.$$

This distance metrizes weak convergence together with convergence of $p$-th moments and provides the canonical metric geometry on the space of probability measures (Vandegriffe, 2020, McCann, 2012, Solomon, 2018).
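
On the real line with cost $|x - y|^p$, the optimal coupling of two empirical measures with equally many uniformly weighted atoms is the monotone (sorted) pairing, so computing $W_p$ reduces to sorting. A minimal NumPy sketch under that assumption; the Gaussian samples and sizes are illustrative only:

```python
import numpy as np

def wasserstein_p_1d(x, y, p=2):
    """p-Wasserstein distance between two 1-D empirical measures with the
    same number of equally weighted atoms: sort both samples and average
    |x_(i) - y_(i)|^p over the monotone (quantile) coupling."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    assert x.shape == y.shape, "equal sample sizes assumed for simplicity"
    return np.mean(np.abs(x - y) ** p) ** (1.0 / p)

rng = np.random.default_rng(0)
mu_samples = rng.normal(0.0, 1.0, size=1000)   # samples from mu = N(0, 1)
nu_samples = rng.normal(2.0, 1.0, size=1000)   # samples from nu = N(2, 1)
print(wasserstein_p_1d(mu_samples, nu_samples, p=2))  # close to the exact value 2
```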

2. Regularity, Structure, and Duality: Geometry of Optimal Plans

Optimal couplings are characterized by geometric and measure-theoretic properties of their support and associated dual potentials. The “cross-difference” $\delta(x, y; x_0, y_0) = c(x, y_0) + c(x_0, y) - c(x, y) - c(x_0, y_0)$ encodes $c$-monotonicity: for an optimal $\pi$, any two points in its support satisfy $\delta \geq 0$ (McCann, 2012). On smooth manifolds with differentiable cost, the dimension and regularity of optimal couplings are controlled by the Hessian of the cross-difference: its signature constrains the support dimension, and Ma–Trudinger–Wang curvature controls regularity. Under twist and nondegeneracy conditions, optimal plans concentrate on graphs of maps, leading to uniqueness and further regularity of Monge solutions (McCann, 2012).
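
For the quadratic cost on the line, the cross-difference simplifies to $\delta(x, y; x_0, y_0) = 2(x - x_0)(y - y_0)$, so $c$-monotonicity of the support is exactly monotonicity of the coupling. A minimal check on the sorted (optimal) pairing of two small hypothetical samples:

```python
import numpy as np

def cross_difference(x, y, x0, y0):
    """delta(x, y; x0, y0) for the quadratic cost c(x, y) = (x - y)^2."""
    c = lambda u, v: (u - v) ** 2
    return c(x, y0) + c(x0, y) - c(x, y) - c(x0, y0)

rng = np.random.default_rng(1)
xs = np.sort(rng.normal(size=6))        # hypothetical source atoms
ys = np.sort(rng.normal(1.0, size=6))   # hypothetical target atoms
# The sorted (monotone) pairing is the optimal plan for convex costs in 1-D.
support = list(zip(xs, ys))
for (x, y) in support:
    for (x0, y0) in support:
        # delta = 2*(x - x0)*(y - y0) >= 0 because both sequences are sorted.
        assert cross_difference(x, y, x0, y0) >= -1e-12
print("sorted pairing is c-monotone for the quadratic cost")
```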

In the quadratic cost case ($c(x, y) = \|x - y\|^2$), Brenier’s theorem asserts that if $\mu$ is absolutely continuous and $Y$ is convex, the optimal map is the gradient of a convex potential $\phi$ and solves the Monge-Ampère equation

$$\det(D^2\phi(x))\,\rho^*(\nabla\phi(x)) = \rho(x), \qquad x \in X,$$

where $\rho$ and $\rho^*$ denote the densities of $\mu$ and $\nu$, providing a direct link between OT and fully nonlinear elliptic PDEs (Lindsey et al., 2016).
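
In one dimension with quadratic cost, the Brenier map is the monotone rearrangement $T = F_\nu^{-1} \circ F_\mu$, which is affine between Gaussians, and the Monge-Ampère relation reduces to $\rho(x) = \rho^*(T(x))\,T'(x)$. A small numerical check with illustrative parameters (not taken from the cited papers):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical 1-D Gaussians: mu = N(0, 1), nu = N(2, 2^2).
m1, s1, m2, s2 = 0.0, 1.0, 2.0, 2.0
T = lambda x: m2 + (s2 / s1) * (x - m1)   # Brenier map: monotone, affine for Gaussians
dT = s2 / s1                              # T'(x), the 1-D analogue of det(D^2 phi)

x = np.linspace(-3, 3, 7)
lhs = norm.pdf(x, m1, s1)                 # rho(x), source density
rhs = norm.pdf(T(x), m2, s2) * dT         # rho*(T(x)) * T'(x)
assert np.allclose(lhs, rhs), "change-of-variables / Monge-Ampere relation holds"

# Pushforward check: T applied to samples of mu reproduces the moments of nu.
samples = np.random.default_rng(0).normal(m1, s1, 100_000)
pushed = T(samples)
print(pushed.mean(), pushed.std())        # approximately 2.0 and 2.0
```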

3. Algorithmic and Numerical Methods

Discrete and Semidiscrete Formulations

Finite-dimensional analogues reduce OT to linear programming: $\min_{T\in\mathbb{R}^{k_1\times k_2}_+} \sum_{i,j} T_{ij} c_{ij}$ subject to $T\,\mathbf{1} = v$, $T^T \mathbf{1} = w$, where $c_{ij}$ is the pairwise cost (Solomon, 2018). For discrete-to-continuous (semidiscrete) problems, Newton-type algorithms exploit power diagram (Laguerre cell) geometry, maximizing a concave objective whose gradient and Hessian are expressed in terms of cell measures, enabling efficient solution in $O(k\log k)$ time in practice (Levy et al., 2017).
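
A minimal sketch of the discrete LP above using SciPy's HiGHS backend; the marginals $v$, $w$ and the cost matrix are hypothetical placeholders:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical discrete problem: k1 = 3 source atoms, k2 = 4 target atoms.
v = np.array([0.3, 0.3, 0.4])                   # source marginal (sums to 1)
w = np.array([0.25, 0.25, 0.25, 0.25])          # target marginal (sums to 1)
c = np.abs(np.arange(3)[:, None] - np.linspace(0, 2, 4)[None, :])  # cost c_ij

k1, k2 = c.shape
# Vectorize T row-major and build the marginal constraints T 1 = v, T^T 1 = w.
A_eq = np.zeros((k1 + k2, k1 * k2))
for i in range(k1):
    A_eq[i, i * k2:(i + 1) * k2] = 1.0          # row sums
for j in range(k2):
    A_eq[k1 + j, j::k2] = 1.0                   # column sums
res = linprog(c.ravel(), A_eq=A_eq, b_eq=np.concatenate([v, w]),
              bounds=(0, None), method="highs")
T_opt = res.x.reshape(k1, k2)
print("optimal transport cost:", res.fun)
```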

Entropic Regularization and Sinkhorn Algorithms

Computational challenges in large-scale and high-dimensional settings are addressed via entropic regularization,

$$\min_{\pi \in \Pi(\mu, \nu)} \int c(x, y)\,d\pi(x, y) + \epsilon H(\pi),$$

where $H(\pi)$ is the negative entropy. The resulting problem admits a unique strictly positive solution, efficiently computable via Sinkhorn–Knopp matrix scaling, $\pi^* = \mathrm{diag}(u)\,K\,\mathrm{diag}(v)$ with $K = \exp(-C/\epsilon)$, with alternating updates of $u, v$ to enforce the prescribed marginals. Convergence is geometric in the Hilbert metric, and the per-iteration cost is $O(nm)$ (Tupitsa et al., 2022, Solomon, 2018).
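
The scaling iteration can be sketched in a few lines of NumPy; the marginals, cost, regularization $\epsilon$, and fixed iteration count below are illustrative, and log-domain stabilization is usually preferred for small $\epsilon$:

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iter=500):
    """Entropic OT via Sinkhorn-Knopp scaling: returns the coupling
    pi = diag(u) K diag(v), K = exp(-C/eps), with marginals a and b."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)       # enforce column marginals
        u = a / (K @ v)         # enforce row marginals
    return u[:, None] * K * v[None, :]

# Hypothetical data: uniform marginals over 5 and 6 points on the line.
x, y = np.linspace(0, 1, 5), np.linspace(0, 1, 6)
a, b = np.full(5, 1 / 5), np.full(6, 1 / 6)
C = (x[:, None] - y[None, :]) ** 2
pi = sinkhorn(a, b, C)
print(np.allclose(pi.sum(1), a), np.allclose(pi.sum(0), b))  # marginals (approximately) matched
print("regularized transport cost:", (pi * C).sum())
```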

Accelerated primal-dual algorithms and Nesterov smoothing further improve scaling for high-accuracy demands, with complexities such as $O(n^{5/2} \sqrt{\log n} / \epsilon)$ for the Kantorovich dual smoothed via log-sum-exp approximations (An et al., 2021, Tupitsa et al., 2022).

Barycenters, Distributed, and Large-Scale Algorithms

Wasserstein barycenter problems, multi-marginal variants, and decentralized algorithms use iterative Bregman projections, primal-dual accelerated methods, and communication-efficient distributed schemes (Tupitsa et al., 2022). Comparative complexities for all mainstream methods are summarized below:

| Problem | Method | Arithmetic Cost |
|---|---|---|
| Classical OT | LP / simplex | $\tilde O(n^3)$ |
| Classical OT | Sinkhorn | $\tilde O(n^2 \|C\|^2_\infty/\epsilon^2)$ |
| Classical OT | Fast primal–dual | $\tilde O(n^{5/2}\|C\|_\infty/\epsilon)$ |
| Entropic OT | Sinkhorn | $\tilde O(n^2\|C\|^2_\infty/\gamma)$ |
| Entropic barycenter | IBP (iterative Bregman projections) | $\tilde O(m n^2 / (\gamma \epsilon))$ |

(Tupitsa et al., 2022)
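
As a sketch of the barycenter row above, the fixed-support entropic barycenter can be computed by iterative Bregman projections (IBP); the grid, histograms, $\epsilon$, and iteration count below are illustrative choices, not taken from the cited papers:

```python
import numpy as np

def barycenter_ibp(hists, C, weights, eps=0.05, n_iter=1000):
    """Entropic Wasserstein barycenter on a fixed grid via iterative
    Bregman projections: hists is (m, n), C is (n, n), weights sum to 1."""
    K = np.exp(-C / eps)
    u = np.ones_like(hists)
    v = np.ones_like(hists)
    for _ in range(n_iter):
        u = hists / (v @ K.T)                  # project onto the input-marginal constraints
        p = np.exp(weights @ np.log(u @ K))    # weighted geometric mean across measures
        v = p[None, :] / (u @ K)               # project onto the shared barycenter marginal
    return p

# Hypothetical example: barycenter of two 1-D histograms on a common grid.
grid = np.linspace(0, 1, 60)
C = (grid[:, None] - grid[None, :]) ** 2
h1 = np.exp(-((grid - 0.25) ** 2) / 0.005); h1 /= h1.sum()
h2 = np.exp(-((grid - 0.75) ** 2) / 0.005); h2 /= h2.sum()
p = barycenter_ibp(np.stack([h1, h2]), C, weights=np.array([0.5, 0.5]))
print("barycenter mass:", p.sum(), "mean:", (grid * p).sum())  # mean near 0.5
```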

4. Extensions: Variants and Generalizations

Folded and Quantum Optimal Transport

Folded optimal transport extends cost functions defined on the extreme boundary of a compact convex set $C$ to the whole set via Choquet theory. The folded Kantorovich cost minimizes over all representing measures, leading to the folded Wasserstein metric $D_p$. When specialized to the simplex, this recovers classical OT; in the quantum setting (density matrices), it leads to a separable quantum Wasserstein distance, unifying classical and separable quantum OT (Borsoni, 1 Dec 2025).

Relative and Unbalanced Transport

Relative OT introduces a reservoir set $A$ and defines generalized Wasserstein distances allowing comparison of unbalanced measures by incurring a cost for transferring mass to $A$. The associated Kantorovich-Rubinstein norm and Wasserstein metrics are extended, along with duality and existence theorems, to accommodate these reservoir effects (Bubenik et al., 8 Nov 2024).

Structured, Constrained, and Supervised OT

Constrained versions, including capacity-limited, moment-constrained, and supervised OT, impose application-specific structural or marginal constraints, sometimes expressed via linear inequalities, entropy penalties, or indicator functions. These include the structured “Latent OT” for robustness (anchor-based), moment-constrained OT for mean-field control (with Lagrange multiplier-based Gibbs kernels), supervised OT for elementwise constraints (blocking prohibited mass transfers), and related applications (Lin et al., 2020, Corre et al., 2022, Cang et al., 2022, Kerrache et al., 2022).
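
As one concrete flavor of element-wise constraints, prohibited source-target pairs can be blocked by zeroing the corresponding entries of the Sinkhorn kernel. The sketch below is a generic illustration of this blocking idea, not the specific algorithm of any cited paper, and it assumes the allowed pattern still admits a feasible coupling:

```python
import numpy as np

def masked_sinkhorn(a, b, C, allowed, eps=0.1, n_iter=500):
    """Sinkhorn scaling with element-wise blocking: entries where
    allowed[i, j] is False get kernel value 0, so no mass may flow there.
    Assumes the allowed pattern still admits a coupling with marginals a, b."""
    K = np.exp(-C / eps) * allowed
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

# Hypothetical 4x4 problem where the transfer source 0 -> target 3 is prohibited.
n = 4
a = b = np.full(n, 1 / n)
C = (np.arange(n)[:, None] - np.arange(n)[None, :]) ** 2.0
allowed = np.ones((n, n), dtype=bool)
allowed[0, n - 1] = False
pi = masked_sinkhorn(a, b, C, allowed)
print(pi[0, n - 1])                 # exactly zero: the blocked entry carries no mass
print(pi.sum(1), pi.sum(0))         # marginals are (approximately) uniform
```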

Quadratic-Form OT and Beyond

Quadratic-form OT (QOT) replaces the linear objective by a quadratic functional over couplings, yielding new mathematical structures. In the discrete case, QOT reduces to the quadratic assignment problem and admits explicit optimizers (e.g., comonotone, antimonotone, or diamond transport) depending on the cost structure. Applications include variance minimization, Kendall’s tau optimization, and Gromov–Wasserstein metrics (Wang et al., 8 Jan 2025).

5. Applications and Theoretical Impact

OT has become a cornerstone in fields such as computational geometry, statistical machine learning, computer vision, and economics. In partial differential equations and geometric measure theory, the connection to Monge-Ampère equations and displacement interpolation illuminates deep structural properties (McCann, 2012, Lindsey et al., 2016). In economics, OT frameworks model matching markets, quantile regression, discrete choice models, and trade gravity equations, translating microfoundations into convex optimization over distributions (Galichon, 2021).

In machine learning and data science, OT underlies distributional alignment, domain adaptation, generative modeling, and adversarial regularization, with numerical schemes featuring prominently in large-scale implementations. Neural OT, Meta OT, and graph-based/dynamically perturbed OT further extend applicability to high-dimensional, temporally-evolving, and meta-learning-laden settings (Korotin et al., 2022, Amos et al., 2022, Grover et al., 2016).

6. Open Problems and Research Directions

Despite a comprehensive body of theory and practice, many directions remain the subject of active research.

These developments continue to expand the breadth and power of optimal transport methodologies across mathematical sciences and applications.
