Kantorovich Optimal Transport Overview
- Kantorovich Optimal Transport (K-OT) is a convex relaxation of the Monge transport problem that computes minimal transport cost via probabilistic couplings and dual formulations.
- K-OT is solved via linear programming, entropic regularization, and stochastic optimization, methods that scale to high-dimensional transport problems.
- Applications span machine learning, imaging, and statistical sampling, utilizing Wasserstein metrics for generative modeling, shape matching, and sequential allocation.
Kantorovich Optimal Transport (K-OT) is the foundational convex relaxation of the classical Monge transport problem. It quantifies the minimal cost of transporting mass between distributions while allowing for probabilistic couplings (or "transport plans") and admits a dual formulation central to modern computational and theoretical advancements in mathematics, statistics, machine learning, and related fields. The theory connects linear programming, convex analysis, probability, and geometry, and underpins modern scalable algorithms used in large-scale data analysis and scientific computing.
1. Mathematical Formulation
The Kantorovich formulation considers two probability measures, $\mu$ on a space $X$ and $\nu$ on $Y$, and a lower-semicontinuous cost function $c : X \times Y \to [0, +\infty]$. The space of couplings $\Pi(\mu, \nu)$ consists of all joint measures $\pi$ on $X \times Y$ with marginals $\mu$ and $\nu$. The primal K-OT problem is
$$\mathrm{OT}_c(\mu, \nu) = \inf_{\pi \in \Pi(\mu, \nu)} \int_{X \times Y} c(x, y)\, \mathrm{d}\pi(x, y).$$
In the discrete case ($X = \{x_1, \dots, x_m\}$, $Y = \{y_1, \dots, y_n\}$), with probability vectors $a \in \Delta_m$ and $b \in \Delta_n$, the feasible set is the transport polytope
$$U(a, b) = \{ P \in \mathbb{R}_{+}^{m \times n} : P \mathbf{1}_n = a,\; P^{\top} \mathbf{1}_m = b \},$$
and the problem is a linear program: $\min_{P \in U(a, b)} \langle C, P \rangle = \sum_{i,j} C_{ij} P_{ij}$. Duality takes the form
$$\mathrm{OT}_c(\mu, \nu) = \sup \Big\{ \int_X f \, \mathrm{d}\mu + \int_Y g \, \mathrm{d}\nu : f(x) + g(y) \le c(x, y) \Big\},$$
and optimal potentials $(f, g)$ (Kantorovich potentials) satisfy complementary slackness on the support of an optimal plan $\pi$ (Peyré, 10 May 2025, Moradi, 8 Jan 2025, Pistone et al., 2020).
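As a concrete sketch, the discrete primal problem can be solved directly as a linear program. The script below is an illustration using SciPy's `linprog` (not drawn from the cited works): it builds the marginal constraints of $U(a, b)$ explicitly and minimizes $\langle C, P \rangle$ over the flattened plan.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative discrete K-OT: minimize <C, P> over the transport polytope U(a, b).
rng = np.random.default_rng(0)
m, n = 3, 4
a = np.ones(m) / m                      # source marginal (uniform for simplicity)
b = np.ones(n) / n                      # target marginal
C = rng.random((m, n))                  # arbitrary cost matrix

# Equality constraints on the flattened plan: row sums equal a, column sums equal b.
A_eq = np.zeros((m + n, m * n))
for i in range(m):
    A_eq[i, i * n:(i + 1) * n] = 1.0    # sum_j P_ij = a_i
for j in range(n):
    A_eq[m + j, j::n] = 1.0             # sum_i P_ij = b_j

res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
              bounds=(0, None), method="highs")
P = res.x.reshape(m, n)
print("OT cost:", res.fun)
```

One of the $m + n$ marginal constraints is redundant (both sets sum to total mass), but the HiGHS solver handles the rank deficiency without special treatment.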
2. Metric and Geometric Properties
Kantorovich OT induces the Wasserstein-$p$ metrics
$$W_p(\mu, \nu) = \Big( \inf_{\pi \in \Pi(\mu, \nu)} \int d(x, y)^p \, \mathrm{d}\pi(x, y) \Big)^{1/p},$$
defined for $p \ge 1$, where $d$ is a metric on the ground space. $W_p$ metrizes weak convergence plus $p$-moment convergence on the space of probability measures with finite $p$-th moment. Important properties include:
- Metric structure: $W_p$ is a true metric (Peyré, 10 May 2025, Peyré et al., 2018).
- Geodesic convexity: Linear interpolations $\pi_t = (1 - t)\pi_0 + t\pi_1$ of transport plans define constant-speed geodesics for $W_2$ in the finite case and yield Wasserstein geodesics in the continuous case (Pistone et al., 2020).
- Dual representations: For $p = 1$, the Kantorovich–Rubinstein duality gives
$$W_1(\mu, \nu) = \sup_{\mathrm{Lip}(f) \le 1} \int f \, \mathrm{d}(\mu - \nu).$$
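On the real line, $W_1$ further reduces to the $L^1$ distance between CDFs, $W_1(\mu, \nu) = \int |F_\mu(t) - F_\nu(t)| \, \mathrm{d}t$. The small sketch below (grid and weights are arbitrary illustrative choices) computes this directly and cross-checks it against SciPy's built-in one-dimensional solver:

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Two discrete measures supported on a shared 1-D grid.
x = np.array([0.0, 1.0, 2.0, 3.0])
a = np.array([0.4, 0.1, 0.3, 0.2])        # weights of mu
b = np.array([0.1, 0.4, 0.2, 0.3])        # weights of nu

# W_1 via the CDF formula: integrate |F_mu - F_nu| between grid points.
Fa, Fb = np.cumsum(a), np.cumsum(b)
dx = np.diff(x)
w1_cdf = np.sum(np.abs(Fa - Fb)[:-1] * dx)

# Cross-check against SciPy's 1-D Wasserstein-1 distance.
w1_ref = wasserstein_distance(x, x, a, b)
print(w1_cdf, w1_ref)                     # both equal 0.4 here
```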
3. Algorithmic Paradigms
Linear Programming and Network Simplex
Classical discrete K-OT is a polynomial-time LP; the network simplex exploits problem structure for faster solutions for large $n$ (Moradi, 8 Jan 2025, Peyré et al., 2018). Assignment and auction algorithms apply when the marginals $a, b$ are uniform, with $O(n^3)$ worst-case complexity but efficient in practice.
Entropic Regularization and Sinkhorn Scaling
Entropic regularization adds a penalty $\varepsilon H(P)$ to the objective (where $H(P) = \sum_{i,j} P_{ij}(\log P_{ij} - 1)$ is the negative entropy), yielding a strictly convex program with unique solution
$$P^{\star}_{\varepsilon} = \mathrm{diag}(u)\, K \, \mathrm{diag}(v), \qquad K_{ij} = e^{-C_{ij}/\varepsilon},$$
found efficiently via Sinkhorn–Knopp iterations: $u \leftarrow a \oslash (K v)$, $v \leftarrow b \oslash (K^{\top} u)$ (elementwise division). Convergence is geometric; complexity is $O(mn)$ per iteration, with provable rates depending on the entropy parameter $\varepsilon$ (Moradi, 8 Jan 2025, Peyré et al., 2018).
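The scaling iterations above fit in a few lines of NumPy. The `sinkhorn` helper below is an illustrative sketch ($\varepsilon$ and the fixed iteration count are arbitrary choices, and no log-domain stabilization is included), not a production implementation:

```python
import numpy as np

def sinkhorn(C, a, b, eps=0.05, iters=2000):
    """Minimal Sinkhorn-Knopp sketch for entropic K-OT."""
    K = np.exp(-C / eps)                  # Gibbs kernel K_ij = exp(-C_ij / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)                 # scale to match column marginals
        u = a / (K @ v)                   # scale to match row marginals
    return u[:, None] * K * v[None, :]    # P = diag(u) K diag(v)

rng = np.random.default_rng(2)
m, n = 4, 5
a, b = np.ones(m) / m, np.ones(n) / n
C = rng.random((m, n))
P = sinkhorn(C, a, b)
print("entropic cost:", np.sum(P * C))
```

Each iteration is two matrix-vector products, which is why the method parallelizes so well on GPUs; for small $\varepsilon$ the kernel $K$ underflows and a log-domain variant is needed in practice.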
Primal–Dual, First-Order, and MCMC Methods
State-of-the-art large-scale approaches include stochastic mirror descent, primal-dual acceleration, coordinate descent ("Greenkhorn"), and fast smooth dual optimization via FISTA or Nesterov smoothing (An et al., 2021). For finite ground spaces, the "MCMC of table moves" approach samples the space of couplings using Markov bases from algebraic statistics, ensuring irreducibility and aperiodicity of the coupling-graph, and converges to optimal plans via simulated annealing (Pistone et al., 2020).
| Solving Paradigm | Complexity (typical) | Notable Features |
|---|---|---|
| LP/Network Simplex | $O(n^3 \log n)$ | Exact, memory-intensive, limited scalability |
| Sinkhorn (entropic) | $O(n^2)$ per iteration | Highly parallel, GPU-suited, inexact by $O(\varepsilon)$ |
| Primal–Dual/1st-order | $\widetilde{O}(n^2/\varepsilon)$ | Fast for large-scale, flexible (unbalanced, etc.) |
| MCMC table moves | Empirical convergence | Approximates faces of near-optimal plans |
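The table-moves idea can be illustrated with a toy sketch: $2 \times 2$ basic moves ($+1$ on one diagonal of a $2 \times 2$ minor, $-1$ on the other) preserve both marginals of an integer transport table, and a simulated-annealing acceptance rule biases the walk toward low-cost plans. The move set, temperature, and starting table below are hypothetical illustrations, not the exact scheme of Pistone et al.:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 3, 3
T0 = np.array([[2, 1, 0], [0, 2, 1], [1, 0, 2]])  # integer table; margins (3,3,3)
C = rng.random((m, n))                             # arbitrary cost matrix

def anneal(T, C, steps=5000, beta=25.0):
    """Random 2x2 basic moves + Metropolis acceptance at inverse temperature beta."""
    T = T.copy()
    m, n = T.shape
    for _ in range(steps):
        i1, i2 = rng.choice(m, 2, replace=False)
        j1, j2 = rng.choice(n, 2, replace=False)
        if T[i1, j2] == 0 or T[i2, j1] == 0:       # move would make an entry negative
            continue
        delta = C[i1, j1] + C[i2, j2] - C[i1, j2] - C[i2, j1]
        if delta <= 0 or rng.random() < np.exp(-beta * delta):
            T[i1, j1] += 1; T[i2, j2] += 1         # the move preserves both margins
            T[i1, j2] -= 1; T[i2, j1] -= 1
    return T

T_opt = anneal(T0, C)
print("annealed cost:", np.sum(T_opt * C))
```

These $2 \times 2$ swaps form a Markov basis for the fixed-margin integer tables, which is what makes the chain irreducible on the transport polytope's lattice points.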
4. Theoretical Foundations and Duality
Kantorovich duality admits broad generalizations, including:
- Abstract duality: In Banach lattice frameworks, all classical and constrained OT problems are unified as convex-analytic duals between primal values over convex sets of normalized positive functionals and dual cones of hedges (Ekren et al., 2016).
- Existence/uniqueness: Under lower semicontinuity and tightness, minimizers exist; uniqueness holds for strictly convex or strongly convex costs, or generic supports (Moradi, 8 Jan 2025).
- Extensions: Multi-marginal, martingale, moment-constrained, and conic generalizations fit in this duality framework.
5. Extensions: Entropic, Unbalanced, Matrix-valued, Bandit, and Spherical K-OT
K-OT has been extended to accommodate:
- Entropic OT: Strictly convexified objective enables smooth approximations and differentiable operators for machine learning modules (e.g., differentiable sorting, quantile regression) (Cuturi et al., 2019, Bercu et al., 2024).
- Unbalanced OT: Relaxed mass conservation is encoded via penalties (e.g., Csiszár $\varphi$-divergences), supporting creation/destruction of mass. Duals and dynamics generalize Benamou–Brenier flows (Chizat et al., 2015).
- Matrix-valued OT: Extends to couplings of spectral densities for matrix-valued mass, encoding "rotation" costs; underpins spectral analysis in multivariable time series (Ning et al., 2013).
- Bandit K-OT: Online/sequential variants where costs are revealed stochastically, provably reducing to infinite-dimensional linear bandits with sublinear regret (Croissant, 11 Feb 2025).
- Spherical/data-manifold K-OT: Extension to non-Euclidean settings (e.g., sphere), using harmonic expansions for efficient stochastic optimization and out-of-sample extension (Bercu et al., 2024).
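As a rough illustration of the unbalanced case, Sinkhorn-type scaling with KL marginal penalties only softens, rather than enforces, the marginal projections: each update is raised to the exponent $\rho / (\rho + \varepsilon)$. The `unbalanced_sinkhorn` helper and all parameters below are illustrative, in the spirit of the scaling algorithms of Chizat et al.:

```python
import numpy as np

def unbalanced_sinkhorn(C, a, b, eps=0.05, rho=1.0, iters=2000):
    """Sketch of Sinkhorn scaling with KL marginal penalties of weight rho."""
    K = np.exp(-C / eps)
    fe = rho / (rho + eps)                 # exponent < 1 softens the projection
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(iters):
        u = (a / (K @ v)) ** fe
        v = (b / (K.T @ u)) ** fe
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(4)
C = rng.random((4, 4))
a = np.array([0.5, 0.3, 0.1, 0.1])         # total mass 1.0
b = np.array([0.2, 0.2, 0.2, 0.6]) * 1.2   # total mass deliberately != 1
P = unbalanced_sinkhorn(C, a, b)
print("plan mass:", P.sum())               # need not match either marginal's mass
```

As $\rho \to \infty$ the exponent tends to 1 and the balanced Sinkhorn updates are recovered; small $\rho$ lets the plan create or destroy mass cheaply.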
6. Applications Across Disciplines
K-OT metrics and their computational proxies are widely applied:
- Machine Learning: Wasserstein distances power generative modeling (WGANs), domain adaptation, and representation learning. Entropic K-OT yields differentiable surrogates for statistics, sorting, and CDF computation (Peyré et al., 2018, Cuturi et al., 2019).
- Imaging and Vision: Tasks including shape matching, image retrieval, registration, color transfer, and clustering exploit the geometry of K-OT distances (Peyré et al., 2018, Snow et al., 2018).
- Stochastic Analysis and Bayesian Methods: K-OT flows underpin diffusion models, optimal matching, sequential allocation, and transport-based sampling (Croissant, 11 Feb 2025).
- Signal Processing and Time Series: Matrix-valued K-OT is studied in spectral morphing and multichannel analysis (Ning et al., 2013).
7. Emerging Directions and Computational Frontiers
Current research trends include:
- High-dimensional scalability: Fast reduction-based algorithms leverage connection to graph matching and minimum-cost flow (Moradi, 8 Jan 2025).
- Particle and min–max approaches: Particle-based min–max gradient flows provide new mechanisms for approximating transport plans with adaptive regularization (Conger et al., 23 Apr 2025).
- Decorrelated, unbalanced, and dynamic settings: Algorithms now accommodate incomplete or noisy data, mass imbalance, and dynamics (e.g., Wasserstein-Fisher-Rao, time-varying transport) (Chizat et al., 2015).
- Integration with learning systems: OT modules are increasingly embedded into large models for differentiable optimization and end-to-end learning (Cuturi et al., 2019, Bercu et al., 2024).
Despite these advances, scalability, robustness, selection of algorithmic parameters, and interpretability in high-dimensional, unbalanced, or manifold-valued settings remain active research challenges (Moradi, 8 Jan 2025, Conger et al., 23 Apr 2025, Bercu et al., 2024).