Entropic Optimal Transport Problem
- Entropic optimal transport is a regularized variant of optimal transport that incorporates a KL divergence term to ensure unique and computationally tractable solutions.
- It employs variational formulations and duality to derive Gibbs-type densities, enabling robust and efficient algorithms like the Sinkhorn method.
- EOT is widely applied in imaging, machine learning, and statistical inference, bridging theoretical insights with scalable optimization techniques.
Entropic optimal transport (EOT) is a regularized variant of the classical Monge--Kantorovich optimal transport problem, where one seeks a coupling between probability measures minimizing the expected cost of transportation, subject to marginal constraints. In EOT, the objective function is augmented with a relative entropy (Kullback--Leibler divergence) term between the candidate coupling and a fixed reference measure, typically the product of the marginals. This strictly convex regularization ensures uniqueness and computational tractability of the solution, and has become central in modern computational and statistical optimal transport. The entropic optimal transport problem, its canonical algorithms, theoretical properties, and recent advances in selection, statistical learning, and limiting behavior are surveyed below.
1. Variational Formulations and Duality
The classical Kantorovich problem seeks
$$\mathrm{OT}(\mu,\nu) = \inf_{\pi \in \Pi(\mu,\nu)} \int c(x,y)\, d\pi(x,y),$$
where $\Pi(\mu,\nu)$ is the set of couplings matching fixed marginals $\mu$ and $\nu$, and $c$ is a cost function. The entropic regularization augments this with an entropy term,
$$\mathrm{OT}_\varepsilon(\mu,\nu) = \inf_{\pi \in \Pi(\mu,\nu)} \int c\, d\pi + \varepsilon\, H(\pi \mid \mu \otimes \nu),$$
where $H(\pi \mid \mu \otimes \nu) = \int \log \frac{d\pi}{d(\mu \otimes \nu)}\, d\pi$ is the relative entropy and $\varepsilon > 0$ is the regularization parameter. The entropic optimal transport (EOT) plan $\pi_\varepsilon$ is the unique minimizer, characterized in dual form by
$$\mathrm{OT}_\varepsilon(\mu,\nu) = \sup_{f,\,g} \int f\, d\mu + \int g\, d\nu - \varepsilon \int \Big( e^{(f(x)+g(y)-c(x,y))/\varepsilon} - 1 \Big)\, d\mu(x)\, d\nu(y),$$
with the optimal plan given by the Gibbs-type density
$$\frac{d\pi_\varepsilon}{d(\mu \otimes \nu)}(x,y) = \exp\!\Big(\frac{f(x)+g(y)-c(x,y)}{\varepsilon}\Big)$$
(Ley, 4 Dec 2025, Srinivasan et al., 16 Jul 2025, Carlier et al., 2015).
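The Gibbs form follows from pointwise first-order optimality in the strictly convex primal; the following sketch uses the notation above, with $p = d\pi/d(\mu \otimes \nu)$ and Lagrange multipliers $f, g$ for the marginal constraints.

```latex
% Stationarity of the primal integrand in the density p, with
% multipliers f(x) and g(y) enforcing the two marginal constraints:
\partial_p \big[\, c\, p + \varepsilon\, p \log p - (f(x) + g(y))\, p \,\big]
  = c + \varepsilon (1 + \log p) - f - g = 0
\quad\Longrightarrow\quad
p(x,y) = \exp\!\Big( \frac{f(x) + g(y) - c(x,y)}{\varepsilon} - 1 \Big),
% and absorbing the constant factor e^{-1} into f recovers the
% Gibbs density stated above.
```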
2. Existence, Uniqueness, and Selection Principles
Strict convexity of the relative entropy ensures existence and uniqueness of $\pi_\varepsilon$ for every $\varepsilon > 0$. In the classical unregularized setting ($\varepsilon = 0$), minimizers may be highly nonunique, especially for non-strictly convex costs such as $c(x,y) = |x - y|$. The entropic penalty acts as a selection principle: as $\varepsilon \to 0$, any limit point of $(\pi_\varepsilon)$ is an optimal plan for the classical problem, and the regularization may isolate a unique minimizer of minimal entropy among all optimizers (Ley, 4 Dec 2025, Aryan et al., 22 Feb 2025, Carlier et al., 2023, Rigollet et al., 2018).
For $d = 1$ and cost $c(x,y) = |x - y|$, the selection principle is explicit: entropic OT selects the unique minimal entropy plan supported on forward-backward intervals determined by barrier points and stochastic orders (Ley, 4 Dec 2025). For strictly convex, smooth costs, the entropic plan converges to the Monge solution when that solution is unique (Carlier et al., 2015, Clason et al., 2019). For the cost $c(x,y) = |x - y|$ in $\mathbb{R}^d$, the recent resolution (Aryan et al., 22 Feb 2025) shows entropic selection yields a plan supported on transport rays, with a unique entropy-minimizing conditional coupling on each ray.
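A toy example (our construction) makes the selection concrete: when every feasible coupling is optimal for the classical problem, the entropic limit picks the plan of minimal relative entropy with respect to $\mu \otimes \nu$, namely the product coupling. A minimal numpy sketch:

```python
import numpy as np

# Toy construction: mu uniform on {0, 1}, nu uniform on {2, 3},
# cost c(x, y) = |x - y|.  Feasible couplings form the segment
#   pi(t) = [[t, 0.5 - t], [0.5 - t, t]],  t in [0, 0.5],
# and every pi(t) has transport cost exactly 2: the classical
# problem does not distinguish them.
C = np.array([[2.0, 3.0], [1.0, 2.0]])      # |x - y| on the supports
ref = np.full((2, 2), 0.25)                 # product reference mu (x) nu

def plan(t):
    return np.array([[t, 0.5 - t], [0.5 - t, t]])

def kl(P, Q):
    m = P > 0
    return float(np.sum(P[m] * np.log(P[m] / Q[m])))

ts = np.linspace(1e-6, 0.5 - 1e-6, 1001)
for eps in [1.0, 0.1, 0.01]:
    vals = [np.sum(plan(t) * C) + eps * kl(plan(t), ref) for t in ts]
    print(eps, round(ts[int(np.argmin(vals))], 3))   # -> 0.25 every time
# The entropic objective is minimized at t = 0.25, the product coupling:
# the unique plan of minimal relative entropy among the (all-optimal)
# couplings, exactly what the selection principle predicts.
```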
3. Limiting Behavior and $\Gamma$-Convergence
EOT exhibits strong $\Gamma$-convergence properties as $\varepsilon \to 0$. Suppose the cost $c$ is continuous and the marginals are compactly supported:
- If the classical OT plan is unique, $\mathrm{OT}_\varepsilon$ $\Gamma$-converges to the original transport cost, and $\pi_\varepsilon$ converges narrowly to the Monge–Kantorovich optimizer (Carlier et al., 2015, Clason et al., 2019).
- If the classical optimizer is nonunique, entropic regularization selects limit points with additional structural properties (minimum entropy, cyclic monotonicity, or $\infty$-cyclically monotone support for supremal cost) (Carlier et al., 2023, Ley, 4 Dec 2025, Aryan et al., 22 Feb 2025).
- If the cost is only lower-semicontinuous, the $\Gamma$-limit uses the essential lsc-envelope of $c$ (Brizzi et al., 7 Jan 2025).
- For multi-marginal problems, analogous -convergence characterizations and selection principles apply (Clason et al., 2019, Brizzi et al., 7 Jan 2025).
Second-order asymptotics have been computed in specialized regimes (semi-discrete, continuous-discrete), highlighting settings where the entropic bias is quadratic in $\varepsilon$, as opposed to the linear leading-order behavior in purely discrete or continuous cases (Altschuler et al., 2021).
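The $\varepsilon \to 0$ convergence can be checked numerically on a small discrete problem by comparing the entropic transport cost against the exact unregularized value, computed here via `scipy.optimize.linear_sum_assignment` (valid for uniform marginals on equally sized supports). A minimal sketch:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(1)
n = 8
x = np.sort(rng.uniform(size=n))
y = np.sort(rng.uniform(size=n))
C = (x[:, None] - y[None, :]) ** 2          # squared-distance cost
mu = nu = np.full(n, 1.0 / n)

# Exact unregularized OT: for uniform marginals on n points each,
# the optimum reduces to an assignment problem.
r, c = linear_sum_assignment(C)
ot0 = C[r, c].mean()

def entropic_cost(eps, iters=20000):
    K = np.exp(-C / eps)                    # Gibbs kernel
    u = np.ones(n)
    for _ in range(iters):
        v = nu / (K.T @ u)                  # alternate marginal scalings
        u = mu / (K @ v)
    P = u[:, None] * K * v[None, :]
    return (P * C).sum()

for eps in [0.5, 0.1, 0.02, 0.005]:
    print(f"eps={eps:g}  transport-cost gap={entropic_cost(eps) - ot0:.3e}")
# The gap between the entropic plan's transport cost and the exact OT
# value shrinks as eps -> 0, illustrating the Gamma-convergence statement.
```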
4. Structural Decomposition and Multiplicativity
Recent advances provide detailed decompositions of optimal transport plans under entropic regularization, especially for the cost $c(x,y) = |x - y|$ on the real line. The main development is a direct-sum decomposition of the set of cyclically monotone (optimal) couplings via barrier points, into forward, backward, and identity (fixed) regions. Each block corresponds to a subproblem with stochastic ordering, admitting a uniquely characterized strongly multiplicative (Kellerer-type) coupling (Ley, 4 Dec 2025). Cluster points of entropic minimizers satisfy weak multiplicativity: their restriction to order intervals factorizes into tensor products of marginals. This structure extends to atomless, arbitrary, and discrete marginals (Ley, 4 Dec 2025, Aryan et al., 22 Feb 2025).
In higher dimensions, entropic selection on transport rays induces a one-dimensional entropy minimization on each ray, with an explicit reference measure incorporating local Gaussian geometric factors (Aryan et al., 22 Feb 2025). This yields uniqueness of the limiting plan, a previously open question for $d \geq 2$.
5. Computation: Sinkhorn, Mirror Descent, and Stochastic Algorithms
EOT admits efficient computational methods, notably the Sinkhorn algorithm and its variants:
- Sinkhorn scaling alternates row and column normalizations or KL-projections for discrete marginals, producing exponentially fast convergence (Abid et al., 2018); a log-domain implementation is sketched after this list.
- Mirror descent and block-coordinate methods generalize Sinkhorn to continuous and semi-dual formulations, introducing momentum acceleration and non-asymptotic convergence rates in the iteration count (Srinivasan et al., 16 Jul 2025, Abid et al., 2018).
- For unbalanced optimal transport, domain decomposition methods solve large-scale problems by partitioning the domain and performing local Sinkhorn-like optimization, with sequential, parallel, and staggered schemes achieving near-linear scaling in grid size and empirical robustness to penalty parameters (Medina et al., 11 Oct 2024).
- Neural estimation strategies exploit parametrizations of dual potentials by neural networks and optimize via sample-based semi-dual objectives, breaking curse-of-dimensionality barriers and attaining parametric rates in both cost and plan estimation (Wang et al., 10 May 2024, Gushchin et al., 2022).
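As a concrete reference point for the list above, here is a minimal log-domain Sinkhorn sketch for discrete marginals; the function name `sinkhorn_log` and all variable names are ours, and the iteration is the standard alternating dual update rather than any specific cited implementation.

```python
import numpy as np
from scipy.special import logsumexp

def sinkhorn_log(mu, nu, C, eps, n_iter=1000, tol=1e-9):
    """Log-domain Sinkhorn for discrete EOT with reference mu (x) nu.

    Returns dual potentials (f, g) and the plan
    P_ij = exp((f_i + g_j - C_ij) / eps) * mu_i * nu_j.
    """
    log_mu, log_nu = np.log(mu), np.log(nu)
    f, g = np.zeros(len(mu)), np.zeros(len(nu))
    for _ in range(n_iter):
        # f-update: enforce the row-marginal constraint exactly
        f = -eps * logsumexp((g[None, :] - C) / eps + log_nu[None, :], axis=1)
        # g-update: enforce the column-marginal constraint exactly
        g_new = -eps * logsumexp((f[:, None] - C) / eps + log_mu[:, None], axis=0)
        if np.max(np.abs(g_new - g)) < tol:
            g = g_new
            break
        g = g_new
    P = np.exp((f[:, None] + g[None, :] - C) / eps) * mu[:, None] * nu[None, :]
    return f, g, P

# Usage: squared-distance cost between two 1D point clouds.
rng = np.random.default_rng(0)
n, m = 50, 60
x = rng.normal(size=n)
y = rng.normal(size=m) + 1.0
C = (x[:, None] - y[None, :]) ** 2
mu, nu = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
f, g, P = sinkhorn_log(mu, nu, C, eps=0.1)
print(np.abs(P.sum(axis=1) - mu).max(), np.abs(P.sum(axis=0) - nu).max())
print("entropic transport cost:", (P * C).sum())
```

The log-domain form trades the exponential kernel for log-sum-exp reductions, which keeps the iteration stable at small $\varepsilon$ where naive kernel scaling underflows.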
The choice of the regularization parameter $\varepsilon$ strongly impacts computational efficiency, bias, and statistical properties, forcing tradeoffs that ProgOT and adaptive methods seek to address (Kassraie et al., 7 Jun 2024).
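One common way to manage this tradeoff is to anneal $\varepsilon$ along a decreasing schedule while warm-starting the dual potentials at each stage; the sketch below illustrates this generic scheme (it is not the ProgOT algorithm itself, and `sinkhorn_stage` is a hypothetical helper of ours):

```python
import numpy as np
from scipy.special import logsumexp

# Hypothetical helper: a fixed number of log-domain Sinkhorn sweeps at a
# given eps, starting from the supplied dual potentials (f, g).
def sinkhorn_stage(mu, nu, C, eps, f, g, iters=300):
    log_mu, log_nu = np.log(mu), np.log(nu)
    for _ in range(iters):
        f = -eps * logsumexp((g[None, :] - C) / eps + log_nu[None, :], axis=1)
        g = -eps * logsumexp((f[:, None] - C) / eps + log_mu[:, None], axis=0)
    return f, g

rng = np.random.default_rng(2)
n = 40
x = rng.uniform(size=n)
y = rng.uniform(size=n)
C = (x[:, None] - y[None, :]) ** 2
mu = nu = np.full(n, 1.0 / n)

# Dual potentials live on the cost scale, so they transfer across eps:
# each stage warm-starts from the previous one instead of from zero.
f, g = np.zeros(n), np.zeros(n)
for eps in [1.0, 0.3, 0.1, 0.03, 0.01]:     # geometric schedule
    f, g = sinkhorn_stage(mu, nu, C, eps, f, g)
P = np.exp((f[:, None] + g[None, :] - C) / eps) * mu[:, None] * nu[None, :]
print("cost at final eps:", (P * C).sum())
```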
6. Statistical and Dynamical Interpretation
EOT admits a statistical interpretation: regularization with entropy is equivalent to maximum likelihood deconvolution for Gaussian noise, bridging the gap between computational OT and statistical inference (Rigollet et al., 2018). In Schrödinger bridge and dynamic OT, entropic regularization enforces minimal relative entropy with respect to a reference stochastic process, and admits stochastic control duality via Hamilton–Jacobi–Bellman PDEs or explicit SDE discretizations (Benamou et al., 18 Aug 2024, Choi et al., 3 Oct 2024, Gushchin et al., 2022).
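The kernel identity underlying the deconvolution equivalence, in the notation of Section 1 (the precise statement is given in Rigollet et al., 2018):

```latex
% For quadratic cost c(x,y) = |x-y|^2/2, the Gibbs kernel is a Gaussian
% density in the displacement, with variance eps:
e^{-c(x,y)/\varepsilon} = e^{-|x-y|^2/(2\varepsilon)}
  \propto \varphi_{\varepsilon I}(x - y),
\qquad
\varphi_{\varepsilon I} := \text{density of } \mathcal{N}(0, \varepsilon I),
% so eps plays the role of the noise variance sigma^2 in the Gaussian
% deconvolution model.
```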
Limiting distribution results for the entropic OT potentials provide new estimators for the score function (gradient of the log-density), achieving minimax optimal rates and characterizing the sample complexity of regularized OT-based methods (Mordant, 16 Dec 2024).
7. Applications and Extensions
Entropic optimal transport has broad impact in mathematical imaging, machine learning, computational statistics, gradient flow theory, large-scale generative modeling, and robust inference. Its computational schemes underlie scalable barycenter computation, Wasserstein gradient flows, and simulation-free Schrödinger bridges for high-dimensional generative models (Carlier et al., 2022, Choi et al., 3 Oct 2024, Gushchin et al., 2022). Domain decomposition, progressive regularization, multi-marginal extensions, bias correction via non-product reference measures, and interpretable selection principles continue to enlarge its methodological and theoretical scope (Medina et al., 11 Oct 2024, Freulon et al., 2 Jul 2025, Kassraie et al., 7 Jun 2024, Brizzi et al., 7 Jan 2025, Ley, 4 Dec 2025, Aryan et al., 22 Feb 2025).
Entropic optimal transport thus forms a robust, broadly applicable framework for regularized transport theory, combining rigorous selection mechanisms, computational efficiency, and deep statistical structure.