Entropy-Regularized Transport Cost

Updated 29 April 2026

Entropy-regularized transport cost is a convex relaxation of the classical optimal transport problem, adding an entropy term for smoothing and stability.
It refines transport plans by resolving non-uniqueness through entropic selection, yielding unique solutions and improved statistical properties.
The approach underpins efficient algorithms like Sinkhorn iterations, ensuring robust performance across discrete, continuous, and geometric settings.

The entropy-regularized transport cost is a convex relaxation of the classical optimal transport problem, incorporating a term proportional to the relative entropy of the transport plan with respect to a reference measure. This regularization, now central in computational optimal transport, smooths and stabilizes the primal optimization, yields unique solutions, improves sample and computational complexity, and enables robust and scalable algorithms across discrete, continuous, and geometric settings. In the vanishing-regularization limit, the entropic minimizer converges to a variationally selected optimal plan, resolving non-uniqueness and revealing refined geometric and statistical structures.

1. Mathematical Formulation and Dual Structure

Let $c(x, y)$ be a measurable cost on Polish spaces $X$ , $Y$ , and let $\mu \in \mathcal{P}(X)$ , $\nu \in \mathcal{P}(Y)$ . For $\varepsilon > 0$ , the entropy-regularized transport cost is

$C_\varepsilon(\mu, \nu) = \inf_{\pi \in \Pi(\mu, \nu)} \int c(x, y) \, d\pi(x, y) + \varepsilon\, \mathrm{KL}(\pi \,|\,\mu \otimes \nu),$

where $\Pi(\mu, \nu)$ is the set of couplings of $\mu$ and $\nu$ (probability measures on $X \times Y$ with marginals $\mu$ and $\nu$) and $\mathrm{KL}$ denotes the Kullback–Leibler divergence.
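To make the primal objective concrete, here is a minimal sketch for discrete marginals. It assumes that $\mu$ and $\nu$ are given by strictly positive weight vectors `a` and `b`, and that the cost is given by a matrix `C`. The function name `entropic_primal` is illustrative. The sketch evaluates the objective on two feasible couplings of a two-point example: the independent coupling $\mu \otimes \nu$, which pays no entropy penalty, and the diagonal coupling, which pays no transport cost.

```python
import numpy as np

def entropic_primal(C, P, a, b, eps):
    """Evaluate <C, P> + eps * KL(P | a ⊗ b) for a discrete coupling P.

    Assumes a and b are strictly positive probability vectors and that P
    is a coupling of them (so P and a ⊗ b both have total mass 1).
    Zero entries of P contribute nothing to the KL term (0 log 0 = 0).
    """
    ref = np.outer(a, b)            # reference measure mu ⊗ nu
    mask = P > 0
    kl = np.sum(P[mask] * np.log(P[mask] / ref[mask]))
    return float(np.sum(C * P) + eps * kl)

# Two-point example: moving mass off the diagonal costs 1.
a = np.array([0.5, 0.5])
b = np.array([0.5, 0.5])
C = np.array([[0.0, 1.0],
              [1.0, 0.0]])
eps = 0.1

independent = np.outer(a, b)        # KL = 0, transport cost = 0.5
diagonal = np.diag([0.5, 0.5])      # transport cost = 0, KL = log 2
print(entropic_primal(C, independent, a, b, eps))  # 0.5
print(entropic_primal(C, diagonal, a, b, eps))     # 0.1 * log 2, about 0.0693
```

Neither coupling minimizes the objective for this $\varepsilon$. The regularized optimum has full support and trades some transport cost against entropy.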

The corresponding strong dual problem is

$C_\varepsilon(\mu, \nu) = \sup_{f \in L^\infty(\mu),\; g \in L^\infty(\nu)} \left[ \int f \, d\mu + \int g \, d\nu - \varepsilon \int e^{(f(x) + g(y) - c(x, y))/\varepsilon} \, d\mu(x) d\nu(y) + \varepsilon \right],$

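with unique (modulo constants) dual optimizers $(f_\varepsilon, g_\varepsilon)$, the Sinkhorn potentials. They satisfy the stationarity conditions

$f_\varepsilon(x) = -\varepsilon \log \int e^{(g_\varepsilon(y) - c(x, y))/\varepsilon} \, d\nu(y), \qquad g_\varepsilon(y) = -\varepsilon \log \int e^{(f_\varepsilon(x) - c(x, y))/\varepsilon} \, d\mu(x),$

and the primal minimizer $\pi_\varepsilon$ admits the explicit kernel representation

$d\pi_\varepsilon(x, y) = e^{(f_\varepsilon(x) + g_\varepsilon(y) - c(x, y))/\varepsilon} \, d\mu(x) \, d\nu(y).$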
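The stationarity conditions suggest a fixed-point scheme: update $f$ from $g$, then $g$ from $f$. The sketch below runs this alternation in the log domain for discrete marginals. Several things are assumptions for illustration: weight vectors `a` and `b`, a cost matrix `C`, a fixed iteration count in place of a stopping rule, $\varepsilon = 0.25$, and all function names. The sketch builds the plan from the kernel representation and checks numerically that the primal and dual objectives coincide.

```python
import numpy as np

def logsumexp(M, axis):
    """Stable log(sum(exp(M))) along one axis."""
    m = M.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(M - m).sum(axis=axis))

def sinkhorn_potentials(C, a, b, eps, n_iter=2000):
    """Alternately enforce the two stationarity conditions in log space."""
    log_a, log_b = np.log(a), np.log(b)
    f = np.zeros(len(a))
    g = np.zeros(len(b))
    for _ in range(n_iter):
        # f(x_i) = -eps * log sum_j b_j exp((g_j - C_ij) / eps)
        f = -eps * logsumexp(log_b[None, :] + (g[None, :] - C) / eps, axis=1)
        # g(y_j) = -eps * log sum_i a_i exp((f_i - C_ij) / eps)
        g = -eps * logsumexp(log_a[:, None] + (f[:, None] - C) / eps, axis=0)
    return f, g

def plan(C, a, b, f, g, eps):
    """Kernel representation: P_ij = a_i b_j exp((f_i + g_j - C_ij) / eps)."""
    return np.outer(a, b) * np.exp((f[:, None] + g[None, :] - C) / eps)

def dual_objective(C, a, b, f, g, eps):
    """<f, a> + <g, b> - eps * (total mass of the kernel plan) + eps."""
    return float(f @ a + g @ b - eps * plan(C, a, b, f, g, eps).sum() + eps)

def primal_objective(C, a, b, P, eps):
    """<C, P> + eps * KL(P | a ⊗ b)."""
    return float(np.sum(C * P) + eps * np.sum(P * np.log(P / np.outer(a, b))))

rng = np.random.default_rng(0)
x, y = rng.random(5), rng.random(6)
C = (x[:, None] - y[None, :]) ** 2        # quadratic cost on [0, 1]
a, b = np.full(5, 1 / 5), np.full(6, 1 / 6)
eps = 0.25

f, g = sinkhorn_potentials(C, a, b, eps)
P = plan(C, a, b, f, g, eps)
print(P.sum(axis=1), P.sum(axis=0))        # close to a and b
print(primal_objective(C, a, b, P, eps), dual_objective(C, a, b, f, g, eps))
```

Because the update for `g` is applied last, the column marginals match `b` up to rounding. The row marginals converge as the iterations proceed.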

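For discrete marginals $\mu = \sum_i a_i \delta_{x_i}$ and $\nu = \sum_j b_j \delta_{y_j}$, the optimal plan takes the Gibbs form $\mathrm{diag}(u) \, K \, \mathrm{diag}(v)$, with kernel $K_{ij} = e^{-c(x_i, y_j)/\varepsilon}$ and scaling vectors $u$, $v$ determined by the Sinkhorn algorithm (Carlier et al., 2015, González-Sanz et al., 2023).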
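The matrix-scaling form can be sketched directly. The sketch assumes a dense cost matrix `C`, positive weight vectors `a` and `b`, and the reference $\mu \otimes \nu$ from the definition above. With that reference, the marginal weights are absorbed into `u` and `v`, so the kernel is simply $e^{-C/\varepsilon}$. Each iteration costs two matrix-vector products. The function name, tolerance, and example data are illustrative.

```python
import numpy as np

def sinkhorn_scaling(C, a, b, eps, tol=1e-12, max_iter=10_000):
    """Matrix-scaling Sinkhorn: find u, v with diag(u) K diag(v) in Pi(a, b).

    K = exp(-C / eps) is the Gibbs kernel. Each iteration costs two
    matrix-vector products, i.e. O(n m) operations.
    """
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(max_iter):
        u = a / (K @ v)               # match row sums to a
        v = b / (K.T @ u)             # match column sums to b (exact after this step)
        if np.abs(u * (K @ v) - a).sum() < tol:   # remaining row-marginal error
            break
    P = u[:, None] * K * v[None, :]
    return P, u, v

a = np.array([0.2, 0.3, 0.5])
b = np.array([0.6, 0.4])
C = np.array([[0.0, 1.0],
              [0.5, 0.5],
              [1.0, 0.0]])
P, u, v = sinkhorn_scaling(C, a, b, eps=0.5)
print(P.sum(axis=1), P.sum(axis=0))   # close to [0.2 0.3 0.5] and [0.6 0.4]
```

For small $\varepsilon$, `np.exp(-C / eps)` underflows. Log-domain updates on the potentials are the usual remedy.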

2. Convergence, Variational Limit, and Entropic Selection

As $\varepsilon \to 0$, $C_\varepsilon(\mu, \nu)$ converges to the classical cost $C_0(\mu, \nu) = \inf_{\pi \in \Pi(\mu, \nu)} \int c \, d\pi$, and the entropic minimizer $\pi_\varepsilon$ converges narrowly to an optimal plan for the classical problem (Carlier et al., 2015, Brizzi et al., 7 Jan 2025). When the optimal plan is non-unique, entropic regularization selects the optimal plan of minimal entropy relative to the reference measure, a phenomenon termed "entropic selection." In dimension $d \geq 2$, this selection is local to each transport ray. The second-order term in the expansion of $C_\varepsilon$ is nontrivial: its coefficient depends on the geometry of the foliation by transport rays and on the one-dimensional entropic minimizers along each ray (Nutz et al., 23 Apr 2026). Quantitative rates for the gap $C_\varepsilon(\mu, \nu) - C_0(\mu, \nu)$ are available for the quadratic cost and in more general settings, where the order of the error depends on the regularity assumptions (Malamut et al., 2023, Nenna et al., 2023). For the cost $c(x, y) = |x - y|$ in 1D, entropic regularization selects the most diffuse optimal plan, explicitly characterized via factorized exponential forms on monotonicity intervals (Marino et al., 2017).
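Entropic selection can be observed on a small example constructed here for illustration; it is not taken from the cited works. The example uses three points on each side with uniform weights, and points 1 and 2 are indistinguishable for the cost. As a result, every coupling that keeps the mass of point 3 in place and spreads the remaining mass over the $\{1, 2\} \times \{1, 2\}$ block is optimal. Among these couplings, the one of minimal KL divergence relative to $\mu \otimes \nu$ spreads the block mass uniformly. The sketch below uses log-domain Sinkhorn with a fixed iteration count, which is an assumption. It shows the entropic plans approaching that selected coupling as $\varepsilon$ decreases.

```python
import numpy as np

def logsumexp(M, axis):
    m = M.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(M - m).sum(axis=axis))

def entropic_plan(C, a, b, eps, n_iter=3000):
    """Log-domain Sinkhorn; returns the plan in kernel form."""
    f, g = np.zeros(len(a)), np.zeros(len(b))
    for _ in range(n_iter):
        f = -eps * logsumexp(np.log(b)[None, :] + (g[None, :] - C) / eps, axis=1)
        g = -eps * logsumexp(np.log(a)[:, None] + (f[:, None] - C) / eps, axis=0)
    return np.outer(a, b) * np.exp((f[:, None] + g[None, :] - C) / eps)

# Points 1 and 2 are interchangeable: every coupling supported on the
# {1, 2} x {1, 2} block plus the entry (3, 3) has zero cost, so the
# classical problem has infinitely many optimal plans.
a = b = np.full(3, 1 / 3)
C = np.array([[0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])

for eps in [1.0, 0.5, 0.1]:
    P = entropic_plan(C, a, b, eps)
    print(eps, np.round(P, 4))

# Minimal-entropy optimal plan: block mass spread uniformly.
selected = np.array([[1 / 6, 1 / 6, 0.0],
                     [1 / 6, 1 / 6, 0.0],
                     [0.0, 0.0, 1 / 3]])
```

By symmetry, the off-block entry $r = P_{13} = P_{31}$ of the entropic plan solves $2(e^{2/\varepsilon} - 1) r^2 + r - 1/9 = 0$. This gives $r \approx 0.062$ at $\varepsilon = 1$ and $r \to 0$ as $\varepsilon \to 0$.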

3. Regularity, Smoothing, and Differential Structure

Entropy regularization yields strictly convex, differentiable transport costs on the probability simplex and enhances the regularity of optimal plans and Kantorovich potentials. For smooth, positive marginals and a suitably convex cost structure, the entropy-regularized problem admits unique, smooth (even analytic) potentials (Carlier et al., 2015, Porretta, 2022). In dynamical (Eulerian) formulations, the additional entropy term induces elliptic or parabolic regularization. This leads to smooth curves interpolating between measures in the Wasserstein space, which minimize a kinetic action augmented by the entropy term, subject to the continuity equation $\partial_t \rho_t + \nabla \cdot (\rho_t v_t) = 0$; existence and regularity are governed by elliptic estimates. On manifolds, entropy regularization yields smoothing at every positive time (Porretta, 2022, Bocchi et al., 2024). In unbalanced or generalized settings (such as Wasserstein–Fisher–Rao), entropy regularization extends naturally and inherits regularity properties from the classical transport setting (Gallouët et al., 2021).
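One concrete consequence of differentiability can be checked numerically for discrete marginals. Differentiating the dual objective with respect to $a_i$ at the optimum gives $f_i - \varepsilon \sum_j b_j e^{(f_i + g_j - C_{ij})/\varepsilon} = f_i - \varepsilon$, by the stationarity conditions. Along a direction $h$ with $\sum_i h_i = 0$, the constant drops out, so the directional derivative of the cost is $\langle f, h \rangle$. The sketch below compares this with a central finite difference. It assumes random 1D points, a quadratic cost, $\varepsilon = 0.5$, and an illustrative perturbation `h`; the function names are hypothetical.

```python
import numpy as np

def logsumexp(M, axis):
    m = M.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(M - m).sum(axis=axis))

def entropic_cost_and_potentials(C, a, b, eps, n_iter=2000):
    """Log-domain Sinkhorn; returns C_eps(a, b) (dual value) and the potentials."""
    f, g = np.zeros(len(a)), np.zeros(len(b))
    for _ in range(n_iter):
        f = -eps * logsumexp(np.log(b)[None, :] + (g[None, :] - C) / eps, axis=1)
        g = -eps * logsumexp(np.log(a)[:, None] + (f[:, None] - C) / eps, axis=0)
    P = np.outer(a, b) * np.exp((f[:, None] + g[None, :] - C) / eps)
    value = f @ a + g @ b - eps * P.sum() + eps
    return float(value), f, g

rng = np.random.default_rng(1)
x, y = rng.random(4), rng.random(5)
C = (x[:, None] - y[None, :]) ** 2
a, b = np.full(4, 0.25), np.full(5, 0.2)
eps = 0.5

value, f, g = entropic_cost_and_potentials(C, a, b, eps)

# Perturb a along a direction h with zero total mass (stays on the simplex).
h = np.array([0.05, -0.02, 0.01, -0.04])
t = 1e-4
c_plus, _, _ = entropic_cost_and_potentials(C, a + t * h, b, eps)
c_minus, _, _ = entropic_cost_and_potentials(C, a - t * h, b, eps)
finite_difference = (c_plus - c_minus) / (2 * t)
print(finite_difference, f @ h)   # the two numbers should agree closely
```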

4. Statistical Properties and Sample Complexity

A key effect of entropic penalization is to alleviate the statistical curse of dimensionality in empirical OT estimation. Under bounded cost, the empirical entropy-regularized cost satisfies $\sqrt{n}$-rate central limit theorems for the cost, plan integrals, and transported observables, with consistent plug-in estimators of the asymptotic variance (González-Sanz et al., 2023). For subgaussian distributions, estimators of entropy-regularized OT maps achieve explicit error rates in the sample size when one marginal is compactly supported or log-concave, and also without compactness (Werenski et al., 2023). The differentiability of the cost with respect to the marginals, established under general assumptions, is leveraged for statistical inference and barycenter estimation (Mallery et al., 13 Jan 2025), with dimension-independent rates for barycentric synthesis and explicit quadratic-programming formulations for barycentric analysis. In practice, entropy-regularized estimators enable confidence bands and bootstrap inference even for nonsmooth or high-dimensional costs (González-Sanz et al., 2023).
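Central limit theorems of this kind imply that the fluctuations of the plug-in estimator shrink like $n^{-1/2}$. The sketch below illustrates this; it does not reproduce the cited analysis. It assumes uniform samples on $[0, 1]^3$ and a shifted copy, so that the squared Euclidean cost stays bounded, along with $\varepsilon = 1$ and 30 independent replications per sample size. All of these choices are arbitrary, and the function name is hypothetical. Under $n^{-1/2}$ scaling, the spread at $n = 180$ should be roughly a third of the spread at $n = 20$.

```python
import numpy as np

def logsumexp(M, axis):
    m = M.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(M - m).sum(axis=axis))

def empirical_entropic_cost(xs, ys, eps, n_iter=200):
    """Plug-in estimate of C_eps between the empirical measures of xs and ys.

    Squared Euclidean cost, uniform weights, log-domain Sinkhorn. The primal
    objective is evaluated at the final plan, whose total mass is 1.
    """
    C = ((xs[:, None, :] - ys[None, :, :]) ** 2).sum(axis=-1)
    n, m = len(xs), len(ys)
    log_a, log_b = np.full(n, -np.log(n)), np.full(m, -np.log(m))
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):
        f = -eps * logsumexp(log_b[None, :] + (g[None, :] - C) / eps, axis=1)
        g = -eps * logsumexp(log_a[:, None] + (f[:, None] - C) / eps, axis=0)
    log_P = log_a[:, None] + log_b[None, :] + (f[:, None] + g[None, :] - C) / eps
    P = np.exp(log_P)
    kl = np.sum(P * (log_P - log_a[:, None] - log_b[None, :]))
    return float(np.sum(C * P) + eps * kl)

rng = np.random.default_rng(2)
d, eps, reps = 3, 1.0, 30
shift = np.array([0.5, 0.0, 0.0])
results = {}
for n in [20, 180]:
    results[n] = [
        empirical_entropic_cost(rng.random((n, d)), rng.random((n, d)) + shift, eps)
        for _ in range(reps)
    ]
    print(n, np.mean(results[n]), np.std(results[n]))
```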

5. Numerical Algorithms and Computational Aspects

The entropy-regularized transport problem admits efficient algorithms via Sinkhorn-type matrix scaling, with cost per iteration $O(nm)$ for marginals supported on $n$ and $m$ points, rapid convergence, and high parallelizability (Carlier et al., 2015, Abid et al., 2018). In the semi-discrete regime, entropy regularization yields convex duals over finite-dimensional potentials. Large-scale problems are addressed by combining spatial truncation, fast range searches, multilevel discretizations, and regularization annealing, which substantially reduces computational cost (Khamlich et al., 31 Jul 2025). For multi-marginal and general linearly constrained problems, well-posed ODEs for the dual potentials, parameterized by the regularization parameter, make it possible to interpolate solutions across regularization levels and to compute derivatives of the cost (Hiew et al., 2024). Entropy-regularized OT can also be interpreted as a robust "adversarial" cost problem, in which the entropy penalty corresponds to maximizing the transport cost over an uncertainty set consisting of exponential tilts of the cost matrix (Paty et al., 2020).
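Regularization annealing can be sketched in its simplest dense form: solve a sequence of problems with decreasing $\varepsilon$, and warm-start each stage from the previous potentials. The usual motivation is that the potentials change little between nearby values of $\varepsilon$, so each stage starts close to its solution. This sketch omits the truncation and multilevel structure of the cited semi-discrete solver. The schedule, the per-stage iteration count, and the function names are illustrative assumptions.

```python
import numpy as np

def logsumexp(M, axis):
    m = M.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(M - m).sum(axis=axis))

def sinkhorn_log(C, a, b, eps, f, g, n_iter):
    """Log-domain Sinkhorn started from the given potentials (warm start)."""
    for _ in range(n_iter):
        f = -eps * logsumexp(np.log(b)[None, :] + (g[None, :] - C) / eps, axis=1)
        g = -eps * logsumexp(np.log(a)[:, None] + (f[:, None] - C) / eps, axis=0)
    return f, g

def annealed_sinkhorn(C, a, b, eps_schedule, n_iter_per_stage=1000):
    """Solve a sequence of problems with decreasing eps, reusing potentials."""
    f, g = np.zeros(len(a)), np.zeros(len(b))
    for eps in eps_schedule:
        f, g = sinkhorn_log(C, a, b, eps, f, g, n_iter_per_stage)
    eps = eps_schedule[-1]
    P = np.outer(a, b) * np.exp((f[:, None] + g[None, :] - C) / eps)
    return P, f, g

x = np.linspace(0.0, 1.0, 12)
y = np.linspace(0.0, 1.0, 10) ** 2
C = (x[:, None] - y[None, :]) ** 2
a, b = np.full(12, 1 / 12), np.full(10, 1 / 10)
schedule = [1.0, 0.5, 0.25, 0.125, 0.0625]
P, f, g = annealed_sinkhorn(C, a, b, schedule)
print(np.abs(P.sum(axis=1) - a).max())   # row-marginal error at the final eps
```

Halving $\varepsilon$ at each stage is one common pattern. Other schedules work as well, as long as each stage runs long enough to be close to convergence before the next one starts.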

6. Extensions: Multi-Marginal, Unbalanced, and Structural Generalizations

The entropy-regularized cost extends to multi-marginal settings: $C_\varepsilon(\mu_1, \dots, \mu_N) = \inf_{\pi \in \Pi(\mu_1, \dots, \mu_N)} \int c \, d\pi + \varepsilon\, \mathrm{KL}(\pi \,|\, \mu_1 \otimes \cdots \otimes \mu_N)$. This formulation is well-posed under minimal assumptions and has convergence rates in $\varepsilon$ of order $\varepsilon \log(1/\varepsilon)$, with constants depending explicitly on the cost signature and the dimensions of the marginals (Nenna et al., 2023, Brizzi et al., 7 Jan 2025). For unbalanced OT, penalizing deviations from the prescribed marginals via the Kullback–Leibler divergence leads to formulations that interpolate between OT and Fisher–Rao metrics. These formulations have dynamical and dual structures, regularity results, and geometric properties analogous to balanced entropy-regularized OT (Gallouët et al., 2021).
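The two-marginal Sinkhorn updates generalize directly: each potential is updated so that its marginal constraint holds, given the other potentials. The sketch below assumes $N = 3$ discrete marginals with weights `a1`, `a2`, `a3` and the product reference $\mu_1 \otimes \mu_2 \otimes \mu_3$. The pairwise quadratic cost, the grids, $\varepsilon = 1$, and the function names are arbitrary illustrative choices. Note that the cost tensor has $n_1 n_2 n_3$ entries, so this dense approach only suits small problems.

```python
import numpy as np

def logsumexp(M, axes):
    """Stable log(sum(exp(M))) over the given axes."""
    m = M.max(axis=axes, keepdims=True)
    out = m + np.log(np.exp(M - m).sum(axis=axes, keepdims=True))
    return np.squeeze(out, axis=axes)

def multimarginal_sinkhorn(C, a1, a2, a3, eps, n_iter=2000):
    """Three-marginal entropic OT with reference a1 ⊗ a2 ⊗ a3 (log domain)."""
    l1 = np.log(a1)[:, None, None]
    l2 = np.log(a2)[None, :, None]
    l3 = np.log(a3)[None, None, :]
    f1, f2, f3 = np.zeros(len(a1)), np.zeros(len(a2)), np.zeros(len(a3))
    for _ in range(n_iter):
        # Each update enforces one marginal given the other two potentials.
        f1 = -eps * logsumexp(
            l2 + l3 + (f2[None, :, None] + f3[None, None, :] - C) / eps, axes=(1, 2))
        f2 = -eps * logsumexp(
            l1 + l3 + (f1[:, None, None] + f3[None, None, :] - C) / eps, axes=(0, 2))
        f3 = -eps * logsumexp(
            l1 + l2 + (f1[:, None, None] + f2[None, :, None] - C) / eps, axes=(0, 1))
    log_P = l1 + l2 + l3 + (
        f1[:, None, None] + f2[None, :, None] + f3[None, None, :] - C) / eps
    return np.exp(log_P), (f1, f2, f3)

x, y, z = np.linspace(0, 1, 4), np.linspace(0, 1, 5), np.linspace(0, 1, 6)
X, Y, Z = np.meshgrid(x, y, z, indexing="ij")
C = (X - Y) ** 2 + (Y - Z) ** 2 + (X - Z) ** 2   # pairwise quadratic cost
a1, a2, a3 = np.full(4, 1 / 4), np.full(5, 1 / 5), np.full(6, 1 / 6)
P, potentials = multimarginal_sinkhorn(C, a1, a2, a3, eps=1.0)
print(P.sum(axis=(1, 2)), P.sum(axis=(0, 2)), P.sum(axis=(0, 1)))
```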

7. Applications, Statistical and Geometric Interpretations

Entropy-regularized OT is foundational in scalable generative modeling, kernel methods, barycenter synthesis, and robust inference, serving as a loss in GANs and regression for probability measures (Liu et al., 2018, Mallery et al., 13 Jan 2025). For Gaussian measures, closed-form formulas for the cost and optimizer induce a smooth Riemannian geometry closely related to the Wasserstein metric, with explicit interpolations between independence and optimal transport as the regularization sweeps from infinity to zero (Tong et al., 2020). In one-dimensional and multi-dimensional settings, entropic regularization selects optimally diffuse plans among (possibly infinitely many) cost-minimizing ones, resolves nonuniqueness, and quantifies fine geometric effects of mass spreading on transport rays (Marino et al., 2017, Nutz et al., 23 Apr 2026). On graphs, entropy-regularized OT enables Markovian randomized routing subject to edge capacities, efficiently solving flow and coupling problems with added robustness and inherent smoothing of abrupt solutions (Courtain et al., 2021).
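The interpolation between independence and optimal transport can be made explicit in one dimension. This worked example is derived here from the primal objective; it is not the general formula of the cited work. Take $\mu = N(m_1, s_1^2)$ and $\nu = N(m_2, s_2^2)$, quadratic cost, and the reference $\mu \otimes \nu$. Assume, as is known for this setting, that the optimal coupling is Gaussian, with correlation $\rho$. Then $\mathbb{E}|X - Y|^2 = (m_1 - m_2)^2 + s_1^2 + s_2^2 - 2\rho s_1 s_2$, and the KL term is the mutual information $-\tfrac12 \log(1 - \rho^2)$. Setting the derivative in $\rho$ to zero gives $2 s_1 s_2 \rho^2 + \varepsilon \rho - 2 s_1 s_2 = 0$. The sketch below evaluates the positive root and checks it against a brute-force grid search. The parameter values and function names are illustrative.

```python
import numpy as np

def entropic_correlation(eps, s1, s2):
    """Correlation of the optimal Gaussian coupling of N(m1, s1^2) and N(m2, s2^2).

    Root in (0, 1) of 2 s1 s2 rho^2 + eps rho - 2 s1 s2 = 0, written in a
    cancellation-free form.
    """
    s = s1 * s2
    return 4 * s / (eps + np.sqrt(eps ** 2 + 16 * s ** 2))

def gaussian_objective(rho, m1, s1, m2, s2, eps):
    """E|X - Y|^2 + eps * KL(pi | mu ⊗ nu) for a Gaussian coupling with correlation rho."""
    transport = (m1 - m2) ** 2 + s1 ** 2 + s2 ** 2 - 2 * rho * s1 * s2
    mutual_information = -0.5 * np.log(1 - rho ** 2)
    return transport + eps * mutual_information

m1, s1, m2, s2 = 0.0, 1.0, 3.0, 2.0
for eps in [1e-6, 1.0, 1e6]:
    rho = entropic_correlation(eps, s1, s2)
    print(eps, rho, gaussian_objective(rho, m1, s1, m2, s2, eps))

# Brute-force check of the closed form at eps = 1.
grid = np.linspace(-0.999999, 0.999999, 200_001)
rho_grid = grid[np.argmin(gaussian_objective(grid, m1, s1, m2, s2, 1.0))]
w2_squared = (m1 - m2) ** 2 + (s1 - s2) ** 2   # classical (eps -> 0) cost
```

As $\varepsilon \to \infty$, $\rho \to 0$ (the independent coupling). As $\varepsilon \to 0$, $\rho \to 1$ and the cost approaches the squared Wasserstein distance $(m_1 - m_2)^2 + (s_1 - s_2)^2$.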

Entropy-regularized transport cost is a unifying principle in modern computational, statistical, and geometric optimal transport, balancing fidelity and regularity, underpinning both theoretical developments and large-scale applications.