
Entropic Optimal Transport

Updated 30 June 2025
  • Entropic optimal transport is a regularized framework that adds a KL-divergence term to classical transport, ensuring unique and smooth solutions.
  • It enables efficient numerical schemes, such as the Sinkhorn algorithm and Bregman splitting, to solve high-dimensional transport problems rapidly.
  • Its applications span imaging, statistical machine learning, and nonlinear PDEs, demonstrating both theoretical and practical impact.

Entropic optimal transport (EOT) is a regularized version of optimal transport in which a Kullback–Leibler (KL) divergence term is added to the transport cost in order to smooth and strictly convexify the underlying optimization problem. This regularization, motivated both by computational tractability and theoretical properties, fundamentally alters the structure, stability, and applicability of transport couplings, and is a cornerstone of contemporary computational optimal transport, statistical machine learning, and the analysis of nonlinear gradient flows.

1. Mathematical Formulation and Regularization

The standard optimal transport problem seeks a coupling $\pi$ between probability measures $p$ and $q$ that minimizes a cost functional, most commonly $\langle c, \pi \rangle$, under marginal constraints. In the entropic regularized version, an entropy term is added:

$$W_\gamma(p, q) = \min_{\pi \in \Pi(p, q)} \langle c, \pi \rangle + \gamma E(\pi),$$

where $E(\pi) = \sum_{i,j} \pi_{i,j} (\log \pi_{i,j} - 1) + \iota_{\mathbb{R}^+}(\pi_{i,j})$ is the entropy (with an indicator enforcing positivity) and $\gamma > 0$ is the regularization parameter. The set $\Pi(p, q)$ denotes the couplings with prescribed marginals.

Entropic regularization strictly convexifies the problem, ensuring a unique optimizer, smoothing the coupling, and making the minimization numerically stable and amenable to scalable algorithms, notably the Sinkhorn algorithm. As $\gamma \to 0$, $W_\gamma$ recovers the classical optimal transport (OT) cost; as $\gamma$ increases, the solution approaches the independent product measure.
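As a concrete illustration, the Sinkhorn iteration alternately rescales the rows and columns of the Gibbs kernel $\xi = e^{-c/\gamma}$ to match the prescribed marginals. Below is a minimal dense-matrix sketch in numpy; the function name, grid, and parameter values are illustrative choices, not taken from a specific reference:

```python
import numpy as np

def sinkhorn(p, q, C, gamma, n_iter=1000):
    """Entropic OT between histograms p, q with cost matrix C (minimal sketch).
    Returns the approximately optimal coupling pi."""
    xi = np.exp(-C / gamma)          # Gibbs kernel
    a = np.ones_like(p)
    b = np.ones_like(q)
    for _ in range(n_iter):
        b = q / (xi.T @ a)           # rescale columns to match the marginal q
        a = p / (xi @ b)             # rescale rows to match the marginal p
    return a[:, None] * xi * b[None, :]

# toy example: two histograms on a 1-D grid
x = np.linspace(0, 1, 50)
p = np.exp(-(x - 0.2)**2 / 0.01); p /= p.sum()
q = np.exp(-(x - 0.7)**2 / 0.01); q /= q.sum()
C = (x[:, None] - x[None, :])**2
pi = sinkhorn(p, q, C, gamma=1e-2)
print(np.abs(pi.sum(axis=1) - p).max())  # row marginal error, ~0 after the final row rescaling
```

Because the row rescaling is applied last, the first marginal of the returned coupling matches $p$ exactly; the second marginal converges to $q$ as the iteration proceeds.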

2. Fast Numerical Methods and Bregman Proximal Splitting

The addition of the entropy term permits scalable first-order splitting algorithms for the numerical solution of EOT and related gradient flows. Each step in the discretized flow is rewritten as a minimization of the Kullback–Leibler divergence, augmented with convex constraints:

$$\min_\pi\, \mathrm{KL}(\pi \mid \xi) + \phi_1(\pi) + \phi_2(\pi),$$

where $\xi = e^{-c/\gamma}$ is the Gibbs kernel derived from the cost, and $\phi_1$, $\phi_2$ encode convex penalties or constraints.

A key computational strategy is Dykstra's algorithm with respect to the KL divergence (i.e., iterated Bregman projections), which alternately applies KL-proximal operators:

$$\begin{aligned} \pi^{(0)} &= \xi, \qquad z^{(-1)} = z^{(0)} = \mathbf{1}, \\ \pi^{(\ell)} &= \operatorname{Prox}^{\mathrm{KL}}_{\phi_{[\ell]_2}}\!\left( \pi^{(\ell-1)} \odot z^{(\ell-2)} \right), \\ z^{(\ell)} &= z^{(\ell-2)} \odot \frac{\pi^{(\ell-1)}}{\pi^{(\ell)}}, \end{aligned}$$

where $[\ell]_2$ denotes $\ell$ modulo 2, so the two proximal operators alternate, and $\odot$ is the entrywise product. The major computational cost reduces to applications of the Gibbs kernel $\xi$, which on uniform grids with translation-invariant costs (e.g., squared Euclidean) become convolutions (here, Gaussian convolutions), enabling near-linear time complexity. On general domains (e.g., manifolds), multiplication by $\xi$ can be approximated by applying the heat kernel, implemented as sparse linear solves for short-time heat diffusion steps.
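On a uniform grid with the squared Euclidean cost, applying the Gibbs kernel $\xi$ is exactly a discrete Gaussian convolution, which is what makes near-linear-time implementations possible. A small numpy check of this equivalence (grid size and $\gamma$ are arbitrary choices):

```python
import numpy as np

n, gamma = 200, 1e-2
x = np.linspace(0, 1, n)
dx = x[1] - x[0]

# dense Gibbs kernel for the squared Euclidean cost
xi = np.exp(-(x[:, None] - x[None, :])**2 / gamma)

# the same operator, expressed as convolution with a 1-D Gaussian stencil
g = np.exp(-((np.arange(-(n - 1), n) * dx)**2) / gamma)

v = np.random.default_rng(0).random(n)
dense = xi @ v                           # O(n^2) matrix-vector product
conv = np.convolve(v, g, mode='valid')   # convolution; FFT variants run in O(n log n)
print(np.abs(dense - conv).max())        # agreement up to floating-point error
```
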

3. Wasserstein Gradient Flows and Applications to PDEs

EOT supplies a practical framework for discretizing and numerically solving Wasserstein gradient flows:

$$p_{t+1} = \underset{p \in \Sigma_N}{\arg\min}\; W_\gamma(p, p_t) + \tau f(p),$$

an implicit Euler discretization in Wasserstein space, where $\Sigma_N$ is the probability simplex and $\tau > 0$ the time step. As $\gamma \to 0$, this recovers the classical JKO scheme for nonlinear diffusion PDEs, including the Fokker–Planck equation (linear diffusion), the porous medium equation (nonlinear diffusion), and crowd transport models with pointwise congestion constraints. The entropic approach yields weak solutions that remain stable and tractable even when classical PDE solvers are challenged by nonlinearity or non-smooth constraints.
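For linear diffusion, where $f(p) = \sum_i p_i(\log p_i - 1)$ is the entropy and the flow is the heat equation, the KL-proximal step admits the closed form $s \mapsto s^{\gamma/(\gamma+\tau)}$, so one entropic JKO step reduces to a Sinkhorn-like scaling loop. The sketch below is a hypothetical minimal implementation; `jko_step` and all parameter values are illustrative assumptions:

```python
import numpy as np

def jko_step(p, xi, gamma, tau, n_iter=200):
    """One entropic JKO step for the heat flow (f = entropy), minimal sketch.
    The KL-prox of tau*f acts on the free marginal as s -> s**(gamma/(gamma+tau))."""
    kappa = gamma / (gamma + tau)
    b = np.ones_like(p)
    for _ in range(n_iter):
        a = p / (xi @ b)          # enforce the fixed first marginal p_t
        s = xi.T @ a
        b = s**kappa / s          # closed-form KL-prox on the free second marginal
    return b * (xi.T @ a)         # the new density p_{t+1}

n = 100
x = np.linspace(0, 1, n)
xi = np.exp(-(x[:, None] - x[None, :])**2 / 1e-2)
p = np.exp(-(x - 0.5)**2 / 0.005); p /= p.sum()
p0_max = p.max()
for _ in range(5):                # five implicit Euler steps: the bump spreads out
    p = jko_step(p, xi, gamma=1e-2, tau=1e-2)
print(p.sum(), p.max())           # mass approximately conserved; the peak flattens
```

At convergence of the inner loop the total mass is conserved, since the coupling's two marginals must carry the same mass, consistent with the heat flow.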

Through the splitting algorithm, flows with constraints (e.g., hard congestion for crowd motion) or nonlinear local terms can be handled efficiently, generalizing naturally to macroscopic models with complex interactions, spatially varying coefficients, and even multi-density couplings for mixtures or interacting species.

4. Computational Advantages and Scalability

Entropic regularization bestows several computational advantages:

  • Strict convexity: Guarantees uniqueness and smoothness, alleviating degenerate or ill-posed numerical behaviors present in unregularized OT.
  • Parallelizability: The underlying operations (matrix scaling, kernel vector products) are highly parallel, suitable for GPU architectures.
  • Scalability: On uniform grids, kernel applications reduce to FFT-based convolutions; on meshes for non-Euclidean domains, highly optimized heat solvers can be used.
  • Sinkhorn algorithm: Matrix scaling reduces EOT to an iterative scaling of the Gibbs kernel, which is orders of magnitude faster than LP solvers for large-scale problems.

The entire operator-splitting approach relies only on basic, repeatable operations (e.g., scaling, convolution, sparse linear solves), making it robust for industrial and scientific computation, particularly in high dimensions or on geometric domains such as manifolds.
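One practical caveat: for small $\gamma$, the Gibbs kernel $e^{-c/\gamma}$ underflows in floating point. A standard remedy, not detailed above, is to run the Sinkhorn scalings in the log domain on the dual potentials. A minimal sketch with illustrative names and parameters:

```python
import numpy as np

def logsumexp(A, axis):
    """Numerically stable log(sum(exp(A))) along an axis."""
    m = A.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(A - m).sum(axis=axis))

def sinkhorn_log(p, q, C, gamma, n_iter=500):
    """Sinkhorn on the dual potentials f, g in the log domain (minimal sketch):
    stable even when exp(-C/gamma) underflows to zero."""
    f, g = np.zeros_like(p), np.zeros_like(q)
    for _ in range(n_iter):
        S = (f[:, None] + g[None, :] - C) / gamma
        g = g + gamma * (np.log(q) - logsumexp(S, axis=0))
        S = (f[:, None] + g[None, :] - C) / gamma
        f = f + gamma * (np.log(p) - logsumexp(S, axis=1))
    return np.exp((f[:, None] + g[None, :] - C) / gamma)

n = 40
x = np.linspace(0, 1, n)
p = np.full(n, 1.0 / n)
q = np.exp(-(x - 0.6)**2 / 0.02); q /= q.sum()
C = (x[:, None] - x[None, :])**2
pi = sinkhorn_log(p, q, C, gamma=1e-4)  # here exp(-C/gamma) would underflow to 0
print(np.abs(pi.sum(axis=1) - p).max())
```

With $\gamma = 10^{-4}$ a direct implementation would zero out most of the kernel, while the log-domain updates remain finite throughout.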

5. Extensions and Generalizations

Methods in EOT extend naturally to:

  • Complex domains: Heat kernel approximations enable adaptation to triangulated meshes and to non-convex and higher-genus surfaces.
  • General penalizations: Any convex functional with a tractable KL-proximal operator can be incorporated, allowing volume constraints, congestion, or even nonlocal interactions.
  • Multi-marginal problems: The Bregman splitting extends to problems with multiple interacting densities and non-pairwise costs, relevant to mixture modeling and multi-agent systems.
  • Barycenters and geodesics: Efficient and stable computation of Wasserstein barycenters and interpolation along geodesics in measure spaces uses the same entropic infrastructure.
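To make the barycenter item concrete: the iterative Bregman projection scheme for entropic Wasserstein barycenters alternates a Sinkhorn-type scaling against each input histogram with a weighted geometric mean. A minimal sketch (the function name, weights, and parameters are illustrative assumptions):

```python
import numpy as np

def barycenter(Q, w, xi, n_iter=300):
    """Entropic Wasserstein barycenter of the histograms in the rows of Q,
    with weights w, via iterative Bregman projections (minimal sketch)."""
    V = np.ones_like(Q)
    for _ in range(n_iter):
        U = Q / (V @ xi.T)            # match each fixed input marginal q_k
        logp = w @ np.log(U @ xi)     # weighted geometric mean of the free marginals
        p = np.exp(logp)
        V = p[None, :] / (U @ xi)     # rescale each coupling toward the barycenter
    return p

x = np.linspace(0, 1, 60)
xi = np.exp(-(x[:, None] - x[None, :])**2 / 1e-2)
q1 = np.exp(-(x - 0.25)**2 / 0.005); q1 /= q1.sum()
q2 = np.exp(-(x - 0.75)**2 / 0.005); q2 /= q2.sum()
p = barycenter(np.vstack([q1, q2]), np.array([0.5, 0.5]), xi)
print(x @ p / p.sum())   # center of mass lies midway between the inputs, near 0.5
```

The same scaling infrastructure (Gibbs kernel applications plus pointwise operations) serves here, which is why barycenters inherit the efficiency of the two-marginal solver.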

6. Practical Impact and Future Directions

EOT is foundational in modern computational optimal transport, underpinning algorithms in imaging (e.g., color transfer, shape interpolation), machine learning (domain adaptation, distributional robustness, generative modeling), and the analysis of nonlinear PDEs. Its robust numerical properties, theoretical convergence guarantees, and scalability ensure its enduring relevance and expansion into new domains.

Recent directions include deeper integration with learned representations, large-scale graph and manifold learning, and the extension of entropic methods to dynamic or stochastic variants of optimal transport. Continued improvements in proximal algorithms, dual formulations, and adaptivity of regularization parameters will further enhance the scope and efficiency of EOT-based approaches.


Step | Description | Operation
1 | JKO time stepping | $p_{t+1} = \arg\min_p W_\gamma(p, p_t) + \tau f(p)$
2 | Reformulation as KL minimization | $\min_\pi \operatorname{KL}(\pi \mid \xi) + \phi_1(\pi) + \phi_2(\pi)$
3 | Bregman proximal splitting (Dykstra's algorithm) | Alternating KL-proximal steps over the constraints
4 | Gibbs kernel multiplication | $\xi = e^{-c/\gamma}$ (convolution or heat flow)
5 | Extension to complex geometries | Heat diffusion on manifolds

Entropic optimal transport, together with the Bregman schemes detailed here, represents a central paradigm shift toward efficient, stable simulation and optimization of transport-driven dynamics on high-dimensional, constrained, and geometrically complex domains, unifying a wide array of applications on a rigorously justified computational foundation.