Divergence-Regularized Optimal Transport
- Divergence-Regularized Optimal Transport is a framework that incorporates convex divergence penalties (e.g., KL, Tsallis) to smooth the classical OT problem into a strictly convex, computationally tractable program.
- It leverages dual formulations and specialized numerical algorithms such as Sinkhorn, mirror descent, and Bregman projections to achieve efficient, robust, and scalable optimization.
- This approach improves sample complexity, statistical convergence, and robustness, with applications in robust learning, statistical inference, and generative modeling.
Divergence-Regularized Optimal Transport is a broad framework for convex relaxations of the classical optimal transport (OT) problem, in which the original linear programming problem is smoothed or "regularized" by a convex divergence term acting on the space of couplings. This methodology encompasses entropic regularization (based on Kullback–Leibler (KL) divergence), Tsallis, Rényi, $\beta$-divergence, Bregman, and other $f$-divergence regularizations, as well as kernel-based penalties such as MMD-regularized OT. It enables scalable numerical algorithms, admits rigorous statistical theory, improves robustness and sample complexity, and provides a unifying interface between geometry, information, convex analysis, and applications, especially in high-dimensional data science, statistical inference, and generative modeling.
1. Mathematical Formulation of Divergence-Regularized OT
Let $(X,\mu)$ and $(Y,\nu)$ be Polish probability spaces and $c : X \times Y \to [0,\infty]$ a lower-semicontinuous cost. The set of couplings is
$$\Pi(\mu,\nu) = \{\pi \in \mathcal{P}(X \times Y) : \pi(\cdot \times Y) = \mu,\ \pi(X \times \cdot) = \nu\}.$$
The unregularized Monge–Kantorovich problem is
$$\mathrm{OT}(\mu,\nu) = \inf_{\pi \in \Pi(\mu,\nu)} \int_{X \times Y} c \, d\pi .$$
Divergence regularization introduces an additive penalty on $\pi$ with respect to a reference measure, usually the product $\mu \otimes \nu$, via a convex divergence $D_f$:
$$\mathrm{OT}_\varepsilon(\mu,\nu) = \inf_{\pi \in \Pi(\mu,\nu)} \int_{X \times Y} c \, d\pi + \varepsilon \, D_f(\pi \,\|\, \mu \otimes \nu),$$
with
$$D_f(\pi \,\|\, \mu \otimes \nu) = \int_{X \times Y} f\!\left(\frac{d\pi}{d(\mu \otimes \nu)}\right) d(\mu \otimes \nu)$$
for convex $f : [0,\infty) \to [0,\infty]$ satisfying $f(1) = 0$. Notable cases:
- KL (entropic): $f(t) = t \log t - t + 1$
- Tsallis: $f_q(t) = \dfrac{t^q - t}{q - 1}$ for $q > 0$, $q \neq 1$
- $\beta$-divergence: a power-family generator recovering KL in the limit, as in (Nakamura et al., 2022)
- Bregman, Rényi, MMD, and others
Dual formulations are obtained via Fenchel–Rockafellar conjugacy using the convex conjugate $f^*$. For $f$-divergence regularization (including the Tsallis case of (Suguro et al., 2023)), for example,
$$\mathrm{OT}_\varepsilon(\mu,\nu) = \sup_{\varphi,\,\psi} \; \int_X \varphi \, d\mu + \int_Y \psi \, d\nu - \varepsilon \int_{X \times Y} f^*\!\left(\frac{\varphi(x) + \psi(y) - c(x,y)}{\varepsilon}\right) d(\mu \otimes \nu).$$
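On finite spaces, which is the setting of the numerical methods in Section 4, the primal problem specializes to a matrix program (a standard restatement in our notation, with marginals $a \in \mathbb{R}^n_{>0}$, $b \in \mathbb{R}^m_{>0}$, cost matrix $C \in \mathbb{R}^{n \times m}$, and reference coupling $ab^{\top}$):
$$\min_{\substack{P \in \mathbb{R}^{n \times m}_{\ge 0} \\ P\mathbf{1}_m = a,\; P^{\top}\mathbf{1}_n = b}} \; \langle C, P\rangle \;+\; \varepsilon \sum_{i,j} a_i b_j \, f\!\left(\frac{P_{ij}}{a_i b_j}\right).$$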
2. Classes of Divergences, Interpolation, and Limiting Behavior
Divergence regularizations span a wide spectrum. KL, Tsallis, $\beta$-divergences, Bregman divergences, and Rényi divergences (of order $\alpha \in (0,1)$) are prominent:
- KL (the $q \to 1$ Tsallis limit, equivalently the $\alpha \to 1$ Rényi limit): recovers the entropic regularizer and the Sinkhorn algorithm.
- Tsallis ($q \neq 1$): allows a polynomial penalty structure, fusing sparsity with smoothness; converges to KL as $q \to 1$, and recovers classic OT in the vanishing-regularization limit $\varepsilon \to 0$ (Suguro et al., 2023).
- $\beta$-divergence (Nakamura et al., 2022): interpolates between the entropic (KL) and robust hard-thresholding regimes; robust to outliers for appropriate $\beta$.
- Rényi divergence (Bresch et al., 29 Apr 2024): recovers KL-regularized OT as $\alpha \to 1$ and unregularized OT as $\alpha \to 0$. Not an $f$-divergence or a Bregman distance, but admits strict convexity, metrization, and symmetry.
This interpolation property allows one to tune regularization parameters (e.g. $\alpha$ in Rényi or $q$ in Tsallis) to squeeze the regularized solution toward the true OT plan or, conversely, to maximize smoothness for tractable Sinkhorn-like computation.
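As a quick numerical illustration of this interpolation, the following sketch (ours; the generator conventions match the bullets in Section 1, and the names `f_div`, `f_kl`, `f_tsallis` are illustrative, not from any cited paper) checks that the Tsallis divergence of a random probability vector from the uniform reference approaches its KL divergence as $q \to 1$:

```python
import numpy as np

def f_div(p, r, f):
    """D_f(p || r) = sum_i r_i f(p_i / r_i) for strictly positive probability vectors."""
    return float(np.sum(r * f(p / r)))

def f_kl(t):
    # KL generator: f(t) = t log t - t + 1, normalized so that f(1) = 0.
    return t * np.log(t) - t + 1.0

def f_tsallis(q):
    # Tsallis generator: f_q(t) = (t^q - t) / (q - 1), q != 1; note f_q(1) = 0.
    return lambda t: (t**q - t) / (q - 1.0)

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(50))        # random probability vector
r = np.full(50, 1.0 / 50)             # uniform reference

kl_value = f_div(p, r, f_kl)
for q in (2.0, 1.5, 1.1, 1.01, 1.001):
    print(f"q = {q:5.3f}: Tsallis = {f_div(p, r, f_tsallis(q)):.6f}   (KL = {kl_value:.6f})")
# The Tsallis value approaches the KL value as q -> 1, and the regularized
# OT problems inherit this interpolation.
```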
3. Convergence Rates and $\Gamma$-Convergence
Central theoretical results concern the vanishing regularization limit $\varepsilon \to 0$:
- The regularized functional $\Gamma$-converges to the unregularized OT functional (Suguro et al., 2023, Eckstein et al., 2022).
- Quantization and shadow-coupling arguments yield explicit convergence rates: in the entropic (KL) case, the gap $\mathrm{OT}_\varepsilon - \mathrm{OT}$ is of order $\varepsilon \log(1/\varepsilon)$ for typical costs, with constants depending on the dimension (Suguro et al., 2023, Eckstein et al., 2022).
- For Tsallis regularization with $q \neq 1$, rates slow to a polynomial decay in $\varepsilon$ whose exponent depends on $q$, and the KL case is provably optimal among all $q$ (Suguro et al., 2023).
The limits in other regularization families parallel this pattern: Rényi divergence regularization recovers the hard OT plan as $\alpha \to 0$ without the numerical instability attendant to $\varepsilon \to 0$ in classical entropic regularization (Bresch et al., 29 Apr 2024).
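Schematically, in the KL case these results can be summarized by the sandwich below (a heuristic restatement of the cited rates; the constant $C$ depends on the cost, the marginals, and the dimension):
$$\mathrm{OT}(\mu,\nu) \;\le\; \mathrm{OT}_\varepsilon(\mu,\nu) \;\le\; \mathrm{OT}(\mu,\nu) + C\,\varepsilon \log\frac{1}{\varepsilon} \qquad \text{as } \varepsilon \downarrow 0,$$
where the lower bound uses only $D_f \ge 0$ and the upper bound is obtained by evaluating the regularized objective at a quantized (shadow) coupling built from a near-optimal unregularized plan.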
4. Numerical Algorithms and Computation
Divergence regularization transforms the OT problem into a strictly convex optimization, admitting scalable algorithms:
- Sinkhorn Algorithm: Classical for KL/entropic regularization; multiplicative row-column scaling (Benamou et al., 2014, Muzellec et al., 2016, Bigot et al., 2017); see the sketch at the end of this section.
- Generalized Sinkhorn and IPFP: For Tsallis, $\beta$-, and general $f$-divergences, projections via Newton-type updates or mirror descent, or via alternating enforcement of the marginal constraints (Nakamura et al., 2022, Terjék et al., 2021, Marino et al., 2020).
- Mirror Descent for Rényi-regularized OT: Uses negative Shannon entropy map, explicit gradient computation, and marginal-projection steps (Bresch et al., 29 Apr 2024).
- Bregman Projection Methods: For general smooth convex regularizers, using Dykstra-style alternation, Newton–Raphson for separable cases, sparse extension for high dimension (Dessein et al., 2016).
- Accelerated Projected Gradient Descent: For kernelized (MMD) and other IPM regularized OT (Manupriya et al., 2020).
- Infinite-dimensional settings: Dynamic (Benamou–Brenier, mean-field game) representations for proximal OT divergences, blending divergence and OT via infimal convolution (Baptista et al., 17 May 2025, Birrell et al., 2023).
Strict convexity and superlinear growth of the divergence ensure unique minimizers, strong duality, and global convergence of these schemes, subject to compactness and regularity assumptions.
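As a concrete reference point for the KL case, here is a minimal log-domain Sinkhorn sketch (our own illustrative implementation of the classical scheme, not code from the cited papers; the coupling is parameterized as $P_{ij} = a_i b_j \exp((f_i + g_j - C_{ij})/\varepsilon)$):

```python
import numpy as np
from scipy.special import logsumexp

def sinkhorn_log(a, b, C, eps, n_iter=1000):
    """Log-domain Sinkhorn for KL(. || a (x) b)-regularized OT on finite spaces.

    a, b : probability vectors (marginals); C : cost matrix; eps : regularization.
    Returns the coupling P and the linear transport cost <C, P>.
    """
    log_a, log_b = np.log(a), np.log(b)
    f = np.zeros_like(a)   # dual potential attached to the first marginal
    g = np.zeros_like(b)   # dual potential attached to the second marginal
    for _ in range(n_iter):
        # Each update enforces one marginal constraint exactly
        # (a KL/Bregman projection, i.e. Sinkhorn's multiplicative scaling).
        f = -eps * logsumexp(log_b[None, :] + (g[None, :] - C) / eps, axis=1)
        g = -eps * logsumexp(log_a[:, None] + (f[:, None] - C) / eps, axis=0)
    P = np.exp(log_a[:, None] + log_b[None, :] + (f[:, None] + g[None, :] - C) / eps)
    return P, float(np.sum(P * C))

# Toy example: two random empirical measures with a squared-distance cost.
rng = np.random.default_rng(1)
x, y = rng.normal(size=30), rng.normal(loc=0.5, size=40)
a, b = np.full(30, 1 / 30), np.full(40, 1 / 40)
C = (x[:, None] - y[None, :]) ** 2
P, cost = sinkhorn_log(a, b, C, eps=0.1)
print(f"entropic transport cost ~ {cost:.4f}, "
      f"row-marginal error = {np.abs(P.sum(axis=1) - a).max():.2e}")
```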
5. Regularized OT: Statistical Theory and Sample Complexity
Divergence regularization fundamentally shapes the statistical behavior of empirical OT estimators:
- Parametric rates ($n^{-1/2}$) for all $f$-divergences: Provided the cost is bounded and the divergence generator is sufficiently smooth, empirical regularized OT achieves the parametric rate for sample complexity, in sharp contrast with the curse of dimensionality intrinsic to unregularized OT (Yang et al., 2 Oct 2025, González-Sanz et al., 7 May 2025, Bayraktar et al., 2022).
- Central Limit Theorems: Limiting Gaussian distributions for the regularized OT cost, plan, and dual potentials, in both one- and two-sample regimes, with explicit covariance formulas (Klatt et al., 2018, Bigot et al., 2017, Yang et al., 2 Oct 2025, González-Sanz et al., 7 May 2025).
- Bootstrap Consistency: The ordinary $n$-out-of-$n$ bootstrap is valid for empirical divergence-regularized OT, enabling statistical inference and confidence bands (Bigot et al., 2017, Klatt et al., 2018).
- Stability and Regularity: Quantitative bounds on the change in optimizers under marginal perturbations, strengthening robustness.
- Intrinsic Dimension and Smoothness: Fast rates are achievable depending on the regularity of cost/divergence and the “intrinsic dimension” of the data (Bayraktar et al., 2022).
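As an operational illustration of the bootstrap point above, the sketch below (ours; `entropic_cost` is a compact version of the Sinkhorn helper from Section 4, and the percentile interval is a plain $n$-out-of-$n$ resampling scheme) computes a confidence interval for the empirical entropic OT cost between two samples:

```python
import numpy as np
from scipy.special import logsumexp

def entropic_cost(x, y, eps=0.1, n_iter=300):
    """Entropic OT cost between the empirical measures of samples x, y (squared cost)."""
    n, m = len(x), len(y)
    log_a, log_b = np.full(n, -np.log(n)), np.full(m, -np.log(m))
    C = (x[:, None] - y[None, :]) ** 2
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):
        f = -eps * logsumexp(log_b[None, :] + (g[None, :] - C) / eps, axis=1)
        g = -eps * logsumexp(log_a[:, None] + (f[:, None] - C) / eps, axis=0)
    P = np.exp(log_a[:, None] + log_b[None, :] + (f[:, None] + g[None, :] - C) / eps)
    return float(np.sum(P * C))

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=80)
y = rng.normal(0.5, 1.0, size=80)

point = entropic_cost(x, y)
# n-out-of-n resampling of each sample, recomputing the regularized cost each time.
boot = [entropic_cost(rng.choice(x, size=len(x), replace=True),
                      rng.choice(y, size=len(y), replace=True))
        for _ in range(100)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"empirical entropic cost = {point:.3f}, 95% bootstrap CI = [{lo:.3f}, {hi:.3f}]")
```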
6. Applications: Robustness, Inference, and Geometry
Divergence-regularized OT is used extensively across disciplines:
- Robust Learning: $\beta$-potential regularization prevents mass transport to outliers, achieving statistical performance superior to entropic OT in contaminated data (Nakamura et al., 2022).
- Ecological inference: Tsallis-regularized OT offers state-of-the-art marginal reconstruction for political science datasets (Muzellec et al., 2016).
- Kernel and RKHS-based OT: MMD regularization interpolates classical OT and kernel distances, combining sample efficiency and ground-metric geometry (Manupriya et al., 2020).
- Generative modeling: Proximal OT divergences provide tractable interpolants between GAN-style -divergences and OT distances, governing flows in probability space (Baptista et al., 17 May 2025, Birrell et al., 2023).
- Distributionally Robust Optimization: Infimal-convolution (OT-regularized divergence) sets define ambiguity sets that generalize Wasserstein and -divergence DRO schemes (Birrell et al., 2023, Baptista et al., 17 May 2025).
Empirical results show that, for moderate regularization, sparser couplings (e.g., Tsallis- or Rényi-regularized) yield OT plans closer to the ground truth than KL-regularized schemes, particularly in real-world inference tasks (Bresch et al., 29 Apr 2024).
7. Comparisons, Limitations, and Generalizations
- KL regularization: Fast convergence, maximal smoothness, and computational simplicity via the Sinkhorn algorithm; the cost bias vanishes at rate $\varepsilon \log(1/\varepsilon)$ as $\varepsilon \to 0$, i.e., linearly up to a logarithmic factor.
- Tsallis and related power-type regularization: Allows sparse couplings and convergence rates polynomial in $\varepsilon$, with exponents depending on the divergence parameter; the bias decays more slowly than in the KL case (Suguro et al., 2023, Muzellec et al., 2016, González-Sanz et al., 7 May 2025).
- Rényi regularization: Enables interpolation from OT to KL without numerical instability, yielding plans that empirically outperform both (Bresch et al., 29 Apr 2024).
- General $f$-divergence/Bregman regularization: All known convex regularizers with suitable smoothness yield strict convexity, well-behaved limits, statistical efficiency, and strong convergence theory.
- Extensions: Multi-marginal divergence-regularized OT, barycenters, unbalanced OT, and models with spatially varying divergences (e.g., homogeneous UROT/OT with boundary (Lacombe, 2022)), and optimal transport-regularized divergences (infimal convolution) (Baptista et al., 17 May 2025).
Sharpness of rates and uniform statistical bounds depend critically on the specific divergence, cost regularity, and data geometry. KL regularization remains optimal in terms of fastest vanishing bias, but non-entropy divergences enable practical gains in robustness and inference accuracy.
References
- (Suguro et al., 2023) Convergence rate of Tsallis entropic regularized optimal transport
- (González-Sanz et al., 7 May 2025) Sparse Regularized Optimal Transport without Curse of Dimensionality
- (Yang et al., 2 Oct 2025) General Divergence Regularized Optimal Transport: Sample Complexity and Central Limit Theorems
- (Muzellec et al., 2016) Tsallis Regularized Optimal Transport and Ecological Inference
- (Nakamura et al., 2022) Robust computation of optimal transport by $\beta$-potential regularization
- (Bresch et al., 29 Apr 2024) Interpolating between Optimal Transport and KL regularized Optimal Transport using Rényi Divergences
- (Manupriya et al., 2020) MMD-Regularized Unbalanced Optimal Transport
- (Eckstein et al., 2022) Convergence Rates for Regularized Optimal Transport via Quantization
- (Terjék et al., 2021) Optimal transport with $f$-divergence regularization and generalized Sinkhorn algorithm
- (Marino et al., 2020) Optimal Transport losses and Sinkhorn algorithm with general convex regularization
- (Dessein et al., 2016) Regularized Optimal Transport and the Rot Mover's Distance
- (Bayraktar et al., 2022) Stability and Sample Complexity of Divergence Regularized Optimal Transport
- (Klatt et al., 2018) Empirical Regularized Optimal Transport: Statistical Theory and Applications
- (Bigot et al., 2017) Central limit theorems for entropy-regularized optimal transport on finite spaces and statistical applications
- (Birrell et al., 2023) Adversarially Robust Learning with Optimal Transport Regularized Divergences
- (Baptista et al., 17 May 2025) Proximal optimal transport divergences
- (Lacombe, 2022) An Homogeneous Unbalanced Regularized Optimal Transport model with applications to Optimal Transport with Boundary
- (Morikuni et al., 2023) Error estimate for regularized optimal transport problems via Bregman divergence
- (Benamou et al., 2014) Iterative Bregman Projections for Regularized Transportation Problems