Divergence-Regularized Optimal Transport

Updated 25 December 2025
  • Divergence-Regularized Optimal Transport is a framework that incorporates convex divergence penalties (e.g., KL, Tsallis) to smooth the classical OT problem and facilitate convex relaxations.
  • It leverages dual formulations and specialized numerical algorithms such as Sinkhorn, mirror descent, and Bregman projections to achieve efficient, robust, and scalable optimization.
  • This approach improves sample complexity, statistical convergence, and robustness, with applications in robust learning, statistical inference, and generative modeling.

Divergence-Regularized Optimal Transport is a broad framework for convex relaxations of the classical optimal transport (OT) problem, in which the original linear programming problem is smoothed or "regularized" by a convex divergence term acting on the space of couplings. This methodology encompasses entropic regularization (based on the Kullback-Leibler (KL) divergence), Tsallis, Rényi, $\beta$-divergence, Bregman, and other $f$-divergence regularizations, as well as kernel methods such as MMD-regularized OT. It enables scalable numerical algorithms, admits a rigorous statistical theory, improves robustness and sample complexity, and provides a unifying interface between geometry, information theory, convex analysis, and applications, especially in high-dimensional data science, statistical inference, and generative modeling.

1. Mathematical Formulation of Divergence-Regularized OT

Let $(X_1, \mathcal{B}_1, \mu)$ and $(X_2, \mathcal{B}_2, \nu)$ be Polish probability spaces and $c : X_1 \times X_2 \to [0, +\infty)$ a lower-semicontinuous cost. The set of couplings is

$$\Pi(\mu, \nu) = \{ \pi \in \mathcal{P}(X_1 \times X_2) : \pi_{X_1} = \mu,\ \pi_{X_2} = \nu \}.$$

The unregularized Monge–Kantorovich problem is

$$\mathrm{OT}(\mu, \nu) = \inf_{\pi \in \Pi(\mu, \nu)} \int_{X_1 \times X_2} c(x, y)\, d\pi(x, y).$$

Divergence regularization introduces an additive penalty on $\pi$ with respect to a reference measure, usually $\mu \otimes \nu$, via a convex divergence $D_f$:

$$\mathrm{OT}_{f, \varepsilon}(\mu, \nu) = \inf_{\pi \in \Pi(\mu, \nu)} \left\{ \int c\, d\pi + \varepsilon\, D_f(\pi \| \mu \otimes \nu) \right\}$$

with

$$D_f(\pi \| \mu \otimes \nu) = \int_{X_1 \times X_2} f\!\left( \frac{d\pi}{d(\mu \otimes \nu)} \right) d(\mu \otimes \nu)$$

for convex $f$ satisfying $f(1) = 0$. Notable cases (a numerical sketch follows the list):

  • KL (entropic): $f(u) = u \log u$
  • Tsallis: $f(u) = (u^q - u)/(q - 1)$ for $q > 1$
  • $\beta$-divergence: $f_\beta$ as in (Nakamura et al., 2022)
  • Bregman, Rényi, MMD, and others
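
For discrete marginals, the penalty $D_f(\pi \| \mu \otimes \nu)$ reduces to a finite sum, which makes these generators easy to compare numerically. The following Python sketch is illustrative only; the function names and data are chosen here and are not taken from the cited papers.

```python
# Illustrative sketch: the discrete divergence penalty D_f(pi || mu (x) nu)
# for the KL and Tsallis generators listed above.
import numpy as np

def f_kl(u):
    # KL generator f(u) = u log u, with the convention 0 log 0 = 0.
    return np.where(u > 0, u * np.log(np.where(u > 0, u, 1.0)), 0.0)

def f_tsallis(u, q=2.0):
    # Tsallis generator f(u) = (u^q - u) / (q - 1), q > 1.
    return (u ** q - u) / (q - 1.0)

def divergence_penalty(pi, mu, nu, f):
    """D_f(pi || mu (x) nu) = sum_ij f(pi_ij / (mu_i nu_j)) * mu_i * nu_j."""
    ref = np.outer(mu, nu)            # reference product coupling
    return np.sum(f(pi / ref) * ref)

mu = np.array([0.5, 0.5])
nu = np.array([0.5, 0.5])
independent = np.outer(mu, nu)                  # f(1) = 0, so zero penalty
monotone = np.array([[0.5, 0.0], [0.0, 0.5]])   # deterministic coupling

print(divergence_penalty(independent, mu, nu, f_kl))    # 0.0
print(divergence_penalty(monotone, mu, nu, f_kl))       # log 2 ~ 0.693
print(divergence_penalty(monotone, mu, nu, f_tsallis))  # (q = 2) 1.0
```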

Dual formulations are given via Fenchel–Rockafellar conjugacy using the convex dual $f^*$. For Tsallis-regularized OT (Suguro et al., 2023), e.g.,

$$\mathrm{OT}_{q, \varepsilon}(\mu, \nu) = \sup_{(h_1, h_2) \in L^1(\mu) \times L^1(\nu)} \left\{ \int h_1\, d\mu + \int h_2\, d\nu - \varepsilon \int f_q^*\!\left( \frac{h_1(x) + h_2(y) - c(x, y)}{\varepsilon} \right) d\mu\, d\nu \right\}.$$
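
For the KL case, alternating maximization of the dual over $h_1$ and $h_2$ reduces to Sinkhorn's scaling updates. The log-domain sketch below is illustrative only; the problem data and function names are chosen here and are not code from the cited references.

```python
# Illustrative log-domain Sinkhorn sketch for the KL (entropic) case:
# compute approximate dual potentials (h1, h2) and the smoothed coupling.
import numpy as np

def sinkhorn_log(C, mu, nu, eps, n_iter=500):
    """Approximately solve min <C, pi> + eps * KL(pi || mu (x) nu) over Pi(mu, nu)."""
    h1, h2 = np.zeros(len(mu)), np.zeros(len(nu))
    for _ in range(n_iter):
        # Each update enforces one marginal constraint exactly (dual ascent).
        h1 = -eps * np.log(np.sum(nu * np.exp((h2[None, :] - C) / eps), axis=1))
        h2 = -eps * np.log(np.sum(mu[:, None] * np.exp((h1[:, None] - C) / eps), axis=0))
    # The optimal density w.r.t. mu (x) nu has the form exp((h1 + h2 - C) / eps).
    pi = mu[:, None] * nu[None, :] * np.exp((h1[:, None] + h2[None, :] - C) / eps)
    return h1, h2, pi

mu = np.array([0.5, 0.5])
nu = np.array([0.3, 0.7])
C = np.array([[0.0, 1.0],
              [1.0, 0.0]])
h1, h2, pi = sinkhorn_log(C, mu, nu, eps=0.1)
print(pi.sum(axis=1), pi.sum(axis=0))  # ~ mu and ~ nu (marginal constraints)
print(np.sum(C * pi))                  # transport cost of the smoothed plan
```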

2. Classes of Divergences, Interpolation, and Limiting Behavior

Divergence regularizations span a wide spectrum. KL, Tsallis, $\beta$-divergence, Bregman divergences, and Rényi divergences (for $\alpha \in (0, 1)$) are prominent:

  • KL ($q \to 1$, $\beta \to 1$): recovers the entropic regularizer and Sinkhorn algorithm.
  • Tsallis ($q > 1$): allows polynomial penalty structure, fusing sparsity with smoothness; converges to KL as $q \to 1$, and as $q \to 0$ recovers classic OT (Suguro et al., 2023).
  • $\beta$-divergence (Nakamura et al., 2022): interpolates entropic (KL) and robust hard-thresholding regimes; robust to outliers for $\beta > 1$.
  • Rényi divergence (Bresch et al., 29 Apr 2024): for $\alpha \nearrow 1$ recovers KL; for $\alpha \searrow 0$ recovers unregularized OT. Not an $f$-divergence/Bregman distance, but admits strict convexity, metrization, and symmetry.

This interpolation property allows one to tune the regularization parameters (e.g., $\alpha$ in Rényi or $q$ in Tsallis) to push the regularized solution toward the true OT plan or, conversely, to maximize smoothness for tractable Sinkhorn-like computation.
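
This limiting behavior is easy to check numerically: the Tsallis generator $(u^q - u)/(q - 1)$ approaches the KL generator $u \log u$ pointwise as $q \to 1$. A tiny illustrative sketch (the sample values are chosen here, not taken from the cited papers):

```python
# Illustrative check: the Tsallis generator tends to the KL generator as q -> 1.
import numpy as np

u = np.array([0.25, 0.5, 1.0, 2.0, 4.0])
kl = u * np.log(u)                              # f(u) = u log u
for q in (2.0, 1.5, 1.1, 1.01, 1.001):
    tsallis = (u ** q - u) / (q - 1.0)          # f_q(u) = (u^q - u)/(q - 1)
    print(q, np.max(np.abs(tsallis - kl)))      # gap shrinks as q -> 1
```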

3. Convergence Rates and $\Gamma$-Convergence

Central theoretical results concern the vanishing regularization limit $\varepsilon \to 0$:

  • The functional $\mathrm{OT}_{f, \varepsilon}$ $\Gamma$-converges to the unregularized $\mathrm{OT}$ (Suguro et al., 2023, Eckstein et al., 2022).
  • Quantization and shadow-coupling arguments yield explicit convergence rates: in the entropic (KL) case, the gap satisfies $\mathrm{OT}_{\mathrm{KL}, \varepsilon} - \mathrm{OT} = O(\varepsilon \log(1/\varepsilon))$ for typical costs and dimensions (Suguro et al., 2023, Eckstein et al., 2022).
  • For Tsallis regularization ($q > 1$), rates slow to polynomial decay in $\varepsilon$: $\mathrm{OT}_{q, \varepsilon} - \mathrm{OT} = O\big(\varepsilon^{\frac{1}{(q-1)+1}}\big)$, and the KL case is provably optimal among all $q \ge 1$ (Suguro et al., 2023).

The limits in other regularization families parallel this pattern: $\alpha$-Rényi divergence regularization recovers the hard OT plan as $\alpha \to 0$ without the numerical instability attendant to $\varepsilon \to 0$ in classical entropic regularization (Bresch et al., 29 Apr 2024).
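
These rates can be probed numerically on small discrete problems by comparing the regularized value against the exact linear program as $\varepsilon$ shrinks. The sketch below is illustrative only: it assumes SciPy's linprog for the exact LP, a plain Sinkhorn loop for the entropic value, and made-up problem data.

```python
# Illustrative sketch: the gap between the KL-regularized value and the exact
# OT value on a small random problem, as eps decreases.
import numpy as np
from scipy.optimize import linprog

def exact_ot(C, mu, nu):
    # Exact OT value via the linear program over the transport polytope.
    n, m = C.shape
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0   # row sums equal mu
    for j in range(m):
        A_eq[n + j, j::m] = 1.0            # column sums equal nu
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([mu, nu]))
    return res.fun

def entropic_ot(C, mu, nu, eps, n_iter=5000):
    # KL-regularized value <C, pi> + eps * KL(pi || mu (x) nu) via Sinkhorn.
    h1, h2 = np.zeros(len(mu)), np.zeros(len(nu))
    for _ in range(n_iter):
        h1 = -eps * np.log(np.sum(nu * np.exp((h2[None, :] - C) / eps), axis=1))
        h2 = -eps * np.log(np.sum(mu[:, None] * np.exp((h1[:, None] - C) / eps), axis=0))
    pi = mu[:, None] * nu[None, :] * np.exp((h1[:, None] + h2[None, :] - C) / eps)
    kl = np.sum(pi * np.log(pi / np.outer(mu, nu)))
    return np.sum(C * pi) + eps * kl

rng = np.random.default_rng(0)
C = rng.random((5, 5))
mu = np.full(5, 0.2)
nu = np.full(5, 0.2)
ot_star = exact_ot(C, mu, nu)
for eps in (0.5, 0.2, 0.1, 0.05):      # smaller eps needs more iterations
    print(eps, entropic_ot(C, mu, nu, eps) - ot_star)  # gap shrinks with eps
```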

4. Numerical Algorithms and Computation

Divergence regularization transforms the OT problem into a strictly convex optimization problem that admits scalable algorithms such as Sinkhorn iterations, mirror descent, and Bregman projections.

Strict convexity and superlinear growth of the divergence ensure unique minimizers, strong duality, and global convergence of these schemes, subject to compactness and regularity assumptions.
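
In the KL case these schemes take an especially simple form: Sinkhorn's updates are alternating Bregman (KL) projections of the Gibbs kernel onto the two marginal-constraint sets, i.e., diagonal rescalings. A minimal matrix-scaling sketch, with illustrative data chosen here rather than taken from the cited papers:

```python
# Illustrative sketch: Sinkhorn as alternating Bregman (KL) projections,
# i.e., diagonal rescalings of the Gibbs kernel onto the marginal constraints.
import numpy as np

def sinkhorn_scaling(C, mu, nu, eps=0.1, n_iter=500):
    K = np.exp(-C / eps) * np.outer(mu, nu)  # Gibbs kernel w.r.t. mu (x) nu
    u, v = np.ones_like(mu), np.ones_like(nu)
    for _ in range(n_iter):
        u = mu / (K @ v)      # KL projection onto {pi : row sums = mu}
        v = nu / (K.T @ u)    # KL projection onto {pi : column sums = nu}
    return u[:, None] * K * v[None, :]

C = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 2.0]])
mu = np.array([0.4, 0.6])
nu = np.array([0.3, 0.3, 0.4])
pi = sinkhorn_scaling(C, mu, nu)
print(pi.sum(axis=1), pi.sum(axis=0))  # marginals ~ mu and ~ nu
```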

5. Regularized OT: Statistical Theory and Sample Complexity

Divergence regularization fundamentally shapes the statistical behavior of empirical OT estimators, improving sample complexity, statistical convergence, and robustness.

6. Applications: Robustness, Inference, and Geometry

Divergence-regularized OT is used extensively across disciplines:

  • Robust Learning: $\beta$-potential regularization prevents mass transport to outliers, achieving statistical performance superior to entropic OT in contaminated data (Nakamura et al., 2022).
  • Ecological inference: Tsallis-regularized OT offers state-of-the-art marginal reconstruction for political science datasets (Muzellec et al., 2016).
  • Kernel and RKHS-based OT: MMD regularization interpolates classical OT and kernel distances, combining sample efficiency and ground-metric geometry (Manupriya et al., 2020).
  • Generative modeling: Proximal OT divergences provide tractable interpolants between GAN-style $f$-divergences and OT distances, governing flows in probability space (Baptista et al., 17 May 2025, Birrell et al., 2023).
  • Distributionally Robust Optimization: Ambiguity sets built from infimal-convolution (OT-regularized) divergences generalize Wasserstein and $f$-divergence DRO schemes (Birrell et al., 2023, Baptista et al., 17 May 2025).

Empirical results show that, for moderate regularization, sparser couplings (Tsallis, Rényi, $\beta$) yield OT plans closer to the ground truth than KL-regularized schemes, particularly in real-world inference tasks (Bresch et al., 29 Apr 2024).

7. Comparisons, Limitations, and Generalizations

  • KL regularization: Fast convergence, maximal smoothness, and computational simplicity via the Sinkhorn algorithm; the cost bias vanishes at rate $O(\varepsilon \log(1/\varepsilon))$ as $\varepsilon \to 0$.
  • Tsallis/$L^p$ regularization: Allows sparse couplings and variable convergence rates polynomial in $\varepsilon$; the bias decays more slowly than in the KL case (Suguro et al., 2023, Muzellec et al., 2016, González-Sanz et al., 7 May 2025).
  • Rényi regularization: Enables interpolation from OT to KL without numerical instability, yielding plans that empirically outperform both (Bresch et al., 29 Apr 2024).
  • General $f$-divergence/Bregman regularization: All known convex regularizers with suitable smoothness yield strict convexity, well-behaved limits, statistical efficiency, and strong convergence theory.
  • Extensions: Multi-marginal divergence-regularized OT, barycenters, unbalanced OT, models with spatially varying divergences (e.g., homogeneous UROT / OT with boundary (Lacombe, 2022)), and optimal-transport-regularized divergences (infimal convolution) (Baptista et al., 17 May 2025).

Sharpness of rates and uniform statistical bounds depend critically on the specific divergence, cost regularity, and data geometry. KL regularization remains optimal in terms of fastest vanishing bias, but non-entropy divergences enable practical gains in robustness and inference accuracy.

