Equivalence of optimal transport problems to regularization on the family of f-divergences

Published 14 Apr 2026 in math.ST | (2604.12996v1)

Abstract: This work establishes that an optimal transport~(OT) problem regularized by a given $f$-divergence admits the same solution as another OT problem regularized by a different $g$-divergence, under an appropriate transformation of the cost function. This structural equivalence between OT problems regularized by distinct divergences, in the sense of sharing the same unique minimizer, is demonstrated within the framework of Polish spaces with bounded cost functions.

Summary

  • The paper establishes that an optimal transport problem regularized by one f-divergence shares its unique optimal coupling with a problem regularized by a different f-divergence, once the cost function is suitably transformed.
  • Using convex duality, it derives explicit transformations and closed-form optimality conditions via dual potentials under bounded cost conditions.
  • The findings unify different OT regularization methods and facilitate transferring theoretical results across divergences, with implications for generative modeling and domain adaptation.

Equivalence of Divergence-Regularized Optimal Transport Problems

Introduction

The paper "Equivalence of optimal transport problems to regularization on the family of f-divergences" (2604.12996) rigorously establishes that, for optimal transport (OT) problems regularized with $f$-divergences, the specific choice of divergence can be traded for a suitable transformation of the cost function without altering the unique optimal coupling. The equivalence is situated within the general framework of Polish spaces with bounded cost, and the results are derived under conditions that ensure both strong duality and the existence of optimal potentials. This work structurally generalizes prior equivalence results from empirical risk minimization to a broader class of regularized OT problems.

Theoretical Framework

Divergence-Regularized OT

The regularized OT problem at the core of this work is formulated as:

$$\inf_{P_{XY}\in \Pi(P_X, P_Y)} \int_{\mathcal{X}\times \mathcal{Y}} c(x, y)\, dP_{XY}(x, y) + \lambda D_\phi(P_{XY} \| P_X P_Y)$$

where $D_\phi$ is an $f$-divergence generated by a Legendre-type convex function $\phi$, and the feasible couplings $P_{XY}$ share fixed marginals $P_X, P_Y$. Crucially, the $f$-divergence regularization renders the OT problem smooth and, depending on $\phi$, can enforce properties such as sparsity or robustness to outliers.
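On finite spaces the objective above reduces to a sum, which makes its two terms easy to inspect. The following is a minimal sketch, assuming discrete marginals with strictly positive entries and the KL generator normalized as $\phi(t) = t\log t - t + 1$; the function names, the toy cost matrix, and the value of `lam` are illustrative choices, not taken from the paper.

```python
import numpy as np

def f_divergence(P, Q, phi):
    """D_phi(P || Q) = sum_ij Q_ij * phi(P_ij / Q_ij), for strictly positive Q."""
    return float(np.sum(Q * phi(P / Q)))

def regularized_objective(P, C, p_x, p_y, lam, phi):
    """Transport cost plus lam times the divergence to the product measure."""
    Q = np.outer(p_x, p_y)
    return float(np.sum(C * P)) + lam * f_divergence(P, Q, phi)

def kl_generator(t):
    """phi(t) = t*log(t) - t + 1, with the convention 0*log(0) = 0."""
    t = np.asarray(t, dtype=float)
    safe = np.where(t > 0, t, 1.0)  # avoids log(0); the product t*log(safe) is 0 there
    return t * np.log(safe) - t + 1.0

# Toy problem (assumed values).
p_x = np.array([0.5, 0.5])
p_y = np.array([0.25, 0.75])
C = np.array([[0.0, 1.0],
              [1.0, 0.0]])
lam = 0.1

independent = np.outer(p_x, p_y)        # divergence term vanishes here
sparse = np.array([[0.25, 0.25],
                   [0.00, 0.50]])       # cheaper transport, positive penalty

value_at_independent = regularized_objective(independent, C, p_x, p_y, lam, kl_generator)
value_at_sparse = regularized_objective(sparse, C, p_x, p_y, lam, kl_generator)
```

The independent coupling pays no regularization penalty but a higher transport cost, while the sparse coupling trades a lower transport cost for a positive divergence term; the regularized minimizer balances the two.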

Dual Formulation and Optimality

Leveraging convex duality, the authors derive the dual problem and establish the structure and uniqueness (up to additive constants) of the optimizers. The derivations rely on two technical conditions: that $\phi$ is a Legendre-type generator and that the cost $c$ is bounded. Together these yield a closed-form characterization of the optimal coupling and justify strong duality.

Particularly notable is the explicit characterization of the Radon-Nikodym derivative of the optimal coupling with respect to the product measure $P_X P_Y$:

$$\frac{dP^\star_{XY}}{d(P_X P_Y)}(x, y) = (\phi')^{-1}\!\left(\frac{u^\star(x) + v^\star(y) - c(x, y)}{\lambda}\right)$$

for dual potentials $u^\star$ and $v^\star$, which are unique up to additive constants.
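For the KL generator $\phi(t) = t\log t - t + 1$ one has $(\phi')^{-1}(s) = e^{s}$, so the first-order condition of the regularized problem gives an optimal density of the form $\exp((u(x) + v(y) - c(x,y))/\lambda)$. The sketch below computes such potentials on a small discrete problem with Sinkhorn iterations, a standard solver for the KL case that is swapped in here for illustration and is not claimed to be the paper's method. The problem sizes, random cost, `lam`, and function names are assumptions.

```python
import numpy as np

def coupling_from_potentials(u, v, C, p_x, p_y, lam):
    """Density exp((u + v - c) / lam) times the product measure p_x (x) p_y."""
    density = np.exp((u[:, None] + v[None, :] - C) / lam)
    return density * np.outer(p_x, p_y)

def sinkhorn_potentials(C, p_x, p_y, lam, n_iter=500):
    """Alternately rescale rows and columns until both marginals match."""
    K = np.exp(-C / lam)
    a = np.ones_like(p_x)
    b = np.ones_like(p_y)
    for _ in range(n_iter):
        a = 1.0 / (K @ (b * p_y))      # enforce row marginals
        b = 1.0 / (K.T @ (a * p_x))    # enforce column marginals
    return lam * np.log(a), lam * np.log(b)

rng = np.random.default_rng(0)
C = rng.uniform(0.0, 1.0, size=(4, 5))  # bounded cost
p_x = np.full(4, 0.25)
p_y = np.full(5, 0.2)
lam = 0.5

u, v = sinkhorn_potentials(C, p_x, p_y, lam)
P = coupling_from_potentials(u, v, C, p_x, p_y, lam)

# Shifting u by a constant and v by its negative leaves the coupling unchanged.
P_shifted = coupling_from_potentials(u + 0.3, v - 0.3, C, p_x, p_y, lam)
```

The last two lines illustrate why the potentials can only be unique up to additive constants: only the sum $u(x) + v(y)$ enters the density.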

Main Equivalence Result

The central theorem proves that, for any Legendre-type generator $\psi$, there exists a bounded transformed cost $\tilde{c}$ such that the $\phi$-regularized OT problem is equivalent, in terms of the primal optimizer, to a $\psi$-regularized OT problem with cost $\tilde{c}$:

$$\arg\min_{P_{XY}\in \Pi(P_X, P_Y)} \int c\, dP_{XY} + \lambda D_\phi(P_{XY} \| P_X P_Y) \;=\; \arg\min_{P_{XY}\in \Pi(P_X, P_Y)} \int \tilde{c}\, dP_{XY} + \lambda D_\psi(P_{XY} \| P_X P_Y)$$

where the transformation of $c$ to $\tilde{c}$ is determined by the dual potentials of the original problem and the relationships between the derivatives and inverses of $\phi$ and $\psi$.

Explicitly, with $(u^\star, v^\star)$ denoting the dual potentials of the original problem, and up to additive terms that depend on $x$ alone or on $y$ alone,

$$\tilde{c}(x, y) = -\lambda\, \psi'\!\left((\phi')^{-1}\!\left(\frac{u^\star(x) + v^\star(y) - c(x, y)}{\lambda}\right)\right)$$

This result formally codifies the notion that the impact of the choice of regularization divergence can be fully absorbed into a non-linear cost transformation, such that the computed optimal coupling is unchanged.
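The mechanism behind this statement can be sketched from first-order conditions. The outline below is heuristic and is not the paper's proof: the symbols $r^\star$, $u^\star$, $v^\star$, $\psi$, and $\tilde{c}$ are notation assumed here, both problems are taken to share the same $\lambda$, and the potentials of the transformed problem are normalized to zero.

```latex
% Assumed notation: Q = P_X P_Y, r = dP_{XY}/dQ, (u^\star, v^\star) dual potentials
% of the \phi-regularized problem, \psi the generator of the target divergence.
\begin{align*}
\text{stationarity, } \phi\text{-problem:}\quad
  & c(x,y) + \lambda\,\phi'\big(r^\star(x,y)\big) = u^\star(x) + v^\star(y), \\
\Longrightarrow\quad
  & r^\star(x,y) = (\phi')^{-1}\!\Big(\tfrac{u^\star(x) + v^\star(y) - c(x,y)}{\lambda}\Big), \\
\text{choose:}\quad
  & \tilde c(x,y) := -\lambda\,\psi'\big(r^\star(x,y)\big), \\
\text{stationarity, } \psi\text{-problem:}\quad
  & \tilde c(x,y) + \lambda\,\psi'\big(r^\star(x,y)\big) = 0 = \tilde u(x) + \tilde v(y),
    \qquad \tilde u \equiv 0,\ \tilde v \equiv 0.
\end{align*}
```

Because $r^\star$ is already the density of a feasible coupling and the $\psi$-regularized objective is strictly convex, stationarity identifies $r^\star$ as its unique minimizer. Adding any $a(x) + b(y)$ to $\tilde{c}$ shifts the objective by a constant on $\Pi(P_X, P_Y)$ and therefore leaves the minimizer unchanged.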

Implications and Examples

Numerical Equivalence

The paper provides explicit transformations for notable divergences:

  • Kullback-Leibler to reverse KL: The transformed cost involves exponentials of the original dual potentials and cost, yielding an explicit, bounded, nonlinear cost function.
  • Kullback-Leibler to Jensen-Shannon: The resulting cost transformation involves logarithmic expressions dependent on the original dual solutions.

In both cases, the unique primal minimizer (i.e., the optimal coupling) is preserved despite the change in divergence, confirming the strength of the equivalence claim.
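The KL to reverse KL case can be checked numerically on a small discrete problem. The sketch below assumes the generators $\phi(t) = t\log t - t + 1$ and $\psi(t) = -\log t + t - 1$, the same $\lambda$ in both problems, and the transformed cost $\tilde{c} = -\lambda\,\psi'(r^\star)$, where $r^\star$ is the KL-optimal density; this is one transformation consistent with the first-order conditions, not necessarily the exact expression in the paper. It then perturbs the KL-optimal coupling along random marginal-preserving directions and records the smallest change in the reverse-KL objective.

```python
import numpy as np

def kl_optimal_density(C, p_x, p_y, lam, n_iter=500):
    """Density of the KL-regularized optimum w.r.t. p_x (x) p_y, via Sinkhorn."""
    K = np.exp(-C / lam)
    a = np.ones_like(p_x)
    b = np.ones_like(p_y)
    for _ in range(n_iter):
        a = 1.0 / (K @ (b * p_y))
        b = 1.0 / (K.T @ (a * p_x))
    return a[:, None] * K * b[None, :]

def reverse_kl_objective(P, C_tilde, Q, lam):
    """sum(C_tilde * P) + lam * sum(Q * psi(P / Q)), psi(t) = -log(t) + t - 1."""
    R = P / Q
    return float(np.sum(C_tilde * P) + lam * np.sum(Q * (-np.log(R) + R - 1.0)))

rng = np.random.default_rng(0)
n, m, lam = 4, 5, 0.5
C = rng.uniform(0.0, 1.0, size=(n, m))
p_x = np.full(n, 1.0 / n)
p_y = np.full(m, 1.0 / m)
Q = np.outer(p_x, p_y)

r_star = kl_optimal_density(C, p_x, p_y, lam)
P_star = r_star * Q

# Transformed cost: -lam * psi'(r*), with psi'(t) = 1 - 1/t.
# Since r* = exp((u + v - c) / lam), this is an exponential of potentials and cost.
C_tilde = -lam * (1.0 - 1.0 / r_star)

base = reverse_kl_objective(P_star, C_tilde, Q, lam)
worst_gap = np.inf
for _ in range(200):
    D = rng.normal(size=(n, m))
    D -= D.mean(axis=1, keepdims=True)   # zero row sums
    D -= D.mean(axis=0, keepdims=True)   # zero column sums (row sums stay zero)
    D *= 0.1 * P_star.min() / np.abs(D).max()  # keep P_star + D strictly positive
    gap = reverse_kl_objective(P_star + D, C_tilde, Q, lam) - base
    worst_gap = min(worst_gap, gap)
```

Every feasible perturbation raises the reverse-KL objective, which is what one expects if the KL-optimal coupling also minimizes the reverse-KL problem with cost `C_tilde`. Note that building `C_tilde` required solving the original KL problem first.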

Practical and Theoretical Implications

This equivalence has several significant implications:

  • Unifying View: The equivalence provides a formal unification of regularization strategies in regularized OT, allowing the study of algorithmic and statistical behavior under different divergences via cost transformations.
  • Computational Perspective: While the equivalence is structural, the transformed cost $\tilde{c}$ generally depends on the original problem's optimal dual variables, so it does not simplify or accelerate numerical computation.
  • Theoretical Utility: The result can be leveraged to analyze properties (e.g., sparsity, robustness) associated with particular divergences via corresponding cost changes, and to transfer results between settings with different regularizations.

Future Research Directions

The authors’ theoretical construction points to several avenues for further research:

  • Developing effective numerical methods for estimating the transformed cost in high-dimensional settings.
  • Leveraging the equivalence to design transfer or adaptation methods in generative modeling or domain adaptation tasks using OT.
  • Extending the equivalence to other relaxation frameworks, such as partial transport or unbalanced OT.

Conclusion

This paper offers a comprehensive theoretical framework for relating divergence-regularized OT problems through cost transformations. Under technical conditions on the divergence and cost, it shows that the choice of regularization divergence is not intrinsic to the minimizer: any such OT problem can be re-expressed with another divergence and an explicitly transformed cost function. The implications are chiefly theoretical, providing a unified lens through which to analyze regularization effects in OT. While the direct computational consequences are limited by the dependence on dual optimality, the results lay groundwork for future algorithmic and theoretical advances in optimal transport and its applications in machine learning and statistics.
