- The paper establishes that an optimal transport problem regularized by one f-divergence is equivalent to a problem regularized by another f-divergence with a suitably transformed cost, in the sense that both have the same unique optimal coupling.
- Using convex duality, it derives explicit cost transformations and closed-form optimality conditions in terms of dual potentials, assuming a bounded cost.
- The findings unify different OT regularization methods and facilitate transferring theoretical results across divergences, with implications for generative modeling and domain adaptation.
Equivalence of Divergence-Regularized Optimal Transport Problems
Introduction
The paper "Equivalence of optimal transport problems to regularization on the family of f-divergences" (2604.12996) rigorously establishes that, for optimal transport (OT) problems regularized with f-divergences, the specific choice of divergence can be traded for a suitable transformation of the cost function without altering the unique optimal coupling. The equivalence is situated within the general framework of Polish spaces with bounded cost, and the results are derived under conditions that ensure both strong duality and the existence of optimal potentials. This work structurally generalizes prior equivalence results from empirical risk minimization to a broader class of regularized OT problems.
Theoretical Framework
Divergence-Regularized OT
The regularized OT problem at the core of this work is formulated as:
$$\inf_{P_{XY} \in \Pi(P_X, P_Y)} \int_{\mathcal{X} \times \mathcal{Y}} c(x,y) \, dP_{XY}(x,y) + \lambda\, D_\phi(P_{XY} \,\|\, P_X P_Y)$$
where $D_\phi(P \,\|\, Q) = \int \phi\left(\frac{dP}{dQ}\right) dQ$ is the f-divergence generated by a Legendre-type convex function $\phi$, and the feasible couplings $P_{XY}$ have fixed marginals $P_X$ and $P_Y$. The regularization makes the objective strictly convex, so the optimal coupling is unique. Depending on $\phi$, it can also promote properties such as sparsity or robustness to outliers.
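The sketch below evaluates the objective in a discrete setting. It assumes finitely supported marginals $a$ and $b$ with all entries positive, and a cost matrix $C$. It uses the convention $D_\phi(P\|Q) = \sum_{ij} Q_{ij}\,\phi(P_{ij}/Q_{ij})$ with $Q = P_X P_Y$. The function and generator names are hypothetical, and the generators are normalized so that $\phi(1) = 0$, which is one common choice among several.

```python
import math


# Generators with phi(1) = 0 (one common normalization; conventions vary).
def phi_kl(t):
    return t * math.log(t) if t > 0 else 0.0  # gives KL(P || Q)


def phi_reverse_kl(t):
    return -math.log(t)  # gives KL(Q || P); requires t > 0


def phi_js(t):
    # Twice the Jensen-Shannon generator; its limit at t = 0 is log 2.
    return t * math.log(t) - (1 + t) * math.log((1 + t) / 2) if t > 0 else math.log(2.0)


def regularized_ot_objective(P, a, b, C, lam, phi):
    """sum_ij C_ij P_ij + lam * sum_ij a_i b_j phi(P_ij / (a_i b_j))."""
    transport = sum(cij * pij for crow, prow in zip(C, P) for cij, pij in zip(crow, prow))
    divergence = sum(ai * bj * phi(pij / (ai * bj))
                     for ai, prow in zip(a, P) for bj, pij in zip(b, prow))
    return transport + lam * divergence


a, b = [0.5, 0.5], [0.25, 0.75]
C = [[0.0, 1.0], [1.0, 0.0]]
independent = [[ai * bj for bj in b] for ai in a]  # the coupling P_X P_Y itself
print(regularized_ot_objective(independent, a, b, C, 1.0, phi_kl))  # 0.5: divergence term vanishes
```

Because $\phi(1) = 0$, the independent coupling has zero divergence under every generator. Its objective therefore reduces to the expected cost.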
Leveraging convex duality, the authors derive the dual problem and establish the structure of the optimizers, including uniqueness of the dual potentials up to additive constants. The derivations rest on two technical conditions: $\phi$ must be a Legendre-type generator, and the cost $c$ must be bounded. Together these guarantee strong duality and yield the optimal coupling in closed form.
Particularly notable is the explicit characterization of the Radon-Nikodym derivative of the optimal coupling as:
$$\frac{dP^\star_{XY}}{d(P_X P_Y)}(x,y) = (\phi')^{-1}\!\left( \frac{f(x) + g(y) - c(x,y)}{\lambda} \right)$$
for dual potentials $f$ and $g$, which are unique up to additive constants.
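In the discrete KL case, the closed-form density can be made concrete. The KL generator $\phi(t) = t\log t$ gives $(\phi')^{-1}(s) = e^{s-1}$. The potentials can then be computed with Sinkhorn-type alternating updates, each of which enforces one marginal constraint of that density exactly. This is a standard solver for the KL case and is not presented as the paper's method. The names `sinkhorn_kl` and `coupling_from_potentials`, and the toy marginals and cost, are illustrative assumptions.

```python
import math


def sinkhorn_kl(a, b, C, lam, n_iter=500):
    """Dual potentials f, g for  min_P <C, P> + lam * KL(P || a b^T).

    KL generator phi(t) = t log t, so (phi')^{-1}(s) = exp(s - 1).
    """
    f, g = [0.0] * len(a), [0.0] * len(b)
    for _ in range(n_iter):
        # Row update: solve sum_j a_i b_j exp((f_i + g_j - C_ij)/lam - 1) = a_i for f_i.
        f = [-lam * math.log(sum(bj * math.exp((gj - cij) / lam - 1.0)
                                 for bj, gj, cij in zip(b, g, row)))
             for row in C]
        # Column update: solve sum_i a_i b_j exp((f_i + g_j - C_ij)/lam - 1) = b_j for g_j.
        g = [-lam * math.log(sum(ai * math.exp((fi - row[j]) / lam - 1.0)
                                 for ai, fi, row in zip(a, f, C)))
             for j in range(len(b))]
    return f, g


def coupling_from_potentials(a, b, C, lam, f, g, inv_dphi):
    """Closed-form coupling P_ij = a_i b_j (phi')^{-1}((f_i + g_j - C_ij) / lam)."""
    return [[ai * bj * inv_dphi((fi + gj - cij) / lam)
             for bj, gj, cij in zip(b, g, row)]
            for ai, fi, row in zip(a, f, C)]


a, b = [0.2, 0.3, 0.5], [0.4, 0.4, 0.2]
C = [[0.0, 0.5, 1.0], [0.5, 0.0, 0.5], [1.0, 0.5, 0.0]]
lam = 0.5
f, g = sinkhorn_kl(a, b, C, lam)
P = coupling_from_potentials(a, b, C, lam, f, g, lambda s: math.exp(s - 1.0))
print([round(sum(row), 6) for row in P])    # row marginals, should reproduce a
print([round(sum(col), 6) for col in zip(*P)])  # column marginals, should reproduce b
```

Replacing $(f, g)$ with $(f + k, g - k)$ for any constant $k$ leaves the coupling unchanged. This is the non-uniqueness of the potentials up to additive constants.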
Main Equivalence Result
The central theorem proves the following. For any Legendre-type generator $\psi$, there exists a bounded, transformed cost $\tilde c$ such that the $\phi$-regularized OT problem is equivalent, in terms of the primal optimizer, to the $\psi$-regularized OT problem with cost $\tilde c$:
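$$\operatorname*{arg\,min}_{P_{XY} \in \Pi(P_X, P_Y)} \left\{ \int c \, dP_{XY} + \lambda\, D_\phi(P_{XY} \,\|\, P_X P_Y) \right\} = \operatorname*{arg\,min}_{P_{XY} \in \Pi(P_X, P_Y)} \left\{ \int \tilde c \, dP_{XY} + \lambda\, D_\psi(P_{XY} \,\|\, P_X P_Y) \right\}$$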
Here the transformation from $c$ to $\tilde c$ is determined by the dual potentials $f, g$ of the original problem, together with the derivatives of $\phi$ and $\psi$ and the inverse of $\phi'$.
Explicitly,
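$$\tilde c(x,y) = f(x) + g(y) - \lambda\, \psi'\!\left( (\phi')^{-1}\!\left( \frac{f(x) + g(y) - c(x,y)}{\lambda} \right) \right)$$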
In other words, the effect of the regularizing divergence can be absorbed entirely into a nonlinear transformation of the cost, leaving the optimal coupling unchanged.
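To see why this transformation preserves the optimizer, substitute $\tilde c$ into the first-order condition of the $\psi$-problem. The following is our own sketch of that step. It keeps the same potentials $f, g$ and the same $\lambda$ in both problems, and writes $(f \oplus g)(x,y) = f(x) + g(y)$:

```latex
% First-order condition of the \psi-problem with cost \tilde c and potentials f, g:
%   \psi'\big( dP_{XY} / d(P_X P_Y) \big) = ( f \oplus g - \tilde c ) / \lambda .
% Take \tilde c = f \oplus g - \lambda\, \psi'\big( (\phi')^{-1}( (f \oplus g - c)/\lambda ) \big). Then
\frac{f(x) + g(y) - \tilde c(x,y)}{\lambda}
  = \psi'\!\left( (\phi')^{-1}\!\left( \frac{f(x) + g(y) - c(x,y)}{\lambda} \right) \right)
  = \psi'\!\left( \frac{dP^\star_{XY}}{d(P_X P_Y)}(x,y) \right),
% where the last step uses  dP^\star_{XY} / d(P_X P_Y) = (\phi')^{-1}\big( (f \oplus g - c)/\lambda \big).
```

$P^\star_{XY}$ is feasible, and at every point it minimizes the convex Lagrangian integrand $t \mapsto (\tilde c - f - g)\,t + \lambda\,\psi(t)$. It is therefore optimal for the $\psi$-problem, and strict convexity of $\psi$ makes it the unique optimizer.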
Implications and Examples
Numerical Equivalence
The paper provides explicit transformations for notable divergences:
- Kullback-Leibler to reverse KL: The transformed cost involves exponentials of the original dual potentials and cost, yielding an explicit, bounded, nonlinear cost function.
- Kullback-Leibler to Jensen-Shannon: The resulting cost transformation involves logarithmic expressions dependent on the original dual solutions.
In both cases the unique primal minimizer, i.e. the optimal coupling, is preserved despite the change of divergence, which illustrates the equivalence concretely.
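The two examples can be checked numerically on a small discrete problem. The sketch below is ours, not the paper's. It assumes discrete marginals and the same $\lambda$ in both problems. It also assumes three generator conventions: $\phi(t) = t\log t$ for KL, $\psi(t) = -\log t$ for reverse KL, and $\psi(t) = t\log t - (1+t)\log\frac{1+t}{2}$ (twice the usual Jensen-Shannon generator).

Under these conventions, let $t^\star = \exp\big((f+g-c)/\lambda - 1\big)$ and apply $\tilde c = f + g - \lambda\,\psi'\big((\phi')^{-1}((f+g-c)/\lambda)\big)$. This gives:

- $\tilde c = f + g + \lambda\, e^{1 - (f+g-c)/\lambda}$ for reverse KL;
- $\tilde c = c + \lambda(1 - \log 2) + \lambda \log(1 + t^\star)$ for Jensen-Shannon.

These match the exponential and logarithmic forms described above, although the paper's normalizations may differ. The function names are hypothetical. The $\psi$-problem is solved from scratch by block-coordinate dual ascent with bisection, a generic technique chosen here only for simplicity.

```python
import math

# Assumed conventions (the paper's normalizations may differ):
#   KL         phi(t) = t log t                        (phi')^{-1}(s) = exp(s - 1)
#   reverse KL psi(t) = -log t                         psi'(t) = -1/t, (psi')^{-1}(s) = -1/s, s < 0
#   2 * JS     psi(t) = t log t - (1+t) log((1+t)/2)   psi'(t) = log(2t/(1+t)),
#                                                      (psi')^{-1}(s) = e^s / (2 - e^s), s < log 2


def inv_dpsi_rkl(s):
    return -1.0 / s if s < 0.0 else math.inf


def inv_dpsi_js(s):
    es = math.exp(s)
    return es / (2.0 - es) if es < 2.0 else math.inf


# name -> (psi', (psi')^{-1}, supremum of the range of psi')
PSI = {
    "reverse KL": (lambda t: -1.0 / t, inv_dpsi_rkl, 0.0),
    "JS": (lambda t: math.log(2.0 * t / (1.0 + t)), inv_dpsi_js, math.log(2.0)),
}


def sinkhorn_kl(a, b, C, lam, n_iter=500):
    """Potentials f, g of min <C,P> + lam * KL(P || a b^T) via Sinkhorn-type updates."""
    f, g = [0.0] * len(a), [0.0] * len(b)
    for _ in range(n_iter):
        f = [-lam * math.log(sum(bj * math.exp((gj - cij) / lam - 1.0)
                                 for bj, gj, cij in zip(b, g, row))) for row in C]
        g = [-lam * math.log(sum(ai * math.exp((fi - row[j]) / lam - 1.0)
                                 for ai, fi, row in zip(a, f, C))) for j in range(len(b))]
    return f, g


def transformed_cost(C, lam, f, g, dpsi):
    """c~_ij = f_i + g_j - lam * psi'(t_ij), with t_ij = exp((f_i + g_j - C_ij)/lam - 1)."""
    return [[fi + gj - lam * dpsi(math.exp((fi + gj - cij) / lam - 1.0))
             for gj, cij in zip(g, row)] for fi, row in zip(f, C)]


def _block_update(w, v, rows, lam, inv, s_max):
    """For each row, solve sum_l w_l * inv((p + v_l - row[l]) / lam) = 1 for p by bisection."""
    out = []
    for row in rows:
        def mass(p):
            return sum(wl * inv((p + vl - cl) / lam) for wl, vl, cl in zip(w, v, row))

        hi = min(lam * s_max + cl - vl for vl, cl in zip(v, row))  # mass -> inf as p -> hi
        step = 1.0
        while mass(hi - step) >= 1.0:  # mass -> 0 as p -> -inf
            step *= 2.0
        lo = hi - step
        for _ in range(100):  # mass is increasing in p
            mid = 0.5 * (lo + hi)
            if mass(mid) < 1.0:
                lo = mid
            else:
                hi = mid
        out.append(0.5 * (lo + hi))
    return out


def solve_dual(a, b, C, lam, inv, s_max, n_iter=300):
    """Coupling of min <C,P> + lam * D_psi(P || a b^T) by block-coordinate dual ascent."""
    f, g = [0.0] * len(a), [0.0] * len(b)
    Ct = [list(col) for col in zip(*C)]
    for _ in range(n_iter):
        f = _block_update(b, g, C, lam, inv, s_max)   # enforce row marginals a
        g = _block_update(a, f, Ct, lam, inv, s_max)  # enforce column marginals b
    return [[ai * bj * inv((fi + gj - cij) / lam) for bj, gj, cij in zip(b, g, row)]
            for ai, fi, row in zip(a, f, C)]


a, b = [0.2, 0.3, 0.5], [0.4, 0.4, 0.2]
C = [[0.0, 0.5, 1.0], [0.5, 0.0, 0.5], [1.0, 0.5, 0.0]]
lam = 0.5
f, g = sinkhorn_kl(a, b, C, lam)
P_star = [[ai * bj * math.exp((fi + gj - cij) / lam - 1.0) for bj, gj, cij in zip(b, g, row)]
          for ai, fi, row in zip(a, f, C)]

gaps = {}
for name, (dpsi, inv, s_max) in PSI.items():
    C_tilde = transformed_cost(C, lam, f, g, dpsi)
    P_psi = solve_dual(a, b, C_tilde, lam, inv, s_max)
    gaps[name] = max(abs(p - q) for rp, rq in zip(P_psi, P_star) for p, q in zip(rp, rq))
    print(f"KL -> {name}: max |P_psi - P_star| = {gaps[name]:.2e}")
```

The transformed cost is built from the KL problem's optimal potentials. This illustrates why the equivalence, on its own, does not make the original problem cheaper to solve.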
Practical and Theoretical Implications
This equivalence has several significant implications:
- Unifying View: The equivalence provides a formal unification of regularization strategies in regularized OT, allowing the study of algorithmic and statistical behavior under different divergences via cost transformations.
- Computational Perspective: While the equivalence is structural, the transformed cost $\tilde c$ depends on the optimal dual potentials $f, g$ of the original problem, so computing it requires solving that problem first. The equivalence therefore does not by itself simplify or accelerate numerical computation.
- Theoretical Utility: The result can be leveraged to analyze properties (e.g., sparsity, robustness) associated with particular divergences via corresponding cost changes, and to transfer results between settings with different regularizations.
Future Research Directions
The authors’ theoretical construction points to several avenues for further research:
- Developing effective numerical methods for estimating the transformed cost in high-dimensional settings.
- Leveraging the equivalence to design transfer or adaptation methods in generative modeling or domain adaptation tasks using OT.
- Extending the equivalence to other relaxation frameworks, such as partial transport or unbalanced OT.
Conclusion
This paper offers a comprehensive theoretical framework for relating divergence-regularized OT problems through cost transformations. Under technical conditions on the divergence and cost, it shows that the choice of regularization divergence is not intrinsic to the minimizer: any such OT problem can be re-expressed with another divergence and an explicitly transformed cost function. The implications are chiefly theoretical, providing a unified lens through which to analyze regularization effects in OT. The direct computational consequences are limited, because the transformed cost depends on the optimal dual potentials. Even so, the results lay groundwork for future algorithmic and theoretical advances in optimal transport and its applications in machine learning and statistics.