Equivalence of optimal transport problems to regularization on the family of f-divergences

Published 14 Apr 2026 in math.ST | (2604.12996v1)

Abstract: This work establishes that an optimal transport (OT) problem regularized by a given $f$-divergence admits the same solution as another OT problem regularized by a different $g$-divergence, under an appropriate transformation of the cost function. This structural equivalence between OT problems regularized by distinct divergences, in the sense of sharing the same unique minimizer, is demonstrated within the framework of Polish spaces with bounded cost functions.


Authors (3)

Summary

The paper establishes that an optimal transport problem regularized by any f-divergence shares its unique optimal coupling with a problem regularized by a different divergence, provided the cost function is suitably transformed.
Using convex duality, it derives explicit transformations and closed-form optimality conditions via dual potentials under bounded cost conditions.
The findings unify different OT regularization methods and facilitate transferring theoretical results across divergences, with implications for generative modeling and domain adaptation.

Equivalence of Divergence-Regularized Optimal Transport Problems

Introduction

The paper "Equivalence of optimal transport problems to regularization on the family of f-divergences" (2604.12996) rigorously establishes that, for optimal transport (OT) problems regularized with $f$-divergences, the specific choice of divergence can be traded for a suitable transformation of the cost function without altering the unique optimal coupling. The equivalence is situated within the general framework of Polish spaces with bounded cost, and the results are derived under conditions that ensure both strong duality and the existence of optimal potentials. This work generalizes prior equivalence results from empirical risk minimization to a broader class of regularized OT problems.

Theoretical Framework

Divergence-Regularized OT

The regularized OT problem at the core of this work is formulated as:

$\inf_{P_{XY}\in \Pi(P_X, P_Y)} \int_{\mathcal{X}\times \mathcal{Y}} c(x, y)\, dP_{XY}(x, y) + \lambda D_\phi(P_{XY} \| P_X P_Y)$

where $D_\phi$ is an $f$-divergence generated by a Legendre-type convex function $\phi$, and the feasible couplings $P_{XY}$ share fixed marginals $P_X, P_Y$. Crucially, the $f$-divergence regularization renders the OT problem smooth and, depending on $\phi$, can enforce properties such as sparsity or robustness to outliers.
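For intuition, the objective above can be evaluated directly when both marginals are finitely supported. The following is a minimal sketch under that assumption, not the paper's construction: the function names (`phi_kl`, `regularized_objective`) are hypothetical, and the Kullback-Leibler generator $\phi(t) = t\log t - t + 1$ is used only as one example of a Legendre-type generator.

```python
import numpy as np

def phi_kl(t):
    """KL generator phi(t) = t*log(t) - t + 1, extended by phi(0) = 1."""
    t = np.asarray(t, dtype=float)
    safe = np.where(t > 0, t, 1.0)  # avoid log(0) in the unused branch
    return np.where(t > 0, t * np.log(safe) - t + 1.0, 1.0)

def regularized_objective(P, C, p, q, lam, phi=phi_kl):
    """Discrete analogue of  <C, P> + lam * D_phi(P || p x q).

    P   : (n, m) coupling with row sums p and column sums q
    C   : (n, m) bounded cost matrix
    lam : regularization weight (lambda in the text)
    Assumes p and q have strictly positive entries.
    """
    Q = np.outer(p, q)              # product reference measure P_X P_Y
    density = P / Q                 # Radon-Nikodym derivative dP / d(P_X P_Y)
    transport = np.sum(C * P)
    divergence = np.sum(Q * phi(density))
    return transport + lam * divergence

# Hypothetical toy data: the independent coupling has zero divergence,
# so its objective reduces to the plain transport cost.
p = np.array([0.5, 0.5])
q = np.array([0.25, 0.75])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
P_indep = np.outer(p, q)
value = regularized_objective(P_indep, C, p, q, lam=0.3)
```

Passing a different `phi` swaps the divergence while leaving the transport term unchanged, which is the degree of freedom the equivalence result addresses.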

Dual Formulation and Optimality

Leveraging convex duality, the authors derive the dual problem and establish the structure and uniqueness (up to additive constants) of its optimizers. The derivations rest on two technical conditions: $\phi$ must be a Legendre-type generator, and the cost $c$ must be bounded. Together these yield a closed-form expression for the optimal coupling and justify strong duality.

Particularly notable is the explicit characterization of the Radon-Nikodym derivative of the optimal coupling $P_{XY}^\star$ with respect to the product measure $P_X P_Y$. Up to the sign and normalization conventions chosen for the potentials, the first-order optimality condition of the regularized problem gives:

$\frac{dP_{XY}^\star}{d(P_X P_Y)}(x, y) = (\phi')^{-1}\!\left(\frac{u(x) + v(y) - c(x, y)}{\lambda}\right)$

for dual potentials $(u, v)$, which are unique up to additive constants.
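In the Kullback-Leibler case, where $\phi'(t) = \log t$ and hence $(\phi')^{-1}(s) = e^{s}$, the optimal density takes an exponential form in the potentials, and the potentials can be found by Sinkhorn-type alternating updates. The sketch below assumes finitely supported marginals and the KL generator; the function names are hypothetical, and Sinkhorn iteration is a standard technique swapped in for illustration rather than a method taken from the paper.

```python
import numpy as np

def sinkhorn_potentials(C, p, q, lam, n_iter=2000):
    """Alternating updates for potentials (u, v) of the KL-regularized problem
    with reference measure p x q. Each update enforces one marginal exactly."""
    u = np.zeros(len(p))
    v = np.zeros(len(q))
    for _ in range(n_iter):
        # Row constraint: sum_j q_j * exp((u_i + v_j - C_ij) / lam) = 1
        u = -lam * np.log(np.exp((v[None, :] - C) / lam) @ q)
        # Column constraint: sum_i p_i * exp((u_i + v_j - C_ij) / lam) = 1
        v = -lam * np.log(p @ np.exp((u[:, None] - C) / lam))
    return u, v

def coupling_from_potentials(C, p, q, u, v, lam):
    """Coupling whose density w.r.t. p x q is exp((u + v - C) / lam)."""
    density = np.exp((u[:, None] + v[None, :] - C) / lam)
    return np.outer(p, q) * density

# Hypothetical toy problem with a bounded cost in [0, 1].
rng = np.random.default_rng(1)
C = rng.uniform(0.0, 1.0, size=(3, 4))
p = np.full(3, 1.0 / 3.0)
q = np.full(4, 0.25)
u, v = sinkhorn_potentials(C, p, q, lam=0.5)
P = coupling_from_potentials(C, p, q, u, v, lam=0.5)
```

Adding a constant to `u` and subtracting it from `v` leaves `P` unchanged, which is the sense in which the potentials are only determined up to additive constants.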

Main Equivalence Result

The central theorem proves that, for any second Legendre-type generator $\psi$, there exists a bounded, transformed cost $\tilde{c}$ such that the $\phi$-regularized OT problem with cost $c$ is equivalent, in terms of the primal optimizer, to a $\psi$-regularized OT problem with cost $\tilde{c}$:

$\arg\min_{P_{XY}\in \Pi(P_X, P_Y)} \int_{\mathcal{X}\times \mathcal{Y}} c\, dP_{XY} + \lambda D_\phi(P_{XY} \| P_X P_Y) = \arg\min_{P_{XY}\in \Pi(P_X, P_Y)} \int_{\mathcal{X}\times \mathcal{Y}} \tilde{c}\, dP_{XY} + \lambda D_\psi(P_{XY} \| P_X P_Y)$

where the transformation of $c$ to $\tilde{c}$ is determined by the dual potentials of the original problem and the relationships between the derivatives and inverses of $\phi$ and $\psi$.

Explicitly, writing $(u, v)$ for the dual potentials of the $\phi$-regularized problem and matching the optimality conditions of the two problems gives, up to additive terms depending on $x$ alone or on $y$ alone,

$\tilde{c}(x, y) = -\lambda\, \psi'\!\left((\phi')^{-1}\!\left(\frac{u(x) + v(y) - c(x, y)}{\lambda}\right)\right)$

This result formally codifies the notion that the impact of the choice of regularization divergence can be fully absorbed into a non-linear cost transformation, such that the computed optimal coupling is unchanged.
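The claim can be checked numerically on a small discrete problem. The sketch below is an illustration under stated assumptions, not the paper's proof: it solves a KL-regularized problem by Sinkhorn iteration, builds a transformed cost for the reverse-KL generator $\psi(t) = -\log t + t - 1$ as $-\lambda\,\psi'$ evaluated at the optimal density (one admissible choice, with the new potentials set to zero), and then confirms that no marginal-preserving perturbation of the KL-optimal coupling lowers the reverse-KL objective. All names and the toy data are hypothetical.

```python
import numpy as np

def kl_coupling(C, p, q, lam, n_iter=2000):
    """Sinkhorn iterations for the KL-regularized problem (reference p x q)."""
    u = np.zeros(len(p))
    v = np.zeros(len(q))
    for _ in range(n_iter):
        u = -lam * np.log(np.exp((v[None, :] - C) / lam) @ q)
        v = -lam * np.log(p @ np.exp((u[:, None] - C) / lam))
    return np.outer(p, q) * np.exp((u[:, None] + v[None, :] - C) / lam)

def psi_rkl(t):
    """Reverse-KL generator psi(t) = -log(t) + t - 1."""
    return -np.log(t) + t - 1.0

def dpsi_rkl(t):
    """Derivative psi'(t) = 1 - 1/t."""
    return 1.0 - 1.0 / t

def objective(P, C, Q, lam, gen):
    """<C, P> + lam * D_gen(P || Q) for a strictly positive coupling P."""
    return np.sum(C * P) + lam * np.sum(Q * gen(P / Q))

rng = np.random.default_rng(0)
n, m, lam = 4, 5, 0.5
C = rng.uniform(0.0, 1.0, size=(n, m))   # bounded cost
p = np.full(n, 1.0 / n)
q = np.full(m, 1.0 / m)
Q = np.outer(p, q)

# Step 1: optimal coupling of the KL-regularized problem.
P_star = kl_coupling(C, p, q, lam)

# Step 2: transformed cost, so that C_tilde + lam * psi'(P_star / Q) = 0.
C_tilde = -lam * dpsi_rkl(P_star / Q)

# Step 3: perturb P_star along directions with zero row and column sums
# and record the change in the reverse-KL objective with cost C_tilde.
base = objective(P_star, C_tilde, Q, lam, psi_rkl)
gaps = []
for _ in range(200):
    a = rng.normal(size=n)
    a -= a.mean()
    b = rng.normal(size=m)
    b -= b.mean()
    D = np.outer(a, b)                        # preserves both marginals
    eps = 0.1 * P_star.min() / np.abs(D).max()  # keeps entries positive
    gaps.append(objective(P_star + eps * D, C_tilde, Q, lam, psi_rkl) - base)
min_gap = min(gaps)
```

Because the reverse-KL objective is strictly convex in the coupling, a non-negative `min_gap` over feasible directions is consistent with `P_star` also being the minimizer of the transformed problem. Note that `C_tilde` is built from `P_star` itself, so this check requires the original problem to be solved first.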

Implications and Examples

Numerical Equivalence

The paper provides explicit transformations for notable divergences:

Kullback-Leibler to reverse KL: The transformed cost involves exponentials of the original dual potentials and cost, yielding an explicit, bounded, nonlinear cost function.
Kullback-Leibler to Jensen-Shannon: The resulting cost transformation involves logarithmic expressions dependent on the original dual solutions.

In both cases, the unique primal minimizer (i.e., the optimal coupling) is preserved despite the change in divergence, confirming the strength of the equivalence claim.
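To make the first example concrete, the following worked derivation sketches the Kullback-Leibler to reverse-KL case. It assumes the generators $\phi(t) = t\log t - t + 1$ and $\psi(t) = -\log t + t - 1$, writes $(u, v)$ for the dual potentials of the KL-regularized problem, and fixes the new potentials to zero; the paper's own normalization may differ by terms depending on $x$ alone or on $y$ alone.

```latex
% Generators and their derivatives (assumed normalizations)
\phi(t) = t\log t - t + 1, \qquad \phi'(t) = \log t, \qquad (\phi')^{-1}(s) = e^{s},
\\
\psi(t) = -\log t + t - 1, \qquad \psi'(t) = 1 - \frac{1}{t}.

% Optimal density of the KL-regularized problem
r^\star(x, y) = \frac{dP_{XY}^\star}{d(P_X P_Y)}(x, y)
             = \exp\!\left(\frac{u(x) + v(y) - c(x, y)}{\lambda}\right).

% Transformed cost: choose \tilde{c} so that \tilde{c} + \lambda\,\psi'(r^\star) = 0
\tilde{c}(x, y) = -\lambda\,\psi'\!\left(r^\star(x, y)\right)
               = \lambda\left(\exp\!\left(\frac{c(x, y) - u(x) - v(y)}{\lambda}\right) - 1\right).
```

Since $c$, $u$ and $v$ are bounded, the exponent is bounded and so is $\tilde{c}$, which matches the description of an explicit, bounded, nonlinear cost built from exponentials of the original potentials and cost.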

Practical and Theoretical Implications

This equivalence has several significant implications:

Unifying View: The equivalence provides a formal unification of regularization strategies in regularized OT, allowing the study of algorithmic and statistical behavior under different divergences via cost transformations.
Computational Perspective: While the equivalence is structural, the transformed cost $\tilde{c}$ generally depends on the optimal dual variables of the original problem. Computing it therefore requires solving the original problem first, so the equivalence does not by itself simplify or accelerate numerical computation.
Theoretical Utility: The result can be leveraged to analyze properties (e.g., sparsity, robustness) associated with particular divergences via corresponding cost changes, and to transfer results between settings with different regularizations.

Future Research Directions

The authors’ theoretical construction points to several avenues for further research:

Developing effective numerical methods for estimating the transformed cost in high-dimensional settings.
Leveraging the equivalence to design transfer or adaptation methods in generative modeling or domain adaptation tasks using OT.
Extending the equivalence to other relaxation frameworks, such as partial transport or unbalanced OT.

Conclusion

This paper offers a comprehensive theoretical framework for relating divergence-regularized OT problems through cost transformations. Under technical conditions on the divergence and cost, it shows that the choice of regularization divergence is not intrinsic to the minimizer: any such OT problem can be re-expressed with another divergence and an explicitly transformed cost function. The implications are chiefly theoretical, providing a unified lens through which to analyze regularization effects in OT. While the direct computational consequences are limited by the dependence on dual optimality, the results lay groundwork for future algorithmic and theoretical advances in optimal transport and its applications in machine learning and statistics.
