Unbalanced Optimal Transport: Theory & Applications
- UOT is a generalization of optimal transport that compares measures with different total mass using divergence penalties.
- It achieves faster convergence by relaxing hard marginal constraints, improving complexity from 1/ε² to 1/ε under entropic regularization.
- UOT is widely applied in computational biology, imaging, and machine learning to effectively manage noisy or imbalanced data.
Unbalanced Optimal Transport (UOT) generalizes classical optimal transport by allowing the comparison of measures with different total mass and introducing penalties for deviations from prescribed marginals. This extension is critical in modern applications—such as computational biology, imaging, and machine learning—where the assumption of equal total mass is often violated. UOT achieves robustness to outliers, flexibility in modeling, and, with appropriate regularization and algorithmic advances, exhibits favorable complexity properties and convergence rates.
1. Core Formulation and Complexity of UOT
The regularized UOT problem considers two measures (with possibly different total mass), discrete support of size at most $n$, a ground cost matrix $C \in \mathbb{R}^{n \times n}$, and divergence penalties (typically Kullback–Leibler) on the marginals. The entropic regularized UOT objective takes the form

$$f(X) = \langle C, X \rangle + \eta H(X) + \tau\,\mathrm{KL}(X\mathbf{1}_n \,\|\, a) + \tau\,\mathrm{KL}(X^\top \mathbf{1}_n \,\|\, b), \qquad X \in \mathbb{R}_+^{n \times n},$$

where:
- $a, b \in \mathbb{R}_+^n$ are nonnegative vectors (marginals, with total mass $\|a\|_1$ and $\|b\|_1$),
- $H(X) = \sum_{ij} X_{ij}(\log X_{ij} - 1)$ is the negative entropy,
- $\tau > 0$ is the divergence penalty strength,
- $\eta > 0$ is the entropic regularization.
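For concreteness, this objective can be evaluated directly for any strictly positive plan. Below is a minimal NumPy sketch; the helper names `kl` and `uot_objective` and the toy data are illustrative, not from the paper:

```python
import numpy as np

def kl(p, q):
    """Generalized KL divergence for unnormalized measures:
    sum_i p_i log(p_i/q_i) - sum(p) + sum(q)."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])) - p.sum() + q.sum())

def uot_objective(X, C, a, b, tau, eta):
    """Entropic UOT objective: <C,X> + eta*H(X) + tau*KL(X1||a) + tau*KL(X^T1||b),
    with negative entropy H(X) = sum_ij X_ij (log X_ij - 1). Assumes X > 0 entrywise."""
    H = np.sum(X * (np.log(X) - 1.0))
    return float(np.sum(C * X) + eta * H
                 + tau * (kl(X.sum(axis=1), a) + kl(X.sum(axis=0), b)))

# Toy example: marginals with different total mass (1.0 vs 2.0).
rng = np.random.default_rng(0)
a = np.array([0.4, 0.3, 0.2, 0.1])
b = np.array([0.5, 0.5, 0.5, 0.5])
C = rng.random((4, 4))
X = np.outer(a, b) / b.sum()   # a strictly positive candidate plan
val = uot_objective(X, C, a, b, tau=1.0, eta=0.1)
```

Note that, unlike in balanced OT, the plan's marginals need not equal $a$ and $b$; deviations simply add to the KL penalty terms.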
The Sinkhorn algorithm for solving the entropic regularized UOT problem operates with complexity

$$\widetilde{O}\!\left(\frac{n^2}{\varepsilon}\right)$$

in the high-accuracy regime (fixed $n$, $\varepsilon \to 0$), as established by a geometric convergence proof on the dual variables. In contrast, the standard Sinkhorn for balanced OT scales as $\widetilde{O}(n^2/\varepsilon^2)$. This favorable linear dependence on $1/\varepsilon$ arises because the UOT solution does not have to meet exact marginal constraints, and the divergence regularizer “absorbs” some error, yielding faster convergence for high-accuracy solutions (Pham et al., 2020).
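Concretely, the Sinkhorn updates for KL-penalized UOT take a scaling form in which each half-step raises the usual balanced-Sinkhorn marginal ratio to the power $\tau/(\tau+\eta)$; the exponent tends to 1 as $\tau \to \infty$, recovering the balanced algorithm. A hedged NumPy sketch (function name and toy data are illustrative):

```python
import numpy as np

def sinkhorn_uot(C, a, b, tau, eta, n_iter=500):
    """Sinkhorn scaling iterations for entropic UOT with KL marginal penalties.
    Returns the plan X = diag(u) K diag(v) with Gibbs kernel K = exp(-C/eta)."""
    K = np.exp(-C / eta)
    u = np.ones_like(a)
    v = np.ones_like(b)
    fe = tau / (tau + eta)  # soft-constraint exponent; -> 1 recovers balanced OT
    for _ in range(n_iter):
        u = (a / (K @ v)) ** fe
        v = (b / (K.T @ u)) ** fe
    return u[:, None] * K * v[None, :]

# Toy data with intentionally different total masses.
rng = np.random.default_rng(1)
n = 5
a = rng.random(n) + 0.1
b = 2.0 * (rng.random(n) + 0.1)
C = rng.random((n, n))
X = sinkhorn_uot(C, a, b, tau=1.0, eta=0.05)
```

Each iteration costs two matrix-vector products with $K$, i.e. $O(n^2)$ arithmetic operations.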
2. Dual Convergence and Primal–Dual Coupling
A central step in the complexity analysis is the geometric contraction of the error in the dual potentials (measured in the $\ell_\infty$ norm), quantified as

$$\|u^{k+1} - u^{*}\|_\infty \le \frac{\tau}{\tau + \eta}\,\|u^{k} - u^{*}\|_\infty,$$

where the initial error depends logarithmically on the data and $\tau$, and the contraction factor $\tau/(\tau+\eta)$ makes the number of necessary iterations proportional to $(\tau/\eta)\log(1/\varepsilon)$. In the primal, the relationship between the function value and the dual error is established through the equation

$$f(X^{k}) - f(X^{*}) \lesssim m^{*}\,\big\|(u^{k}, v^{k}) - (u^{*}, v^{*})\big\|_\infty,$$

with $m^{*}$ being the total transported mass at optimum. Bounds on the differences in the primal across iterations, together with the dual error bound, ultimately show that the total number of arithmetic operations required for an $\varepsilon$-approximation is of order $\widetilde{O}(n^2/\varepsilon)$.
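The geometric contraction can be observed numerically: compute reference scalings with a long run, then restart and track the $\ell_\infty$ error of the log-scalings (which play the role of the dual potentials up to a factor of $\eta$). A sketch on illustrative toy data; the slack factor in the final check is loose relative to the theoretical per-iteration rate $\tau/(\tau+\eta)$:

```python
import numpy as np

def uot_step(u, v, K, a, b, fe):
    """One full Sinkhorn-UOT iteration in scaling form (fe = tau/(tau+eta))."""
    u = (a / (K @ v)) ** fe
    v = (b / (K.T @ u)) ** fe
    return u, v

rng = np.random.default_rng(2)
n = 6
a, b = rng.random(n) + 0.5, rng.random(n) + 0.5
C = rng.random((n, n))
tau, eta = 1.0, 0.5
K = np.exp(-C / eta)
fe = tau / (tau + eta)

# Reference dual scalings from a long run (treated as the fixed point).
u_star, v_star = np.ones(n), np.ones(n)
for _ in range(2000):
    u_star, v_star = uot_step(u_star, v_star, K, a, b, fe)

# Restart and record the dual error e_k = ||log u^k - log u*||_inf.
u, v = np.ones(n), np.ones(n)
errors = []
for _ in range(15):
    u, v = uot_step(u, v, K, a, b, fe)
    errors.append(float(np.max(np.abs(np.log(u) - np.log(u_star)))))
```

The recorded errors decay geometrically, consistent with the contraction bound above.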
3. Distinctions Between UOT and Classical OT
Unlike OT, which enforces strict equality constraints on the marginals, UOT relaxes these constraints by penalizing deviations using a $\varphi$-divergence (typically Kullback–Leibler). In OT, convergence can be monitored by direct inspection of marginal errors. In UOT, this feasibility check is not possible; instead, one leverages relationships between the objective value and total mass (the “key equation”) to certify convergence.
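One practical consequence: since marginal residuals no longer certify convergence, a natural stopping rule monitors a scalar surrogate such as the transported mass, which the key equation ties to the objective. A hedged sketch with illustrative names and tolerance:

```python
import numpy as np

def sinkhorn_uot_with_stop(C, a, b, tau, eta, tol=1e-9, max_iter=5000):
    """Run UOT Sinkhorn, stopping when the transported mass stabilizes.
    Balanced-OT marginal checks do not apply here, because the marginals
    are only softly enforced through the KL penalties."""
    K = np.exp(-C / eta)
    u = np.ones_like(a)
    v = np.ones_like(b)
    fe = tau / (tau + eta)
    last_mass = np.inf
    for it in range(max_iter):
        u = (a / (K @ v)) ** fe
        v = (b / (K.T @ u)) ** fe
        mass = float(u @ K @ v)  # total mass of X = diag(u) K diag(v)
        if abs(mass - last_mass) < tol:
            return u[:, None] * K * v[None, :], it + 1
        last_mass = mass
    return u[:, None] * K * v[None, :], max_iter

rng = np.random.default_rng(3)
n = 4
a = rng.random(n) + 0.2
b = rng.random(n) + 0.2
C = rng.random((n, n))
X, iters = sinkhorn_uot_with_stop(C, a, b, tau=1.0, eta=0.2)
```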
The notable differences can be summarized as:
| Property | OT | UOT |
|---|---|---|
| Marginal constraints | Hard (equality) | Soft (divergence penalties) |
| Verification | Marginal error | Objective-mass coupling |
| $\varepsilon$-dependence | Quadratic ($\widetilde{O}(n^2/\varepsilon^2)$) | Linear ($\widetilde{O}(n^2/\varepsilon)$) |
| Flexibility | No mass creation/destruction | Allows imbalance |
This relaxation in UOT not only makes the optimization more flexible (important for applications with variable total mass), but, critically, under entropic regularization, leads to the improved $\widetilde{O}(n^2/\varepsilon)$ scaling (Pham et al., 2020).
4. Algorithms and Theoretical Guarantees
The iteration complexity of the Sinkhorn-type algorithm for UOT is

$$\widetilde{O}\!\left(\frac{n^2}{\varepsilon}\right)$$

for fixed problem parameters. This complexity result arises by multiplying the per-iteration cost ($O(n^2)$, dominated by matrix-vector products with the kernel) by the number of iterations, itself controlled by the dual geometric contraction and the coupling in the primal.
The improved $1/\varepsilon$ scaling is significant in precision-demanding scenarios: as $\varepsilon \to 0$, the operation count for UOT grows markedly more slowly than that of balanced OT. This efficiency in the high-accuracy regime is practically relevant for tasks where the measures naturally have unbalanced or non-normalized mass.
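This accounting can be made concrete with a back-of-envelope calculation: contracting an initial dual error $R$ below $\varepsilon$ at rate $\tau/(\tau+\eta)$ takes about $\log(R/\varepsilon)/\log(1+\eta/\tau) \approx (\tau/\eta)\log(R/\varepsilon)$ iterations, and tying the entropic parameter to the target accuracy ($\eta = \varepsilon$, an illustrative choice) yields a near-linear $1/\varepsilon$ iteration count; multiplying by the $O(n^2)$ per-iteration work gives the total cost. A small Python sketch (all constants are arbitrary):

```python
import math

def uot_iterations(eps, tau=1.0, R=1.0):
    """Iterations needed to contract an initial dual error R below eps
    at geometric rate tau/(tau+eta), with eta tied to accuracy (eta = eps)."""
    eta = eps
    rate = tau / (tau + eta)
    return math.ceil(math.log(R / eps) / -math.log(rate))

def uot_total_cost(n, eps):
    """Total arithmetic cost: O(n^2) matrix-vector work per iteration."""
    return n * n * uot_iterations(eps)

k_coarse = uot_iterations(0.02)
k_fine = uot_iterations(0.01)
```

Halving $\varepsilon$ roughly doubles the iteration count (up to the logarithmic factor), in contrast to the quadrupling implied by a $1/\varepsilon^2$ rate.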
5. Practical Implications and Deployment Scenarios
In computational biology, imaging, and machine learning, UOT provides a more realistic and effective modeling tool than balanced OT. In practice, total mass often varies substantially across the measures being compared, and strict matching is both unnatural and numerically ill-conditioned. UOT exploits this flexibility, yielding not just theoretical benefits (faster convergence at high accuracy) but also practical implementation advantages:
- Simpler convergence certificates (based on the key coupling equation).
- Operationally straightforward stopping criteria.
- Robustness to unbalanced or noisy data, with divergence penalties absorbing outliers or small total mass mismatch.
- Broad utility in settings where mass preservation is inappropriate or impossible.
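The robustness claim can be illustrated on a toy line example: when one target point is a distant outlier carrying extra mass, the KL penalty lets the solver leave that mass unmatched instead of paying the large transport cost. A hedged sketch (points, masses, and parameters are illustrative):

```python
import numpy as np

def sinkhorn_uot(C, a, b, tau, eta, n_iter=2000):
    """Sinkhorn scaling iterations for entropic UOT with KL marginal penalties."""
    K = np.exp(-C / eta)
    u, v = np.ones_like(a), np.ones_like(b)
    fe = tau / (tau + eta)  # soft-constraint exponent tau/(tau+eta)
    for _ in range(n_iter):
        u = (a / (K @ v)) ** fe
        v = (b / (K.T @ u)) ** fe
    return u[:, None] * K * v[None, :]

# Source points at 0, 1, 2; targets add a distant outlier at 6 with extra mass.
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 2.0, 6.0])
C = (x[:, None] - y[None, :]) ** 2   # squared-distance ground cost
a = np.ones(3)   # source mass 3
b = np.ones(4)   # target mass 4: strict matching is impossible
X = sinkhorn_uot(C, a, b, tau=1.0, eta=0.5)
outlier_mass = float(X[:, -1].sum())   # mass actually sent to the outlier
inlier_mass = float(X[:, :-1].sum())
```

The outlier column receives a negligible fraction of the transported mass: its divergence penalty (roughly $\tau$ per unit of unmatched mass) is far cheaper than the quadratic transport cost of reaching it.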
6. Key Mathematical Formulas
The central results pivot around the following formulas:
- Dual contraction: $\|u^{k+1} - u^{*}\|_\infty \le \frac{\tau}{\tau+\eta}\,\|u^{k} - u^{*}\|_\infty$.
- Primal–dual “key equation”: $f(X^{k}) - f(X^{*}) \lesssim m^{*}\,\|(u^{k}, v^{k}) - (u^{*}, v^{*})\|_\infty$, with $m^{*}$ the total transported mass at optimum.
- Per-iteration complexity $O(n^2)$ and total cost $\widetilde{O}(n^2/\varepsilon)$.
7. Summary and Significance
By geometrically controlling dual convergence and relating the dual-variable error to the primal objective via a scaling law, the Sinkhorn algorithm for UOT delivers a complexity of $\widetilde{O}(n^2/\varepsilon)$. This is a marked improvement over the $\widetilde{O}(n^2/\varepsilon^2)$ scaling of balanced OT and becomes especially impactful for precision-sensitive large-scale data and domains where unbalanced measures are the rule rather than the exception. The result constitutes a fundamental theoretical and practical advance in computational optimal transport, enhancing both feasibility and efficiency in real-world applications (Pham et al., 2020).