Optimal Transport-Based Regularization

Updated 20 December 2025
  • Optimal Transport-Based Regularization is a framework that applies convex penalties to transport plans to induce low-dimensional, sparse, or smooth structures.
  • It leverages various penalties such as Schatten norms, entropy, and quadratic norms to address challenges in sample complexity, noise, and interpretability.
  • Efficient algorithms like mirror descent with KL projections enable scalable optimization with provable recovery guarantees in clustering and domain adaptation.

Optimal transport-based regularization encompasses a broad class of techniques in which convex penalties are imposed on the transport plan, map, or underlying potentials in order to induce low-dimensional, sparse, smooth, or otherwise desirable structure. These schemes extend classical OT to address challenges of sample complexity, statistical noise, computational tractability, interpretability, and robustness in learning and signal-processing applications. The precise effect of regularization depends on the choice of penalty (e.g., entropy, quadratic norm, Schatten norm, $f$-divergences, group sparsity, adaptive margin constraints, and others), the corresponding convex geometry, and the associated optimization algorithms.

1. Convex Formulations: Schatten-$p$ Norm and General Penalties

A unified convex-analytic formulation for regularized OT problems is

$$\min_{P\in U(a,b)} \;\langle C, P\rangle + \mathcal{R}(P)$$

where $U(a,b) = \{P \ge 0 : P\mathbf{1} = a,\; P^\top \mathbf{1} = b\}$ is the classical transport polytope, $C$ is the cost matrix, and $\mathcal{R}(P)$ is a convex regularizer.

The Schatten-$p$ norm regularization (Maunu, 13 Oct 2025) is defined via

$$\|M\|_{S_p} = \left( \sum_{i=1}^r \sigma_i^p \right)^{1/p}$$

with $\sigma_i$ the singular values of $M$, and leads to the convex program

$$\min_{P\in U(a,b)} \;\langle C, P \rangle + \lambda \,\|A(P)\|_{S_p}^q$$

for affine maps $A$ and parameters $\lambda, p, q$. Prominent examples include:

  • $A(P) = P$, $p = q = 1$: nuclear norm, promoting low-rank couplings,
  • $A(P) = P$, $p = q = 2$: Frobenius-norm penalty,
  • $A$ chosen to penalize barycentric displacements or other linear maps.

This framework encompasses quadratic ($L^2$), elastic, and other regularization schemes, and supports multi-term penalties $\sum_i \lambda_i \|A_i(P)\|_{S_{p_i}}^{q_i}$.
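As a concrete illustration, the following minimal NumPy sketch evaluates the Schatten-regularized objective for a given feasible plan, taking $A$ to be the identity map; all sizes and values here are placeholder inputs chosen for the example.

```python
import numpy as np

def schatten_norm(M, p):
    """Schatten-p norm: the l_p norm of the singular values of M."""
    sigma = np.linalg.svd(M, compute_uv=False)
    return np.sum(sigma ** p) ** (1.0 / p)

def regularized_ot_objective(P, C, lam, p, q):
    """<C, P> + lam * ||A(P)||_{S_p}^q, with A = identity."""
    return np.sum(C * P) + lam * schatten_norm(P, p) ** q

# Example: the independent coupling P = a b^T of uniform marginals.
n = 5
a = b = np.full(n, 1.0 / n)
P = np.outer(a, b)            # feasible: P1 = a and P^T 1 = b
C = np.random.rand(n, n)      # placeholder cost matrix
print(regularized_ot_objective(P, C, lam=0.1, p=1, q=1))
```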

Other notable convex OT regularizers include entropy, squared Frobenius, $f$-divergence, group sparsity, sum-of-norms, and Orlicz-type terms (Tsutsui, 2020, Maunu, 13 Oct 2025, Liu et al., 2022, Rahbar et al., 2019, Terjék et al., 2021, Lorenz et al., 2019, Dessein et al., 2016, Lorenz et al., 2020).

2. Theoretical Analysis and Recovery Guarantees

The convexity of Schatten-$p$ regularization enables direct optimality analysis (Maunu, 13 Oct 2025). For $p, q \ge 1$, the KKT conditions for

$$\min_{P\in U(a,b)} \;\langle C, P\rangle + \lambda \,\|A(P)\|_{S_p}^q$$

yield that $P^*$ solves the OT problem for the "tilted" cost $C + \lambda G^*$, where $G^* \in \partial \|A(P^*)\|_{S_p}^q$. The explicit form of the subgradient involves the SVD of $A(P^*)$.
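For orientation, consider the special case $p = q = 1$ with $A$ the identity. Writing the SVD of the optimal plan as $P^* = U \Sigma V^\top$, the standard characterization of the nuclear-norm subdifferential (a general convex-analysis fact, not specific to the OT setting) gives

$$\partial \|P^*\|_{S_1} = \left\{\, U V^\top + W \;:\; U^\top W = 0,\; W V = 0,\; \|W\|_{\mathrm{op}} \le 1 \,\right\},$$

so the tilted cost takes the form $C + \lambda (U V^\top + W)$ for some such $W$.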

Key low-rank recovery results include:

  • Block-structure recovery: If $\mu, \nu$ each have $R$ clusters and the cost matrix has sufficient cluster separation ($\Delta_{\min}$), nuclear-norm penalization ($p = q = 1$) with $\lambda < g\Delta_{\min}$ provably yields the rank-$R$ block-diagonal plan matching clusters uniformly (a numerical construction of this plan follows the list below):

    $$P^* = \sum_{t=1}^R \frac{1}{g}\, 1_{S_t} 1_{T_t}^\top$$

  • Rank-1 barycentric map recovery: With sources and targets structured along subspaces, Schatten-1 penalization of the displacement matrix $A(P)$ produces rank-1 barycentric structure for $0 \le \lambda < \Lambda - 2\rho$, where $\Lambda$ is the minimal source separation.
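The block-structure result can be made concrete with a small construction, a minimal sketch assuming $R$ equal-sized clusters on each side and choosing the normalization $g$ so that both marginals are uniform (cluster indices are taken contiguous purely for simplicity):

```python
import numpy as np

# Hypothetical sizes: R equal clusters of size s on each side, n = R * s.
R, s = 3, 4
n = R * s
g = R * s ** 2                 # makes both marginals uniform (1/n per point)

P = np.zeros((n, n))
for t in range(R):
    idx = slice(t * s, (t + 1) * s)
    P[idx, idx] = 1.0 / g      # block t: (1/g) * 1_{S_t} 1_{T_t}^T

assert np.allclose(P.sum(axis=1), 1.0 / n)  # row marginal a is uniform
assert np.allclose(P.sum(axis=0), 1.0 / n)  # column marginal b is uniform
print(np.linalg.matrix_rank(P))             # rank R, matching the theory
```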

3. Algorithmic Approaches: Mirror Descent and KL Projections

For large-scale problems, efficient algorithms are essential. The mirror descent paradigm with a KL (negative-entropy) mirror map is central for Schatten-$p$ and other convex regularizations (Maunu, 13 Oct 2025).

Mirror descent iteration for Schatten OT:

  1. Initialize $P^0 \in U(a,b)$.
  2. For $k = 0, 1, \dots$:
    • Compute the SVD $A(P^k) = U^k \operatorname{diag}(\sigma^k) (V^k)^\top$,
    • Construct the regularizer subgradient $G^k = q\,\|A(P^k)\|_{S_p}^{q-p}\, A^*\!\big(U^k \operatorname{diag}\!\big((\sigma^k)^{p-1}\big) (V^k)^\top\big)$,
    • Update: $\widehat{P}_{ij} = P^k_{ij} \exp\!\big(-\tau^k (C_{ij} + \lambda G^k_{ij})\big)$, a mirror step on the full objective,
    • KL-projection: $P^{k+1} = \arg\min_{P \in U(a,b)} \mathrm{KL}(P \,\|\, \widehat{P})$.

The KL-projection is performed by Sinkhorn scaling. For general convex $p, q \ge 1$, a step size $\tau^k \propto 1/\sqrt{k}$ ensures $O(1/\sqrt{T})$ convergence. In the regime of sharp minima (e.g., low-rank recovery with $p = q = 1$), a geometrically decaying $\tau^k$ yields linear convergence.
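A compact NumPy sketch of this loop is given below, specializing to $A$ the identity and $p = q = 1$ (nuclear norm) and using a fixed number of Sinkhorn iterations for the KL-projection; it is an illustration of the scheme above under these simplifying assumptions, not a reference implementation.

```python
import numpy as np

def kl_projection(P_hat, a, b, n_iter=200):
    """KL-project P_hat onto U(a, b) via Sinkhorn scaling."""
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (P_hat @ v)
        v = b / (P_hat.T @ u)
    return P_hat * np.outer(u, v)

def mirror_descent_schatten1(C, a, b, lam=0.1, n_steps=100):
    """Mirror descent for min <C, P> + lam * ||P||_{S_1} over U(a, b)."""
    P = np.outer(a, b)                                    # feasible start
    for k in range(1, n_steps + 1):
        U, _, Vt = np.linalg.svd(P, full_matrices=False)  # SVD of A(P^k) = P^k
        G = U @ Vt                                        # nuclear-norm subgradient
        tau = 1.0 / np.sqrt(k)                            # step size, O(1/sqrt(T)) regime
        P_hat = P * np.exp(-tau * (C + lam * G))          # entropic mirror step
        P = kl_projection(P_hat, a, b)                    # project back onto U(a, b)
    return P
```

In the sharp-minima regime discussed above, the `1.0 / np.sqrt(k)` schedule would be replaced by a geometrically decaying one.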

This methodology enables scalable optimization (up to $n = 1000$) with low-rank SVDs and Sinkhorn subroutines; it admits practical extension to generic convex regularizers via dual or alternating projection techniques (Tsutsui, 2020, Dessein et al., 2016).

4. Empirical Properties and Practical Benefits

Experimental evidence demonstrates the efficacy of Schatten-$p$ regularization in both synthetic and real datasets (Maunu, 13 Oct 2025):

  • Synthetic cluster data: Schatten-1 (nuclear) regularization sharply reduces the effective rank of the transport plan (one common definition of effective rank is sketched after this list), imposing block-diagonal structure with minimal increase in transport cost; Schatten-2 (Frobenius) provides a more gradual decrease in rank.
  • Barycentric displacement models: Rank of barycentric maps is similarly suppressed.
  • Cell perturbation data: On 4i proteomics (CellOT), Schatten-1 regularization yields significant compression (rank reduction) in both the transport plan and barycentric displacement, while preserving cost at levels comparable to classical entropic-regularized OT.
  • Scalability: The mirror-descent framework scales efficiently via alternating Sinkhorn and SVD steps.
  • Convergence: Linear convergence is attained under strong regularization; sublinear in highly ill-conditioned settings.
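The effective rank referenced in these experiments can be quantified in several ways; one common choice, assumed here for illustration rather than taken from the source, is the entropy-based effective rank of Roy & Vetterli (2007):

```python
import numpy as np

def effective_rank(M, eps=1e-12):
    """Entropy-based effective rank: exp of the Shannon entropy of the
    normalized singular-value distribution (Roy & Vetterli, 2007)."""
    sigma = np.linalg.svd(M, compute_uv=False)
    p = sigma / (sigma.sum() + eps)
    p = p[p > eps]
    return float(np.exp(-np.sum(p * np.log(p))))
```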

5. Connections to Other Regularization Schemes

Schatten-$p$ OT regularization generalizes several classic and emerging approaches:

  • Entropy (KL) Regularization (Sinkhorn): Promotes full-rank, dense plans and enables fast, scalable solvers (Cuturi et al., 2018, Dessein et al., 2016).
  • Quadratic/Frobenius Regularization: Yields sparse transport plans with explicit dual structure and enables Newton-type solvers (Lorenz et al., 2019, Essid et al., 2017); both the entropic and quadratic cases are illustrated in the sketch after this list.
  • Sum-of-Norms/Group-Sparse Regularization: Encourages block-structured or class-preserving coupling matrices (Rahbar et al., 2019).
  • $f$-divergence Regularization: Trades off sparsity, robustness, and smoothness via the choice of divergence class (Terjék et al., 2021, Nakamura et al., 2022).
  • Orlicz-space Regularization: Generalizes entropic and $L^p$ penalties to Orlicz norms, broadening the design space of convex regularizers (Lorenz et al., 2020).
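For context, the entropic and quadratic cases are directly available in standard tooling. The sketch below uses the POT library's `ot.sinkhorn` and `ot.smooth.smooth_ot_dual` routines (interfaces as found in recent POT releases; treat the exact signatures as assumptions) to contrast the dense entropic plan with the typically sparse quadratic one.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

n = 50
a = b = np.full(n, 1.0 / n)          # uniform marginals
x, y = np.random.rand(n, 1), np.random.rand(n, 1)
C = ot.dist(x, y)                    # squared Euclidean cost matrix

P_ent = ot.sinkhorn(a, b, C, reg=1e-2)                         # entropic: dense plan
P_l2 = ot.smooth.smooth_ot_dual(a, b, C, 1e-2, reg_type='l2')  # quadratic: sparse plan

# Fraction of (near-)nonzero entries in each plan.
print((P_ent > 1e-8).mean(), (P_l2 > 1e-8).mean())
```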

A tabular summary of prominent regularizers:

| Regularizer | Effect on Plan | Example Parameter |
| --- | --- | --- |
| Nuclear/Schatten-1 | Low-rank | $p=q=1$, $A(P)=P$ |
| Frobenius/Schatten-2 | Smooth, gradual rank decrease | $p=q=2$ |
| Entropy (KL) | Dense, full-rank | $\mathcal{R}(P) = \mathrm{KL}(P)$ |
| Group/Block lasso | Block-sparsity | Sum-of-norms |

6. Extensions and Applications

Schatten-$p$ and OT-based regularization techniques support further extensions and real-world deployments:

  • Barycentric regularization: Penalizing barycentric displacement matrices via Schatten norm induces interpretable low-rank structure in pushforward maps (Maunu, 13 Oct 2025).
  • Domain adaptation: Low-rank or group-sparsity regularizers improve OT-based domain adaptation and class-preserving transport (Assel et al., 2023, Rahbar et al., 2019).
  • Robustness: ff-divergence and β\beta-potential regularizations increase robustness to outliers or noise in transport problems (Nakamura et al., 2022).
  • High-dimensional statistical learning: Schatten and similar regularizers yield plans and maps with controlled statistical complexity, mitigating the curse of dimensionality (Paty et al., 2019).

7. Summary and Outlook

Optimal transport-based regularization, and in particular the Schatten-$p$ norm framework, provides principled machinery to induce low-dimensional, interpretable, and robust structure in transport plans and transport-induced maps. Its convexity enables explicit optimality analysis and tractable algorithms that scale to thousands of points. These methods yield rigorous recovery guarantees in clustered and low-dimensional regimes, with empirical efficacy for both synthetic and large-scale biological data (Maunu, 13 Oct 2025). The approach unifies and generalizes a wide array of regularization architectures, with ongoing research extending these principles to more complex and high-dimensional applications across machine learning, computational biology, and signal processing.
