Optimal Transport-Based Regularization

Updated 20 December 2025
  • Optimal Transport-Based Regularization is a framework that applies convex penalties to transport plans to induce low-dimensional, sparse, or smooth structures.
  • It leverages various penalties such as Schatten norms, entropy, and quadratic norms to address challenges in sample complexity, noise, and interpretability.
  • Efficient algorithms like mirror descent with KL projections enable scalable optimization with provable recovery guarantees in clustering and domain adaptation.

Optimal transport-based regularization encompasses a broad class of techniques in which convex penalties are imposed on the transport plan, map, or underlying potentials in order to induce low-dimensional, sparse, smooth, or otherwise desirable structure. These schemes extend classical OT to address challenges of sample complexity, statistical noise, computational tractability, interpretability, and robustness in learning and signal-processing applications. The precise effect of regularization depends on the choice of penalty (e.g., entropy, quadratic norm, Schatten norm, $f$-divergences, group sparsity, adaptive margin constraints, and others), the corresponding convex geometry, and the associated optimization algorithms.

1. Convex Formulations: Schatten-$p$ Norm and General Penalties

A unified convex-analytic formulation for regularized OT problems is

$$\min_{P\in U(a,b)} \;\langle C, P\rangle + \mathcal{R}(P)$$

where $U(a,b) = \{P \ge 0 : P\mathbf{1} = a,\; P^\top \mathbf{1} = b\}$ is the classical transport polytope, $C$ is the cost matrix, and $\mathcal{R}(P)$ is a convex regularizer.

The Schatten-$p$ norm regularization (Maunu, 13 Oct 2025) is defined via

$$\|M\|_{S_p} = \left( \sum_{i=1}^r \sigma_i^p \right)^{1/p}$$

with $\sigma_i$ the singular values of $M$, and leads to the convex program

$$\min_{P\in U(a,b)} \;\langle C, P \rangle + \lambda \,\|A(P)\|_{S_p}^q$$

for affine maps $A$ and parameters $\lambda, p, q$. Prominent examples include:

  • $A(P) = P$, $p = q = 1$: nuclear norm, promoting low-rank couplings,
  • $A(P) = P$, $p = q = 2$: Frobenius-norm penalty,
  • $A$ chosen to penalize barycentric displacements or other linear maps.

This framework encompasses quadratic ($L^2$), elastic, and other regularization schemes, and supports multi-term penalties $\sum_i \lambda_i \|A_i(P)\|_{S_{p_i}}^{q_i}$.
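As a concrete illustration, the following minimal NumPy sketch evaluates the Schatten-regularized objective for a given feasible plan, taking $A$ to be the identity map; all sizes and values here are placeholder inputs chosen for the example.

```python
import numpy as np

def schatten_norm(M, p):
    """Schatten-p norm: the l_p norm of the singular values of M."""
    sigma = np.linalg.svd(M, compute_uv=False)
    return np.sum(sigma ** p) ** (1.0 / p)

def regularized_ot_objective(P, C, lam, p, q):
    """<C, P> + lam * ||A(P)||_{S_p}^q, with A = identity."""
    return np.sum(C * P) + lam * schatten_norm(P, p) ** q

# Example: the independent coupling P = a b^T of uniform marginals.
n = 5
a = b = np.full(n, 1.0 / n)
P = np.outer(a, b)            # feasible: P1 = a and P^T 1 = b
C = np.random.rand(n, n)      # placeholder cost matrix
print(regularized_ot_objective(P, C, lam=0.1, p=1, q=1))
```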

Other notable convex OT regularizers include entropy, squared Frobenius, $f$-divergence, group sparsity, sum-of-norms, and Orlicz-type terms (Tsutsui, 2020, Maunu, 13 Oct 2025, Liu et al., 2022, Rahbar et al., 2019, Terjék et al., 2021, Lorenz et al., 2019, Dessein et al., 2016, Lorenz et al., 2020).

2. Theoretical Analysis and Recovery Guarantees

The convexity of Schatten-$p$ regularization enables direct optimality analysis (Maunu, 13 Oct 2025). For $p, q \ge 1$, the KKT conditions for

$$\min_{P\in U(a,b)} \;\langle C, P\rangle + \lambda \,\|A(P)\|_{S_p}^q$$

yield that $P^*$ solves the OT problem for the "tilted" cost $C + \lambda G^*$, where $G^* \in \partial \|A(P^*)\|_{S_p}^q$. The explicit form of the subgradient involves the SVD of $A(P^*)$.
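For orientation, consider the special case $p = q = 1$ with $A$ the identity. Writing the SVD of the optimal plan as $P^* = U \Sigma V^\top$, the standard characterization of the nuclear-norm subdifferential (a general convex-analysis fact, not specific to the OT setting) gives

$$\partial \|P^*\|_{S_1} = \left\{\, U V^\top + W \;:\; U^\top W = 0,\; W V = 0,\; \|W\|_{\mathrm{op}} \le 1 \,\right\},$$

so the tilted cost takes the form $C + \lambda (U V^\top + W)$ for some such $W$.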

Key low-rank recovery results include:

  • Block-structure recovery: If $\mu, \nu$ each have $R$ clusters and the cost matrix has sufficient cluster separation ($\Delta_{\min}$), nuclear-norm penalization ($p = q = 1$) with $\lambda < g\Delta_{\min}$ provably yields the rank-$R$ block-diagonal plan matching clusters uniformly (a numerical construction of this plan follows the list below):

    $$P^* = \sum_{t=1}^R \frac{1}{g}\, 1_{S_t} 1_{T_t}^\top$$

  • Rank-1 barycentric map recovery: With sources and targets structured along subspaces, Schatten-1 penalization of the displacement matrix $A(P)$ produces rank-1 barycentric structure for $0 \le \lambda < \Lambda - 2\rho$, where $\Lambda$ is the minimal source separation.
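The block-structure result can be made concrete with a small construction, a minimal sketch assuming $R$ equal-sized clusters on each side and choosing the normalization $g$ so that both marginals are uniform (cluster indices are taken contiguous purely for simplicity):

```python
import numpy as np

# Hypothetical sizes: R equal clusters of size s on each side, n = R * s.
R, s = 3, 4
n = R * s
g = R * s ** 2                 # makes both marginals uniform (1/n per point)

P = np.zeros((n, n))
for t in range(R):
    idx = slice(t * s, (t + 1) * s)
    P[idx, idx] = 1.0 / g      # block t: (1/g) * 1_{S_t} 1_{T_t}^T

assert np.allclose(P.sum(axis=1), 1.0 / n)  # row marginal a is uniform
assert np.allclose(P.sum(axis=0), 1.0 / n)  # column marginal b is uniform
print(np.linalg.matrix_rank(P))             # rank R, matching the theory
```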

3. Algorithmic Approaches: Mirror Descent and KL Projections

For large-scale problems, efficient algorithms are essential. The mirror descent paradigm with a KL (negative-entropy) mirror map is central for Schatten-$p$ and other convex regularizations (Maunu, 13 Oct 2025).

Mirror descent iteration for Schatten OT:

  1. Initialize $P^0 \in U(a,b)$.
  2. For $k = 0, 1, \dots$:
    • Compute the SVD $A(P^k) = U^k \operatorname{diag}(\sigma^k) (V^k)^\top$,
    • Construct the regularizer subgradient $G^k = q\,\|A(P^k)\|_{S_p}^{q-p}\, A^*\!\big(U^k \operatorname{diag}\!\big((\sigma^k)^{p-1}\big) (V^k)^\top\big)$,
    • Update: $\widehat{P}_{ij} = P^k_{ij} \exp\!\big(-\tau^k (C_{ij} + \lambda G^k_{ij})\big)$, a mirror step on the full objective,
    • KL-projection: $P^{k+1} = \arg\min_{P \in U(a,b)} \mathrm{KL}(P \,\|\, \widehat{P})$.

The KL-projection is performed by Sinkhorn scaling. For general convex $p, q \ge 1$, a step size $\tau^k \propto 1/\sqrt{k}$ ensures $O(1/\sqrt{T})$ convergence. In the regime of sharp minima (e.g., low-rank recovery with $p = q = 1$), a geometrically decaying $\tau^k$ yields linear convergence.
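A compact NumPy sketch of this loop is given below, specializing to $A$ the identity and $p = q = 1$ (nuclear norm) and using a fixed number of Sinkhorn iterations for the KL-projection; it is an illustration of the scheme above under these simplifying assumptions, not a reference implementation.

```python
import numpy as np

def kl_projection(P_hat, a, b, n_iter=200):
    """KL-project P_hat onto U(a, b) via Sinkhorn scaling."""
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (P_hat @ v)
        v = b / (P_hat.T @ u)
    return P_hat * np.outer(u, v)

def mirror_descent_schatten1(C, a, b, lam=0.1, n_steps=100):
    """Mirror descent for min <C, P> + lam * ||P||_{S_1} over U(a, b)."""
    P = np.outer(a, b)                                    # feasible start
    for k in range(1, n_steps + 1):
        U, _, Vt = np.linalg.svd(P, full_matrices=False)  # SVD of A(P^k) = P^k
        G = U @ Vt                                        # nuclear-norm subgradient
        tau = 1.0 / np.sqrt(k)                            # step size, O(1/sqrt(T)) regime
        P_hat = P * np.exp(-tau * (C + lam * G))          # entropic mirror step
        P = kl_projection(P_hat, a, b)                    # project back onto U(a, b)
    return P
```

In the sharp-minima regime discussed above, the `1.0 / np.sqrt(k)` schedule would be replaced by a geometrically decaying one.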

This methodology enables scalable optimization (up to $n = 1000$) with low-rank SVDs and Sinkhorn subroutines; it admits practical extension to generic convex regularizers via dual or alternating projection techniques (Tsutsui, 2020, Dessein et al., 2016).

4. Empirical Properties and Practical Benefits

Experimental evidence demonstrates the efficacy of Schatten-$p$ regularization in both synthetic and real datasets (Maunu, 13 Oct 2025):

  • Synthetic cluster data: Schatten-1 (nuclear) regularization sharply reduces the effective rank of the transport plan (one common definition of effective rank is sketched after this list), imposing block-diagonal structure with minimal increase in transport cost; Schatten-2 (Frobenius) provides a more gradual decrease in rank.
  • Barycentric displacement models: Rank of barycentric maps is similarly suppressed.
  • Cell perturbation data: On 4i proteomics (CellOT), Schatten-1 regularization yields significant compression (rank reduction) in both the transport plan and barycentric displacement, while preserving cost at levels comparable to classical entropic-regularized OT.
  • Scalability: The mirror-descent framework scales efficiently via alternating Sinkhorn and SVD steps.
  • Convergence: Linear convergence is attained under strong regularization; sublinear in highly ill-conditioned settings.
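The effective rank referenced in these experiments can be quantified in several ways; one common choice, assumed here for illustration rather than taken from the source, is the entropy-based effective rank of Roy & Vetterli (2007):

```python
import numpy as np

def effective_rank(M, eps=1e-12):
    """Entropy-based effective rank: exp of the Shannon entropy of the
    normalized singular-value distribution (Roy & Vetterli, 2007)."""
    sigma = np.linalg.svd(M, compute_uv=False)
    p = sigma / (sigma.sum() + eps)
    p = p[p > eps]
    return float(np.exp(-np.sum(p * np.log(p))))
```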

5. Connections to Other Regularization Schemes

Schatten-$p$ OT regularization generalizes several classic and emerging approaches:

  • Entropy (KL) Regularization (Sinkhorn): Promotes full-rank, dense plans and enables fast, scalable solvers (Cuturi et al., 2018, Dessein et al., 2016).
  • Quadratic/Frobenius Regularization: Yields sparse transport plans with explicit dual structure and enables Newton-type solvers (Lorenz et al., 2019, Essid et al., 2017); both the entropic and quadratic cases are illustrated in the sketch after this list.
  • Sum-of-Norms/Group-Sparse Regularization: Encourages block-structured or class-preserving coupling matrices (Rahbar et al., 2019).
  • $f$-divergence Regularization: Trades off sparsity, robustness, and smoothness via the choice of divergence class (Terjék et al., 2021, Nakamura et al., 2022).
  • Orlicz-space Regularization: Generalizes entropic and $L^p$ penalties to Orlicz norms, broadening the design space of convex regularizers (Lorenz et al., 2020).
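For context, the entropic and quadratic cases are directly available in standard tooling. The sketch below uses the POT library's `ot.sinkhorn` and `ot.smooth.smooth_ot_dual` routines (interfaces as found in recent POT releases; treat the exact signatures as assumptions) to contrast the dense entropic plan with the typically sparse quadratic one.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

n = 50
a = b = np.full(n, 1.0 / n)          # uniform marginals
x, y = np.random.rand(n, 1), np.random.rand(n, 1)
C = ot.dist(x, y)                    # squared Euclidean cost matrix

P_ent = ot.sinkhorn(a, b, C, reg=1e-2)                         # entropic: dense plan
P_l2 = ot.smooth.smooth_ot_dual(a, b, C, 1e-2, reg_type='l2')  # quadratic: sparse plan

# Fraction of (near-)nonzero entries in each plan.
print((P_ent > 1e-8).mean(), (P_l2 > 1e-8).mean())
```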

A tabular summary of prominent regularizers:

| Regularizer | Effect on Plan | Example Parameter |
| --- | --- | --- |
| Nuclear/Schatten-1 | Low-rank | $p=q=1$, $A(P)=P$ |
| Frobenius/Schatten-2 | Smooth, gradual rank decrease | $p=q=2$ |
| Entropy (KL) | Dense, full-rank | $\mathcal{R}(P) = \mathrm{KL}(P)$ |
| Group/Block lasso | Block-sparsity | Sum-of-norms |

6. Extensions and Applications

Schatten-$p$ and OT-based regularization techniques support further extensions and real-world deployments:

  • Barycentric regularization: Penalizing barycentric displacement matrices via Schatten norm induces interpretable low-rank structure in pushforward maps (Maunu, 13 Oct 2025).
  • Domain adaptation: Low-rank or group-sparsity regularizers improve OT-based domain adaptation and class-preserving transport (Assel et al., 2023, Rahbar et al., 2019).
  • Robustness: ff-divergence and β\beta-potential regularizations increase robustness to outliers or noise in transport problems (Nakamura et al., 2022).
  • High-dimensional statistical learning: Schatten and similar regularizers yield plans and maps with controlled statistical complexity, mitigating the curse of dimensionality (Paty et al., 2019).

7. Summary and Outlook

Optimal transport-based regularization, and in particular the Schatten-$p$ norm framework, provides principled machinery to induce low-dimensional, interpretable, and robust structure in transport plans and transport-induced maps. Its convexity enables explicit optimality analysis and tractable algorithms that scale to thousands of points. These methods yield rigorous recovery guarantees in clustered and low-dimensional regimes, with empirical efficacy for both synthetic and large-scale biological data (Maunu, 13 Oct 2025). The approach unifies and generalizes a wide array of regularization architectures, with ongoing research extending these principles to more complex and high-dimensional applications across machine learning, computational biology, and signal processing.
