Papers
Topics
Authors
Recent
Search
2000 character limit reached

Sliced-Regularized Optimal Transport

Published 27 Apr 2026 in stat.ML and cs.LG | (2604.23944v2)

Abstract: We propose a new regularized optimal transport (OT) formulation, termed sliced-regularized optimal transport (SROT). Unlike entropic OT (EOT), which regularizes the transport plan toward an independent coupling, SROT regularizes it toward a smoothened sliced OT (SOT) plan. To the best of our knowledge, SROT is the first approach to leverage a version of SOT plan as a reference to improve classical OT. We provide a formal definition of SROT, derive its dual formulation, and provide a post-Bayesian interpretation of SROT. We then develop a Sinkhorn-style algorithm for efficient computation, retaining the same scalability advantages as EOT. By incorporating a scalable SOT plan as a prior, SROT yields more accurate approximations of the exact OT plan than EOT under the same level of regularization. Moreover, the resulting transport plan improves upon the reference SOT plan itself. We further introduce the corresponding OT divergence induced by SROT, named SROT divergence, and analyze its topological and computational properties. Finally, we validate our approach through experiments on synthetic datasets and color transfer tasks, demonstrating that SROT is better than both EOT and SOT in approximating exact OT. Additional experiments on gradient flows further highlight the advantages of SROT divergence.

Authors (1)

Summary

  • The paper presents SROT as an innovative formulation that leverages a sliced OT (SOT) prior to achieve higher fidelity to exact optimal transport while maintaining Sinkhorn scalability.
  • Empirical results on synthetic datasets, color transfer, and gradient flows demonstrate that SROT consistently reduces bias compared to traditional entropic OT under varying regularization.
  • The study offers deep theoretical insights and practical algorithms for integrating informative priors in OT, setting the stage for advances in generative modeling and statistical inference.

Sliced-Regularized Optimal Transport: Formulation, Theory, and Empirical Evaluation

Motivation and Problem Context

Optimal transport (OT) and its computational relaxations—most notably entropic OT (EOT)—have become foundational for mathematical modeling across machine learning, statistics, and computer vision. EOT leverages entropic regularization which, by penalizing deviation from the independent coupling, enables efficient computation via the Sinkhorn algorithm and yields the Sinkhorn divergence, a metric with favorable statistical and computational properties. However, the independent coupling prior used in EOT is typically non-informative with respect to the structure of the true OT plan, leading to regularization-induced bias especially at moderate ε\varepsilon. While recent extensions (quadratic, sparse, or low-rank regularization) encode desirable inductive biases, they do not explicitly favor plans close to the true OT solution and can thus degrade transport geometry.

The paper "Sliced-Regularized Optimal Transport" (2604.23944) addresses this limitation by proposing a novel regularized OT formulation—Sliced-Regularized Optimal Transport (SROT)—which regularizes towards a sliced OT (SOT) plan, a proxy for the true OT plan derived from aggregating one-dimensional projections. SROT replaces the independent prior with a smoothened SOT plan, aiming for higher fidelity to the exact OT geometry at equivalent regularization strengths, while retaining Sinkhorn scalability.

Formulation and Theoretical Foundations

SROT modifies the classical entropic regularization framework by introducing a SOT plan πSOT\pi^{\mathrm{SOT}} as the KL prior for regularization, leading to the optimization:

πε,SOT=argminπΠ(μ,ν)c(x,y)dπ(x,y)+εKL(ππSOT)\pi^\star_{\varepsilon,\mathrm{SOT}} = \arg\min_{\pi \in \Pi(\mu, \nu)} \int c(x, y)\, d\pi(x,y) + \varepsilon\,\mathrm{KL}\left(\pi \mid \pi^{\mathrm{SOT}}\right)

As ε0\varepsilon \to 0, SROT recovers the exact OT; as ε\varepsilon \to \infty, it converges to the SOT plan. To ensure full support, the prior is practical as a convex combination of the SOT plan and the independent coupling, though empirical evidence supports success with the pure SOT prior in most cases.

The dual formulation mirrors EOT, but integrates with respect to the SOT reference coupling. The optimal plan admits the closed-form:

dπε(x,y)=exp(f(x)+g(y)c(x,y)ε)dπSOT(x,y)d\pi^\star_\varepsilon(x, y) = \exp\left(\frac{f^\star(x) + g^\star(y) - c(x,y)}{\varepsilon}\right) d\pi^{\mathrm{SOT}}(x,y)

where (f,g)(f^\star, g^\star) are dual potentials.

The SROT divergence, defined analogously to the Sinkhorn divergence, is constructed as:

Sε,SOT(μ,ν)=OTε,SOT(μ,ν)12OTε,SOT(μ,μ)12OTε,SOT(ν,ν)\mathcal{S}_{\varepsilon,\mathrm{SOT}}(\mu, \nu) = \mathrm{OT}_{\varepsilon,\mathrm{SOT}}(\mu, \nu) - \frac{1}{2} \mathrm{OT}_{\varepsilon,\mathrm{SOT}}(\mu, \mu) - \frac{1}{2} \mathrm{OT}_{\varepsilon,\mathrm{SOT}}(\nu, \nu)

The divergence is proved symmetric, non-negative, and metrizes weak convergence for any ε>0\varepsilon > 0, thus suitable for estimation, alignment, and generative modeling tasks.

Computational Algorithms and Sinkhorn Scaling

SROT retains the computational efficiency of Sinkhorn scaling. The SOT plan in the discrete setting is efficiently constructed via sorting and aggregation across pre-sampled projections. The matrix scaling steps are structurally identical to EOT, except with the reference coupling replaced:

K=PSOTexp(C/ε)K = P^{\mathrm{SOT}} \odot \exp(-C/\varepsilon)

with scaling iterations for marginals and the final plan:

πSOT\pi^{\mathrm{SOT}}0

The additional computational cost of constructing πSOT\pi^{\mathrm{SOT}}1 is negligible relative to Sinkhorn iterations, even for moderate numbers of projections.

Empirical Results

Synthetic Datasets

The paper evaluates SROT against EOT and SOT across several synthetic probability distributions—half moon, 8 Gaussians, and two rings—visualizing and quantifying transport plan fidelity relative to exact OT. Figure 1

Figure 2: Visualization of transportation plans from OT, SOT, EOT, and SROT for synthetic datasets.

SROT consistently yields transport plans closer to the exact OT than EOT at comparable levels of regularization, with superior robustness even when the SOT prior itself deviates significantly from OT (notably in the 8 Gaussians case). Figure 3

Figure 3

Figure 3

Figure 3

Figure 4: Ablation studies of varying the regularization strengths (πSOT\pi^{\mathrm{SOT}}2) and Sinkhorn iterations (πSOT\pi^{\mathrm{SOT}}3).

Across variations in πSOT\pi^{\mathrm{SOT}}4 and Sinkhorn iteration counts, SROT demonstrates tighter approximation error versus OT. SOT serves as a stronger prior than the independent coupling; SROT maintains lower bias than EOT across nearly all tested hyperparameters.

Color Transfer

In image processing, color transfer tasks were benchmarked across 132 pairs. Each image is matched as a discrete weighted point cloud in RGB space. The mean πSOT\pi^{\mathrm{SOT}}5 errors to exact OT plans are lowest for SROT at all regularization strengths. Qualitative results show SROT plans yield visually more faithful transfers relative to EOT and SOT. Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 5

Figure 1: Color transfer results of OT, SOT, EOT, and SROT.

Gradient Flows

Gradient flows driven by Sinkhorn divergence or SROT divergence demonstrate practical advantages. Notably, for small πSOT\pi^{\mathrm{SOT}}6, SROT divergence induces weaker repulsion (πSOT\pi^{\mathrm{SOT}}7, πSOT\pi^{\mathrm{SOT}}8) than Sinkhorn; however, for larger πSOT\pi^{\mathrm{SOT}}9, SROT gradient flows converge faster in Wasserstein distance, confirming more effective bias correction throughout the flow. Figure 6

Figure 6

Figure 6

Figure 6

Figure 6

Figure 6

Figure 6

Figure 6

Figure 3: Gradient flows of Sinkhorn divergence and SR divergence with Wasserstein distance as neutral evaluation metric.

Figure 7

Figure 7

Figure 7

Figure 7

Figure 5: Gradient flows of Sinkhorn divergence and SR divergence with Wasserstein distance as neutral evaluation metric.

Ablation: Number of Projections

Robustness to the number of projections used in SOT computation was established. Uniform SOT averaging is generally recommended, although alternatives (softmin, min cost) can yield incremental improvements for specific geometries. Figure 8

Figure 8

Figure 6: Ablation study of varying the number of projections πε,SOT=argminπΠ(μ,ν)c(x,y)dπ(x,y)+εKL(ππSOT)\pi^\star_{\varepsilon,\mathrm{SOT}} = \arg\min_{\pi \in \Pi(\mu, \nu)} \int c(x, y)\, d\pi(x,y) + \varepsilon\,\mathrm{KL}\left(\pi \mid \pi^{\mathrm{SOT}}\right)0.

Discussion, Implications, and Future Directions

SROT demonstrates that leveraging a more informative prior—in this case, the SOT plan—substantially improves the fidelity of regularized OT without sacrificing computational tractability. This approach directly addresses the main limitation of entropic regularization: bias towards non-informative independent couplings. The strong numerical results for color transfer and synthetic plan recovery establish SROT as preferable to EOT for OT approximation under moderate regularization. Furthermore, SROT divergence inherits favorable theoretical properties including symmetry, non-negativity, and metrization of weak convergence, opening its application for parameter estimation, generative modeling, and domain adaptation.

Key future directions include:

  • Statistical analysis of SROT: sample complexities, central limit behavior, and consistency properties in parameter estimation.
  • Optimization of SOT priors: learning projection distributions and aggregation strategies for different metric geometries to further enhance plan fidelity.
  • Extensions to unbalanced, partial, and semi-discrete OT settings, leveraging the modularity of the regularization framework.
  • Integration of SROT in scalable deep generative models and neural optimal transport architectures.

Conclusion

The SROT framework establishes a regularized OT paradigm driven by informative priors derived from sliced OT plans, achieving superior accuracy relative to EOT and robust scalability. Theoretical analysis and empirical validation support its adoption across OT-driven tasks, including generative modeling, statistical inference, and image processing. SROT's post-Bayesian interpretation, duality structures, and divergence properties ensure its theoretical and practical integration with classical and advanced OT methodology (2604.23944).

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Collections

Sign up for free to add this paper to one or more collections.