Sliced Optimal Transport (SOT)

Updated 19 August 2025

Sliced Optimal Transport (SOT) approximates high-dimensional optimal transport by projecting measures onto one-dimensional subspaces, where the transport problem has a closed-form solution.
Extensions such as tree-sliced, nonlinear, and unbalanced variants enhance SOT's adaptability to non-Euclidean structures and varying mass conditions.
SOT underpins efficient computations in kernels, gradient flows, and dataset comparisons, relying on robust sampling strategies to achieve scalability and speed.

Sliced Optimal Transport (SOT) is a class of methods designed to approximate optimal transport (OT) distances and plans between high-dimensional probability measures by leveraging closed-form solutions of the OT problem in one dimension. SOT achieves computational scalability by projecting measures onto one-dimensional subspaces, solving the resulting 1D OT problems, and aggregating these solutions across many slices. This framework underlies a variety of efficient metrics, kernels, and plan estimation techniques that have seen widespread adoption in machine learning, statistics, computational geometry, imaging, and computer vision.

1. Mathematical Foundations

At the core of SOT is the reduction of the high-dimensional OT problem to a set of analytically tractable one-dimensional problems. Given probability measures $\mu, \nu$ on $\mathbb{R}^d$ (or more generally, on a Riemannian manifold) and a fixed direction $\theta \in \mathbb{S}^{d-1}$, SOT projects both measures onto the line via $x \mapsto \langle x, \theta \rangle$, resulting in pushforward measures $\theta_\# \mu$ and $\theta_\# \nu$ on $\mathbb{R}$. For each such projection, the 1D $p$-Wasserstein distance is computed: $W_p^p(\theta_\# \mu, \theta_\# \nu) = \int_0^1 \left|F_{\theta_\# \mu}^{-1}(t) - F_{\theta_\# \nu}^{-1}(t)\right|^p dt,$ where $F_{\theta_\# \mu}^{-1}$ and $F_{\theta_\# \nu}^{-1}$ are the quantile functions of the projected measures.
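For empirical measures, the quantile formula above reduces to sorting. The following is a minimal sketch, assuming two samples of equal size with uniform weights, so that both quantile functions are step functions on the same grid; the function name `wasserstein_1d_pp` is illustrative and not taken from any library.

```python
import numpy as np

def wasserstein_1d_pp(x, y, p=2):
    """W_p^p between two uniform empirical measures on the real line.

    Assumption of this sketch: both samples have the same size, so the
    integral over quantiles becomes a mean over matched order statistics.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample counts")
    # The i-th smallest point of x is matched to the i-th smallest point of y.
    return float(np.mean(np.abs(x - y) ** p))
```

With unequal sample sizes or non-uniform weights, the same idea applies but the two quantile functions have to be evaluated on a merged grid of cumulative weights.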

The sliced $p$-Wasserstein distance is then defined as an average (or expectation) over a family of directions, typically sampled from the uniform distribution $\sigma$ on the unit sphere: $\mathrm{SW}_p^p(\mu, \nu) = \mathbb{E}_{\theta \sim \sigma} \left[ W_p^p(\theta_\# \mu, \theta_\# \nu) \right].$ This reduction is justified by the injectivity of the Radon transform: under appropriate conditions, a probability measure is determined by its one-dimensional projections, so no information that distinguishes two measures is lost by slicing (Kolouri et al., 2015).
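The expectation over directions is usually approximated by Monte Carlo. The sketch below assumes two point clouds with the same number of points and uniform weights, and draws uniform directions by normalizing Gaussian vectors; the function names and default parameters are illustrative choices.

```python
import numpy as np

def sample_directions(num_projections, dim, rng):
    """Uniform directions on the unit sphere via normalized Gaussian vectors."""
    g = rng.standard_normal((num_projections, dim))
    return g / np.linalg.norm(g, axis=1, keepdims=True)

def sliced_wasserstein_pp(X, Y, num_projections=200, p=2, seed=0):
    """Monte Carlo estimate of SW_p^p between two uniform point clouds.

    Assumption of this sketch: X and Y are arrays of shape (n, d) with the
    same n, so each 1D problem is solved by matching sorted projections.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    theta = sample_directions(num_projections, X.shape[1], rng)
    # Column j holds the sorted projections onto direction theta[j].
    X_proj = np.sort(X @ theta.T, axis=0)
    Y_proj = np.sort(Y @ theta.T, axis=0)
    # Average over matched points and over directions.
    return float(np.mean(np.abs(X_proj - Y_proj) ** p))
```

A convenient sanity check: if `Y` is `X` translated by a vector `t`, every projected 1D cost equals $\langle \theta, t \rangle^2$ for $p = 2$, so the estimate should approach $\|t\|^2 / d$ as the number of projections grows.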

2. Extensions and Generalizations

Slices Beyond Lines: Tree and Nonlinear Projections

The tree-sliced method generalizes the slicing idea, replacing 1D projections by structured tree metrics that capture higher-order topological relations in the data. Here, the OT distance is computed with respect to a random tree metric, which, owing to its unique path property, admits a closed-form solution: $W_{d_T}(\mu, \nu) = \sum_{e \in T} w_e \left| \mu(\Gamma(v_e)) - \nu(\Gamma(v_e)) \right|,$ where $w_e$ is the weight of edge $e$ and $\Gamma(v_e)$ is the subtree rooted at $v_e$, the endpoint of $e$ farther from the root of $T$ (Le et al., 2019). Averaging over random trees yields the tree-sliced Wasserstein (TSW) distance, which includes SW as a special case.
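The closed form on a single tree can be evaluated in one bottom-up pass. The sketch below is illustrative only: it assumes a rooted tree stored as a parent array with nodes numbered so that every parent index is smaller than its child's, and masses supported on the nodes. The function name and data layout are assumptions, not the interface of any published implementation.

```python
import numpy as np

def tree_wasserstein(parent, weight, mu, nu):
    """Closed-form OT cost on a tree metric.

    parent[v] : parent of node v (node 0 is the root, parent[0] = -1)
    weight[v] : length of the edge joining v to parent[v] (weight[0] unused)
    mu, nu    : masses on the nodes, each summing to the same total
    Assumption of this sketch: parent[v] < v for every non-root node.
    """
    diff = np.asarray(mu, dtype=float) - np.asarray(nu, dtype=float)
    total = 0.0
    # Visit children before parents so diff[v] holds the subtree mass difference.
    for v in range(len(parent) - 1, 0, -1):
        total += weight[v] * abs(diff[v])
        diff[parent[v]] += diff[v]
    return total

# Path 0 - 1 - 2 with unit edges: moving a unit mass from node 0 to node 2.
path_cost = tree_wasserstein([-1, 0, 1], [0.0, 1.0, 1.0], [1, 0, 0], [0, 0, 1])
```

On the path example, the cost is the tree distance between the two nodes, since all mass travels along both edges.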

Nonlinear and manifold-aware projections (e.g., geodesics on Riemannian manifolds, spherical trees, or horospherical projections via the Busemann function) enable SOT to be defined on non-Euclidean spaces, preserving the intrinsic geometry of the data (Quellmalz et al., 2023, Bonet et al., 11 Mar 2024, Tran et al., 14 Mar 2025).

Unbalanced and Partial Sliced OT

Unbalanced SOT variants allow comparison of measures with different total masses, introducing marginal relaxation via $\varphi$-divergences or partial mass transport. In such unbalanced settings, the SOT loss is defined by integrating one-dimensional unbalanced OT problems (such as the Hellinger-Kantorovich or Lagrangian penalty formulations) over the sphere of directions: $\mathrm{SUOT}(\alpha, \beta) = \int_{\mathbb{S}^{d-1}} \mathrm{UOT}(\theta_\# \alpha, \theta_\# \beta) d\sigma(\theta)$ (Bonet et al., 2023, Bai et al., 2022). Sliced Partial OT methods similarly average one-dimensional partial transport distances, which include explicit mass destruction costs.

3. Computational Methods and Sampling Strategies

SOT’s efficiency depends critically on the strategy for sampling projection directions. Uniform Monte Carlo sampling yields unbiased estimators with $O(L^{-1/2})$ convergence in the number of projections $L$, a rate that does not depend on the ambient dimension (Sisouk et al., 4 Feb 2025). Quasi-Monte Carlo and deterministic low-discrepancy point sets can improve convergence rates in low-to-moderate dimensions.
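The inverse-square-root rate can be observed numerically in a case where the exact answer is known. The sketch below assumes that one measure is a translate of the other by a vector `t`, so that the projected 1D cost for $p = 2$ is $\langle \theta, t \rangle^2$ and its expectation over uniform directions is $\|t\|^2 / d$. The helper name `mc_estimates` and the chosen sizes are illustrative.

```python
import numpy as np

def mc_estimates(t, num_projections, num_repeats, rng):
    """Independent Monte Carlo estimates of E_theta[<theta, t>^2], theta uniform on the sphere."""
    d = t.shape[0]
    g = rng.standard_normal((num_repeats, num_projections, d))
    theta = g / np.linalg.norm(g, axis=-1, keepdims=True)
    # (theta @ t) has shape (num_repeats, num_projections); average over projections.
    return np.mean((theta @ t) ** 2, axis=1)

rng = np.random.default_rng(0)
t = np.array([1.0, 2.0, 2.0])  # ||t||^2 = 9 and d = 3, so the exact value is 3

# Spread of the estimator across 200 repeats for increasing projection counts.
spread = {L: float(mc_estimates(t, L, 200, rng).std()) for L in (100, 400, 1600)}
```

Each fourfold increase in the number of projections should roughly halve the spread, which is the expected Monte Carlo behavior.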

The computational complexity for empirical measures is $O(L n \log n)$ for $L$ projections and $n$ samples, with each 1D OT solved by sorting (Kolouri et al., 2015). For streaming data, streaming quantile sketches allow approximation of 1D Wasserstein distances with provable error bounds and sublinear memory, resulting in the “streaming sliced Wasserstein” (Stream-SW) algorithm (Nguyen, 11 May 2025).

4. Transport Plan Estimation and Sliced Plans

Traditional SOT methods only yield distances. Recent advances address the lack of a transport plan by “lifting” 1D transport plans back to the ambient space. For each direction $\theta$, the unique 1D OT plan $\pi_\theta$ (determined by quantile matching) is lifted to the high-dimensional space using the conditional measures determined by the fibers of the projection, giving a lifted plan $\gamma_\theta^{\mu, \nu}$ (Liu et al., 16 Oct 2024, Tanguy et al., 2 Aug 2025). The expected sliced transport (EST) plan averages these lifted plans over directions: $\bar{\gamma}^{\mu, \nu}(x, y) = \int_{\mathbb{S}^{d-1}} \gamma_\theta^{\mu, \nu}(x, y) d\sigma(\theta).$ The EST discrepancy,

$\mathcal{D}_p(\mu, \nu) = \left( \sum_{x, y} \|x-y\|^p \bar{\gamma}^{\mu, \nu}(x, y) \right)^{1/p},$

is a metric on finite discrete probability measures and converges to the Wasserstein distance in limiting regimes (Liu et al., 16 Oct 2024). Pivot Sliced Discrepancy and min-sliced SWGG select an optimal direction/minimum-slice and provide constrained Kantorovich formulations for enhanced plan estimation (Tanguy et al., 2 Aug 2025, Chapel et al., 28 May 2025).
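For discrete measures, the averaging of lifted plans can be sketched in a few lines. This is a simplified illustration, not the cited authors' implementation: it assumes two point clouds with the same number of points, uniform weights, and no ties among projections, in which case each lifted plan is a scaled permutation matrix. Directions are drawn uniformly by Monte Carlo, and all function names are hypothetical.

```python
import numpy as np

def expected_sliced_plan(X, Y, num_projections=500, seed=0):
    """Monte Carlo average of lifted 1D plans between two uniform point clouds.

    Assumptions of this sketch: X and Y have shape (n, d) with equal n,
    uniform weights 1/n, and distinct projections along each direction.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    g = rng.standard_normal((num_projections, d))
    theta = g / np.linalg.norm(g, axis=1, keepdims=True)
    plan = np.zeros((n, n))
    for th in theta:
        ix = np.argsort(X @ th)  # order of X along this direction
        iy = np.argsort(Y @ th)  # order of Y along this direction
        # Quantile matching: the k-th point of X is paired with the k-th point of Y.
        plan[ix, iy] += 1.0 / n
    return plan / num_projections

def est_discrepancy(X, Y, plan, p=2):
    """Transport cost of the averaged plan, raised to the power 1/p."""
    cost = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1) ** p
    return float(np.sum(cost * plan) ** (1.0 / p))
```

Because every lifted plan has the correct marginals, so does their average. When `Y` is a translate of `X`, every direction induces the same matching and the averaged plan is the identity pairing.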

5. Theoretical Properties and Limitations

Topological and Metric Properties

SOT-induced metrics (including SW and general max-sliced/Minkowski combinations) are shown to be complete and separable, and to generate the same topology as the classical Wasserstein metric (Kitagawa et al., 2023). However, except for trivial settings (e.g. ambient dimension $d=1$ or $p=1$), SOT metrics are not bi-Lipschitz equivalent to the full OT metric and often lack a geodesic structure, potentially distorting distances and impacting geometry-sensitive applications such as interpolation and barycenter computation.

Sample Complexity and Estimator Theory

SOT metrics exhibit favorable sample complexity. For empirical measures $\mu_n$ and $\nu_m$ built from $n$ and $m$ samples, the estimation error of the SOT metric decays at a one-dimensional rate under finite $q$-th moment conditions ($q>p$, with $p$ the OT order). Writing $\mathrm{DSW}_p(\cdot, \cdot\,; \sigma)$ for the sliced distance with slicing distribution $\sigma$ and $M_q$ for the $q$-th moment, the bound reads $\mathbb{E}\left[ |\mathrm{DSW}_p(\mu_n, \nu_m ; \sigma) - \mathrm{DSW}_p(\mu, \nu ; \sigma)| \right] \lesssim (M_q(\mu)^{1/q} + M_q(\nu)^{1/q}) \cdot n^{-1/(2p)}$ for $q>2p$, up to logarithmic factors. The associated Monte Carlo estimator for DSW is unbiased and converges at $O(L^{-1/2})$ in the number of projections $L$ (Nguyen, 17 Aug 2025).

6. Applications and Empirical Results

SOT is foundational for numerous algorithmic advances:

Kernel construction: SOT-based positive-definite kernels enable kernel SVMs, KPCA, and kernel $k$-means clustering on probability measures, outperforming classical kernels in several learning tasks (Kolouri et al., 2015).
Gradient flows and barycenters: Particle-based schemes and neural parameterizations enable scalable Wasserstein gradient flows, variational inference, and clustering in Euclidean and manifold (spherical, hyperbolic) spaces (Bonet, 2023, Bonet et al., 11 Mar 2024, Chapel et al., 28 May 2025).
Unbalanced/partial matching: Robust matching under mass variation and outliers, with state-of-the-art results in point cloud registration and adaptation (Bai et al., 2022, Bonet et al., 2023).
Geometric data analysis: SOT-based metrics enable fast comparisons for shape analysis, mesh sampling (with blue noise guarantees), spherical/circular data analysis, and rotation synchronization, with new metrics such as spherical tree-sliced Wasserstein enhancing rotational invariance and topology sensitivity (Tran et al., 14 Mar 2025, Liu et al., 9 Nov 2024, Genest et al., 26 Feb 2024).
Dataset comparison: Sliced OT dataset distances (s-OTDD) provide fast, model-agnostic evaluation of data discrepancies, suitable for transfer learning and domain adaptation (Nguyen et al., 31 Jan 2025).
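As an illustration of the kernel construction item above, the sketch below builds a Gaussian-type kernel $\exp(-\gamma\, \mathrm{SW}_2^2)$ between point clouds. It assumes clouds of equal size with uniform weights, and it reuses one shared set of directions for all pairs, so that the Monte Carlo estimate of $\mathrm{SW}_2^2$ is a squared Euclidean distance between sorted-projection embeddings and the Gram matrix stays positive semidefinite. Function names and parameter defaults are illustrative.

```python
import numpy as np

def sw_embedding(X, theta):
    """Flattened sorted projections, scaled so squared distances estimate SW_2^2."""
    proj = np.sort(X @ theta.T, axis=0)  # shape (n, L)
    return proj.ravel() / np.sqrt(proj.size)

def sw_gaussian_gram(clouds, num_projections=100, gamma=1.0, seed=0):
    """Gram matrix of exp(-gamma * SW_2^2) over a list of (n, d) point clouds.

    Assumption of this sketch: all clouds share the same n and d.
    """
    rng = np.random.default_rng(seed)
    d = clouds[0].shape[1]
    g = rng.standard_normal((num_projections, d))
    theta = g / np.linalg.norm(g, axis=1, keepdims=True)
    # One embedding per cloud, all computed with the same directions.
    E = np.stack([sw_embedding(np.asarray(X, dtype=float), theta) for X in clouds])
    sq_dist = np.sum((E[:, None, :] - E[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dist)
```

The resulting matrix can be passed to any kernel method that accepts a precomputed Gram matrix. Sampling fresh directions for each pair would give a noisier matrix that is not guaranteed to be positive semidefinite.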

Empirical studies confirm the computational efficiency and scalability of SOT-based methods, reporting lower memory use and faster convergence in gradient-based methods compared to full OT.

7. Limitations and Future Directions

Although SOT is statistically efficient and computationally scalable, it is not generally a substitute for full OT in applications sensitive to precise metric or geodesic structures. The lack of bi-Lipschitz equivalence and absence of geodesics make SOT metrics unsuitable for applications relying on the finer geometric and convexity properties of the Wasserstein space, such as optimal interpolation and shape morphing (Kitagawa et al., 2023).

Ongoing research is expanding SOT in several directions:

Plan estimation and generalized slicing: Incorporating manifold projections, tree-based or nonlinear slicing, and refined lifting mechanisms to further improve the sharpness of plan estimation and adaptation to non-Euclidean structures.
Unbalanced and partial SOT: Developing robust methods for signals with missing or excess data, leveraging φ-divergences and dynamic slicing weights.
Streaming and distributed SOT: Enhancing large-scale, online deployment via quantile sketching and communication-efficient schemes (Nguyen, 11 May 2025).
Statistical optimality and estimator theory: Further study of the limiting distributions and finite-sample deviations of SOT estimators, particularly in structured settings or with non-uniform slicing.
Applications in generative modeling: Leveraging plan-based SOT and barycentric schemes for conditional flow matching, generative flow models, and domain adaptation tasks with complex data geometry (Chapel et al., 28 May 2025).

In sum, SOT provides a principled and computationally effective toolkit for scalable high-dimensional transport, density comparison, and plan estimation, with broad applicability across statistical and geometric data analysis, provided the limitations regarding metric preservation are properly accounted for.