Double-Sliced Wasserstein Metric

Updated 11 November 2025

Double-Sliced Wasserstein (DSW) is a metric that compares probability meta-measures (probability measures over probability measures) using two sequential slicing operations, preserving the topology of the original Wasserstein-over-Wasserstein (WoW) distance.
It combines Euclidean projections with slicing in quantile-function space to achieve computational efficiency and numerical robustness in high-dimensional data analysis.
Empirical evaluations show that DSW offers discriminative power comparable to WoW while computing faster and avoiding the unstable estimation of high-order moments.

The Double-Sliced Wasserstein (DSW) metric is a recent development in the theory of optimal transport on spaces of probability measures, designed as a computationally efficient and statistically robust surrogate for the Wasserstein-over-Wasserstein (WoW) distance between meta-measures. DSW gains speed and stability by combining traditional Euclidean slicing with an inner slicing in quantile-function space, avoiding reliance on high-order moments or other unstable estimates. DSW is topologically equivalent to WoW on empirical meta-measures and, in experiments, offers substantial speedups with comparable discriminative power in dataset similarity, point-cloud analysis, and perceptual evaluation of images and shapes (Piening et al., 26 Sep 2025).

1. Meta-Measure Spaces and the Wasserstein-Over-Wasserstein Problem

Let $(\mathcal{X}, d)$ be a Polish metric space and $P_2(\mathcal{X})$ the set of Borel probability measures on $\mathcal{X}$ with finite second moment, equipped with the 2-Wasserstein distance,

$W_2(\mu,\nu) = \left(\inf_{\pi\in\Gamma(\mu,\nu)} \int_{\mathcal{X}^2} d^2(x,x')\,d\pi(x,x')\right)^{1/2}.$

A meta-measure is an element $\alpha\in P_2\bigl(P_2(\mathcal{X})\bigr)$, that is, a probability law over probability measures on $\mathcal{X}$. The Wasserstein-over-Wasserstein (WoW) metric lifts $W_2$ to the space of meta-measures:

$\mathrm{WoW}(\alpha, \beta) = \left[ \inf_{\Pi \in \Gamma(\alpha,\beta)} \int_{P_2(\mathcal{X}) \times P_2(\mathcal{X})} W_2^2(\mu,\nu)\, d\Pi(\mu,\nu) \right]^{1/2}.$

WoW is computationally prohibitive for large collections of distributions: it requires an inner $W_2$ computation for every pair of inner measures, so its cost scales quadratically in their number, and each inner computation is itself expensive for large, high-dimensional samples.
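To see where the quadratic cost comes from, here is a minimal sketch of exact WoW in Python (NumPy/SciPy). It assumes both meta-measures are uniform over the same number $N$ of inner measures and that every inner measure is uniform on the same number $n$ of points in $\mathbb{R}^d$. Under these assumptions an optimal plan can be taken to be a permutation, so both the inner and the outer transport problems reduce to linear assignment problems, solved here with `scipy.optimize.linear_sum_assignment`. The function names `w2_squared_uniform` and `wow` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def w2_squared_uniform(X, Y):
    """Exact squared W2 between uniform empirical measures on X and Y (equal point counts).

    With uniform weights and equal sizes, some optimal plan is a permutation
    (Birkhoff-von Neumann), so optimal transport is a linear assignment problem."""
    C = cdist(X, Y, metric="sqeuclidean")
    rows, cols = linear_sum_assignment(C)
    return C[rows, cols].mean()


def wow(alpha, beta):
    """Exact WoW between meta-measures given as arrays of shape (N, n, d).

    Each of the N slices is one inner measure: n uniform points in R^d."""
    # N x N matrix of inner squared W2 distances: the quadratic bottleneck.
    D = np.array([[w2_squared_uniform(a, b) for b in beta] for a in alpha])
    # The outer OT between two uniform meta-measures with N atoms is again an assignment.
    rows, cols = linear_sum_assignment(D)
    return np.sqrt(D[rows, cols].mean())
```

Translating every inner measure by a fixed vector $c$ gives $\mathrm{WoW} = \|c\|$, a convenient sanity check. The nested loop that fills `D` performs $N^2$ assignment solves, which is exactly the work DSW is designed to avoid.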

2. Quantile Isometry and Functional Slicing

For measures on $\mathbb{R}$, the 2-Wasserstein metric admits an isometry into $L^2([0,1])$ that maps a measure $\mu$ to its quantile function $Q_\mu(s)= \inf\{x:\mu((-\infty,x])\geq s\}$:

$W_2(\mu,\nu;\mathbb{R}) = \left[ \int_0^1 |Q_\mu(s)-Q_\nu(s)|^2\, ds \right]^{1/2} = \|Q_\mu-Q_\nu\|_{L^2([0,1])}.$

This isometry underpins the functional optimal transport approach used in DSW. Sliced-Wasserstein distances on a general Banach space $U$ use the projections $\pi_v(x)=\langle v, x\rangle$ for $v\in U^*$; given a probability measure $\xi$ on $U^*$,
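The quantile isometry can be checked numerically. The sketch below assumes uniform empirical measures on $\mathbb{R}$ and a midpoint-rule discretization of $[0,1]$; the helper names and `grid_size` are illustrative choices. It evaluates $Q_\mu$ exactly as defined above and returns the $L^2([0,1])$ distance between quantile functions.

```python
import numpy as np


def quantile_function(samples, s):
    """Empirical quantile Q_mu(s) = inf{x : F_mu(x) >= s} of the uniform measure on samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    # F_mu(x_(k)) = k/n, so Q_mu(s) = x_(k) with k = ceil(n*s) (1-based), clipped to [1, n].
    idx = np.clip(np.ceil(n * s).astype(int), 1, n) - 1
    return x[idx]


def w2_1d(mu_samples, nu_samples, grid_size=10_000):
    """W2 on R computed as ||Q_mu - Q_nu|| in L^2([0,1]), using the midpoint rule."""
    s = (np.arange(grid_size) + 0.5) / grid_size
    diff = quantile_function(mu_samples, s) - quantile_function(nu_samples, s)
    return np.sqrt(np.mean(diff ** 2))
```

Because both measures are represented by functions on the same interval $[0,1]$, samples of different sizes can be compared directly: for example, the uniform measures on $\{0, 2\}$ and on $\{0, 0, 2, 2\}$ coincide and have distance $0$.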

$SW(\mu,\nu; \xi) = \left[ \int_{v \in U^*} W_2^2(\pi_{v\#}\mu, \pi_{v\#}\nu; \mathbb{R})\, d\xi(v) \right]^{1/2}.$

This construction, under appropriate support conditions on $\xi$, yields a true metric on $P_2(U)$.
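In the finite-dimensional special case $U = \mathbb{R}^d$ with $\xi$ the uniform measure on the unit sphere, the integral can be estimated by Monte Carlo. A minimal sketch, assuming uniform empirical measures with equal sample sizes so that each 1D $W_2$ is a difference of sorted projections; `sliced_w2` and its parameters are hypothetical names.

```python
import numpy as np


def sliced_w2(X, Y, n_projections=200, seed=0):
    """Monte Carlo sliced W2 between uniform empirical measures on R^d (equal sample sizes).

    xi is taken to be the uniform measure on the unit sphere S^{d-1}."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_projections, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # uniform directions on S^{d-1}
    px = np.sort(X @ theta.T, axis=0)                        # shape (n, n_projections)
    py = np.sort(Y @ theta.T, axis=0)
    # Mean over samples gives each squared 1D W2; mean over columns averages over directions.
    return np.sqrt(np.mean((px - py) ** 2))
```

For a translation by $c$, each direction contributes $\langle\theta, c\rangle^2$, so $SW^2 = \|c\|^2/d$ in expectation; with $c = (3, 4)$ in $\mathbb{R}^2$ the estimate of $SW^2$ should be close to $12.5$.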

In the specific setting of meta-measures on $P_2(\mathbb{R})$, the quantile map $q:\mu\mapsto Q_\mu$ pushes $\alpha$ forward to a law $q_\#\alpha$ on $L^2([0,1])$, yielding a “sliced-quantile WoW” (SQW) metric,

$SQW(\alpha, \beta; \xi) = SW(q_\#\alpha, q_\#\beta; \xi).$
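The SQW construction can be sketched as follows for meta-measures on $P_2(\mathbb{R})$. The sketch assumes that (i) $\xi$ is the law of a zero-mean Gaussian process with an RBF kernel, sampled on a grid through a Cholesky factorization with a small diagonal jitter; (ii) the projection $\langle v, Q_\mu\rangle_{L^2}$ is approximated by the midpoint rule; and (iii) both meta-measures are uniform over the same number $N$ of inner measures, so the outer 1D $W_2$ reduces to comparing sorted values. Inner measures may have different sample sizes. The names `rbf_gp_paths`, `quantile_matrix`, `sqw` and all default parameters are hypothetical.

```python
import numpy as np


def rbf_gp_paths(grid, n_paths, length_scale=0.1, seed=0):
    """Sample zero-mean GP paths with an RBF kernel on a grid (an assumed choice of xi)."""
    rng = np.random.default_rng(seed)
    K = np.exp(-(grid[:, None] - grid[None, :]) ** 2 / (2 * length_scale ** 2))
    L = np.linalg.cholesky(K + 1e-6 * np.eye(len(grid)))  # jitter keeps K numerically PD
    return (L @ rng.normal(size=(len(grid), n_paths))).T    # shape (n_paths, grid_size)


def quantile_matrix(inner_samples, grid):
    """Row i holds the empirical quantile function of inner measure i on the grid."""
    rows = []
    for x in inner_samples:
        x = np.sort(np.asarray(x, dtype=float))
        idx = np.clip(np.ceil(len(x) * grid).astype(int), 1, len(x)) - 1
        rows.append(x[idx])
    return np.array(rows)


def sqw(alpha, beta, n_paths=100, grid_size=256, seed=0):
    """Sliced-quantile WoW between two lists of N one-dimensional sample arrays."""
    grid = (np.arange(grid_size) + 0.5) / grid_size
    V = rbf_gp_paths(grid, n_paths, seed=seed)
    QA, QB = quantile_matrix(alpha, grid), quantile_matrix(beta, grid)
    # <v, Q_mu> by the midpoint rule: one real number per inner measure and per path.
    pa = np.sort(QA @ V.T / grid_size, axis=0)  # shape (N, n_paths)
    pb = np.sort(QB @ V.T / grid_size, axis=0)
    # Squared 1D W2 between the N projected values, averaged over paths.
    return np.sqrt(np.mean((pa - pb) ** 2))
```

Shifting every inner measure by a constant $c$ shifts each projection by $c\,\langle v, 1\rangle$, so the estimate grows linearly in $|c|$; this gives a simple consistency check on an implementation.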

3. Construction and Mathematical Formulation of Double-Sliced Wasserstein

The Double-Sliced Wasserstein metric is constructed through consecutive application of two slicing steps:

Euclidean Slicing: For each $\theta \in S^{d-1}$, project every inner measure $\mu \in P_2(\mathbb{R}^d)$ onto $\mathbb{R}$ via $\pi_\theta(x) = \langle \theta, x \rangle$, which induces the push-forward measure $\pi_{\theta\#}\mu$.
Quantile-Space Slicing: For fixed $\theta$, pushing $\alpha$ and $\beta$ forward under the lifted projection $P_\theta : P_2(\mathbb{R}^d)\to P_2(\mathbb{R})$, $\mu\mapsto\pi_{\theta\#}\mu$, yields two 1D meta-measures $P_{\theta\#}\alpha,\,P_{\theta\#}\beta \in P_2(P_2(\mathbb{R}))$. Using a Gaussian measure $\xi$ on $L^2([0,1])$ as the slicing distribution (e.g., the law of a Gaussian process with an RBF kernel), the SQW distance between these meta-measures is

$SQW(P_{\theta\#}\alpha,\,P_{\theta\#}\beta; \xi) = \left[ \int_{v \in L^2([0,1])} W_2^2\left(\pi_{v\#}q_\#(P_{\theta\#}\alpha),\, \pi_{v\#}q_\#(P_{\theta\#}\beta) \right) d\xi(v) \right]^{1/2}.$

Aggregation: Integrate the squared inner SQW distance over $S^{d-1}$, where $dS^{d-1}$ denotes the normalized uniform measure on the sphere, to obtain the Double-Sliced Wasserstein distance: $DSW(\alpha, \beta; \xi) = \left[ \int_{S^{d-1}} SQW^2(P_{\theta\#}\alpha,\,P_{\theta\#}\beta; \xi)\, dS^{d-1}(\theta) \right]^{1/2}.$

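Applying the one-dimensional quantile isometry to the outer $W_2^2$ term gives the full expansion

$DSW^2(\alpha, \beta; \xi) = \int_{S^{d-1}} \int_{v \in L^2([0,1])} \int_0^1 \left| Q_{\pi_{v\#}q_\#(P_{\theta\#}\alpha)}(u) - Q_{\pi_{v\#}q_\#(P_{\theta\#}\beta)}(u) \right|^2 du\, d\xi(v)\, dS^{d-1}(\theta),$

where $\pi_v(Q_\mu) = \langle v, Q_\mu\rangle_{L^2} = \int_0^1 v(s)\,Q_\mu(s)\,ds$ maps each projected inner measure to a real number, so that $\pi_{v\#}q_\#(P_{\theta\#}\alpha)$ is a probability measure on $\mathbb{R}$.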

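For computation, the integrals over $S^{d-1}$ and $L^2([0,1])$ are estimated by Monte Carlo, using directions $\theta_s \in S^{d-1}$ ($s = 1,\dots,S$) and, for each direction, Gaussian process paths $v_{s,t}$ ($t = 1,\dots,T$) drawn from $\xi$.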
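A compact sketch of this Monte Carlo estimator in Python follows. It is not the authors' reference implementation; it assumes that both meta-measures consist of the same number $N$ of inner measures, each uniform on $n$ points in $\mathbb{R}^d$; that $\xi$ is a zero-mean RBF Gaussian process discretized on a midpoint grid (with a small Cholesky jitter); and that fresh paths are drawn for every direction, matching the $v_{s,t}$ indexing. `n_dirs` plays the role of $S$ and `n_paths` of $T$; the function name `dsw` and all defaults are hypothetical.

```python
import numpy as np


def dsw(alpha, beta, n_dirs=50, n_paths=50, grid_size=128, length_scale=0.1, seed=0):
    """Monte Carlo Double-Sliced Wasserstein between arrays of shape (N, n, d)."""
    rng = np.random.default_rng(seed)
    N, n, d = alpha.shape
    grid = (np.arange(grid_size) + 0.5) / grid_size
    K = np.exp(-(grid[:, None] - grid[None, :]) ** 2 / (2 * length_scale ** 2))
    L = np.linalg.cholesky(K + 1e-6 * np.eye(grid_size))
    idx = np.clip(np.ceil(n * grid).astype(int), 1, n) - 1  # empirical quantile lookup
    total = 0.0
    for _ in range(n_dirs):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)  # Euclidean slice theta_s on S^{d-1}
        # Quantile functions of the projected inner measures, shape (N, grid_size).
        qa = np.sort(alpha @ theta, axis=1)[:, idx]
        qb = np.sort(beta @ theta, axis=1)[:, idx]
        V = L @ rng.normal(size=(grid_size, n_paths))  # GP paths v_{s,t}, one per column
        # Functional slice <v, Q> by the midpoint rule: one scalar per inner measure and path.
        pa = np.sort(qa @ V / grid_size, axis=0)
        pb = np.sort(qb @ V / grid_size, axis=0)
        total += np.mean((pa - pb) ** 2)  # squared 1D W2 over inner measures, averaged over t
    return np.sqrt(total / n_dirs)
```

Per direction the work is one sort per inner measure plus a small matrix product, so no pairwise $N \times N$ matrix of inner transport problems is ever formed.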

4. Topological Properties and Equivalence with WoW

Let empirical meta-measures $\alpha_n, \beta_n \in P^N(P^{\tilde n}(\mathbb{R}^d))$ each consist of $N$ inner empirical measures with $\tilde n$ support points. On this space, DSW is topologically equivalent to WoW,

$DSW(\alpha_n, \beta_n; \xi) \to 0 \quad\Longleftrightarrow\quad \mathrm{WoW}(\alpha_n, \beta_n) \to 0,$

for any positive Gaussian measure $\xi$ (Piening et al., 26 Sep 2025). The argument combines a discrete Cramér–Wold theorem at each slice $\theta$ with the quantile-space isometry. Consequently, DSW is a metric on empirical meta-measures with the same convergent sequences as WoW, i.e., it preserves the topology induced by WoW on this space.
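The intuition behind the discrete Cramér–Wold step can be illustrated numerically; this is only an illustration, not the paper's proof. For two finite point clouds, a direction $\theta$ that is not orthogonal to any difference of support points makes the projection injective on both supports, so equal projected measures force equal measures. The bad directions lie in finitely many hyperplanes, hence a random direction separates distinct clouds almost surely, even when some specific directions do not. The helper `projections_agree` is hypothetical.

```python
import numpy as np


def projections_agree(X, Y, theta, tol=1e-12):
    """True if the uniform empirical measures on X and Y have identical projections on theta."""
    if len(X) != len(Y):
        return False
    px, py = np.sort(X @ theta), np.sort(Y @ theta)
    return np.allclose(px, py, rtol=0.0, atol=tol)


# Two distinct clouds whose projections coincide on both coordinate axes.
X = np.array([[0.0, 0.0], [1.0, 1.0]])
Y = np.array([[0.0, 1.0], [1.0, 0.0]])
```

Along $e_1$ and $e_2$ both clouds project to $\{0, 1\}$, but along a generic direction $X$ projects to $\{0, \theta_1 + \theta_2\}$ and $Y$ to $\{\theta_1, \theta_2\}$, which differ unless a component of $\theta$ vanishes.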

5. Computational Complexity and Numerical Stability

Metric	Cost per evaluation	Stability considerations
WoW	$\mathcal{O}(N^2 n^2\log n)$	Requires all $N^2$ pairwise inner 2-Wasserstein computations; slow for large $N$ and $n$; sensitive to moment estimation
DSW	$\mathcal{O}(S N n\log n)$	Needs only $S$ sampled directions (typically $S \ll N^2$); relies on quantile functions rather than high-order moments; numerically robust

For full WoW between meta-measures of $N$ inner measures with $n$ points each, the cost is dominated by the $N\times N$ matrix of pairwise 2-Wasserstein distances, each costing $\mathcal{O}(n^2\log n)$ (entropic case).
For DSW, with $S$ sampled directions and one 1D quantile computation (a sort) per inner measure and direction, the total complexity is $\mathcal{O}(S N n \log n)$; typically $S = \mathcal{O}(N)$ or constant.
DSW avoids the unstable high-order moment estimates used by s-OTDD (the sliced surrogate of the optimal transport dataset distance, OTDD), and remains stable even for non-Gaussian or heavy-tailed meta-distributions.
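A back-of-the-envelope comparison using only the asymptotic costs quoted above, with constants ignored and hypothetical sizes chosen purely for illustration, shows how the gap grows with both the number and the size of the inner measures:

```latex
\frac{\text{WoW cost}}{\text{DSW cost}}
  = \frac{N^2 n^2 \log n}{S\, N\, n \log n}
  = \frac{N n}{S},
\qquad
N = 100,\; n = 1000,\; S = 100
\;\Longrightarrow\;
\frac{N n}{S} = \frac{100 \cdot 1000}{100} = 1000.
```

With $S = \mathcal{O}(N)$ the ratio is of order $n$; with constant $S$ it is of order $Nn$. These are asymptotic estimates only; measured speedups also depend on constants and implementation details.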

6. Empirical Results and Applications

Experimental evaluations in (Piening et al., 26 Sep 2025) demonstrate that DSW achieves strong performance in several tasks:

Shape classification via local distance distributions (mm-spaces): DSW matches the accuracy of Gromov–Wasserstein and sliced GW (STLB) while running an order of magnitude faster.
Dataset similarity (OTDD surrogate): On MNIST, Fashion-MNIST, and CIFAR-10 splits, DSW correlates strongly with exact OTDD (Pearson $> 0.9$) and outperforms s-OTDD in stability and speed.
Point-cloud evaluation: For batches of 3D shapes modeled as meta-measures in $P(P(\mathbb{R}^3))$, DSW matches OT-NNA and WoW in sensitivity but is 10–20$\times$ faster, with similar robustness to mode collapse and sampling noise.
Image perceptual distance: Image batches are represented as meta-measures over patch distributions. DSW defines a perceptual metric sensitive to qualitative similarity that aligns with standard fidelity metrics (e.g., Kernel Inception Distance), and it is 40$\times$ faster than full WoW.
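For the shape-classification setting, a point cloud can be turned into a meta-measure on $P(\mathbb{R})$ through the local distribution of distances of a metric measure space, which attaches to each point $x$ the measure $d(x,\cdot)_\#\mu$. The sketch below assumes the uniform measure on the points and Euclidean distances; the exact construction used in the paper may differ, and the function name is hypothetical. Each sorted row is the support of one inner measure (equivalently, its quantile function at levels $k/n$), and the meta-measure puts equal mass on each row.

```python
import numpy as np
from scipy.spatial.distance import cdist


def local_distance_meta_measure(X):
    """Return an (n, n) array whose row i lists the distances ||x_i - x_j||, j = 1..n, sorted.

    Row i supports the inner measure (1/n) * sum_j delta_{||x_i - x_j||} on R;
    the meta-measure puts mass 1/n on each row."""
    D = cdist(X, X)            # pairwise Euclidean distances
    return np.sort(D, axis=1)  # sorted rows are quantile values at levels k/n
```

Because only pairwise distances enter, this representation is invariant under rotations and translations of the shape, so no prior alignment is needed. Since the inner measures already live on $\mathbb{R}$, they can be compared with the quantile-space slicing of SQW directly, without a Euclidean slicing step.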

These results indicate that DSW yields operationally efficient metrics for meta-measure comparison without the compromises of parametric forms or unstable statistical estimators.

7. Significance and Prospects

Double-Sliced Wasserstein provides a tractable, mathematically principled metric for meta-level optimal transport problems, preserving the topology and discriminative power of WoW while avoiding its prohibitive computational cost. Its combination of classical Euclidean slicing and functional quantile-space slicing exploits both the geometric and the statistical structure of optimal transport. The approach applies to large-scale shape analysis, dataset comparison, and the evaluation of structured or hierarchical data distributions.

Open questions include optimizing the kernel of $\xi$ to adapt DSW to specific tasks, theoretical dual formulations for functional slicing, and extensions beyond empirical meta-measures to infinite or continuous families. A plausible implication is that DSW could serve as a foundation for scalable learning frameworks on high-level data spaces where conventional OT remains intractable, especially in high-dimensional and nonparametric distributional regimes (Piening et al., 26 Sep 2025).


References

Piening et al. (26 Sep 2025). Slicing Wasserstein Over Wasserstein Via Functional Optimal Transport.
