Double-Sliced Wasserstein Metric
- Double-Sliced Wasserstein is a metric that compares probability meta-measures using two sequential slicing operations, preserving the topology of the original Wasserstein-over-Wasserstein distance.
- It combines Euclidean projections and quantile-space slicing to achieve computational efficiency and numerical robustness in high-dimensional data analysis.
- Empirical evaluations demonstrate that DSW provides comparable discriminative power to WoW while accelerating computation and reducing sensitivity to unstable high-order moment estimation.
The Double-Sliced Wasserstein (DSW) metric is a recent development in the study of optimal transport on spaces of probability measures, specifically designed as a computationally efficient and statistically robust surrogate for the Wasserstein-over-Wasserstein (WoW) distance between meta-measures. The DSW metric achieves speed and stability by combining traditional Euclidean slicing with an inner slicing in quantile function space, avoiding reliance on high-order moments or unstable operations. DSW is topologically equivalent to WoW on empirical meta-measures and empirically offers substantial speedups with comparable discriminative power for applications in dataset similarity, point-cloud analysis, and perceptual evaluation of images and shapes (Piening et al., 26 Sep 2025).
1. Meta-Measure Spaces and the Wasserstein-Over-Wasserstein Problem
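Let $X$ be a Polish space with metric $d_X$ and $P_2(X)$ the set of Borel probability measures on $X$ with finite second moment, equipped with the 2-Wasserstein distance,
$W_2(\mu, \nu) = \left[ \inf_{\gamma \in \Gamma(\mu, \nu)} \int_{X \times X} d_X(x, y)^2 \, d\gamma(x, y) \right]^{1/2},$
where $\Gamma(\mu, \nu)$ denotes the set of couplings of $\mu$ and $\nu$.
A meta-measure is defined as $\alpha \in P_2(P_2(X))$, that is, a probability law over probability measures on $X$. The Wasserstein-over-Wasserstein (WoW) metric lifts the $W_2$ distance to the meta-measure space:
$WoW(\alpha, \beta) = \left[ \inf_{\eta \in \Gamma(\alpha, \beta)} \int_{P_2(X) \times P_2(X)} W_2^2(\mu, \nu) \, d\eta(\mu, \nu) \right]^{1/2},$
which is computationally prohibitive for large collections of distributions, especially in high dimensions, due to quadratic scaling in the number of inner measures.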
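To make the nested structure of WoW concrete, here is a minimal Python sketch for the simplest empirical case. It assumes both meta-measures hold the same number of inner measures, every inner measure has the same number of uniformly weighted points, and SciPy is available. Under these assumptions each optimal coupling can be taken to be a permutation, so an assignment solver is exact. The function names `w2_squared` and `wow` are illustrative and not taken from the cited paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def w2_squared(x, y):
    """Squared W_2 between uniform empirical measures on the rows of x and y.

    Assumes x and y both have shape (n, d). With equal sizes and uniform
    weights an optimal coupling is a permutation, so assignment is exact.
    """
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)  # (n, n)
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()


def wow(A, B):
    """WoW distance between meta-measures A and B, each of shape (N, n, d)."""
    # Ground cost of the outer problem: all N * N inner squared W_2 distances.
    ground = np.array([[w2_squared(mu, nu) for nu in B] for mu in A])
    rows, cols = linear_sum_assignment(ground)
    return np.sqrt(ground[rows, cols].mean())


rng = np.random.default_rng(0)
A = rng.normal(size=(5, 8, 2))  # 5 inner measures, 8 points each, in R^2
shift = np.array([3.0, 4.0])
# Translating every point by `shift` moves the meta-measure by ||shift||.
print(wow(A, A + shift))
```

The double loop that fills `ground` is the quadratic bottleneck referred to above: every pair of inner measures needs its own transport solve before the outer problem can even be posed.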
2. Quantile Isometry and Functional Slicing
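For measures on $\mathbb{R}$, the 1D 2-Wasserstein metric admits an isometry to $L^2([0,1])$, mapping a measure $\mu$ to its quantile function $q_\mu$:
$W_2(\mu, \nu) = \| q_\mu - q_\nu \|_{L^2([0,1])}.$
This isometry underpins the functional optimal transport approach used in DSW. Sliced-Wasserstein distances on general Banach spaces $U$ make use of projections $\pi_v(u) = \langle v, u \rangle$ for $v \in U^*$, and for a probability measure $\xi$ on $U^*$,
$SW(\mu, \nu; \xi) = \left[ \int_{U^*} W_2^2\left(\pi_{v\#}\mu,\, \pi_{v\#}\nu \right) d\xi(v) \right]^{1/2}.$
This construction, under appropriate support conditions on $\xi$, yields a true metric on $P_2(U)$.
In the specific setting of meta-measures on $\mathbb{R}$, that is, $\alpha, \beta \in P_2(P_2(\mathbb{R}))$, the quantile map $q: \mu \mapsto q_\mu$ pushes $\alpha$ to a law $q_\#\alpha$ on $L^2([0,1])$, yielding a “sliced-quantile WoW” (SQW) metric,
$SW(\alpha, \beta; \xi) = \left[ \int_{v \in L^2([0,1])} W_2^2\left(\pi_{v\#}q_\#\alpha,\, \pi_{v\#}q_\#\beta \right) d\xi(v) \right]^{1/2}.$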
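The quantile isometry is easy to check numerically for empirical measures. The sketch below assumes two 1D samples of equal size with uniform weights. In that case the quantile functions are step functions whose values are the sorted samples, so the $L^2([0,1])$ norm of their difference is a root-mean-square of sorted differences. The helper name `w2_1d` is illustrative.

```python
import numpy as np


def w2_1d(x, y):
    """W_2 between two uniform 1D empirical measures with equally many points.

    Sorting yields the values of the piecewise constant quantile functions,
    so the L^2([0,1]) norm of their difference is a root-mean-square.
    """
    qx, qy = np.sort(x), np.sort(y)
    return np.sqrt(np.mean((qx - qy) ** 2))


rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = rng.exponential(size=100)
print(w2_1d(x, y))
```

No transport plan is ever formed here: in one dimension the optimal coupling is the monotone one, which is why the cost drops to that of a sort.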
3. Construction and Mathematical Formulation of Double-Sliced Wasserstein
The Double-Sliced Wasserstein metric is constructed through consecutive application of two slicing steps:
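- Euclidean Slicing: For each $\theta \in S^{d-1}$, project every inner measure $\mu$ onto $\mathbb{R}$ via $\pi_\theta(x) = \langle \theta, x \rangle$, inducing a pushed-forward measure $\pi_{\theta\#}\mu$. Applying this to every inner measure of $\alpha$ gives the sliced meta-measure $_{\theta\#}\alpha$.
- Quantile-Space Slicing: For fixed $\theta$, one obtains two 1D meta-measures $_{\theta\#}\alpha,\,_{\theta\#}\beta \in P_2(P_2(\mathbb{R}))$. Using a Gaussian process prior $\xi$ on $L^2([0,1])$ (e.g., with an RBF kernel), the SQW distance between the meta-measures is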
$SW(_{\theta\#}\alpha,\,_{\theta\#}\beta; \xi) = \left[ \int_{v \in L^2([0,1])} W_2^2\left(\pi_{v\#}q_\#(_{\theta\#}\alpha),\, \pi_{v\#}q_\#(_{\theta\#}\beta) \right) d\xi(v) \right]^{1/2}.$
- Aggregation: Integrate the squared inner SQW metric over $\theta \in S^{d-1}$ to obtain the Double-Sliced Wasserstein: $DSW(\alpha, \beta; \xi) = \left[ \int_{S^{d-1}} SW^2(_{\theta\#}\alpha,\,_{\theta\#}\beta; \xi) dS^{d-1}(\theta) \right]^{1/2}.$
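Written out in full:
$DSW(\alpha, \beta; \xi) = \left[ \int_{S^{d-1}} \int_{v \in L^2([0,1])} W_2^2\left(\pi_{v\#}q_\#(_{\theta\#}\alpha),\, \pi_{v\#}q_\#(_{\theta\#}\beta) \right) d\xi(v) \, dS^{d-1}(\theta) \right]^{1/2}.$
For computation, both integrals are estimated by Monte Carlo, using sampled directions $\theta$ and sampled Gaussian process paths $v$.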
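The three steps can be sketched as a small Monte Carlo estimator in Python. This is a sketch under stated assumptions, not the authors' implementation. Both meta-measures are assumed to hold the same number of inner measures, each with the same number of uniformly weighted points. Quantile functions are discretized on the midpoint grid of $[0,1]$. The RBF length scale, the jitter term, and the defaults `n_dirs` and `n_paths` are hypothetical choices, as is the function name `dsw`.

```python
import numpy as np


def dsw(A, B, n_dirs=64, n_paths=16, length_scale=0.2, seed=0):
    """Monte Carlo sketch of DSW between meta-measures of shape (N, n, d).

    Assumes A and B have identical shapes and uniform weights throughout.
    """
    rng = np.random.default_rng(seed)
    N, n, d = A.shape
    # Midpoint grid on [0, 1] where the step quantile functions are evaluated.
    t = (np.arange(n) + 0.5) / n
    # RBF covariance of the Gaussian process prior, with jitter for stability.
    K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / length_scale) ** 2)
    chol = np.linalg.cholesky(K + 1e-6 * np.eye(n))
    total = 0.0
    for _ in range(n_dirs):
        # Euclidean slicing: uniform random direction on the sphere.
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)
        # Sorted projections are the quantile functions of the sliced measures.
        qa = np.sort(A @ theta, axis=1)  # (N, n)
        qb = np.sort(B @ theta, axis=1)
        # Quantile-space slicing: Gaussian process paths on the grid.
        V = chol @ rng.normal(size=(n, n_paths))  # (n, n_paths)
        # <v, q> in L^2([0,1]) by the midpoint rule, then 1D W_2 via sorting.
        pa = np.sort(qa @ V / n, axis=0)  # (N, n_paths)
        pb = np.sort(qb @ V / n, axis=0)
        total += np.mean((pa - pb) ** 2)
    return np.sqrt(total / n_dirs)


rng = np.random.default_rng(1)
A = rng.normal(size=(6, 10, 3))
print(dsw(A, A + 1.0))
```

Every transport problem in this sketch is one-dimensional, so the only operations are projections, matrix products, and sorts. Fixing the seed makes the estimator deterministic, which is convenient when comparing several pairs of meta-measures against the same slices.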
4. Topological Properties and Equivalence with WoW
Let empirical meta-measures be composed of $N$ inner empirical measures, each with $n$ support points. On this class, the DSW metric is topologically equivalent to the WoW metric:
$DSW(\alpha_k, \alpha; \xi) \to 0 \iff WoW(\alpha_k, \alpha) \to 0 \quad (k \to \infty)$
for any positive Gaussian $\xi$ (Piening et al., 26 Sep 2025). The argument combines a discrete Cramér–Wold theorem at each slice with the quantile-space isometry. This ensures that DSW is a true metric and that it induces the same topology as WoW on the space of empirical meta-measures.
5. Computational Complexity and Numerical Stability
| Metric | Complexity per evaluation | Stability considerations |
|---|---|---|
| WoW | Requires all pairwise inner 2-Wasserstein computations; slow for large numbers of inner measures and support points | Sensitive to moment estimation |
| DSW | Only projections needed | Relies on quantile functions (no high-order moments); numerically robust |
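- For full WoW between meta-measures with $N$ inner measures of $n$ support points each, cost is dominated by an $N \times N$ matrix of pairwise 2-Wasserstein computations, each requiring its own optimal transport solve between $n$-point measures (entropic case).
- For DSW, the cost consists of sampling projection directions and computing 1D quantile transport per meta-measure, which reduces to sorting; the number of projections is usually kept modest or constant.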
- DSW avoids the unstable high-order moments used in s-OTDD, maintaining stability even with non-Gaussian or heavy-tailed meta-distributions.
6. Empirical Results and Applications
Experimental evaluations in (Piening et al., 26 Sep 2025) demonstrate that DSW achieves strong performance in several tasks:
- Shape classification via local distance distributions (mm-spaces): DSW matches accuracy of Gromov–Wasserstein and sliced GW (STLB), with an order of magnitude faster runtime.
- Dataset similarity (OTDD surrogate): On MNIST, Fashion-MNIST, and CIFAR-10 splits, DSW correlates with exact OTDD (as measured by the Pearson correlation coefficient), outperforming s-OTDD in stability and speed.
- Point-cloud evaluation: For batches of 3D shapes modeled as meta-measures in $P_2(P_2(\mathbb{R}^3))$, DSW matches OT-NNA and WoW in sensitivity but is 10–20× faster, providing similar robustness to mode collapse and sampling noise.
- Image perceptual distance: Image batches are represented as meta-measures on patch distributions. DSW defines a perceptual metric sensitive to qualitative similarity, aligns with standard fiducial metrics (e.g., Kernel Inception Distance), and is 40× faster than full WoW.
These results indicate that DSW yields operationally efficient metrics for meta-measure comparison without the compromises of parametric forms or unstable statistical estimators.
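As an illustration of the image setting, one way to turn a batch of images into an empirical meta-measure is to treat each image as the empirical distribution of its patches. The sketch below assumes grayscale images stored as an array of shape `(N, H, W)`. The patch size, the stride, and the function name `images_to_meta_measure` are hypothetical and not taken from the cited paper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def images_to_meta_measure(images, patch=3, stride=2):
    """Turn grayscale images (N, H, W) into an (N, n, patch * patch) array.

    Each image becomes one inner measure whose support points are its
    flattened patches, all carrying uniform weight.
    """
    windows = sliding_window_view(images, (patch, patch), axis=(1, 2))
    windows = windows[:, ::stride, ::stride]  # subsample patch positions
    return windows.reshape(images.shape[0], -1, patch * patch)


rng = np.random.default_rng(0)
batch = rng.random(size=(4, 8, 8))
print(images_to_meta_measure(batch).shape)
```

The resulting array has the layout (inner measures, support points, dimension), so two batches of equally sized images yield meta-measures that can be compared directly by a sliced estimator.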
7. Significance and Prospects
Double-Sliced Wasserstein provides a tractable, mathematically principled metric for meta-level optimal transport problems, preserving the topology and discriminative power of WoW while mitigating prohibitive computational demands. Its combination of classical slicing and functional quantile-space slicing leverages both geometry and statistical properties of optimal transport. The approach is widely applicable to large-scale shape analysis, dataset comparison, and the evaluation of structured or hierarchical data distributions.
Open questions include: optimizing DSW kernel choices for task-adaptiveness, theoretical dual formulations for functional slicing, and extensions beyond empirical meta-measures to infinite or continuous families. A plausible implication is that DSW could serve as a foundation for scalable learning frameworks in high-level data spaces where conventional OT remains intractable, especially in high-dimensional and nonparametric distributional regimes (Piening et al., 26 Sep 2025).