Double-Sliced Wasserstein (DSW) Metric
- The Double-Sliced Wasserstein (DSW) metric is a fully metric-based approach that slices meta-measures sequentially, addressing the limitations of the Wasserstein-over-Wasserstein metric.
- It leverages quantile isometry and Gaussian process–parametrized slicing in infinite-dimensional spaces to achieve computational efficiency and robust optimal transport.
- Empirical validation shows DSW achieves competitive matching, high accuracy, and significant speedups for applications in shape analysis, dataset comparison, and patch-based image similarity.
The double-sliced Wasserstein (DSW) metric is a scalable, robust, and fully metric-based approach for comparing meta-measures—distributions over distributions—on Euclidean spaces, particularly relevant for applications in shape analysis, dataset comparison, and patch-based image distances. DSW provides a principled and computationally tractable substitute for the Wasserstein-over-Wasserstein (WoW) metric, avoiding the higher-order moment requirements and numerical instability of existing approaches. By applying sequential slices—first in Euclidean space, then in function space—DSW leverages the isometry between univariate measures and quantile functions, and employs Gaussian process–parametrized slicing in infinite-dimensional function spaces.
1. Wasserstein-over-Wasserstein (WoW) and its Computational Limitations
Given a Polish ground space $(X, d_X)$, the standard 2-Wasserstein distance defines a metric on $\mathcal{P}_2(X)$, the space of Borel probability measures with finite second moment:
$W^2(\mu, \nu) = \min_{\gamma \in \Gamma(\mu,\nu)} \int_{X \times X} d_X(x,y)^2 \, d\gamma(x,y).$
Extending this to meta-measures $\boldsymbol{\mu}, \boldsymbol{\nu} \in \mathcal{P}_2(\mathcal{P}_2(X))$, the WoW metric is defined as
$WoW^2(\boldsymbol{\mu}, \boldsymbol{\nu}) = \min_{\boldsymbol{\gamma} \in \Gamma(\boldsymbol{\mu},\boldsymbol{\nu})} \int W^2(\mu, \nu) \, d\boldsymbol{\gamma}(\mu, \nu),$
i.e., the 2-Wasserstein distance on $\mathcal{P}_2(X)$ with ground metric $W$.
Computation becomes prohibitive for empirical meta-measures supported on $M$ underlying empirical measures (each over $n$ points in $\mathbb{R}^d$): assembling the $M \times M$ matrix of pairwise inner Wasserstein distances costs $O(M^2 n^3 \log n)$, and the outer transport adds $O(M^3 \log M)$, which is infeasible for large $n$ or $M$.
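To make the nested structure concrete, here is a minimal numpy sketch of WoW for uniform empirical meta-measures over $\mathbb{R}$ (not the paper's implementation): the inner 1D distances use the sorting formula, and the outer transport is brute-forced over all $M!$ matchings, which is only viable for tiny $M$ and illustrates why the nested problem does not scale.

```python
import itertools
import numpy as np

def w2_1d(x, y):
    """Exact squared 2-Wasserstein distance between two equal-size
    empirical measures on R, via the quantile (sorting) formula."""
    return np.mean((np.sort(x) - np.sort(y)) ** 2)

def wow2(mus, nus):
    """Squared WoW between uniform meta-measures supported on the
    empirical measures in `mus` and `nus`. The outer transport is
    brute-forced over all M! matchings -- only viable for tiny M."""
    M = len(mus)
    C = np.array([[w2_1d(mu, nu) for nu in nus] for mu in mus])
    return min(sum(C[i, p[i]] for i in range(M))
               for p in itertools.permutations(range(M))) / M

rng = np.random.default_rng(0)
mus = [rng.normal(m, 1.0, size=50) for m in (0.0, 5.0, 10.0)]
nus = [rng.normal(m, 1.0, size=50) for m in (10.0, 0.0, 5.0)]
# The outer matching re-pairs measures with similar means, so the
# distance stays small even though the lists arrive shuffled.
print(wow2(mus, nus))
```

The $M!$ enumeration stands in for the $O(M^3 \log M)$ outer solver purely for clarity; the scaling bottleneck it illustrates is the same.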
2. Quantile Isometry and Sliced Optimal Transport in Banach Spaces
For measures $\mu \in \mathcal{P}_2(\mathbb{R})$, the quantile map $q: \mu \mapsto q_\mu$ acts as an isometric embedding into $L^2(0,1)$:
$W(\mu, \nu) = \lVert q_\mu - q_\nu \rVert_{L^2(0,1)}.$
This property motivates generalizing sliced optimal transport to Banach spaces. For a separable Banach space $X$ and direction $v \in X^*$, projections $\pi_v(x) = \langle v, x \rangle$ yield the sliced Wasserstein distance:
$SW^2(\mu, \nu) = \int_{X^*} W^2(\pi_{v\#}\mu, \pi_{v\#}\nu) \, d\xi(v).$
When $X$ is infinite-dimensional, there is no uniform measure on the sphere; instead, $\xi$ is typically chosen as a Gaussian measure on $X^*$.
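The quantile isometry can be checked numerically. For uniform empirical measures with equal numbers of atoms the quantile function is just the sorted sample, so the $L^2$ distance of quantiles should reproduce the 1D $W_2$ obtained by solving the assignment problem directly. A small numpy sketch (brute-forcing the assignment as an independent reference):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=6)
y = rng.normal(2.0, 0.5, size=6)

# Quantile side: for uniform empirical measures the quantile function
# is the sorted sample, so W2^2 is a plain L2 distance of quantiles.
w2_quantile = np.mean((np.sort(x) - np.sort(y)) ** 2)

# Transport side: brute-force the optimal assignment over all n! plans.
w2_assignment = min(np.mean((x - y[list(p)]) ** 2)
                    for p in itertools.permutations(range(len(y))))

print(w2_quantile, w2_assignment)  # agree up to float rounding
```

The agreement reflects the classical fact that on $\mathbb{R}$ with squared cost, the monotone (sorted) coupling is optimal.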
3. Construction of the Double-Sliced Wasserstein Metric
The DSW metric applies two slicing steps for meta-measures in $\mathcal{P}_2(\mathcal{P}_2(\mathbb{R}^d))$:
- Euclidean slicing: For each direction $\theta \in S^{d-1}$, measures are projected via $\pi_\theta(x) = \langle \theta, x \rangle$, i.e. $\mu \mapsto \pi_{\theta\#}\mu$. Lifting this map to the meta-level, with $P_\theta: \mu \mapsto \pi_{\theta\#}\mu$, pushes the meta-measure forward, producing $P_{\theta\#}\boldsymbol{\mu}$.
- Functional slicing: Applying the quantile isometry to each constituent measure, the resulting meta-measure is now over $L^2(0,1)$. Slices $v \in L^2(0,1)$ (sampled via a Gaussian process) project each quantile function to a scalar, yielding 1D empirical measures in $\mathbb{R}$; the Wasserstein distance is computed on these projections.
The DSW metric is then
$DSW^2(\boldsymbol{\mu}, \boldsymbol{\nu}) = \int_{S^{d-1}}\int_{L^2(0,1)} W^2\bigl(\pi_{v\#}(q_\#\,P_{\theta\#}\boldsymbol{\mu})\,,\,\pi_{v\#}(q_\#\,P_{\theta\#}\boldsymbol{\nu});\,\mathbb{R}\bigr) d\xi(v)\,d\sigma_{S^{d-1}}(\theta)$
with $\sigma_{S^{d-1}}$ the uniform measure on the sphere, $\xi$ the law of the slicing Gaussian process on $L^2(0,1)$, and $P_\theta: \mu \mapsto \pi_{\theta\#}\mu$ the lifted Euclidean projection.
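The functional slice can be illustrated in isolation. In this sketch a Brownian-motion discretization stands in for whatever Gaussian process law $\xi$ the method prescribes (an assumption, chosen only for simplicity); the point is the reduction of a quantile function to a single scalar coordinate $\langle v, q \rangle_{L^2(0,1)}$:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 256                                # quantile grid size
t = (np.arange(T) + 0.5) / T           # midpoint grid on (0, 1)

def quantile_on_grid(sample, t):
    """Quantile function of a 1D empirical measure, evaluated on t."""
    return np.quantile(sample, t)

def gp_slice(rng, T):
    """One GP realization on the grid; a Brownian-motion
    discretization stands in for the slicing law xi."""
    return np.cumsum(rng.normal(0.0, 1.0 / np.sqrt(T), size=T))

mu_1d = rng.normal(0.0, 1.0, size=100)   # already Euclidean-sliced to R
q = quantile_on_grid(mu_1d, t)

v = gp_slice(rng, T)
s = np.mean(v * q)      # grid approximation of <v, q>_{L2(0,1)}
print(s)                # one scalar: the doubly-sliced coordinate
```

Because the pairing is linear in $q$, shifting the underlying measure by a constant shifts the coordinate by that constant times $\int_0^1 v$, which is the property the 1D outer Wasserstein step then exploits.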
4. Equivalence and Properties on Discrete Meta-Measures
For empirical meta-measures $\boldsymbol{\mu} = \frac{1}{M}\sum_{i} \delta_{\mu_i}$ and $\boldsymbol{\nu} = \frac{1}{M}\sum_{j} \delta_{\nu_j}$, DSW minimization yields the same optimal matchings (transport plans) as WoW minimization. By a Cramér–Wold argument, the matching induced by DSW collapses to permutations, ensuring plans only between identical support indices when the constituent measures are pairwise distinct ($\mu_i \neq \mu_j$ for all $i \neq j$). Thus, the computational procedure for DSW coincides, on discretized data, with the theoretically optimal WoW solution.
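This equivalence of matchings can be observed on a toy example: brute-forcing the outer WoW assignment and the assignment induced by slice-averaged DSW costs recovers the same permutation. The Brownian-motion GP and the specific sizes below are illustrative assumptions; well-separated inner measures make the comparison robust.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
T, n_slices = 128, 200
t = (np.arange(T) + 0.5) / T

def best_perm(C):
    """Brute-force optimal assignment for a small cost matrix."""
    M = len(C)
    return min(itertools.permutations(range(M)),
               key=lambda p: sum(C[i, p[i]] for i in range(M)))

# Three well-separated 1D inner measures; nus are shuffled noisy copies.
mus = [rng.normal(m, 1.0, size=60) for m in (0.0, 5.0, 10.0)]
order = [2, 0, 1]
nus = [mus[j] + rng.normal(0.0, 0.1, size=60) for j in order]

q_mu = np.stack([np.quantile(x, t) for x in mus])   # (M, T) quantiles
q_nu = np.stack([np.quantile(x, t) for x in nus])

# WoW outer cost: exact inner 1D W2^2 via the quantile formula.
C_wow = np.mean((q_mu[:, None, :] - q_nu[None, :, :]) ** 2, axis=-1)

# DSW-style cost: average squared gaps of GP-sliced scalar coordinates.
V = np.cumsum(rng.normal(0.0, 1.0 / np.sqrt(T), size=(n_slices, T)), axis=1)
s_mu = q_mu @ V.T / T                               # (M, n_slices)
s_nu = q_nu @ V.T / T
C_dsw = np.mean((s_mu[:, None, :] - s_nu[None, :, :]) ** 2, axis=-1)

print(best_perm(C_wow), best_perm(C_dsw))  # same permutation
```

Both cost matrices are small exactly on the pairs related by the shuffle, so the minimizing permutations agree.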
5. Algorithm, Complexity, and Implementation
Given two meta-measures, each supported on $M$ empirical measures of $n$ points in $\mathbb{R}^d$:
- Loop over $L$ Euclidean directions $\theta \in S^{d-1}$; project all measures to $\mathbb{R}$ and sort to obtain their quantile functions on a grid of $T$ points.
- For each $\theta$ and each of $S$ function-space slices $v$ (Gaussian process realizations on the grid), compute the scalars $s_i = \langle v, q_{\pi_{\theta\#}\mu_i} \rangle_{L^2}$, and analogously $t_j$ for $\boldsymbol{\nu}$.
- Sort the vectors $(s_i)$ and $(t_j)$ and evaluate the 1D Wasserstein distance between them in $\mathbb{R}$.
- Aggregate the squared Wasserstein values across all $LS$ slices, outputting the Monte Carlo estimate of $DSW^2$.
The total computational complexity is on the order of
$O\bigl(L\,(Mnd + Mn\log n + S(MT + M\log M))\bigr),$
where $T$ is the grid size for the quantile integration. For moderate $L$, $S$, and $T$, DSW achieves significant speedups over WoW, whose complexity is $O(M^2 n^3 \log n + M^3 \log M)$.
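The loop above can be sketched end to end. This is a hedged Monte Carlo sketch, not the reference implementation: directions are drawn uniformly from $S^{d-1}$, the GP is again a Brownian-motion discretization standing in for $\xi$, and all sizes are toy-scale.

```python
import numpy as np

def dsw2(mus, nus, L=20, S=20, T=128, seed=0):
    """Monte Carlo sketch of squared DSW between two uniform
    meta-measures, each given as a list of (n, d) sample arrays.
    L: Euclidean slices, S: GP slices, T: quantile grid size."""
    rng = np.random.default_rng(seed)
    d = mus[0].shape[1]
    t = (np.arange(T) + 0.5) / T
    total = 0.0
    for _ in range(L):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)          # uniform on S^{d-1}
        # Euclidean slice + quantile isometry: (M, T) quantile arrays.
        q_mu = np.stack([np.quantile(x @ theta, t) for x in mus])
        q_nu = np.stack([np.quantile(x @ theta, t) for x in nus])
        # Functional slices: Brownian GP paths standing in for xi.
        V = np.cumsum(rng.normal(0.0, 1.0 / np.sqrt(T), size=(S, T)),
                      axis=1)
        s_mu = q_mu @ V.T / T                   # (M, S) scalar coords
        s_nu = q_nu @ V.T / T
        # 1D outer Wasserstein per functional slice: sort and compare.
        total += np.mean((np.sort(s_mu, axis=0)
                          - np.sort(s_nu, axis=0)) ** 2)
    return total / L

rng = np.random.default_rng(42)
A = [rng.normal(0.0, 1.0, size=(40, 2)) for _ in range(5)]
B = [rng.normal(3.0, 1.0, size=(40, 2)) for _ in range(5)]
print(dsw2(A, A), dsw2(A, B))   # zero vs. clearly positive
```

Every step is sorting, matrix multiplication, or elementwise arithmetic, which is where the stated near-linear per-slice complexity comes from.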
6. Theoretical and Topological Properties
- DSW is a metric on empirical meta-measures for a suitable full-support Gaussian process law $\xi$ and full angular integration over $S^{d-1}$.
- DSW metrizes the same topology as WoW on discrete meta-measures: $DSW(\boldsymbol{\mu}_k, \boldsymbol{\mu}) \to 0$ iff $WoW(\boldsymbol{\mu}_k, \boldsymbol{\mu}) \to 0$.
- DSW is Lipschitz-stable with respect to perturbations of the outer meta-measure, up to constants depending on the second moment of the meta-measures.
- Monte Carlo estimation achieves the standard $O(N^{-1/2})$ convergence rate in the total number $N$ of sampled slice pairs, given sufficient outer and inner slices.
7. Empirical Validation and Applications
Experiments illustrate DSW's efficiency and fidelity relative to WoW across three domains:
| Application | DSW runtime | Baseline runtime / accuracy | DSW accuracy / correlation |
|---|---|---|---|
| Shape classification (K-NN, 2D/3D data) | 2 ms/pair | GW: 40 ms/pair; both ≈99% on small data | 42.7% ± 5.9 (FAUST-1000) |
| Dataset distance (MNIST, CIFAR-10 splits) | — | s-OTDD corr.: 0.75–0.85 | corr(DSW, OTDD) = 0.90–0.95 |
| Patch-based image similarity | — | patch-WoW: 40× slower | agreement with kernel-inception distance |
On shape data, DSW matches GW for accuracy, running 20× faster. For dataset distances (e.g., OTDD replacement), DSW achieves 5–10× speedup over s-OTDD with higher correlation to ground-truth OTDD. As a patch-based image similarity metric, DSW closely tracks kernel-inception distance while being ~40× faster than patch-based WoW.
8. Context, Distinctions, and Related Works
DSW's construction is motivated by previous sliced Wasserstein approaches, which often rely on parametric meta-measures or high-order moments, creating numerical instability. The DSW approach circumvents these issues by operating exclusively via the quantile isometry and slicing in $L^2(0,1)$ via Gaussian processes. It stands in contrast to single-level distributional slicing (e.g., SW, Max-SW, distributional SW as in v-DSW), which projects only once and does not handle meta-measures. DSW is not a max-slice metric and retains strict metricity, avoiding the limitations of non-metric approximations commonly encountered in max-sliced or amortized distributional projection variants (Nguyen et al., 2023).
DSW thereby provides a general, mathematically principled, and scalable framework for optimal transport–based comparison of meta-distributions across a wide array of scientific and machine learning applications (Piening et al., 26 Sep 2025).