Double-Sliced Wasserstein (DSW) Metric
- The Double-Sliced Wasserstein (DSW) metric is a fully metric-based approach that slices meta-measures sequentially, addressing the limitations of the Wasserstein-over-Wasserstein metric.
- It leverages quantile isometry and Gaussian process–parametrized slicing in infinite-dimensional spaces to achieve computational efficiency and robust optimal transport.
- Empirical validation shows DSW achieves competitive matching, high accuracy, and significant speedups for applications in shape analysis, dataset comparison, and patch-based image similarity.
The double-sliced Wasserstein (DSW) metric is a scalable, robust, and fully metric-based approach for comparing meta-measures—distributions over distributions—on Euclidean spaces, particularly relevant for applications in shape analysis, dataset comparison, and patch-based image distances. DSW provides a principled and computationally tractable substitute for the Wasserstein-over-Wasserstein (WoW) metric, avoiding the higher-order moment requirements and numerical instability of existing approaches. By applying sequential slices—first in Euclidean space, then in function space—DSW leverages the isometry between univariate measures and quantile functions, and employs Gaussian process–parametrized slicing in infinite-dimensional function spaces.
1. Wasserstein-over-Wasserstein (WoW) and its Computational Limitations
Given a Polish ground space $(X, d_X)$, the standard 2-Wasserstein distance defines a metric on $\mathcal{P}_2(X)$, the space of Borel probability measures with finite second moment:
$W^2(\mu, \nu) = \min_{\gamma \in \Gamma(\mu,\nu)} \int_{X \times X} d_X(x,y)^2 \, d\gamma(x,y).$
Extending this to meta-measures $\boldsymbol{\mu}, \boldsymbol{\nu} \in \mathcal{P}_2(\mathcal{P}_2(X))$, the WoW metric is defined as
$WoW^2(\boldsymbol{\mu}, \boldsymbol{\nu}) = \min_{\boldsymbol{\gamma} \in \Gamma(\boldsymbol{\mu},\boldsymbol{\nu})} \int W^2(\mu, \nu) \, d\boldsymbol{\gamma}(\mu, \nu),$
i.e., the 2-Wasserstein distance on $\mathcal{P}_2(X)$ with ground metric $W$.
Computation becomes prohibitive for empirical meta-measures supported on $M$ underlying empirical measures (each over $n$ points in $\mathbb{R}^d$): assembling the $M \times M$ matrix of pairwise inner Wasserstein distances costs $O(M^2 n^3 \log n)$, and the outer transport adds $O(M^3 \log M)$, which is infeasible for large $n$ or $M$.
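To make the nested structure concrete, here is a minimal numpy sketch of WoW for uniform empirical meta-measures over $\mathbb{R}$ (not the paper's implementation): the inner 1D distances use the sorting formula, and the outer transport is brute-forced over all $M!$ matchings, which is only viable for tiny $M$ and illustrates why the nested problem does not scale.

```python
import itertools
import numpy as np

def w2_1d(x, y):
    """Exact squared 2-Wasserstein distance between two equal-size
    empirical measures on R, via the quantile (sorting) formula."""
    return np.mean((np.sort(x) - np.sort(y)) ** 2)

def wow2(mus, nus):
    """Squared WoW between uniform meta-measures supported on the
    empirical measures in `mus` and `nus`. The outer transport is
    brute-forced over all M! matchings -- only viable for tiny M."""
    M = len(mus)
    C = np.array([[w2_1d(mu, nu) for nu in nus] for mu in mus])
    return min(sum(C[i, p[i]] for i in range(M))
               for p in itertools.permutations(range(M))) / M

rng = np.random.default_rng(0)
mus = [rng.normal(m, 1.0, size=50) for m in (0.0, 5.0, 10.0)]
nus = [rng.normal(m, 1.0, size=50) for m in (10.0, 0.0, 5.0)]
# The outer matching re-pairs measures with similar means, so the
# distance stays small even though the lists arrive shuffled.
print(wow2(mus, nus))
```

The $M!$ enumeration stands in for the $O(M^3 \log M)$ outer solver purely for clarity; the scaling bottleneck it illustrates is the same.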
2. Quantile Isometry and Sliced Optimal Transport in Banach Spaces
For measures $\mu \in \mathcal{P}_2(\mathbb{R})$, the quantile map $q: \mu \mapsto q_\mu$ acts as an isometric embedding into $L^2(0,1)$:
$W(\mu, \nu) = \lVert q_\mu - q_\nu \rVert_{L^2(0,1)}.$
This property motivates generalizing sliced optimal transport to Banach spaces. For a separable Banach space $X$ and direction $v \in X^*$, projections $\pi_v(x) = \langle v, x \rangle$ yield the sliced Wasserstein distance:
$SW^2(\mu, \nu) = \int_{X^*} W^2(\pi_{v\#}\mu, \pi_{v\#}\nu) \, d\xi(v).$
When $X$ is infinite-dimensional, there is no uniform measure on the sphere; instead, $\xi$ is typically chosen as a Gaussian measure on $X^*$.
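The quantile isometry can be checked numerically. For uniform empirical measures with equal numbers of atoms the quantile function is just the sorted sample, so the $L^2$ distance of quantiles should reproduce the 1D $W_2$ obtained by solving the assignment problem directly. A small numpy sketch (brute-forcing the assignment as an independent reference):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=6)
y = rng.normal(2.0, 0.5, size=6)

# Quantile side: for uniform empirical measures the quantile function
# is the sorted sample, so W2^2 is a plain L2 distance of quantiles.
w2_quantile = np.mean((np.sort(x) - np.sort(y)) ** 2)

# Transport side: brute-force the optimal assignment over all n! plans.
w2_assignment = min(np.mean((x - y[list(p)]) ** 2)
                    for p in itertools.permutations(range(len(y))))

print(w2_quantile, w2_assignment)  # agree up to float rounding
```

The agreement reflects the classical fact that on $\mathbb{R}$ with squared cost, the monotone (sorted) coupling is optimal.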
3. Construction of the Double-Sliced Wasserstein Metric
The DSW metric applies two slicing steps for meta-measures in $\mathcal{P}_2(\mathcal{P}_2(\mathbb{R}^d))$:
- Euclidean slicing: For each direction $\theta \in S^{d-1}$, measures are projected via $\pi_\theta(x) = \langle \theta, x \rangle$, i.e. $\mu \mapsto \pi_{\theta\#}\mu$. Lifting this map to the meta-level, with $P_\theta: \mu \mapsto \pi_{\theta\#}\mu$, pushes the meta-measure forward, producing $P_{\theta\#}\boldsymbol{\mu}$.
- Functional slicing: Applying the quantile isometry to each constituent measure, the resulting meta-measure is now over $L^2(0,1)$. Slices $v \in L^2(0,1)$ (sampled via a Gaussian process) project each quantile function to a scalar, yielding 1D empirical measures in $\mathbb{R}$; the Wasserstein distance is computed on these projections.
The DSW metric is then
$DSW^2(\boldsymbol{\mu}, \boldsymbol{\nu}) = \int_{S^{d-1}}\int_{L^2(0,1)} W^2\bigl(\pi_{v\#}(q_\#\,P_{\theta\#}\boldsymbol{\mu})\,,\,\pi_{v\#}(q_\#\,P_{\theta\#}\boldsymbol{\nu});\,\mathbb{R}\bigr) d\xi(v)\,d\sigma_{S^{d-1}}(\theta)$
with $\sigma_{S^{d-1}}$ the uniform measure on the sphere, $\xi$ the law of the slicing Gaussian process on $L^2(0,1)$, and $P_\theta: \mu \mapsto \pi_{\theta\#}\mu$ the lifted Euclidean projection.
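The functional slice can be illustrated in isolation. In this sketch a Brownian-motion discretization stands in for whatever Gaussian process law $\xi$ the method prescribes (an assumption, chosen only for simplicity); the point is the reduction of a quantile function to a single scalar coordinate $\langle v, q \rangle_{L^2(0,1)}$:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 256                                # quantile grid size
t = (np.arange(T) + 0.5) / T           # midpoint grid on (0, 1)

def quantile_on_grid(sample, t):
    """Quantile function of a 1D empirical measure, evaluated on t."""
    return np.quantile(sample, t)

def gp_slice(rng, T):
    """One GP realization on the grid; a Brownian-motion
    discretization stands in for the slicing law xi."""
    return np.cumsum(rng.normal(0.0, 1.0 / np.sqrt(T), size=T))

mu_1d = rng.normal(0.0, 1.0, size=100)   # already Euclidean-sliced to R
q = quantile_on_grid(mu_1d, t)

v = gp_slice(rng, T)
s = np.mean(v * q)      # grid approximation of <v, q>_{L2(0,1)}
print(s)                # one scalar: the doubly-sliced coordinate
```

Because the pairing is linear in $q$, shifting the underlying measure by a constant shifts the coordinate by that constant times $\int_0^1 v$, which is the property the 1D outer Wasserstein step then exploits.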
4. Equivalence and Properties on Discrete Meta-Measures
For empirical meta-measures $\boldsymbol{\mu} = \frac{1}{M}\sum_{i} \delta_{\mu_i}$ and $\boldsymbol{\nu} = \frac{1}{M}\sum_{j} \delta_{\nu_j}$, DSW minimization yields the same optimal matchings (transport plans) as WoW minimization. By a Cramér–Wold argument, the matching induced by DSW collapses to permutations, ensuring plans only between identical support indices when the constituent measures are pairwise distinct ($\mu_i \neq \mu_j$ for all $i \neq j$). Thus, the computational procedure for DSW coincides, on discretized data, with the theoretically optimal WoW solution.
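This equivalence of matchings can be observed on a toy example: brute-forcing the outer WoW assignment and the assignment induced by slice-averaged DSW costs recovers the same permutation. The Brownian-motion GP and the specific sizes below are illustrative assumptions; well-separated inner measures make the comparison robust.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
T, n_slices = 128, 200
t = (np.arange(T) + 0.5) / T

def best_perm(C):
    """Brute-force optimal assignment for a small cost matrix."""
    M = len(C)
    return min(itertools.permutations(range(M)),
               key=lambda p: sum(C[i, p[i]] for i in range(M)))

# Three well-separated 1D inner measures; nus are shuffled noisy copies.
mus = [rng.normal(m, 1.0, size=60) for m in (0.0, 5.0, 10.0)]
order = [2, 0, 1]
nus = [mus[j] + rng.normal(0.0, 0.1, size=60) for j in order]

q_mu = np.stack([np.quantile(x, t) for x in mus])   # (M, T) quantiles
q_nu = np.stack([np.quantile(x, t) for x in nus])

# WoW outer cost: exact inner 1D W2^2 via the quantile formula.
C_wow = np.mean((q_mu[:, None, :] - q_nu[None, :, :]) ** 2, axis=-1)

# DSW-style cost: average squared gaps of GP-sliced scalar coordinates.
V = np.cumsum(rng.normal(0.0, 1.0 / np.sqrt(T), size=(n_slices, T)), axis=1)
s_mu = q_mu @ V.T / T                               # (M, n_slices)
s_nu = q_nu @ V.T / T
C_dsw = np.mean((s_mu[:, None, :] - s_nu[None, :, :]) ** 2, axis=-1)

print(best_perm(C_wow), best_perm(C_dsw))  # same permutation
```

Both cost matrices are small exactly on the pairs related by the shuffle, so the minimizing permutations agree.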
5. Algorithm, Complexity, and Implementation
Given two meta-measures, each supported on $M$ empirical measures of $n$ points in $\mathbb{R}^d$:
- Loop over $L$ Euclidean directions $\theta \in S^{d-1}$; project all measures to $\mathbb{R}$ and sort to obtain their quantile functions on a grid of $T$ points.
- For each $\theta$ and each of $S$ function-space slices $v$ (Gaussian process realizations on the grid), compute the scalars $s_i = \langle v, q_{\pi_{\theta\#}\mu_i} \rangle_{L^2}$, and analogously $t_j$ for $\boldsymbol{\nu}$.
- Sort the vectors $(s_i)$ and $(t_j)$ and evaluate the 1D Wasserstein distance between them in $\mathbb{R}$.
- Aggregate the squared Wasserstein values across all $LS$ slices, outputting the Monte Carlo estimate of $DSW^2$.
The total computational complexity is on the order of
$O\bigl(L\,(Mnd + Mn\log n + S(MT + M\log M))\bigr),$
where $T$ is the grid size for the quantile integration. For moderate $L$, $S$, and $T$, DSW achieves significant speedups over WoW, whose complexity is $O(M^2 n^3 \log n + M^3 \log M)$.
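The loop above can be sketched end to end. This is a hedged Monte Carlo sketch, not the reference implementation: directions are drawn uniformly from $S^{d-1}$, the GP is again a Brownian-motion discretization standing in for $\xi$, and all sizes are toy-scale.

```python
import numpy as np

def dsw2(mus, nus, L=20, S=20, T=128, seed=0):
    """Monte Carlo sketch of squared DSW between two uniform
    meta-measures, each given as a list of (n, d) sample arrays.
    L: Euclidean slices, S: GP slices, T: quantile grid size."""
    rng = np.random.default_rng(seed)
    d = mus[0].shape[1]
    t = (np.arange(T) + 0.5) / T
    total = 0.0
    for _ in range(L):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)          # uniform on S^{d-1}
        # Euclidean slice + quantile isometry: (M, T) quantile arrays.
        q_mu = np.stack([np.quantile(x @ theta, t) for x in mus])
        q_nu = np.stack([np.quantile(x @ theta, t) for x in nus])
        # Functional slices: Brownian GP paths standing in for xi.
        V = np.cumsum(rng.normal(0.0, 1.0 / np.sqrt(T), size=(S, T)),
                      axis=1)
        s_mu = q_mu @ V.T / T                   # (M, S) scalar coords
        s_nu = q_nu @ V.T / T
        # 1D outer Wasserstein per functional slice: sort and compare.
        total += np.mean((np.sort(s_mu, axis=0)
                          - np.sort(s_nu, axis=0)) ** 2)
    return total / L

rng = np.random.default_rng(42)
A = [rng.normal(0.0, 1.0, size=(40, 2)) for _ in range(5)]
B = [rng.normal(3.0, 1.0, size=(40, 2)) for _ in range(5)]
print(dsw2(A, A), dsw2(A, B))   # zero vs. clearly positive
```

Every step is sorting, matrix multiplication, or elementwise arithmetic, which is where the stated near-linear per-slice complexity comes from.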
6. Theoretical and Topological Properties
- DSW is a metric on empirical meta-measures for a suitable full-support Gaussian process law $\xi$ and full angular integration over $S^{d-1}$.
- DSW metrizes the same topology as WoW on discrete meta-measures: $DSW(\boldsymbol{\mu}_k, \boldsymbol{\mu}) \to 0$ iff $WoW(\boldsymbol{\mu}_k, \boldsymbol{\mu}) \to 0$.
- DSW is Lipschitz-stable with respect to perturbations of the outer meta-measure, up to constants depending on the second moment of the meta-measures.
- Monte Carlo estimation achieves the standard $O(N^{-1/2})$ convergence rate in the total number $N$ of sampled slice pairs, given sufficient outer and inner slices.
7. Empirical Validation and Applications
Experiments illustrate DSW's efficiency and fidelity relative to WoW across three domains:
| Application | DSW runtime | Baseline runtime / accuracy | DSW accuracy / correlation |
|---|---|---|---|
| Shape classification (K-NN, 2D/3D data) | 2 ms/pair | GW: 40 ms/pair; both ≈99% on small data | 42.7% ± 5.9 (FAUST-1000) |
| Dataset distance (MNIST, CIFAR-10 splits) | — | s-OTDD corr.: 0.75–0.85 | corr(DSW, OTDD) = 0.90–0.95 |
| Patch-based image similarity | — | patch-WoW: 40× slower | agreement with kernel-inception distance |
On shape data, DSW matches GW for accuracy, running 20× faster. For dataset distances (e.g., OTDD replacement), DSW achieves 5–10× speedup over s-OTDD with higher correlation to ground-truth OTDD. As a patch-based image similarity metric, DSW closely tracks kernel-inception distance while being ~40× faster than patch-based WoW.
8. Context, Distinctions, and Related Works
DSW's construction is motivated by previous sliced Wasserstein approaches, which often rely on parametric meta-measures or high-order moments, creating numerical instability. The DSW approach circumvents these issues by operating exclusively via the quantile isometry and slicing in $L^2(0,1)$ via Gaussian processes. It stands in contrast to single-level distributional slicing (e.g., SW, Max-SW, distributional SW as in v-DSW), which projects only once and does not handle meta-measures. DSW is not a max-slice metric and retains strict metricity, avoiding the limitations of non-metric approximations commonly encountered in max-sliced or amortized distributional projection variants (Nguyen et al., 2023).
DSW thereby provides a general, mathematically principled, and scalable framework for optimal transport–based comparison of meta-distributions across a wide array of scientific and machine learning applications (Piening et al., 26 Sep 2025).