
Double-Sliced Wasserstein (DSW) Metric

Updated 12 November 2025
  • Double-Sliced Wasserstein (DSW) Metric is a fully metric-based approach that sequentially slices meta-measures, addressing the limitations of the Wasserstein-over-Wasserstein metric.
  • It leverages quantile isometry and Gaussian process–parametrized slicing in infinite-dimensional spaces to achieve computational efficiency and robust optimal transport.
  • Empirical validation shows DSW achieves competitive matching, high accuracy, and significant speedups for applications in shape analysis, dataset comparison, and patch-based image similarity.

The double-sliced Wasserstein (DSW) metric is a scalable, robust, and fully metric-based approach for comparing meta-measures—distributions over distributions—on Euclidean spaces, particularly relevant for applications in shape analysis, dataset comparison, and patch-based image distances. DSW provides a principled and computationally tractable substitute for the Wasserstein-over-Wasserstein (WoW) metric, overcoming limitations of higher-order moment requirements and instability in existing approaches. By applying sequential slices—first in Euclidean space, then in function space—DSW leverages the isometry between univariate measures and quantile functions, and employs Gaussian process–parametrized slicing in infinite-dimensional function spaces.

1. Wasserstein-over-Wasserstein (WoW) and its Computational Limitations

Given a ground Polish space $\mathcal{X}$ with metric $d$, the standard 2-Wasserstein distance defines a metric on $P_2(\mathcal{X})$, the space of Borel probability measures with finite second moment:

$W(\mu, \nu; \mathcal{X}) = \inf_{\pi \in \Gamma(\mu,\nu)} \left( \iint_{\mathcal{X} \times \mathcal{X}} d^2(x,y) \, d\pi(x,y) \right)^{1/2}$

where $\Gamma(\mu,\nu)$ is the set of couplings of $\mu$ and $\nu$. Extending this to meta-measures $\boldsymbol{\mu}, \boldsymbol{\nu} \in P_2(P_2(\mathcal{X}))$, the WoW metric is defined as

$\mathrm{WoW}(\boldsymbol{\mu}, \boldsymbol{\nu}) = W(\boldsymbol{\mu}, \boldsymbol{\nu}; P_2(\mathcal{X}))$

Computation becomes prohibitive for empirical meta-measures with $N$ underlying empirical measures (each over $n$ points in $\mathbb{R}^d$), since computing the matrix of pairwise inner Wasserstein distances costs $\mathcal{O}(N^2 n^2 \log n)$ and the outer transport costs $\mathcal{O}(N^3 \log N)$, which is infeasible for large $N$ or $n$.

2. Quantile Isometry and Sliced Optimal Transport in Banach Spaces

For measures on $\mathbb{R}$, the quantile map $Q_\mu: (0,1) \to \mathbb{R}$ acts as an isometric embedding into $L^2(0,1)$:

$W^2(\mu,\nu; \mathbb{R}) = \int_0^1 | Q_\mu(s) - Q_\nu(s) |^2 \, ds \implies W(\mu, \nu; \mathbb{R}) = \| Q_\mu - Q_\nu \|_{L^2(0,1)}$

This property motivates generalizing sliced optimal transport to Banach spaces. For a separable Banach space $U$ and direction $v \in U^*$, projections $\pi_v: U \to \mathbb{R}$ yield the sliced Wasserstein distance:

$SW(\mu, \nu; \xi) = \left( \int_{U^*} W^2( \pi_{v\#}\mu, \pi_{v\#}\nu; \mathbb{R}) \, d\xi(v) \right)^{1/2}$

When $U$ is infinite-dimensional, there is no uniform measure on the unit sphere; instead, $\xi$ is typically chosen as a Gaussian measure on $U^*$.
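The quantile isometry is what makes one-dimensional transport cheap: for two empirical measures with the same number of equally weighted atoms, the sorted samples play the role of the quantile functions. The following is a minimal sketch under that equal-size assumption, together with a Monte Carlo sliced distance in finite dimension. The function names `w2_1d_equal_size` and `sliced_w2` are illustrative and do not come from the source.

```python
import numpy as np

def w2_1d_equal_size(x, y):
    """2-Wasserstein distance between two 1D empirical measures with the
    same number of equally weighted atoms, computed from sorted samples."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    # Sorted samples are the values of the empirical quantile functions,
    # so the L2 distance between them equals W.
    return float(np.sqrt(np.mean((x - y) ** 2)))

def sliced_w2(X, Y, n_slices=200, seed=0):
    """Monte Carlo sliced 2-Wasserstein distance between point clouds
    X, Y of shape (n, d), using uniform directions on the sphere."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    theta = rng.standard_normal((n_slices, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    total = 0.0
    for t in theta:
        total += w2_1d_equal_size(X @ t, Y @ t) ** 2
    return float(np.sqrt(total / n_slices))
```

For a pure translation `Y = X + c`, every slice contributes $(\theta \cdot c)^2$, so the sliced distance never exceeds $\|c\|$.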

3. Construction of the Double-Sliced Wasserstein Metric

The DSW metric applies two slicing steps for meta-measures in $P_2(P_2(\mathbb{R}^d))$:

  1. Euclidean slicing: For each direction $\theta \in S^{d-1}$, measures $\mu \in P_2(\mathbb{R}^d)$ are projected: $\mu \mapsto \pi_{\theta\#}\mu \in P_2(\mathbb{R})$. The meta-measure $\boldsymbol{\mu}$ is pushed forward under this map, producing $\metaProj_{\theta\#}\boldsymbol{\mu}$, a meta-measure over $P_2(\mathbb{R})$.
  2. Functional slicing: Applying the quantile isometry $q: P_2(\mathbb{R}) \to L^2(0,1)$ to each constituent measure yields a measure over $L^2(0,1)$. Each slice $v \in L^2(0,1)$ (sampled via a Gaussian process) maps a quantile function to its $L^2$ inner product with $v$, so the projected meta-measure is a 1D measure on $\mathbb{R}$. The Wasserstein distance is computed on these projections.

The DSW metric is then

$DSW^2(\boldsymbol{\mu}, \boldsymbol{\nu}) = \int_{S^{d-1}}\int_{L^2(0,1)} W^2\bigl(\pi_{v\#}(q_\#\,\metaProj_{\theta\#}\boldsymbol{\mu})\,,\,\pi_{v\#}(q_\#\,\metaProj_{\theta\#}\boldsymbol{\nu});\,\mathbb{R}\bigr) d\xi(v)\,d\sigma_{S^{d-1}}(\theta)$

with $\sigma_{S^{d-1}}$ the uniform measure on the sphere and $\xi$ the Gaussian process law.

4. Equivalence and Properties on Discrete Meta-Measures

For empirical meta-measures $\boldsymbol{\mu}=\frac1N\sum_i\delta_{\mu_i}$ and $\boldsymbol{\nu}=\frac1N\sum_j\delta_{\nu_j}$, DSW minimization yields the same optimal matchings (transport plans) as WoW minimization. The matching induced by DSW collapses to permutations (Cramér–Wold argument), ensuring plans only between identical support indices when $\mu_i \neq \nu_j$ for all $(i,j)$. Thus, the computational procedure for DSW coincides, on discretized data, with the theoretically optimal WoW solution.
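One standard ingredient behind permutation-valued plans is a one-dimensional fact: between two sets of $N$ equally weighted points on the real line, an optimal coupling for the quadratic cost pairs the points in sorted order. The sketch below illustrates only that 1D fact for a single slice. It is not the Cramér–Wold argument itself, and the helper name `monotone_matching` is hypothetical.

```python
import numpy as np

def monotone_matching(a, b):
    """Return perm such that a[i] is matched to b[perm[i]] under the
    sorted-order (monotone) coupling of two equal-size 1D samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    order_a = np.argsort(a, kind="stable")
    order_b = np.argsort(b, kind="stable")
    perm = np.empty(len(a), dtype=int)
    # The k-th smallest entry of a is paired with the k-th smallest entry of b.
    perm[order_a] = order_b
    return perm

# Example: the smallest a-value (index 1) is paired with the smallest b-value (index 1).
perm = monotone_matching([0.3, -1.0, 2.0], [5.0, 0.0, 1.0])
```

The returned array is always a permutation of the indices, so each projected atom on one side is matched to exactly one atom on the other.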

5. Algorithm, Complexity, and Implementation

Given two meta-measures each supported on $N$ empirical measures of $n$ points in $\mathbb{R}^d$:

  • Loop over $S_\theta$ Euclidean directions $\theta_s \sim \text{Uniform}(S^{d-1})$, project all $\mu_i, \nu_j$ to $\mathbb{R}$; sort to obtain their quantile functions.
  • For each $\theta_s$ and $S_v$ function space slices $v_{s,t}$ (Gaussian process realizations), compute scalars $a_i = \int_0^1 Q^{\theta_s}_{\mu_i}(u) \, v_{s,t}(u) \, du$, and $b_j$ analogously.
  • Sort the vectors $\{a_i\}, \{b_j\}$ and evaluate the 1D Wasserstein distance $W(a, b)$ in $\mathcal{O}(N \log N)$.
  • Aggregate the squared Wasserstein values across all slices, outputting

$DSW = \sqrt{ \frac{1}{S_\theta S_v} \sum_{s,t} W(a,b)^2 }$
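The steps above can be sketched in NumPy as follows. This is an illustrative sketch rather than the authors' implementation, and it relies on several assumptions that are not in the source: both meta-measures are stored as arrays of shape `(N, n, d)`, the Gaussian process uses a squared-exponential kernel with a hypothetical `length_scale`, and the $L^2$ integral is approximated by a midpoint rule on a grid of `grid_size` points.

```python
import numpy as np

def sample_gp_slices(n_slices, grid, length_scale, rng):
    """Draw zero-mean GP realizations on `grid`.
    Assumption: squared-exponential kernel (not specified by the source)."""
    diff = grid[:, None] - grid[None, :]
    cov = np.exp(-0.5 * (diff / length_scale) ** 2)
    # Eigen-decomposition with clipping avoids Cholesky failures on
    # numerically semi-definite covariance matrices.
    eigval, eigvec = np.linalg.eigh(cov)
    root = eigvec * np.sqrt(np.clip(eigval, 0.0, None))
    return (root @ rng.standard_normal((len(grid), n_slices))).T  # (n_slices, R)

def dsw(mu, nu, n_theta=32, n_v=16, grid_size=64, length_scale=0.2, seed=0):
    """Monte Carlo double-sliced estimate for two meta-measures given as
    arrays of shape (N, n, d): N empirical measures with n points each."""
    mu = np.asarray(mu, dtype=float)
    nu = np.asarray(nu, dtype=float)
    if mu.shape != nu.shape:
        raise ValueError("this sketch assumes equal shapes (N, n, d)")
    N, n, d = mu.shape
    rng = np.random.default_rng(seed)
    grid = (np.arange(grid_size) + 0.5) / grid_size   # midpoints in (0, 1)
    idx = np.minimum((grid * n).astype(int), n - 1)   # order statistic per grid point
    total = 0.0
    for _ in range(n_theta):
        theta = rng.standard_normal(d)
        theta /= np.linalg.norm(theta)
        # Euclidean slice: project, sort, and read off quantile functions on the grid.
        q_mu = np.sort(mu @ theta, axis=1)[:, idx]    # (N, R)
        q_nu = np.sort(nu @ theta, axis=1)[:, idx]
        # Functional slices: midpoint-rule inner products with GP draws.
        v = sample_gp_slices(n_v, grid, length_scale, rng)  # (n_v, R)
        a = np.sort(q_mu @ v.T / grid_size, axis=0)   # (N, n_v), sorted per slice
        b = np.sort(q_nu @ v.T / grid_size, axis=0)
        # Squared 1D Wasserstein distance per functional slice, summed.
        total += float(np.sum(np.mean((a - b) ** 2, axis=0)))
    return float(np.sqrt(total / (n_theta * n_v)))
```

Because the slices are drawn from a seeded generator that does not depend on the data, calling `dsw(mu, nu)` and `dsw(nu, mu)` with the same seed uses identical slices and returns the same value.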

The total computational complexity is

$\mathcal{O}\left(S_\theta (N n + N\log N + S_v (N R + N\log N)) \right)$

where $R$ is the grid size for $L^2$ integration. For moderate $S_\theta, S_v \ll N, n$, DSW achieves significant speedups over WoW, whose complexity is $\mathcal{O}(N^2 n^2 \log n)$.
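To get a feel for the gap between the two bounds, the expressions can be evaluated for concrete sizes. The snippet below is only an order-of-magnitude illustration: it drops all constants, uses natural logarithms, and the values chosen for `N`, `n`, `S_theta`, `S_v`, and `R` are arbitrary assumptions rather than settings reported in the source.

```python
import math

def dsw_cost(N, n, S_theta, S_v, R):
    """Leading-order operation count from the stated DSW complexity bound."""
    return S_theta * (N * n + N * math.log(N) + S_v * (N * R + N * math.log(N)))

def wow_inner_cost(N, n):
    """Leading-order operation count from the stated WoW bound."""
    return N**2 * n**2 * math.log(n)

# Hypothetical problem sizes, chosen only for illustration.
N, n = 1000, 500
ratio = wow_inner_cost(N, n) / dsw_cost(N, n, S_theta=64, S_v=16, R=64)
print(f"WoW / DSW operation-count ratio: {ratio:.3g}")
```

Since the WoW bound is quadratic in both $N$ and $n$ while the DSW bound is near-linear in $N$, the ratio grows as either size increases with the slice counts held fixed.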

6. Theoretical and Topological Properties

  • DSW is a metric on empirical meta-measures for suitable, full-support Gaussian $\xi$ and full angular integration.
  • DSW metrizes the same topology as WoW on discrete meta-measures: $DSW(\boldsymbol{\mu}_n, \boldsymbol{\mu}) \to 0$ iff $WoW(\boldsymbol{\mu}_n, \boldsymbol{\mu}) \to 0$.
  • DSW is Lipschitz-stable with respect to changes in the outer meta-measure, up to constants based on the second moment of $\xi$.
  • Monte Carlo estimation achieves convergence rate $|\widehat{DSW}^2 - DSW^2| = \mathcal{O}_\mathbb{P}(1/\sqrt{S_\theta S_v})$, given sufficient outer and inner slices.
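The inverse-square-root behaviour of slice-based Monte Carlo estimates can be checked in a simpler setting where the exact value is known. The sketch below is not the nested DSW estimator: it uses single-level Euclidean slicing of a point cloud and its translate by a vector $c$, where each slice contributes $(\theta \cdot c)^2$ and the exact squared sliced distance is $\|c\|^2 / d$. The function name `mc_rms_error` is hypothetical.

```python
import numpy as np

def mc_rms_error(n_slices, c, n_trials=200, seed=0):
    """Root-mean-square error of the Monte Carlo estimate of the squared
    sliced Wasserstein distance between a point cloud and its translate by c."""
    rng = np.random.default_rng(seed)
    c = np.asarray(c, dtype=float)
    d = len(c)
    exact = float(c @ c) / d  # E[(theta . c)^2] for theta uniform on the sphere
    errs = np.empty(n_trials)
    for k in range(n_trials):
        theta = rng.standard_normal((n_slices, d))
        theta /= np.linalg.norm(theta, axis=1, keepdims=True)
        errs[k] = np.mean((theta @ c) ** 2) - exact
    return float(np.sqrt(np.mean(errs ** 2)))

c = np.array([1.0, 2.0])
err_100 = mc_rms_error(100, c, seed=0)
err_400 = mc_rms_error(400, c, seed=1)
# Quadrupling the number of slices should roughly halve the error.
print(err_100 / err_400)
```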

7. Empirical Validation and Applications

Experiments illustrate DSW's efficiency and fidelity relative to WoW across three domains:

| Application | DSW Runtime | WoW Runtime / Accuracy | DSW Accuracy / Correlation |
| --- | --- | --- | --- |
| Shape classification (K-NN, 2D/3D data) | 2 ms/pair | GW: 40 ms/pair; both ≈99% small | 42.7% ± 5.9 (FAUST-1000) |
| Dataset distance (MNIST, CIFAR-10 splits) | | s-OTDD corr: 0.75–0.85 | corr(DSW, OTDD) = 0.90–0.95 |
| Patch-based image similarity | | patch-WoW: 40× slower | Agreement with inception kernel |

On shape data, DSW matches GW for accuracy, running 20× faster. For dataset distances (e.g., OTDD replacement), DSW achieves 5–10× speedup over s-OTDD with higher correlation to ground-truth OTDD. As a patch-based image similarity metric, DSW closely tracks kernel-inception distance while being ~40× faster than patch-based WoW.

DSW's construction is motivated by previous sliced Wasserstein approaches, which often rely on parametric meta-measures or high-order moments, creating numerical instability. The DSW approach circumvents these issues by operating exclusively via the $L^2$ quantile isometry and slicing in $L^2(0,1)$ via Gaussian processes. It stands in contrast to single-level distributional slicing (e.g., SW, Max-SW, distributional SW as in v-DSW), which project only once and do not handle meta-measures. DSW is not a max-slice metric and retains strict metricity, avoiding the limitations of non-metric approximations commonly encountered in max-sliced or amortized distributional projection variants (Nguyen et al., 2023).

DSW thereby provides a general, mathematically principled, and scalable framework for optimal transport–based comparison of meta-distributions across a wide array of scientific and machine learning applications (Piening et al., 26 Sep 2025).
