Stream-SW: Streaming Sliced Wasserstein

Updated 14 November 2025

Streaming Sliced Wasserstein (Stream-SW) is a method that leverages quantile sketches and randomized projections to compute the sliced Wasserstein distance on streaming data.
It reduces high-dimensional optimal transport problems to a series of one-dimensional quantile queries, enabling single-pass processing with fixed memory.
The framework offers theoretical error bounds, improved convergence, and empirical advantages over random subsampling in diverse applications.

Streaming Sliced Wasserstein (Stream-SW) is a computational framework for estimating the sliced Wasserstein (SW) distance between probability distributions when samples arrive in a streaming fashion. It builds on quantile sketching techniques for 1D Wasserstein computation and extends them via randomized projections to provide a memory-efficient, single-pass algorithm for high-dimensional optimal transport problems. Stream-SW offers theoretical guarantees on accuracy and resource consumption, and demonstrates marked advantages over random subsampling approaches in a variety of empirical settings.

1. Streaming 1D-Wasserstein Distance via Quantile Sketches

Sliced Wasserstein methods reduce a $d$-dimensional optimal transport problem to a collection of one-dimensional projection problems. The 1D $p$-Wasserstein distance between empirical measures

$\mu_n = \frac{1}{n}\sum_{i=1}^n \delta_{x_i}, \quad \nu_m = \frac{1}{m}\sum_{j=1}^m \delta_{y_j}$

admits the closed form

$W_p^p(\mu_n, \nu_m) = \int_0^1 |F^{-1}_{\mu_n}(q) - F^{-1}_{\nu_m}(q)|^p\, dq,$

where $F^{-1}_{\mu_n}$ and $F^{-1}_{\nu_m}$ denote the quantile functions (generalized inverse CDFs) of $\mu_n$ and $\nu_m$.
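As an illustration of the closed form, the sketch below evaluates the integral exactly for two uniformly weighted samples of possibly different sizes. It relies on the fact that both empirical quantile functions are step functions, so the integral is a finite sum over the merged breakpoints $i/n$ and $j/m$. The function name `wasserstein_1d_pp` is a hypothetical choice for this example, not something taken from the source.

```python
def wasserstein_1d_pp(xs, ys, p=1):
    """Exact W_p^p between two 1D empirical measures with uniform weights.

    Both quantile functions are piecewise constant, so the integral over
    q in [0, 1] is a finite sum over the merged breakpoints i/n and j/m.
    Quantile levels are tracked as integers in units of 1/(n*m) to avoid
    floating-point comparisons.
    """
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    i = j = 0
    q = 0            # current quantile level, in units of 1/(n*m)
    total = 0.0
    while i < n and j < m:
        next_x = (i + 1) * m   # level at which xs[i] stops being the quantile
        next_y = (j + 1) * n   # level at which ys[j] stops being the quantile
        q_next = min(next_x, next_y)
        total += (q_next - q) * abs(xs[i] - ys[j]) ** p
        if next_x == q_next:
            i += 1
        if next_y == q_next:
            j += 1
        q = q_next
    return total / (n * m)


# Quantile functions differ by 1 on half of [0, 1] and agree elsewhere.
print(wasserstein_1d_pp([0.0, 1.0, 2.0, 3.0], [0.0, 3.0], p=1))  # 0.5
```

This batch version needs all samples in memory, which is exactly the requirement the streaming estimator removes.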

In a streaming context, storing all samples is not feasible. Instead, Stream-SW maintains a quantile sketch $S_{\mu_n,k}$ of fixed size $k$, supporting approximate quantile queries $Q(q; S)$ such that $|Q(q; S_{\mu_n,k})-F^{-1}_{\mu_n}(q)| \le \epsilon n C$, with $\epsilon$ the sketch precision and $C$ the maximal sample gap. The KLL sketch of Karnin, Lang, and Liberty supports one-pass updates with $O((1/\epsilon)\sqrt{\log(1/\delta)} + \log(n/k))$ memory, where $\delta$ is the failure probability of the guarantee.

Given two such sketches, the streaming 1D-Wasserstein estimator is

$\widetilde W_p^p(\mu_n, \nu_m; S_{\mu_n,k_1}, S_{\nu_m,k_2}) = \int_0^1 |Q(q; S_{\mu_n, k_1}) - Q(q; S_{\nu_m, k_2})|^p dq.$

For $p=1$, this reduces to an integral over the absolute difference of quantile queries. Sketches are incrementally updated in $O(\log(n/k))$ amortized time per sample.
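A minimal sketch of the streaming 1D estimator follows. It does not implement the full KLL sketch: as a stand-in it uses a simplified compactor hierarchy in which every level has the same capacity `k`, so its memory is $O(k \log(n/k))$ rather than the KLL bound. The class `CompactorSketch` and the function `streaming_w1d_pp` are hypothetical names for this example. Because the sketch's quantile function is a step function, the integral over $q$ can be evaluated exactly from the retained weighted items.

```python
import random


class CompactorSketch:
    """Simplified compactor-based quantile sketch (KLL-like, NOT the full KLL).

    Level h stores items that each stand for 2**h original samples.
    Assumption: every level has the same capacity k (k >= 2).
    """

    def __init__(self, k, seed=0):
        self.k = k
        self.n = 0                      # number of samples seen so far
        self.levels = [[]]
        self.rng = random.Random(seed)

    def update(self, x):
        self.n += 1
        self.levels[0].append(x)
        h = 0
        while len(self.levels[h]) >= self.k:
            buf = sorted(self.levels[h])
            # Leave one item behind if the count is odd, so weight is conserved.
            keep = [buf.pop()] if len(buf) % 2 else []
            if h + 1 == len(self.levels):
                self.levels.append([])
            # Promote every other item (random parity) with doubled weight.
            self.levels[h + 1].extend(buf[self.rng.randrange(2)::2])
            self.levels[h] = keep
            h += 1

    def weighted_items(self):
        """Sorted (value, weight) pairs; the weights sum to n."""
        return sorted((x, 1 << h) for h, lvl in enumerate(self.levels) for x in lvl)

    def quantile(self, q):
        """Approximate Q(q; S): first retained value whose cumulative weight >= q*n."""
        items = self.weighted_items()
        acc = 0
        for x, w in items:
            acc += w
            if acc >= q * self.n:
                return x
        return items[-1][0]


def streaming_w1d_pp(sa, sb, p=1):
    """Integral over q of |Q(q; sa) - Q(q; sb)|**p for two non-empty sketches.

    Both sketch quantile functions are step functions, so the integral is a
    finite sum; levels are tracked as integers in units of 1/(sa.n * sb.n).
    """
    a, b = sa.weighted_items(), sb.weighted_items()
    unit = sa.n * sb.n
    i = j = 0
    ca, cb = a[0][1] * sb.n, b[0][1] * sa.n   # cumulative mass of a[:i+1], b[:j+1]
    q, total = 0, 0.0
    while True:
        q_next = min(ca, cb)
        total += (q_next - q) * abs(a[i][0] - b[j][0]) ** p
        q = q_next
        if q == unit:
            return total / unit
        if ca == q:
            i += 1
            ca += a[i][1] * sb.n
        if cb == q:
            j += 1
            cb += b[j][1] * sa.n


# Two synthetic streams; the population W_1 between N(0,1) and N(1,1) is 1.
rng = random.Random(1)
sa, sb = CompactorSketch(k=64, seed=2), CompactorSketch(k=64, seed=3)
for _ in range(5000):
    sa.update(rng.gauss(0.0, 1.0))
    sb.update(rng.gauss(1.0, 1.0))
estimate = streaming_w1d_pp(sa, sb, p=1)
```

Each sketch here keeps a few hundred weighted items instead of all 5000 samples; a production setting would more likely use a tested KLL implementation in place of this stand-in.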

2. Streaming Sliced Wasserstein Algorithm

The sliced Wasserstein distance of order $p$ for $d$-dimensional measures $\mu$ and $\nu$ is

$SW_p^p(\mu, \nu) = \mathbb{E}_{\theta \sim \mathcal{U}(\mathbb{S}^{d-1})} [W_p^p(\theta \sharp \mu,\, \theta \sharp \nu)],$

where $\mathbb{S}^{d-1}$ is the unit sphere and $\theta \sharp \mu$ is the projection (pushforward) of $\mu$ onto direction $\theta$.

Streaming Sliced Wasserstein (Stream-SW) replaces each $W_p$ with $\widetilde W_p$ from quantile sketches. The algorithm proceeds as follows:

Draw $L$ projection directions $\theta_1, ..., \theta_L \in \mathbb{S}^{d-1}$ and keep them fixed for the whole stream.
For each direction $\ell=1,...,L$ and each incoming $x_i$ from $\mu$'s stream, compute $\theta_\ell^\top x_i$ and update $S_{\theta_\ell \sharp \mu, k_1}$. Proceed in the same way for each incoming $y_j$ from $\nu$'s stream, updating $S_{\theta_\ell \sharp \nu, k_2}$.
At any point, estimate the SW distance by

$\widehat{\widetilde{SW}_p^p}(\mu_n, \nu_m; k_1, k_2, L) = \frac{1}{L} \sum_{\ell=1}^L \int_0^1 |Q(q; S_{\theta_\ell \sharp \mu, k_1}) - Q(q; S_{\theta_\ell \sharp \nu, k_2})|^p dq.$

Stream-SW thereby enables a single-pass, memory-bounded estimate of the SW distance at any time.
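The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the reference implementation: the quantile sketch is a simplified compactor hierarchy standing in for KLL, directions are drawn by normalizing Gaussian vectors, and the names `CompactorSketch`, `sketch_w_pp`, and `StreamSW` are hypothetical.

```python
import math
import random


class CompactorSketch:
    """Simplified stand-in for a KLL sketch (uniform capacity k >= 2 per level)."""

    def __init__(self, k, rng):
        self.k, self.n, self.levels, self.rng = k, 0, [[]], rng

    def update(self, x):
        self.n += 1
        self.levels[0].append(x)
        h = 0
        while len(self.levels[h]) >= self.k:
            buf = sorted(self.levels[h])
            keep = [buf.pop()] if len(buf) % 2 else []   # conserve total weight
            if h + 1 == len(self.levels):
                self.levels.append([])
            self.levels[h + 1].extend(buf[self.rng.randrange(2)::2])
            self.levels[h] = keep
            h += 1

    def weighted_items(self):
        # Items on level h carry weight 2**h; the weights sum to n.
        return sorted((x, 1 << h) for h, lvl in enumerate(self.levels) for x in lvl)


def sketch_w_pp(sa, sb, p):
    """Exact integral of |Q(q; sa) - Q(q; sb)|**p over q in [0, 1]."""
    a, b = sa.weighted_items(), sb.weighted_items()
    unit = sa.n * sb.n                      # integer units of 1/(na*nb)
    i = j = 0
    ca, cb = a[0][1] * sb.n, b[0][1] * sa.n
    q, total = 0, 0.0
    while True:
        q_next = min(ca, cb)
        total += (q_next - q) * abs(a[i][0] - b[j][0]) ** p
        q = q_next
        if q == unit:
            return total / unit
        if ca == q:
            i += 1
            ca += a[i][1] * sb.n
        if cb == q:
            j += 1
            cb += b[j][1] * sa.n


class StreamSW:
    """L fixed directions, one sketch per direction and per stream."""

    def __init__(self, d, L, k1, k2, seed=0):
        rng = random.Random(seed)
        self.thetas = []
        for _ in range(L):
            g = [rng.gauss(0.0, 1.0) for _ in range(d)]
            norm = math.sqrt(sum(v * v for v in g))
            self.thetas.append([v / norm for v in g])   # uniform on the sphere
        self.mu = [CompactorSketch(k1, rng) for _ in range(L)]
        self.nu = [CompactorSketch(k2, rng) for _ in range(L)]

    def _push(self, sketches, x):
        for theta, s in zip(self.thetas, sketches):
            s.update(sum(t * xi for t, xi in zip(theta, x)))   # theta^T x

    def update_mu(self, x):
        self._push(self.mu, x)

    def update_nu(self, y):
        self._push(self.nu, y)

    def estimate(self, p=1):
        """Average of the per-direction sketch-based W_p^p values."""
        vals = [sketch_w_pp(a, b, p) for a, b in zip(self.mu, self.nu)]
        return sum(vals) / len(vals)


# Synthetic streams in d = 3: N(0, I) versus N((2, 0, 0), I).
# The population SW_1 for this pair is 1 (|shift| times E|theta_1| = 1/2).
rng = random.Random(42)
sw = StreamSW(d=3, L=50, k1=64, k2=64, seed=7)
for _ in range(2000):
    sw.update_mu([rng.gauss(0.0, 1.0) for _ in range(3)])
    sw.update_nu([rng.gauss(0.0, 1.0) + (2.0 if c == 0 else 0.0) for c in range(3)])
sw_1 = sw.estimate(p=1)
```

`estimate` can be called at any point in the stream, since it only reads the current sketch contents; the samples themselves are never stored.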

3. Theoretical Guarantees and Complexity

Stream-SW provides explicit, nonasymptotic bounds on both memory usage and approximation error:

Streaming 1D Wasserstein error: If the supports have diameter $R$ and the sketches have precisions $\epsilon_1,\epsilon_2$ and maximal sample gaps $C_1, C_2$, the error satisfies

$|\widetilde W_p^p(\mu_n, \nu_m) - W_p^p(\mu_n, \nu_m)| \le p R^{p-1} (\epsilon_1 n C_1 + \epsilon_2 m C_2).$

SW population-level error: For i.i.d. samples from $\mu, \nu$, the expected error in SW is

$\mathbb{E} |\widetilde{SW}_p^p(\mu_n, \nu_m) - SW_p^p(\mu, \nu)| \le C_{p,R} \left(\alpha(d, n, m) + \epsilon_1 n + \epsilon_2 m\right)$

with $\alpha(d,n,m) = \sqrt{d+1}\left(\sqrt{\frac{\log n}{n}} + \sqrt{\frac{\log m}{m}}\right)$.

Monte Carlo error in $L$ projections:

$\mathbb{E}|\widehat{\widetilde{SW}_p^p} - \widetilde{SW}_p^p| \le \frac{\operatorname{Var}^{1/2}[\,\widetilde{W}_p^p(\theta \sharp \mu_n, \theta \sharp \nu_m)\,]}{\sqrt L}.$

Memory and computational complexity: Each KLL sketch uses $k=O((1/\epsilon)\sqrt{\log(1/\delta)})$ space. With $2L$ sketches and direction storage, total space is $O(Lk + L\log(n/k) + Ld)$; per-sample update is $O(L\log(n/k) + Ld)$.

Stream-SW thus achieves $O(n^{-1/2} + k^{-1} + L^{-1/2})$ error rates, up to logarithmic factors, with memory and time that scale with $L$ and $k$.
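To make these expressions concrete, the helpers below evaluate the right-hand side of the 1D error bound, a plug-in Monte Carlo standard error, and a rough memory count. The function names are hypothetical, and all constants hidden by the $O(\cdot)$ notation are set to 1, which is an assumption made only for illustration.

```python
import math
import statistics


def w1d_sketch_error_bound(p, R, eps1, n, C1, eps2, m, C2):
    """Right-hand side p * R**(p-1) * (eps1*n*C1 + eps2*m*C2) of the 1D bound."""
    return p * R ** (p - 1) * (eps1 * n * C1 + eps2 * m * C2)


def mc_standard_error(per_projection_values):
    """Plug-in estimate of Var^{1/2} / sqrt(L) from L >= 2 per-projection values."""
    L = len(per_projection_values)
    return statistics.stdev(per_projection_values) / math.sqrt(L)


def memory_units(L, k, n, d):
    """Rough count: 2L sketches of size k + log2(n/k), plus L directions in R^d.

    Assumption: every hidden constant in O(Lk + L log(n/k) + Ld) equals 1.
    """
    per_sketch = k + math.ceil(math.log2(n / k))
    return 2 * L * per_sketch + L * d


# Example: 100 projections, sketch size 200, one million samples in d = 3.
print(memory_units(L=100, k=200, n=10**6, d=3))  # 42900
```

Under these illustrative constants the whole state is a few tens of thousands of numbers, compared with the $3 \times 10^6$ coordinates needed to store a single stream of $10^6$ points in $d=3$.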

4. Proof Outline of Approximation Bounds

The approximation guarantees derive from four elements:

Quantile approximation: The sketch replaces the exact quantile function; a Taylor or Hölder expansion delivers a pointwise error proportional to the sketch precision and the maximal sample gap, of order $O(\epsilon_1 n C_1 + \epsilon_2 m C_2)$.
Projection averaging: The error is averaged over random $\theta$ , yielding the same order bound for the SW aggregate.
Sampling error decomposition: The estimator's discrepancy with population SW separates into (i) sketching error and (ii) statistical sampling error, with the latter analyzed via empirical process (VC) bounds over half-spaces $\{x : \theta^\top x \leq z\}$ .
Monte Carlo in $L$: The standard MC variance bound applies to the projection average, yielding $O(1/\sqrt L)$ behavior.
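The quantile-approximation step can be worked out in a few lines. This is a sketch of the argument, assuming $p \ge 1$ and that both the exact quantiles and the sketch queries lie in a common interval of length $R$ (the latter holds, for instance, when the sketch only returns retained sample points). Write $a_q = F^{-1}_{\mu_n}(q) - F^{-1}_{\nu_m}(q)$ and $b_q = Q(q; S_{\mu_n,k_1}) - Q(q; S_{\nu_m,k_2})$, so that $|a_q|, |b_q| \le R$. The mean value theorem applied to $t \mapsto t^p$ on $[0, R]$, followed by the reverse triangle inequality, gives

$\big|\,|a_q|^p - |b_q|^p\,\big| \le p R^{p-1}\,\big|\,|a_q| - |b_q|\,\big| \le p R^{p-1}\,|a_q - b_q|.$

The sketch guarantee for each stream bounds the remaining term:

$|a_q - b_q| \le |F^{-1}_{\mu_n}(q) - Q(q; S_{\mu_n,k_1})| + |F^{-1}_{\nu_m}(q) - Q(q; S_{\nu_m,k_2})| \le \epsilon_1 n C_1 + \epsilon_2 m C_2.$

Integrating over $q \in [0,1]$ recovers the 1D bound stated in Section 3:

$|\widetilde W_p^p(\mu_n, \nu_m) - W_p^p(\mu_n, \nu_m)| \le \int_0^1 \big|\,|a_q|^p - |b_q|^p\,\big|\, dq \le p R^{p-1} (\epsilon_1 n C_1 + \epsilon_2 m C_2).$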

5. Empirical Evaluation

Stream-SW demonstrates favorable empirical properties across several domains:

Task	Key finding	Setup / notes
Mixtures of Gaussians	2×–10× lower error than subsample-SW at same $k$	10×–100× fewer points retained
Point-cloud classification	Stream-SW ($k=20$) achieves 76.3–77.7% vs full SW 77.3–77.7% accuracy; subsample-SW ($k=20$): 67.7–68.0%	ModelNet10, KNN ($K=5$)
Gradient flows	Faster convergence: $W_2^2\approx 26.9$ vs subsample-SW 31.1 at step 1000; only Stream-SW converges in 5000 steps	Euler–Maruyama, 1000 particles
Change-point detection	Detection delay reduced to 10–32 frames (SW sliding window: 49–100)	MSRC-12 Kinect

Stream-SW thus attains higher accuracy or faster convergence at fixed memory compared to uniform random subsampling approaches, particularly under severe memory bottlenecks.

6. Implementation, Tuning, and Extensions

Key practical considerations when deploying Stream-SW include:

Choosing the number of projections $L$ and sketch size $k$: Increasing $L$ reduces MC error as $O(1/\sqrt L)$, whereas increasing $k$ improves quantile accuracy as $O(1/k)$. Total memory scales as $O(Lk + L\log(n/k) + Ld)$ and the per-sample update cost as $O(L\log(n/k) + Ld)$, so both grow linearly with $L$ and only logarithmically with $n/k$.
Projection schemes: Standard MC directions can be replaced by quasi-MC sequences (e.g., Sobol, Halton) or by directions optimized for integration on the sphere, as in quasi-Monte Carlo for 3D SW.
Handling asymmetric streams: When only one distribution is streaming, maintain sketches only for the streaming distribution and compare them on the fly to the exact projected quantiles of the fixed one.
Extensions: The approach adapts to generalized sliced OT, spherical or manifold-projected SW, and partial-SW, by substituting the appropriate 1D streaming OT solver.
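As one possible instance of the quasi-MC projection scheme for $d=3$, the sketch below builds directions from a 2D Halton sequence (bases 2 and 3) and maps them to the unit sphere with the standard equal-area cylindrical map. This particular construction and the names `radical_inverse` and `halton_sphere_directions` are assumptions for illustration; the source does not specify which quasi-MC construction is used.

```python
import math


def radical_inverse(i, base):
    """Van der Corput radical inverse of the integer i >= 0 in the given base."""
    inv, f = 0.0, 1.0 / base
    while i > 0:
        inv += (i % base) * f
        i //= base
        f /= base
    return inv


def halton_sphere_directions(L):
    """L quasi-random unit vectors in R^3 from Halton points (u, v) in [0, 1)^2.

    Equal-area map: z = 2u - 1 is uniform in height and phi = 2*pi*v is
    uniform in angle, which yields the uniform measure on the sphere.
    """
    dirs = []
    for i in range(1, L + 1):          # skip i = 0, which maps to a pole
        u, v = radical_inverse(i, 2), radical_inverse(i, 3)
        z = 2.0 * u - 1.0
        r = math.sqrt(max(0.0, 1.0 - z * z))
        phi = 2.0 * math.pi * v
        dirs.append((r * math.cos(phi), r * math.sin(phi), z))
    return dirs


thetas = halton_sphere_directions(64)   # drop-in replacement for random directions
```

Because the directions are deterministic, two runs over the same stream produce the same estimate, which can also simplify debugging.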

In sum, Stream-SW is presented as the first single-pass, low-memory method for estimating the sliced Wasserstein distance from sample streams, with finite-sample and memory–error guarantees, and it empirically outperforms random subsampling under tight resource constraints (Nguyen, 11 May 2025).

References

Nguyen. Streaming Sliced Optimal Transport. 11 May 2025.