
Conditional Sliced-Wasserstein Flow

Updated 26 September 2025
  • Conditional Sliced-Wasserstein Flow is a nonparametric framework for conditional generative modeling that extends SWF using gradient flows on probability measures.
  • It decomposes high-dimensional optimal transport into one-dimensional transport problems via Monte Carlo projections, enabling efficient computation and practical application.
  • The framework offers strong theoretical guarantees and convergence properties, making it competitive with parametric models in advanced machine learning tasks.

Conditional Sliced-Wasserstein Flow (CSWF) is a nonparametric framework for conditional generative modeling based on optimal transport and the sliced-Wasserstein metric. CSWF generalizes Sliced-Wasserstein Flows (SWF) from unconditional to conditional scenarios, enabling the modeling, sampling, and alignment of conditional distributions via tractable gradient flows on probability measures. The methodology leverages the decomposition of high-dimensional optimal transport into many one-dimensional transport problems, facilitating efficient computation, theoretical analysis, and application to a range of machine learning tasks.

1. Foundations of Sliced-Wasserstein Flows and Conditional Extension

Sliced-Wasserstein Flows (SWF), as introduced by (Liutkus et al., 2018), interpret generative modeling as transporting a source probability measure $\mu$ to a target data distribution $\nu$ via a gradient flow over the space of probability measures. The SWF algorithm minimizes a functional composed of the squared sliced-Wasserstein distance and a negative entropy regularizer:

$$\min_\mu \left\{ F_\lambda^{(\nu)}(\mu) = \frac{1}{2}\, \mathrm{SW}^2(\mu, \nu) + \lambda\, H(\mu) \right\},$$

where $\mathrm{SW}^2$ is the squared sliced-Wasserstein distance, defined by integrating one-dimensional Wasserstein distances over random projection directions, and $H(\mu)$ is the negative entropy.

The algorithmic update for particles $X_k^i$ over iterations $k$ is computed via an Euler–Maruyama discretization of an associated stochastic differential equation (SDE). The drift at each iteration is estimated by Monte Carlo integration over randomly chosen projection directions, with the derivative of the Kantorovich potential computed between the projected empirical and data distributions.
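As an illustration of this drift estimate, the following minimal NumPy sketch draws random directions, sorts the one-dimensional projections, and uses the standard closed-form derivative of the 1D Kantorovich potential, $\psi'(z) = z - Q_\nu(F_\mu(z))$; the function name and implementation details are illustrative conventions, not code from the cited papers.

```python
import numpy as np

def swf_drift(particles, data, n_projections=64, rng=None):
    """Monte Carlo estimate of the SWF drift at each particle.

    For each random direction theta, the 1D Kantorovich potential derivative
    at a projected particle z is psi'(z) = z - Q_data(F_particles(z)), where
    F is the empirical CDF of the projected particles and Q the empirical
    quantile function of the projected data.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = particles.shape
    drift = np.zeros_like(particles)
    for _ in range(n_projections):
        theta = rng.standard_normal(d)
        theta /= np.linalg.norm(theta)
        z = particles @ theta                  # projected particles
        data_proj = np.sort(data @ theta)      # sorted projected data
        ranks = np.argsort(np.argsort(z))      # empirical CDF ranks of z
        levels = (ranks + 0.5) / n
        q = np.quantile(data_proj, levels)     # matching data quantiles
        psi_prime = z - q                      # 1D Kantorovich potential derivative
        drift -= np.outer(psi_prime, theta)
    return drift / n_projections
```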

Conditional Sliced-Wasserstein Flow (CSWF), as formalized by (Du et al., 2023), extends SWF to conditional modeling. Instead of simulating flows independently for each value of the conditioning variable, CSWF constructs the joint flow such that the marginal over the conditioning variable remains consistent and the component of the velocity field corresponding to the conditioning variable is effectively zero when the conditional distributions vary slowly in the condition. This enables unified simulation over the joint space $(x, y)$, with conditional generation achieved by "masking out" updates in the $y$ direction.

2. Mathematical Formulation and Gradient Flow Structure

In unconditional SWF, the evolution of the density $\rho_t$ is governed by a Fokker–Planck equation:

$$\partial_t \rho_t(x) = -\mathrm{div}\left[ v_t(x)\, \rho_t(x) \right] + \lambda\, \Delta \rho_t(x),$$

with drift

$$v_t(x) = - \int_{S^{d-1}} \psi'_{t, \theta}(\langle x, \theta \rangle)\, \theta\, \mathrm{d}\theta,$$

where $\psi_{t,\theta}$ is the Kantorovich potential of the one-dimensional optimal transport problem between the projections of $\rho_t$ and $\nu$ along the direction $\theta$.

For CSWF, an analogous continuity equation is defined for the joint density $p_t(x, y)$:

$$\partial_t p_t(x, y) + \nabla \cdot \left( p_t(x, y)\, v_t(x, y) \right) = 0,$$

where the velocity field $v_t(x, y)$ is constructed over the joint space $\mathcal{X} \times \mathcal{Y}$ but with its $y$-component masked (either by scaling or setting it to zero), ensuring that only the $x$-component flows while $y$ remains nearly stationary.

Particle-based simulation uses Monte Carlo estimation of the velocity field over random projections. Two main algorithmic improvements for image data are:

  • Locally-connected projections: Restricting projections to patches of the data, mimicking the local spatial inductive bias of CNNs (see the sketch after this list).
  • Pyramidal schedules: Applying projections at multiple spatial resolutions in sequence, enabling coarse-to-fine modeling.
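
As a rough illustration of locally-connected projections for image data, the sketch below draws a direction supported only on a randomly placed patch; the patch placement and normalization are assumptions made for illustration, not the exact scheme of (Du et al., 2023).

```python
import numpy as np

def local_projection_direction(height, width, channels, patch_size, rng=None):
    """Draw one 'locally-connected' projection direction for image data:
    a unit vector that is nonzero only on a random patch, mimicking the
    local receptive field of a convolution."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.zeros((channels, height, width))
    top = rng.integers(0, height - patch_size + 1)
    left = rng.integers(0, width - patch_size + 1)
    theta[:, top:top + patch_size, left:left + patch_size] = \
        rng.standard_normal((channels, patch_size, patch_size))
    theta = theta.reshape(-1)                 # flatten to match flattened images
    return theta / np.linalg.norm(theta)
```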

An "amplifier" parameter can be introduced to slow variation in q(xy)q(x|y) with respect to yy, facilitating the masking assumption.

3. Theoretical Guarantees and Stability Properties

Finite-time error guarantees for the discretized SWF (and by extension CSWF) are provided under assumptions on the drift’s Lipschitz continuity and dissipativity (Liutkus et al., 2018). The error of the empirical distribution relative to the target law can be made arbitrarily small by choosing a sufficiently small step size.

In terms of optimization, (Vauthier et al., 10 Feb 2025) establishes that the critical points of the sliced-Wasserstein functional correspond to barycentric Lagrangian stationary points, and demonstrates via perturbation analysis that stable critical points cannot concentrate on low-dimensional segments. This ensures that CSWF will not stall at degenerate solutions and that standard choices of step size and projection counts yield robust and well-behaved optimization.

Convergence of stochastic optimization algorithms (e.g., SGD) applied to sliced-Wasserstein-based losses is established in neural network settings (Tanguy, 2023), guaranteeing that the iterates approach the set of generalized critical points under mild regularity and boundedness assumptions.

4. Extensions: Adaptive Slicing, Manifold Geometry, and Generalized Flows

Energy-based slicing distributions (Nguyen et al., 2023) introduce adaptivity by selecting projection directions based on an energy function of their 1D Wasserstein cost, which can be naturally made conditional. This leads to the EBSW distance

$$\mathrm{EBSW}_p(\mu, \nu; f) = \left( \mathbb{E}_{\theta \sim \sigma_{\mu, \nu}(f, p)} \left[ W_p^p(\theta_\# \mu, \theta_\# \nu) \right] \right)^{1/p},$$

where $\sigma_{\mu, \nu}(f, p)$ is the energy-based slicing distribution, which can be conditioned for CSWF applications.
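
A minimal importance-weighted estimate of such a distance is sketched below, assuming the exponential energy $f(w) = e^w$ and equal sample counts so that the 1D distances reduce to sorted matching; it is in the spirit of (Nguyen et al., 2023) but not their exact estimator.

```python
import numpy as np

def ebsw_estimate(mu_samples, nu_samples, n_projections=128, p=2, rng=None):
    """Importance-weighted estimate of an energy-based sliced Wasserstein
    distance with energy f(w) = exp(w); both sample sets must be equal size."""
    rng = np.random.default_rng() if rng is None else rng
    d = mu_samples.shape[1]
    costs = np.empty(n_projections)
    for i in range(n_projections):
        theta = rng.standard_normal(d)
        theta /= np.linalg.norm(theta)
        a = np.sort(mu_samples @ theta)
        b = np.sort(nu_samples @ theta)
        costs[i] = np.mean(np.abs(a - b) ** p)    # 1D W_p^p between projections
    weights = np.exp(costs - costs.max())          # energy-based slicing weights
    weights /= weights.sum()
    return float(weights @ costs) ** (1.0 / p)
```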

For data on manifolds, (Bonet et al., 11 Mar 2024) proposes sliced-Wasserstein distances defined via geodesic or horospherical projections on Cartan–Hadamard manifolds. The gradient flow structure persists, with the velocity field computed via projected Kantorovich potential derivatives, and conditional slicing operators can be defined to respect both manifold geometry and external conditioning.
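
As one concrete instance of a horospherical projection (the hyperbolic special case; the Cartan–Hadamard generality and the exact operators of Bonet et al. are not reproduced here), the Busemann function on the Poincaré ball plays the role of the Euclidean inner product $\langle x, \theta \rangle$:

```python
import numpy as np

def busemann_projection(x, xi):
    """Busemann (horospherical) projection of points x in the Poincare ball
    (||x|| < 1) onto the ideal direction xi (a unit vector)."""
    num = np.sum((xi - x) ** 2, axis=-1)
    den = 1.0 - np.sum(x ** 2, axis=-1)
    return np.log(num / den)
```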

(Chapel et al., 28 May 2025) presents differentiable generalized sliced Wasserstein plans, formulating the task of finding optimal projections (linear or nonlinear) as a bilevel optimization problem. The outer problem minimizes the cost over projection parameters, smoothed via Gaussian perturbations for differentiability, allowing CSWF algorithms to efficiently optimize projection parameters for conditional flows.
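
The sketch below shows one generic way to realize Gaussian-smoothed optimization of projection parameters, namely a zeroth-order gradient estimate with antithetic perturbations; it illustrates the smoothing idea only and is an assumption for exposition, not the bilevel formulation of (Chapel et al., 28 May 2025).

```python
import numpy as np

def smoothed_grad(theta, outer_loss, sigma=0.1, n_samples=16, rng=None):
    """Zeroth-order gradient estimate of an outer objective over projection
    parameters theta, smoothed by Gaussian perturbations (antithetic pairs)."""
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        eps = rng.standard_normal(theta.shape)
        delta = outer_loss(theta + sigma * eps) - outer_loss(theta - sigma * eps)
        grad += delta / (2.0 * sigma) * eps
    return grad / n_samples
```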

When datasets themselves are mixtures of conditional distributions (e.g., labeled data), Conditional Sliced-Wasserstein Flows can be viewed as instances of hierarchical Wasserstein-over-Wasserstein (WoW) flows (Bonet et al., 9 Jun 2025), with conditionals as "particles" in the higher-order measure. Sliced-Wasserstein kernels within maximum mean discrepancy objectives provide tractable gradient flows for transfer learning and dataset distillation.
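
To make the kernel idea concrete, one simple choice is a Gaussian kernel with the squared sliced-Wasserstein distance in place of the squared Euclidean distance, treating each conditional dataset as a "particle"; this is an illustrative choice, not necessarily the kernel used in (Bonet et al., 9 Jun 2025).

```python
import numpy as np

def sw2_squared(a, b, n_projections=64, rng=None):
    """Monte Carlo SW_2^2 between two equally sized point clouds."""
    rng = np.random.default_rng() if rng is None else rng
    d = a.shape[1]
    total = 0.0
    for _ in range(n_projections):
        theta = rng.standard_normal(d)
        theta /= np.linalg.norm(theta)
        total += np.mean((np.sort(a @ theta) - np.sort(b @ theta)) ** 2)
    return total / n_projections

def sw_kernel_gram(datasets, sigma=1.0):
    """Gram matrix of a Gaussian kernel over datasets, with SW_2^2 as the
    metric, usable inside an MMD objective over collections of conditionals."""
    m = len(datasets)
    K = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            K[i, j] = np.exp(-sw2_squared(datasets[i], datasets[j]) / (2.0 * sigma ** 2))
    return K
```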

5. Practical Algorithms and Computational Aspects

The canonical CSWF algorithm proceeds by evolving an ensemble of particles according to

$$X_{k+1}^i = X_k^i + h\, v_k(X_k^i) + \sqrt{2\lambda h}\, Z_{k+1}^i,$$

where $v_k$ is estimated by Monte Carlo over projected Kantorovich potentials and $Z_{k+1}^i$ is Gaussian noise (or omitted for Liouville-PDE flows (Lee et al., 22 May 2025)). In conditional variants, the drift either depends on the conditioning variables or is masked for the $y$-component in joint space.
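
Combining the pieces above, a single discretized step might look like the following sketch, which reuses the `swf_drift` estimate from Section 1 together with a mask on the conditioning coordinates; step size, noise handling, and masking details are illustrative assumptions.

```python
import numpy as np

def cswf_step(particles, data, h=1e-3, lam=0.0, mask=None, rng=None):
    """One Euler-Maruyama step of the (conditional) sliced-Wasserstein flow.
    `mask` (same shape as particles) zeroes the update on the conditioning
    coordinates so that only the x-block of each joint particle moves."""
    rng = np.random.default_rng() if rng is None else rng
    v = swf_drift(particles, data, rng=rng)        # drift from the earlier sketch
    noise = np.sqrt(2.0 * lam * h) * rng.standard_normal(particles.shape)
    update = h * v + noise
    if mask is not None:
        update = update * mask                     # freeze the y-component
    return particles + update
```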

Efficient variants exploit closed-form or differentiable approximations for SW distances (Bonet et al., 2021), adaptive energy-weighted slicing (Nguyen et al., 2023), or learnable nonlinear projections (Chapel et al., 28 May 2025). CSWF adapts naturally to diverse generative architectures, including conditional normalizing flows and neural ODEs.

When applied to image generation or fair regression, locally-connected and pyramidal schedules, as well as barycenter approximations via SWF (Lee et al., 22 May 2025), are used to capture inductive biases and align conditional output distributions for demographic parity.

6. Comparative Perspective and Applications

CSWF compares favorably to parametric conditional generative models (GANs, VAEs, diffusion models) in tasks such as class-conditional image generation, image inpainting, and fair regression (Du et al., 2023, Lee et al., 22 May 2025). Its nonparametric, particle-based nature facilitates online updates and the handling of new data.

Performance metrics such as Fréchet Inception Distance (FID), Kolmogorov–Smirnov (KS) scores, and mean squared error (MSE) demonstrate competitive or superior results in applications requiring high-fidelity generation and fairness (Du et al., 2023, Lee et al., 22 May 2025).

CSWF also integrates naturally with recent advances in conditional flow matching (CFM) (Chapel et al., 28 May 2025), Bayesian OT flow matching (Chemseddine et al., 27 Mar 2024), and deep conditional distribution learning via ODE flows (Chang et al., 2 Feb 2024), providing a general framework for conditional transport and sampling.

7. Limitations, Challenges, and Theoretical Implications

Key limitations of CSWF include sensitivity to the choice and number of projection directions, step size control, and the possible lack of equivalence between sliced and regular Wasserstein metrics (Hopper, 9 Jul 2024). While the sliced metric is computationally tractable, it may not capture "internal" structure as fully as the Wasserstein metric, and barycenter and gradient flow solutions may behave qualitatively differently.

Ensuring stability and avoiding degenerate attractors in optimization, particularly for high-dimensional or multimodal distributions, requires careful analysis and adaptation of the algorithmic scheme, informed by the theoretical studies in (Vauthier et al., 10 Feb 2025, Tanguy, 2023).

A plausible implication is that the flexibility and scalability of CSWF—through nonparametric flows, adaptive slicing, and integration with neural architectures—will facilitate its adoption for complex real-world generative, domain adaptation, and fairness-critical applications.


In summary, Conditional Sliced-Wasserstein Flow (CSWF) provides a tractable, theoretically grounded framework for conditional generative modeling, leveraging optimal transport and adaptive projection strategies to capture complex conditional distributions. Its mathematical foundation, algorithmic adaptability, and computational efficiency underpin its success in a wide range of modern machine learning tasks.
