Conditional Sampling via Wasserstein Autoencoders and Triangular Transport

Published 3 Apr 2026 in cs.LG and math.OC | (2604.02644v1)

Abstract: We present Conditional Wasserstein Autoencoders (CWAEs), a framework for conditional simulation that exploits low-dimensional structure in both the conditioned and the conditioning variables. The key idea is to modify a Wasserstein autoencoder to use a (block-) triangular decoder and impose an appropriate independence assumption on the latent variables. We show that the resulting model gives an autoencoder that can exploit low-dimensional structure while simultaneously the decoder can be used for conditional simulation. We explore various theoretical properties of CWAEs, including their connections to conditional optimal transport (OT) problems. We also present alternative formulations that lead to three architectural variants forming the foundation of our algorithms. We present a series of numerical experiments that demonstrate that our different CWAE variants achieve substantial reductions in approximation error relative to the low-rank ensemble Kalman filter (LREnKF), particularly in problems where the support of the conditional measures is truly low-dimensional.

Authors (5)

Summary

The paper introduces conditional Wasserstein autoencoders (CWAEs) that use a block-triangular transport map to exploit low-dimensional latent structures for improved conditional sampling.
The methodology integrates joint and conditional optimal transport objectives within a Wasserstein autoencoder framework and outperforms the low-rank ensemble Kalman filter (LREnKF) in the reported experiments.
Empirical results demonstrate that CWAEs significantly reduce Wasserstein error in tasks such as nonlinear filtering and high-dimensional flow field reconstruction.

Conditional Simulation with Wasserstein Autoencoders and Triangular Transport

Introduction

Conditional sampling is fundamental for Bayesian inference and inverse problems, especially when both the data ($Y$) and the hidden states ($X$) are high-dimensional. Many real-world applications, such as nonlinear filtering and model-based state estimation, rely on efficient and accurate conditional simulation. However, particle methods and ensemble Kalman filters are subject to the curse of dimensionality and often incur significant error when low-dimensional latent structure is not exploited.

This paper introduces Conditional Wasserstein Autoencoders (CWAEs), a data-driven framework built on a variational optimal transport formulation. CWAEs address conditional simulation through two explicit architectural constraints, a block-triangular transport map and independent latent codes, which allow the model to exploit intrinsic low-dimensionality in both the conditioning and the conditioned variables. The construction connects triangular measure transport with WAE principles and leads to efficient numerical algorithms and three architectural variants. Numerical experiments show substantial error reduction relative to the low-rank ensemble Kalman filter (LREnKF), especially when the conditional support is intrinsically low-dimensional (2604.02644).

Methodology

Conditional Simulation via Triangular Transport and Autoencoding

The core task is sampling from $P_{X|Y=y}$ for arbitrary $y$, given data samples $(Y, X)$. The model posits a generative process in which independent, low-dimensional latent codes $Z$ and $U$ generate $Y$ and $X$ through a block-triangular map $G$:

$$(Y, X) = G(Z, U) = \big(G_Y(Z),\; G_X(Z, U)\big), \qquad Z \perp U.$$

Crucially, this structure means that once $G$ is learned to generate $(Y, X)$ from $(Z, U)$, conditional sampling from $P_{X|Y=y}$ is accomplished by (i) encoding $y$ into its latent code $z$ and (ii) sampling $U$ and pushing it through $G_X(z, \cdot)$. This enables fast, amortized posterior sampling.
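The two-step sampling procedure described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the dimensions, the random linear and `tanh` maps standing in for trained networks, and the names `decode_y`, `decode_x`, `encode_y`, and `sample_conditional` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: ambient dimensions of Y and X, latent dimensions of Z and U.
DY, DX, DZ, DU = 6, 8, 2, 2

# Random weights standing in for trained networks (illustration only).
A = rng.standard_normal((DY, DZ))  # first block: z -> y
B = rng.standard_normal((DX, DZ))  # second block, part acting on z
C = rng.standard_normal((DX, DU))  # second block, part acting on u


def decode_y(z):
    """First block of the triangular map: depends on z only."""
    return z @ A.T


def decode_x(z, u):
    """Second block of the triangular map: depends on both z and u."""
    return np.tanh(z @ B.T) + u @ C.T


def encode_y(y):
    """Toy encoder: least-squares inverse of the linear decode_y."""
    return y @ np.linalg.pinv(A).T


def sample_conditional(y, n_samples):
    """Draw approximate samples of X given Y = y."""
    z = encode_y(y)                               # (i) encode the observation
    u = rng.standard_normal((n_samples, DU))      # (ii) draw fresh latent noise
    z_rep = np.broadcast_to(z, (n_samples, DZ))   # same code for every sample
    return decode_x(z_rep, u)


y_obs = decode_y(rng.standard_normal(DZ))
x_samples = sample_conditional(y_obs, n_samples=500)  # array of shape (500, DX)
```

Because the observation is encoded once and only `u` is resampled, drawing more conditional samples costs one decoder pass per batch, which is what makes the sampling amortized.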

To embed this mechanism into an autoencoder-based generative modeling framework, the Wasserstein autoencoder is modified: the decoder incorporates the block-triangular structure, and the training objective is tied to the minimization of an appropriate (joint or conditional) Wasserstein distance. A strong independence constraint is enforced on the latent codes for identifiability and efficiency.

Optimization Criteria: Joint vs. Conditional OT

Two primary objective classes are introduced:

Joint Optimal Transport (OT) Cost: Minimize the Wasserstein distance $W\big(P_{Y,X},\, G_{\#}(P_Z \otimes P_U)\big)$ between the data distribution and the pushforward of the latent base measure under $G$. This uses the standard WAE relaxation with a matching penalty for the aggregated posterior.
Conditional OT Cost: A finer objective that targets the conditional distributions directly by integrating Wasserstein distances over all values of $y$: $\mathbb{E}_{y \sim P_Y}\, W\big(P_{X|Y=y},\, \hat{P}_{X|Y=y}\big)$, where $\hat{P}_{X|Y=y}$ denotes the conditional law produced by the model. This is a strictly stronger criterion for conditional approximation than joint OT.
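The matching penalty for the aggregated posterior mentioned above can take several forms. One common choice in the WAE literature is a maximum mean discrepancy (MMD) between encoded samples and samples from the prior. The sketch below is an assumption rather than the paper's stated choice: it uses a Gaussian kernel with a fixed bandwidth and the biased estimator, and the names `rbf_kernel` and `mmd_squared` are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd_squared(codes, prior_samples, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two sample sets."""
    k_cc = rbf_kernel(codes, codes, bandwidth).mean()
    k_pp = rbf_kernel(prior_samples, prior_samples, bandwidth).mean()
    k_cp = rbf_kernel(codes, prior_samples, bandwidth).mean()
    return k_cc + k_pp - 2.0 * k_cp


rng = np.random.default_rng(0)
prior = rng.standard_normal((400, 4))          # joint samples of (Z, U) from a product prior
matched = rng.standard_normal((400, 4))        # codes that already follow the prior
shifted = rng.standard_normal((400, 4)) + 2.0  # codes that are clearly off

penalty_matched = mmd_squared(matched, prior)  # small
penalty_shifted = mmd_squared(shifted, prior)  # noticeably larger
```

Comparing the concatenated codes against samples from a product prior penalizes both marginal mismatch and dependence between the two latent blocks, which is one way to encourage the independence constraint.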

The block-triangularity of $G$ enables these objectives to be translated into autoencoding architectures with specific encoder-decoder constraints and tractable training objectives. Three instantiations of the CWAE (CWAE1, CWAE2, CWAE3) are implemented, with different parametrizations of the latent encoders and different decoder compositions.

Theoretical Analysis

The paper establishes equivalence of several variants at the population optimum, a joint formulation for the conditional loss, and error bounds that control the gap between joint and conditional OT objectives. Representation error bounds for the latent encoding show that matching the encoder's distribution to the prior (e.g., via penalty terms or adversarial losses) is critical for empirical performance, especially in the presence of loose regularization or sub-optimal parametrizations.

A key theoretical claim is that the conditional OT cost yields tighter control of the conditional approximation error than the joint cost, and that block-triangular autoencoder architectures can provably exploit the intrinsic manifold structure of the joint distribution.

Empirical Results

Low-Dimensional Latent Structure, High-Dimensional Observations

A synthetic benchmark is presented in which the ambient space is high-dimensional but all dependence is controlled by a low-dimensional manifold affecting both $Y$ and $X$. The posterior support is thus low-dimensional and nonlinear with respect to the ambient coordinates.

Figure 1: Sample distributions for different CWAE variants and LREnKF for the last three states of a synthetic nonlinear embedding, showing superior approximation by CWAEs.

The CWAE variants consistently yield much lower Wasserstein error relative to the ground truth than LREnKF, particularly as the ambient dimension increases. CWAE2 is both accurate and robust.
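To make the notion of Wasserstein error between two sample sets concrete, the sketch below computes the exact one-dimensional distance from sorted samples and a sliced approximation in higher dimensions. The sliced Wasserstein distance is a swapped-in proxy chosen for simplicity; this summary does not state which estimator the paper uses, and the function names are hypothetical. Both functions assume the two sample sets have the same size.

```python
import numpy as np


def wasserstein_1d(a, b, p=2):
    """W_p between two equal-size 1D empirical measures, via sorted samples."""
    a_sorted, b_sorted = np.sort(a), np.sort(b)
    return np.mean(np.abs(a_sorted - b_sorted) ** p) ** (1.0 / p)


def sliced_wasserstein(x, y, n_projections=200, p=2, seed=0):
    """Monte Carlo sliced W_p between equal-size sample sets in R^d."""
    rng = np.random.default_rng(seed)
    directions = rng.standard_normal((n_projections, x.shape[1]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Average the p-th power of the 1D distance over random unit directions.
    costs = [wasserstein_1d(x @ d, y @ d, p) ** p for d in directions]
    return np.mean(costs) ** (1.0 / p)


rng = np.random.default_rng(1)
reference = rng.standard_normal((1000, 5))   # stand-in for ground-truth samples
good = reference + 0.05                      # small constant offset
poor = reference + 1.0                       # large constant offset

err_good = sliced_wasserstein(good, reference)
err_poor = sliced_wasserstein(poor, reference)  # larger than err_good
```

In one dimension, shifting a sample set by a constant `c` gives a distance of exactly `|c|`, which is a convenient sanity check for the sorting-based formula.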

Spherical Posterior Example

Another controlled scenario uses a spherical posterior, which arises as the conditional law of a Gaussian under a quadratic observation and concentrates nonlinearly on a sphere.
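A simplified version of this setting has a closed-form posterior that is easy to sample. The sketch assumes a standard Gaussian prior $X \sim N(0, I)$ and a noiseless observation $y = \|X\|^2$, in which case the conditional law is uniform on the sphere of radius $\sqrt{y}$. The paper's actual prior, noise model, and dimension are not given in this summary, so these choices and the function name are assumptions.

```python
import numpy as np


def sample_sphere_posterior(y, dim, n_samples, seed=0):
    """Samples of X given ||X||^2 = y for X ~ N(0, I): uniform on a sphere."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n_samples, dim))
    # Normalizing an isotropic Gaussian vector gives a uniform direction.
    directions = g / np.linalg.norm(g, axis=1, keepdims=True)
    return np.sqrt(y) * directions


samples = sample_sphere_posterior(y=4.0, dim=3, n_samples=2000)
radii = np.linalg.norm(samples, axis=1)   # every sample has norm sqrt(y) = 2
posterior_mean = samples.mean(axis=0)     # close to the origin by symmetry
```

In this simplified case the posterior mean lies at the origin, away from the support, so a Gaussian fit to the mean and covariance places mass inside the sphere where the true posterior has none.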

CWAE recovers the posterior mean and matches distributional features more accurately than LREnKF across different conditioning values, indicating strong performance even on nontrivial nonlinear manifolds.

High-Dimensional Flow Field Reconstruction

CWAE is applied to reconstruct high-dimensional, time-dependent 2D flow fields from sparse, noisy observations. The latent structure corresponds to underlying fluid flow modes.

Figure 2: Simulations for the flow field task show both the original and CWAE-reconstructed fields, with physically meaningful and diversified samples.

Figure 3: The ground-truth velocity component in the incompressible flow example.

Figure 4: The sample mean of the reconstructed velocity component, showing that CWAE recovers sharp, coherent features from sparse observations.

CWAEs produce accurate reconstructions, as quantified by reductions in relative MSE (both first- and second-moment error) compared to LREnKF, with high-fidelity spatial structure and diversity in samples from the conditional.
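A relative MSE on first and second moments can be computed from two sample sets as sketched below. The exact definitions used in the paper are not given in this summary. Here the error is normalized by the squared norm of the reference, the second moment is taken elementwise rather than as a full covariance matrix, and the function names are hypothetical.

```python
import numpy as np


def relative_mse(estimate, reference):
    """Squared error normalized by the squared norm of the reference."""
    return np.sum((estimate - reference) ** 2) / np.sum(reference ** 2)


def moment_errors(samples, ref_samples):
    """Relative MSE of the sample mean and of the elementwise second moment."""
    first = relative_mse(samples.mean(axis=0), ref_samples.mean(axis=0))
    second = relative_mse(
        (samples ** 2).mean(axis=0), (ref_samples ** 2).mean(axis=0)
    )
    return first, second


rng = np.random.default_rng(0)
ref = rng.standard_normal((500, 64)) + 1.0           # stand-in for reference posterior samples
approx = ref + 0.1 * rng.standard_normal((500, 64))  # a slightly perturbed approximation

first_err, second_err = moment_errors(approx, ref)
```

Reporting both numbers separates errors in the posterior mean from errors in the posterior spread, which a mean-only metric would miss.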

Discussion and Implications

The CWAE framework systematizes the exploitation of manifold structure in conditional sampling for high-dimensional inverse problems. Unlike likelihood-informed subspace (LIS) methods, EnKF, or vanilla particle methods, it does not require explicit likelihood gradients or manual subspace identification. The use of block-triangular transport parametrized by neural networks, together with autoencoder-style variational training, leverages both the flexibility of deep generative models and the precision of measure transport.

Architectural variants illustrate the trade-offs in dimensional encoding, compositionality, and practical decoder parametrization. Empirical results establish that, for a range of nonlinear and high-dimensional tasks, substantial error reductions can be realized if latent structure is present and appropriately encoded. The approach is well-suited to Bayesian state estimation, data assimilation, and inverse problems with structured latent variables.

Future Directions

Further work is needed to assess sensitivity to regularization and optimizer dynamics, and to extend the framework to sequential filtering. Theoretical analysis of the conditional OT landscape, robustness to misspecification, and connections to Sinkhorn divergences and kernel-based discrepancies open avenues for both algorithmic and statistical investigation. Integration of physics-informed penalties, as in the flow case, could improve generalization to scientific domains.

Conclusion

Conditional Wasserstein autoencoders with block-triangular transport provide a principled and empirically validated framework for scalable, accurate conditional sampling in high dimensions. By systematizing low-dimensional structure discovery and exploiting efficient autoencoder-based transport maps, this approach enables practically viable data-driven nonlinear filtering and Bayesian inference with improved approximation properties over classical techniques such as LREnKF (2604.02644).
