SDE-Conditioned Variational Autoencoders

Updated 26 May 2026

The paper introduces SDE-conditioned VAEs that combine latent stochastic dynamics with neural network parameterization to model complex temporal behaviors.
It demonstrates how integrating SDEs enables explicit modeling of change points, regime switching, and context-specific dynamics in heterogeneous data.
Empirical results show improved distribution matching, robust change-point detection, and state-of-the-art forecasting accuracy across various applications.

Variational Autoencoders Conditioned on SDE Models

Variational autoencoders (VAEs) conditioned on stochastic differential equation (SDE) models constitute a class of expressive, probabilistic generative models that combine tractable inference via the VAE framework with the capacity to represent complex temporal dynamics through SDE-driven latent processes. By parameterizing SDE coefficients (drift and diffusion) with neural networks, these approaches capture structured uncertainties and dynamic regimes across continuous time, and are extensible to handle heterogeneous dynamics, regime switching, change-point phenomena, physical constraints, and financial no-arbitrage requirements. The resulting models unify innovations in temporal representation, statistical inference, and stochastic process modeling.

1. Core Architecture: SDE-Conditioned VAE Generative Modeling

SDE-conditioned VAE models define a generative process in which the latent state evolves according to a (possibly neural) SDE, and observations are emitted from the latent trajectory via a decoder:

Latent SDE dynamics: The latent process $z_t\in\mathbb{R}^d$ evolves as

$dz_t = f_\theta(z_t, t)\,dt + g_\theta(z_t, t)\,dW_t$

where $f_\theta$ and $g_\theta$ are neural networks, and $W_t$ is a Brownian motion (potentially of dimension $d'$ ).

Initial state prior: The prior on the initial state is typically Gaussian, $p(z_0) = \mathcal{N}(z_0; \mu_0, \Sigma_0)$ , but may be extended to more general forms.
Observation model: Observed data $x_{t_k}$ at time $t_k$ is typically generated by

$x_{t_k} = h_\psi(z_{t_k}) + \epsilon_k,\quad \epsilon_k \sim \mathcal{N}(0, \sigma^2 I)$

with $h_\psi$ a neural network (possibly conditioned on exogenous context).

The complete generative model, including extensions for change points and context conditioning, is given by integrating the SDE (or concatenating segments across change points), and then sampling from the observation likelihood at the observation times (El-Laham et al., 2024, Samota et al., 1 Apr 2026).
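The generative process described in this section can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation from any of the cited works: the linear drift `A`, the constant diagonal diffusion, the `tanh` decoder with random weights `H`, and all numerical values are hypothetical stand-ins for the neural networks $f_\theta$, $g_\theta$, and $h_\psi$, and the prior on $z_0$ is assumed to be standard normal.

```python
import numpy as np

def simulate_latent_sde(z0, drift, diffusion, t_grid, rng):
    """Euler-Maruyama integration of dz = f(z, t) dt + g(z, t) dW on t_grid.

    Assumes a diagonal diffusion, so g(z, t) multiplies dW elementwise.
    """
    path = np.empty((len(t_grid), z0.shape[0]))
    path[0] = z0
    for k in range(len(t_grid) - 1):
        dt = t_grid[k + 1] - t_grid[k]
        dW = rng.normal(0.0, np.sqrt(dt), size=z0.shape)  # Brownian increment
        z = path[k]
        path[k + 1] = z + drift(z, t_grid[k]) * dt + diffusion(z, t_grid[k]) * dW
    return path

rng = np.random.default_rng(0)
latent_dim, obs_dim = 2, 3

# Hypothetical stand-ins for f_theta, g_theta and h_psi.
A = np.array([[-1.0, 0.5], [-0.5, -1.0]])
H = rng.normal(size=(obs_dim, latent_dim))
drift = lambda z, t: A @ z                      # f_theta(z, t)
diffusion = lambda z, t: 0.3 * np.ones_like(z)  # diagonal g_theta(z, t)
decoder = lambda z: np.tanh(z @ H.T)            # h_psi(z)

t_grid = np.linspace(0.0, 1.0, 101)             # fine integration grid
obs_idx = np.arange(0, 101, 10)                 # indices of observation times t_k
sigma = 0.1                                     # observation noise scale

z0 = rng.normal(0.0, 1.0, size=latent_dim)      # z_0 ~ N(0, I)
z_path = simulate_latent_sde(z0, drift, diffusion, t_grid, rng)
x_obs = decoder(z_path[obs_idx]) + sigma * rng.normal(size=(len(obs_idx), obs_dim))
```

Separating the integration grid from the observation indices lets the same routine handle irregularly spaced observation times: only `t_grid` and `obs_idx` change.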

2. Variational Inference and Evidence Lower Bound (ELBO)

Inference in these models is accomplished via amortized variational approximations. The central construct is the ELBO:

$\mathcal{L}(\theta, \psi, \phi) = \mathbb{E}_{z_0 \sim q_\phi(\cdot \mid x_{t_{1:K}}),\, W}\left[\sum_{k=1}^{K} \log p_\psi(x_{t_k} \mid z_{t_k})\right] - \mathrm{KL}\left(q_\phi(z_0 \mid x_{t_{1:K}}) \,\|\, p(z_0)\right)$

Here, $q_\phi(z_0 \mid x_{t_{1:K}})$ is an encoder (recognition) network parameterizing a Gaussian approximate posterior for the initial latent state. The latent process $z_t$ is generated by integrating the SDE conditioned on $z_0$, so the expectation is taken over both $z_0$ and the driving Brownian motion $W$.

In some frameworks, the complete latent path is inferred as a temporally structured variational distribution, e.g., factorized or CRF-style, leveraging both SDE dynamics and flexible neural encoders (Garcia et al., 2020, Rice, 8 Jan 2026). In models with change points, the lower bound depends on the current regime segmentation $\tau$ and must be optimized iteratively alongside the SDE parameters (El-Laham et al., 2024).

Pathwise or nested Monte Carlo estimators are employed for terms involving SDE simulation: $\mathbb{E}_{q_\phi}\left[\log p_\psi(x_{t_k} \mid z_{t_k})\right] \approx \frac{1}{N} \sum_{i=1}^{N} \log p_\psi(x_{t_k} \mid z_{t_k}^{(i)})$ where $z_{t_k}^{(i)}$, $i = 1, \dots, N$, are integrated SDE paths (El-Laham et al., 2024).
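A pathwise Monte Carlo estimate of such a bound can be sketched as follows. This is a simplified illustration under stated assumptions rather than the estimator of any cited paper: the variational posterior covers only the initial state, the prior is $p(z_0) = \mathcal{N}(0, I)$, the observation noise is isotropic Gaussian, and the drift, diffusion, decoder, and encoder outputs (`mu_q`, `logvar_q`) are hypothetical toy choices.

```python
import numpy as np

def gaussian_kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

def gaussian_loglik(x, mean, sigma):
    """log N(x; mean, sigma^2 I), summed over all entries of x."""
    n = x.size
    return (-0.5 * np.sum((x - mean) ** 2) / sigma**2
            - n * np.log(sigma) - 0.5 * n * np.log(2.0 * np.pi))

def elbo_estimate(x_obs, obs_idx, t_grid, mu_q, logvar_q,
                  drift, diffusion, decoder, sigma, n_paths, rng):
    """Average reconstruction log-likelihood over simulated paths, minus KL."""
    recon = 0.0
    for _ in range(n_paths):
        # Reparameterized draw z_0 = mu + std * eps from the encoder posterior.
        z = mu_q + np.exp(0.5 * logvar_q) * rng.normal(size=mu_q.shape)
        latents = [z]
        for k in range(len(t_grid) - 1):
            dt = t_grid[k + 1] - t_grid[k]
            dW = np.sqrt(dt) * rng.normal(size=z.shape)
            z = z + drift(z, t_grid[k]) * dt + diffusion(z, t_grid[k]) * dW
            latents.append(z)
        z_obs = np.stack(latents)[obs_idx]          # latent states at t_k
        recon += gaussian_loglik(x_obs, decoder(z_obs), sigma)
    return recon / n_paths - gaussian_kl_to_standard_normal(mu_q, logvar_q)

# Toy usage with hypothetical components.
rng = np.random.default_rng(0)
t_grid = np.linspace(0.0, 1.0, 51)
obs_idx = np.arange(0, 51, 10)
drift = lambda z, t: -z
diffusion = lambda z, t: 0.2 * np.ones_like(z)
decoder = lambda z: z                               # identity decoder
x_obs = np.zeros((len(obs_idx), 2))
mu_q, logvar_q = np.zeros(2), np.log(0.1) * np.ones(2)

elbo = elbo_estimate(x_obs, obs_idx, t_grid, mu_q, logvar_q,
                     drift, diffusion, decoder, sigma=0.5, n_paths=64, rng=rng)
```

Because the prior and posterior share the same SDE in this sketch, no path-space KL term appears; variants that infer the whole latent path would need an additional term for the drift mismatch.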

3. Change Points, Heterogeneity, and Regime-Switching

Many applications require the ability to model structural breakpoints (change points) or regime switches:

Neural SDEs with change points implement time segmentation: for $t < \tau$, drift and diffusion are parameterized by $\theta_1$; for $t \ge \tau$, by $\theta_2$. The change point $\tau$ can be estimated using maximum likelihood with a bootstrap particle filter or by a sequential likelihood ratio test (SLRT):

$\hat{\tau} = \arg\max_{\tau} \log \hat{p}(x_{t_{1:K}} \mid \tau, \theta_1, \theta_2)$

with particle filter estimators for unbiased and consistent likelihoods. The iterative procedure alternates between updating model parameters $(\theta_1, \theta_2, \phi, \psi)$ (with $\tau$ fixed) and updating $\tau$ via marginal likelihood maximization or SLRT. Theoretical results guarantee stationary-point convergence for the ELBO and optimality for the change-point detector in the particle limit (El-Laham et al., 2024).

Regime-switching SDEs, as used in arbitrage-free financial modeling, describe the latent process $z_t$ as following different SDE parameters depending on a discrete-valued latent process $s_t$ governed by a continuous-time Markov chain; such hierarchical SDEs can be embedded as the generative backbone of a VAE (Ning et al., 2021).
Conditional embeddings ($c$) or covariates can be included, augmenting both encoder and SDE drift/diffusion to yield instance- or segment-specific latent dynamics, as in V-NSDE (Samota et al., 1 Apr 2026).
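The maximum-likelihood route to change-point estimation can be illustrated with a bootstrap particle filter on a toy problem. The sketch below is an assumption-laden simplification, not the procedure of El-Laham et al.: the latent state is scalar, each regime is a mean-reverting SDE $dz = \kappa(\mu - z)\,dt + g\,dW$ with known parameters, one Euler–Maruyama step is taken per observation, and the function name `bpf_loglik` and all numerical values are hypothetical.

```python
import numpy as np

def bpf_loglik(x, dt, tau_idx, regime_pre, regime_post, kappa,
               sigma_obs, n_particles, rng):
    """Bootstrap particle filter estimate of log p(x_{1:K} | tau_idx)."""
    z = rng.normal(0.0, 1.0, size=n_particles)      # particles from p(z_0)
    loglik = 0.0
    for k, xk in enumerate(x):
        mu, g = regime_pre if k < tau_idx else regime_post
        # Propagate particles with the regime-specific SDE (Euler-Maruyama).
        z = z + kappa * (mu - z) * dt + g * np.sqrt(dt) * rng.normal(size=n_particles)
        # Weight by the Gaussian observation likelihood, in log space.
        logw = -0.5 * ((xk - z) / sigma_obs) ** 2 - np.log(sigma_obs * np.sqrt(2.0 * np.pi))
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())              # log of the average weight
        z = rng.choice(z, size=n_particles, p=w / w.sum())  # multinomial resampling
    return loglik

# Synthetic series whose mean level jumps from 0 to 3 at index true_tau.
rng = np.random.default_rng(1)
K, dt, kappa, sigma_obs, true_tau = 60, 0.1, 5.0, 0.2, 30
regime_pre, regime_post = (0.0, 0.3), (3.0, 0.3)    # (mean level, diffusion)
z, x = 0.0, []
for k in range(K):
    mu, g = regime_pre if k < true_tau else regime_post
    z = z + kappa * (mu - z) * dt + g * np.sqrt(dt) * rng.normal()
    x.append(z + sigma_obs * rng.normal())
x = np.array(x)

# Scan candidate change points and keep the one with the highest estimate.
candidates = np.arange(5, 56)
logliks = np.array([
    bpf_loglik(x, dt, c, regime_pre, regime_post, kappa, sigma_obs, 500, rng)
    for c in candidates
])
tau_hat = int(candidates[np.argmax(logliks)])
```

In a full model the regime parameters would themselves be learned, with this scan alternating with parameter updates as described above; here they are held fixed to isolate the change-point step.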

4. Extensions: Physics, Finance, and Schrödinger Bridge Generalization

SDE-conditioned VAEs have been extended to incorporate structure and constraints relevant to physical and financial systems:

No-arbitrage and physics constraints: Term structure models strictly penalize arbitrage violations via an explicit PDE penalty integrated with SDE-constrained latent evolution. In yield curve modeling, a two-stage architecture decouples shape-level representation learning (via a heavy-tailed, conditional VAE) and latent SDE evolution, with the latter regulated against a no-arbitrage PDE by Itô calculus and a Girsanov-based adjustment for the market price of risk. The overall loss includes both data fit under the SDE and the PDE penalty (Luo et al., 12 May 2026).
Physics-informed generative modeling: PI-VAE integrates the decoder with the governing SDE, applying automatic differentiation to enforce satisfaction of the SDE and boundary conditions. The loss leverages Maximum Mean Discrepancy (MMD) between true sensor measurements and decoded outputs, as well as between the aggregated posterior and the latent prior (Zhong et al., 2022).
Schrödinger bridge models reinterpret diffusion-based generative pathways as infinite-dimensional VAEs. Both encoder and decoder are represented by (potentially neural) SDEs evolving forward and backward in time, with the training objective derived from the pathwise Kullback-Leibler divergence respecting the data processing inequality:

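$\mathrm{KL}\left(p_{\mathrm{data}}(x_0) \,\|\, p_\theta(x_0)\right) \le \mathrm{KL}\left(\mathbb{Q}^{\phi} \,\|\, \mathbb{P}^{\theta}\right)$

Here $\mathbb{Q}^{\phi}$ and $\mathbb{P}^{\theta}$ denote the path measures induced by the encoder and decoder SDEs, respectively.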

This bridges classical VAEs, score-based diffusion models, and optimal transport under stochastic dynamics (Kaba et al., 2024).
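The Maximum Mean Discrepancy used in the physics-informed loss above can be estimated directly from two sample sets. The following is a generic sketch of the biased (V-statistic) estimator with a Gaussian RBF kernel; the kernel choice and the `bandwidth` value are assumptions made for illustration and are not taken from Zhong et al.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth):
    """Gaussian RBF kernel matrix between the rows of a and the rows of b."""
    sq_dist = (np.sum(a**2, axis=1)[:, None]
               + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth**2))

def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD between samples x and y."""
    return (rbf_kernel(x, x, bandwidth).mean()
            + rbf_kernel(y, y, bandwidth).mean()
            - 2.0 * rbf_kernel(x, y, bandwidth).mean())

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))                  # e.g. samples from the latent prior
y_same = rng.normal(size=(200, 2))             # samples from the same distribution
y_shift = rng.normal(loc=2.0, size=(200, 2))   # samples from a shifted distribution

mmd_same = mmd2(x, y_same)
mmd_shift = mmd2(x, y_shift)
```

The estimate stays near zero when both sample sets come from the same distribution and grows as they diverge, which is what makes it usable as a penalty between decoded outputs and measurements, or between the aggregated posterior and the prior.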

5. Training Algorithms and Practical Details

The generic training cycle for SDE-conditioned VAEs is as follows:

Encoder pass: Compute posterior parameters for $z_0$ (and possibly auxiliary variables); sample via the reparameterization trick.
SDE Sampling: Simulate $z_t$ by integrating the neural SDE from $z_0$, using Euler–Maruyama or higher-order integrators; possibly segment the integration at change points.
Decoder pass: Generate reconstructed observations $\hat{x}_{t_k}$, again possibly conditioned on segment or context.
Monte Carlo Estimation: Evaluate (nested) pathwise reconstruction likelihood, KL divergence, and auxiliary terms (e.g., predictive regularization, PDE penalty).
Change-point update: If relevant, compute change-point likelihood metrics and update regime segmentation using BPF or SLRT.
Backpropagation: Compute gradients of the total loss with respect to all parameters, employing the reparameterization trick through the SDE integration. For gradient-based, continuous-time models, adjoint sensitivity analysis can avoid storing full latent paths (El-Laham et al., 2024, Samota et al., 1 Apr 2026, Rice, 8 Jan 2026).

Empirical success depends on design choices regarding network architecture, regularization (e.g., $\beta$-VAE weighting), SDE discretization, and optimization (typically Adam or AdamW).
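The backpropagation step relies on the fact that, once the Brownian increments are drawn, an Euler–Maruyama path is a deterministic and differentiable function of the model parameters. The sketch below checks this on a hypothetical scalar example, $dz = -\theta z\,dt + \sigma\,dW$, by propagating the derivative $\partial z / \partial \theta$ alongside the state and comparing it with a finite difference under the same noise. Note that this is forward-mode pathwise sensitivity, chosen here because it fits in a few lines; it is not the adjoint method, and an autodiff framework would normally compute these gradients automatically.

```python
import numpy as np

def em_terminal_and_sensitivity(theta, sigma, z0, dt, noise):
    """Euler-Maruyama for dz = -theta * z dt + sigma dW with fixed noise draws.

    Returns the terminal state and its derivative with respect to theta,
    obtained by differentiating every update step.
    """
    z, dz_dtheta = z0, 0.0
    for eps in noise:
        # Differentiate z_next = z * (1 - theta * dt) + sigma * sqrt(dt) * eps;
        # this line must use the state z from before the update.
        dz_dtheta = dz_dtheta * (1.0 - theta * dt) - z * dt
        z = z * (1.0 - theta * dt) + sigma * np.sqrt(dt) * eps
    return z, dz_dtheta

rng = np.random.default_rng(0)
noise = rng.normal(size=100)                   # shared standard normal draws
theta, sigma, z0, dt = 1.5, 0.4, 1.0, 0.01

z_T, grad = em_terminal_and_sensitivity(theta, sigma, z0, dt, noise)

# Central finite difference reusing the same noise (common random numbers).
h = 1e-6
z_plus, _ = em_terminal_and_sensitivity(theta + h, sigma, z0, dt, noise)
z_minus, _ = em_terminal_and_sensitivity(theta - h, sigma, z0, dt, noise)
finite_diff = (z_plus - z_minus) / (2.0 * h)
```

Reusing the same noise on both sides of the finite difference is what makes the comparison meaningful; with fresh noise the difference would be dominated by sampling variance.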

6. Empirical Results and Theoretical Guarantees

Empirical and theoretical properties include:

Distributional fidelity: SDE-conditioned VAEs outperform direct VAEs and traditional benchmarks in matching empirical distributions of financial variables (FX implied volatility, yield curve shapes), environmental indices (air quality), and time series with structural breaks (El-Laham et al., 2024, Ning et al., 2021, Luo et al., 12 May 2026).
Change-point/local regime recovery: In synthetic and real datasets, explicit change-point modeling through the CP-SDEVAE variant yields improved ELBO values and accurate change-point detection even under multiple change scenarios (El-Laham et al., 2024).
No-arbitrage compliance and forecasting error: Physics-informed and no-arbitrage regularized frameworks strongly suppress economic inconsistencies in financial applications, achieving state-of-the-art RMSE and robust regime scenario generation (Luo et al., 12 May 2026).
Identifiability: Under mild technical conditions, learned SDEs (drift, diffusion, and decoder) are identifiable up to isometry in the infinite data regime (Hasan et al., 2020).
Theoretical optimality: Change-point detection via bootstrapped likelihood ratio tests is provably optimal under particle limits, and alternating maximization in $(\theta, \tau)$ converges to stationary points for the ELBO (El-Laham et al., 2024).

7. Extensions, Limitations, and Research Directions

Score-based/diffusion models: As a limiting case, SDE-conditioned VAEs with unidirectional and fixed encoder drift recover score-matching diffusion models, linking the VAE, score-based, and Schrödinger bridge paradigms (Kaba et al., 2024).
Irregular data and heterogeneous embeddings: The expressive capacity of neural SDEs accommodates context conditioning, irregular observation times, and cross-sectional heterogeneity, extending applicability to domains as diverse as macroeconomics, climate, and molecular kinetics (Samota et al., 1 Apr 2026).
Physical interpretability: Embedding physical principles (energy landscapes, Kramers’ rates, conservation laws via autodiff-enforced SDEs) enables both diagnostic analysis and principled scenario generation (Koop et al., 2022, Zhong et al., 2022).
Algorithmic stability: Variance reduction, adjoint regularization, Lipschitz constraints, and robust scaling remain crucial for numerically stable training (especially for long horizons or stiff SDEs) (Rice, 8 Jan 2026).
Practical implementation: Detailed tables listing network/hyperparameter choices, performance metrics, and ablation study results are given in the cited works; batch sizes of 200–1000, hidden layer widths of 64–512, and latent dimensions of 3–15 are typical (Ning et al., 2021, Luo et al., 12 May 2026).

In summary, VAEs conditioned on SDE models provide a scalable and theoretically robust framework for learning and generative modeling in complex, heterogeneous temporal domains characterized by stochastic dynamical structure, regime switching, and domain-specific constraints (El-Laham et al., 2024, Samota et al., 1 Apr 2026, Hasan et al., 2020, Ning et al., 2021, Luo et al., 12 May 2026, Koop et al., 2022, Kaba et al., 2024, Zhong et al., 2022, Rice, 8 Jan 2026).