Neural SDE Learning

Updated 13 June 2026

Neural SDE Learning is a framework that models continuous-time stochastic systems by parameterizing both drift and diffusion using neural networks.
It leverages methods such as maximum likelihood, adversarial training, and variational inference to optimize system parameters and capture uncertainty.
Applications include reinforcement learning, financial modeling, robotics, and biological time series, where the dynamics are noisy, nonlinear, or only partially observed.

Neural Stochastic Differential Equation (Neural SDE) learning is the study and development of methodologies to infer, represent, and exploit stochastic dynamical systems where both drift and diffusion coefficients are parameterized by neural networks. Neural SDEs unify classical SDE modeling with deep learning, providing expressive tools for generative modeling, latent dynamics inference, uncertainty quantification, robust time-series analysis, and high-performing model-based reinforcement learning under uncertainty.

1. Mathematical Formulation of Neural SDEs

A neural SDE models the evolution of a continuous-time state $x_t \in \mathbb{R}^d$ as

$dx_t = f_\theta(x_t, t, a_t)\,dt + g_\theta(x_t, t, a_t)\,dW_t,$

where:

$f_\theta$ (drift) and $g_\theta$ (diffusion) are parameterized by neural networks with parameters $\theta$,
$a_t$ denotes possible exogenous actions or controls (in control/RL scenarios),
$W_t$ is a $q$-dimensional standard Brownian motion.

Variations exist:

Latent neural SDEs: Hidden dynamics in a latent space $(z_t)$ with observations generated via an emission model (Han et al., 24 Mar 2026).
Physics-informed neural SDEs: $f_\theta$ encodes known physics-based components while $g_\theta$ models state- or distance-aware stochasticity (Djeumou et al., 2023).
Hierarchical or manifold neural SDEs: Multi-level SDE stacking for latent manifold modeling in high-dimensional time series (Rajaei et al., 29 Jul 2025).
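
To make the physics-informed variant concrete, here is a minimal NumPy sketch. It is an illustration only and not the parameterization of Djeumou et al.: the pendulum physics, the residual network shape, and the function `distance_aware_diffusion` with its `sigma_min`, `sigma_max`, and `scale` parameters are all assumptions chosen for the example.

```python
import numpy as np

def known_physics(x):
    """Known part of the drift: frictionless pendulum, x = (angle, angular velocity)."""
    return np.array([x[1], -9.81 * np.sin(x[0])])

def residual(params, x):
    """Learned correction to the drift (one tanh hidden layer, hypothetical shape)."""
    W1, b1, W2, b2 = params
    return np.tanh(x @ W1 + b1) @ W2 + b2

def drift(params, x):
    """Gray-box drift: physics prior plus neural residual."""
    return known_physics(x) + residual(params, x)

def distance_aware_diffusion(x, train_states, sigma_min=0.01, sigma_max=1.0, scale=1.0):
    """Diagonal diffusion that grows with distance to the nearest training state."""
    dist = np.min(np.linalg.norm(train_states - x, axis=1))
    level = sigma_min + (sigma_max - sigma_min) * (1.0 - np.exp(-(dist / scale) ** 2))
    return np.full(x.shape, level)

rng = np.random.default_rng(0)
params = (rng.normal(0.0, 0.1, (2, 8)), np.zeros(8),
          rng.normal(0.0, 0.1, (8, 2)), np.zeros(2))
train_states = rng.uniform(-1.0, 1.0, (100, 2))   # stand-in for observed states

f_val = drift(params, train_states[0])
g_near = distance_aware_diffusion(train_states[0], train_states)      # on the data
g_far = distance_aware_diffusion(np.array([5.0, 5.0]), train_states)  # far from data
```

In this sketch the diffusion equals `sigma_min` on a training state and approaches `sigma_max` far from the data, which is one simple way to encode uncertainty that grows away from the training regime.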

Discrete-time data is typically related to the SDE by the Euler–Maruyama scheme:

$x_{t+\Delta t} = x_t + f_\theta(x_t, t, a_t)\,\Delta t + g_\theta(x_t, t, a_t)\,\sqrt{\Delta t}\,\varepsilon_t, \quad \varepsilon_t \sim \mathcal{N}(0, I_q).$
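
The Euler–Maruyama scheme can be sketched in a few lines of NumPy. The following is a minimal illustration under stated assumptions: the helper names (`init_mlp`, `mlp`, `euler_maruyama`), the layer sizes, and the choice of a diagonal diffusion with a softplus output (so $q = d$) are not taken from any cited work, and the networks are randomly initialized rather than trained.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random weights for a small tanh MLP; sizes = [in, hidden, ..., out]."""
    return [(rng.normal(0.0, 0.5, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, inp):
    """Forward pass: tanh hidden layers, linear output layer."""
    h = inp
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)
    W, b = params[-1]
    return h @ W + b

def softplus(z):
    return np.logaddexp(0.0, z)  # numerically stable log(1 + exp(z))

def euler_maruyama(drift_params, diff_params, x0, ts, actions, rng):
    """Simulate one path of dx = f dt + g dW with diagonal g (assumes q = d)."""
    xs = [np.asarray(x0, dtype=float)]
    for k in range(len(ts) - 1):
        dt = ts[k + 1] - ts[k]
        inp = np.concatenate([xs[-1], [ts[k]], actions[k]])
        f = mlp(drift_params, inp)              # drift f_theta(x, t, a)
        g = softplus(mlp(diff_params, inp))     # positive diagonal diffusion
        eps = rng.standard_normal(f.shape)      # eps ~ N(0, I)
        xs.append(xs[-1] + f * dt + g * np.sqrt(dt) * eps)
    return np.stack(xs)

rng = np.random.default_rng(0)
d, a_dim = 2, 1
drift_params = init_mlp(rng, [d + 1 + a_dim, 16, d])
diff_params = init_mlp(rng, [d + 1 + a_dim, 16, d])
ts = np.linspace(0.0, 1.0, 51)
actions = np.zeros((len(ts) - 1, a_dim))        # no control input in this demo
path = euler_maruyama(drift_params, diff_params, np.zeros(d), ts, actions, rng)
```

Because the step size is read from `ts`, the same loop handles irregular grids without modification.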

2. Learning Algorithms and Training Objectives

Neural SDE learning leverages different paradigms:

(a) Maximum Likelihood via Markov Transitions

For supervised time-series:

Derive per-step likelihood under Euler–Maruyama discretization, resulting in a conditional Gaussian for each step (Shen et al., 31 Jan 2025, Kałuża et al., 2023, Dridi et al., 2021, Dietrich et al., 2021).
Closed-form negative log-likelihood:

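$\mathcal{L}(\theta) = \frac{1}{2} \sum_{k} \left[ (x_{t_{k+1}} - \mu_k)^\top \Sigma_k^{-1} (x_{t_{k+1}} - \mu_k) + \log\det \Sigma_k + d \log 2\pi \right],$

with $\mu_k = x_{t_k} + f_\theta(x_{t_k}, t_k, a_{t_k})\,\Delta t_k$, $\Sigma_k = g_\theta(x_{t_k}, t_k, a_{t_k})\, g_\theta(x_{t_k}, t_k, a_{t_k})^\top \Delta t_k$, and $\Delta t_k = t_{k+1} - t_k$ (Dridi et al., 2021).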
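
As a rough illustration of the per-step Gaussian likelihood, the sketch below evaluates the negative log-likelihood of an observed path under Euler–Maruyama transitions. It assumes a diagonal diffusion and no control input for brevity; the function name `em_gaussian_nll` and its callback signature are hypothetical.

```python
import numpy as np

def em_gaussian_nll(xs, ts, drift, diffusion):
    """Negative log-likelihood of a path under Euler-Maruyama transitions.

    Assumes diagonal diffusion: diffusion(x, t) returns the d positive
    diagonal entries of g, so the step covariance is diag(g**2) * dt.
    """
    nll = 0.0
    for k in range(len(ts) - 1):
        dt = ts[k + 1] - ts[k]
        mu = xs[k] + drift(xs[k], ts[k]) * dt        # transition mean
        var = diffusion(xs[k], ts[k]) ** 2 * dt      # diagonal of the covariance
        resid = xs[k + 1] - mu
        # Mahalanobis term plus log-determinant and 2*pi constant, per dimension.
        nll += 0.5 * np.sum(resid**2 / var + np.log(2.0 * np.pi * var))
    return nll

# One-step usage example with an Ornstein-Uhlenbeck-style drift.
xs = np.array([[0.0], [0.1]])
ts = np.array([0.0, 0.01])
nll = em_gaussian_nll(xs, ts,
                      drift=lambda x, t: -x,
                      diffusion=lambda x, t: np.full_like(x, 0.5))
```

In a training loop, `drift` and `diffusion` would be neural networks and this quantity would be minimized with respect to their parameters by an automatic-differentiation framework.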

(b) Simulation-Free/Analytic Schemes

For regular or irregular grids, gradients are computed without Monte Carlo path simulations by exploiting the Gaussian step-wise structure (Shen et al., 31 Jan 2025).
Decoupled flow-and-diffusion optimization alternates updates for $f_\theta$ and $g_\theta$ for improved conditioning.
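
The idea of fitting drift and diffusion in separate stages, without simulating paths, can be shown on a toy problem. The sketch below is not the method of the cited work: it assumes a scalar linear drift $-\theta x$ and a constant diffusion $\sigma$, for which both stages have closed forms. The names `simulate_ou` and `fit_ou_decoupled` are hypothetical.

```python
import numpy as np

def simulate_ou(theta, sigma, x0, dts, rng):
    """Euler-Maruyama path of dx = -theta * x dt + sigma dW on an irregular grid."""
    xs = np.empty(len(dts) + 1)
    xs[0] = x0
    for k, dt in enumerate(dts):
        noise = sigma * np.sqrt(dt) * rng.standard_normal()
        xs[k + 1] = xs[k] - theta * xs[k] * dt + noise
    return xs

def fit_ou_decoupled(xs, dts):
    """Two-stage, simulation-free fit from the step-wise Gaussian likelihood.

    Stage 1 (drift): with a state-independent sigma, the likelihood in theta
    is a least-squares problem weighted by 1/dt, solved in closed form.
    Stage 2 (diffusion): with theta fixed, sigma^2 is the mean of the squared
    one-step residuals divided by dt.
    """
    x, dx = xs[:-1], np.diff(xs)
    theta = -np.sum(x * dx) / np.sum(x**2 * dts)
    resid = dx + theta * x * dts
    sigma2 = np.mean(resid**2 / dts)
    return theta, np.sqrt(sigma2)

rng = np.random.default_rng(0)
dts = rng.uniform(0.005, 0.015, size=20000)      # irregular time steps
xs = simulate_ou(theta=1.0, sigma=0.5, x0=1.0, dts=dts, rng=rng)
theta_hat, sigma_hat = fit_ou_decoupled(xs, dts)
```

In this toy case the drift estimate does not depend on $\sigma$, so a single pass suffices. With state-dependent neural diffusion the two stages interact, and one would alternate gradient updates instead.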

(c) GAN/Adversarial Training in Path Space

Wasserstein-GAN objectives are employed, treating an SDE solver as a continuous-time generator (Kidger et al., 2021, Xu et al., 23 Dec 2025).
Discriminators are instantiated as continuous-time neural controlled differential equations (CDEs) or as parameter-efficient Hermite expansions, for improved stability and path-wise discriminative power (Xu et al., 23 Dec 2025).
A gradient penalty softly enforces the 1-Lipschitz constraint on the discriminator.
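
The gradient-penalty term can be sketched as follows. To keep the example self-contained, the critic here is a toy one-hidden-layer network acting on a flattened, discretized path, with its input gradient written out by hand. This is an assumption for illustration only; it is not a neural CDE or Hermite discriminator, and in practice the gradient would come from automatic differentiation.

```python
import numpy as np

def critic(params, path):
    """Toy critic on a discretized path: v . tanh(W x + b)."""
    W, b, v = params
    return v @ np.tanh(W @ path.ravel() + b)

def critic_input_grad(params, path):
    """Analytic gradient of the critic with respect to the flattened path."""
    W, b, v = params
    h = np.tanh(W @ path.ravel() + b)
    return W.T @ (v * (1.0 - h**2))

def gradient_penalty(params, real_paths, fake_paths, rng):
    """WGAN-GP term: mean of (||grad D(x_hat)|| - 1)^2 over random interpolates."""
    penalties = []
    for real, fake in zip(real_paths, fake_paths):
        u = rng.uniform()
        x_hat = u * real + (1.0 - u) * fake      # point between real and generated
        grad_norm = np.linalg.norm(critic_input_grad(params, x_hat))
        penalties.append((grad_norm - 1.0) ** 2)
    return float(np.mean(penalties))

rng = np.random.default_rng(0)
n_steps, hidden, batch = 20, 8, 16
params = (rng.normal(0.0, 0.3, (hidden, n_steps)),
          np.zeros(hidden),
          rng.normal(0.0, 0.3, hidden))
real_paths = np.cumsum(rng.normal(0.0, 0.1, (batch, n_steps)), axis=1)
fake_paths = np.cumsum(rng.normal(0.0, 0.2, (batch, n_steps)), axis=1)
gp = gradient_penalty(params, real_paths, fake_paths, rng)
```

The penalty would be added to the critic loss with a weighting coefficient. It only penalizes deviations from unit gradient norm at sampled points, so the Lipschitz constraint holds approximately rather than exactly.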

(d) Variational Inference/ELBOs

Latent SDEs are trained via filter/ELBO or IWAE-style bounds, with inference SDEs capturing posterior path measures (Liu et al., 2020, Wang et al., 2023).
Girsanov's theorem enables exact computation of likelihoods or KL-weights between neural SDEs with shared diffusion (Cameron et al., 2021, Liu et al., 2020).
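
For two SDEs that share the same diffusion, Girsanov's theorem gives the path-space KL divergence as $\mathrm{KL}(Q \,\|\, P) = \mathbb{E}_Q\left[\tfrac{1}{2}\int_0^T \|u_t\|^2\,dt\right]$, where $u_t$ solves $g\,u_t = f_{\text{post}} - f_{\text{prior}}$. The sketch below estimates this quantity by Monte Carlo for scalar SDEs with constant diffusion. The function name `path_kl_estimate` and the example drifts are assumptions chosen so the answer can be checked by hand.

```python
import numpy as np

def path_kl_estimate(prior_drift, post_drift, sigma, x0, T, n_steps, n_paths, rng):
    """Monte Carlo estimate of KL(Q || P) = E_Q[0.5 * int_0^T u_t^2 dt].

    u_t = (post_drift - prior_drift) / sigma, and both scalar SDEs share the
    same constant diffusion sigma. Paths are simulated under the posterior Q.
    """
    dt = T / n_steps
    x = np.full(n_paths, float(x0))
    kl = np.zeros(n_paths)
    for k in range(n_steps):
        t = k * dt
        u = (post_drift(x, t) - prior_drift(x, t)) / sigma
        kl += 0.5 * u**2 * dt                    # running integral of u^2 / 2
        dW = np.sqrt(dt) * rng.standard_normal(n_paths)
        x = x + post_drift(x, t) * dt + sigma * dW
    return kl.mean()

rng = np.random.default_rng(0)
kl = path_kl_estimate(prior_drift=lambda x, t: -x,
                      post_drift=lambda x, t: -x + 0.3,
                      sigma=0.5, x0=0.0, T=1.0,
                      n_steps=100, n_paths=256, rng=rng)
```

With a constant drift offset $c = 0.3$ and $\sigma = 0.5$ the integrand is constant, so the estimate equals $\tfrac{1}{2}(c/\sigma)^2 T = 0.18$ up to floating-point rounding. For learned drifts the integrand varies along each path and the Monte Carlo average becomes a noisy estimate of the KL term in the ELBO.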

(e) Numerical and Path-Space Quadrature

High-order Wiener-space cubature reduces Monte Carlo variance by deterministically sampling cubature paths and using ODE adjoint methods for gradient computation, achieving accelerated convergence rates (2502.12395).

3. Model Architectures and Practical Implementation

MLP-based parameterizations: Drift and diffusion are typically multilayer perceptrons, possibly incorporating time and action inputs for non-homogeneous or controlled processes (Han et al., 24 Mar 2026, Shen et al., 31 Jan 2025).
Constraint handling: Diffusion outputs are often constrained so that the induced covariance $g_\theta g_\theta^\top$ is positive definite (e.g., via softplus on diagonal entries or Cholesky parameterizations) (Dietrich et al., 2021).
Physics-informed or gray-box structures: Modular drift architectures allow embedding domain knowledge (Djeumou et al., 2023, Dietrich et al., 2021).
Latent models and emission/decoder networks: Latent ODE- or SDE-based generative processes for unobserved state modeling (Liu et al., 2020, Han et al., 24 Mar 2026).
Spline/encoder-based time embeddings: For irregular time steps, time is embedded into the input of neural networks or encoded using time-aware encoders (Rajaei et al., 29 Jul 2025, Chen et al., 20 Mar 2026).
Hypergraph-SDE systems: Higher-order connectivity (e.g., for fMRI) is modeled via SDE-reconstructed latent trajectories and SDE-driven evolution of network weights (Chen et al., 20 Mar 2026).
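
The softplus and Cholesky constraints mentioned under constraint handling can be sketched as below. The function names and the small `1e-6` floor are assumptions for this illustration; `raw` stands in for the unconstrained output of a diffusion network.

```python
import numpy as np

def softplus(z):
    return np.logaddexp(0.0, z)  # numerically stable log(1 + exp(z))

def diag_covariance(raw):
    """Diagonal case: softplus keeps each diffusion entry strictly positive."""
    g = softplus(raw) + 1e-6
    return np.diag(g**2)

def cholesky_covariance(raw, d):
    """Full case: fill a lower-triangular factor L from d*(d+1)/2 raw outputs,
    make its diagonal positive with softplus, and return L @ L.T."""
    L = np.zeros((d, d))
    L[np.tril_indices(d)] = raw
    diag = np.diag_indices(d)
    L[diag] = softplus(L[diag]) + 1e-6
    return L @ L.T

rng = np.random.default_rng(0)
d = 3
raw = rng.normal(size=d * (d + 1) // 2)   # stand-in for network outputs
Sigma_full = cholesky_covariance(raw, d)
Sigma_diag = diag_covariance(raw[:d])
```

The diagonal form needs only $d$ outputs and is cheap to invert, while the Cholesky form captures correlated noise at the cost of $d(d+1)/2$ outputs.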

4. Applications and Empirical Results

Neural SDEs have demonstrated applicability across scientific and engineering domains:

Model-based reinforcement learning (MBRL): Neural SDEs as transition models in MPC/SAC frameworks enable RL agents to handle stochasticity and partial observability, outperforming deterministic neural ODE and conventional RL techniques in sample efficiency and policy robustness (Han et al., 24 Mar 2026).
Financial modeling: Neural SDE frameworks achieve significant improvements in option pricing for both European and American derivatives by accommodating rich, nonparametric volatility structures (Fan et al., 2024). SGD and PDE-based methods allow large-scale training.
Uncertainty-aware robotics and control: Physics-constrained neural SDEs permit real-time model-based control (e.g., hexacopter) and generalize far outside the training regime, with uncertainty estimates that grow off-manifold to avoid dangerous exploitation (Djeumou et al., 2023).
Biological and neural time series: Hierarchical latent-SDE models recover low-dimensional manifold structures in high-dimensional time series and scale linearly in trajectory length (Rajaei et al., 29 Jul 2025).
Structure learning: Variational methods over neural SDEs infer causal graphs from irregularly sampled data, with provable identifiability (Wang et al., 2023).
Generative modeling: GAN and Hermite-guided adversarial training approaches learn complex SDE path distributions more efficiently and with improved sample quality over classical and CDE-based discriminators (Kidger et al., 2021, Xu et al., 23 Dec 2025).

5. Theoretical Guarantees and Numerical Considerations

Expressivity and Controllability: The function class realizable by a neural SDE is related to the optimal control cost required to steer deterministic surrogates, providing upper/lower bounds on sample complexity and functional representability (Veeravalli et al., 2022).
Identifiability: Sufficient conditions such as global Lipschitz drift and nondegenerate diagonal diffusion ensure that distinct parameterizations induce distinct observable path distributions (Wang et al., 2023).
Convergence and Robustness:
- Path-integral and cubature-based estimators achieve lower gradient variance and faster rates than standard Monte Carlo (Cameron et al., 2021, 2502.12395).
- Lyapunov-style conditions quantify stability to input perturbations, with stochastic noise often improving robustness over deterministic neural ODE baselines (Liu et al., 2019).
Numerical solvers: Euler–Maruyama is standard, but Milstein or higher-order schemes are recommended to reduce discretization bias and improve learning of diffusion terms, especially in regimes with variable time steps or strong nonlinearities (Dietrich et al., 2021).
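
The difference between the Euler–Maruyama and Milstein schemes can be illustrated on geometric Brownian motion, where the exact solution is known for a given Brownian path. This is a minimal sketch with assumed parameter values, not a benchmark from any cited work; `strong_errors_gbm` is a hypothetical helper.

```python
import numpy as np

def strong_errors_gbm(mu, sigma, x0, T, n_steps, n_paths, rng):
    """Mean absolute terminal error of Euler-Maruyama and Milstein on
    geometric Brownian motion dx = mu x dt + sigma x dW."""
    dt = T / n_steps
    dW = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
    x_em = np.full(n_paths, float(x0))
    x_mil = np.full(n_paths, float(x0))
    for k in range(n_steps):
        x_em = x_em + mu * x_em * dt + sigma * x_em * dW[:, k]
        # Milstein adds 0.5 * g * dg/dx * (dW^2 - dt); here g = sigma * x.
        x_mil = (x_mil + mu * x_mil * dt + sigma * x_mil * dW[:, k]
                 + 0.5 * sigma**2 * x_mil * (dW[:, k] ** 2 - dt))
    # Exact solution driven by the same Brownian increments.
    exact = x0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * dW.sum(axis=1))
    return np.mean(np.abs(x_em - exact)), np.mean(np.abs(x_mil - exact))

rng = np.random.default_rng(0)
err_em, err_mil = strong_errors_gbm(mu=0.1, sigma=0.5, x0=1.0, T=1.0,
                                    n_steps=50, n_paths=2000, rng=rng)
```

The Milstein correction requires the derivative of the diffusion with respect to the state, which is available analytically here. For a neural diffusion it would be obtained by automatic differentiation.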

6. Limitations, Challenges, and Future Directions

Numerical challenges: The sequential nature of SDE solvers introduces scaling and memory bottlenecks, though recent advances (parallelized importance sampling, cubature quadrature) mitigate this (2502.12395, Cameron et al., 2021).
Diffusion parameterization: Learning non-diagonal or low-rank diffusion structures remains challenging in high dimensions (Shen et al., 31 Jan 2025, Rajaei et al., 29 Jul 2025).
Partial observability and missing data: Handling partial or noisy observations often requires amortized inference networks and sophisticated variational objectives (Liu et al., 2020, Han et al., 24 Mar 2026).
Sample complexity: Expressivity grows with network capacity and time horizon, but high stochasticity or contractive drift can make learning easier or harder depending on the system’s controllability properties (Veeravalli et al., 2022).
Open directions: Key areas include efficient online/streaming updates, scalable latent variable inference, robust out-of-manifold generalization, uncertainty calibration, and domain-specific integration with physical models, reversible SDEs, and beyond-Brownian noise models.

7. Summary Table: Core Approaches and Benchmarks

Learning Method	Key Mechanism	Representative Results / Use Cases
Maximum Likelihood (Euler–Maruyama, step-wise)	Closed-form per-step Gaussian likelihood	Accurate recovery of drift/diffusion in GBM, SL, OU (Dridi et al., 2021, Shen et al., 31 Jan 2025)
GAN/Adversarial Training	Pathwise WGAN, CDE/Hermite discriminator	Sample-quality leader in synthetic/real SDEs (Kidger et al., 2021, Xu et al., 23 Dec 2025)
Variational Inference	Path-ELBO via Girsanov, ODE-RNN amortized inference	Best-in-class on irregular time series and latent structure (Wang et al., 2023, Liu et al., 2020)
Physics/gray-box SDEs	Hybrid models embed domain equations + learn residuals	Real-time and low-data model-based control (Djeumou et al., 2023, Dietrich et al., 2021)