Neural JSDEs: Data-Driven Jump Models

Updated 2 May 2026

Neural JSDEs are continuous-time models that parameterize drift, diffusion, and jump dynamics with neural networks for flexible, data-driven analysis.
They effectively capture heavy-tailed increments and non-stationary behaviors, improving accuracy in financial forecasting, option pricing, and neural population modeling.
The framework is backed by convergence guarantees and uses refined numerical discretizations and training schemes, including Gumbel–Softmax relaxations for differentiable jump sampling.

Neural Jump Stochastic Differential Equations (Neural JSDEs) constitute a class of continuous-time stochastic models in which both the drift/diffusion terms and the jump mechanisms are parameterized by neural networks. This framework generalizes classical jump-diffusion SDEs and neural SDEs by permitting flexible, data-driven modeling of abrupt changes, heavy-tailed increments, and non-stationary event-driven behavior in high-dimensional or chaotic time series. Neural JSDEs have demonstrated empirical and theoretical effectiveness in diverse domains, including financial forecasting, option pricing, neural population modeling, and high-dimensional partial integro-differential equations (PIDEs), underpinned by convergence guarantees and efficient neural parameterizations (Yang et al., 2021).

1. Mathematical Formulation of Neural JSDEs

A canonical Neural JSDE specifies the evolution of a latent process $X_t \in \mathbb{R}^d$ according to

$dX_t = f_{\theta_f}(X_{t-})\,dt + g_{\theta_g}(X_{t-})\,dW_t + h_{\theta_h}(X_{t-})\,dL_t^\alpha,$

where $f_{\theta_f}$, $g_{\theta_g}$, and $h_{\theta_h}$ are neural networks for the drift, diffusion, and jump coefficients, respectively; $X_{t-}$ denotes the left limit of $X$ at $t$ (the pre-jump state); $W_t$ is a standard Brownian motion; and $L_t^\alpha$ is a symmetric $\alpha$-stable Lévy motion with $\alpha \in (1,2)$. Pure jump models typically omit the Gaussian part ($g_{\theta_g} = 0$) and focus on non-Gaussian heavy-tailed increments (Yang et al., 2021).

More generally, for marked point processes or compound Poisson-jump models, the formulation becomes

$dX_t = f_{\theta_f}(X_{t-})\,dt + g_{\theta_g}(X_{t-})\,dW_t + h_{\theta_h}(X_{t-}, k_t)\,dN_t,$

where $N_t$ is a counting process (possibly with state-dependent intensity $\lambda_{\theta_\lambda}(X_{t-})$) and $k_t$ parameterizes event marks or jump sizes, possibly sampled from a neural distribution (Jia et al., 2019).
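The marked formulation can be simulated with a first-order time-stepping scheme in which a jump occurs during a step of length $\Delta t$ with probability approximately $\lambda_{\theta_\lambda}(X_{t-})\,\Delta t$. The Python sketch below is a minimal illustration under stated assumptions, not the method of Jia et al. (2019): the state is one-dimensional, the drift, diffusion, intensity, mark, and jump-size networks are randomly initialized (untrained) two-layer networks with tanh activations and the $1/m$ output scaling common in Barron-space analyses, and the names `TwoLayerNet` and `simulate_marked_jsde` are hypothetical.

```python
import math
import random


class TwoLayerNet:
    """Hypothetical width-m two-layer network R -> R^out_dim with 1/m output scaling.

    Computes f(x)_j = (1/m) * sum_i a_ji * tanh(w_i * x + b_i); weights are random, untrained.
    """

    def __init__(self, width, out_dim, rng):
        self.w = [rng.gauss(0.0, 1.0) for _ in range(width)]
        self.b = [rng.gauss(0.0, 1.0) for _ in range(width)]
        self.a = [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(out_dim)]

    def __call__(self, x):
        hidden = [math.tanh(w * x + b) for w, b in zip(self.w, self.b)]
        m = len(hidden)
        return [sum(a * h for a, h in zip(row, hidden)) / m for row in self.a]


def softplus(z):
    """Numerically stable log(1 + e^z); keeps intensities strictly positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def softmax(logits):
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def simulate_marked_jsde(x0, t_end, dt, seed=0, width=16, n_marks=2):
    """First-order scheme for dX = f dt + g dW + h(X, k) dN with intensity lambda(X)."""
    rng = random.Random(seed)
    drift = TwoLayerNet(width, 1, rng)
    diffusion = TwoLayerNet(width, 1, rng)
    intensity = TwoLayerNet(width, 1, rng)
    mark_logits = TwoLayerNet(width, n_marks, rng)
    jump_size = TwoLayerNet(width, n_marks, rng)  # one jump amplitude per mark

    t, x = 0.0, x0
    path, events = [(t, x)], []
    for _ in range(int(round(t_end / dt))):
        # All coefficients are evaluated at the pre-jump state X_{t-} = x.
        lam = softplus(intensity(x)[0])
        jump = 0.0
        # A jump occurs in [t, t + dt) with probability ~ lam * dt.
        if rng.random() < min(lam * dt, 1.0):
            k = rng.choices(range(n_marks), weights=softmax(mark_logits(x)))[0]
            jump = jump_size(x)[k]
            events.append((t + dt, k))
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x = x + drift(x)[0] * dt + diffusion(x)[0] * dw + jump
        t += dt
        path.append((t, x))
    return path, events
```

For example, `simulate_marked_jsde(x0=0.5, t_end=5.0, dt=0.01, seed=1)` returns 501 `(t, x)` pairs and a list of `(event_time, mark)` tuples. Softplus keeps the intensity positive and softmax turns mark logits into a distribution, mirroring the output layers described in Section 2; a trained model would replace the random weights.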

2. Neural Network Parameterization and Training Schemes

Neural JSDEs leverage modern neural architectures (e.g., multi-layer perceptrons) for each coefficient:

Drift $f_{\theta_f}$: Typically a Barron-space (two-layer) MLP, enabling $L^2$-approximation with rate $O(m^{-1/2})$ for width $m$, using ReLU or sigmoid activations (Yang et al., 2021).
Jump coefficient $h_{\theta_h}$: Often a separate MLP producing coordinatewise nonnegative outputs, sometimes implemented as a constant for fixed initial states.
Intensity $\lambda_{\theta_\lambda}$ and mark distribution $p_{\theta_p}(k \mid X_{t-})$: MLPs with softplus or sigmoid outputs for the intensity, and softmax or mixture-density outputs for the marks.
Training losses: Regression or log-likelihood objectives for the drift; binary cross-entropy (via distinguishing “clean” from “noisy” initial conditions) for the jump net; in point-process models, the joint negative log-likelihood of events and state trajectory. Optimization uses gradient-based methods (e.g., Adam), with end-to-end differentiability achieved through automatic differentiation and, for discrete jumps, Gumbel–Softmax relaxation (Zheng et al., 5 Jun 2025).
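A Gumbel–Softmax relaxation of the per-interval jump count can be sketched as follows, assuming Poisson-distributed counts with mean $\mu = \lambda\,\Delta t > 0$, truncated at a hypothetical cap $K$ (`k_max`) and renormalized. The helper names are illustrative, and plain Python is used only to show the arithmetic; in an automatic-differentiation framework the same expressions would be differentiable with respect to $\lambda$. The discarded tail mass obeys the elementary Poisson bound $P(N > K) \le \mu^{K+1}/(K+1)!$, which is not necessarily the specific bound proven by Gao et al.

```python
import math
import random


def truncated_poisson_probs(mu, k_max):
    """Renormalized P(N = k), k = 0..k_max, for N ~ Poisson(mu), plus the discarded tail mass."""
    probs = [math.exp(-mu) * mu**k / math.factorial(k) for k in range(k_max + 1)]
    kept = sum(probs)
    return [p / kept for p in probs], 1.0 - kept


def gumbel_softmax(log_probs, tau, rng):
    """Relaxed one-hot sample softmax((log pi_k + G_k) / tau) with G_k ~ Gumbel(0, 1)."""
    scores = []
    for lp in log_probs:
        u = rng.random()
        while u == 0.0:  # rng.random() lies in [0, 1); avoid log(0)
            u = rng.random()
        scores.append((lp - math.log(-math.log(u))) / tau)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relaxed_jump_count(lam, dt, k_max=3, tau=0.5, seed=0):
    """Continuous surrogate for the number of jumps in one step of length dt."""
    rng = random.Random(seed)
    probs, tail = truncated_poisson_probs(lam * dt, k_max)
    weights = gumbel_softmax([math.log(p) for p in probs], tau, rng)
    # Relaxed count: average of k under the soft one-hot weights.
    count = sum(k * w for k, w in enumerate(weights))
    return count, weights, tail
```

As `tau` decreases, the weights approach a one-hot vector and the relaxed count approaches an integer draw from the truncated distribution; larger `tau` gives smoother but more biased surrogates.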

ODE/SDE solvers integrate paths between jumps (for example, adaptive-step ODE solvers for the deterministic flow, or fixed-step Euler–Maruyama when a diffusion term is present), with special schemes for accurate treatment of cumulative jumps and error control (Gao et al., 5 Jun 2025).
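Between events, the state can be integrated with an ODE solver while the compensator $\int \lambda_{\theta_\lambda}(X_s)\,ds$ is integrated alongside it; this yields the point-process negative log-likelihood $-\sum_i \log \lambda_{\theta_\lambda}(X_{t_i-}) + \int_0^T \lambda_{\theta_\lambda}(X_s)\,ds$ for observed event times $t_i$. The sketch below assumes a deterministic flow between events (no diffusion term) and unmarked events, uses fixed-step RK4 where a production solver would typically adapt the step, and replaces the networks with hypothetical toy functions (`drift`, `intensity`, `jump`).

```python
import math


def softplus(z):
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def drift(x):
    return -x  # toy stand-in for f_theta(x): mean reversion to 0


def intensity(x):
    return softplus(x)  # toy stand-in for lambda_theta(x) > 0


def jump(x):
    return 1.0  # toy stand-in for h_theta(x): constant jump size


def rhs(x):
    # Augmented vector field: (dx/dt, dLambda/dt) = (f(x), lambda(x)).
    return drift(x), intensity(x)


def rk4_segment(x, t0, t1, n_sub=50):
    """Return (x(t1-), integral of lambda over [t0, t1]) using fixed-step RK4."""
    if t1 <= t0:
        return x, 0.0
    h = (t1 - t0) / n_sub
    comp = 0.0
    for _ in range(n_sub):
        k1x, k1c = rhs(x)
        k2x, k2c = rhs(x + 0.5 * h * k1x)
        k3x, k3c = rhs(x + 0.5 * h * k2x)
        k4x, k4c = rhs(x + h * k3x)
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        comp += h * (k1c + 2 * k2c + 2 * k3c + k4c) / 6
    return x, comp


def event_driven_nll(x0, event_times, t_end):
    """NLL of sorted event times in (0, t_end] and the final state x(t_end)."""
    x, t, nll = x0, 0.0, 0.0
    for te in event_times:
        x, comp = rk4_segment(x, t, te)
        nll += comp - math.log(intensity(x))  # x is the pre-jump state X_{te-}
        x += jump(x)  # apply the jump at the event time
        t = te
    x, comp = rk4_segment(x, t, t_end)
    return nll + comp, x
```

With no events and $x_0 = 0$, the state stays at 0 and the NLL reduces to $T \log 2$, which makes a convenient sanity check.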

3. Numerical Discretization, Error Control, and Theoretical Guarantees

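A standard approach to numerical approximation is the Euler–Maruyama scheme with step size $\Delta t$: $X_{n+1} = X_n + f_{\theta_f}(X_n)\,\Delta t + g_{\theta_g}(X_n)\,\Delta W_n + h_{\theta_h}(X_n)\,(\Delta t)^{1/\alpha}\,\xi_n$, where $\Delta W_n \sim \mathcal{N}(0, \Delta t\, I_d)$ (with $g_{\theta_g} = 0$ in the pure-jump case) and the $\xi_n$ are i.i.d. samples from the symmetric $\alpha$-stable distribution (Yang et al., 2021); the factor $(\Delta t)^{1/\alpha}$ reflects the self-similarity of $L_t^\alpha$. For compound Poisson jumps, each step simulates the number of jumps in the interval (possibly with a Gumbel–Softmax approximation for differentiability) and their magnitudes. Likelihood truncation caps the maximum number of jumps in small intervals, with provable bounds on the truncation error (Gao et al., 5 Jun 2025).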
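An Euler–Maruyama step for the canonical model can be sketched as follows: each step adds $f\,\Delta t + g\,\Delta W + h\,(\Delta t)^{1/\alpha}\,\xi$ with $\Delta W \sim \mathcal{N}(0, \Delta t)$ and $\xi$ a standard symmetric $\alpha$-stable draw. The sketch assumes a one-dimensional state and $1 < \alpha \le 2$, samples $\xi$ with the Chambers–Mallows–Stuck method (characteristic function $e^{-|u|^\alpha}$), which may differ from the sampler used by Yang et al., and replaces the networks with plain callables; the function names are hypothetical.

```python
import math
import random


def sym_stable(alpha, rng):
    """Standard symmetric alpha-stable draw (char. fn exp(-|u|^alpha)) via
    Chambers-Mallows-Stuck; assumes 1 < alpha <= 2."""
    v = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    return (math.sin(alpha * v) / math.cos(v) ** (1.0 / alpha)
            * (w / math.cos((1.0 - alpha) * v)) ** ((alpha - 1.0) / alpha))


def euler_maruyama_levy(x0, alpha, t_end, dt, drift, diffusion, jump_coef, seed=0):
    """Euler-Maruyama path of dX = f dt + g dW + h dL^alpha on [0, t_end]."""
    rng = random.Random(seed)
    x, xs = x0, [x0]
    for _ in range(int(round(t_end / dt))):
        dw = rng.gauss(0.0, math.sqrt(dt))                  # Brownian increment ~ N(0, dt)
        dl = dt ** (1.0 / alpha) * sym_stable(alpha, rng)   # L^alpha increment over dt
        x = x + drift(x) * dt + diffusion(x) * dw + jump_coef(x) * dl
        xs.append(x)
    return xs
```

For instance, `euler_maruyama_levy(1.0, 1.5, 1.0, 0.01, lambda x: -x, lambda x: 0.2, lambda x: 0.1)` returns 101 states on $[0, 1]$; passing `lambda x: 0.0` as the diffusion gives the pure-jump case.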

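Main convergence theorem: If the true solution is unique and the neural networks are Lipschitz and have sufficient width, then as the step size $\Delta t \to 0$ and the network width $m \to \infty$, the numerical solution $\hat{X}^{\Delta t, m}$ of the network-parameterized model satisfies

$\hat{X}^{\Delta t, m} \Rightarrow X$

(weak convergence) in the Skorokhod metric, with bounds independent of the dimension $d$, evincing no curse of dimensionality (Yang et al., 2021).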

Enhanced discretization (Euler–Maruyama with an analytic “restart” that resets the scheme at integer times) further limits error accumulation over long horizons, yielding a tighter error bound than the corresponding scaling for vanilla EM (Gao et al., 5 Jun 2025).

4. Empirical Performance in Time-Series, Finance, and Partial Differential Equations

Financial Time Series and Option Pricing

For chaotic financial indices (SSE Energy, SSE 50, SSE Consumer), $\alpha$-stable neural JSDEs (LDE-Net) outperformed ARIMA, LSTM, and SDE-Net (Gaussian noise), reducing one-step MSE by ≈20% and exhibiting lower multi-step error growth; the optimal $\alpha$ varies across series (Yang et al., 2021). In option pricing (e.g., S&P 500, synthetic Heston/SVCJ), neural models with Poisson-jump relaxations (Gumbel–Softmax for differentiability) achieved significantly lower in-sample and out-of-sample MAE and MSE than both classical and pure NSDE/ANN benchmarks, especially in the presence of discontinuities (Zheng et al., 5 Jun 2025).

Integro-Differential Equations and Population Dynamics

Neural JSDE methodologies extend to forward–backward SDEs with jumps (FBSDEs) and the associated partial integro-differential equations (PIDEs), via single-network approximators that handle the nonlocal jump integrals with Taylor expansions. The FBSJNN attains low relative errors even in high-dimensional settings (Ye et al., 2024).
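To illustrate Taylor-based handling of a nonlocal jump integral, the sketch compares $\mathbb{E}[u(x+Z)] - u(x)$, computed by trapezoidal quadrature, with its second-order surrogate $\tfrac12 u''(x)\,\mathbb{E}[Z^2]$. It assumes zero-mean Gaussian jump sizes $Z \sim \mathcal{N}(0,\sigma^2)$ and the test function $u = \sin$, so the compensating first-order term vanishes and the exact value $\sin(x)\,(e^{-\sigma^2/2} - 1)$ is available for checking; the expansion actually used in FBSJNN may differ, and the function names are illustrative.

```python
import math


def jump_integral_quadrature(u, x, sigma, n=4001):
    """E[u(x + Z)] - u(x) for Z ~ N(0, sigma^2), trapezoidal rule on [-8 sigma, 8 sigma]."""
    a, b = -8.0 * sigma, 8.0 * sigma
    h = (b - a) / (n - 1)
    total = 0.0
    for i in range(n):
        z = a + i * h
        dens = math.exp(-0.5 * (z / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        weight = 0.5 if i in (0, n - 1) else 1.0  # trapezoid end-point weights
        total += weight * (u(x + z) - u(x)) * dens
    return total * h


def jump_integral_taylor(u_xx, x, sigma):
    """Second-order surrogate 0.5 * u''(x) * E[Z^2]; the first-order term has zero mean."""
    return 0.5 * u_xx(x) * sigma ** 2
```

For this test function the relative error of the surrogate grows roughly like $\sigma^2/4$, which is why larger or heavy-tailed jumps (for which $\mathbb{E}[Z^2]$ may be infinite) call for refined treatments.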

For neuronal populations, jump-diffusion equations are transformed via the method of characteristics into a pure Master equation in co-moving coordinates. The approach applies universally to one-dimensional neuron models, efficiently bridging diffusion (Fokker–Planck) and jump-driven Master equations (Kamps, 2013).

5. Role and Impact of Non-Gaussian and State-Dependent Jump Mechanisms

Empirical studies and theoretical analyses underscore the significance of non-Gaussian $\alpha$-stable noise and flexible neural jump parameterization:

Heavy tails: $\alpha$-stable drivers capture rare but large fluctuations in financial series, correcting the tendency of Gaussian diffusions to underestimate tail risk.
Adaptive jump intensity: Neural jump nets ($h_{\theta_h}$, $\lambda_{\theta_\lambda}$) enable heteroskedastic, state- or time-dependent jump activity, tuning the response to local volatility and stochastic regime switches (Yang et al., 2021, Zheng et al., 5 Jun 2025).
Learning discontinuities: Neural intensity and Gumbel–Softmax layers allocate probability to discontinuous “jump” pathways, improving fit for processes (e.g., options, stock returns, neural spikes) exhibiting abrupt events (Zheng et al., 5 Jun 2025, Jia et al., 2019).
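The heavy-tail point can be quantified with the classical leading-order tail asymptotics of a standard symmetric $\alpha$-stable variable, $P(|X| > x) \sim \tfrac{2}{\pi}\Gamma(\alpha)\sin(\pi\alpha/2)\,x^{-\alpha}$ as $x \to \infty$ for $0 < \alpha < 2$, compared with the exact Gaussian tail. The sketch assumes the standard parameterization with characteristic function $e^{-|u|^\alpha}$ and compares against $\mathcal{N}(0, 2)$, the $\alpha = 2$ member of the same family; the function names are illustrative.

```python
import math


def stable_tail_asymptotic(alpha, x):
    """Leading-order P(|X| > x) for a standard symmetric alpha-stable X, 0 < alpha < 2."""
    return (2.0 / math.pi) * math.gamma(alpha) * math.sin(math.pi * alpha / 2.0) * x ** (-alpha)


def gaussian_tail(x, std=1.0):
    """Exact P(|Z| > x) for Z ~ N(0, std^2)."""
    return math.erfc(x / (std * math.sqrt(2.0)))


for level in (3.0, 5.0, 10.0):
    print(level, stable_tail_asymptotic(1.5, level), gaussian_tail(level, std=math.sqrt(2.0)))
```

At $x = 10$ the leading-order stable tail for $\alpha = 1.5$ is about $1.3 \times 10^{-2}$, while the $\mathcal{N}(0,2)$ tail $\operatorname{erfc}(5)$ is about $1.5 \times 10^{-12}$.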

6. Theoretical Interpretability, Efficiency, and Methodological Variations

Neural JSDEs support rigorous decomposition of model error into discretization, regularity, and universal-approximation components. Single-network strategies for PIDEs reduce parameter count and optimize faster without sacrificing expressivity (Ye et al., 2024). Analytical error bounds and restart-scheme samplers deliver provable improvements in weak error and variance, crucial for forecasting and uncertainty quantification (Gao et al., 5 Jun 2025).

Limitations include potential bias when jump integrals are state-dependent or exhibit infinite activity, which necessitates higher-order expansions or specialized quadrature. For heavy-tailed jumps, Taylor truncation may require refinement. Further, ODE/SDE solvers impose computational overhead relative to discrete-time models (Ye et al., 2024, Jia et al., 2019).

7. Applications, Extensions, and Outlook

Neural JSDEs have been effectively applied to:

Financial forecasting of chaotic, jump-driven series (Yang et al., 2021)
Option pricing in the presence of jump risk (Zheng et al., 5 Jun 2025)
High-dimensional PIDE and FBSDE problems (Ye et al., 2024)
Population density modeling in neuroscience (Kamps, 2013)
Non-stationary time series exhibiting abrupt stochastic changes (Gao et al., 5 Jun 2025)
Temporal point processes and event-sequence modeling (Jia et al., 2019)

Anticipated extensions include path-dependent PIDEs, multi-scale Lévy activity, and scalable variational inference for latent state estimation under partial observability. Their convergence properties, representational flexibility, and empirical performance make Neural JSDEs a versatile tool in modern stochastic modeling for both applied and theoretical research.