Neural JSDEs: Data-Driven Jump Models

Updated 2 May 2026
  • Neural JSDEs are continuous-time models that parameterize drift, diffusion, and jump dynamics with neural networks for flexible, data-driven analysis.
  • They effectively capture heavy-tailed increments and non-stationary behaviors, improving accuracy in financial forecasting, option pricing, and neural population modeling.
  • The framework comes with convergence guarantees for its numerical discretization, and training can use Gumbel–Softmax relaxation to keep discrete jump counts differentiable.

Neural Jump Stochastic Differential Equations (Neural JSDEs) constitute a class of continuous-time stochastic models in which both drift/diffusion terms and jump mechanisms are parameterized by neural networks. This framework generalizes classical jump-diffusion SDEs and neural SDEs by permitting flexible, data-driven modeling of abrupt changes, heavy-tailed increments, and non-stationary event-driven behaviors in high-dimensional or chaotic time series. Neural JSDEs have demonstrated empirical and theoretical effectiveness in diverse domains, including financial forecasting, option pricing, neural population modeling, and stochastic partial differential equations, underpinned by convergence guarantees and efficient neural parameterizations (Yang et al., 2021).

1. Mathematical Formulation of Neural JSDEs

A canonical Neural JSDE specifies the evolution of a latent process $X_t \in \mathbb{R}^d$ according to

$$dX_t = f_{\theta_f}(X_{t-})\,dt + g_{\theta_g}(X_{t-})\,dW_t + h_{\theta_h}(X_{t-})\,dL_t^\alpha,$$

where $f_{\theta_f}$, $g_{\theta_g}$, and $h_{\theta_h}$ are neural networks for the drift, diffusion, and jump coefficients respectively; $W_t$ is Brownian motion, and $L_t^\alpha$ is a symmetric $\alpha$-stable Lévy motion with $\alpha \in (1,2)$. Pure jump models typically omit the Gaussian part ($g_{\theta_g} = 0$) and focus on non-Gaussian heavy-tailed increments (Yang et al., 2021).
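One way to build intuition for this equation is to simulate it with a fixed-step Euler–Maruyama discretization. The sketch below is a minimal scalar ($d = 1$) illustration and not the implementation of the cited work: the callables `f`, `g`, and `h` stand in for the drift, diffusion, and jump networks, the function names are hypothetical, and the α-stable increments are drawn with the Chambers–Mallows–Stuck method, which is an assumption about how one might sample them.

```python
import math
import random


def sample_symmetric_stable(alpha, rng):
    """Draw one standard symmetric alpha-stable variate (Chambers-Mallows-Stuck).

    Valid for 1 < alpha <= 2; alpha = 2 gives a Gaussian with variance 2.
    """
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (w / math.cos((1.0 - alpha) * u)) ** ((alpha - 1.0) / alpha))


def euler_maruyama_stable(f, g, h, x0, T, n_steps, alpha, seed=0):
    """Simulate one scalar path of dX = f dt + g dW + h dL^alpha on [0, T]."""
    rng = random.Random(seed)
    dt = T / n_steps
    x, path = x0, [x0]
    for _ in range(n_steps):
        dw = math.sqrt(dt) * rng.gauss(0.0, 1.0)  # Brownian increment
        # Stable increments scale like dt^(1/alpha) rather than sqrt(dt).
        dl = dt ** (1.0 / alpha) * sample_symmetric_stable(alpha, rng)
        x = x + f(x) * dt + g(x) * dw + h(x) * dl
        path.append(x)
    return path


# Stand-in coefficients (assumptions for illustration, not trained networks).
path = euler_maruyama_stable(
    f=lambda x: -x, g=lambda x: 0.1, h=lambda x: 0.05,
    x0=1.0, T=1.0, n_steps=500, alpha=1.5,
)
```

Setting `g` and `h` to zero reduces the loop to the explicit Euler method for the drift ODE, which is a convenient sanity check.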

More generally, for marked point processes or compound Poisson-jump models, the formulation becomes

$$dX_t = f_{\theta_f}(X_{t-})\,dt + g_{\theta_g}(X_{t-})\,dW_t + J_{\theta_J}(X_{t-})\,dN_t,$$

where $N_t$ is a counting process (possibly with a state-dependent intensity $\lambda_{\theta_\lambda}(X_{t-})$) and $J_{\theta_J}$ parameterizes event marks or jump sizes, possibly sampled from a neural distribution (Jia et al., 2019).
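A compound Poisson-jump model of this kind can be simulated step by step: in each small interval, draw a jump count from the current intensity, then add that many jump sizes to the diffusive update. The following scalar sketch assumes a fixed step, a Knuth-style Poisson sampler, and stand-in callables (`intensity`, `jump_size`) in place of trained networks; all names are hypothetical.

```python
import math
import random


def sample_poisson(mean, rng):
    """Knuth's multiplication method; adequate for small means such as intensity * dt."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_jump_diffusion(f, g, intensity, jump_size, x0, T, n_steps, seed=0):
    """Simulate one scalar path of dX = f dt + g dW + (jump size) dN on [0, T]."""
    rng = random.Random(seed)
    dt = T / n_steps
    x, path, n_jumps = x0, [x0], 0
    for _ in range(n_steps):
        # State-dependent intensity, evaluated at the pre-jump state.
        dn = sample_poisson(intensity(x) * dt, rng)
        dw = math.sqrt(dt) * rng.gauss(0.0, 1.0)
        jump = sum(jump_size(x, rng) for _ in range(dn))
        x = x + f(x) * dt + g(x) * dw + jump
        n_jumps += dn
        path.append(x)
    return path, n_jumps


# Stand-in coefficients (assumptions for illustration only).
path, n_jumps = simulate_jump_diffusion(
    f=lambda x: -0.5 * x,
    g=lambda x: 0.2,
    intensity=lambda x: 1.0 + x * x,               # more jumps far from the origin
    jump_size=lambda x, rng: rng.gauss(0.0, 0.5),  # Gaussian marks
    x0=0.0, T=5.0, n_steps=1000,
)
```

Evaluating the intensity at the pre-jump state mirrors the left-limit $X_{t-}$ in the continuous-time formulation.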

2. Neural Network Parameterization and Training Schemes

Neural JSDEs leverage modern neural architectures (e.g., multi-layer perceptrons) for each coefficient:

  • Drift $f_{\theta_f}$: Typically a Barron-space (two-layer) MLP, enabling $L^2$-approximation with rate $O(m^{-1/2})$ for width $m$, using ReLU or sigmoid activations (Yang et al., 2021).
  • Jump coefficient $h_{\theta_h}$: Often a separate MLP producing coordinatewise nonnegative outputs, sometimes implemented as a constant for fixed initial states.
  • Intensity $\lambda_{\theta_\lambda}$ and mark distribution: MLPs with softplus/sigmoid output for the intensity, and softmax or mixture-density outputs for the marks.
  • Training Losses: Regression/log-likelihood for drift; binary cross-entropy (via distinguishing “clean/noisy” initial conditions) for jump net; in point-process models, joint negative log-likelihood of events and state trajectory. Optimization employs gradient descent (e.g., Adam), with end-to-end differentiability achieved through techniques such as automatic differentiation and, for discrete jumps, Gumbel–Softmax relaxation (Zheng et al., 5 Jun 2025).
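For the point-process part of the loss, the event term is commonly the standard temporal point-process negative log-likelihood, $\int_0^T \lambda(t)\,dt - \sum_i \log \lambda(t_i)$. The sketch below computes that quantity with trapezoidal quadrature. It is an illustration under simplifying assumptions: the intensity is passed as a plain function of time (in a Neural JSDE it would be evaluated along the latent state), marks and the state-trajectory term are omitted, and `point_process_nll` is a hypothetical name.

```python
import math


def point_process_nll(event_times, intensity, T, n_grid=1000):
    """Negative log-likelihood of events on [0, T] under intensity(t).

    NLL = integral_0^T intensity(t) dt - sum_i log intensity(t_i),
    with the integral approximated by the trapezoidal rule.
    """
    log_term = sum(math.log(intensity(t)) for t in event_times)
    dt = T / n_grid
    values = [intensity(i * dt) for i in range(n_grid + 1)]
    integral = dt * (sum(values) - 0.5 * (values[0] + values[-1]))
    return integral - log_term


# Example with a stand-in intensity that decays over time (an assumption).
nll = point_process_nll(
    event_times=[0.2, 0.5, 1.7],
    intensity=lambda t: 0.5 + 2.0 * math.exp(-t),
    T=3.0,
)
```

The integral term penalizes high intensity where no events occur, while the log term rewards high intensity at observed event times, so minimizing the sum concentrates intensity around the data.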

ODE/SDE solvers (adaptive, or fixed-step schemes such as Euler–Maruyama) integrate paths between jumps, with special schemes for accurate treatment of accumulated jumps and error control (Gao et al., 5 Jun 2025).

3. Numerical Discretization, Error Control, and Theoretical Guarantees

A standard approach for numerical approximation is the $N$-step Euler–Maruyama scheme with step size $\Delta t = T/N$:

$$X_{n+1} = X_n + f_{\theta_f}(X_n)\,\Delta t + g_{\theta_g}(X_n)\,\sqrt{\Delta t}\,\xi_n + h_{\theta_h}(X_n)\,(\Delta t)^{1/\alpha} L_n, \qquad \xi_n \sim \mathcal{N}(0, I),\quad L_n \sim S_\alpha(1,0,0),$$

where $S_\alpha(1,0,0)$ is the standard symmetric $\alpha$-stable distribution (Yang et al., 2021). For compound Poisson jumps, each discrete step simulates a jump count (possibly with a Gumbel–Softmax approximation for differentiability) and the jump magnitudes for that interval. Likelihood truncation caps the maximum number of jumps in a small interval, with provable bounds on the truncation error (Gao et al., 5 Jun 2025).
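The truncation and relaxation of jump counts can be illustrated as follows. The sketch caps the jump count at `max_jumps`, uses the Poisson log-probabilities of 0 to `max_jumps` as logits, and draws a Gumbel–Softmax sample to obtain a soft, real-valued jump count. It is a plain-Python illustration with hypothetical function names; an actual training pipeline would do this inside an automatic-differentiation framework, and the tail-mass function only shows the probability discarded by truncation, not the specific bound from the cited work.

```python
import math
import random


def truncated_poisson_logits(mean, max_jumps):
    """Unnormalized log-probabilities of 0..max_jumps jumps under Poisson(mean)."""
    return [k * math.log(mean) - mean - math.lgamma(k + 1)
            for k in range(max_jumps + 1)]


def poisson_tail_mass(mean, max_jumps):
    """Probability mass discarded by capping the count at max_jumps."""
    return 1.0 - sum(math.exp(l) for l in truncated_poisson_logits(mean, max_jumps))


def gumbel_softmax(logits, temperature, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    scores = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
        scores.append((logit - math.log(-math.log(u))) / temperature)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]          # stable softmax
    total = sum(exps)
    return [e / total for e in exps]


def relaxed_jump_count(mean, max_jumps, temperature, rng):
    """Soft jump count: expectation of k under the relaxed one-hot weights."""
    weights = gumbel_softmax(truncated_poisson_logits(mean, max_jumps), temperature, rng)
    return sum(k * w for k, w in enumerate(weights))


rng = random.Random(0)
soft_count = relaxed_jump_count(mean=0.3, max_jumps=4, temperature=0.5, rng=rng)
```

Lower temperatures push the weights toward a one-hot vector, so the soft count approaches an integer draw from the truncated distribution, at the cost of higher-variance gradients.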

Main convergence theorem: If the true solution is unique and the neural networks are Lipschitz and have sufficient width, then as the network width grows and the step size tends to zero, the discretized neural solution converges to the true solution in the Skorokhod metric, independent of the dimension $d$, so the approximation does not suffer from the curse of dimensionality (Yang et al., 2021).

Enhanced discretization (Euler–Maruyama with analytic “restart”) further bounds error accumulation by resetting at integer times, guaranteeing a tighter error bound than the scaling obtained with vanilla EM (Gao et al., 5 Jun 2025).

4. Empirical Performance in Time-Series, Finance, and Partial Differential Equations

Financial Time Series and Option Pricing

For chaotic financial indices (SSE Energy, SSE 50, SSE Consumer), $\alpha$-stable neural JSDEs (LDE-Net) outperformed ARIMA, LSTM, and SDE-Net (Gaussian noise), reducing one-step MSE by ≈20% and exhibiting lower multi-step error growth; the optimal $\alpha$ varies across series (Yang et al., 2021). In option pricing (e.g., S&P 500, synthetic Heston/SVCJ), neural models with Poisson-jump relaxations (Gumbel–Softmax for differentiability) achieved significantly lower in-sample and out-of-sample MAE and MSE than both classical and pure NSDE/ANN benchmarks, especially in the presence of discontinuities (Zheng et al., 5 Jun 2025).

Stochastic PDEs and Population Dynamics

Neural JSDE methodologies extend to integro-differential problems, such as forward-backward SDEs (FBSDEs) and partial integro-differential equations (PIDEs), via single-network approximators with Taylor-expansion-based handling of nonlocal jump integrals. The FBSJNN achieves small relative errors even in high-dimensional problems (Ye et al., 2024).

For neuronal populations, jump diffusion equations are transformed via characteristics into a pure Master equation in co-moving coordinates. The neural framework is universal for one-dimensional neural models, efficiently bridging between diffusion (Fokker–Planck) and jump-driven Master equations (Kamps, 2013).

5. Role and Impact of Non-Gaussian and State-Dependent Jump Mechanisms

Empirical studies and theoretical analyses underscore the significance of non-Gaussian $\alpha$-stable noise and flexible neural jump parameterization:

  • Heavy tails: $\alpha$-stable drivers capture rare but large fluctuations in financial series, correcting for Gaussian diffusions, which underestimate tail risks.
  • Adaptive jump intensity: Neural jump nets (the jump coefficient $h_{\theta_h}$ and the jump intensity $\lambda$) enable heteroskedastic, state- or time-dependent jump activity, tuning the response to local volatility and stochastic regime switches (Yang et al., 2021, Zheng et al., 5 Jun 2025).
  • Learning discontinuities: Neural intensity and Gumbel–Softmax layers allocate probability to discontinuous “jump” pathways, improving fit for processes (e.g., options, stock returns, neural spikes) exhibiting abrupt events (Zheng et al., 5 Jun 2025, Jia et al., 2019).
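The heavy-tail point can be checked numerically by comparing how often large values occur under a symmetric α-stable law and under its Gaussian counterpart. The sketch below is an illustration with assumed settings ($\alpha = 1.5$, threshold 5, Chambers–Mallows–Stuck sampling, hypothetical function names); the Gaussian reference has variance 2 because that is the $\alpha = 2$ member of the same unit-scale family.

```python
import math
import random


def sample_symmetric_stable(alpha, rng):
    """Draw one standard symmetric alpha-stable variate (Chambers-Mallows-Stuck).

    Valid for 1 < alpha <= 2; alpha = 2 gives a Gaussian with variance 2.
    """
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (w / math.cos((1.0 - alpha) * u)) ** ((alpha - 1.0) / alpha))


def tail_frequency(samples, threshold):
    """Fraction of samples whose magnitude exceeds the threshold."""
    return sum(1 for s in samples if abs(s) > threshold) / len(samples)


rng = random.Random(0)
n = 20000
stable_draws = [sample_symmetric_stable(1.5, rng) for _ in range(n)]
gauss_draws = [rng.gauss(0.0, math.sqrt(2.0)) for _ in range(n)]

stable_tail = tail_frequency(stable_draws, 5.0)
gauss_tail = tail_frequency(gauss_draws, 5.0)
print(f"P(|X| > 5): stable = {stable_tail:.4f}, Gaussian = {gauss_tail:.4f}")
```

Because stable tails decay polynomially rather than exponentially, the stable exceedance frequency should sit well above the Gaussian one, which is the gap a Gaussian-only diffusion cannot represent.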

6. Theoretical Interpretability, Efficiency, and Methodological Variations

Neural JSDEs support rigorous decomposition of model error into discretization, regularity, and universal-approximation components. Single-network strategies for PIDEs reduce parameter count and optimize faster without sacrificing expressivity (Ye et al., 2024). Analytical error bounds and restart-scheme samplers deliver provable improvements in weak error and variance, crucial for forecasting and uncertainty quantification (Gao et al., 5 Jun 2025).

Limitations include potential bias when jump integrals are state-dependent or exhibit infinite activity, necessitating higher-order expansions or specialized quadrature. For heavy-tailed jumps, Taylor truncation may require refinement. Further, ODE/SDE solvers impose computational overhead relative to discrete time models (Ye et al., 2024, Jia et al., 2019).

7. Applications, Extensions, and Outlook

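Neural JSDEs have been applied to financial time-series forecasting, option pricing, neuronal population modeling, and the numerical solution of FBSDEs and PIDEs with jumps.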

Anticipated extensions include path-dependent PIDEs, multi-scale Lévy activity, and scalable variational inference for latent state estimation under partial observability. The convergence properties, flexibility of representation, and empirical efficacy of Neural JSDEs establish them as foundational in the toolbox of modern stochastic modeling for both applied and theoretical research.
