Neural Jump ODEs: Continuous-Time Hybrid Models
- Neural Jump ODEs are continuous-time hybrid models that integrate neural ODE evolution with instantaneous jump updates for optimal filtering in irregular time series.
- They alternate between neural-network-defined continuous flows and memoryless event-triggered jumps to accurately capture stochastic and abrupt dynamics.
- NJODEs provide rigorous convergence guarantees and empirically match or outperform traditional filtering methods in diverse applications such as finance, medical monitoring, and climate forecasting.
Neural Jump Ordinary Differential Equations (NJODEs) are a class of continuous-time models that combine neural ODE evolution with instantaneous state updates (“jumps”) at event times, allowing data-driven prediction, filtering, and estimation in systems with both continuous dynamics and abrupt, irregular inputs or observations. NJODEs generalize neural ODEs to hybrid systems by alternating between neural-network-defined flow and jump mechanisms, providing both rigorous theoretical guarantees and competitive empirical performance, especially for irregularly sampled time series and stochastic dynamical systems.
1. Mathematical Foundations and Core Principles
Neural Jump ODEs operate on an internal state that evolves in continuous time via a neural ordinary differential equation (ODE) between events and is reset by a jump map (typically a neural network) when new observations arrive. Mathematically, the model alternates between
- Continuous evolution between observation times:
$$dH_t = f_{\theta_1}\big(H_{t-},\, X_{\tau(t)},\, \tau(t),\, t-\tau(t)\big)\,dt, \qquad t \in (t_i, t_{i+1}),$$
where $f_{\theta_1}$ is a neural network that receives the hidden state together with the last observation $X_{\tau(t)}$, its time $\tau(t)$, and the elapsed time $t-\tau(t)$ since the last event. This set of inputs is theoretically sufficient for optimal filtering in Markovian systems (Herrera et al., 2020).
- Jump update at observation times $t_i$:
$$H_{t_i} = \rho_{\theta_2}\big(X_{t_i}, t_i\big).$$
The jump is “memoryless” in that $H_{t_i}$ depends only on the new observation $X_{t_i}$ (and its time), not on the previous hidden state, reflecting the Markov property and ensuring optimality in the L² sense.
- Output mapping:
$$Y_t = g_{\theta_3}(H_t).$$
The supervised loss function enforces the model output to closely track the observed process and penalizes the jump size:
$$\Phi(\theta) = \mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n}\Big(\big|X_{t_i} - Y_{t_i}\big| + \big|Y_{t_i} - Y_{t_i-}\big|\Big)^{2}\right],$$
where $Y_{t_i-}$ denotes the output just before the jump at $t_i$. The minimizer of this loss is the conditional expectation $\mathbb{E}[X_t \mid \mathcal{A}_t]$, i.e., the L²-optimal filter for the underlying process (Herrera et al., 2020, Heiss et al., 4 Dec 2024).
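The following minimal sketch illustrates these three components and the per-path loss in PyTorch. The names `NJODECell` and `njode_loss`, the explicit-Euler integrator, the latent dimension, and the zero initial state are illustrative assumptions, not the reference implementation of the cited papers.

```python
import torch
import torch.nn as nn

class NJODECell(nn.Module):
    """Minimal NJODE sketch: neural ODE flow between observations,
    memoryless jump at observations, readout to the prediction space."""

    def __init__(self, obs_dim: int, hidden_dim: int = 32):
        super().__init__()
        # f_theta1: drives dH_t between observations; sees H, last observation, its time, elapsed time
        self.ode_net = nn.Sequential(
            nn.Linear(hidden_dim + obs_dim + 2, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        # rho_theta2: memoryless jump encoder, depends only on the new observation and its time
        self.jump_net = nn.Sequential(
            nn.Linear(obs_dim + 1, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        # g_theta3: readout from the latent state to a prediction of X_t
        self.readout = nn.Linear(hidden_dim, obs_dim)

    def flow(self, h, last_obs, last_t, t0, t1, n_steps: int = 10):
        """Explicit-Euler integration of the neural ODE from t0 to t1."""
        dt = (t1 - t0) / n_steps
        t = t0
        for _ in range(n_steps):
            inp = torch.cat([h, last_obs, last_t, t - last_t], dim=-1)
            h = h + dt * self.ode_net(inp)
            t = t + dt
        return h


def njode_loss(model: NJODECell, obs_times: torch.Tensor, obs_values: torch.Tensor):
    """Loss for one irregularly observed path:
    mean over i of (|X_{t_i} - Y_{t_i}| + |Y_{t_i} - Y_{t_i-}|)^2."""
    obs_dim = obs_values.shape[-1]
    h = torch.zeros(1, model.readout.in_features)
    last_obs, last_t, t = torch.zeros(1, obs_dim), torch.zeros(1, 1), torch.zeros(1, 1)
    loss = torch.zeros(())
    for i in range(len(obs_times)):
        t_i = obs_times[i].view(1, 1)
        x_i = obs_values[i].view(1, -1)
        h = model.flow(h, last_obs, last_t, t, t_i)        # evolve the latent state up to t_i-
        y_before = model.readout(h)                        # Y_{t_i-}
        h = model.jump_net(torch.cat([x_i, t_i], dim=-1))  # memoryless jump H_{t_i}
        y_after = model.readout(h)                         # Y_{t_i}
        loss = loss + (torch.norm(x_i - y_after) + torch.norm(y_after - y_before)) ** 2
        last_obs, last_t, t = x_i, t_i, t_i
    return loss / len(obs_times)


# Toy usage on a single path with three irregular observation times.
model = NJODECell(obs_dim=1)
times = torch.tensor([0.2, 0.5, 1.1])
values = torch.tensor([[0.3], [0.1], [0.4]])
print(njode_loss(model, times, values))
```

A training loop would minimize this per-path loss averaged over many sampled trajectories, which is the Monte Carlo approximation discussed in Section 2.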
Extensions, such as input-output NJODEs, allow direct modeling of general filtered targets $Y$ given an irregularly observed input process $X$:
$$\hat{Y}_t = \mathbb{E}\big[Y_t \mid \mathcal{A}_t\big],$$
where $\mathcal{A}_t$ encodes the available history of $X$ up to time $t$ and the conditional expectation is taken over the output space (Heiss et al., 4 Dec 2024).
2. Theoretical Guarantees and Convergence
NJODEs feature rigorous convergence proofs absent from prior neural ODE-RNN hybrids. For sufficiently expressive networks and sufficiently many training samples, the output of the NJODE architecture converges in L² to the true conditional expectation of the process (or of the filtered output):
- Asymptotic convergence:
As the network size $N \to \infty$, the model output satisfies $Y^{\theta_N}_{t_i} \to \mathbb{E}\big[X_{t_i} \mid \mathcal{A}_{t_i}\big]$ in L², for each observation index $i$.
- Universal Approximation:
For input-output NJODEs, using truncated signature transforms of input paths yields uniform approximation of any continuous functional, ensuring theoretical universality even under weak regularity assumptions (Heiss et al., 4 Dec 2024).
- Monte Carlo approximation:
The empirical loss computed from sampled data converges uniformly to the true expected loss, justifying practical training schemes (an illustrative sketch appears at the end of this section).
These properties ensure that NJODEs deliver the L²-optimal estimator for online prediction and filtering in time series data with arbitrary irregularity and missingness.
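To make the Monte Carlo approximation concrete, the sketch below evaluates the empirical NJODE loss at the known L²-optimal predictor of a fully observed Ornstein–Uhlenbeck process. The parameter values, the `empirical_loss` helper, and the uniformly drawn observation grid are toy assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, mu, sigma = 2.0, 0.5, 0.3  # toy Ornstein-Uhlenbeck parameters

def empirical_loss(n_paths: int, n_obs: int = 10, horizon: float = 1.0) -> float:
    """Monte Carlo estimate of the NJODE loss, evaluated at the L2-optimal
    predictor E[X_t | last observation] of a fully observed OU process."""
    total = 0.0
    for _ in range(n_paths):
        t = np.sort(rng.uniform(0.0, horizon, size=n_obs))   # irregular observation times
        x_prev = mu
        per_path = 0.0
        for i in range(1, n_obs):
            dt = t[i] - t[i - 1]
            # exact OU transition: X_{t_i} given X_{t_{i-1}} is Gaussian
            mean = x_prev * np.exp(-theta * dt) + mu * (1.0 - np.exp(-theta * dt))
            var = sigma**2 / (2.0 * theta) * (1.0 - np.exp(-2.0 * theta * dt))
            x_new = mean + np.sqrt(var) * rng.standard_normal()
            y_before = mean     # optimal prediction just before the new observation arrives
            y_after = x_new     # after the jump, the optimal filter equals the observation
            per_path += (abs(x_new - y_after) + abs(y_after - y_before)) ** 2
            x_prev = x_new
        total += per_path / (n_obs - 1)
    return total / n_paths

for n in (10, 100, 1000):
    print(n, empirical_loss(n))   # estimates stabilize as the number of paths grows
```

As the number of sampled paths grows, the printed estimates stabilize around the expected loss, mirroring the uniform convergence argument that justifies training on finitely many paths.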
3. Model Architecture and Design Patterns
The canonical NJODE model consists of three principal neural components:
| Component | Role | Mathematical Formulation |
|---|---|---|
| ODE network $f_{\theta_1}$ | Continuous evolution between jumps | $dH_t = f_{\theta_1}(H_{t-}, X_{\tau(t)}, \tau(t), t-\tau(t))\,dt$ |
| Jump encoder $\rho_{\theta_2}$ | Memoryless update at event/observation times | $H_{t_i} = \rho_{\theta_2}(X_{t_i}, t_i)$ |
| Readout network $g_{\theta_3}$ | Mapping from latent state to output | $Y_t = g_{\theta_3}(H_t)$ |
Inputs to $f_{\theta_1}$ include not only the hidden state but also the last observation value, its observation time, and the elapsed time since that observation, enforcing Markovian information flow. In input-output NJODEs for general filtering/classification, additional features such as truncated signature transforms of input trajectories are used (Heiss et al., 4 Dec 2024).
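For intuition, the sketch below computes a depth-2 truncated signature of a time-augmented, piecewise-linear input trajectory. The function name, the depth, and the time-augmentation choice are illustrative assumptions rather than the exact feature construction of the cited work.

```python
import numpy as np

def truncated_signature_depth2(points: np.ndarray):
    """Depth-2 signature of the piecewise-linear path through `points`
    (shape (n_points, d)): level-1 terms in R^d, level-2 terms in R^{d x d}."""
    d = points.shape[1]
    s1 = np.zeros(d)        # level 1: total increment of the path
    s2 = np.zeros((d, d))   # level 2: iterated integrals over ordered pairs of increments
    for k in range(1, len(points)):
        delta = points[k] - points[k - 1]
        # Chen's identity for concatenating the path so far with one linear segment
        s2 += np.outer(s1, delta) + 0.5 * np.outer(delta, delta)
        s1 += delta
    return s1, s2

# Toy usage: signature features of an irregularly observed 2-d trajectory,
# augmented with time as an extra channel (a common choice for irregular sampling).
t = np.array([0.0, 0.3, 0.45, 1.0])
x = np.array([[0.0, 1.0], [0.2, 0.9], [0.5, 1.4], [0.4, 1.1]])
path = np.column_stack([t, x])                         # time-augmented path in R^3
level1, level2 = truncated_signature_depth2(path)
features = np.concatenate([level1, level2.ravel()])    # flat feature vector for the jump network
```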
Variants and related models integrate jump mechanisms at the latent state (as in neural jump SDEs (Jia et al., 2019)), apply attention over the memory of past events, or learn the jump and continuous maps jointly for nonparametric inference (Bouchereau et al., 2023).
4. Empirical Performance and Applications
NJODEs have demonstrated effectiveness across synthetic and real-world tasks:
- Synthetic stochastic models: Accurate online forecasting for Black–Scholes, Ornstein–Uhlenbeck, Heston, and Cox–Ingersoll–Ross models with irregular and partial observations (Herrera et al., 2020, Heiss et al., 4 Dec 2024).
- Parameter filtering: Nonparametric filtering of drift or volatility parameters, outperforming classical procedures (e.g., Kalman or particle filters), especially for non-Gaussian or time-dependent distributions (Heiss et al., 4 Dec 2024).
- Online classification: NJODEs learn conditional class probabilities via indicator outputs, achieving robust real-time decision-making even with missing data (see the sketch at the end of this section).
- Medical monitoring: NJODEs have improved continuous-time vital sign forecasting using patient data with variable and missing measurements.
- Finance and algorithmic trading: Filtering in Black–Scholes models with uncertain coefficients; NJODEs adapt to regime changes and abrupt transitions more robustly than ODE-RNN baselines.
- Climate forecasting: Efficiently leveraging incomplete weather station data; handling complex regimes with fewer parameters than ODE-RNN or GRU-ODE-Bayes architectures (Herrera et al., 2020).
NJODEs consistently match or outperform parametric and classical filtering methodologies, particularly when data are irregular, partially observed, or exhibit complex temporal dependencies.
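A brief sketch of the indicator-output idea follows; the helper name and the one-hot encoding are illustrative assumptions. Because the loss minimizer is a conditional expectation, training the output against indicator targets makes it approximate the conditional class probabilities.

```python
import numpy as np

def one_hot_targets(labels: np.ndarray, n_classes: int) -> np.ndarray:
    """Encode class labels as indicator (one-hot) target vectors; with the
    L2 loss, the optimal NJODE output at time t is then approximately
    (P(class = k | observations up to t))_k."""
    targets = np.zeros((len(labels), n_classes))
    targets[np.arange(len(labels)), labels] = 1.0
    return targets

labels = np.array([2, 0, 1, 2])                   # toy class labels, one per sample path
targets = one_hot_targets(labels, n_classes=3)    # used as the (constant-in-time) target process
# An output such as [0.1, 0.2, 0.7] is then read directly as class probabilities.
```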
5. Extensions, Related Frameworks, and Generalizations
NJODEs generalize and complement several frameworks:
- Neural ODEs: NJODEs build upon the continuous-time representation of neural ODEs by embedding jump events, an essential extension for hybrid systems (Ruthotto, 8 Jan 2024).
- Neural Jump SDEs: These models combine ODE-driven latent flows and stochastic event-triggered jumps, learning both conditional intensity and jump embeddings for marked point process data. The adjoint method for backpropagation in the presence of discontinuities is central for efficient parameter estimation (Jia et al., 2019).
- Markov Jump Processes: Variational inference with neural ODEs for master equations, as in NeuralMJP, leverages learned, time-dependent transition rates and neural network encoders to approximate posterior jump process distributions (Seifner et al., 2023). Such approaches are applicable for nonparametric inference and rare event sensitivity.
- Modified Equation Techniques: High-order neural approximations of modified vector fields can be extended to NJODEs, allowing for low-error numerical integration between jumps and learned corrections at event times (Bouchereau et al., 2023).
- Event Transition Tensors: High-order Taylor expansions on event manifolds offer a framework for uncertainty quantification and interpretability in neural differential systems involving events, complementing NJODE modeling for explainability (Izzo et al., 2 Apr 2025).
6. Advantages, Limitations, and Open Challenges
Advantages
- Universality: With an appropriate architecture and the theoretically motivated loss, NJODEs provide universal, nonparametric approximators of the L²-optimal filter.
- Irregular and missing data: Intrinsic compatibility with asynchronous, incomplete observation sets; no need for time regularization or masking.
- Low parameter count: Empirical results on real-world data show competitive or superior accuracy with fewer trainable parameters than encoder–ODE–decoder or ODE-RNN hybrids.
- Flexible output: The framework supports real-time filtering, regression, and classification, with provable optimality under mild assumptions.
Limitations
- Finite sample regimes: The theoretical guarantees hold in the asymptotic limit; finite-sample performance depends on training data, inductive bias, and model complexity.
- Hyperparameter sensitivity: Selection of latent dimension, jump map architecture, and signature truncation level significantly affects practical accuracy.
- Computational cost: Signature computation (for input-output models) and large neural network training can be computationally demanding.
- Event discontinuities: While jump handling is explicit in NJODE design, convergence and stability proofs for high-order or non-smooth dynamics involving jumps remain an active area.
Open Questions
- Structured uncertainty quantification: Recent advances in high-order expansions at event times (see Event Transition Tensors (Izzo et al., 2 Apr 2025)) suggest complementary methods for uncertainty propagation and rigorous certification.
- Loss function design: Proper construction of the loss for regression, classification, and filtering, especially in input-output settings where the output process is not part of the input, is critical to retain theoretical guarantees (see the IO loss versus the “old” loss formulations in Heiss et al., 4 Dec 2024).
- Nonparametric jump map learning: Extending high-order neural modifications to jump-induced discontinuities warrants further investigation for convergence and efficiency.
7. Historical Context and Significance
Neural Jump ODEs were developed to address the limitations of pure neural ODEs and ODE-RNN hybrids in modeling hybrid systems that require rapid adaptation to new information—such as medical monitoring, financial forecasting, and biological dynamics. The universality of their theoretical construction, coupled with empirical advances in irregularly sampled, partial observation environments, positions NJODEs as a foundational model class for contemporary continuous-time deep learning and data-driven dynamical system identification.
Recent work demonstrates robust convergence properties and optimality guarantees for both canonical prediction (same input-output process) and general input-output filtering/classification settings, summarized by (Herrera et al., 2020) and (Heiss et al., 4 Dec 2024). Connections to variational inference for Markov jump processes (Seifner et al., 2023), neural SDEs (Jia et al., 2019), and high-order expansion methods (Izzo et al., 2 Apr 2025) highlight their versatility and ongoing relevance in both theory and industry applications.
A plausible implication is that future advances will further integrate rigorous uncertainty quantification, improved discretization schemes for jump corrections, and data-efficient training objectives optimized for real-world noisy, sparse, or multimodal time series.