
Path-Dependent Neural Jump ODEs

Updated 29 October 2025
  • The paper introduces a model that unifies continuous latent ODE flows with discrete jump resets to achieve L2-optimal online prediction under irregular observations.
  • It employs signature transforms for history encoding, enabling robust non-Markovian modeling of long-memory time series and chaotic dynamics.
  • Empirical results demonstrate superior forecasting, filtering, and generative performance over traditional ODE-based approaches in diverse application domains.

Path-Dependent Neural Jump Ordinary Differential Equations (PD-NJ-ODEs) are a class of neural sequence models tailored for continuous-time prediction, filtering, and generative modeling of dynamical systems exhibiting both continuous evolution and discrete event-driven discontinuities, typically under irregular and incomplete observational regimes. These models generalize classical Neural ODEs and Neural Jump ODEs to admit arbitrary path-dependent dynamics, accommodating non-Markovianity and jumps while retaining provably optimal estimation properties backed by strong theoretical guarantees.

1. Mathematical Foundations and Model Architecture

PD-NJ-ODEs combine piecewise-continuous latent ODE flows with discrete jumps triggered by stochastic or observed events, where the full past trajectory (rather than only the latest state) determines both the evolution and event intensities. Let $(X_t)_{t \in [0,T]}$ denote a stochastic process, observed at random times $t_i$ with masks $M_{t_i}$ indicating observed coordinates.

Define the filtration $\mathcal{A}_t = \sigma\left( X_{t_i, j},\, t_i,\, M_{t_i} \mid t_i \leq t,\ (M_{t_i})_j = 1 \right)$ encoding all information available up to $t$. The $L^2$-optimal online predictor is the conditional expectation
$$\hat{X}_t = \mathbb{E}\left[ X_t \mid \mathcal{A}_t \right].$$
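To make the observation regime concrete, here is a small, purely illustrative NumPy encoding of one irregularly and incompletely observed trajectory (variable names are hypothetical, not from the cited works):

```python
import numpy as np

# One trajectory in d = 2 coordinates, observed at irregular times t_i; NaNs
# mark coordinates that were not measured, and the mask M_{t_i} records which
# coordinates were actually seen at each observation time.
obs_times = np.array([0.13, 0.58, 0.71])           # random t_i in [0, T]
obs_vals = np.array([[ 1.2, np.nan],
                     [ 0.9, -0.4 ],
                     [np.nan, -0.1]])
masks = (~np.isnan(obs_vals)).astype(int)          # (M_{t_i})_j = 1 iff coord j observed
# The information set A_t at, say, t = 0.6 contains the masked coordinates of
# the first two observations together with their observation times.
```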

PD-NJ-ODE models the hidden state with a latent dynamics equation
$$\frac{d}{dt} H_t = f_{\theta_1}\left( H_{t-},\, t,\, \tau(t),\, \text{history summary},\, X_0,\, \dots \right)$$
with jump resets
$$H_{t_i} = \rho_{\theta_2}\left( H_{t_i-},\, t_i,\, \text{history summary},\, X_0,\, \dots \right),$$
where “history summary” is often implemented by the truncated path signature $\pi_m(\tilde{X}^{\leq \tau(t)} - X_0)$ or other universal path encodings (Krach et al., 2022). The output is $Y_t = g_{\theta_3}(H_t)$.

Jumps are triggered at observation times or stochastic event times, and the latent state is adjusted via a neural map.
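To fix ideas, the following is a minimal PyTorch sketch of this flow/jump/readout structure. The fixed-step Euler integration, layer sizes, and input layout are illustrative assumptions rather than the reference implementation, and the signature/history summary is passed in as a precomputed vector.

```python
import torch
import torch.nn as nn

class PDNJODE(nn.Module):
    """Minimal sketch of the PD-NJ-ODE components: latent flow f_{theta_1},
    jump/reset map rho_{theta_2}, and readout g_{theta_3}."""

    def __init__(self, obs_dim, sig_dim, hidden_dim):
        super().__init__()
        # f_{theta_1}: drives dH/dt between observations; sees the current
        # state, time t, last observation time tau(t), and a history summary.
        self.f = nn.Sequential(
            nn.Linear(hidden_dim + 2 + sig_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        # rho_{theta_2}: resets H at an observation time (the jump).
        self.rho = nn.Sequential(
            nn.Linear(hidden_dim + 1 + sig_dim + obs_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        # g_{theta_3}: readout Y_t = g(H_t).
        self.g = nn.Linear(hidden_dim, obs_dim)

    def flow_step(self, h, t, tau, sig, dt):
        # One explicit Euler step of dH/dt = f(H_{t-}, t, tau(t), summary).
        return h + dt * self.f(torch.cat([h, t, tau, sig], dim=-1))

    def jump(self, h, t, sig, x_obs):
        # Discrete reset of the latent state when X is observed.
        return self.rho(torch.cat([h, t, sig, x_obs], dim=-1))

# Toy rollout: integrate the flow on a fine grid, jump at one observation.
model = PDNJODE(obs_dim=2, sig_dim=6, hidden_dim=32)
h, sig = torch.zeros(1, 32), torch.zeros(1, 6)   # latent state, history summary
t_obs, x_obs, dt = 0.5, torch.randn(1, 2), 0.01
for step in range(100):
    t_val = step * dt
    t = torch.full((1, 1), t_val)
    tau = torch.full((1, 1), t_obs if t_val >= t_obs else 0.0)
    h = model.flow_step(h, t, tau, sig, dt)
    if abs(t_val - t_obs) < dt / 2:              # observation arrives
        h = model.jump(h, t, sig, x_obs)         # (history summary would be updated here)
    y_hat = model.g(h)                           # online prediction of E[X_t | A_t]
```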

2. Path-Dependence and Conditional Expectation

Unlike Markovian models, PD-NJ-ODEs allow the evolution to depend on the entire available path. Universal approximation is achieved by encoding the observed trajectory (possibly incomplete and irregular) with the signature transform
$$\pi_m\left( \tilde{X}^{\leq \tau(t)} - X_0 \right),$$
which captures the path's algebraic (iterated-integral) information up to truncation degree $m$, ensuring that the model can represent any functional of the observed history (Krach et al., 2022).

Consequently, the model admits non-Markovian behaviors such as long-memory, self-excitation, delayed inhibition, and more general path-functional dependencies.
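As a concrete illustration of the history encoding, here is a self-contained NumPy sketch computing the signature truncated at degree $m = 2$ for a piecewise-linear path. The degree-2 cutoff and the function name are illustrative choices; practical implementations typically use dedicated signature libraries and higher truncation levels.

```python
import numpy as np

def signature_level2(path):
    """Signature of a piecewise-linear path, truncated at degree m = 2.

    path: (N, d) array of sample points; the signature is taken of the
    increments (i.e., of path - path[0], matching pi_m(X - X_0)).
    Returns the level-1 terms (total increment, shape (d,)) and the
    level-2 iterated integrals (shape (d, d)); both are exact for
    piecewise-linear interpolation.
    """
    increments = np.diff(path, axis=0)            # (N-1, d) segment increments
    d = path.shape[1]
    s1 = increments.sum(axis=0)                   # level 1: X_T - X_0
    s2 = np.zeros((d, d))
    running = np.zeros(d)                         # increment accumulated so far
    for dx in increments:
        # Chen's identity per linear segment: cross term + half square term.
        s2 += np.outer(running, dx) + 0.5 * np.outer(dx, dx)
        running += dx
    return s1, s2

# Example: encode an irregularly sampled, time-augmented 2-d toy path.
t = np.sort(np.random.rand(20))
path = np.column_stack([t, np.cumsum(np.random.randn(20))])
s1, s2 = signature_level2(path)   # candidate "history summary" input
```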

3. Training Objective and Theoretical Guarantees

The objective is to minimize an empirical risk aggregating squared errors at observation times, focusing on pre-jump predictions for noisy observations:
$$\Psi_{\text{noisy}}(Y) = \mathbb{E}\left[ \frac{1}{n} \sum_{i=1}^n \left\| M_i \odot \left( O_{t_i} - Y_{t_i-} \right) \right\|_2^2 \right],$$
where $O_{t_i}$ denotes the noisy observation at $t_i$.

For noiseless or complete observations, the classical NJ-ODE loss is used:
$$\Psi(Y) = \mathbb{E}\left[ \frac{1}{n} \sum_{i=1}^n \left( \left\| X_{t_i} - Y_{t_i} \right\|_2 + \left\| Y_{t_i} - Y_{t_i-} \right\|_2 \right)^2 \right].$$
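Both objectives translate directly into code. The following PyTorch sketch evaluates them on a single batch of observations, with the pre-jump predictions $Y_{t_i-}$ and post-jump predictions $Y_{t_i}$ assumed to be supplied as separate tensors (an illustrative interface, not the reference one):

```python
import torch

def njode_loss(x_obs, y_post, y_pre):
    """Classical NJ-ODE loss Psi: mean over observations of
    (||X_{t_i} - Y_{t_i}||_2 + ||Y_{t_i} - Y_{t_i-}||_2)^2.

    x_obs, y_post, y_pre: (n, d) observations, post-jump predictions Y_{t_i},
    and pre-jump (left-limit) predictions Y_{t_i-}.
    """
    post_err = torch.linalg.vector_norm(x_obs - y_post, dim=-1)
    jump_size = torch.linalg.vector_norm(y_post - y_pre, dim=-1)
    return ((post_err + jump_size) ** 2).mean()

def njode_loss_noisy(o_obs, y_pre, mask):
    """Noise-adapted loss Psi_noisy: masked squared error of the pre-jump
    prediction against the noisy observation O_{t_i}; comparing against
    Y_{t_i-} keeps the estimator from fitting the observation noise."""
    err = mask * (o_obs - y_pre)                  # M_i ⊙ (O_{t_i} - Y_{t_i-})
    return err.pow(2).sum(dim=-1).mean()
```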

Under mild regularity and boundedness assumptions, minimization yields convergence in $L^2$ to the optimal predictor $\hat{X}_t = \mathbb{E}[X_t \mid \mathcal{A}_t]$ (Krach et al., 2022, Andersson et al., 2023). Conditional independence of observations replaces previously restrictive independence assumptions, aligning the theory with realistic data acquisition mechanisms.

4. Extensions: Noisy and Dependent Observations

PD-NJ-ODEs are extended to noisy settings by using a noise-adapted loss and pre-jump predictions, ensuring the estimator targets the conditional expectation given noisy data and not the raw noisy observations (Andersson et al., 2023).

For dependent observation times (e.g., clinical triggers based on patient state), the model and convergence proof are generalized to conditional independence: the observation mechanism may depend on the past, but not on the unobserved present. The same universal estimation guarantee holds without algorithmic changes (Andersson et al., 2023).

5. Generative Modeling and Online Filtering

PD-NJ-ODEs and their generative variants (NJ-ODEs as generative models; Crowell et al., 3 Oct 2025) approximate both drift and diffusion coefficients of path-dependent Itô processes from data alone. Training on conditional prediction tasks (e.g., next state, increments, quadratic increments) allows the model to recover instantaneous coefficients such as
$$\hat{\mu}_t^\Delta = \frac{1}{\Delta}\, \mathbb{E}\left[ X_{t+\Delta} - X_t \mid \mathcal{A}_t \right],$$
which are then used for Euler–Maruyama simulation to generate new sample paths under the learned law.
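A schematic of the resulting generative step: given learned coefficient estimators (the callables mu_hat and sigma_hat below are hypothetical placeholders for networks trained on the conditional-moment targets above), new paths are sampled via Euler–Maruyama.

```python
import torch

def euler_maruyama(mu_hat, sigma_hat, x0, history, T=1.0, n_steps=100):
    """Sample one path under the learned law via Euler-Maruyama.

    mu_hat(t, x, history) -> (d,) drift estimate, e.g. learned from
        (1/Delta) * E[X_{t+Delta} - X_t | A_t] targets;
    sigma_hat(t, x, history) -> (d, d) diffusion estimate, e.g. learned
        from quadratic-increment targets.
    """
    dt = T / n_steps
    x = x0.clone()
    path = [x.clone()]
    for k in range(n_steps):
        t = k * dt
        dw = torch.randn_like(x) * dt ** 0.5      # Brownian increment ~ N(0, dt)
        x = x + mu_hat(t, x, history) * dt + sigma_hat(t, x, history) @ dw
        path.append(x.clone())
        # For a path-dependent law, `history` would be updated with (t, x) here.
    return torch.stack(path)

# Toy usage with constant stand-in coefficients (illustrative only).
d = 2
path = euler_maruyama(
    mu_hat=lambda t, x, h: torch.zeros(d),
    sigma_hat=lambda t, x, h: 0.2 * torch.eye(d),
    x0=torch.zeros(d), history=None,
)
```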

The framework supports conditional path generation based on any discrete, irregular observational history, robustly handling missing data and incomplete sampling without imputation.

6. Practical Applications and Empirical Results

PD-NJ-ODEs have been empirically validated across domains:

  • Marked and classical point processes: Event intensity estimation, path-dependent Hawkes/self-correcting/seismic processes (Jia et al., 2019).
  • Chaotic dynamical systems: Learning of double pendulum dynamics from samples, improving long-term prediction via input-skipping and output-feedback mechanisms (Krach et al., 26 Jul 2024).
  • Non-Markovian, long-memory time series: Fractional Brownian motion, limit order book, and medical time-series forecasting (Krach et al., 2022).
  • Irregular and incomplete data: Robustness to missingness, noisy measurements, and dependent sampling schedules (Andersson et al., 2023, Crowell et al., 3 Oct 2025).

In all cases, PD-NJ-ODEs demonstrated theoretical or strong empirical advantages over standard NJ-ODEs, ODE-RNN, GRU-ODE-Bayes, and other neural sequence models, particularly when path dependence or event-driven jumps are statistically relevant.

7. Model Components and Summary Table

| Component | Function | Neural Parameterization |
|---|---|---|
| Latent ODE flow | Continuous-time evolution (path-dependent) | $f_{\theta_1}$ (MLP; signature/summary input) |
| Jump/reset module | Discrete update at event/observation times | $\rho_{\theta_2}$ (MLP/encoder; path/signature input) |
| Output map | Prediction from latent state | $g_{\theta_3}$ (MLP) |
| History encoding | Path summary (signature, mask, statistics) | $\pi_m(\cdot)$ or equivalent |

PD-NJ-ODEs enable principled, universal, and optimal online forecasting, filtering, and generative modeling for hybrid systems exhibiting both continuous flows and discrete, path-dependent jumps—operating effectively even in imperfect, non-Markovian, and irregular observational scenarios, with robust theoretical foundations and empirical efficacy (Krach et al., 2022, Andersson et al., 2023, Krach et al., 26 Jul 2024, Crowell et al., 3 Oct 2025).
