
Neural ODEs: Continuous-Time Deep Learning

Updated 23 November 2025
  • NeuralODEs are continuous-time neural networks that parameterize ODE systems, enabling flexible and scalable modeling of dynamical processes.
  • They leverage continuous adjoint sensitivity for efficient backpropagation and use adaptive numerical solvers to balance computational cost and accuracy.
  • Extensions incorporating physics-based corrections, Hamiltonian structures, and intervention models broaden NeuralODE applications in scientific machine learning and forecasting.

Neural Ordinary Differential Equations (NeuralODEs) are a class of neural network models that parameterize the right-hand side of a continuous-time ordinary differential equation (ODE) using a neural network, allowing end-to-end learning of dynamical systems from data. By interpreting network depth or sequential updates as a discretization of an underlying ODE, they unify perspectives from deep learning and the theory of differential equations. NeuralODEs generalize residual networks to the continuum limit and enable flexible modeling, principled treatment of irregular data, and incorporation of domain-specific constraints, positioning them as foundational models in continuous-time deep learning and scientific machine learning.

1. Mathematical Formulation and Core Principle

The canonical NeuralODE models the evolution of a state vector $x(t) \in \mathbb{R}^d$ as a continuous-time ODE:

$\frac{\mathrm{d}x(t)}{\mathrm{d}t} = f_\theta(x(t), t), \qquad x(t_0) = x_0$

where $f_\theta$ is typically a neural network parameterized by $\theta$ and may depend explicitly on time (non-autonomous form) or only on the state (autonomous form) (Ruthotto, 8 Jan 2024, Ott et al., 2023, Djeumou et al., 2022).

The forward solution at time $t_1$ is given by integrating:

$x(t_1) = x_0 + \int_{t_0}^{t_1} f_\theta(x(t), t)\, \mathrm{d}t$

This formalism subsumes discrete-layer models: a residual network layer $x_{l+1} = x_l + f_{\theta_l}(x_l)$ is the forward Euler discretization of the above ODE, and the continuum limit (step size $\Delta t \to 0$ as the number of layers grows) yields the NeuralODE (Ruthotto, 8 Jan 2024, Zhu et al., 2022).

The training objective for supervised learning is generally to minimize a loss

$\mathcal{L}(\theta) = \ell(x(t_1), y)$

where $\ell$ is an application-specific error metric (e.g., squared error or negative log-likelihood).
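To make the formulation concrete, the following is a minimal sketch in PyTorch (not taken from the cited papers): the network architecture, the fixed step count, and the forward Euler loop are illustrative assumptions. It shows how a NeuralODE forward pass reduces to stacked residual-style updates and how the supervised loss above is attached.

```python
import torch
import torch.nn as nn

class ODEFunc(nn.Module):
    """Neural network parameterizing the right-hand side f_theta(x, t)."""
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.Tanh(), nn.Linear(hidden, dim)
        )

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Append time as an extra input, giving a non-autonomous vector field.
        t_col = t.expand(x.shape[0], 1)
        return self.net(torch.cat([x, t_col], dim=-1))

def odeint_euler(f: nn.Module, x0: torch.Tensor, t0: float, t1: float, n_steps: int = 50):
    """Fixed-step forward Euler: each step is exactly a residual-network update."""
    x, dt = x0, (t1 - t0) / n_steps
    for k in range(n_steps):
        t = torch.tensor(t0 + k * dt)
        x = x + dt * f(x, t)          # x_{l+1} = x_l + dt * f_theta(x_l, t_l)
    return x

# Toy usage: minimize L(theta) = ||x(t1) - y||^2 by backpropagating through the solver.
dim = 2
f = ODEFunc(dim)
x0, y = torch.randn(8, dim), torch.randn(8, dim)
x1 = odeint_euler(f, x0, t0=0.0, t1=1.0)
loss = ((x1 - y) ** 2).mean()
loss.backward()                       # plain discretize-then-optimize backprop
```

Backpropagating through the unrolled solver, as here, stores every intermediate state; the adjoint method of the next section avoids that memory cost.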

2. Continuous-Time Backpropagation and Adjoint Sensitivity

Rather than backpropagating through a sequence of discrete layers as in conventional deep networks, NeuralODEs leverage the continuous-time adjoint sensitivity method. For a solution trajectory $x(t)$, the adjoint state $a(t) = \frac{\partial \mathcal{L}}{\partial x(t)}$ obeys the backward ODE:

$\frac{\mathrm{d}a(t)}{\mathrm{d}t} = -a(t)^T \frac{\partial f_\theta}{\partial x}(x(t), t), \qquad a(t_1) = \frac{\partial \ell}{\partial x(t_1)}$

The parameter gradient is obtained by integrating backward from $t_1$ to $t_0$:

$\frac{\mathrm{d}\mathcal{L}}{\mathrm{d}\theta} = -\int_{t_1}^{t_0} a(t)^T \frac{\partial f_\theta}{\partial \theta}(x(t), t)\, \mathrm{d}t$

This technique allows memory-efficient backpropagation by reconstructing or recomputing the forward trajectory as needed (Ruthotto, 8 Jan 2024, Ott et al., 2023). Hybrid trade-offs (e.g., checkpointing or reversible integration) balance recomputation cost and memory allocation (McCallum et al., 15 Oct 2024).
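The adjoint equations can be implemented compactly with vector-Jacobian products. The sketch below is a didactic, fixed-step Euler version (assumed names and step counts; the cited work uses adaptive solvers): it integrates the state forward without storing an autograd graph, then steps the adjoint and the parameter gradient backward from $t_1$ to $t_0$, using torch.autograd.grad for $a^T \partial f/\partial x$ and $a^T \partial f/\partial \theta$.

```python
import torch
import torch.nn as nn

def adjoint_gradients(f: nn.Module, x0: torch.Tensor, grad_x1: torch.Tensor,
                      t0: float = 0.0, t1: float = 1.0, n_steps: int = 50):
    """Fixed-step Euler sketch of the continuous adjoint method.

    f(x, t) is the learned vector field, grad_x1 = dL/dx(t1) seeds the adjoint.
    Returns dL/dx(t0) and per-parameter gradients dL/dtheta.
    """
    dt = (t1 - t0) / n_steps
    params = list(f.parameters())

    # Forward pass: store only the trajectory values, no autograd graph.
    with torch.no_grad():
        xs, x = [x0], x0
        for k in range(n_steps):
            t = torch.tensor(t0 + k * dt)
            x = x + dt * f(x, t)
            xs.append(x)

    # Backward pass: a(t1) = dL/dx(t1); integrate da/dt = -a^T df/dx back to t0,
    # accumulating dL/dtheta = integral of a^T df/dtheta along the way.
    a = grad_x1
    grad_theta = [torch.zeros_like(p) for p in params]
    for k in reversed(range(n_steps)):
        t = torch.tensor(t0 + k * dt)
        xk = xs[k].clone().requires_grad_(True)
        with torch.enable_grad():
            fx = f(xk, t)
            # One call gives both vector-Jacobian products a^T df/dx and a^T df/dtheta.
            vjps = torch.autograd.grad(fx, [xk] + params, grad_outputs=a,
                                       allow_unused=True)
        a = a + dt * vjps[0]                       # step the adjoint ODE backward in time
        for g, vjp in zip(grad_theta, vjps[1:]):   # accumulate the parameter gradient
            if vjp is not None:
                g += dt * vjp
    return a, grad_theta
```

Libraries such as torchdiffeq package this pattern behind an adjoint-enabled odeint with adaptive solvers; checkpointed or reversible variants avoid storing the full trajectory, trading memory for recomputation as noted above.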

3. Numerical Integration and the Inverse Modified Differential Equation

Practical deployment of NeuralODEs requires discretizing the ODE for numerical integration. Choices include explicit (Euler, Runge–Kutta), implicit (backward Euler, Radau IIA), and adaptive-step schemes (Dormand–Prince). Training with a discrete solver of order $p$ does not identify the true ODE $f$ but a modified vector field $f_h$ called the inverse modified differential equation (IMDE), whose flow matches the discrete solver trajectory:

$f_h(y) = f(y) + h\,\phi_1(y) + h^2\,\phi_2(y) + \cdots$

where the corrections $\phi_k$ are computable from the original $f$ and the solver's structure. The model error due to discretization can be bounded by $O(h^p)$ plus the optimization loss. For Hamiltonian or symplectic systems, only symplectic integrators yield IMDEs that preserve conservation laws (Zhu et al., 2022).
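For intuition, consider training with the explicit Euler scheme (order $p = 1$). Matching one Euler step of $f_h$ against the exact flow of the data-generating field $f$ (a standard backward-error-analysis argument; the higher-order terms depend on the solver) gives the leading correction:

$y + h\, f_h(y) \;=\; y + h\, f(y) + \frac{h^2}{2}\, f'(y)\, f(y) + O(h^3) \quad\Longrightarrow\quad \phi_1(y) = \tfrac{1}{2}\, f'(y)\, f(y)$

Thus a model trained with Euler at step size $h$ recovers approximately $f + \tfrac{h}{2} f' f$ rather than $f$ itself, a deviation of $O(h)$ consistent with the bound above.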

Adaptive-step and learned integration strategies (e.g., Taylor-Lagrange NODEs) trade off numerical accuracy, stability, and computational cost, with learned correction networks providing data-driven control of truncation error (Djeumou et al., 2022).

4. Extensions for Scientific and Structured Modeling

NeuralODEs provide a flexible architecture for scientific modeling, system identification, and hybrid learning scenarios:

  • Physics-Augmented Neural ODEs: Classical mechanistic dynamics $g(x, t)$ are complemented by a neural correction $h_\theta$:

$\frac{\mathrm{d}x}{\mathrm{d}t} = g(x, t) + h_\theta(x, t)$

This approach absorbs unmodeled or uncertain effects while preserving known structure (Ott et al., 2023, Thöni et al., 11 Feb 2025); a minimal code sketch follows this list.

  • Hamiltonian and Symplectic Neural ODEs: For systems with conservation laws, the vector field is parameterized to ensure symplecticity or energy conservation:

$\frac{\mathrm{d}}{\mathrm{d}t}\begin{pmatrix} q \\ p \end{pmatrix} = J\,\nabla H_\theta(q, p)$

Training may employ symplectic integrators and Bayesian Laplace approximations for uncertainty quantification (Ott et al., 2023); a second sketch after this list illustrates this parameterization.

  • Intervention Modeling (IMODE): For systems with interventions (e.g., medical treatments, external forces), state variables are split into "autonomous" and "intervention-effect" latents, each with separate flows and jump updates on interventions:

$\dot z_x = f^x_\theta(z_x), \qquad \dot z_a = f^a_\phi(z_a), \qquad \dot h = f^h_\psi(h, z_x, z_a)$

Discrete "jumps" update only the latents associated with new observations or interventions at specified times, yielding improved counterfactual prediction (Gwak et al., 2020).

  • Universal Differential Equations (UDEs): Known empirical or mechanistic terms are combined additively with a neural component in the vector field:

$\frac{\mathrm{d}x}{\mathrm{d}t} = f_{\mathrm{emp}}(x; \theta) + f_{\mathrm{NN}}(x; \phi)$

Here $f_{\mathrm{emp}}$ encodes known mechanisms and $f_{\mathrm{NN}}$ is a neural correction that can be analyzed post hoc via symbolic regression for scientific insight (Thöni et al., 11 Feb 2025).

  • Constraint-Enforced and Symmetry-Regularized NODEs: Domain constraints or conservation laws (e.g., mass, charge, symmetries from Lie group theory) can be encoded as additional losses or as architectural restrictions, improving stability and interpretability (Hao, 2023, Kumar et al., 2023).
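As referenced in the Physics-Augmented bullet, here is a minimal PyTorch sketch of a hybrid vector field. The damped-oscillator term used for $g(x, t)$, the network sizes, and the zero-initialized correction are illustrative assumptions, not details from the cited papers.

```python
import torch
import torch.nn as nn

class HybridODEFunc(nn.Module):
    """dx/dt = g(x, t) + h_theta(x, t): known physics plus a learned correction."""
    def __init__(self, hidden: int = 32, omega: float = 1.0, damping: float = 0.1):
        super().__init__()
        self.omega, self.damping = omega, damping
        self.correction = nn.Sequential(
            nn.Linear(2, hidden), nn.Tanh(), nn.Linear(hidden, 2)
        )
        # Zero-initialize the last layer so training starts at the pure physics model.
        nn.init.zeros_(self.correction[-1].weight)
        nn.init.zeros_(self.correction[-1].bias)

    def g(self, x: torch.Tensor) -> torch.Tensor:
        # Known mechanistic part: damped harmonic oscillator with x = (position, velocity).
        pos, vel = x[..., 0:1], x[..., 1:2]
        return torch.cat([vel, -self.omega ** 2 * pos - self.damping * vel], dim=-1)

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        return self.g(x) + self.correction(x)    # mechanistic term + neural residual
```

The same module can be dropped into the Euler or adjoint sketches above; after training, the learned correction can be inspected post hoc (e.g., via symbolic regression, as in the UDE bullet) to interpret the unmodeled dynamics.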
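Similarly, a compact sketch of a Hamiltonian parameterization: a scalar network $H_\theta(q, p)$ is learned and the dynamics are its symplectic gradient $J\nabla H_\theta$. The architecture and dimensions are illustrative assumptions; a symplectic integrator (not shown) would typically be used for the forward solve.

```python
import torch
import torch.nn as nn

class HamiltonianODEFunc(nn.Module):
    """dq/dt = dH/dp, dp/dt = -dH/dq, with H_theta a learned scalar function."""
    def __init__(self, dim: int = 1, hidden: int = 64):
        super().__init__()
        self.dim = dim
        self.H = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        with torch.enable_grad():
            qp = x if x.requires_grad else x.detach().requires_grad_(True)
            H = self.H(qp).sum()                      # scalar Hamiltonian, summed over the batch
            dH = torch.autograd.grad(H, qp, create_graph=True)[0]
        dH_dq, dH_dp = dH[..., :self.dim], dH[..., self.dim:]
        return torch.cat([dH_dp, -dH_dq], dim=-1)     # J * grad H with J = [[0, I], [-I, 0]]
```

Because the dynamics are derived from a single scalar $H_\theta$, the exact flow of the learned field conserves $H_\theta$ by construction, which is what the symplectic and energy-conserving variants summarized in Section 7 exploit.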

5. Applications and Empirical Findings

NeuralODEs have demonstrated efficacy across a spectrum of scientific and engineering domains:

  • Time Series and Forecasting: Continuous-time latent models for sequence data, including physiological time series and functional MRI, with interpretable decompositions and uncertainty quantification (Wen, 2020, Gwak et al., 2020).
  • Physical Systems: Models for stiff and multi-scale systems (e.g., chemical kinetics, pollution, epidemiological networks) with stabilization through scaling, adjoint remedies, or implicit integration (Kim et al., 2021, Kumar et al., 2023, Fronk et al., 8 Oct 2024, Thöni et al., 11 Feb 2025).
  • Continuous Normalizing Flows: Generative modeling via invertible flows parameterized by NeuralODEs, leveraging Liouville's formula for tractable density estimation; the instantaneous change-of-variables identity is shown after this list (Ruthotto, 8 Jan 2024, Ehrhardt et al., 13 Mar 2025).
  • Scientific Surrogates: Accelerated surrogates for expensive ODE integrations (e.g., CFD coupling), enhanced by physics-informed losses and constraints (Kumar et al., 2023).
  • Sequence Processing: Continuous Fast Weight Programmers, linear Transformers, and neural CDEs for irregular/missing data have extended the NeuralODE formalism to recurrent and memory-intensive domains (Irie et al., 2022).
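The identity underlying the Continuous Normalizing Flows bullet is the instantaneous change-of-variables formula (a consequence of Liouville's theorem), which replaces the log-determinant of a discrete invertible flow with a Jacobian trace integrated along the trajectory:

$\frac{\mathrm{d}\log p(x(t))}{\mathrm{d}t} = -\operatorname{tr}\!\left(\frac{\partial f_\theta}{\partial x}(x(t), t)\right), \qquad \log p(x(t_1)) = \log p(x(t_0)) - \int_{t_0}^{t_1} \operatorname{tr}\!\left(\frac{\partial f_\theta}{\partial x}\right)\mathrm{d}t$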

Empirical results consistently highlight the impact of architectural and solver choices on accuracy, stability, and interpretability. For instance, stiff-aware architectures and implicit integration are essential in accurately learning stiff dynamics, while symmetry-aware constraints can regularize training and recover physically meaningful invariants.

6. Limitations, Open Problems, and Future Directions

NeuralODEs are subject to several methodological and practical challenges:

  • Solver Dependency: Learned models align with the IMDE dictated by the training numerical solver, not the true ODE; accuracy and the preservation of physical invariants are tightly linked to solver order/symplecticity (Zhu et al., 2022, Djeumou et al., 2022).
  • Stiffness and Multiscale Limitations: Vanilla explicit solvers, and standard continuous adjoint sensitivity, struggle with stiff ODEs, necessitating implicit solvers, scaling strategies, and stabilized adjoint algorithms (Kim et al., 2021, Fronk et al., 8 Oct 2024).
  • Expressiveness and Universality: Autonomous NeuralODEs (with time-invariant $f$) are non-universal as function approximators; non-autonomous parameterizations (weights $\theta(t)$ as functions of time) are necessary for full universality (Davis et al., 2020).
  • Counterfactual Inference and Interventions: Modeling interventions requires explicit separation of latent drivers; unobserved or confounded interventions can compromise decomposability (Gwak et al., 2020).
  • Generalization and Uncertainty Quantification: Bayesian inference (Laplace approximation) and structure-aware constraints are critical for well-calibrated uncertainty, especially when extrapolating beyond data regimes (Ott et al., 2023).
  • Computational Cost: Adaptive solvers and backpropagation through ODE solvers can become computationally intensive. Taylor-mode and algebraically reversible methods offer scalable alternatives (Djeumou et al., 2022, McCallum et al., 15 Oct 2024).

Open directions include unified treatment of ODEs and controlled differential equations, automated discovery of structure-preserving integrators, adaptive step-size algorithms optimized for NeuralODE training, and broader incorporation of mechanistic knowledge.

7. Summary Table: Key NeuralODE Innovations and Variants

| Variant/Method | Core Modification | Application Area |
| --- | --- | --- |
| Standard NeuralODE | $\mathrm{d}x/\mathrm{d}t = f_\theta(x, t)$ | General time series/dynamics |
| Hamiltonian NODE | $\mathrm{d}x/\mathrm{d}t = J\,\nabla H_\theta(x)$ | Energy-conserving physical systems |
| Physics-Augmented NODE | $\mathrm{d}x/\mathrm{d}t = g(x, t) + h_\theta(x, t)$ | System identification, SciML |
| IMODE | Split latent: autonomous/intervention | Counterfactuals, interventions |
| Symmetry-Regularized NODE | Loss penalizes deviation from symmetry invariants | Conservative dynamics |
| Universal DE (UDE) | Hybrid mechanistic + neural vector field | SciML, CRN, astrophysics |
| Stiff-Aware NODE | Scaling, implicit solvers, adjoint stabilization | Multiscale, chemical kinetics |
| Taylor-Lagrange NODE | Fixed-order Taylor + learned remainder | Fast integration, stiff/complex ODEs |
| Non-autonomous NODE | $\theta(t)$ as smooth time function | Universal function approximation |
Advanced NeuralODE frameworks blend data-driven learning with built-in inductive biases from numerical analysis, dynamical systems, and scientific knowledge, enabling structurally informed, uncertainty-aware, and scalable modeling of continuous-time phenomena (Ruthotto, 8 Jan 2024, Gwak et al., 2020, Zhu et al., 2022, Ott et al., 2023, Thöni et al., 11 Feb 2025, Hao, 2023).
