Neural Controlled Differential Equations

Updated 14 August 2025
  • Neural Controlled Differential Equations are data-driven models that generalize Neural ODEs to handle irregular, partially observed time series using principles from rough path theory.
  • They leverage continuous interpolation and controlled differential equations to update hidden states in response to dynamic, non-uniform data inputs.
  • Empirical results show NCDEs achieve robust performance and superior accuracy across healthcare, speech, and trajectory tasks while significantly reducing memory usage.

Neural Controlled Differential Equations (NCDEs) are a class of neural dynamical systems that generalize Neural Ordinary Differential Equations (Neural ODEs) to provide a principled, data-driven framework for modeling irregular and partially observed multivariate time series. NCDEs leverage controlled differential equations from rough path theory, allowing the system to evolve its hidden state under the “control” of a continuously interpolated input path constructed from the observed data. This innovation directly addresses the inability of classical ODE models to assimilate new information mid-trajectory and enables both adaptability and theoretical expressiveness in learning functions on path space.

1. Mathematical Formulation and Theoretical Foundations

The core of the NCDE framework is the following evolution law for the hidden state $z_t$:

$$z_t = z_{t_0} + \int_{t_0}^t f_\theta(z_s) \, dX_s$$

Here:

  • $z_t \in \mathbb{R}^h$ is the hidden state at time $t$,
  • $X: [t_0, t_n] \rightarrow \mathbb{R}^{d+1}$ is the “control path”, typically constructed by interpolating the observed time series with time included as an additional channel (i.e., $X_{t_i} = (x_i, t_i)$),
  • $f_\theta: \mathbb{R}^h \rightarrow \mathbb{R}^{h \times (d+1)}$ is the learnable vector field parameterized by a neural network.

The integration is in the sense of a Riemann–Stieltjes (or, when required, Young/rough path) integral, which allows treatment of controls $X_t$ with limited regularity, enabling robust modeling of observed, irregular data streams.
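As a concrete illustration, the vector field $f_\theta$ can be realized as a small feedforward network whose output is reshaped into an $h \times (d+1)$ matrix. The following is a minimal PyTorch sketch; the layer widths and activation are illustrative assumptions, not choices mandated by the framework.

```python
import torch

class CDEVectorField(torch.nn.Module):
    """Illustrative f_theta: maps z in R^h to an h x (d+1) matrix,
    which is later contracted against the increments dX_s."""

    def __init__(self, hidden_dim: int, control_dim: int):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.control_dim = control_dim  # d + 1, including the time channel
        self.net = torch.nn.Sequential(
            torch.nn.Linear(hidden_dim, 128),  # width 128 is an arbitrary choice
            torch.nn.Tanh(),
            torch.nn.Linear(128, hidden_dim * control_dim),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # Reshape the flat output into the matrix f_theta(z).
        return self.net(z).view(*z.shape[:-1], self.hidden_dim, self.control_dim)
```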

Notably, this formulation generalizes Neural ODEs, which can be written as

$$z_t = z_{t_0} + \int_{t_0}^t f_\theta(z_s) \, ds$$

where the driving path is simply time, making the trajectory uniquely determined by the initial condition. In contrast, NCDEs are driven by the data itself, so that the trajectory evolves as new inputs arrive.

The paper establishes a universal approximation property: for any continuous function on path space, there exists a Neural CDE mapping observations to an output that approximates it arbitrarily well (informally, “the action of a linear map on the terminal value of a Neural CDE is a universal approximator from sequences in $\mathbb{R}^v$ to $\mathbb{R}$”). Furthermore, every continuous data-driven ODE model of the form

$$z_t = z_{t_0} + \int_{t_0}^t h_\theta(z_s, X_s) \, ds$$

can be exactly represented by an NCDE, but the converse is not true. Thus, NCDEs are strictly more expressive.
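An informal sketch of how such a data-driven ODE embeds into an NCDE (the paper's formal construction may differ in details): augment the hidden state with copies of the data and time channels, and choose a vector field that routes the time increment into the $h_\theta$ dynamics and the data increments into the copies:

$$\tilde z_t = \begin{pmatrix} z_t \\ x_t \\ \tau_t \end{pmatrix}, \qquad \tilde f_\theta(\tilde z_t) = \begin{pmatrix} 0_{h \times d} & h_\theta(z_t, (x_t, \tau_t)) \\ I_d & 0 \\ 0 & 1 \end{pmatrix}, \qquad d\tilde z_t = \tilde f_\theta(\tilde z_t) \, dX_t$$

With $dX_t = (dx_t, dt)$, the first block recovers $dz_t = h_\theta(z_t, X_t) \, dt$, while the remaining blocks simply copy the data and time increments.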

2. Modeling Irregular and Partially Observed Time Series

Classic RNNs and many ODE-based models presuppose uniform sampling in time or rely on imputation/binning to handle irregularity. NCDEs natively encode both irregular timestamp information and partially observed channels by:

  • Interpolating each observed feature to construct a continuous path $X_t$,
  • Concatenating time and optional observation-frequency indicators (improving the model’s awareness of when and how often each feature is measured).

Because the hidden state is updated via integration with respect to $X_t$ (and not simply in discrete increments), NCDEs continuously and naturally assimilate incoming information. Adaptive ODE solvers can be used, with function evaluations concentrated where the path varies more.

Partially observed variables are handled by independent (or jointly masked) interpolation, and the addition of “observational intensity” channels enables the model to capture informative cross-channel temporal sampling patterns.
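A minimal sketch of this preprocessing in plain PyTorch, assuming forward-fill imputation and cumulative observation counts as intensity channels (both are illustrative conventions; libraries such as torchcde ship ready-made interpolation schemes):

```python
import torch

def build_control_channels(times: torch.Tensor,
                           values: torch.Tensor,
                           mask: torch.Tensor) -> torch.Tensor:
    """Assemble raw channels for the control path from an irregular series.

    times:  (n,)    observation timestamps
    values: (n, d)  observed values (arbitrary where mask == 0)
    mask:   (n, d)  1.0 where a channel was observed, else 0.0
    Returns (n, 1 + 2d): [time | forward-filled values | cumulative intensity].
    """
    filled = values.clone()
    for j in range(values.shape[1]):
        last = torch.tensor(0.0)  # assumed fill value before the first observation
        for i in range(values.shape[0]):
            if mask[i, j] > 0:
                last = filled[i, j].clone()
            else:
                filled[i, j] = last
    # Cumulative observation counts serve as "observational intensity" channels.
    intensity = torch.cumsum(mask, dim=0)
    return torch.cat([times.unsqueeze(-1), filled, intensity], dim=-1)
```

The resulting channels are then interpolated (e.g., with cubic splines) to obtain the differentiable control path $X_t$.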

3. Memory-Efficient Adjoint-Based Backpropagation

A salient advantage of the NCDE framework is the ability to use adjoint sensitivity methods for memory-efficient training across the entire irregular observation window. Unlike ODE-RNN/hybrid models, which require storing intermediate states at every observation (incurring $O(H)$ memory per observation), the smooth, uninterrupted integration of an NCDE enables use of the adjoint method across the entire interval. The memory footprint remains proportional to the hidden state size, with a small overhead depending on the number of ODE solver steps.

Implementation commonly proceeds by reducing the NCDE to an equivalent ODE (valid for controls with sufficient regularity),

$$g_{\theta,X}(z,s) = f_\theta(z) \cdot \frac{dX}{ds}(s)$$

and then integrating

$$z_t = z_{t_0} + \int_{t_0}^t g_{\theta,X}(z,s) \, ds$$

allowing the use of standard ODE solvers such as Runge–Kutta methods, together with adjoint-based differentiation libraries (e.g., torchdiffeq).
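A minimal sketch of this reduction, assuming a piecewise-linear control path (so $dX/ds$ is piecewise constant) and the illustrative CDEVectorField module sketched in Section 1; a production implementation would typically use a smoother interpolant:

```python
import torch
from torchdiffeq import odeint_adjoint  # pip install torchdiffeq

class ReducedODEFunc(torch.nn.Module):
    """Wraps f_theta and dX/ds into the ODE vector field g_{theta,X}(z, s)."""

    def __init__(self, f_theta: torch.nn.Module,
                 knot_times: torch.Tensor,    # (n,)      knot times of the path
                 knot_values: torch.Tensor):  # (n, d+1)  path values at the knots
        super().__init__()
        self.f_theta = f_theta
        self.knot_times = knot_times
        self.knot_values = knot_values

    def dX_ds(self, s: torch.Tensor) -> torch.Tensor:
        # Piecewise-linear path: the derivative is constant on each [t_i, t_{i+1}].
        i = int(torch.searchsorted(self.knot_times, s.reshape(1)).item()) - 1
        i = min(max(i, 0), self.knot_times.shape[0] - 2)
        dt = self.knot_times[i + 1] - self.knot_times[i]
        return (self.knot_values[i + 1] - self.knot_values[i]) / dt

    def forward(self, s: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        # g_{theta,X}(z, s) = f_theta(z) @ dX/ds(s)
        return torch.matmul(self.f_theta(z), self.dX_ds(s))

# With func = ReducedODEFunc(...), the CDE solve becomes a standard ODE solve,
# and odeint_adjoint provides memory-efficient adjoint backpropagation:
# z_traj = odeint_adjoint(func, z0, eval_times, method="dopri5")
```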

4. Empirical Performance Across Datasets

Empirical results demonstrate the efficacy of NCDEs across diverse domains and noise regimes:

  • CharacterTrajectories: Under extensive random dropping of data (30–70%), NCDEs retain stable accuracy, outperforming baselines such as GRU-ODE, ODE-RNN, GRU-Δt, and GRU-D.
  • PhysioNet Sepsis Prediction: On this ICU patient dataset with high irregularity and partial observability, the NCDE model achieves the highest AUC when using observational intensity; performance remains robust even without it, often exceeding that of ODE- and RNN-based models.
  • Speech Commands: For fully observed, regularly sampled audio time series, NCDE surpasses all tested alternatives, showing negligible training variance and high accuracy, whereas RNNs can fail (high variance, occasional instability).

Furthermore, NCDEs consistently require significantly less memory during training, sometimes by an order of magnitude, compared to integration-interrupted ODE or RNN hybrids.

5. Implementation Strategies and Trade-Offs

A practical NCDE pipeline consists of the following steps:

  1. Interpolation: Construct the continuous path $X_t$ (e.g., via cubic splines or Hermite interpolation) from data. The interpolation must be sufficiently regular for ODE solvers and must preserve timing information for irregular series.
  2. Vector Field Construction: Define a neural network $f_\theta$ that outputs an $h \times (d+1)$ matrix at every $z$.
  3. Numerical Solution: Integrate the NCDE using high-quality adaptive ODE solvers; the model can be trained using memory-efficient adjoint-based backpropagation.
  4. Readout: Map the terminal state $z_T$ to the output via a downstream (often linear) layer.
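A compact end-to-end sketch of these four steps using the torchcde package (the calls below follow torchcde's publicly documented interface, but exact names may vary across versions and should be treated as assumptions):

```python
import torch
import torchcde  # pip install torchcde

class VectorField(torch.nn.Module):
    def __init__(self, control_dim: int, hidden_dim: int):
        super().__init__()
        self.control_dim, self.hidden_dim = control_dim, hidden_dim
        self.net = torch.nn.Sequential(
            torch.nn.Linear(hidden_dim, 128), torch.nn.Tanh(),
            torch.nn.Linear(128, hidden_dim * control_dim))

    def forward(self, t, z):  # torchcde's cdeint passes (t, z)
        return self.net(z).view(-1, self.hidden_dim, self.control_dim)

# Toy data: (batch, length, channels), with time as channel 0.
batch, length, channels, hidden = 32, 50, 4, 16
x = torch.randn(batch, length, channels)

# 1. Interpolation: fit a differentiable control path X_t.
coeffs = torchcde.hermite_cubic_coefficients_with_backward_differences(x)
X = torchcde.CubicSpline(coeffs)

# 2-3. Vector field and numerical solution (torchcde performs adjoint-based
# backpropagation via torchdiffeq under the hood).
func = VectorField(channels, hidden)
z0 = torch.zeros(batch, hidden)  # or a learned map of the first observation
z = torchcde.cdeint(X=X, func=func, z0=z0, t=X.interval)
zT = z[:, -1]  # terminal hidden state

# 4. Readout: map z_T to the output with a linear layer.
pred = torch.nn.Linear(hidden, 1)(zT)
```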

Crucial implementation trade-offs include:

  • Interpolant smoothness versus causality; some tasks (e.g., online prediction) require causally adapted interpolants rather than natural splines.
  • Hidden state and vector field dimensionality, affecting both expressivity and per-step computational cost.
  • Choice of ODE solver: higher-order methods improve accuracy for smooth interpolants but may not offer gains with rough controls.

6. Broader Impact and Theoretical Implications

NCDEs formally bridge rough path theory, deep learning, and classical control. Their central theoretical advantage, universal approximation for functions of time series, guarantees coverage of all continuous functionals on path space and justifies their use for diverse sequential modeling tasks.

The result that NCDEs “subsume” all ODE-based models that use the data as a nonlinear input underscores their generality. The explicit dependence on the differential $dX_s$ of the control path means that NCDEs can model more complex, non-Markovian, and non-stationary data streams than standard ODE- or RNN-based models.

From a practical perspective, NCDEs’ ability to handle missing, irregular, and multivariate data without discrete binning, while offering low memory cost and robust training dynamics, positions them as an influential and theoretically principled tool in sequential data modeling across domains such as healthcare, audio, finance, and beyond.