Neural Controlled Differential Equations
- Neural Controlled Differential Equations are data-driven models that generalize Neural ODEs to handle irregular, partially observed time series using principles from rough path theory.
- They leverage continuous interpolation and controlled differential equations to update hidden states in response to dynamic, non-uniform data inputs.
- Empirical results show NCDEs achieve robust performance and superior accuracy across healthcare, speech, and trajectory tasks while significantly reducing memory usage.
Neural Controlled Differential Equations (NCDEs) are a class of neural dynamical systems that generalize Neural Ordinary Differential Equations (Neural ODEs) to provide a principled, data-driven framework for modeling irregular and partially observed multivariate time series. NCDEs leverage controlled differential equations from rough path theory, allowing the system to evolve its hidden state under the “control” of a continuously interpolated input path constructed from the observed data. This innovation directly addresses the inability of classical ODE models to assimilate new information mid-trajectory and enables both adaptability and theoretical expressiveness in learning functions on path space.
1. Mathematical Formulation and Theoretical Foundations
The core of the NCDE framework is the following evolution law for the hidden state $z$:

$$z_t = z_{t_0} + \int_{t_0}^{t} f_\theta(z_s)\, \mathrm{d}X_s, \qquad t \in (t_0, t_n].$$

Here:
- $z_t \in \mathbb{R}^w$ is the hidden state at time $t$,
- $X \colon [t_0, t_n] \to \mathbb{R}^{v+1}$ is the “control path”, typically constructed by interpolating the observed time series with time included as an additional channel (i.e., $X_{t_i} = (t_i, x_i)$),
- $f_\theta \colon \mathbb{R}^w \to \mathbb{R}^{w \times (v+1)}$ is the learnable vector field parameterized by a neural network.
The integration is in the sense of a Riemann–Stieltjes (or, when required, Young/rough paths) integral, which allows treatment of controls with limited regularity—enabling robust modeling of observed, irregular data streams.
Notably, this formulation generalizes Neural ODEs, which can be written as

$$z_t = z_{t_0} + \int_{t_0}^{t} f_\theta(z_s)\, \mathrm{d}s,$$

where the driving path is simply time ($X_t = t$), making the trajectory uniquely determined by the initial condition. In contrast, NCDEs are driven by the data itself, so that the trajectory evolves as new inputs arrive.
The paper establishes a universal approximation property: for any continuous function on path space, there exists a Neural CDE mapping initial observations to an output that approximates it arbitrarily well (informally, “the action of a linear map on the terminal value of a Neural CDE is a universal approximator from sequences to $\mathbb{R}$”). Furthermore, every continuous data-driven ODE model of the form

$$z_t = z_{t_0} + \int_{t_0}^{t} g_\theta(z_s, X_s)\, \mathrm{d}s$$

can be exactly represented by an NCDE, but the converse is not true. Thus, NCDEs are strictly more expressive.
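To make the shapes concrete, the following is a minimal PyTorch sketch of such a vector field. The class name `CDEFunc`, the hidden width of 128, and the final tanh are illustrative choices rather than prescriptions from the paper (though a bounded final nonlinearity is a common stabilizing choice):

```python
import torch

class CDEFunc(torch.nn.Module):
    """Vector field f_theta: R^w -> R^{w x (v+1)} for a Neural CDE (illustrative sketch)."""
    def __init__(self, input_channels, hidden_channels):
        super().__init__()
        self.input_channels = input_channels    # v + 1 (features plus a time channel)
        self.hidden_channels = hidden_channels  # w
        self.net = torch.nn.Sequential(
            torch.nn.Linear(hidden_channels, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, hidden_channels * input_channels),
            torch.nn.Tanh(),  # bounded output keeps the dynamics well behaved
        )

    def forward(self, t, z):
        # z: (batch, w) -> f_theta(z): (batch, w, v+1), so that the update
        # f_theta(z) dX is a matrix-vector product landing back in R^w.
        return self.net(z).view(-1, self.hidden_channels, self.input_channels)
```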
2. Modeling Irregular and Partially Observed Time Series
Classic RNNs and many ODE-based models presuppose uniform sampling in time or rely on imputation/binning to handle irregularity. NCDEs natively encode both irregular timestamp information and partially observed channels by:
- Interpolating each observed feature to construct a continuous path $X$,
- Concatenating time and optional observation-frequency indicators (improving the model’s awareness of when and how often each feature is measured).
Because the hidden state is updated via integration with respect to $X$ (rather than in discrete increments), NCDEs continuously and naturally assimilate incoming information. Adaptive ODE solvers can be used, concentrating function evaluations where the path varies most rapidly.
Partially observed variables are handled by independent (or jointly masked) interpolation, and the addition of “observational intensity” channels enables the model to capture informative cross-channel temporal sampling patterns.
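As a concrete illustration of this step, here is a minimal sketch using the companion torchcde library, which represents missing observations as NaN and fills them through the interpolation itself; the toy timestamps, data, and NaN placement are invented for illustration:

```python
import torch
import torchcde

# Toy irregular series: 1 batch element, 6 observations, 2 feature channels.
t = torch.tensor([0.0, 0.1, 0.5, 0.6, 1.4, 2.0])  # non-uniform timestamps
x = torch.randn(1, 6, 2)
x[0, 2, 1] = float('nan')                          # a partially observed step

# Include time as an extra channel so that X_t = (t, x_t).
t_channel = t.unsqueeze(0).unsqueeze(-1)           # shape (1, 6, 1)
data = torch.cat([t_channel, x], dim=-1)           # shape (1, 6, 3)

# Natural cubic spline through the observations; the NaN is handled by the
# interpolation itself, with no separate imputation step.
coeffs = torchcde.natural_cubic_coeffs(data, t=t)
X = torchcde.CubicSpline(coeffs, t=t)

X.evaluate(torch.tensor(0.3))    # query the continuous path at any time
X.derivative(torch.tensor(0.3))  # dX/dt, which drives the CDE
```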
3. Memory-Efficient Adjoint-Based Backpropagation
A salient advantage of the NCDE framework is the ability to use adjoint sensitivity methods for memory-efficient training across the entire irregular observation window. Unlike ODE-RNN/hybrid models, which must store intermediate states at every observation (so that memory grows with the number of observations), the smooth, uninterrupted integration of an NCDE enables use of the adjoint method across the entire interval. The memory footprint remains proportional to the hidden state size, with a small overhead depending on the number of ODE solver steps.
Implementation commonly proceeds by reducing the NCDE to an equivalent ODE (valid for controls with sufficient regularity), defining

$$g_{\theta, X}(z, s) = f_\theta(z)\, \frac{\mathrm{d}X}{\mathrm{d}s}(s),$$

and then integrating

$$z_t = z_{t_0} + \int_{t_0}^{t} g_{\theta, X}(z_s, s)\, \mathrm{d}s,$$

allowing the use of standard ODE solvers such as Runge–Kutta methods and adjoint-based differentiation libraries (e.g., torchdiffeq).
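A minimal sketch of this reduction, reusing the hypothetical `CDEFunc` and spline `X` from the snippets above together with torchdiffeq's adjoint solver; in practice `torchcde.cdeint` performs exactly this wrapping internally, so the explicit class below is purely expository:

```python
import torch
import torchdiffeq

class ReducedODEFunc(torch.nn.Module):
    """Wraps g_{theta,X}(z, s) = f_theta(z) dX/ds as an ordinary ODE vector field."""
    def __init__(self, cde_func, X):
        super().__init__()
        self.cde_func = cde_func
        self.X = X

    def forward(self, s, z):
        dXds = self.X.derivative(s)          # (batch, v+1)
        f = self.cde_func(s, z)              # (batch, w, v+1)
        return torch.bmm(f, dXds.unsqueeze(-1)).squeeze(-1)  # (batch, w)

func = CDEFunc(input_channels=3, hidden_channels=8)
z0 = torch.zeros(1, 8)  # in practice, a learned map of the first observation

g = ReducedODEFunc(func, X)
# Adjoint backpropagation: memory cost independent of the number of solver steps.
zs = torchdiffeq.odeint_adjoint(g, z0, X.interval, rtol=1e-4, atol=1e-6)
z_final = zs[-1]        # terminal hidden state, shape (1, 8)
```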
4. Empirical Performance Across Datasets
Empirical results demonstrate the efficacy of NCDEs across diverse domains and noise regimes:
- CharacterTrajectories: Under extensive random dropping of data (30–70%), NCDEs retain stable accuracy, outperforming baselines such as GRU-ODE, ODE-RNN, GRU-Δt, and GRU-D.
- PhysioNet Sepsis Prediction: On this ICU patient dataset with high irregularity and partial observability, the NCDE model achieves the highest AUC when using observational intensity; performance remains robust even without it, often exceeding that of ODE- and RNN-based models.
- Speech Commands: For fully observed, regularly sampled audio time series, NCDE surpasses all tested alternatives, showing negligible training variance and high accuracy, whereas RNNs can fail (high variance, occasional instability).
Furthermore, NCDEs consistently require significantly less memory during training, sometimes by an order of magnitude, compared to integration-interrupted ODE or RNN hybrids.
5. Implementation Strategies and Trade-Offs
A practical NCDE pipeline consists of the following steps (a compact sketch follows the list):
- Interpolation: Construct the continuous path $X$ (e.g., via cubic splines or Hermite interpolation) from the data. The interpolation must be sufficiently regular for ODE solvers and must preserve timing information for irregular series.
- Vector Field Construction: Define a neural network $f_\theta \colon \mathbb{R}^w \to \mathbb{R}^{w \times (v+1)}$, which outputs a $w \times (v+1)$ matrix at every hidden state $z$.
- Numerical Solution: Integrate the NCDE using high-quality adaptive ODE solvers; the model can be trained using memory-efficient adjoint-based backpropagation.
- Readout: Map the terminal state $z_{t_n}$ to the output via a downstream (often linear) layer.
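A compact end-to-end sketch under the same assumptions as the earlier snippets (the torchcde interface and the hypothetical `CDEFunc`); here time is baked into the channels of the path, so the interpolation coefficients carry the irregular timestamps:

```python
import torch
import torchcde

class NeuralCDE(torch.nn.Module):
    """Interpolate -> integrate -> linear readout (illustrative sketch)."""
    def __init__(self, input_channels, hidden_channels, output_channels):
        super().__init__()
        self.initial = torch.nn.Linear(input_channels, hidden_channels)
        self.func = CDEFunc(input_channels, hidden_channels)
        self.readout = torch.nn.Linear(hidden_channels, output_channels)

    def forward(self, coeffs):
        X = torchcde.CubicSpline(coeffs)
        # Initialize the hidden state from the first observation.
        z0 = self.initial(X.evaluate(X.interval[0]))
        # Memory-efficient adjoint integration over the whole window.
        zt = torchcde.cdeint(X=X, func=self.func, z0=z0, t=X.interval, adjoint=True)
        return self.readout(zt[:, -1])  # map terminal state z_{t_n} to the output
```

Training then proceeds as with any PyTorch model: compute interpolation coefficients once per series, feed them through the model, and backpropagate a standard loss.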
Crucial implementation trade-offs include:
- Interpolant smoothness versus causality; some tasks (e.g., online prediction) require causally adapted interpolants rather than natural cubic splines.
- Hidden state and vector field dimensionality, affecting both expressivity and per-step computational cost.
- Choice of ODE solver: higher-order methods improve accuracy for smooth interpolants but may not offer gains with rough controls.
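For instance, in the torchcde/torchdiffeq interface used in the sketches above, the solver is just a keyword argument; pairing a fixed-step scheme with the observation spacing is a common heuristic rather than a recommendation from the paper:

```python
# Adaptive solver: accuracy governed by tolerances, with steps
# concentrating where the interpolant varies most rapidly.
zt = torchcde.cdeint(X=X, func=func, z0=z0, t=X.interval,
                     method="dopri5", rtol=1e-4, atol=1e-6)

# Fixed-step RK4: cheaper and predictable; tie the step size to the data spacing.
zt = torchcde.cdeint(X=X, func=func, z0=z0, t=X.interval,
                     method="rk4", options=dict(step_size=0.1))
```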
6. Broader Impact and Theoretical Implications
NCDEs formally bridge rough path theory, deep learning, and classical control. Their central theoretical guarantee, universal approximation for functions of time series, covers all continuous functionals on path space and justifies their use across diverse sequential modeling tasks.
The result that NCDEs “subsume” all ODE-based models which nonlinearly use data as input underscores their generality. The explicit dependence on the derivative of the control path means that NCDEs can model more complex, non-Markovian, and non-stationary data streams than standard ODE or RNN-based models.
From a practical perspective, NCDEs’ ability to handle missing, irregular, and multivariate data without discrete binning, while offering low memory cost and robust training dynamics, positions them as an influential and theoretically principled tool in sequential data modeling across domains such as healthcare, audio, finance, and beyond.