Signature & Neural RDE Backbones

Updated 24 April 2026

Signature and Neural RDE Backbones are continuous-time machine learning architectures that integrate rough path signatures with neural differential equations to encode sequential data.
They leverage windowed embeddings and controlled ODE/RDE solvers to efficiently reduce dimensionality and capture path-dependent, non-Markovian dynamics.
Empirical results demonstrate significant improvements over RNNs, including up to 50% error reduction and enhanced stability in applications like financial modeling and dynamical systems.

Signature and Neural RDE Backbones are a class of continuous-time machine learning architectures that combine path signatures from rough path theory with the dynamical modeling capabilities of neural rough differential equations (RDEs). They provide a rigorous, universally expressive, and scalable basis for encoding sequential or time-series data in a way that preserves path-dependent (non-Markovian) information while enabling continuous-time hidden state evolution. They are typically used in place of recurrent neural network (RNN)-based encoders, offering enhanced stability, expressivity, and computational efficiency for modeling systems and processes where memory and history are fundamental.

1. Signature Transform: Mathematical Framework and Properties

Let $X:[0,T]\to\mathbb{R}^d$ be a continuous path of bounded variation. The signature of the path, denoted $\mathrm{Sig}(X)$, is defined as the collection of all iterated integrals

$\mathrm{Sig}(X) = \left(1,\, \int_{0}^{T} dX_t,\, \int_{0<t_1<t_2<T} dX_{t_1} \otimes dX_{t_2},\, \dots \right) \in \prod_{k=0}^\infty (\mathbb{R}^d)^{\otimes k}.$

The $k$th level is given by

$\mathrm{Sig}^k(X) = \left\{ \int_{0<t_1<\cdots<t_k<T} dX_{t_1}^{i_1}\cdots dX_{t_k}^{i_k} : 1\leq i_1,\dots,i_k\leq d \right\}.$

The signature up to time $t$, $\mathrm{Sig}(X_{[0,t]})$, is a lossless, universal summary of the path: it determines the path up to tree-like equivalence and the choice of starting point, and time augmentation (appending $t$ as an extra channel) removes the tree-like ambiguity. For paths of bounded variation, the norm of the $k$th term is bounded by $\Vert X\Vert_{\mathrm{TV}}^k / k!$, so higher levels decay factorially, which makes truncation practical. The log-signature is the formal logarithm of the signature tensor series; at a fixed truncation level $N$ it yields a minimal, non-redundant set of coordinates (an expansion in a basis of the free Lie algebra) (Pradeleix et al., 15 Sep 2025, Fang et al., 2023, Alzahrani, 12 Oct 2025, Morrill et al., 2020, Fermanian et al., 2021).
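As an illustration of the definitions above, the following minimal sketch computes the first two signature levels of a piecewise-linear path with NumPy, using Chen's identity to append one linear segment at a time. The function names `signature_depth2` and `logsignature_depth2` are chosen here for illustration; they are not taken from the cited works or from any signature library.

```python
import numpy as np

def signature_depth2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    path: array-like of shape (n_points, d).
    Returns (s1, s2) with shapes (d,) and (d, d).
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    s1 = np.zeros(d)
    s2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Chen's identity for one linear segment with increment delta:
        # new level 2 = old level 2 + (old level 1) (x) delta + delta (x) delta / 2
        s2 = s2 + np.outer(s1, delta) + np.outer(delta, delta) / 2.0
        s1 = s1 + delta
    return s1, s2

def logsignature_depth2(path):
    """Levels 1 and 2 of the log-signature: the increment and the Levy area."""
    s1, s2 = signature_depth2(path)
    # log(1 + a + b) at level 2 is b - a (x) a / 2, which is antisymmetric.
    area = s2 - np.outer(s1, s1) / 2.0
    return s1, area

# L-shaped path: (0, 0) -> (1, 0) -> (1, 1)
s1, s2 = signature_depth2([[0, 0], [1, 0], [1, 1]])
_, area = logsignature_depth2([[0, 0], [1, 0], [1, 1]])
```

For this L-shaped path, level 1 is the total increment $(1, 1)$, the symmetric part of level 2 equals half the outer product of level 1 with itself, and the antisymmetric part (the Lévy area) has entries $\pm 0.5$ off the diagonal.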

A central property is universality: any continuous functional on a compact set of paths can be approximated arbitrarily well by a linear functional of the signature truncated at a sufficiently high order. This underpins the use of signatures as continuous-time analogues of the discrete hidden state in RNNs (Pradeleix et al., 15 Sep 2025, Fermanian et al., 2021).

2. Encoder–Dynamics–Decoder Architectures with Signature and Neural RDEs

Signature-based backbones insert the signature transform at the point of input encoding, followed by a continuous-time latent dynamics model. Given a pathwise input $X:[0,T]\to\mathbb{R}^d$,

Compute a truncated signature $\mathrm{Sig}^{\leq N}(X)$, with truncation level $N$.
For large $d$ or long paths, use windowed embeddings: apply a small MLP $\phi$ to short windows, compute signatures (or log-signatures) for each, concatenate, and project to a lower-dimensional latent vector $z_0$.

In the latent core, a latent (or hidden) state $z_t$ evolves via a controlled ODE/RDE: $dz_t = f_\theta(z_t)\,dX_t$ or, for RDEs,

$dz_t = f_\theta(z_t)\,d\mathbf{X}_t,$

where $\mathbf{X}$ denotes the lift of $X$ given by its truncated signature, with a decoder $g_\psi$ recovering the output as $\hat{y}_t = g_\psi(z_t)$ (Pradeleix et al., 15 Sep 2025, Alzahrani, 12 Oct 2025).
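The windowed embedding step can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the encoder of the cited works: the pointwise map is a single untrained `tanh` layer standing in for the small MLP, the signature is truncated at depth 2, and all names (`windowed_signature_embedding`, `w_in`, `w_out`) and sizes are hypothetical.

```python
import numpy as np

def signature_depth2(path):
    """Flattened signature levels 1 and 2 of a piecewise-linear path (n_points, p)."""
    p = path.shape[1]
    s1 = np.zeros(p)
    s2 = np.zeros((p, p))
    for delta in np.diff(path, axis=0):
        # Chen's identity for appending one linear segment.
        s2 = s2 + np.outer(s1, delta) + np.outer(delta, delta) / 2.0
        s1 = s1 + delta
    return np.concatenate([s1, s2.ravel()])

def windowed_signature_embedding(x, window, w_in, w_out):
    """Map a sampled path x of shape (T, d) to a latent vector.

    w_in:  (d + 1, p) weights of the pointwise map (assumed, untrained).
    w_out: (n_windows * (p + p * p), latent) projection (assumed, untrained).
    """
    x = np.asarray(x, dtype=float)
    t = np.linspace(0.0, 1.0, len(x))[:, None]
    augmented = np.hstack([t, x])            # time augmentation
    lifted = np.tanh(augmented @ w_in)       # pointwise stand-in for the small MLP
    feats = [signature_depth2(lifted[s:s + window + 1])   # windows share endpoints
             for s in range(0, len(x) - 1, window)]
    return np.concatenate(feats) @ w_out     # concatenate, then project

rng = np.random.default_rng(0)
T, d, p, latent, window = 17, 3, 4, 5, 4
x = rng.normal(size=(T, d)).cumsum(axis=0)
w_in = rng.normal(size=(d + 1, p)) / np.sqrt(d + 1)
n_windows = (T - 1) // window                # 16 increments split into 4 windows
w_out = rng.normal(size=(n_windows * (p + p * p), latent))
z0 = windowed_signature_embedding(x, window, w_in, w_out)
```

Reducing the channel count from $d+1$ to $p$ before the signature keeps each window's feature size at $p + p^2$, which is the reason for placing the pointwise map before the signature rather than after it.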

In "neural RDE" (NRDE) backbones, the update is driven by increments of the log-signature over a coarser grid, using the log-ODE method. The vector field $f_\theta$ is typically a small feedforward network, and integration is performed over each window with a standard ODE solver; backpropagation is memory-efficient via continuous adjoint methods (Fang et al., 2023, Morrill et al., 2020).
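A forward pass of the log-ODE scheme described above can be sketched in NumPy as follows. The sketch assumes a depth-2 log-signature (increment plus Lévy areas), an untrained two-layer `tanh` vector field, and explicit Euler steps in place of an adaptive ODE solver with adjoint backpropagation; the names `logsig_depth2`, `vector_field`, and `nrde_forward` are illustrative only.

```python
import numpy as np

def logsig_depth2(path):
    """Depth-2 log-signature of a piecewise-linear path: increment and Levy areas."""
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    run = np.zeros(d)
    s2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        s2 = s2 + np.outer(run, delta) + np.outer(delta, delta) / 2.0
        run = run + delta
    area = (s2 - s2.T) / 2.0                  # antisymmetric part of level 2
    upper = np.triu_indices(d, k=1)
    return np.concatenate([run, area[upper]])

def vector_field(z, params):
    """Small feedforward network mapping z (h,) to a matrix of shape (h, m)."""
    w1, b1, w2, b2 = params
    hidden = np.tanh(w1 @ z + b1)
    return np.tanh(w2 @ hidden + b2).reshape(z.shape[0], -1)

def nrde_forward(path, z0, params, window, substeps=4):
    """Log-ODE method: on each window solve dz/du = f(z) @ logsig for u in [0, 1]."""
    z = np.array(z0, dtype=float)
    for start in range(0, len(path) - 1, window):
        ls = logsig_depth2(path[start:start + window + 1])
        for _ in range(substeps):             # explicit Euler, step 1 / substeps
            z = z + vector_field(z, params) @ ls / substeps
    return z

rng = np.random.default_rng(0)
d, hidden_dim, width, window = 2, 4, 8, 5
m = d + d * (d - 1) // 2                      # depth-2 log-signature size
params = (0.5 * rng.normal(size=(width, hidden_dim)), np.zeros(width),
          0.5 * rng.normal(size=(hidden_dim * m, width)), np.zeros(hidden_dim * m))
path = 0.1 * rng.normal(size=(21, d)).cumsum(axis=0)
z_final = nrde_forward(path, np.zeros(hidden_dim), params, window)
```

With 21 samples and a window of 5 increments, the hidden state is updated over 4 windows rather than 20 individual steps, which is the source of the speed-up on long series.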

3. Universality, Expressivity, and Theoretical Guarantees

Signatures serve as the continuous-time analogue of RNN hidden states: any RNN applied to continuously sampled data is approximable by a linear transformation of the path signature, establishing the universality of signature features (Pradeleix et al., 15 Sep 2025, Fermanian et al., 2021). The unique representation property ensures no information loss (in the time-augmented setting). The factorial decay governs truncation error, making low-order truncations practical (Pradeleix et al., 15 Sep 2025).
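The effect of factorial decay on truncation can be made concrete with a short calculation. The sketch below evaluates the tail bound $\sum_{k>N} L^k/k!$ on the discarded signature levels, where $L$ is the total variation of the path; the function name `signature_tail_bound` and its `max_terms` cutoff are assumptions made for this illustration.

```python
import math

def signature_tail_bound(length, depth, max_terms=200):
    """Upper bound sum_{k > depth} length**k / k! on the discarded signature levels."""
    total = 0.0
    term = 1.0                      # length**0 / 0!
    for k in range(1, depth + max_terms + 1):
        term *= length / k          # now equals length**k / k!
        if k > depth:
            total += term
    return total

# For a path of total variation 1, depth 4 already leaves a tail below 0.01.
bound = signature_tail_bound(1.0, 4)
kept = sum(1.0 / math.factorial(k) for k in range(5))
```

Here `bound` equals $e - \sum_{k=0}^{4} 1/k! \approx 0.00995$. For paths with larger total variation the bound grows quickly, which is one motivation for rescaling inputs or working on short windows.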

The role of random neural vector fields in RDEs is clarified by signature reconstruction theory. If the hidden dimension exceeds the signature order $N$, and depth-two neural vector fields with a real-analytic activation (e.g., $\tanh$) are used, the RDE endpoint map can recover all signature features up to order $N$, with the number of recoverable features growing exponentially in the hidden dimension (Glückstad et al., 5 Feb 2025).

Moreover, in the infinite-width and -depth limit, controlled ResNets and Neural CDEs (with identity activation) converge to signature kernel machines, with model outputs determined solely by the inner product of signatures ("signature kernel"), justifying their interpretation as kernel mean embeddings in the signature space (Cirone et al., 2023, Fermanian et al., 2021).
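The inner product of signatures mentioned above can be illustrated with a truncated version. The sketch below computes the inner product of depth-2 signatures of two piecewise-linear paths; it is a truncated approximation for illustration only, not the untruncated signature kernel studied in the cited works, and the function names are hypothetical.

```python
import numpy as np

def truncated_signature(path):
    """Signature levels 1 and 2 of a piecewise-linear path (n_points, d)."""
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    s1, s2 = np.zeros(d), np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        s2 = s2 + np.outer(s1, delta) + np.outer(delta, delta) / 2.0
        s1 = s1 + delta
    return s1, s2

def truncated_signature_kernel(x, y):
    """Inner product of depth-2 signatures, including the constant level-0 term."""
    x1, x2 = truncated_signature(x)
    y1, y2 = truncated_signature(y)
    return 1.0 + x1 @ y1 + np.sum(x2 * y2)

x = [[0, 0], [1, 0], [1, 1]]   # right, then up
y = [[0, 0], [0, 1], [1, 1]]   # up, then right
k_xx = truncated_signature_kernel(x, x)   # 4.5
k_xy = truncated_signature_kernel(x, y)   # 3.5
```

The two paths share the same endpoints, so their level-1 terms agree; the kernel values differ only through level 2, which records the order in which the two coordinates move.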

4. Implementation, Scalability, and Stability

Signature (or log-signature) truncation level $N$ is typically set to a small value for practical tasks. Feature growth is exponential: the dimension scales as $O(d^N)$ for signatures, and roughly as $O(d^N/N)$ for log-signatures. Efficient implementations employ dedicated libraries (e.g., iisignature, signatory) supporting CPU/GPU batch processing.
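The feature counts can be computed exactly. The sketch below counts signature coordinates as $\sum_{k=1}^{N} d^k$ (excluding the constant level-0 term) and log-signature coordinates with Witt's formula for the dimension of each level of the free Lie algebra; the function names are chosen for this illustration.

```python
def mobius(n):
    """Moebius function, computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0            # squared prime factor
            result = -result
        p += 1
    if n > 1:
        result = -result            # one remaining prime factor
    return result

def signature_dim(d, depth):
    """Number of signature coordinates at levels 1..depth."""
    return sum(d ** k for k in range(1, depth + 1))

def logsignature_dim(d, depth):
    """Number of log-signature coordinates at levels 1..depth (Witt's formula)."""
    total = 0
    for k in range(1, depth + 1):
        level = sum(mobius(j) * d ** (k // j) for j in range(1, k + 1) if k % j == 0)
        total += level // k
    return total

sig_3_4 = signature_dim(3, 4)         # 3 + 9 + 27 + 81 = 120
logsig_3_4 = logsignature_dim(3, 4)   # 3 + 3 + 8 + 18 = 32
```

For $d = 3$ and $N = 4$ the log-signature needs 32 coordinates against 120 for the signature, which illustrates why log-signatures are preferred as the driving features in NRDE backbones.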

For large $d$ or long paths, windowed embedding and dimension reduction are standard; linear projections are commonly applied before signature computation for scalability in high-dimensional path spaces (Pradeleix et al., 15 Sep 2025, Fang et al., 2023, Alzahrani, 12 Oct 2025).

Because signature features are computed directly (algebraically) rather than through recurrent unrolling, the encoding step avoids the vanishing/exploding gradients associated with deep RNNs. Adjoint-based gradient computation in neural ODE/RDE solvers keeps memory constant with respect to sequence length, compared to the linear $O(T)$ scaling (in sequence length $T$) of backpropagation through time in classic RNNs (Morrill et al., 2020, Fang et al., 2023).

5. Empirical Performance and Applications

Empirical results demonstrate the superior performance, efficiency, and robustness of signature and neural RDE backbones in diverse domains:

In controlled dynamical systems (delayed Lotka–Volterra, spiral DDE, FitzHugh–Nagumo, Rössler), signature-based encoders reduced RMSE by up to 50% versus GRU encoders and exhibited lower variance across seeds. Training was also faster per epoch and more robust to noise and history truncation (Pradeleix et al., 15 Sep 2025).
For path-dependent PDEs/BSDEs in high dimensions, neural RDEs with log-signature features substantially reduced error and memory usage compared to LSTM+signature baselines, with low relative errors and a memory cost that stayed stable as the problem was scaled up (Fang et al., 2023).
In financial option pricing and portfolio optimization, signature–neural RDE solvers outperformed both Neural CDEs and RNNs in relative error, tail risk (CVaR), and HJB residuals across the problem dimensions tested, with ablations identifying a narrow range of signature depths as the practical optimum (Alzahrani, 12 Oct 2025).
For very long time series (length up to 17k), neural RDEs achieved higher predictive performance and more than $10\times$ reductions in runtime and memory compared to full-sequence CDEs and RNN baselines (Morrill et al., 2020).

6. Connections to Related Continuous-Time Models and Kernel Methods

Continuous-time models such as Neural ODEs, Neural CDEs, and Neural SDEs can be viewed as special cases or natural partners of the signature–RDE backbone, differing in the nature of their vector fields and drivers. Windowed and fully non-Markovian architectures (e.g., latent ODEs with signature encoders, stochastic control solvers) benefit directly from the universality and lossless memory of signatures (Pradeleix et al., 15 Sep 2025, Fang et al., 2023, Fermanian et al., 2021).

Multiple works establish direct correspondence between RNNs and kernel machines on path signatures: residual RNNs in the continuous-time limit reduce to linear predictors on signature features, with associated generalization bounds in signature RKHS (Fermanian et al., 2021). Controlled ResNets with random initialization converge—under proper width and depth scaling—to neural signature kernels, capturing all the high-order interaction statistics indexed by the choice of activation. When activation is identity, the signature kernel emerges explicitly; for nonlinearities, a kernel PDE governs the process, motivating the notion of "neural signature kernels" (Cirone et al., 2023).

7. Limitations and Future Directions

Current challenges include combinatorial growth in feature dimension with path/channel count and signature/log-signature order, which necessitates adaptive truncation, sparsity-promoting schemes, or log-signature compression (Pradeleix et al., 15 Sep 2025, Fang et al., 2023). Selecting optimal truncation order and combining signature levels in a data-dependent, automatic fashion remain active areas of research.

For true rough path–driven models (stochastic paths or infinite-dimensional cases), further theoretical advances and efficient numerical schemes are needed. Noted future directions include extending neural RDEs to stochastic drivers, to infinite-dimensional paths, and to real-world partially observed control problems with missing or highly irregular data (Pradeleix et al., 15 Sep 2025, Alzahrani, 12 Oct 2025).

Key Reference Papers

"Learning non-Markovian Dynamical Systems with Signature-based Encoders" (Pradeleix et al., 15 Sep 2025)
"A Neural RDE-based model for solving path-dependent PDEs" (Fang et al., 2023)
"Signature Reconstruction from Randomized Signatures" (Glückstad et al., 5 Feb 2025)
"Deep Signature and Neural RDE Methods for Path-Dependent Portfolio Optimization" (Alzahrani, 12 Oct 2025)
"Neural Rough Differential Equations for Long Time Series" (Morrill et al., 2020)
"Neural signature kernels as infinite-width-depth-limits of controlled ResNets" (Cirone et al., 2023)
"Framing RNN as a kernel method: A neural ODE approach" (Fermanian et al., 2021)