
Signature & Neural RDE Backbones

Updated 24 April 2026
  • Signature and Neural RDE Backbones are continuous-time machine learning architectures that integrate rough path signatures with neural differential equations to encode sequential data.
  • They leverage windowed embeddings and controlled ODE/RDE solvers to efficiently reduce dimensionality and capture path-dependent, non-Markovian dynamics.
  • Empirical results demonstrate significant improvements over RNNs, including up to 50% error reduction and enhanced stability in applications like financial modeling and dynamical systems.

Signature and Neural RDE Backbones are a class of continuous-time machine learning architectures that combine the theoretical foundations of path signatures from rough path theory with the dynamical modeling capabilities of neural rough differential equations (RDEs). These methodologies provide a rigorous, universally expressive, and scalable basis for encoding sequential or time-series data in a manner that preserves path-dependent (non-Markovian) information while enabling continuous-time hidden state evolution. They are typically used in place of, or as alternatives to, recurrent neural network (RNN)-based encoders, offering enhanced stability, expressivity, and computational efficiency for modeling systems and processes where memory and history are fundamental.

1. Signature Transform: Mathematical Framework and Properties

Let $X:[0,T]\to\mathbb{R}^d$ be a continuous path of bounded variation. The signature of the path, denoted $\mathrm{Sig}(X)$, is defined as the collection of all iterated integrals

$$\mathrm{Sig}(X) = \left(1,\, \int_{0}^{T} dX_t,\, \int_{0<t_1<t_2<T} dX_{t_1} \otimes dX_{t_2},\, \dots \right) \in \prod_{k=0}^\infty (\mathbb{R}^d)^{\otimes k}.$$

The $k$th level is given by

$$\mathrm{Sig}^k(X) = \left\{ \int_{0<t_1<\cdots<t_k<T} dX_{t_1}^{i_1}\cdots dX_{t_k}^{i_k} : 1\leq i_1,\dots,i_k\leq d \right\}.$$
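As a concrete illustration of the first two levels, the following is a minimal NumPy sketch for a piecewise-linear path, built segment by segment with Chen's identity. The function name `signature_depth2` is hypothetical, and the sketch stops at depth 2; it is not the API of any signature library.

```python
import numpy as np

def signature_depth2(path):
    """Levels 1 and 2 of the signature of the piecewise-linear path
    through the rows of `path` (shape [n_points, d])."""
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    s1 = np.zeros(d)        # level 1: total increment
    s2 = np.zeros((d, d))   # level 2: twice-iterated integrals
    for dx in np.diff(path, axis=0):
        # Chen's identity for appending one linear segment with increment dx:
        # S2 <- S2 + S1 (x) dx + (dx (x) dx) / 2
        s2 += np.outer(s1, dx) + 0.5 * np.outer(dx, dx)
        s1 += dx
    return s1, s2

# An L-shaped path in the plane: one unit right, then one unit up.
path = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
s1, s2 = signature_depth2(path)
print(s1)  # [1. 1.]
print(s2)  # [[0.5 1. ] [0.  0.5]]
```

The off-diagonal entries differ (1 versus 0), which is how the level-2 terms record the order in which the two coordinates moved; the level-1 term alone cannot distinguish "right then up" from "up then right".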

The signature up to time $t$, $\mathrm{Sig}(X_{[0,t]})$, is a lossless, universal summary of the path, with uniqueness ensured after time augmentation. For paths of bounded variation, the norm of the $k$th term is bounded by $\Vert X\Vert_{\mathrm{TV}}^k / k!$, which makes truncation at a finite level practical. The log-signature is the formal logarithm of the signature tensor series and yields a minimal, non-redundant set of coordinates (a basis of the free Lie algebra) for a fixed truncation level $N$ (Pradeleix et al., 15 Sep 2025, Fang et al., 2023, Alzahrani, 12 Oct 2025, Morrill et al., 2020, Fermanian et al., 2021).
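As a worked illustration of why the log-signature is less redundant, consider truncation level 2. Writing $S^1$ and $S^2$ for the first two signature levels (shorthand assumed here) and expanding the formal logarithm $\log(1+x) = x - x^2/2 + \cdots$ gives

$$\log \mathrm{Sig}(X) = S^1 + \left(S^2 - \tfrac{1}{2}\, S^1 \otimes S^1\right) + \cdots$$

Integration by parts gives $S^{ij} + S^{ji} = S^i S^j$, so the symmetric part of $S^2$ is already determined by level 1. The level-2 term of the log-signature is therefore the antisymmetric part

$$\tfrac{1}{2}\left(S^{ij} - S^{ji}\right), \qquad 1 \leq i < j \leq d,$$

that is, the Lévy area, which has $d(d-1)/2$ independent entries in place of the $d^2$ entries of $S^2$.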

A central property is universality: any continuous functional on a compact set of paths can be approximated arbitrarily well by a linear functional of the signature truncated at a sufficiently high order. This underpins the use of signatures as continuous-time analogues of the discrete hidden state in RNNs (Pradeleix et al., 15 Sep 2025, Fermanian et al., 2021).

2. Encoder–Dynamics–Decoder Architectures with Signature and Neural RDEs

Signature-based backbones insert the signature transform at the point of input encoding, followed by a continuous-time latent dynamics model. Given a pathwise input $X$,

  • Compute a truncated signature $\mathrm{Sig}^{\leq N}(X)$ at a fixed truncation level $N$.
  • For high-dimensional or long paths, use windowed embeddings: apply a small MLP to short windows, compute signatures (or log-signatures) for each, concatenate, and project to a lower-dimensional latent vector.
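The windowed embedding in the steps above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: signatures are truncated at depth 2, a single pointwise tanh layer stands in for the small MLP, the weights `lift` and `proj` are random rather than trained, and all function and variable names are hypothetical.

```python
import numpy as np

def signature_depth2(path):
    """Flattened depth-2 signature (levels 1 and 2) of a piecewise-linear path."""
    d = path.shape[1]
    s1, s2 = np.zeros(d), np.zeros((d, d))
    for dx in np.diff(path, axis=0):
        s2 += np.outer(s1, dx) + 0.5 * np.outer(dx, dx)  # Chen's identity
        s1 += dx
    return np.concatenate([s1, s2.ravel()])

def windowed_embedding(path, window, lift, proj):
    """Lift each point, take one signature per window, concatenate, project."""
    lifted = np.tanh(path @ lift)  # pointwise layer standing in for the small MLP
    feats = [
        # Neighbouring windows share an endpoint so that no increment is dropped.
        signature_depth2(lifted[start:start + window + 1])
        for start in range(0, lifted.shape[0] - 1, window)
    ]
    return proj @ np.concatenate(feats)

rng = np.random.default_rng(0)
n_steps, d, d_lift, window, latent_dim = 40, 6, 3, 10, 8
path = 0.1 * np.cumsum(rng.normal(size=(n_steps + 1, d)), axis=0)
lift = rng.normal(size=(d, d_lift)) / np.sqrt(d)

# Four windows, each contributing d_lift + d_lift**2 = 12 features.
feat_dim = (n_steps // window) * (d_lift + d_lift ** 2)
proj = rng.normal(size=(latent_dim, feat_dim)) / np.sqrt(feat_dim)

z = windowed_embedding(path, window, lift, proj)
print(z.shape)  # (8,)
```

Reducing the channel count before the signature (here from 6 to 3) is what keeps the per-window feature size manageable, since the depth-2 feature count grows quadratically in the number of channels.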

In the latent core, a hidden state evolves via a controlled ODE or, for RDEs, a rough differential equation driven by the input path, with a decoder recovering the output from the latent state (Pradeleix et al., 15 Sep 2025, Alzahrani, 12 Oct 2025).
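One common way to write such dynamics is the Neural CDE form, given here in notation that is assumed for illustration rather than taken from the cited works (hidden state $z_t \in \mathbb{R}^h$, initial map $\zeta_\theta$, matrix-valued vector field $f_\theta$, decoder $g_\phi$):

$$z_0 = \zeta_\theta(X_0), \qquad dz_t = f_\theta(z_t)\, dX_t, \qquad \hat{y} = g_\phi(z_T),$$

where $f_\theta(z) \in \mathbb{R}^{h \times d}$, so that $f_\theta(z_t)\, dX_t$ is a matrix-vector product. For a driver of bounded variation the integral is a Riemann–Stieltjes integral; for rougher drivers it is interpreted through rough path theory.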

In "neural RDE" (NRDE) backbones, the update is driven by increments of the log-signature over a coarser grid, using the log-ODE method. The learned vector field is typically a small feedforward network, and integration is performed over each window with a standard ODE solver, where backpropagation is memory-efficient via continuous adjoint methods (Fang et al., 2023, Morrill et al., 2020).
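A minimal NumPy sketch of a log-ODE style forward pass follows, under several assumptions: the log-signature is truncated at depth 2 (increment plus Lévy areas), a randomly initialised two-layer tanh network stands in for the trained vector field, and a fixed-step midpoint rule replaces an adaptive solver. The names `logsig_depth2`, `make_vector_field`, and `nrde_forward` are hypothetical, and no gradient computation is shown.

```python
import numpy as np

def logsig_depth2(path):
    """Depth-2 log-signature: total increment plus Levy areas for i < j."""
    d = path.shape[1]
    s1, s2 = np.zeros(d), np.zeros((d, d))
    for dx in np.diff(path, axis=0):
        s2 += np.outer(s1, dx) + 0.5 * np.outer(dx, dx)  # Chen's identity
        s1 += dx
    area = 0.5 * (s2 - s2.T)
    return np.concatenate([s1, area[np.triu_indices(d, k=1)]])

def make_vector_field(hidden, logsig_dim, rng, width=16):
    """Random two-layer tanh network mapping z to a (hidden x logsig_dim) matrix."""
    w1 = rng.normal(size=(width, hidden)) / np.sqrt(hidden)
    w2 = rng.normal(size=(hidden * logsig_dim, width)) / np.sqrt(width)
    def f(z):
        return np.tanh(w2 @ np.tanh(w1 @ z)).reshape(hidden, logsig_dim)
    return f

def nrde_forward(path, window, z0, f, substeps=4):
    """One ODE solve per window, driven by that window's log-signature."""
    z = z0.copy()
    h = 1.0 / substeps
    for start in range(0, path.shape[0] - 1, window):
        ls = logsig_depth2(path[start:start + window + 1])
        # Solve dz/du = f(z) @ ls for u in [0, 1] with the explicit midpoint rule.
        for _ in range(substeps):
            k1 = f(z) @ ls
            z = z + h * (f(z + 0.5 * h * k1) @ ls)
    return z

rng = np.random.default_rng(0)
d, hidden, window = 2, 4, 20
path = 0.1 * np.cumsum(rng.normal(size=(81, d)), axis=0)  # 80 increments
f = make_vector_field(hidden, logsig_dim=d + d * (d - 1) // 2, rng=rng)
z0 = np.zeros(hidden)
z_T = nrde_forward(path, window, z0, f)
print(z_T.shape)  # (4,)
```

With 80 increments and a window of 20, the hidden state is advanced by four ODE solves rather than 80 pointwise updates, which is the source of the runtime and memory savings on long series.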

3. Universality, Expressivity, and Theoretical Guarantees

Signatures serve as the continuous-time analogue of RNN hidden states: any RNN applied to continuously sampled data is approximable by a linear transformation of the path signature, establishing the universality of signature features (Pradeleix et al., 15 Sep 2025, Fermanian et al., 2021). The unique representation property ensures no information loss (in the time-augmented setting). The factorial decay governs truncation error, making low-order truncations practical (Pradeleix et al., 15 Sep 2025).

The role of random neural vector fields in RDEs is clarified by signature reconstruction theory. If the hidden dimension exceeds the signature order, and depth-two neural vector fields with a real-analytic activation are used, the RDE endpoint map can recover the signature features of that order, with the number of recoverable features growing exponentially (Glückstad et al., 5 Feb 2025).

Moreover, in the infinite-width and -depth limit, controlled ResNets and Neural CDEs (with identity activation) converge to signature kernel machines, with model outputs determined solely by the inner product of signatures ("signature kernel"), justifying their interpretation as kernel mean embeddings in the signature space (Cirone et al., 2023, Fermanian et al., 2021).

4. Implementation, Scalability, and Stability

Signature (or log-signature) truncation level $N$ is typically kept small in practical tasks. Feature growth is exponential in $N$: for a $d$-dimensional path the dimension scales as $O(d^N)$ for signatures, and more slowly, though still exponentially, for log-signatures. Efficient implementations employ dedicated libraries (e.g., iisignature, signatory) supporting CPU/GPU batch processing.
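To make the growth rates concrete, the sketch below counts features directly. The signature count sums $d^k$ over levels $1$ to $N$ (excluding the constant level-0 term), and the log-signature count uses Witt's formula for the dimensions of the free Lie algebra. The function names are hypothetical and pure Python is used, so nothing here depends on a signature library.

```python
def mobius(n):
    """Moebius function, computed by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # n has a squared prime factor
            result = -result
        p += 1
    if n > 1:
        result = -result          # one remaining prime factor
    return result

def signature_dim(d, depth):
    """Number of signature terms at levels 1..depth for a d-dimensional path."""
    return sum(d ** k for k in range(1, depth + 1))

def logsignature_dim(d, depth):
    """Number of log-signature terms at levels 1..depth (Witt's formula)."""
    total = 0
    for k in range(1, depth + 1):
        level = sum(mobius(k // i) * d ** i for i in range(1, k + 1) if k % i == 0)
        total += level // k
    return total

print(signature_dim(3, 4), logsignature_dim(3, 4))  # 120 32
```

For three channels at depth 4, the log-signature needs 32 coordinates against 120 for the signature, which is why log-signatures are the usual choice when the features drive an ODE solver.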

For high-dimensional or long paths, windowed embedding and dimension reduction are standard; linear projections are commonly applied before signature computation for scalability in high-dimensional path spaces (Pradeleix et al., 15 Sep 2025, Fang et al., 2023, Alzahrani, 12 Oct 2025).

Signature-based models avoid the vanishing/exploding gradients associated with deep RNNs, as signature features are computed directly and algebraically rather than through recurrent unrolling. Adjoint-based gradient computation in neural ODE/RDE solvers enables constant memory scaling with respect to sequence length, compared to the linear scaling of backpropagation through time in classic RNNs (Morrill et al., 2020, Fang et al., 2023).

5. Empirical Performance and Applications

Empirical results demonstrate the superior performance, efficiency, and robustness of signature and neural RDE backbones in diverse domains:

  • In controlled dynamical systems (delayed Lotka–Volterra, spiral DDE, FitzHugh–Nagumo, Rössler), signature-based encoders reduced RMSE by up to 50% versus GRU encoders and exhibited lower variance across seeds. Training was faster per epoch and more robust to noise and history truncation (Pradeleix et al., 15 Sep 2025).
  • For path-dependent PDEs/BSDEs in high dimensions, neural RDEs with log-signature features drastically reduced error and memory usage compared to LSTM+signature baselines, with stable memory cost (Fang et al., 2023).
  • In financial option pricing and portfolio optimization, signature–neural RDE solvers outperformed both Neural CDEs and RNNs in relative error, tail risk (CVaR), and HJB residuals as the dimension increased, with ablations identifying a narrow range of signature depths as optimal in practice (Alzahrani, 12 Oct 2025).
  • For very long time series (length up to 17k), neural RDEs achieved higher predictive performance and >10$\times$ runtime and memory reduction compared to full-sequence CDEs and RNN baselines (Morrill et al., 2020).

Continuous-time models such as Neural ODEs, Neural CDEs, and Neural SDEs can be viewed as special cases or natural partners of the signature–RDE backbone, differing in the nature of their vector fields and drivers. Windowed and fully non-Markovian architectures (e.g., latent ODEs with signature encoders, stochastic control solvers) benefit directly from the universality and lossless memory of signatures (Pradeleix et al., 15 Sep 2025, Fang et al., 2023, Fermanian et al., 2021).

Multiple works establish direct correspondence between RNNs and kernel machines on path signatures: residual RNNs in the continuous-time limit reduce to linear predictors on signature features, with associated generalization bounds in signature RKHS (Fermanian et al., 2021). Controlled ResNets with random initialization converge—under proper width and depth scaling—to neural signature kernels, capturing all the high-order interaction statistics indexed by the choice of activation. When activation is identity, the signature kernel emerges explicitly; for nonlinearities, a kernel PDE governs the process, motivating the notion of "neural signature kernels" (Cirone et al., 2023).

7. Limitations and Future Directions

Current challenges include combinatorial growth in feature dimension with path/channel count and signature/log-signature order, which necessitates adaptive truncation, sparsity-promoting schemes, or log-signature compression (Pradeleix et al., 15 Sep 2025, Fang et al., 2023). Selecting optimal truncation order and combining signature levels in a data-dependent, automatic fashion remain active areas of research.

For true rough path–driven models (stochastic paths or infinite-dimensional cases), further theoretical advances and efficient numerical schemes are needed. Noted future directions include extending neural RDEs to stochastic drivers, to infinite-dimensional paths, and to real-world partially observed control problems with missing or highly irregular data (Pradeleix et al., 15 Sep 2025, Alzahrani, 12 Oct 2025).

