
Random Fourier Signature Features (RFSF)

Updated 4 December 2025
  • Random Fourier Signature Features (RFSF) are a scalable, randomized approximation of the signature kernel that captures high-order sequence interactions via tensorized increments.
  • The method employs random Fourier features together with projection variants (RFSF-DP and RFSF-TRP) to reduce the quadratic cost in sequence length and dataset size to linear time while retaining rigorous error bounds.
  • Empirical results show that RFSF achieves competitive classification accuracy and can efficiently process large datasets with up to one million sequences using GPU acceleration.

Random Fourier Signature Features (RFSF) are scalable, randomized feature-map approximations to the signature kernel for sequence data. The signature kernel, derived from tensor algebras, offers a powerful similarity measure for sequences by capturing higher-order interactions through the sequence's signature. Exact computation of the signature kernel is computationally intensive, scaling quadratically in both sequence length and dataset size. RFSF employs random Fourier features (RFF) to build a linear-time, unbiased approximation of the signature kernel with rigorous uniform error bounds. Further speedups and dimensionality reductions are achieved via the projection-based variants Diagonal Projection (RFSF-DP) and Tensor Random Projection (RFSF-TRP), supporting empirical scalability to datasets of up to one million sequences while retaining high statistical fidelity (Toth et al., 2023).

1. Signature Kernel Foundations

Let $X$ denote the input space (typically $\mathbb{R}^d$) and $k: X \times X \to \mathbb{R}$ a static positive-definite kernel with reproducing kernel Hilbert space (RKHS) $\mathcal{H}$ and feature map $\varphi: X \to \mathcal{H}$, $\varphi(x) = k(x, \cdot)$. A discrete sequence $x = (x_1, \ldots, x_L) \in X^L$ is "lifted" to the free tensor algebra $T(\mathcal{H}) = \bigoplus_{m=0}^\infty \mathcal{H}^{\otimes m}$ using the discrete signature map:

$$S(x) = \prod_{i=1}^{L-1} \bigl(1 \oplus (\varphi(x_{i+1}) - \varphi(x_i))\bigr) = (S_0, S_1, S_2, \ldots),$$

where $S_0 = 1$ and each level $m$ contains all $m$-fold tensor products of increments in the RKHS. The truncated signature kernel of depth $M$ is defined as:

$$K_{\text{sig}}^{\leq M}(x, y) := \langle S(x), S(y) \rangle_{T(\mathcal{H})} = \sum_{m=0}^M \langle S_m(x), S_m(y) \rangle_{\mathcal{H}^{\otimes m}}.$$

An equivalent "kernel trick" expansion involves second-order cross-differences $\delta^2_{i,j}$ of the base kernel $k$, which results in a computational cost of $O(L^2 N^2)$ for $N$ sequences of length $L$ (Toth et al., 2023).

2. Random Fourier Signature Features: Construction and Theoretical Guarantees

RFSF approximates $K_{\text{sig}}^{\leq M}$ with a finite-dimensional randomized map based on Bochner's theorem. For a continuous, translation-invariant kernel $k_0(x, y) = k_0(x - y)$,

$$k_0(\tau) = \int_{\mathbb{R}^d} e^{i \omega^\top \tau} \, d\Lambda(\omega),$$

where $\Lambda$ is the spectral measure. Standard random Fourier features for $k_0$ are

$$\varphi_{\text{RFF}}(x) = \frac{1}{\sqrt{d}} \bigl[\cos(\omega_1^\top x), \ldots, \cos(\omega_d^\top x), \sin(\omega_1^\top x), \ldots, \sin(\omega_d^\top x)\bigr]^\top,$$

with $\omega_i \sim \Lambda$ drawn i.i.d.
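For concreteness, here is a minimal sketch of the standard RFF map for the Gaussian base kernel (whose spectral measure $\Lambda$ is Gaussian); the function name and the bandwidth parameter `sigma` are assumptions for illustration.

```python
import numpy as np

def rff_map(X, num_freq, sigma=1.0, rng=None):
    """Random Fourier features for k0(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    X: (n, d_in). Returns (n, 2 * num_freq) features whose inner products approximate k0;
    the omega_i are i.i.d. draws from the spectral measure N(0, sigma^{-2} I)."""
    rng = np.random.default_rng() if rng is None else rng
    W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], num_freq))  # columns: omega_1, ..., omega_d
    proj = X @ W
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=1) / np.sqrt(num_freq)
```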

For RFSF, at each signature level $m$, independent RFF maps $\varphi_m$ are drawn:

  • $W^{(m)}$ is a $d \times d$ matrix of i.i.d. draws from $\Lambda$,
  • $\varphi_m(x) = \frac{1}{\sqrt{d}} \bigl[\cos(W^{(m)\top}x), \sin(W^{(m)\top}x)\bigr] \in \mathbb{R}^{2d}$,
  • The RFSF signature $\widehat{S}_m(x)$ accumulates tensorized increments $\delta\varphi_m(x_i) = \varphi_m(x_{i+1}) - \varphi_m(x_i)$.

The RFSF kernel is

$$\widehat{K}_{\text{sig}}^{\leq M}(x, y) = \sum_{m=0}^M \langle \widehat{S}_m(x), \widehat{S}_m(y) \rangle,$$

an unbiased estimator of $K_{\text{sig}}^{\leq M}$; i.e., $\mathbb{E}\bigl[\widehat{K}_{\text{sig}}^{\leq M}(x, y)\bigr] = K_{\text{sig}}^{\leq M}(x, y)$.
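The construction can be sketched directly as a prefix-sum recursion over tensorized increments. The following NumPy sketch forms the level tensors explicitly, so it is only practical for small $d$ and $M$; the per-slot assignment of the independent RFF maps and all names are illustrative assumptions rather than the paper's reference implementation.

```python
import numpy as np

def rfsf_features(x, rff_maps):
    """Full RFSF features for one sequence x of shape (L, d_in).
    rff_maps: list of M independent feature maps (e.g. rff_map with different seeds),
    one per tensor slot; each maps (L, d_in) -> (L, 2d).
    Returns [S_1(x), ..., S_M(x)], where level m is a flattened (2d)^m tensor."""
    deltas = [np.diff(f(x), axis=0) for f in rff_maps]   # delta phi_m(x_i), each (L-1, 2d)
    L1 = deltas[0].shape[0]
    levels = []
    prev = np.ones((L1 + 1, 1))                          # U_0[j] = 1 (empty tensor level)
    for dphi in deltas:
        # U_m[j] = U_m[j-1] + U_{m-1}[j-1] (outer) delta phi_m(x_j)
        contrib = (prev[:-1, :, None] * dphi[:, None, :]).reshape(L1, -1)
        U = np.vstack([np.zeros((1, contrib.shape[1])), contrib.cumsum(axis=0)])
        levels.append(U[-1])                             # S_m(x) = U_m[L-1]
        prev = U
    return levels

def rfsf_kernel(levels_x, levels_y):
    """hat K_sig^{<=M}(x, y) = 1 + sum_m <S_m(x), S_m(y)>."""
    return 1.0 + sum(float(sx @ sy) for sx, sy in zip(levels_x, levels_y))
```

Because level $m$ occupies $(2d)^m$ entries, even modest $d$ makes the higher levels very large; the projected variants below avoid materializing these tensors.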

Uniform sup-norm approximation bounds are established under Bernstein conditions on $\Lambda$ and compactness assumptions on $X$ (Theorem 3.3), with probability tails (for $\epsilon < \beta_{d,V}^{(m)}$)

$$\mathbb{P}\Bigl[\sup_{x, y :\, \|x\|_1, \|y\|_1 \leq V} \bigl|K_{\text{sig}}^{(m)}(x, y) - \widehat{K}_{\text{sig}}^{(m)}(x, y)\bigr| \geq \epsilon\Bigr] \leq C_{d,X}\, m \left(\frac{\beta_{d,V}^{(m)}}{\epsilon}\right)^{d/(d+1)} \exp\left(-\frac{d}{2(d+1)(S^2 + R)}\, \frac{\epsilon}{\bigl(\beta_{d,V}^{(m)}\bigr)^{2}}\right),$$

where $\beta_{d,V}^{(m)} = 2mV^2 \max(L^2, 1) \max(\sigma_\Lambda^2, d)^m$.

3. Algorithmic Variants via Tensor Projections

The feature space $(\mathbb{R}^{2d})^{\otimes m}$ for $\widehat{S}_m(x)$ has dimension $(2d)^m$, which becomes prohibitive for moderate $m$. Two scalable dimensionality-reduction variants are introduced:

RFSF-DP (Diagonal-Projection):

  • Considers only "diagonal" tensor components, reducing the feature size per level to $d\,2^m$,
  • Dynamic programming computes all levels in $O(NLd\,2^M)$ time and $O(Nd\,2^M)$ memory (see the sketch after this list),
  • Error bound (Theorem 3.5) for fixed $x, y$: $\mathbb{P}\bigl[|\widehat{K}_{\text{sig}}^{\text{DP},(m)}(x, y) - K_{\text{sig}}^{(m)}(x, y)| \geq \epsilon\bigr] \leq 2 \exp\bigl\{-\tfrac{1}{2}\min\bigl((\sqrt{d}\,\epsilon / C)^2, (d\,\epsilon / C)^{1/m}\bigr)\bigr\}$.
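The sketch below illustrates one plausible reading of the diagonal projection consistent with the stated $d\,2^m$ feature size: each of the $d$ random frequencies is kept diagonal across tensor slots, while the per-slot cos/sin pair is still tensorized, giving $2^m$ entries per frequency. The slot layout and the $1/\sqrt{d}$ normalization are assumptions; the reference construction is in Toth et al. (2023) and the KSig library.

```python
import numpy as np

def rfsf_dp_features(x, freq_mats):
    """Diagonally-projected RFSF features for one sequence x of shape (L, d_in).
    freq_mats: list of M frequency matrices W^(1), ..., W^(M), each (d_in, d), with
    i.i.d. columns drawn from the spectral measure. The k-th diagonal feature only ever
    uses the k-th column of each W^(l), so level m has d * 2^m entries in total."""
    d = freq_mats[0].shape[1]
    # Per-slot lifted increments for every frequency: (L-1, d, 2) with a (cos, sin) pair.
    deltas = []
    for W in freq_mats:
        proj = x @ W                                              # (L, d)
        phi = np.stack([np.cos(proj), np.sin(proj)], axis=-1)     # (L, d, 2)
        deltas.append(np.diff(phi, axis=0))                       # (L-1, d, 2)
    L1 = deltas[0].shape[0]
    levels = []
    prev = np.ones((L1 + 1, d, 1))                                # U_0[j, k] = 1
    for dphi in deltas:
        # Per-frequency recursion: U_m[j, k] = U_m[j-1, k] + U_{m-1}[j-1, k] (outer) dphi[j-1, k].
        contrib = (prev[:-1, :, :, None] * dphi[:, :, None, :]).reshape(L1, d, -1)
        U = np.concatenate([np.zeros((1, d, contrib.shape[-1])), contrib.cumsum(axis=0)], axis=0)
        levels.append((U[-1] / np.sqrt(d)).ravel())               # average the d diagonal copies
        prev = U
    return levels                                                 # list of arrays of size d * 2^m
```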

RFSF-TRP (Tensor Random Projection):

  • Applies Johnson–Lindenstrauss-style CP-rank-1 sketches using random Gaussian projections (see the sketch after this list),
  • Each level yields a $d$-dimensional feature, for a total feature dimension of $O(Md)$,
  • Time complexity $O(NLMd^2)$,
  • The tail bound (Theorem 3.7) is of hypercontractive type: $\mathbb{P}\bigl[|\widehat{K}_{\text{sig}}^{\text{TRP},(m)}(x, y) - K_{\text{sig}}^{(m)}(x, y)| \geq \epsilon\bigr] \leq C_{d,\Lambda}\exp\left(-\left[\frac{m^2 d^{1/(2m)} \epsilon^{1/m}}{2\sqrt{2}\, e^3 R\, \|x\|_1 \|y\|_1}\right]^{1/2}\right)$.
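Below is a minimal sketch of the tensor-random-projection idea: each level of the RFSF tensor is projected onto a small number of CP-rank-1 Gaussian tensors, and the projection is folded into the same prefix-sum recursion so the full $(2d)^m$ tensor is never materialized. The Gaussian sketching matrices, normalization, and naming are illustrative assumptions.

```python
import numpy as np

def rfsf_trp_features(x, rff_maps, proj_mats):
    """Tensor-random-projection RFSF features for one sequence x of shape (L, d_in).
    rff_maps:  list of M per-slot RFF maps, each (L, d_in) -> (L, 2d).
    proj_mats: list of M Gaussian matrices, each (2d, num_proj), shared across sequences;
               the r-th columns across slots form one CP-rank-1 projection tensor.
    Returns concatenated level features of total size M * num_proj."""
    deltas = [np.diff(f(x), axis=0) for f in rff_maps]        # each (L-1, 2d)
    num_proj = proj_mats[0].shape[1]
    L1 = deltas[0].shape[0]
    levels = []
    prev = np.ones((L1 + 1, num_proj))                        # projected U_0
    for dphi, G in zip(deltas, proj_mats):
        proj = dphi @ G                                       # <g_r^{(m)}, delta phi_m(x_j)>, (L-1, num_proj)
        contrib = prev[:-1] * proj
        U = np.vstack([np.zeros((1, num_proj)), contrib.cumsum(axis=0)])
        levels.append(U[-1] / np.sqrt(num_proj))              # JL-style averaging over the sketches
        prev = U
    return np.concatenate(levels)                             # total dimension M * num_proj
```

The same RFF maps and projection matrices must be reused for every sequence so that inner products of the resulting features estimate the signature kernel.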
The variants are summarized below:

| Variant   | Feature Dimension | Time Complexity | Memory       |
|-----------|-------------------|-----------------|--------------|
| Full RFSF | $(2d)^M$          | $O(NLd^M)$      | --           |
| RFSF-DP   | $d\,2^M$          | $O(NLd\,2^M)$   | $O(Nd\,2^M)$ |
| RFSF-TRP  | $Md$              | $O(NLMd^2)$     | $O(NMd)$     |

4. Empirical Performance and Scalability

RFSF and its variants are benchmarked on multivariate time-series classification from the UEA archive and on large-scale datasets. For moderate-size datasets ($N \leq 1000$, $L \leq 18000$, $d \leq 30$):

  • RFSF-DP and RFSF-TRP achieve average accuracies of $0.740$ and $0.738$, close to the full signature kernel ($0.756$), and match or outperform the signature kernel on certain tasks,
  • RFSF methods are $5$–$10\times$ faster than exact quadratic-time signature kernels.

On large-scale tasks ($N$ up to $10^6$):

  • Only feature-based methods (RFSF-DP, RFSF-TRP, RWS, RFF) are feasible,
  • RFSF-TRP ranks first in accuracy (average $0.699$) against RWS ($0.655$) and RFF ($0.635$), and exhibits the lowest average rank,
  • For the SITS1M task ($N = 10^6$), RFSF-TRP trains in approximately $3$ minutes (on GPU), compared to $>2$ hours for RWS (Toth et al., 2023).

5. Limitations and Practical Considerations

The RFSF approximation guarantees hold only with high probability; the number of random features $d$ must be chosen sufficiently large relative to the desired error tolerance $\epsilon$ and truncation depth $M$. At higher $M$:

  • TRP’s tail exponent degrades as $d^{1/(2m)}$,
  • DP’s error bound weakens polynomially in $m$.

Other considerations include:

  • Choice of the static feature embedding and truncation depth $M$ (with $M = 3$–$5$ typically sufficient),
  • Randomized variants of RFF (orthogonal, quasi-Monte-Carlo, leverage-score reweighting) and alternative embeddings are viable,
  • All algorithms are efficiently vectorizable on GPUs (e.g., with the KSig library),
  • Memory usage: $O(Nd\,2^M)$ for DP, $O(NMd)$ for TRP,
  • Practical recommendations: $d = 500$–$2000$ for $M \leq 5$; TRP for very large $N$ and low memory; DP for moderate $M$; full RFSF if $O(d^M)$ is affordable (an end-to-end sketch follows this list).
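Putting these recommendations together, a hypothetical end-to-end pipeline (reusing the sketch functions defined above and a scikit-learn classifier) might look as follows; all names and parameter values are indicative only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
M, num_freq, num_proj, sigma = 4, 1000, 512, 1.0   # depth and feature sizes per the guidance above

# One independent RFF map per tensor slot, shared across all sequences.
seeds = rng.integers(0, 2**31, size=M)
rff_maps = [lambda z, s=s: rff_map(z, num_freq, sigma=sigma, rng=np.random.default_rng(s)) for s in seeds]
# Shared Gaussian projection matrices for the TRP sketch (2 * num_freq inputs per slot).
proj_mats = [rng.normal(size=(2 * num_freq, num_proj)) for _ in range(M)]

def featurize(sequences):
    # sequences: list of arrays of shape (L_i, d_in); output: (N, M * num_proj) feature matrix.
    return np.stack([rfsf_trp_features(x, rff_maps, proj_mats) for x in sequences])

# X_train / X_test are lists of sequences, y_train / y_test their labels (placeholders).
# clf = LogisticRegression(max_iter=1000).fit(featurize(X_train), y_train)
# print(clf.score(featurize(X_test), y_test))
```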

6. Extensions and Significance

RFSF provides a scheme to approximate infinite-dimensional sequence similarity using finite-dimensional, tractable random features. Computational gains are realized by reducing the complexity from $O(N^2 L^2)$ to $O(NL \cdot \mathrm{poly}(d, M))$, without sacrificing the uniform error concentration properties of classical RFFs. The framework is extensible to improved random feature distributions (e.g., orthogonal/quasi-Monte-Carlo, leverage-score-based sampling) and can employ alternative kernel randomizations. Streaming implementations and GPU vectorization are straightforward given the structure of the algorithms. Empirically, RFSF achieves state-of-the-art or near state-of-the-art accuracy while supporting applications to datasets unattainable by competing kernel methods (Toth et al., 2023).

References

Toth et al. (2023). Random Fourier Signature Features.