
Random Fourier Signature Features (RFSF)

Updated 4 December 2025
  • Random Fourier Signature Features (RFSF) are a scalable, randomized approximation of the signature kernel that captures high-order sequence interactions via tensorized increments.
  • The method employs random Fourier features together with projection variants (RFSF-DP and RFSF-TRP) to reduce the quadratic cost in sequence length and dataset size to linear time while retaining rigorous error bounds.
  • Empirical results show that RFSF achieves competitive classification accuracy and can efficiently process large datasets with up to one million sequences using GPU acceleration.

Random Fourier Signature Features (RFSF) are scalable, randomized feature-map approximations to the signature kernel for sequence data. The signature kernel, derived from tensor algebras, offers a powerful similarity measure for sequences by capturing higher-order interactions through the sequence's signature. Exact computation of the signature kernel is computationally intensive, scaling quadratically in both sequence length and dataset size. RFSF employs random Fourier features (RFF) to build a linear-time, unbiased approximation of the signature kernel with rigorous uniform error bounds. Further speedups and dimensionality reductions are achieved via the projection-based variants Diagonal Projection (RFSF-DP) and Tensor Random Projection (RFSF-TRP), supporting empirical scalability to datasets of up to one million sequences while retaining high statistical fidelity (Toth et al., 2023).

1. Signature Kernel Foundations

Let $X$ denote the input space (typically $\mathbb{R}^d$) and $k: X \times X \to \mathbb{R}$ a static positive-definite kernel with reproducing kernel Hilbert space (RKHS) $\mathcal{H}$ and feature map $\varphi: X \to \mathcal{H}$, $\varphi(x) = k(x, \cdot)$. A discrete sequence $x = (x_1, \ldots, x_L) \in X^L$ is "lifted" to the free tensor algebra $T(\mathcal{H}) = \bigoplus_{m=0}^\infty \mathcal{H}^{\otimes m}$ using the discrete signature map:

$$S(x) = \prod_{i=1}^{L-1} \bigl(1 \oplus (\varphi(x_{i+1}) - \varphi(x_i))\bigr) = (S_0, S_1, S_2, \ldots),$$

where $S_0 = 1$ and each level $m$ contains all $m$-fold tensor products of increments in the RKHS. The truncated signature kernel of depth $M$ is defined as:

$$K_{\text{sig}}^{\leq M}(x, y) := \langle S(x), S(y) \rangle_{T(\mathcal{H})} = \sum_{m=0}^M \langle S_m(x), S_m(y) \rangle_{\mathcal{H}^{\otimes m}}.$$

An equivalent "kernel trick" expansion involves second-order cross-differences $\delta^2_{i,j}$ of the base kernel $k$, which results in a computational cost of $O(L^2 N^2)$ for $N$ sequences of length $L$ (Toth et al., 2023).

2. Random Fourier Signature Features: Construction and Theoretical Guarantees

RFSF approximates $K_{\text{sig}}^{\leq M}$ with a finite-dimensional randomized map based on Bochner's theorem. For a continuous, translation-invariant kernel $k_0(x, y) = k_0(x - y)$,

$$k_0(\tau) = \int_{\mathbb{R}^d} e^{i \omega^\top \tau} \, d\Lambda(\omega),$$

where $\Lambda$ is the spectral measure. Standard random Fourier features for $k_0$ are

$$\varphi_{\text{RFF}}(x) = \frac{1}{\sqrt{d}} \bigl[\cos(\omega_1^\top x), \ldots, \cos(\omega_d^\top x), \sin(\omega_1^\top x), \ldots, \sin(\omega_d^\top x)\bigr]^\top,$$

with $\omega_i \sim \Lambda$ drawn i.i.d.
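For concreteness, here is a minimal sketch of the standard RFF map for the Gaussian base kernel (whose spectral measure $\Lambda$ is Gaussian); the function name and the bandwidth parameter `sigma` are assumptions for illustration.

```python
import numpy as np

def rff_map(X, num_freq, sigma=1.0, rng=None):
    """Random Fourier features for k0(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    X: (n, d_in). Returns (n, 2 * num_freq) features whose inner products approximate k0;
    the omega_i are i.i.d. draws from the spectral measure N(0, sigma^{-2} I)."""
    rng = np.random.default_rng() if rng is None else rng
    W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], num_freq))  # columns: omega_1, ..., omega_d
    proj = X @ W
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=1) / np.sqrt(num_freq)
```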

For RFSF, at each signature level $m$, independent RFF maps $\varphi_m$ are drawn:

  • $W^{(m)}$ is a $d \times d$ matrix of i.i.d. draws from $\Lambda$,
  • $\varphi_m(x) = \frac{1}{\sqrt{d}} \bigl[\cos(W^{(m)\top}x), \sin(W^{(m)\top}x)\bigr] \in \mathbb{R}^{2d}$,
  • The RFSF signature $\widehat{S}_m(x)$ accumulates tensorized increments $\delta\varphi_m(x_i) = \varphi_m(x_{i+1}) - \varphi_m(x_i)$.

The RFSF kernel is

$$\widehat{K}_{\text{sig}}^{\leq M}(x, y) = \sum_{m=0}^M \langle \widehat{S}_m(x), \widehat{S}_m(y) \rangle,$$

an unbiased estimator of $K_{\text{sig}}^{\leq M}$; i.e., $\mathbb{E}\bigl[\widehat{K}_{\text{sig}}^{\leq M}(x, y)\bigr] = K_{\text{sig}}^{\leq M}(x, y)$.
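The construction can be sketched directly as a prefix-sum recursion over tensorized increments. The following NumPy sketch forms the level tensors explicitly, so it is only practical for small $d$ and $M$; the per-slot assignment of the independent RFF maps and all names are illustrative assumptions rather than the paper's reference implementation.

```python
import numpy as np

def rfsf_features(x, rff_maps):
    """Full RFSF features for one sequence x of shape (L, d_in).
    rff_maps: list of M independent feature maps (e.g. rff_map with different seeds),
    one per tensor slot; each maps (L, d_in) -> (L, 2d).
    Returns [S_1(x), ..., S_M(x)], where level m is a flattened (2d)^m tensor."""
    deltas = [np.diff(f(x), axis=0) for f in rff_maps]   # delta phi_m(x_i), each (L-1, 2d)
    L1 = deltas[0].shape[0]
    levels = []
    prev = np.ones((L1 + 1, 1))                          # U_0[j] = 1 (empty tensor level)
    for dphi in deltas:
        # U_m[j] = U_m[j-1] + U_{m-1}[j-1] (outer) delta phi_m(x_j)
        contrib = (prev[:-1, :, None] * dphi[:, None, :]).reshape(L1, -1)
        U = np.vstack([np.zeros((1, contrib.shape[1])), contrib.cumsum(axis=0)])
        levels.append(U[-1])                             # S_m(x) = U_m[L-1]
        prev = U
    return levels

def rfsf_kernel(levels_x, levels_y):
    """hat K_sig^{<=M}(x, y) = 1 + sum_m <S_m(x), S_m(y)>."""
    return 1.0 + sum(float(sx @ sy) for sx, sy in zip(levels_x, levels_y))
```

Because level $m$ occupies $(2d)^m$ entries, even modest $d$ makes the higher levels very large; the projected variants below avoid materializing these tensors.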

Uniform sup-norm approximation bounds are established under Bernstein conditions on $\Lambda$ and compactness assumptions on $X$ (Theorem 3.3), with probability tails (for $\epsilon < \beta_{d,V}^{(m)}$)

$$\mathbb{P}\Bigl[\sup_{x, y :\, \|x\|_1, \|y\|_1 \leq V} \bigl|K_{\text{sig}}^{(m)}(x, y) - \widehat{K}_{\text{sig}}^{(m)}(x, y)\bigr| \geq \epsilon\Bigr] \leq C_{d,X}\, m \left(\frac{\beta_{d,V}^{(m)}}{\epsilon}\right)^{d/(d+1)} \exp\left(-\frac{d}{2(d+1)(S^2 + R)}\, \frac{\epsilon}{\bigl(\beta_{d,V}^{(m)}\bigr)^{2}}\right),$$

where $\beta_{d,V}^{(m)} = 2mV^2 \max(L^2, 1) \max(\sigma_\Lambda^2, d)^m$.

3. Algorithmic Variants via Tensor Projections

The feature space $(\mathbb{R}^{2d})^{\otimes m}$ for $\widehat{S}_m(x)$ has dimension $(2d)^m$, which becomes prohibitive for moderate $m$. Two scalable dimensionality-reduction variants are introduced:

RFSF-DP (Diagonal-Projection):

  • Considers only "diagonal" tensor components, reducing the feature size per level to $d\,2^m$,
  • Dynamic programming computes all levels in $O(NLd\,2^M)$ time and $O(Nd\,2^M)$ memory (see the sketch after this list),
  • Error bound (Theorem 3.5) for fixed $x, y$: $\mathbb{P}\bigl[|\widehat{K}_{\text{sig}}^{\text{DP},(m)}(x, y) - K_{\text{sig}}^{(m)}(x, y)| \geq \epsilon\bigr] \leq 2 \exp\bigl\{-\tfrac{1}{2}\min\bigl((\sqrt{d}\,\epsilon / C)^2, (d\,\epsilon / C)^{1/m}\bigr)\bigr\}$.
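The sketch below illustrates one plausible reading of the diagonal projection consistent with the stated $d\,2^m$ feature size: each of the $d$ random frequencies is kept diagonal across tensor slots, while the per-slot cos/sin pair is still tensorized, giving $2^m$ entries per frequency. The slot layout and the $1/\sqrt{d}$ normalization are assumptions; the reference construction is in Toth et al. (2023) and the KSig library.

```python
import numpy as np

def rfsf_dp_features(x, freq_mats):
    """Diagonally-projected RFSF features for one sequence x of shape (L, d_in).
    freq_mats: list of M frequency matrices W^(1), ..., W^(M), each (d_in, d), with
    i.i.d. columns drawn from the spectral measure. The k-th diagonal feature only ever
    uses the k-th column of each W^(l), so level m has d * 2^m entries in total."""
    d = freq_mats[0].shape[1]
    # Per-slot lifted increments for every frequency: (L-1, d, 2) with a (cos, sin) pair.
    deltas = []
    for W in freq_mats:
        proj = x @ W                                              # (L, d)
        phi = np.stack([np.cos(proj), np.sin(proj)], axis=-1)     # (L, d, 2)
        deltas.append(np.diff(phi, axis=0))                       # (L-1, d, 2)
    L1 = deltas[0].shape[0]
    levels = []
    prev = np.ones((L1 + 1, d, 1))                                # U_0[j, k] = 1
    for dphi in deltas:
        # Per-frequency recursion: U_m[j, k] = U_m[j-1, k] + U_{m-1}[j-1, k] (outer) dphi[j-1, k].
        contrib = (prev[:-1, :, :, None] * dphi[:, :, None, :]).reshape(L1, d, -1)
        U = np.concatenate([np.zeros((1, d, contrib.shape[-1])), contrib.cumsum(axis=0)], axis=0)
        levels.append((U[-1] / np.sqrt(d)).ravel())               # average the d diagonal copies
        prev = U
    return levels                                                 # list of arrays of size d * 2^m
```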

RFSF-TRP (Tensor Random Projection):

  • Applies Johnson–Lindenstrauss-style CP-rank-1 sketches using random Gaussian projections (see the sketch after this list),
  • Each level yields a $d$-dimensional feature, for a total feature dimension of $O(Md)$,
  • Time complexity $O(NLMd^2)$,
  • The tail bound (Theorem 3.7) is of hypercontractive type: $\mathbb{P}\bigl[|\widehat{K}_{\text{sig}}^{\text{TRP},(m)}(x, y) - K_{\text{sig}}^{(m)}(x, y)| \geq \epsilon\bigr] \leq C_{d,\Lambda}\exp\left(-\left[\frac{m^2 d^{1/(2m)} \epsilon^{1/m}}{2\sqrt{2}\, e^3 R\, \|x\|_1 \|y\|_1}\right]^{1/2}\right)$.
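Below is a minimal sketch of the tensor-random-projection idea: each level of the RFSF tensor is projected onto a small number of CP-rank-1 Gaussian tensors, and the projection is folded into the same prefix-sum recursion so the full $(2d)^m$ tensor is never materialized. The Gaussian sketching matrices, normalization, and naming are illustrative assumptions.

```python
import numpy as np

def rfsf_trp_features(x, rff_maps, proj_mats):
    """Tensor-random-projection RFSF features for one sequence x of shape (L, d_in).
    rff_maps:  list of M per-slot RFF maps, each (L, d_in) -> (L, 2d).
    proj_mats: list of M Gaussian matrices, each (2d, num_proj), shared across sequences;
               the r-th columns across slots form one CP-rank-1 projection tensor.
    Returns concatenated level features of total size M * num_proj."""
    deltas = [np.diff(f(x), axis=0) for f in rff_maps]        # each (L-1, 2d)
    num_proj = proj_mats[0].shape[1]
    L1 = deltas[0].shape[0]
    levels = []
    prev = np.ones((L1 + 1, num_proj))                        # projected U_0
    for dphi, G in zip(deltas, proj_mats):
        proj = dphi @ G                                       # <g_r^{(m)}, delta phi_m(x_j)>, (L-1, num_proj)
        contrib = prev[:-1] * proj
        U = np.vstack([np.zeros((1, num_proj)), contrib.cumsum(axis=0)])
        levels.append(U[-1] / np.sqrt(num_proj))              # JL-style averaging over the sketches
        prev = U
    return np.concatenate(levels)                             # total dimension M * num_proj
```

The same RFF maps and projection matrices must be reused for every sequence so that inner products of the resulting features estimate the signature kernel.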
The variants are summarized below:

| Variant   | Feature Dimension | Time Complexity | Memory       |
|-----------|-------------------|-----------------|--------------|
| Full RFSF | $(2d)^M$          | $O(NLd^M)$      | --           |
| RFSF-DP   | $d\,2^M$          | $O(NLd\,2^M)$   | $O(Nd\,2^M)$ |
| RFSF-TRP  | $Md$              | $O(NLMd^2)$     | $O(NMd)$     |

4. Empirical Performance and Scalability

RFSF and its variants are benchmarked on multivariate time-series classification from the UEA archive and on large-scale datasets. For moderate-size datasets ($N \leq 1000$, $L \leq 18000$, $d \leq 30$):

  • RFSF-DP and RFSF-TRP achieve average accuracies of $0.740$ and $0.738$, close to the full signature kernel ($0.756$), and match or outperform the signature kernel on certain tasks,
  • RFSF methods are $5$–$10\times$ faster than exact quadratic-time signature kernels.

On large-scale tasks ($N$ up to $10^6$):

  • Only feature-based methods (RFSF-DP, RFSF-TRP, RWS, RFF) are feasible,
  • RFSF-TRP ranks first in accuracy (average $0.699$) against RWS ($0.655$) and RFF ($0.635$), and exhibits the lowest average rank,
  • For the SITS1M task ($N = 10^6$), RFSF-TRP trains in approximately $3$ minutes (on GPU), compared to $>2$ hours for RWS (Toth et al., 2023).

5. Limitations and Practical Considerations

The RFSF approximation guarantees hold only with high probability; the number of random features $d$ must be chosen sufficiently large relative to the desired error tolerance $\epsilon$ and truncation depth $M$. At higher $M$:

  • TRP’s tail exponent degrades as $d^{1/(2m)}$,
  • DP’s error bound weakens polynomially in $m$.

Other considerations include:

  • Choice of the static feature embedding and truncation depth $M$ (with $M = 3$–$5$ typically sufficient),
  • Randomized variants of RFF (orthogonal, quasi-Monte-Carlo, leverage-score reweighting) and alternative embeddings are viable,
  • All algorithms are efficiently vectorizable on GPUs (e.g., with the KSig library),
  • Memory usage: $O(Nd\,2^M)$ for DP, $O(NMd)$ for TRP,
  • Practical recommendations: $d = 500$–$2000$ for $M \leq 5$; TRP for very large $N$ and low memory; DP for moderate $M$; full RFSF if $O(d^M)$ is affordable (an end-to-end sketch follows this list).
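Putting these recommendations together, a hypothetical end-to-end pipeline (reusing the sketch functions defined above and a scikit-learn classifier) might look as follows; all names and parameter values are indicative only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
M, num_freq, num_proj, sigma = 4, 1000, 512, 1.0   # depth and feature sizes per the guidance above

# One independent RFF map per tensor slot, shared across all sequences.
seeds = rng.integers(0, 2**31, size=M)
rff_maps = [lambda z, s=s: rff_map(z, num_freq, sigma=sigma, rng=np.random.default_rng(s)) for s in seeds]
# Shared Gaussian projection matrices for the TRP sketch (2 * num_freq inputs per slot).
proj_mats = [rng.normal(size=(2 * num_freq, num_proj)) for _ in range(M)]

def featurize(sequences):
    # sequences: list of arrays of shape (L_i, d_in); output: (N, M * num_proj) feature matrix.
    return np.stack([rfsf_trp_features(x, rff_maps, proj_mats) for x in sequences])

# X_train / X_test are lists of sequences, y_train / y_test their labels (placeholders).
# clf = LogisticRegression(max_iter=1000).fit(featurize(X_train), y_train)
# print(clf.score(featurize(X_test), y_test))
```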

6. Extensions and Significance

RFSF provides a scheme to approximate infinite-dimensional sequence similarity using finite-dimensional, tractable random features. Computational gains are realized by reducing the complexity from $O(N^2 L^2)$ to $O(NL \cdot \mathrm{poly}(d, M))$, without sacrificing the uniform error concentration properties of classical RFFs. The framework is extensible to improved random feature distributions (e.g., orthogonal/quasi-Monte-Carlo, leverage-score-based sampling) and can employ alternative kernel randomizations. Streaming implementations and GPU vectorization are straightforward given the structure of the algorithms. Empirically, RFSF achieves state-of-the-art or near state-of-the-art accuracy while supporting applications to datasets unattainable by competing kernel methods (Toth et al., 2023).

References

Toth et al. (2023). Random Fourier Signature Features.