Random Fourier Signature Features
- Random Fourier Signature Features are a scalable method for approximating signature kernels, enabling efficient similarity measurement over sequential data.
- The approach integrates tensor algebra with level-wise independent Random Fourier Feature maps to provide unbiased estimators with strong uniform error guarantees.
- Reduction variants RFSF-DP and RFSF-TRP reduce computational and memory complexity, making the method practical for large-scale time series and high-dimensional data.
Random Fourier Signature Features (RFSF) provide a scalable framework for approximating the signature kernel—a powerful similarity measure for sequential data—by leveraging random Fourier feature (RFF) methods within the tensor algebra of signature representations. This combination yields unbiased, uniform-approximation estimators for kernel methods on sequences, reducing computational barriers associated with classic signature kernel computation and enabling practical application to very large datasets while preserving expressive kernel structure (Toth et al., 2023).
1. The Signature Kernel and Tensor Algebra
Given a metric space $\mathcal{X}$, a sequence $\mathbf{x} = (x_1, \dots, x_\ell)$ of points in $\mathcal{X}$, and a base kernel $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ with corresponding RKHS $\mathcal{H}$, the signature kernel encodes multilevel sequential interactions through the discrete signature map. The $m$-th level signature of $\mathbf{x}$ is a tensor in $\mathcal{H}^{\otimes m}$, constructed by iteratively taking tensor products of differences along the path. Truncating at level $M$ gives the truncated signature kernel

$$\mathsf{K}_{\leq M}(\mathbf{x}, \mathbf{y}) = \sum_{m=0}^{M} \; \sum_{\substack{i_1 < \cdots < i_m \\ j_1 < \cdots < j_m}} \; \prod_{p=1}^{m} \delta^2_{i_p, j_p} k,$$

where $\delta^2_{i,j} k = k(x_{i+1}, y_{j+1}) - k(x_{i+1}, y_j) - k(x_i, y_{j+1}) + k(x_i, y_j)$ denotes the second-order difference of $k$ (Toth et al., 2023). Computing the Gram matrix for $n$ sequences of length $\ell$ incurs $\mathcal{O}(n^2 \ell^2)$ time (up to factors in the truncation level and the cost of base kernel evaluations), rendering direct application infeasible at scale.
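As a concrete illustration, the following is a minimal NumPy sketch of this exact computation via the double-difference recursion (function names such as `truncated_signature_kernel` and the bandwidth parameter `gamma` are illustrative, not from the paper); its per-pair cost is quadratic in the sequence length, which is precisely the bottleneck RFSF removes.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian base kernel k(x, y) = exp(-gamma * ||x - y||^2) on all pairs of points."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def truncated_signature_kernel(x, y, M=3, gamma=1.0):
    """Exact truncated signature kernel K_{<=M}(x, y) of two sequences.

    x: (len_x, d) array, y: (len_y, d) array; cost is O(M * len_x * len_y) per pair,
    i.e. quadratic in sequence length.
    """
    K = rbf_kernel(x, y, gamma)
    # Second-order (double) differences of the base kernel along both sequences.
    A = K[1:, 1:] - K[1:, :-1] - K[:-1, 1:] + K[:-1, :-1]
    val = 1.0                                      # level-0 contribution
    R = A.copy()                                   # level-1 "chain" weights
    val += R.sum()
    for _ in range(2, M + 1):
        # Sum of all shorter chains over strictly smaller indices in both sequences.
        P = np.zeros_like(R)
        P[1:, 1:] = R[:-1, :-1].cumsum(axis=0).cumsum(axis=1)
        R = A * P                                  # extend every chain by one more increment
        val += R.sum()
    return val
```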
2. Random Fourier Features for Signature Kernels
Random Fourier Features accelerate kernel methods via a random mapping $\varphi_D : \mathbb{R}^d \to \mathbb{R}^D$ (of dimension $D$) such that $\langle \varphi_D(x), \varphi_D(y) \rangle$ is an unbiased estimator of a translation-invariant kernel $k(x, y) = \kappa(x - y)$. For the Gaussian (RBF) kernel, a standard choice is $\varphi_D(x) = \sqrt{2/D}\,\big(\cos(\omega_i^\top x + b_i)\big)_{i=1}^{D}$ with $\omega_1, \dots, \omega_D$ drawn i.i.d. from the kernel's spectral measure (a Gaussian) and $b_i$ uniform on $[0, 2\pi]$.
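For reference, a minimal sketch of this classical RFF map for the Gaussian kernel (the name `rff_map` and its parameters are illustrative):

```python
import numpy as np

def rff_map(X, D=256, gamma=1.0, seed=0):
    """Random Fourier features for the Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2).

    X: (n, d) array; returns Z of shape (n, D) with E[Z @ Z.T] equal to the exact Gram matrix.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Spectral measure of this Gaussian kernel: omega ~ N(0, 2 * gamma * I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)
```

With `Z = rff_map(X)`, the product `Z @ Z.T` approximates the exact Gram matrix, and the error shrinks as `D` grows.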
The RFSF approach replaces the "static" feature map $\varphi$ in the discrete signature computation by level-wise independent RFF maps $\tilde{\varphi}^{(1)}, \dots, \tilde{\varphi}^{(M)}$. For a truncation level $M$ and feature size $\tilde{D}$, an independent RFF matrix is drawn for each level $m = 1, \dots, M$. The level-$m$ RFSF feature is then constructed as

$$\tilde{\Phi}_m(\mathbf{x}) = \sum_{i_1 < \cdots < i_m} \delta \tilde{\varphi}^{(1)}(x_{i_1}) \otimes \cdots \otimes \delta \tilde{\varphi}^{(m)}(x_{i_m}) \in \big(\mathbb{R}^{\tilde{D}}\big)^{\otimes m},$$

where $\delta \tilde{\varphi}^{(p)}(x_i) = \tilde{\varphi}^{(p)}(x_{i+1}) - \tilde{\varphi}^{(p)}(x_i)$ denotes the increment of the level-$p$ features along the sequence (Toth et al., 2023). The resulting kernel,

$$\tilde{\mathsf{K}}_{\leq M}(\mathbf{x}, \mathbf{y}) = \sum_{m=0}^{M} \big\langle \tilde{\Phi}_m(\mathbf{x}), \tilde{\Phi}_m(\mathbf{y}) \big\rangle,$$

is an unbiased estimator of $\mathsf{K}_{\leq M}(\mathbf{x}, \mathbf{y})$.
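A minimal sketch of this construction under the conventions above, using a Gaussian base kernel and one independent RFF map per level (function names such as `sample_rff_maps` and `rfsf_features` are illustrative, not the authors' implementation):

```python
import numpy as np

def sample_rff_maps(d, M=3, D=8, gamma=1.0, seed=0):
    """Draw M independent RFF maps (one per signature level) for the Gaussian kernel."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D)),
             rng.uniform(0.0, 2.0 * np.pi, size=D)) for _ in range(M)]

def rfsf_features(x, maps):
    """RFSF feature vector (sketch) of one sequence x with shape (len_x, d).

    The level-m feature lives in R^(D^m), so the total dimension is 1 + D + ... + D^M,
    which is what the reduction variants of Section 4 avoid.
    """
    feats = [np.ones(1)]                                   # level-0 feature
    prev = None                                            # running chain tensors of the previous level
    for W, b in maps:                                      # one independent RFF map per level
        D = W.shape[1]
        phi = np.sqrt(2.0 / D) * np.cos(x @ W + b)         # static RFF features, shape (len_x, D)
        dphi = np.diff(phi, axis=0)                        # increments along the sequence
        if prev is None:
            cur = dphi
        else:
            # Sum chains over strictly earlier indices, then extend by an outer (tensor) product.
            prefix = np.vstack([np.zeros((1, prev.shape[1])), np.cumsum(prev, axis=0)[:-1]])
            cur = (prefix[:, :, None] * dphi[:, None, :]).reshape(dphi.shape[0], -1)
        feats.append(cur.sum(axis=0))
        prev = cur
    return np.concatenate(feats)

# Shared maps across sequences make <rfsf_features(x, maps), rfsf_features(y, maps)>
# an unbiased estimate of the truncated signature kernel K_{<=M}(x, y).
```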
3. Uniform Approximation Guarantees
RFSF enjoys high-probability, uniform approximation guarantees on compact domains. For a fixed truncation level $M$ and sequences of bounded $1$-variation, the supremum error between $\mathsf{K}_{\leq M}$ and its RFSF estimator $\tilde{\mathsf{K}}_{\leq M}$ concentrates with subexponential tails in the feature size $\tilde{D}$, with constants depending on the truncation level and the kernel's Lipschitz constant. This enables choosing $\tilde{D}$ as a function of the target accuracy $\varepsilon$ and failure probability $\delta$ so that the uniform error is at most $\varepsilon$ with probability at least $1 - \delta$ (Toth et al., 2023). The proof applies recursive bias-propagation across tensor levels and Bernstein-type concentration inequalities in Banach spaces.
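The behavior can be probed empirically. Assuming the `truncated_signature_kernel`, `sample_rff_maps`, and `rfsf_features` sketches above are in scope, the following Monte Carlo check (an illustration only, not the paper's proof technique) shows the approximation error shrinking as the feature size grows:

```python
import numpy as np

# Compare RFSF estimates against the exact truncated signature kernel for growing D.
rng = np.random.default_rng(0)
x, y = rng.normal(size=(20, 2)), rng.normal(size=(20, 2))
exact = truncated_signature_kernel(x, y, M=3, gamma=0.5)
for D in (4, 16, 64):
    errs = []
    for seed in range(50):                                  # independent feature draws
        maps = sample_rff_maps(d=2, M=3, D=D, gamma=0.5, seed=seed)
        errs.append(abs(rfsf_features(x, maps) @ rfsf_features(y, maps) - exact))
    print(D, np.mean(errs))                                 # mean absolute error shrinks with D
```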
4. Scalable Tensor Reduction Variants: RFSF-DP and RFSF-TRP
Although RFSF is linear in the sequence length $\ell$, its feature dimension grows as $\mathcal{O}(\tilde{D}^M)$, since the level-$m$ feature lives in $(\mathbb{R}^{\tilde{D}})^{\otimes m}$. Two reduction strategies, diagonal projection (RFSF-DP) and tensor random projection (RFSF-TRP), alleviate this:
- RFSF-DP: Projects onto the diagonal entries of the signature tensors by averaging over independent RFF copies at each level, so the total dimension grows only linearly in $\tilde{D}$ with the truncation level (sketched below).
- RFSF-TRP: Applies Johnson–Lindenstrauss-type random projections that respect the tensor CP structure, mapping each level via rank-1 CP projections and again yielding a total dimension linear in $\tilde{D}$ across levels.
Both variants come with provable concentration inequalities: RFSF-DP has subexponential and RFSF-TRP has $1/(2m)$-subexponential tails in $\tilde{D}$. Feature extraction for both remains linear in the number of sequences and in the sequence length, which makes them feasible in high-throughput settings (Toth et al., 2023).
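A minimal sketch of the diagonal-projection idea behind RFSF-DP, under the same conventions as the sketches above (illustrative, not the authors' implementation):

```python
import numpy as np

def rfsf_dp_features(x, M=3, D=256, gamma=1.0, seed=0):
    """Diagonal-projection RFSF (RFSF-DP) sketch for one sequence x with shape (len_x, d).

    Every coordinate carries its own single-feature RFF per level, and levels are combined
    with elementwise (Hadamard) products instead of full tensor products, so the output
    dimension is 1 + M * D rather than O(D^M).
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    feats = [np.ones(1)]                                    # level-0 feature
    prev = None
    for _ in range(M):                                      # level-wise independent RFF draws
        W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))
        b = rng.uniform(0.0, 2.0 * np.pi, size=D)
        dphi = np.diff(np.sqrt(2.0) * np.cos(x @ W + b), axis=0)  # increments, shape (len_x - 1, D)
        if prev is None:
            cur = dphi
        else:
            prefix = np.vstack([np.zeros((1, D)), np.cumsum(prev, axis=0)[:-1]])
            cur = prefix * dphi                             # diagonal (coordinatewise) chain extension
        feats.append(cur.sum(axis=0) / np.sqrt(D))          # 1/sqrt(D): inner products average the D copies
        prev = cur
    return np.concatenate(feats)                            # dimension 1 + M * D
```

Features of two sequences computed with the same seed give an unbiased estimate of the truncated signature kernel; replacing the full tensor product with an elementwise product is what keeps the dimension at $1 + MD$ in this sketch instead of $\mathcal{O}(D^M)$.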
5. Empirical Scaling, Complexity, and Accuracy
On benchmark datasets, RFSF-DP and RFSF-TRP show negligible accuracy loss relative to the exact signature kernel (KSig) at moderate dataset sizes, and superior performance versus alternative scalable approaches (Random Warping Series, flattened RFF) at larger scale. On the million-sequence SITS1M satellite dataset, RFSF-DP training completes in minutes, a scale unattainable by other signature-based or kernel approaches.
| Method | Time per fit | Memory |
|---|---|---|
| KSig (full) | quadratic in $n$ and $\ell$ | $\mathcal{O}(n^2)$ Gram matrix |
| Classical RFF | linear in $n$ and $\ell$ | linear in $n$ |
| RFSF-DP | linear in $n$ and $\ell$ | linear in $n$ |
| RFSF-TRP | linear in $n$ and $\ell$ | linear in $n$ |
Accuracy is competitive: on SITS1M, RFSF-DP attains higher test accuracy than both Random Warping Series and classical RFF (Toth et al., 2023).
6. Relation to Classical Random Fourier Features and High-Dimensional Learning
The construction of RFSF is rooted in classical Random Fourier Features, which yield unbiased estimators for shift-invariant kernels and whose behavior under high-dimensional asymptotics is analyzed in prior work (Liao et al., 2020). In the classical regime with large feature dimension $D$ and fixed data dimension, the empirical Gram matrix of RFF converges to the underlying kernel matrix. In the joint high-dimensional setting, where the data dimension $d$, the number of samples $n$, and the feature dimension $D$ scale comparably, the convergence holds only in expectation and requires careful random-matrix analysis. The explicit integration of RFFs into sequential signatures, as in RFSF, extends these techniques to the tensorial, non-Euclidean context, yielding expressive yet scalable representations (Toth et al., 2023, Liao et al., 2020).
7. Summary and Outlook
Random Fourier Signature Features inherit the expressivity and theoretical guarantees of signature kernels while providing strong uniform error controls and enabling linear computational scaling in both sequence length and data set size. Reduction variants RFSF-DP and RFSF-TRP further extend applicability to million-scale time series, with consistent empirical robustness and accuracy. The methodology aligns RFF-based kernel approximation with the algebraic richness of signatures, offering a principled approach for scalable, powerful sequential similarity in machine learning and data analysis (Toth et al., 2023).