Random Fourier Signature Features (RFSF)
- Random Fourier Signature Features (RFSF) are a scalable, randomized approximation of the signature kernel that captures high-order sequence interactions via tensorized increments.
- The method employs random Fourier features and projection variants (RFSF-DP and RFSF-TRP) to reduce quadratic computational complexity to linear time while ensuring rigorous error bounds.
- Empirical results show that RFSF achieves competitive classification accuracy and can efficiently process large datasets with up to one million sequences using GPU acceleration.
Random Fourier Signature Features (RFSF) are scalable, randomized feature-map approximations to the signature kernel for sequence data. The signature kernel, derived from tensor algebras, offers a powerful similarity measure for sequences by capturing higher-order interactions through the signature of the sequence. Exact computation of the signature kernel is computationally intensive, scaling quadratically with both sequence length and dataset size. RFSF employs random Fourier features (RFF), enabling a linear-time, unbiased approximation of the signature kernel with rigorous uniform error bounds. Further speedups and dimensionality reductions are achieved via the projection-based variants Diagonal-Projection (RFSF-DP) and Tensor Random Projection (RFSF-TRP), supporting empirical scalability to datasets of up to one million sequences while retaining high statistical fidelity (Toth et al., 2023).
1. Signature Kernel Foundations
Let $\mathcal{X}$ denote the input space (typically $\mathcal{X} \subseteq \mathbb{R}^d$) and $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ a static positive-definite kernel with reproducing kernel Hilbert space (RKHS) $\mathcal{H}$ and feature map $\varphi : \mathcal{X} \to \mathcal{H}$, $k(x, y) = \langle \varphi(x), \varphi(y) \rangle_{\mathcal{H}}$. A discrete sequence $\mathbf{x} = (x_1, \dots, x_L) \in \mathcal{X}^L$ is "lifted" to the free tensor algebra $T(\mathcal{H}) = \bigoplus_{m \ge 0} \mathcal{H}^{\otimes m}$ using the discrete signature map:

$$S(\mathbf{x}) = \Big( \sum_{1 \le i_1 < \cdots < i_m \le L-1} \delta\varphi_{i_1} \otimes \cdots \otimes \delta\varphi_{i_m} \Big)_{m \ge 0},$$

where $\delta\varphi_i := \varphi(x_{i+1}) - \varphi(x_i)$ and each level $m$ contains all $m$-fold tensor products of increments in the RKHS. The truncated signature kernel of depth $M$ is defined as:

$$K_{\mathrm{Sig}}^{\le M}(\mathbf{x}, \mathbf{y}) = \sum_{m=0}^{M} \big\langle S_m(\mathbf{x}), S_m(\mathbf{y}) \big\rangle_{\mathcal{H}^{\otimes m}}.$$

An equivalent "kernel trick" expansion involves second-order cross-differences $\delta_i \delta_j k(x_i, y_j) := k(x_{i+1}, y_{j+1}) - k(x_{i+1}, y_j) - k(x_i, y_{j+1}) + k(x_i, y_j)$ applied to the base kernel $k$, which results in $O(N^2 L^2)$ computation cost among $N$ sequences of length $L$ (Toth et al., 2023).
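To make the kernel-trick computation concrete, the following is a minimal NumPy sketch (not the authors' implementation) of the truncated signature kernel computed from the cross-differenced static Gram matrix; the RBF helper and the depth are illustrative choices.

```python
import numpy as np

def rbf_gram(x, y, lengthscale=1.0):
    """Static RBF Gram matrix between two sequences x (Lx, d) and y (Ly, d)."""
    sq = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * lengthscale ** 2))

def sig_kernel_truncated(x, y, depth=4, lengthscale=1.0):
    """Truncated signature kernel of depth `depth` via the kernel-trick recursion."""
    G = rbf_gram(x, y, lengthscale)
    # second-order cross-differences of the Gram matrix over time increments
    dG = G[1:, 1:] - G[1:, :-1] - G[:-1, 1:] + G[:-1, :-1]

    def strict_prefix_sum(B):
        # C[i, j] = sum of B[i', j'] over all i' < i and j' < j
        C = np.cumsum(np.cumsum(B, axis=0), axis=1)
        return np.pad(C, ((1, 0), (1, 0)))[:-1, :-1]

    K = 1.0                    # level-0 term
    A = dG.copy()              # level-1 dynamic-programming table
    K += A.sum()
    for _ in range(2, depth + 1):
        A = dG * strict_prefix_sum(A)   # extend index tuples by one increment
        K += A.sum()
    return K
```

Evaluating a full Gram matrix over $N$ sequences requires $O(N^2)$ such pairwise evaluations, which is the quadratic cost that RFSF avoids.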
2. Random Fourier Signature Features: Construction and Theoretical Guarantees
RFSF approximates $K_{\mathrm{Sig}}^{\le M}$ with a finite-dimensional randomized map based on Bochner's theorem. For a continuous, translation-invariant $k(x, y) = \kappa(x - y)$,

$$k(x, y) = \int_{\mathbb{R}^d} e^{\mathrm{i}\, \omega^\top (x - y)} \, \mathrm{d}\Lambda(\omega),$$

where $\Lambda$ is the spectral measure. Standard random Fourier features for $k$ are

$$\tilde\varphi(x) = \frac{1}{\sqrt{\tilde d}} \big( \cos(\omega_1^\top x), \sin(\omega_1^\top x), \dots, \cos(\omega_{\tilde d}^\top x), \sin(\omega_{\tilde d}^\top x) \big) \in \mathbb{R}^{2\tilde d},$$

with $\omega_1, \dots, \omega_{\tilde d} \sim \Lambda$ iid.
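As a concrete illustration, here is a minimal NumPy sketch of these features for the RBF kernel (whose spectral measure is Gaussian); the frequency matrix `W` and feature count are placeholder choices.

```python
import numpy as np

def rff_features(X, W):
    """Random Fourier features for a translation-invariant kernel.

    X: (n, d) array of static points; W: (d, d_tilde) frequency matrix whose
    columns are iid draws from the spectral measure. Inner products of the
    returned (n, 2*d_tilde) features approximate k(x, y) unbiasedly.
    """
    proj = X @ W
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=1) / np.sqrt(W.shape[1])

# example: the RBF kernel with unit lengthscale has a standard Gaussian spectral measure
rng = np.random.default_rng(0)
d, d_tilde = 3, 256
W = rng.normal(size=(d, d_tilde))      # frequencies ~ N(0, I) for lengthscale 1
X = rng.normal(size=(10, d))
phi = rff_features(X, W)               # (10, 512); phi @ phi.T ≈ RBF Gram matrix
```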
For RFSF, at each signature level $m = 1, \dots, M$, an independent RFF map $\tilde\varphi^{(m)}$ is drawn (one per tensor-factor position):
- $W^{(m)} \in \mathbb{R}^{d \times \tilde d}$ is a matrix of iid draws from $\Lambda$,
- $\tilde\varphi^{(m)}(x) = \frac{1}{\sqrt{\tilde d}}\big(\cos(x^\top W^{(m)}), \sin(x^\top W^{(m)})\big) \in \mathbb{R}^{2\tilde d}$,
- The RFSF signature accumulates tensorized increments $\delta\tilde\varphi^{(m)}_i := \tilde\varphi^{(m)}(x_{i+1}) - \tilde\varphi^{(m)}(x_i)$ across levels:

$$\tilde S_m(\mathbf{x}) = \sum_{1 \le i_1 < \cdots < i_m \le L-1} \delta\tilde\varphi^{(1)}_{i_1} \otimes \cdots \otimes \delta\tilde\varphi^{(m)}_{i_m}.$$

The RFSF kernel is

$$\tilde K_{\mathrm{Sig}}^{\le M}(\mathbf{x}, \mathbf{y}) = \sum_{m=0}^{M} \big\langle \tilde S_m(\mathbf{x}), \tilde S_m(\mathbf{y}) \big\rangle,$$

an unbiased estimator of $K_{\mathrm{Sig}}^{\le M}$; i.e., $\mathbb{E}\big[\tilde K_{\mathrm{Sig}}^{\le M}(\mathbf{x}, \mathbf{y})\big] = K_{\mathrm{Sig}}^{\le M}(\mathbf{x}, \mathbf{y})$ (a sketch of this construction follows at the end of this section).
Uniform sup-norm approximation bounds are established under Bernstein moment conditions on the spectral measure $\Lambda$ and compactness assumptions on the state space (Theorem 3.3): for any error tolerance $\varepsilon > 0$ and fixed truncation depth $M$, the probability that $\tilde K_{\mathrm{Sig}}^{\le M}$ deviates from $K_{\mathrm{Sig}}^{\le M}$ by more than $\varepsilon$ uniformly over a compact set decays exponentially in the number of random features $\tilde d$ (up to polynomial factors).
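Putting the construction together, here is a minimal NumPy sketch (illustrative, not the authors' code): each tensor-factor position uses its own frequency matrix, and level-$m$ features are accumulated by a dynamic-programming pass over time that tensorizes increments with prefix sums of the previous level.

```python
import numpy as np

def rfsf_features(x, Ws):
    """Full RFSF features for one sequence x of shape (L, d).

    Ws: list of M independent frequency matrices of shape (d, d_tilde), one per
    tensor-factor position (shared across all sequences). Returns the flattened
    level-m features for m = 1..M; level m has (2*d_tilde)**m entries, so this
    is only practical for small M and d_tilde.
    """
    def lift(W):
        proj = x @ W
        return np.concatenate([np.cos(proj), np.sin(proj)], axis=1) / np.sqrt(W.shape[1])

    incs = [np.diff(lift(W), axis=0) for W in Ws]   # RFF increments, each (L-1, 2*d_tilde)
    levels, prev = [], None
    for m, inc in enumerate(incs):
        if m == 0:
            cur = inc                                # level-1 terms
        else:
            # sum of previous-level terms over strictly earlier time indices, then tensorize
            prefix = np.cumsum(prev, axis=0)
            prefix = np.vstack([np.zeros_like(prefix[:1]), prefix[:-1]])
            cur = np.einsum('ip,iq->ipq', inc, prefix).reshape(len(inc), -1)
        levels.append(cur.sum(axis=0))               # level-(m+1) signature feature
        prev = cur
    return levels
```

Concatenating the level features (plus a constant for level $0$) and taking Euclidean inner products between two sequences reproduces $\tilde K_{\mathrm{Sig}}^{\le M}$.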
3. Algorithmic Variants via Tensor Projections
The feature space at level $m$ of the full RFSF map has dimension $(2\tilde d)^m$, which becomes prohibitive for moderate $m$ and $\tilde d$. Two scalable dimensionality reduction variants are introduced:
RFSF-DP (Diagonal-Projection):
- Considers only "diagonal" tensor components, reducing the feature size per level to $2\tilde d$,
- Dynamic programming computes all $M$ levels in time and memory linear in the sequence length and the number of random features (see the sketch after this list and the table below),
- A uniform approximation bound analogous to the full RFSF case holds (Theorem 3.5) for fixed truncation depth $M$ and feature count $\tilde d$, with constants that degrade polynomially in $M$.
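As a concrete illustration (not the authors' implementation), the diagonal-projection recursion can be sketched by replacing the outer products in `rfsf_features` above with elementwise products; the per-level normalization constants of the paper are omitted for brevity.

```python
import numpy as np

def rfsf_dp_features(x, Ws):
    """RFSF-DP features for one sequence x of shape (L, d).

    Same inputs as `rfsf_features`, but the outer product across tensor factors
    is replaced by an elementwise (Hadamard) product, so every level stays
    (2 * d_tilde)-dimensional and the whole map has M * 2 * d_tilde entries.
    """
    def lift(W):
        proj = x @ W
        return np.concatenate([np.cos(proj), np.sin(proj)], axis=1) / np.sqrt(W.shape[1])

    incs = [np.diff(lift(W), axis=0) for W in Ws]
    levels, prev = [], None
    for m, inc in enumerate(incs):
        if m == 0:
            cur = inc
        else:
            prefix = np.cumsum(prev, axis=0)
            prefix = np.vstack([np.zeros_like(prefix[:1]), prefix[:-1]])
            cur = inc * prefix             # keep only "diagonal" tensor components
        levels.append(cur.sum(axis=0))
        prev = cur
    return np.concatenate(levels)          # shape (M * 2 * d_tilde,)
```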
RFSF-TRP (Tensor Random Projection):
- Applies Johnson–Lindenstrauss–style CP-rank-1 sketches using random Gaussian projections,
- Each level yields a $\tilde d$-dimensional feature, for a total feature dimension of $O(M\tilde d)$,
- Time complexity remains linear in the sequence length, with an additional factor from applying the random projections (see the table below),
- The tail bound (Theorem 3.7) is of hypercontractive type, with an exponent that degrades as the tensor degree (truncation level) grows (see the sketch after this list).
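Similarly, a minimal sketch of the tensor-random-projection variant: since each projection direction is a rank-1 (CP) tensor, the projected coordinates factor across tensor positions, so the same prefix-sum/Hadamard recursion applies to the projected increments. The projection matrices `Ps` and the $1/\sqrt{p}$ scaling are illustrative choices, not necessarily the paper's exact parametrization.

```python
import numpy as np

def rfsf_trp_features(x, Ws, Ps):
    """RFSF-TRP features for one sequence x of shape (L, d).

    Ws: list of M frequency matrices of shape (d, d_tilde), as before.
    Ps: list of M Gaussian projection matrices of shape (2 * d_tilde, p),
        one per tensor-factor position (shared across all sequences).
    """
    def lift(W):
        proj = x @ W
        return np.concatenate([np.cos(proj), np.sin(proj)], axis=1) / np.sqrt(W.shape[1])

    levels, prev = [], None
    for m, (W, P) in enumerate(zip(Ws, Ps)):
        inc = np.diff(lift(W), axis=0) @ P          # projected increments, (L-1, p)
        if m == 0:
            cur = inc
        else:
            prefix = np.cumsum(prev, axis=0)
            prefix = np.vstack([np.zeros_like(prefix[:1]), prefix[:-1]])
            cur = inc * prefix                      # rank-1 structure => Hadamard recursion
        # 1/sqrt(p) scaling makes per-level inner products unbiased for the tensor inner products
        levels.append(cur.sum(axis=0) / np.sqrt(P.shape[1]))
        prev = cur
    return np.concatenate(levels)                   # shape (M * p,)
```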
| Variant | Feature Dimension | Time Complexity | Memory |
|---|---|---|---|
| Full RFSF | $O(\tilde d^M)$ | $O(N L \tilde d^M)$ | $O(N \tilde d^M)$ |
| RFSF-DP | $O(M \tilde d)$ | $O(N L \tilde d (M + d))$ | $O(N M \tilde d)$ |
| RFSF-TRP | $O(M \tilde d)$ | $O(N L \tilde d (M \tilde d + d))$ | $O(N M \tilde d)$ |
4. Empirical Performance and Scalability
RFSF and its variants are benchmarked on multivariate time-series classification from the UEA archive and large-scale datasets. For the moderate-size UEA datasets:
- RFSF-DP and RFSF-TRP achieve average accuracies $0.740$ and $0.738$, close to the full signature kernel ($0.756$), and match or outperform the signature kernel on several tasks,
- RFSF methods are roughly $5\times$ or more faster than exact quadratic-time signature kernels.
On large-scale tasks ($N$ up to $10^6$ sequences):
- Only feature-based methods (RFSF-DP, RFSF-TRP, RWS, RFF) are feasible,
- RFSF-TRP ranks first in accuracy (average $0.699$) against RWS ($0.655$) and RFF ($0.635$), and exhibits the lowest average rank,
- For the SITS1M task ($N = 10^6$), RFSF-TRP trains in approximately $3$ minutes (on GPU), compared to hours for RWS (Toth et al., 2023).
5. Limitations and Practical Considerations
The RFSF approximation guarantees hold only with high probability; the number of random features $\tilde d$ must be chosen sufficiently large relative to the desired error tolerance $\varepsilon$ and truncation depth $M$. At higher $M$:
- TRP's tail exponent degrades as $M$ grows (the hypercontractive bound weakens with the tensor degree),
- DP's error bound weakens polynomially in $M$.
Other considerations include:
- Choice of static feature embedding and truncation depth $M$ (with depths up to $5$ typically sufficient),
- Randomization/variants of RFF (orthogonal, quasi-Monte-Carlo, leverage-score reweighting) and alternative embeddings are viable,
- All algorithms are efficiently vectorizable on GPUs (e.g., with the KSig library),
- Memory usage scales linearly in the number of sequences, truncation depth, and random feature count for both DP and TRP,
- Practical recommendations: random feature counts up to roughly $2000$ are typically sufficient; use TRP for very large $N$ and tight memory budgets, DP for moderate $N$, and the full RFSF map only if its exponential feature dimension is affordable (an end-to-end sketch follows below).
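As an end-to-end illustration of these recommendations (data, sizes, and hyperparameters below are synthetic placeholders), the DP features defined earlier plug directly into a standard linear classifier:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# synthetic stand-in data: N random-walk sequences of length L in R^d with binary labels
rng = np.random.default_rng(0)
N, L, d, d_tilde, M = 200, 50, 3, 512, 4
X = rng.normal(size=(N, L, d)).cumsum(axis=1)
y = (X[:, -1, 0] > 0).astype(int)

# shared randomness: one frequency matrix per tensor-factor position (RBF, lengthscale 1)
Ws = [rng.normal(size=(d, d_tilde)) for _ in range(M)]

# featurize every sequence once, then fit a plain linear model on top
feats = np.stack([rfsf_dp_features(seq, Ws) for seq in X])   # (N, M * 2 * d_tilde)
clf = LogisticRegression(max_iter=1000)
clf.fit(StandardScaler().fit_transform(feats), y)
```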
6. Extensions and Significance
RFSF provides a scheme to approximate infinite-dimensional sequence similarity using finite-dimensional, tractable random features. Computational gains are realized by reducing the complexity from quadratic to linear in both the dataset size and the sequence length (from $O(N^2 L^2)$ for exact Gram-matrix computation to $O(N L)$ up to feature-dimension factors), without sacrificing the uniform error concentration properties of classical RFFs. The framework is extensible to improved random feature distributions (e.g., orthogonal/quasi-Monte-Carlo, leverage-score-based sampling) and can employ alternative kernel randomizations. Streaming implementations and GPU vectorization are straightforward given the structure of the algorithms. Empirically, RFSF achieves state-of-the-art or near state-of-the-art accuracy while supporting applications to datasets unattainable by competing kernel methods (Toth et al., 2023).