Random Fourier Signature Features
- Random Fourier Signature Features are a scalable method for approximating signature kernels, enabling efficient similarity measurement over sequential data.
- The approach integrates tensor algebra with level-wise independent Random Fourier Feature maps to provide unbiased estimators with strong uniform error guarantees.
- Reduction variants RFSF-DP and RFSF-TRP reduce computational and memory complexity, making the method practical for large-scale time series and high-dimensional data.
Random Fourier Signature Features (RFSF) provide a scalable framework for approximating the signature kernel—a powerful similarity measure for sequential data—by leveraging random Fourier feature (RFF) methods within the tensor algebra of signature representations. This combination yields unbiased, uniform-approximation estimators for kernel methods on sequences, reducing computational barriers associated with classic signature kernel computation and enabling practical application to very large datasets while preserving expressive kernel structure (Toth et al., 2023).
1. The Signature Kernel and Tensor Algebra
Given a metric space $\mathcal{X}$, a sequence $\mathbf{x} = (x_1, \dots, x_\ell)$ of points in $\mathcal{X}$, and a base kernel $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ with corresponding RKHS $\mathcal{H}$, the signature kernel encodes multilevel sequential interactions through the discrete signature map. The $m$-th level signature of $\mathbf{x}$ is a tensor in $\mathcal{H}^{\otimes m}$, constructed by iteratively taking tensor products of differences along the path. Truncating at level $M$ gives the truncated signature kernel

$$\mathsf{K}_{\leq M}(\mathbf{x}, \mathbf{y}) = \sum_{m=0}^{M} \; \sum_{\substack{i_1 < \cdots < i_m \\ j_1 < \cdots < j_m}} \; \prod_{p=1}^{m} \delta^2_{i_p, j_p} k,$$

where $\delta^2_{i,j} k = k(x_{i+1}, y_{j+1}) - k(x_{i+1}, y_j) - k(x_i, y_{j+1}) + k(x_i, y_j)$ denotes the second-order difference of $k$ (Toth et al., 2023). Computing the Gram matrix for $n$ sequences of length $\ell$ incurs $\mathcal{O}(n^2 \ell^2)$ time (up to factors in the truncation level and the cost of base kernel evaluations), rendering direct application infeasible at scale.
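As a concrete illustration, the following is a minimal NumPy sketch of this exact computation via the double-difference recursion (function names such as `truncated_signature_kernel` and the bandwidth parameter `gamma` are illustrative, not from the paper); its per-pair cost is quadratic in the sequence length, which is precisely the bottleneck RFSF removes.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian base kernel k(x, y) = exp(-gamma * ||x - y||^2) on all pairs of points."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def truncated_signature_kernel(x, y, M=3, gamma=1.0):
    """Exact truncated signature kernel K_{<=M}(x, y) of two sequences.

    x: (len_x, d) array, y: (len_y, d) array; cost is O(M * len_x * len_y) per pair,
    i.e. quadratic in sequence length.
    """
    K = rbf_kernel(x, y, gamma)
    # Second-order (double) differences of the base kernel along both sequences.
    A = K[1:, 1:] - K[1:, :-1] - K[:-1, 1:] + K[:-1, :-1]
    val = 1.0                                      # level-0 contribution
    R = A.copy()                                   # level-1 "chain" weights
    val += R.sum()
    for _ in range(2, M + 1):
        # Sum of all shorter chains over strictly smaller indices in both sequences.
        P = np.zeros_like(R)
        P[1:, 1:] = R[:-1, :-1].cumsum(axis=0).cumsum(axis=1)
        R = A * P                                  # extend every chain by one more increment
        val += R.sum()
    return val
```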
2. Random Fourier Features for Signature Kernels
Random Fourier Features accelerate kernel methods via a random mapping $\varphi_D : \mathbb{R}^d \to \mathbb{R}^D$ (of dimension $D$) such that $\langle \varphi_D(x), \varphi_D(y) \rangle$ is an unbiased estimator of a translation-invariant kernel $k(x, y) = \kappa(x - y)$. For the Gaussian (RBF) kernel, a standard choice is $\varphi_D(x) = \sqrt{2/D}\,\big(\cos(\omega_i^\top x + b_i)\big)_{i=1}^{D}$ with $\omega_1, \dots, \omega_D$ drawn i.i.d. from the kernel's spectral measure (a Gaussian) and $b_i$ uniform on $[0, 2\pi]$.
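For reference, a minimal sketch of this classical RFF map for the Gaussian kernel (the name `rff_map` and its parameters are illustrative):

```python
import numpy as np

def rff_map(X, D=256, gamma=1.0, seed=0):
    """Random Fourier features for the Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2).

    X: (n, d) array; returns Z of shape (n, D) with E[Z @ Z.T] equal to the exact Gram matrix.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Spectral measure of this Gaussian kernel: omega ~ N(0, 2 * gamma * I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)
```

With `Z = rff_map(X)`, the product `Z @ Z.T` approximates the exact Gram matrix, and the error shrinks as `D` grows.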
The RFSF approach replaces the "static" feature map $\varphi$ in the discrete signature computation by level-wise independent RFF maps $\tilde{\varphi}^{(1)}, \dots, \tilde{\varphi}^{(M)}$. For a truncation level $M$ and feature size $\tilde{D}$, an independent RFF matrix is drawn for each level $m = 1, \dots, M$. The level-$m$ RFSF feature is then constructed as

$$\tilde{\Phi}_m(\mathbf{x}) = \sum_{i_1 < \cdots < i_m} \delta \tilde{\varphi}^{(1)}(x_{i_1}) \otimes \cdots \otimes \delta \tilde{\varphi}^{(m)}(x_{i_m}) \in \big(\mathbb{R}^{\tilde{D}}\big)^{\otimes m},$$

where $\delta \tilde{\varphi}^{(p)}(x_i) = \tilde{\varphi}^{(p)}(x_{i+1}) - \tilde{\varphi}^{(p)}(x_i)$ denotes the increment of the level-$p$ features along the sequence (Toth et al., 2023). The resulting kernel,

$$\tilde{\mathsf{K}}_{\leq M}(\mathbf{x}, \mathbf{y}) = \sum_{m=0}^{M} \big\langle \tilde{\Phi}_m(\mathbf{x}), \tilde{\Phi}_m(\mathbf{y}) \big\rangle,$$

is an unbiased estimator of $\mathsf{K}_{\leq M}(\mathbf{x}, \mathbf{y})$.
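A minimal sketch of this construction under the conventions above, using a Gaussian base kernel and one independent RFF map per level (function names such as `sample_rff_maps` and `rfsf_features` are illustrative, not the authors' implementation):

```python
import numpy as np

def sample_rff_maps(d, M=3, D=8, gamma=1.0, seed=0):
    """Draw M independent RFF maps (one per signature level) for the Gaussian kernel."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D)),
             rng.uniform(0.0, 2.0 * np.pi, size=D)) for _ in range(M)]

def rfsf_features(x, maps):
    """RFSF feature vector (sketch) of one sequence x with shape (len_x, d).

    The level-m feature lives in R^(D^m), so the total dimension is 1 + D + ... + D^M,
    which is what the reduction variants of Section 4 avoid.
    """
    feats = [np.ones(1)]                                   # level-0 feature
    prev = None                                            # running chain tensors of the previous level
    for W, b in maps:                                      # one independent RFF map per level
        D = W.shape[1]
        phi = np.sqrt(2.0 / D) * np.cos(x @ W + b)         # static RFF features, shape (len_x, D)
        dphi = np.diff(phi, axis=0)                        # increments along the sequence
        if prev is None:
            cur = dphi
        else:
            # Sum chains over strictly earlier indices, then extend by an outer (tensor) product.
            prefix = np.vstack([np.zeros((1, prev.shape[1])), np.cumsum(prev, axis=0)[:-1]])
            cur = (prefix[:, :, None] * dphi[:, None, :]).reshape(dphi.shape[0], -1)
        feats.append(cur.sum(axis=0))
        prev = cur
    return np.concatenate(feats)

# Shared maps across sequences make <rfsf_features(x, maps), rfsf_features(y, maps)>
# an unbiased estimate of the truncated signature kernel K_{<=M}(x, y).
```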
3. Uniform Approximation Guarantees
RFSF enjoys high-probability, uniform approximation guarantees on compact domains. For a fixed truncation level $M$ and sequences of bounded $1$-variation, the supremum error between $\mathsf{K}_{\leq M}$ and its RFSF estimator $\tilde{\mathsf{K}}_{\leq M}$ concentrates with subexponential tails in the feature size $\tilde{D}$, with constants depending on the truncation level and the kernel's Lipschitz constant. This enables choosing $\tilde{D}$ as a function of the target accuracy $\varepsilon$ and failure probability $\delta$ so that the uniform error is at most $\varepsilon$ with probability at least $1 - \delta$ (Toth et al., 2023). The proof applies recursive bias-propagation across tensor levels and Bernstein-type concentration inequalities in Banach spaces.
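The behavior can be probed empirically. Assuming the `truncated_signature_kernel`, `sample_rff_maps`, and `rfsf_features` sketches above are in scope, the following Monte Carlo check (an illustration only, not the paper's proof technique) shows the approximation error shrinking as the feature size grows:

```python
import numpy as np

# Compare RFSF estimates against the exact truncated signature kernel for growing D.
rng = np.random.default_rng(0)
x, y = rng.normal(size=(20, 2)), rng.normal(size=(20, 2))
exact = truncated_signature_kernel(x, y, M=3, gamma=0.5)
for D in (4, 16, 64):
    errs = []
    for seed in range(50):                                  # independent feature draws
        maps = sample_rff_maps(d=2, M=3, D=D, gamma=0.5, seed=seed)
        errs.append(abs(rfsf_features(x, maps) @ rfsf_features(y, maps) - exact))
    print(D, np.mean(errs))                                 # mean absolute error shrinks with D
```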
4. Scalable Tensor Reduction Variants: RFSF-DP and RFSF-TRP
Although RFSF is linear in the sequence length $\ell$, its feature dimension grows as $\mathcal{O}(\tilde{D}^M)$, since the level-$m$ feature lives in $(\mathbb{R}^{\tilde{D}})^{\otimes m}$. Two reduction strategies, diagonal projection (RFSF-DP) and tensor random projection (RFSF-TRP), alleviate this:
- RFSF-DP: Projects onto the diagonal entries of the signature tensors by averaging over independent RFF copies at each level, so the total dimension grows only linearly in $\tilde{D}$ with the truncation level (sketched below).
- RFSF-TRP: Applies Johnson–Lindenstrauss-type random projections that respect the tensor CP structure, mapping each level via rank-1 CP projections and again yielding a total dimension linear in $\tilde{D}$ across levels.
Both variants come with provable concentration inequalities: RFSF-DP has subexponential and RFSF-TRP has $1/(2m)$-subexponential tails in $\tilde{D}$. Feature extraction for both remains linear in the number of sequences and in the sequence length, which makes them feasible in high-throughput settings (Toth et al., 2023).
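A minimal sketch of the diagonal-projection idea behind RFSF-DP, under the same conventions as the sketches above (illustrative, not the authors' implementation):

```python
import numpy as np

def rfsf_dp_features(x, M=3, D=256, gamma=1.0, seed=0):
    """Diagonal-projection RFSF (RFSF-DP) sketch for one sequence x with shape (len_x, d).

    Every coordinate carries its own single-feature RFF per level, and levels are combined
    with elementwise (Hadamard) products instead of full tensor products, so the output
    dimension is 1 + M * D rather than O(D^M).
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    feats = [np.ones(1)]                                    # level-0 feature
    prev = None
    for _ in range(M):                                      # level-wise independent RFF draws
        W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))
        b = rng.uniform(0.0, 2.0 * np.pi, size=D)
        dphi = np.diff(np.sqrt(2.0) * np.cos(x @ W + b), axis=0)  # increments, shape (len_x - 1, D)
        if prev is None:
            cur = dphi
        else:
            prefix = np.vstack([np.zeros((1, D)), np.cumsum(prev, axis=0)[:-1]])
            cur = prefix * dphi                             # diagonal (coordinatewise) chain extension
        feats.append(cur.sum(axis=0) / np.sqrt(D))          # 1/sqrt(D): inner products average the D copies
        prev = cur
    return np.concatenate(feats)                            # dimension 1 + M * D
```

Features of two sequences computed with the same seed give an unbiased estimate of the truncated signature kernel; replacing the full tensor product with an elementwise product is what keeps the dimension at $1 + MD$ in this sketch instead of $\mathcal{O}(D^M)$.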
5. Empirical Scaling, Complexity, and Accuracy
On benchmark datasets, RFSF-DP and RFSF-TRP show negligible accuracy loss relative to the exact signature kernel (KSig) at moderate dataset sizes, and superior performance versus alternative scalable approaches (Random Warping Series, flattened RFF) at larger scale. On the million-sequence SITS1M satellite dataset, RFSF-DP training completes in minutes, a scale unattainable by other signature-based or kernel approaches.
| Method | Time per fit | Memory |
|---|---|---|
| KSig (full) | quadratic in $n$ and $\ell$ | $\mathcal{O}(n^2)$ Gram matrix |
| Classical RFF | linear in $n$ and $\ell$ | linear in $n$ |
| RFSF-DP | linear in $n$ and $\ell$ | linear in $n$ |
| RFSF-TRP | linear in $n$ and $\ell$ | linear in $n$ |
Accuracy is competitive: on SITS1M, RFSF-DP attains higher test accuracy than both Random Warping Series and classical RFF (Toth et al., 2023).
6. Relation to Classical Random Fourier Features and High-Dimensional Learning
The construction of RFSF is rooted in classical Random Fourier Features, which yield unbiased estimators for shift-invariant kernels and whose behavior under high-dimensional asymptotics is analyzed in prior work (Liao et al., 2020). In the classical regime with large feature dimension $D$ and fixed data dimension, the empirical Gram matrix of RFF converges to the underlying kernel matrix. In the joint high-dimensional setting, where the data dimension $d$, the number of samples $n$, and the feature dimension $D$ scale comparably, the convergence holds only in expectation and requires careful random-matrix analysis. The explicit integration of RFFs into sequential signatures, as in RFSF, extends these techniques to the tensorial, non-Euclidean context, yielding expressive yet scalable representations (Toth et al., 2023, Liao et al., 2020).
7. Summary and Outlook
Random Fourier Signature Features inherit the expressivity and theoretical guarantees of signature kernels while providing strong uniform error controls and enabling linear computational scaling in both sequence length and data set size. Reduction variants RFSF-DP and RFSF-TRP further extend applicability to million-scale time series, with consistent empirical robustness and accuracy. The methodology aligns RFF-based kernel approximation with the algebraic richness of signatures, offering a principled approach for scalable, powerful sequential similarity in machine learning and data analysis (Toth et al., 2023).