
RFF-based Gaussian Processes

Updated 10 September 2025
  • Random Fourier Features-based Gaussian Processes are scalable kernel learning methods that approximate covariance matrices via randomized trigonometric maps derived from Bochner's theorem.
  • They rely on Monte Carlo sampling, with deterministic quadrature variants achieving faster convergence, and extend kernel approximations beyond traditional RBF models.
  • Recent advances enable hardware acceleration, adaptive spectral sampling, and integration with deep and compositional models for diverse real-world applications.

Random Fourier Features-based Gaussian Processes (RFF-GPs) constitute a scalable family of kernel machine approaches that use finite-dimensional randomized feature maps to efficiently approximate the covariance matrices of Gaussian process (GP) models. The core enabling principle is Bochner’s theorem, which provides a spectral representation of shift-invariant positive definite kernels as the Fourier transform of a nonnegative measure. This foundational link justifies approximating the kernel integral by a Monte Carlo sum over trigonometric basis functions sampled from the spectral density, reducing kernel computations to linear algebra in the random feature space. Recent advances significantly broaden the scope of RFF-GPs, refining convergence rates, generalizing the spectral mixture construction to cover diverse isotropic kernels, connecting to deterministic quadrature feature maps, and extending to compositional models and hardware-accelerated frameworks.

1. Theoretical Foundations and Spectral Representations

Random Fourier Features (RFF) originate in the work of Rahimi and Recht, where a shift-invariant kernel $K(u) = k(x - x')$ is represented via

$$K(u) = \int_{\mathbb{R}^d} \cos(\eta^\top u)\, f(\eta)\, d\eta,$$

with $f(\eta)$ being the spectral density. Bochner's theorem ensures that any positive definite, shift-invariant kernel admits such a representation. The RFF approach draws $M$ independent samples $\eta_j$ from $f(\eta)$ and defines the feature map

$$\phi(x) = \sqrt{\tfrac{2}{M}} \left[ \cos(\eta_1^\top x + b_1), \ldots, \cos(\eta_M^\top x + b_M) \right]^\top,$$

with $b_j$ uniformly random in $[0, 2\pi]$. The kernel is thus approximated as $k(x, x') \approx \phi(x)^\top \phi(x')$.
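For concreteness, the following minimal NumPy sketch builds this feature map for the RBF kernel, whose spectral density is a zero-mean Gaussian with covariance $\ell^{-2} I$, and compares the induced inner products against the exact kernel. All names and parameter values are illustrative.

```python
import numpy as np

def rff_features(X, M, lengthscale, rng):
    """Map inputs X of shape (n, d) to M random Fourier features for the RBF kernel."""
    n, d = X.shape
    # Spectral density of the RBF kernel: eta ~ N(0, I / lengthscale^2)
    eta = rng.standard_normal((M, d)) / lengthscale
    b = rng.uniform(0.0, 2.0 * np.pi, size=M)
    return np.sqrt(2.0 / M) * np.cos(X @ eta.T + b)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
Phi = rff_features(X, M=2000, lengthscale=1.5, rng=rng)
K_approx = Phi @ Phi.T

# Exact RBF kernel for comparison
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-0.5 * sq_dists / 1.5 ** 2)
print("max abs error:", np.abs(K_approx - K_exact).max())
```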

A recent and significant advance (Langrené et al., 5 Nov 2024) demonstrates that the spectral density $f(\eta)$ of any isotropic kernel in $\mathbb{R}^d$ can be expressed as a scale mixture of $\alpha$-stable distributions: $\eta = (\lambda R)^{1/\alpha} S_\alpha$, where $S_\alpha$ is a symmetric $\alpha$-stable random vector, $R$ is a nonnegative mixing random variable specific to the kernel class, and $\lambda, \alpha$ are kernel parameters. Through this, kernels such as generalized Cauchy, exponential power, Matérn, and novel beta/Kummer/Tricomi types are incorporated, generalizing RFF sampling to a much wider functional class.
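A familiar special case of this scale-mixture view takes $\alpha = 2$ (a Gaussian base vector): the Matérn kernel's spectral density is a multivariate Student-$t$, so frequencies can be drawn by rescaling a Gaussian vector with a chi-squared mixing variable. The sketch below illustrates this for a Matérn-$\nu$ kernel; the helper name and parameterization are illustrative and not the reference implementation of (Langrené et al., 5 Nov 2024).

```python
import numpy as np

def matern_spectral_samples(M, d, lengthscale, nu, rng):
    """Draw M frequencies from the Matérn-nu spectral density.

    The Matérn spectral density is a multivariate Student-t with 2*nu degrees
    of freedom, i.e. a Gaussian scale mixture: eta = Z / sqrt(G / (2*nu)),
    with Z ~ N(0, I / lengthscale^2) and G ~ chi-squared(2*nu).
    """
    Z = rng.standard_normal((M, d)) / lengthscale
    G = rng.chisquare(2.0 * nu, size=(M, 1))
    return Z / np.sqrt(G / (2.0 * nu))

rng = np.random.default_rng(0)
eta = matern_spectral_samples(M=1000, d=2, lengthscale=1.0, nu=1.5, rng=rng)
```

Plugging these frequencies into the cosine feature map above yields an RFF approximation of the Matérn kernel; the general construction of (Langrené et al., 5 Nov 2024) replaces the chi-squared mixing variable with the kernel-specific variable $R$ and allows $\alpha < 2$ for heavy-tailed spectral densities.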

2. Methodological Developments: Sampling, Quadrature, and Deterministic Features

Traditional RFF schemes use i.i.d. Gaussian sampling for the RBF kernel, but the spectral mixture framework allows efficient sampling for a broad kernel set by transforming a Gaussian (or stable) random vector with a randomly scaled norm, where the scale is an explicit function of the kernel (Langrené et al., 5 Nov 2024). The scale-mixed sampling is computationally more efficient (up to $3\times$ faster than tensor product constructions) and enables direct RFF representations for non-Gaussian, non-Matérn, and heavy-tailed kernels.

Apart from randomized Monte Carlo integration, deterministic alternatives employ Gaussian quadrature or other quadrature rules. Deterministic Quadrature Fourier Features (DQFF) (Dao et al., 2017, Shustin et al., 2021) replace random sampling with a grid of nodes and weights tailored to the kernel's spectral density, $\tilde{k}(u) = \sum_{i=1}^{D} a_i \cos(\omega_i^\top u)$, achieving $\epsilon$-approximation error with $O(\epsilon^{-1/\gamma})$ features (for any $\gamma > 0$), which improves over the $O(\epsilon^{-2})$ of standard RFF as $\epsilon \to 0$. For ANOVA and structurally sparse kernels, quadrature-based feature maps offer further gains, with scaling linear in the number of sub-kernels.
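To make the quadrature idea concrete, the sketch below builds deterministic Fourier features for a one-dimensional RBF kernel from Gauss-Hermite nodes and weights. It is a minimal illustration of quadrature-based features under these assumptions, not the DQFF construction of (Dao et al., 2017), which uses sparse grids and other rules in higher dimensions.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def gauss_hermite_features(x, n_nodes, lengthscale):
    """Deterministic Fourier features for a 1-D RBF kernel via Gauss-Hermite quadrature.

    Approximates k(u) = exp(-u^2 / (2 l^2)) = int cos(w u) N(w; 0, 1/l^2) dw
    with an n_nodes-point rule, using paired cos/sin features so that
    phi(x) . phi(x') ~= sum_i a_i cos(w_i (x - x')).
    """
    nodes, weights = hermgauss(n_nodes)          # rule for int e^{-t^2} g(t) dt
    w = np.sqrt(2.0) * nodes / lengthscale       # quadrature frequencies
    a = weights / np.sqrt(np.pi)                 # quadrature weights, summing to ~1
    x = np.asarray(x).reshape(-1, 1)
    return np.hstack([np.sqrt(a) * np.cos(x * w), np.sqrt(a) * np.sin(x * w)])

x = np.linspace(-3, 3, 50)
Phi = gauss_hermite_features(x, n_nodes=20, lengthscale=0.8)
K_approx = Phi @ Phi.T
K_exact = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.8 ** 2)
print("max abs error:", np.abs(K_approx - K_exact).max())
```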

Trigonometric Quadrature Fourier Features (TQFF; Li et al., 2023) use quadrature rules that are exact for trigonometric polynomials, which is critical for the oscillatory integrals arising in the spectral domain. TQFF yields tighter error bounds in practice and delivers well-calibrated GP uncertainty estimates with fewer features, outperforming both RFF and polynomial-based quadrature in challenging regimes (short length-scales, sparse data).

3. Convergence, Error Bounds, and Statistical Guarantees

The error rate for the RFF kernel approximation is $O(M^{-1/2})$ under standard Monte Carlo; the probability of the error exceeding a threshold $\epsilon$ drops exponentially in the number of random features and is independent of the input dimension for the Gaussian kernel (Honorio et al., 2017). Deterministic quadrature methods enjoy even faster convergence and provide explicit $\ell_\infty$ error bounds deterministically over the region of interest (Dao et al., 2017).
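As an illustrative sanity check on the $O(M^{-1/2})$ rate, the short experiment below estimates a single RBF kernel value (unit lengthscale) with increasing feature counts; each fourfold increase in $M$ should roughly halve the average error.

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(5), rng.standard_normal(5)      # a single input pair in R^5
k_exact = np.exp(-0.5 * np.sum((x - y) ** 2))               # RBF kernel, unit lengthscale

for M in (100, 400, 1600, 6400):
    errs = []
    for _ in range(200):                                    # average over 200 replications
        eta = rng.standard_normal((M, 5))
        b = rng.uniform(0, 2 * np.pi, M)
        k_hat = np.mean(2.0 * np.cos(eta @ x + b) * np.cos(eta @ y + b))
        errs.append(abs(k_hat - k_exact))
    print(M, np.mean(errs))                                 # mean error shrinks roughly like M**-0.5
```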

Spectral matrix approximation analysis (Avron et al., 2018) establishes that for kernel ridge regression, a spectral approximation of the kernel matrix $(K + \lambda I)$ by its RFF-based low-rank proxy suffices for controlling the excess risk. The required number of features depends either on the "statistical dimension" $s_\lambda = \mathrm{tr}[(K + \lambda I)^{-1} K]$ or on the data size $n/\lambda$, with optimal sampling distributions (ridge leverage score-based) cutting the feature requirement from $O(n/\lambda)$ to $O(s_\lambda)$ in low-dimensional, bounded settings.
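On small problems the statistical dimension can be inspected directly from the kernel eigenvalues, which is useful when choosing a feature budget. A minimal sketch (illustrative names, exact kernel used for reference):

```python
import numpy as np

def statistical_dimension(K, lam):
    """s_lambda = tr[(K + lam*I)^{-1} K], computed from the eigenvalues of K."""
    eigvals = np.linalg.eigvalsh(K)
    return float(np.sum(eigvals / (eigvals + lam)))

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 2))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq_dists)                                 # exact RBF kernel matrix
for lam in (1e-1, 1e-2, 1e-3):
    print(lam, statistical_dimension(K, lam))               # grows as regularization shrinks
```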

Systematic bias is inherent in the nonlinear use of RFF-approximate kernel matrices in log marginal likelihood calculations, leading to a tendency to overfit hyperparameters (Potapczynski et al., 2021). Debiasing strategies using randomized truncation estimators eliminate this bias at the cost of increased estimator variance.

4. Extensions: Compositional, Adaptive, and Structured Models

The compositional and deep extensions of Gaussian processes—such as deep latent force models, deep GPs, and convolutional GPs—have been accelerated and enriched by RFF and variational Fourier features (VFF, interdomain/integral features) (McDonald et al., 2021, Shi et al., 1 Jul 2024). In these models, RKHS-based Fourier features (possibly modulated by ODE or PDE-derived Green's functions) are used for inducing variables, enabling both non-stationary kernel learning and physically-informed modeling.

Adaptive random Fourier features, such as those in kernel least-mean-square adaptive filters (Gao et al., 2022), update the spectral (bandwidth) parameters online via stochastic gradient updates, alleviating the need for prior kernel selection and enhancing tracking in non-stationary domains.

In the context of multi-output GPs and latent force models (Guarnizo et al., 2018), RFFs are used to decouple the integral and convolutional structure of sophisticated kernels, yielding efficient algebraic approximations for cross-covariances, thereby reducing the cubic cost of GP regression with complex kernel forms to near-linear.

5. Practical Algorithms, Applications, and Empirical Performance

RFF-GP methods have been applied successfully in diverse high-throughput and online settings: unsupervised time-series segmentation (RFF-GP-HSMM (Saito et al., 14 Jul 2025)), kernel-based SLAM (Kapushev et al., 2020), Latent Variable Models for non-Gaussian data (Zhang et al., 2023), scalable GP emulation for global sensitivity analysis and optimization (Do et al., 19 Jul 2025), and complex dynamical system emulation (Mohammadi et al., 2021). Quantized RFFs further compress model memory by encoding features into few-bit representations with controlled distortion (Li et al., 2021).

Concretely, in time-series segmentation, RFF-GP-HSMM replaces $O(N^3)$ GP matrix inversions with $O(M^3)$ computations for much smaller $M$, yielding a $278\times$ speed-up on 39,200-frame datasets with no loss in segmentation accuracy (Saito et al., 14 Jul 2025). In Bayesian optimization and global sensitivity analysis, RFF-based posterior sampling reliably produces credible uncertainty estimates and supports fast functional evaluations in high-dimensional domains (Do et al., 19 Jul 2025).
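The source of such speed-ups is that, with a fixed feature map, GP regression reduces to Bayesian linear regression on the $M$-dimensional features, so the dominant linear solve is $M \times M$ rather than $N \times N$. The following weight-space sketch is an illustrative generic RFF-GP regressor (Gaussian noise, illustrative names), not the RFF-GP-HSMM implementation itself.

```python
import numpy as np

def rff_gp_posterior(X_train, y_train, X_test, M, lengthscale, noise_var, rng):
    """GP regression in RFF weight space: cost O(N M^2 + M^3) instead of O(N^3)."""
    d = X_train.shape[1]
    eta = rng.standard_normal((M, d)) / lengthscale
    b = rng.uniform(0, 2 * np.pi, M)
    feats = lambda X: np.sqrt(2.0 / M) * np.cos(X @ eta.T + b)

    Phi = feats(X_train)                                    # (N, M)
    A = Phi.T @ Phi + noise_var * np.eye(M)                 # (M, M) system
    w_mean = np.linalg.solve(A, Phi.T @ y_train)            # posterior mean of the weights
    Phi_s = feats(X_test)
    mean = Phi_s @ w_mean
    cov = noise_var * Phi_s @ np.linalg.solve(A, Phi_s.T)   # predictive covariance (noise-free)
    return mean, np.diag(cov)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(500)
X_test = np.linspace(-3, 3, 100)[:, None]
mu, var = rff_gp_posterior(X, y, X_test, M=200, lengthscale=1.0, noise_var=0.01, rng=rng)
```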

Empirical studies on image and speech domains (Dao et al., 2017, Li et al., 2023) consistently show deterministic and TQFF-based features exceeding the accuracy or efficiency of classical RFF at moderate or high feature budgets, with demonstrable improvements in kernel approximation error, negative log likelihood, and uncertainty quantification.

6. Advanced Implementations: Orthogonal and Quantum-Assisted Random Features

Orthogonal random features (ORF) (Demni et al., 2023) select sampling directions using Haar orthogonal matrices, introducing weak negative dependence between features and resulting in strictly smaller variance (for small input distances) than standard RFF. The bias of ORF-based Gaussian kernel approximations is expressed in terms of normalized Bessel functions $j_\nu$ and is sandwiched between two exponential bounds, converging to that of RFF at small distances $z$ and to a broader bandwidth in high dimensions, thereby providing sharper approximation confidence.
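A common way to generate such orthogonal directions, following the standard ORF construction, is to orthogonalize a Gaussian matrix via a QR decomposition and rescale each row by an independent chi-distributed norm so the marginal frequency distribution matches the Gaussian case. The helper below is a minimal sketch under these assumptions, not the estimator analyzed in (Demni et al., 2023).

```python
import numpy as np

def orthogonal_random_frequencies(M, d, lengthscale, rng):
    """Stack of orthogonal blocks whose rows are marginally N(0, I / lengthscale^2)."""
    blocks, remaining = [], M
    while remaining > 0:
        G = rng.standard_normal((d, d))
        Q, R = np.linalg.qr(G)
        Q = Q * np.sign(np.diag(R))                 # sign fix so Q is Haar-distributed
        norms = np.sqrt(rng.chisquare(d, size=d))   # chi_d row norms, matching Gaussian rows
        W = norms[:, None] * Q
        blocks.append(W[: min(d, remaining)])
        remaining -= d
    return np.vstack(blocks) / lengthscale

rng = np.random.default_rng(0)
eta = orthogonal_random_frequencies(M=256, d=16, lengthscale=1.0, rng=rng)
```

These frequencies are then used in the same cosine feature map as standard RFF.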

Quantum-assisted Gaussian process regression using random Fourier features (Galvis-Florez et al., 30 Jul 2025) combines classical low-rank RFF-based kernel approximations with quantum subroutines (quantum principal component analysis, phase estimation, Hadamard and SWAP tests) to achieve polynomial-order speed-ups in the prediction step. Here, the RFF matrix is encoded into a quantum state, decomposed spectrally, and used to efficiently compute GP posterior means and variances through coherent rotations. This approach is particularly suited for large-scale data with small to medium feature representations.

7. Generalizations, Limitations, and Research Outlook

The generalization of RFFs to isotropic kernels as explicit mixtures of $\alpha$-stable distributions (Langrené et al., 5 Nov 2024) provides a unified framework for spectral kernel construction, including a broad family of previously intractable kernels. The main limitation of both Monte Carlo RFF and deterministic quadrature feature maps is the curse of dimensionality in high dimensions, though sparsity-inducing constructions and sparse grid quadrature offer practical alleviation in structured settings. The statistical suboptimality of classical RFFs, associated with their non-leverage-informed sampling, can be partially addressed by data-adaptive optimal distribution sampling (Avron et al., 2018) and quantized feature schemes (Li et al., 2021).

This framework collectively establishes RFF-based Gaussian processes and their deterministic and structured generalizations as cornerstones of scalable, flexible kernel machine learning, extending their relevance to an ever-growing array of real-world applications and computational settings.
