
Random Fourier Feature Embeddings

Updated 30 January 2026
  • Random Fourier Feature (RFF) embeddings approximate positive-definite, shift-invariant kernels using randomized Fourier maps derived from Bochner's theorem.
  • They employ efficient sampling and transformation techniques—including adaptive, structured, and multi-scale methods—to approximate various kernels such as RBF, Cauchy, and asymmetric variants.
  • Their established error bounds and computational strategies make RFF embeddings suitable for scalable machine learning applications like SVMs, ridge regression, and quantum models.

Random Fourier Feature (RFF) embeddings are a central methodology for approximating positive-definite, shift-invariant kernels in scalable machine learning. The RFF framework originated from the realization that, by Bochner's theorem, any continuous, shift-invariant positive-definite kernel can be represented as the Fourier transform of a probability measure, enabling explicit, randomized feature maps whose finite-dimensional inner products approximate kernel evaluations. Generalizations now encompass data-adaptive RFF constructions, structured and orthogonalized variants, multi-scale embeddings, extensions to indefinite and asymmetric kernels, efficient training via compression and resampling, and applicability to advanced architectures such as quantum models and interpretable networks. Below, key theoretical foundations, sampling algorithms, extensions, computational strategies, and applications are detailed.

1. Theoretical Foundations: Bochner’s Theorem and Kernel Spectral Representations

Bochner’s theorem forms the basis for RFF embeddings by relating a continuous, shift-invariant positive-definite kernel $k(x, y) = \kappa(x - y)$ to its spectral density $p(\omega)$:

$$k(x - y) = \int_{\mathbb{R}^d} e^{i\omega^\top (x - y)}\, p(\omega)\, d\omega,$$

where $p(\omega)$ is a nonnegative measure (often a density). This applies to isotropic kernels $\kappa(\|x - y\|)$, including RBF and beyond (Langrené et al., 2024).

Scale-mixture Representation: For isotropic kernels, a fundamental result is that $\kappa(r)$, under certain conditions (complete monotonicity), admits a scale-mixture representation:

$$\kappa(r) = \int_{0}^{\infty} e^{-\lambda r^\alpha}\, p(\lambda)\, d\lambda,$$

where $p(\lambda)$ is a probability density and $\alpha \in (0, 2]$ characterizes the kernel (Langrené et al., 2024).

A random frequency vector for the RFF map is then constructed as

$$\omega = R^{1/\alpha} S_\alpha,$$

where $R \sim p(\lambda)$, and $S_\alpha$ is a symmetric $\alpha$-stable random vector such that $\mathbb{E}[e^{i u^\top S_\alpha}] = e^{-\|u\|^\alpha}$.

2. Practical Algorithms for RFF Sampling and Feature Construction

Canonical RFF Map: For a fixed number of features $M$, and for each $j = 1, \dots, M$:

  • Sample the mixing variable $\lambda_j \sim p(\lambda)$ (e.g., delta, gamma, beta, $F$).
  • Sample $S_{\alpha,j}$ via its Gaussian mixture representation (Devroye–Nolan method):

$$S_{\alpha,j} = \sqrt{2 A_{\alpha,j}}\, N_j, \quad N_j \sim \mathcal{N}(0, I_d),$$

with $A_{\alpha,j}$ an auxiliary variable depending on $\alpha$.

  • Form $\omega_j = \lambda_j^{1/\alpha} S_{\alpha, j}$.
  • Sample $b_j \sim U[0, 2\pi]$.

The RFF feature map is

$$\phi(x) = \frac{1}{\sqrt{M}}\, \bigl[\cos(\omega_1^\top x + b_1),\ \sin(\omega_1^\top x + b_1),\ \ldots,\ \cos(\omega_M^\top x + b_M),\ \sin(\omega_M^\top x + b_M)\bigr],$$

so that $\phi(x)^\top \phi(y) = \frac{1}{M} \sum_{j=1}^{M} \cos(\omega_j^\top (x - y)) \approx k(x - y)$. (In the paired cosine–sine map the phases $b_j$ cancel in the inner product; the normalization $\sqrt{2/M}$ applies instead to the cosine-only variant $\phi_j(x) = \sqrt{2/M}\, \cos(\omega_j^\top x + b_j)$.)

Examples of $p(\lambda)$ for Key Kernel Families:

| Kernel type | $\kappa(r)$ | Mixing distribution $p(\lambda)$ |
| --- | --- | --- |
| Exponential-power/Gaussian | $\exp(-r^\beta),\ \beta \in (0, 2]$ | $\delta(\lambda - 1)$ |
| Generalized Cauchy | $\left(1 + \frac{r^\alpha}{2\beta}\right)^{-\beta}$ | Gamma$(\beta, 1/(2\beta))$ |
| Kummer/Beta | see details (Langrené et al., 2024) | Beta$(\beta, \gamma)$ |
| Tricomi | see details (Langrené et al., 2024) | $F$-distribution ($F_{2\beta, 2\gamma}$) |

Features for new kernels become immediately available by changing only the $p(\lambda)$ sampler.

3. Extensions to Indefinite and Asymmetric Kernels

Signed and Complex Spectral Measures: For symmetric indefinite kernels (i.e., with positive and negative spectrum), expand $p(\omega)$ into its Jordan decomposition $p_+ - p_-$, and construct RFF embeddings using samples from both $p_+$ and $p_-$. For asymmetric kernels, one uses complex spectral measures split into four finite positive measures, with sampling over each and assembling a cosine–sine feature map (He et al., 2022).

Algorithmic Structure for Asymmetric RFFs (AsK-RFFs):

  • For $M$ features, separately sample from the four measures $\tilde\mu_R^+$, $\tilde\mu_R^-$, $\tilde\mu_I^+$, and $\tilde\mu_I^-$.
  • Form feature blocks $\phi^+$, $\phi^-$, $\psi$ using cosines and sines, appropriately weighted.
  • The kernel approximation is a sum and difference of corresponding inner products.

Rigorous uniform convergence guarantees similar to classical RFF apply provided the total mass of the spectra remains finite (He et al., 2022).
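For the simpler symmetric indefinite case, the Jordan decomposition can be sketched directly: approximate $p_+$ and $p_-$ with separate RFF blocks and take the signed combination of their inner products. The example below uses a hypothetical "difference of Gaussians" kernel $\exp(-r^2) - 0.5\, \exp(-r^2/4)$ chosen purely for illustration (it is not from the cited papers); its positive and negative spectral parts are both Gaussian, with total masses 1 and 0.5.

```python
import numpy as np

rng = np.random.default_rng(2)
d, M = 3, 4000

def rff_map(X, omega):
    proj = X @ omega.T
    return np.sqrt(1.0 / omega.shape[0]) * np.concatenate(
        [np.cos(proj), np.sin(proj)], axis=1)

# Indefinite kernel k(x-y) = exp(-||x-y||^2) - 0.5 * exp(-||x-y||^2 / 4).
# Jordan decomposition p_+ - p_-: p_+ is the spectral density of exp(-r^2)
# (omega ~ N(0, 2I)); p_- is that of exp(-r^2/4) (omega ~ N(0, I/2)),
# carried with total mass 0.5.
omega_pos = np.sqrt(2.0) * rng.standard_normal((M, d))
omega_neg = np.sqrt(0.5) * rng.standard_normal((M, d))

X = rng.standard_normal((30, d))
Y = rng.standard_normal((30, d))
# Signed combination of the two blocks' inner products.
K_approx = (rff_map(X, omega_pos) @ rff_map(Y, omega_pos).T
            - 0.5 * rff_map(X, omega_neg) @ rff_map(Y, omega_neg).T)
D2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-D2) - 0.5 * np.exp(-D2 / 4.0)
max_err = np.abs(K_approx - K_exact).max()
```

The fully asymmetric AsK-RFF construction follows the same pattern with four blocks instead of two, one per finite positive measure.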

4. Structure, Compression, and Efficiency Techniques

Orthogonal and Structured RFFs:

  • Orthogonal Random Features (ORF) employ sampling from random orthogonal matrices to reduce variance, using QR decompositions and scale corrections (Yu et al., 2016).
  • Structured Orthogonal Random Features (SORF) employ fast transforms (e.g., Hadamard, sign-flip, and permutation matrices) to achieve $O(D \log d)$ projection cost and $O(d)$ space (Yu et al., 2016).
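A minimal sketch of the SORF-style structured projection follows; the factorization $W = \sqrt{d}\, H D_1 H D_2 H D_3$ (normalized Hadamard $H$, random sign diagonals $D_i$) is reproduced up to convention from Yu et al. (2016), but this version builds $W$ as a dense matrix for clarity rather than applying the $O(d \log d)$ fast Walsh–Hadamard transform, and omits the bandwidth scaling $1/\sigma$.

```python
import numpy as np

def hadamard(n):
    """Sylvester-construction Hadamard matrix; n must be a power of two."""
    H = np.ones((1, 1))
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

rng = np.random.default_rng(3)
d = 8  # input dimension, a power of two (pad the input otherwise)

# SORF-style structured matrix: W = sqrt(d) * H D1 H D2 H D3,
# with H the normalized (orthogonal) Hadamard matrix and
# D_i independent random sign-flip diagonals.
Hn = hadamard(d) / np.sqrt(d)
D = [np.diag(rng.choice([-1.0, 1.0], size=d)) for _ in range(3)]
W = np.sqrt(d) * (Hn @ D[0] @ Hn @ D[1] @ Hn @ D[2])

# As a product of orthogonal factors, W has exactly orthogonal rows of
# squared norm d, which is what drives the variance reduction.
G = W @ W.T
```

Because each factor is orthogonal, `G` equals $d \cdot I_d$ exactly; in the fast-transform formulation the same matrix is applied implicitly without ever materializing $W$.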

Compression via Teacher–Learner Framework and CERF:

  • High-precision teacher RFF embeddings are constructed, and a compact learner embedding minimizes reconstruction error via constrained variational EM, with explicit orthogonal mixing. Masked and blocked CERF variants enable fast evaluation and reduced arithmetic complexity, e.g., $O(D \cdot S)$ or $O(D \log d')$ per point (Wangni et al., 2017).

5. Adaptive, Multi-Scale, and Data-Dependent Approaches

Adaptive RFF (ARFF): Particle filter–style resampling aligns the sampling of RFF frequencies to the inferred spectral density of the target function, stabilizing convergence and reducing hyperparameter sensitivity. Metropolis steps may be omitted in favor of pure random walks when accompanied by systematic resampling (Kammonen et al., 2024).

Multi-Scale RFF: In reservoir computing, multi-scale RFF maps concatenate features from multiple kernel bandwidths to capture both fast and slow dynamics in time series or biological signals. This mixture yields universal approximators for multi-scale functions and outperforms single-scale RFF in long-horizon forecasting (Laha, 4 Nov 2025).

Pseudo-Bayesian RFF: PAC-Bayesian learning interprets the spectral density as a prior on trigonometric hypotheses, learning posteriors with explicit generalization bounds and enabling data-dependent selection of informative frequencies, landmarks-based representations, and kernel alignment optimizations (Letarte et al., 2018).

6. Applications and Implementation Recommendations

RFF embeddings have wide applicability to kernel SVMs, ridge regression, GPs, reservoir computing (RC), interpretable deep networks (KAN/KAF), and explainable mixtures of additive models (Langrené et al., 2024, Zhang et al., 9 Feb 2025, Huang et al., 22 Dec 2025).

Computational Guidance:

  • Identify the kernel family and express it as a mixture $\kappa(r) = \int e^{-\lambda r^{\alpha}}\, p(\lambda)\, d\lambda$.
  • Sample the mixing variable and stable vector for each feature, then assemble the trigonometric map.
  • Extensions to asymmetric/indefinite kernels require appropriate spectral decomposition.
  • For efficiency, employ orthogonality and structure in the projection matrix when dd is large.
  • Multi-scale and adaptive strategies augment expressivity and optimize sample efficiency.

7. Error Bounds, Sample Complexity, and Limitations

Monte Carlo error of the RFF approximation is $O(1/\sqrt{M})$ if the second moment of $\|\omega\|$ is finite. For heavy-tailed (low $\alpha$) mixtures, more features may be required. Concentration inequalities guarantee uniform kernel error given sufficient sample size. Compressed representations (CERF) maintain this accuracy with reduced evaluation complexity. However, for highly non-Euclidean or non-stationary kernels, adaptation or tailored sampling laws may be necessary.
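The $O(1/\sqrt{M})$ rate is easy to check empirically; the sketch below estimates a single Gaussian-kernel value by direct Monte Carlo over the frequency distribution and averages the absolute error over repeated draws for increasing $M$ (the trial count and grid of $M$ values are arbitrary illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(5)
d = 3
x = rng.standard_normal(d)
y = rng.standard_normal(d)
k_exact = np.exp(-np.sum((x - y) ** 2))  # Gaussian kernel exp(-||x-y||^2)

def mean_abs_err(M, trials=200):
    """Average |MC estimate - k| over independent frequency draws."""
    errs = []
    for _ in range(trials):
        # omega ~ N(0, 2I) is the spectral law of exp(-r^2).
        omega = np.sqrt(2.0) * rng.standard_normal((M, d))
        est = np.mean(np.cos(omega @ (x - y)))  # unbiased kernel estimate
        errs.append(abs(est - k_exact))
    return float(np.mean(errs))

# Quadrupling M should roughly halve the average error.
errs = {M: mean_abs_err(M) for M in (50, 200, 800)}
```

With these seeds the measured errors shrink close to the predicted $1/\sqrt{M}$ factor of 2 per quadrupling of $M$.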


This comprehensive framework establishes RFF embeddings as a universally extensible, implementable, and theoretically justified technology across both classical and emerging kernel-based learning paradigms.
