
Random Fourier Feature Representations

Updated 23 December 2025
  • Random Fourier Feature representations are randomized finite-dimensional maps that approximate shift-invariant kernels using Bochner's theorem.
  • The scale-mixture approach generalizes classical kernels, extending RFF methods to families such as the generalized Gaussian, Matérn, and Cauchy kernels.
  • These techniques enable efficient computation in SVM, kernel ridge regression, and Gaussian process regression while preserving theoretical error guarantees.

Random Fourier Feature (RFF) representations are randomized finite-dimensional feature maps constructed to efficiently approximate positive-definite, shift-invariant kernels. These representations enable scalable kernel methods for large-scale machine learning by replacing implicit, infinite-dimensional feature spaces with explicit linear models while preserving the geometry induced by the kernel. The classical construction applies Bochner’s theorem to relate a shift-invariant kernel to its spectral measure, then samples randomized bases accordingly. Recent advances extend the RFF principle to broad classes of isotropic kernels beyond the classical Gaussian and Laplacian forms by realizing their spectral distributions as explicit scale mixtures of α-stable distributions, thereby generalizing RFFs to new families of kernels with provable approximation guarantees (Langrené et al., 5 Nov 2024).

1. Foundations: Bochner’s Theorem and the RFF Principle

At the heart of RFFs lies Bochner's theorem: any continuous, positive-definite, shift-invariant kernel $k(x, y) = k(x - y)$ on $\mathbb{R}^d$ is the inverse Fourier transform of a finite nonnegative measure $\mu$:

$$k(x-y) = \int_{\mathbb{R}^d} e^{i\omega^\top(x-y)}\,\mu(d\omega).$$

For kernels admitting an absolutely continuous spectral measure, this yields an explicit density $p(\omega)$, known as the kernel's spectral distribution. Sampling frequencies $\omega_1, \dots, \omega_M \sim p(\omega)$ and random phases $b_i \sim \mathrm{Uniform}[0, 2\pi]$, the RFF map is

$$\varphi(x) = \sqrt{\tfrac{2}{M}}\,\bigl[\cos(\omega_1^\top x + b_1), \dots, \cos(\omega_M^\top x + b_M)\bigr]^\top.$$

The empirical kernel estimate is then $\langle \varphi(x), \varphi(y) \rangle \approx k(x, y)$. This construction underlies the classical RFF approach for Gaussian, Laplace, and related kernels (Langrené et al., 5 Nov 2024).
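
As a concrete illustration, here is a minimal NumPy sketch of the classical construction for the Gaussian kernel $k(x, y) = \exp(-\|x-y\|^2/(2\sigma^2))$, whose spectral density is $\mathcal{N}(0, \sigma^{-2} I_d)$; the function name and interface are illustrative rather than taken from the cited paper.

```python
import numpy as np

def gaussian_rff(X, M, sigma, seed=0):
    """Classical RFF map for k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).

    Bochner's theorem gives the spectral density p(omega) = N(0, sigma^{-2} I_d),
    so frequencies are drawn from that Gaussian and phases uniformly on [0, 2*pi].
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    omega = rng.normal(scale=1.0 / sigma, size=(M, d))  # omega_i ~ p(omega)
    b = rng.uniform(0.0, 2.0 * np.pi, size=M)           # random phases b_i
    return np.sqrt(2.0 / M) * np.cos(X @ omega.T + b)   # (n, M) feature matrix

# Sanity check: <phi(x), phi(y)> should approximate the exact Gaussian kernel.
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
Phi = gaussian_rff(X, M=20_000, sigma=1.0)
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
print(np.abs(Phi @ Phi.T - np.exp(-sq_dists / 2.0)).max())  # small for large M
```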

2. Scale-Mixture Representations for Isotropic Kernels

A comprehensive generalization of RFF to wide classes of isotropic kernels is achieved by representing their spectral measures as scale mixtures of symmetric $\alpha$-stable random vectors. The central result states that for any $d \geq 1$, $\alpha \in (0, 2]$, scale $\lambda > 0$, and nonnegative random variable $R$ with characteristic function $\phi_R(t)$, the random vector

$$\boldsymbol\eta = (\lambda R)^{1/\alpha}\, \boldsymbol S_\alpha$$

(with $\boldsymbol S_\alpha$ a standard symmetric $\alpha$-stable random vector in $\mathbb{R}^d$) satisfies

$$\mathbb{E}\left[e^{i \boldsymbol\eta^\top u}\right] = \phi_R\bigl(i \lambda \|u\|^\alpha\bigr).$$

Thus, every kernel of the form $k(\|u\|) = \phi_R(i\lambda\|u\|^\alpha)$ is positive-definite, shift-invariant, and isotropic. This encapsulates a broad family of kernels, including the generalized Gaussian, Matérn, generalized Cauchy, Beta, Kummer, and Tricomi families (Langrené et al., 5 Nov 2024).
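
For the special case $\alpha = 2$, where $\boldsymbol S_2$ is a Gaussian vector with characteristic function $e^{-\|u\|^2}$, the identity can be checked by Monte Carlo. The sketch below does this for the generalized Cauchy kernel, taking the mixing law $R \sim \Gamma(\beta, 1)$ and $\lambda = 1/(2\beta)$ from the table in Section 4; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, beta = 3, 1.5
lam = 1.0 / (2.0 * beta)     # scale for the generalized Cauchy kernel
M = 500_000

# eta = (lam * R)^{1/alpha} * S_alpha with alpha = 2, where S_2 = sqrt(2) * N(0, I_d)
R = rng.gamma(shape=beta, scale=1.0, size=M)   # mixing law m_R = Gamma(beta, 1)
S2 = np.sqrt(2.0) * rng.normal(size=(M, d))    # standard symmetric 2-stable vector
eta = np.sqrt(lam * R)[:, None] * S2

u = np.array([0.7, -0.3, 0.5])
mc = np.cos(eta @ u).mean()                    # Re E[exp(i eta^T u)]; imaginary part vanishes
exact = (1.0 + np.linalg.norm(u) ** 2 / (2.0 * beta)) ** (-beta)
print(mc, exact)                               # agree up to Monte Carlo error
```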

3. RFF Algorithm for General Isotropic Kernels

The construction of RFFs for these generalized kernels proceeds as follows:

  • Given a target kernel $k(r) = \phi_R(i\lambda r^\alpha)$, identify the “mixing law” $m_R$: the distribution of $R$.
  • For each feature $i = 1, \dots, M$:
    • Sample $R_i \sim m_R$.
    • Sample $A_{\alpha}^{(i)}$ (a positive scalar for the stable mixture; e.g., via the random variable construction specified in Proposition 3 of (Langrené et al., 5 Nov 2024)).
    • Sample $N^{(i)} \sim \mathcal{N}(0, I_d)$.
    • Set $w_i = (\lambda R_i)^{1/\alpha} \sqrt{2 A_{\alpha}^{(i)}}\, N^{(i)}$.
    • Sample a phase $b_i \sim \mathrm{Uniform}[0, 2\pi]$.
  • The feature map is $\varphi(x) = \sqrt{2/M}\,[\cos(w_i^\top x + b_i)]_{i=1}^{M}$.

The kernel is then estimated by averaging over these features. The computational complexity is $O(Md)$, matching that of classical RFFs for Gaussian kernels.
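
The steps above translate directly into code. The sketch below assumes a user-supplied sampler for the mixing law $m_R$ and one for the positive scalar $A_\alpha$, here taken to be a positive $(\alpha/2)$-stable variable with Laplace transform $e^{-s^{\alpha/2}}$ as in the sub-Gaussian representation of stable vectors (the paper's Proposition 3 fixes the exact convention, so the normalization here is an assumption). The Laplace-kernel instantiation at the end uses the fact that such a positive $1/2$-stable variable can be drawn as $1/(2Z^2)$ with $Z \sim \mathcal{N}(0,1)$; function names are illustrative.

```python
import numpy as np

def general_rff(X, M, alpha, lam, sample_R, sample_A, seed=0):
    """RFF map for an isotropic kernel k(r) = phi_R(i * lam * r**alpha).

    sample_R(rng, M): M draws from the mixing law m_R.
    sample_A(rng, M): M draws of the positive (alpha/2)-stable scalar A_alpha
                      (assumed Laplace transform exp(-s**(alpha/2))).
    Returns the feature matrix Phi plus the frequencies W and phases b,
    so the same map can be applied to new points.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    R = sample_R(rng, M)
    A = sample_A(rng, M)
    N = rng.normal(size=(M, d))
    W = ((lam * R) ** (1.0 / alpha) * np.sqrt(2.0 * A))[:, None] * N  # frequencies w_i
    b = rng.uniform(0.0, 2.0 * np.pi, size=M)                         # phases b_i
    Phi = np.sqrt(2.0 / M) * np.cos(X @ W.T + b)
    return Phi, W, b

# Instantiation for the Laplace kernel k(r) = exp(-r): alpha = 1, lam = 1,
# R degenerate at 1, and A positive 1/2-stable, i.e. A = 1 / (2 Z^2) with Z ~ N(0, 1).
sample_R = lambda rng, M: np.ones(M)
sample_A = lambda rng, M: 1.0 / (2.0 * rng.normal(size=M) ** 2)

rng = np.random.default_rng(2)
X = rng.normal(size=(4, 2))
Phi, W, b = general_rff(X, M=100_000, alpha=1.0, lam=1.0,
                        sample_R=sample_R, sample_A=sample_A)
exact = np.exp(-np.linalg.norm(X[:, None] - X[None, :], axis=-1))
print(np.abs(Phi @ Phi.T - exact).max())  # small for large M
```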

4. Examples: Major Isotropic Kernel Families and Their Mixtures

The general construction unifies and extends RFF applicability. Notable kernel families and their mixing distributions $m_R$ include:

| Kernel type | $k(r)$ form | $R$ distribution | $m_R(s)$ (density) | $\lambda$ |
|---|---|---|---|---|
| Exponential-power | $e^{-r^\alpha}$ | $\{1\}$ (degenerate) | $\delta_{s=1}$ | $1$ |
| Generalized Cauchy | $(1+\tfrac{r^\alpha}{2\beta})^{-\beta}$ | $\Gamma(\beta,1)$ | $s^{\beta-1} e^{-s}/\Gamma(\beta)$ | $1/(2\beta)$ |
| Kummer (hypergeometric, 1st kind) | ${}_1F_1(\beta;\beta+\gamma;-r^\alpha)$ | $\mathrm{Beta}(\beta,\gamma)$ | $s^{\beta-1}(1-s)^{\gamma-1}/B(\beta,\gamma)$ | $1$ |
| Beta kernel | $B(\beta+r^\alpha,\,\gamma)/B(\beta,\gamma)$ | $-\ln(\mathrm{Beta}(\beta,\gamma))$ | $e^{-\beta s}(1-e^{-s})^{\gamma-1}/B(\beta,\gamma)$ | $1$ |
| Tricomi (hypergeometric, 2nd kind) | $\Gamma(\beta+\gamma)/\Gamma(\gamma)\, U(\beta,1-\gamma,(\gamma/\beta) r^\alpha)$ | Fisher–Snedecor $F_{2\beta,2\gamma}$ | $[1/B(\beta,\gamma)]\,(\beta/\gamma)^\beta s^{\beta-1} (1+\beta s/\gamma)^{-(\beta+\gamma)}$ | $1$ |

In each case, the RFF algorithm draws $R$ according to $m_R$, then a stable vector as described, thus reducing general kernel approximation to standard procedures.
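
All of the mixing laws in the table can be sampled with standard generators. The helper below is a sketch (the function name and string labels are illustrative) returning an $m_R$ sampler for each family, assuming NumPy's standard parameterizations match the densities listed above.

```python
import numpy as np

def mixing_law_sampler(family, beta=1.0, gamma=1.0):
    """Sampler for the mixing law m_R of the kernel families listed above."""
    if family == "exponential-power":    # k(r) = exp(-r^alpha): R degenerate at 1
        return lambda rng, M: np.ones(M)
    if family == "generalized-cauchy":   # R ~ Gamma(beta, 1), with lam = 1 / (2 beta)
        return lambda rng, M: rng.gamma(shape=beta, scale=1.0, size=M)
    if family == "kummer":               # R ~ Beta(beta, gamma)
        return lambda rng, M: rng.beta(beta, gamma, size=M)
    if family == "beta":                 # R ~ -ln(Beta(beta, gamma))
        return lambda rng, M: -np.log(rng.beta(beta, gamma, size=M))
    if family == "tricomi":              # R ~ Fisher-Snedecor F(2 beta, 2 gamma)
        return lambda rng, M: rng.f(2.0 * beta, 2.0 * gamma, size=M)
    raise ValueError(f"unknown kernel family: {family}")

# Example: plug the generalized Cauchy mixing law into the general_rff sketch above.
sample_R = mixing_law_sampler("generalized-cauchy", beta=2.5)
```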

5. Applications: SVM, Kernel Ridge Regression, and Gaussian Processes

These generalized RFFs enable scalable kernel machines for arbitrary isotropic kernels:

  • In support vector machines (SVM) and kernel ridge regression (KRR), the explicit finite-dimensional feature map allows direct use of efficient linear solvers (a minimal feature-space solver is sketched after this list). All known concentration and error-decay guarantees for standard RFFs extend.
  • In Gaussian process (GP) regression, $f(x)$ can be represented approximately as a sum over random features, with posterior inference performed in the corresponding linear model.
  • The choice of kernel (via the mixture law $m_R$) is decoupled from algorithmic complexity; only the random feature sampling changes, not the downstream computational machinery.
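
A minimal sketch of the feature-space ridge solver referenced above, assuming a precomputed feature matrix such as the one returned by the hypothetical general_rff function earlier; solving an $M \times M$ system replaces the $n \times n$ kernel system of exact KRR.

```python
import numpy as np

def rff_krr_fit(Phi, y, reg=1e-3):
    """Ridge regression on explicit RFF features: solve (Phi^T Phi + reg * I) w = Phi^T y."""
    M = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + reg * np.eye(M), Phi.T @ y)

# Usage sketch (the same frequencies W and phases b must be reused for test points):
# Phi_tr, W, b = general_rff(X_train, M=500, alpha=1.0, lam=1.0,
#                            sample_R=sample_R, sample_A=sample_A)
# w_hat = rff_krr_fit(Phi_tr, y_train, reg=1e-2)
# Phi_te = np.sqrt(2.0 / 500) * np.cos(X_test @ W.T + b)
# y_pred = Phi_te @ w_hat
```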

Empirical studies confirm that the convergence rate $O(M^{-1/2})$ of the RFF approximation is preserved regardless of the kernel family used, provided the sampling follows the correct spectral mixture (Langrené et al., 5 Nov 2024).
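
In the same spirit, GP regression with an approximated kernel reduces to Bayesian linear regression on the random features. The sketch below assumes a weight prior $w \sim \mathcal{N}(0, I_M)$ and Gaussian observation noise of variance noise_var (both illustrative modeling choices), so the posterior predictive mean and variance follow from an $M \times M$ solve.

```python
import numpy as np

def rff_gp_posterior(Phi_train, y, Phi_test, noise_var=0.1):
    """Posterior predictive mean/variance of the Bayesian linear model
    f(x) = phi(x)^T w with prior w ~ N(0, I_M) and noise variance noise_var."""
    M = Phi_train.shape[1]
    Sigma = np.linalg.inv(Phi_train.T @ Phi_train / noise_var + np.eye(M))  # posterior cov of w
    mu_w = Sigma @ Phi_train.T @ y / noise_var                              # posterior mean of w
    mean = Phi_test @ mu_w
    var = np.einsum("ij,jk,ik->i", Phi_test, Sigma, Phi_test) + noise_var
    return mean, var
```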

6. Implementation and Practical Considerations

Implementation requires:

  1. Derivation of the kernel's mixture law $m_R$ (from the Laplace–Stieltjes transform, as explicitly provided for all cases in (Langrené et al., 5 Nov 2024)).
  2. Efficient procedures for sampling from $m_R$ and for generating stable random vectors, the latter being possible using Gaussian mixtures and simple univariate random variates (see the sketch after this list).
  3. Downstream code for feature generation and model training (e.g., in SVM, KRR, GP) requires no change from the Gaussian RFF case except for the feature sampling step.
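
One standard route for item 2, sketched below, combines the Kanter / Chambers–Mallows–Stuck representation of positive stable variables with the Gaussian-mixture (sub-Gaussian) form $\boldsymbol S_\alpha = \sqrt{2A}\,N$. The normalization assumes the positive $(\alpha/2)$-stable scalar has Laplace transform $e^{-s^{\alpha/2}}$; the convention used in the paper may differ.

```python
import numpy as np

def sample_positive_stable(rng, a, M):
    """M draws of a positive a-stable variable (0 < a < 1) with Laplace
    transform exp(-s**a), via the Kanter / Chambers-Mallows-Stuck formula."""
    U = rng.uniform(0.0, np.pi, size=M)
    E = rng.exponential(1.0, size=M)
    return (np.sin(a * U) / np.sin(U) ** (1.0 / a)
            * (np.sin((1.0 - a) * U) / E) ** ((1.0 - a) / a))

def sample_symmetric_stable_vector(rng, alpha, d, M):
    """M draws of a standard symmetric alpha-stable vector in R^d (0 < alpha < 2),
    via the sub-Gaussian mixture S_alpha = sqrt(2 A) * N with A positive (alpha/2)-stable."""
    A = sample_positive_stable(rng, alpha / 2.0, M)
    N = rng.normal(size=(M, d))
    return np.sqrt(2.0 * A)[:, None] * N

# Quick check: for alpha = 1 the characteristic function is exp(-||u||).
rng = np.random.default_rng(3)
S = sample_symmetric_stable_vector(rng, alpha=1.0, d=2, M=400_000)
u = np.array([0.4, -0.9])
print(np.cos(S @ u).mean(), np.exp(-np.linalg.norm(u)))
```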

All known theoretical and empirical concentration results for RFFs, including error rates and uniform bounds, extend verbatim to this generalized setting.

7. Significance and Scope of the Scale-Mixture RFF Paradigm

The scale-mixture RFF paradigm subsumes the classical RFF model and dramatically enlarges the set of kernels for which efficient random feature approximations are available. The framework covers essentially all kernels expressible as

$$k(r) = \phi_R(i \lambda r^\alpha)$$

and thus unifies the Gaussian, Laplace, Cauchy, Matérn, and even more exotic kernels (Beta, Kummer, Tricomi). All of these admit efficient, direct RFF sampling routines with matching computational and approximation guarantees (Langrené et al., 5 Nov 2024).

This development enables theoretically rigorous, computationally tractable kernel learning for a much wider class of models, with broad utility in support vector machines, Gaussian process regression, spectral methods, and operator learning.
