Fourier Feature Embeddings
- Fourier feature embeddings are techniques that map data into higher-dimensional spaces using sinusoidal functions to approximate complex, high-frequency signals.
- Recent advancements include learnable frequency projections, teacher–learner frameworks, and computation-efficient variants that enhance embedding accuracy and speed.
- Applications span kernel methods, implicit neural representations, and graph learning, demonstrating their impact across fields like computer vision and reinforcement learning.
Fourier feature embeddings are a class of techniques that map input data into higher-dimensional spaces using sinusoidal (Fourier) basis functions, enabling machine learning models—particularly neural networks and kernel methods—to efficiently approximate complex, often high-frequency, functions. These embeddings are pivotal in areas such as implicit neural representations, kernel approximation, positional encoding, and structural embedding for both Euclidean and non-Euclidean data.
1. Fundamentals of Fourier Feature Embeddings
The foundational principle behind Fourier feature embeddings is Bochner's theorem, which states that any continuous, shift-invariant kernel $k(x, y) = k(x - y)$ on $\mathbb{R}^d$ with a nonnegative spectral measure can be represented as

$$k(x - y) = \int_{\mathbb{R}^d} p(\omega)\, e^{i\,\omega^\top (x - y)}\, d\omega = \mathbb{E}_{\omega \sim p}\!\left[ e^{i\,\omega^\top (x - y)} \right].$$

Random Fourier Features (RFFs) (Wangni et al., 2017, Toth et al., 2023) operationalize this by sampling frequencies $\omega_1, \dots, \omega_D \sim p(\omega)$ and defining the embedding

$$\phi(x) = \sqrt{\tfrac{2}{D}}\,\big[ \cos(\omega_1^\top x + b_1), \dots, \cos(\omega_D^\top x + b_D) \big]^\top, \qquad b_i \sim \mathrm{Unif}[0, 2\pi].$$

In this scheme, inner products in the embedded space approximate the kernel:

$$\langle \phi(x), \phi(y) \rangle \approx k(x - y).$$
Fourier feature mappings extend beyond kernels to neural architectures (e.g., neural networks for coordinate-based representations (Benbarka et al., 2021)), positional or number embeddings (Li et al., 2021, Zhou et al., 13 Feb 2025), and structured data such as graphs (Sheng et al., 4 Aug 2025). The critical advantage is the efficient approximation of intricate functions, particularly those with high-frequency content that standard methods fail to capture due to spectral bias.
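To make this concrete, the following is a minimal NumPy sketch of random Fourier features approximating a Gaussian (RBF) kernel; the embedding dimension `D`, bandwidth `sigma`, and sanity check are illustrative assumptions, not values from the cited papers.

```python
import numpy as np

def random_fourier_features(X, D=1024, sigma=1.0, rng=None):
    """Map rows of X (n, d) to D-dimensional random Fourier features that
    approximate the Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    # For the Gaussian kernel, Bochner's theorem gives omega ~ N(0, sigma^{-2} I).
    omega = rng.normal(scale=1.0 / sigma, size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ omega + b)

# Sanity check: feature inner products approximate the exact kernel.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Phi = random_fourier_features(X, D=4096, sigma=1.0, rng=0)
approx = Phi @ Phi.T
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
exact = np.exp(-sq_dists / 2.0)
print(np.abs(approx - exact).max())  # small, and shrinks as D grows
```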
2. Advancements in Embedding Design and Optimization
Building on basic RFFs, several methods focus on adaptivity, structure, or computational efficiency:
- Learnable Fourier Features: Instead of fixing frequencies, one introduces trainable frequency projections. Learnable Fourier features for positional encoding (Li et al., 2021) use a trainable projection $W_r \in \mathbb{R}^{\frac{D}{2} \times d}$, mapping each position $x \in \mathbb{R}^d$ by

$$\gamma(x) = \frac{1}{\sqrt{D}}\,\big[ \cos(x W_r^\top) \,\|\, \sin(x W_r^\top) \big],$$

where $\|$ denotes concatenation; $W_r$ is updated during training (see the sketch after this list).
- Teacher–Learner Framework and Hybrid Optimization: A two-stage strategy (Wangni et al., 2017) leverages a "teacher" embedding (high-fidelity, possibly costly) to supervise a "learner" embedding (computationally efficient). The objective is to make the learner's kernel approximation match the teacher's while respecting structural constraints; it is solved using Constrained Variational Expectation Maximization (CVEM) and the Alternating Direction Method of Multipliers (ADMM), allowing integration of constraints such as sparsity or block structure into learned kernel approximations.
- Computation-Efficient RFF (CERF): The masked CERF imposes random binary masks for sparsity, while the blocked CERF divides embeddings into blocks, supporting parallelism and structured matrix operations suitable for Fastfood-type algorithms (Wangni et al., 2017).
- Bias-free MLP Filtering: Robustifying Fourier embeddings for INRs involves replacing standard MLPs with bias-free MLPs, leveraging strict linearity and scale invariance. Adaptive filtering modulates Fourier channels, accentuating relevant frequencies and suppressing noise (Ma et al., 8 Feb 2025).
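As referenced in the first bullet above, the following PyTorch sketch shows a learnable Fourier feature layer in the spirit of Li et al. (2021); the initialization scale, output dimension, and usage example are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class LearnableFourierFeatures(nn.Module):
    """Positional encoding gamma(x) = (1/sqrt(D)) [cos(x W^T) || sin(x W^T)]
    with a trainable frequency projection W, optimized jointly with the model."""
    def __init__(self, d_in: int, d_out: int, init_scale: float = 1.0):
        super().__init__()
        assert d_out % 2 == 0, "d_out must be even (cos and sin halves)"
        self.W = nn.Parameter(torch.randn(d_out // 2, d_in) * init_scale)
        self.d_out = d_out

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        proj = x @ self.W.t()                              # (..., d_out // 2)
        feats = torch.cat([proj.cos(), proj.sin()], dim=-1)
        return feats / self.d_out ** 0.5

# Example: encode 2-D coordinates for an attention model.
pos = torch.rand(16, 2)                                    # 16 positions in [0, 1]^2
enc = LearnableFourierFeatures(d_in=2, d_out=256)(pos)
print(enc.shape)                                           # torch.Size([16, 256])
```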
3. Theoretical Perspectives: Fourier Embeddings and Neural Networks
Fourier feature embeddings in neural networks establish a direct link with classical Fourier analysis. For example, a one-layer (bias-free) perceptron applied to a sinusoidal input mapping can be interpreted as a truncated Fourier series expansion (Benbarka et al., 2021): the network output

$$f(x) = W\,\big[ \cos(2\pi B x),\ \sin(2\pi B x) \big]^\top = \sum_{k} a_k \cos\!\big(2\pi\, b_k^\top x\big) + c_k \sin\!\big(2\pi\, b_k^\top x\big)$$

matches a truncated Fourier expansion when the rows $b_k$ of $B$ lie on an integer lattice. The choice of the frequency matrix $B$ critically determines both expressivity and optimization tractability. Methods include:
- Integer Lattice Mapping: Fix $B$ to integer frequencies for strictly periodic, spectrally interpretable representations.
- Progressive Frequency Scheduling: Gradually unmask higher frequency components during training to avoid overfitting and improve generalization (Benbarka et al., 2021).
- NTK-guided Regularization: Fourier preprocessing preconditions the initial neural tangent kernel, stabilizing gradients and improving convergence in tabular deep learning (Sergazinov et al., 3 Jun 2025).
- Diagonal Feature Gating: Introducing a diagonal layer after the sinusoidal embedding enables the model to amplify only the Fourier modes corresponding to significant signal content, thus learning a sparse representation robust to noise (Jeong et al., 3 Sep 2024); a minimal sketch follows this list.
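Below is a small PyTorch sketch of a diagonal gate on top of a fixed sinusoidal embedding, in the spirit of the diagonal feature gating described above; the random frequency initialization and the sharing of one gate per cos/sin pair are assumptions, not details from Jeong et al.

```python
import math
import torch
import torch.nn as nn

class GatedFourierEmbedding(nn.Module):
    """Fixed sinusoidal embedding followed by a trainable diagonal gate:
    each frequency gets its own learnable scale, so training can push the
    gates of uninformative (noisy) modes toward zero."""
    def __init__(self, d_in: int, n_freqs: int, freq_scale: float = 16.0):
        super().__init__()
        # Fixed random frequency matrix B (not trained in this sketch).
        self.register_buffer("B", torch.randn(n_freqs, d_in) * freq_scale)
        # One gate per frequency, shared by its cos/sin pair (an assumption).
        self.gate = nn.Parameter(torch.ones(n_freqs))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        proj = 2.0 * math.pi * (x @ self.B.t())            # (..., n_freqs)
        return torch.cat([self.gate * proj.cos(),
                          self.gate * proj.sin()], dim=-1)

emb = GatedFourierEmbedding(d_in=1, n_freqs=64)
x = torch.linspace(0, 1, 128).unsqueeze(-1)                # 1-D coordinates
print(emb(x).shape)                                        # torch.Size([128, 128])
```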
4. Extensions to Structured and Non-Euclidean Data
Fourier features are not restricted to Euclidean settings:
- Graph Spectral Embedding: The standard Graph Fourier Transform (GFT) uses Laplacian eigenvectors as its basis; this is generalized to the fractional spectral domain via the Graph Fractional Fourier Transform (GFRFT), expanding the embedding space to a continuum indexed by a fractional order $\alpha$ (Sheng et al., 4 Aug 2025). The Generalized Fractional Filtering Embedding (GEFRFE) leverages fractional eigenvectors together with learnable filter banks, with $\alpha$ selected by search or adaptive learning (a GFT sketch follows this list). This allows dynamic adaptation of the embedding space to best capture graph structure.
- Knowledge Graphs via FFT: Embeddings in complex hyperbolic spaces are transformed between real and complex domains with FFT and IFFT, enabling hyperbolic transformations and attention mechanisms. Operations are performed in the real domain (Poincaré ball) after FFT, followed by a return to the complex hyperbolic domain (Xiao et al., 2022).
- Random Fourier Embeddings for Signatures: In sequence modeling, Random Fourier Signature Features (RFSF) approximate path signature kernels, reducing quadratic scaling in both sequence length and the number of sequences to linear, with provable concentration bounds and scalable tensor projections (Toth et al., 2023).
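For the graph-spectral case, here is a minimal NumPy sketch of the standard Graph Fourier Transform via Laplacian eigenvectors; the fractional GFRFT variant is only indicated in a closing comment, since its exact construction follows Sheng et al. (4 Aug 2025).

```python
import numpy as np

def graph_fourier_transform(A, x):
    """Standard GFT: project a graph signal x onto the eigenvectors of the
    combinatorial Laplacian L = D - A of an undirected graph with adjacency A."""
    L = np.diag(A.sum(axis=1)) - A
    eigvals, U = np.linalg.eigh(L)          # columns of U form the Fourier basis
    x_hat = U.T @ x                         # spectral coefficients
    return eigvals, U, x_hat

# Tiny example: path graph on 4 nodes.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 2.0, 3.0, 4.0])
eigvals, U, x_hat = graph_fourier_transform(A, x)
print(np.allclose(U @ x_hat, x))            # True: the inverse GFT recovers x
# The fractional GFRFT of order alpha replaces the GFT operator with a
# fractional matrix power of it (alpha = 1 recovers the standard transform).
```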
5. Applications Across Domains
Fourier feature embeddings have catalyzed advances across both classical and emerging applications:
| Application Domain | Embedding Motivation/Role | Reference |
|---|---|---|
| Kernel Methods | Kernel approximation, efficiency for large-scale data | (Wangni et al., 2017) |
| Spatial Positional Encoding | Inductive, continuous image and layout encoding for attention models | (Li et al., 2021) |
| Implicit Neural Representations | High-frequency detail in graphics, vision, audio | (Benbarka et al., 2021, Ma et al., 8 Feb 2025) |
| Reinforcement Learning | Functional regularization, stability, and sample efficiency | (Li et al., 2021) |
| Physical Modeling (PINNs) | Hard-enforcing Neumann boundary conditions, multiscale resolution | (Straub et al., 1 Apr 2025) |
| Tabular Deep Learning | Bounded kernel preconditioning for faster and more stable convergence | (Sergazinov et al., 3 Jun 2025) |
| Knowledge Graph Embedding | FFT-mediated transformations between geometric domains | (Xiao et al., 2022) |
| Graph Representation Learning | Fractional spectral embeddings for richer structure capture | (Sheng et al., 4 Aug 2025) |
| Numeracy in LLMs | Single-token, digit-precise number embeddings via scaled periodic bases | (Zhou et al., 13 Feb 2025) |
Additional applications include ultrasound image segmentation (using Fourier descriptors) (Chen et al., 2023), EEG emotion recognition (extracting periodicities with Fourier attention) (Wang et al., 28 Feb 2025), and high-fidelity prediction of oscillatory optical field perturbations (Jandrell et al., 27 Aug 2025).
6. Limitations, Practical Considerations, and Outlook
Although Fourier feature embeddings have demonstrated remarkable performance improvements, several limitations and open challenges persist:
- Noise and Representation Limitations: Embeddings can inject spurious high-frequency noise, with inherent lower bounds on achievable accuracy due to finite frequency sampling (Ma et al., 8 Feb 2025).
- Sparsity and Overfitting Control: Techniques such as diagonal gating (Jeong et al., 3 Sep 2024) and bias-free adaptive filtering (Ma et al., 8 Feb 2025) are necessary to ensure that only relevant frequencies are activated.
- Parameter Selection: The selection of embedding dimension, frequency range, kernel bandwidth (in RFF), and inclusion of high-frequency components must be tailored to signal characteristics and application domain.
- Scalability: While efficient variants (e.g., RFSF-TRP, GEFRFE) reduce computational burden for large-scale and high-dimensional data, operations such as eigendecomposition for graph embeddings or block masking for fast transformations demand careful resource management.
- Integration with Deep Architectures: Fourier feature mappings can be seamlessly integrated into most architectures as plug-and-play preprocessing or embedding layers (Sergazinov et al., 3 Jun 2025); however, joint training of Fourier parameters and downstream model weights requires thoughtful optimization strategies.
Future avenues include more principled and adaptive frequency selection, multimodal embedding strategies that combine periodic, positional, and semantic information, deeper connections with NTK and spectral analysis, and extensions to more general manifold data.
7. Representative Mathematical Formulations
Commonly used Fourier embedding constructions:
- Random Fourier Features (RFF):

$$\phi(x) = \sqrt{\tfrac{2}{D}}\,\big[ \cos(\omega_1^\top x + b_1), \dots, \cos(\omega_D^\top x + b_D) \big]^\top,$$

with $\omega_i \sim p(\omega)$ and $b_i \sim \mathrm{Unif}[0, 2\pi]$.
- Learnable Fourier Features:

$$\gamma(x) = \frac{1}{\sqrt{D}}\,\big[ \cos(x W_r^\top) \,\|\, \sin(x W_r^\top) \big], \qquad W_r \text{ trainable}.$$

- Integer Lattice Mapping:

$$\gamma(x) = \big[ \cos(2\pi B x),\ \sin(2\pi B x) \big],$$

where $B$ contains integer rows corresponding to desired lattice points (a worked 1-D instance appears after this list).
- Bias-Free MLP Filtering: a Fourier embedding followed by an MLP with all bias terms removed,

$$f(x) = W_L\, \sigma\!\big( W_{L-1} \cdots \sigma( W_1\, \gamma(x) ) \big),$$

whose strict linearity and scale invariance support adaptive filtering of Fourier channels (Ma et al., 8 Feb 2025).
- FFT-based Domain Conversion (for KGs): embeddings are converted from the complex hyperbolic domain to the real (Poincaré ball) domain with the FFT, operated on there, and returned to the complex domain with the IFFT (Xiao et al., 2022).
- Graph Fractional Fourier Transform: with GFT matrix $F$ and eigendecomposition $F = P \Lambda P^{-1}$, the fractional transform of order $\alpha$ is

$$F^{\alpha} = P\, \Lambda^{\alpha}\, P^{-1},$$

recovering the standard GFT at $\alpha = 1$ (Sheng et al., 4 Aug 2025).
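As a concrete one-dimensional instance of the integer-lattice mapping above (a worked example, not drawn from any specific cited paper):

$$B = (1,\ 2,\ 3)^\top, \qquad \gamma(x) = \big[\cos 2\pi x,\ \cos 4\pi x,\ \cos 6\pi x,\ \sin 2\pi x,\ \sin 4\pi x,\ \sin 6\pi x\big],$$

so a bias-free linear readout $W\gamma(x) = \sum_{k=1}^{3} a_k \cos(2\pi k x) + c_k \sin(2\pi k x)$ is exactly a degree-3 truncated Fourier series of a periodic signal on $[0, 1)$.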
In summary, Fourier feature embeddings constitute a mathematically rigorous, flexible, and widely applicable framework for representing complex, structured information across diverse domains in modern machine learning. Advances in their construction, optimization, and integration continue to expand their efficacy for high-frequency modeling, structural representation, and efficient large-scale learning.