Fourier Position Embedding (FoPE)
- Fourier Position Embedding (FoPE) is a technique that encodes positions using superpositions of sine and cosine functions to capture periodic and high-frequency relationships.
- It leverages fixed, random, or learnable frequency banks to overcome spectral bias and efficiently model data in applications like image reconstruction, point cloud analysis, and language modeling.
- Empirical implementations demonstrate that FoPE can closely approximate bandlimited functions and improve performance in challenging tasks through controlled frequency bandwidth and progressive training.
Fourier Position Embedding (FoPE) is a positional encoding strategy that utilizes parametric or analytical Fourier feature mappings—typically superpositions of sinusoids—to represent coordinates or positions in machine learning models. FoPE directly encodes periodic and high-frequency relationships, stationary distances, or spectral structure in tasks ranging from implicit representations of functions to language modeling and point cloud analysis. In both fixed and learnable forms, Fourier Position Embeddings have enabled models to overcome spectral bias, generalize to unseen inputs, and efficiently encode positional dependencies at arbitrary resolutions or geometries.
1. Mathematical Foundations and General Formalism
The core of Fourier Position Embedding is the representation of an input coordinate by a vector of sine and cosine components at prescribed frequencies. This is most generally written as
$$\gamma(\mathbf{x}) = \left[\cos(2\pi \mathbf{B}\mathbf{x}),\; \sin(2\pi \mathbf{B}\mathbf{x})\right],$$
with $\mathbf{B} \in \mathbb{R}^{m \times d}$ acting as a frequency projection matrix (“frequency bank”). For scalar $x$ and a list of frequencies $\{\omega_1, \dots, \omega_m\}$, this reduces to
$$\gamma(x) = \left[\cos(\omega_1 x), \sin(\omega_1 x), \dots, \cos(\omega_m x), \sin(\omega_m x)\right].$$
Several variants and extensions of this formulation exist:
- Integer-Lattice/Fourier Series: $\mathbf{B}$ enumerates all integer frequencies up to a cutoff, so that the embedding corresponds exactly to a truncated $d$-dimensional Fourier series of the form
$$f(\mathbf{x}) = \sum_{\mathbf{n} \in \Lambda} a_{\mathbf{n}} \cos\!\left(2\pi\,\mathbf{n}^{\top}\mathbf{x}\right) + b_{\mathbf{n}} \sin\!\left(2\pi\,\mathbf{n}^{\top}\mathbf{x}\right),$$
where $\Lambda$ is an integer lattice in $\mathbb{Z}^d$ (Benbarka et al., 2021).
- Random Fourier Features: $\mathbf{B}$ is sampled with i.i.d. Gaussian rows $\mathbf{b}_j \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})$, approximating shift-invariant kernels such as the Gaussian RBF via
$$\frac{1}{m}\,\gamma(\mathbf{x})^{\top}\gamma(\mathbf{y}) \;\approx\; \exp\!\left(-2\pi^2\sigma^2\,\|\mathbf{x}-\mathbf{y}\|^2\right)$$
(Zheng et al., 2023, Sojitra et al., 15 Sep 2025, Zheng et al., 2021).
- Learnable Fourier Features: $\mathbf{B}$ and possibly a post-hoc MLP are learned end-to-end, permitting the model to adapt to dataset-specific geometric/structural relations and capture more complex positional interactions (Li et al., 2021, Jabareen et al., 2 Sep 2025). A minimal sketch of the basic mapping follows below.
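As a concrete reference point, the following is a minimal NumPy sketch of the basic mapping $\gamma(\mathbf{x}) = [\cos(2\pi\mathbf{B}\mathbf{x}), \sin(2\pi\mathbf{B}\mathbf{x})]$ with a random (RFF-style) frequency bank; the function name and the chosen bandwidth are illustrative, not taken from any of the cited works.

```python
import numpy as np

def fourier_features(x, B):
    """Map coordinates x of shape (n, d) to [cos(2*pi*Bx), sin(2*pi*Bx)] of shape (n, 2m)."""
    proj = 2.0 * np.pi * x @ B.T                              # (n, m) projection onto the frequency bank
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

d, m, sigma = 2, 64, 10.0                                     # input dim, number of frequencies, bandwidth
B_random = np.random.normal(0.0, sigma, size=(m, d))          # random (RFF-style) frequency bank

coords = np.random.rand(128, d)                               # e.g. normalized pixel coordinates in [0, 1]^2
phi = fourier_features(coords, B_random)                      # (128, 2m) positional embedding
```

Swapping `B_random` for an integer lattice or a learnable matrix recovers the other variants listed above.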
2. Theoretical Characterization and Model Equivalence
A key finding is the structural equivalence between networks using Fourier Position Embeddings and models with periodic activation functions (notably SIRENs):
- One-Layer SIREN Equivalence: For input $\mathbf{x}$ and frequency bank $\mathbf{B}$, a perceptron on Fourier embeddings,
$$f(\mathbf{x}) = \mathbf{W}\,\gamma(\mathbf{x}) + \mathbf{b} = \mathbf{W}\left[\cos(2\pi \mathbf{B}\mathbf{x}),\, \sin(2\pi \mathbf{B}\mathbf{x})\right]^{\top} + \mathbf{b},$$
is equivalent to a one-layer SIREN (sinusoidal-activated network) with fixed phases and frequencies, since a linear combination of cosines and sines over a frequency bank can be rewritten in phase-shifted sinusoid form. For a true SIREN, the frequency matrix and phases can be learned, while for FoPE they are fixed or constructed (Benbarka et al., 2021). A numerical check of the underlying identity is sketched after this list.
- Bandlimited Function Approximation: On bounded domains, FoPE with an integer-lattice $\mathbf{B}$ is exactly the real Fourier series basis; with random $\mathbf{B}$, the induced embedding approximates band-limited kernels, such as the Gaussian, by Bochner’s theorem (kernel approximation) (Li et al., 2021, Zheng et al., 2023).
- Stable Rank and Distance Preservation: The expressivity and generalization properties of a positional embedding are governed by two quantitative metrics: its stable rank, which upper-bounds learning capacity (memorization), and the degree to which pairwise distances in input space are preserved in embedding space (generalization). FoPE tightly controls this trade-off through the bandwidth and diversity of its frequency bank (Zheng et al., 2021).
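The phase-shift identity behind the one-layer SIREN equivalence can be verified numerically. The snippet below is a small, self-contained check (not drawn from any of the cited works): a linear combination of a cosine and a sine at one frequency equals a single phase-shifted sinusoid.

```python
import numpy as np

# Check: a*cos(w*x) + b*sin(w*x) == r*sin(w*x + phi)  with  r = sqrt(a^2 + b^2), phi = atan2(a, b).
rng = np.random.default_rng(0)
a, b, w = rng.normal(size=3)
x = np.linspace(-1.0, 1.0, 1000)

lhs = a * np.cos(w * x) + b * np.sin(w * x)     # linear layer acting on Fourier features
r, phi = np.hypot(a, b), np.arctan2(a, b)
rhs = r * np.sin(w * x + phi)                   # sinusoidal activation with a fixed phase shift

assert np.allclose(lhs, rhs)                    # identical up to floating-point error
```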
3. Embedding Construction Variants and Anisotropy
The flexibility of the Fourier feature approach allows it to be adapted and extended:
- Isotropic/Anisotropic Embeddings: In high-dimensional or anisotropic domains (e.g., medical imaging), Fourier embeddings may use a scaling matrix $\mathbf{S}$, often diagonal or learnable, to enable per-dimension frequency scaling:
$$\gamma(\mathbf{x}) = \left[\cos(2\pi \mathbf{B}\mathbf{S}\mathbf{x}),\; \sin(2\pi \mathbf{B}\mathbf{S}\mathbf{x})\right].$$
This enables explicit encoding of domain- or class-specific anisotropies. Regularization of the scaling parameters prevents degenerate solutions, while learnable frequency banks (LFPE) increase flexibility further at additional parameter and computation cost (Jabareen et al., 2 Sep 2025); a sketch of this construction follows the list below.
- Learnable and Hybrid Mappings: Embedding matrices can be fixed or trainable. When combined with a non-linear MLP post-processor, the network can learn and adapt both local (e.g., Euclidean) and complex spatial relationships (e.g., IoU, aspect ratios) beyond what fixed frequency sets alone can encode (Li et al., 2021).
- Spectral Pruning and Progressive Activation: Pruning lower-amplitude frequency components or progressively “masking in” higher-frequency channels during training can improve generalization and prevent overfitting, especially for high-resolution function or image regression tasks. A progressive schedule can gate frequency channels by their frequency norm over the course of training (Benbarka et al., 2021).
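The following is a minimal, illustrative PyTorch sketch of an anisotropic Fourier embedding with a learnable diagonal scaling, as referenced above; the class name, the log-parameterization of the scales, and the default bandwidth are assumptions rather than the reference implementation of (Jabareen et al., 2 Sep 2025).

```python
import math
import torch
import torch.nn as nn

class AnisotropicFourierEmbedding(nn.Module):
    """Sketch: gamma(x) = [cos(2*pi*B S x), sin(2*pi*B S x)] with a learnable diagonal S."""
    def __init__(self, in_dim, n_freqs, sigma=1.0):
        super().__init__()
        self.register_buffer("B", torch.randn(n_freqs, in_dim) * sigma)   # fixed random frequency bank
        self.log_scale = nn.Parameter(torch.zeros(in_dim))                # learnable per-axis log-scales

    def forward(self, x):                                                 # x: (..., in_dim)
        x_scaled = x * torch.exp(self.log_scale)                          # apply the diagonal scaling S
        proj = 2 * math.pi * x_scaled @ self.B.t()
        return torch.cat([torch.cos(proj), torch.sin(proj)], dim=-1)
```

A penalty on `log_scale` (e.g., weight decay) plays the role of the regularization mentioned above.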
4. Practical Implementations and Empirical Findings
Fourier Position Embeddings have been empirically validated across diverse domains:
- Implicit Neural Representations (INR): FoPE enables coordinate-based MLPs to represent high-frequency functions and reconstruct signals (e.g., images) exactly up to the Nyquist limit. On image regression, a single-layer perceptron using full integer-lattice FoPE (at Nyquist) with FFT-initialized weights can reach a training PSNR of around 160 dB, i.e., near-exact pixelwise recovery (Benbarka et al., 2021).
- Point Cloud and Operator Learning: Random Fourier Feature (RFF)-based FoPE yields marked robustness improvements in 3D point cloud tasks under heavy noise/outliers, maintaining low error rates where learned PE-based PointNet or transformers break down (Zheng et al., 2023). In operator learning, e.g., for PDE surrogates, trunk networks with RFF-FoPE (as in FEDONet) achieve lower relative error than vanilla DeepONet, especially in high-frequency or geometry-sensitive regimes (Sojitra et al., 15 Sep 2025).
- LLMs and Periodic Attention: In transformer LMs, RoPE implements a single-frequency per-dimension “rotary” embedding that achieves implicit periodicity but suffers “spectrum damage” from nonlinearities and from insufficient training of long-period components. FoPE generalizes this by replacing the single-frequency rotor in each dimension with a truncated Fourier series, and explicitly zeroes out frequency components that are under-trained within the training context window. This restores periodic extension and enables length generalization well beyond the training context, yielding stable perplexity and passkey retrieval accuracy over extrapolated sequences (Hua et al., 23 Dec 2024).
- Empirical Hyperparameter Insights: Model performance depends sharply on the embedding dimensionality (the number of frequency bases) and on the variance of frequency sampling. Overly high bandwidth or too many frequencies increases overfitting and instability, while too low a bandwidth causes underfitting (loss of fine detail). Reported optimal values scale with the training context length and are task- and model-dependent (Hua et al., 23 Dec 2024, Sojitra et al., 15 Sep 2025).
5. Algorithmic Procedures
Implementation of FoPE variants is generally straightforward and enables direct plug-in or replacement of other positional encoding modules:
- FoPE/MLP Pipeline (Li et al., 2021):
- Given input $\mathbf{x}$, compute $\gamma(\mathbf{x}) = [\cos(2\pi\mathbf{B}\mathbf{x}),\, \sin(2\pi\mathbf{B}\mathbf{x})]$.
- Pass $\gamma(\mathbf{x})$ through a small MLP (e.g., two linear layers with GeLU), outputting a $D$-dimensional embedding.
- Add or concatenate to model content-embedding vectors.
- All operations can be efficiently vectorized and, for random $\mathbf{B}$, the frequency bank can be drawn once at initialization.
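A minimal PyTorch sketch of this pipeline is shown below; the class name, layer sizes, and initialization are illustrative assumptions rather than the reference implementation of (Li et al., 2021).

```python
import torch
import torch.nn as nn

class LearnableFourierPE(nn.Module):
    """Sketch of the Fourier-feature + MLP positional pipeline (names and sizes are illustrative)."""
    def __init__(self, pos_dim, n_freqs, hidden, model_dim, sigma=1.0):
        super().__init__()
        self.Wr = nn.Linear(pos_dim, n_freqs, bias=False)              # learnable frequency bank B
        nn.init.normal_(self.Wr.weight, std=sigma)
        self.mlp = nn.Sequential(nn.Linear(2 * n_freqs, hidden),       # small post-hoc MLP
                                 nn.GELU(),
                                 nn.Linear(hidden, model_dim))

    def forward(self, pos):                                            # pos: (..., pos_dim)
        proj = self.Wr(pos)
        feats = torch.cat([torch.cos(proj), torch.sin(proj)], dim=-1)  # gamma(pos)
        return self.mlp(feats)                                         # D-dimensional embedding

pe = LearnableFourierPE(pos_dim=2, n_freqs=64, hidden=128, model_dim=256)
tokens = torch.randn(8, 100, 256)          # content embeddings
positions = torch.rand(8, 100, 2)          # e.g. normalized 2-D positions
out = tokens + pe(positions)               # additive combination with the content stream
```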
- Integer-Lattice & Progressive Training (Benbarka et al., 2021):
- Build $\mathbf{B}$ as the full integer lattice of frequencies up to the Nyquist cutoff of the target signal.
- Define a training schedule that progressively raises the active frequency cutoff from low to high frequencies.
- At each epoch, mask out features whose frequency norm exceeds the current cutoff, then forward/backpropagate as usual.
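A small NumPy sketch of the lattice construction and norm-based progressive masking, assuming a 2-D domain and an illustrative linear cutoff schedule (the helper names are hypothetical):

```python
import numpy as np

def integer_lattice(max_freq):
    """All 2-D integer frequencies (kx, ky) with 0 <= kx, ky <= max_freq (non-negative quadrant)."""
    ks = np.arange(max_freq + 1)
    return np.stack(np.meshgrid(ks, ks), axis=-1).reshape(-1, 2)       # (m, 2)

def progressive_mask(B, epoch, schedule):
    """Keep only frequencies whose norm is below the current cutoff; zero the rest."""
    keep = np.linalg.norm(B, axis=-1) <= schedule(epoch)               # (m,)
    return np.concatenate([keep, keep]).astype(np.float32)             # mask for the [cos, sin] halves

B = integer_lattice(max_freq=16)
schedule = lambda e: 2 + 0.5 * e                                       # illustrative linear ramp
mask = progressive_mask(B, epoch=10, schedule=schedule)
# features = fourier_features(coords, B) * mask                        # applied before the MLP forward pass
```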
- Truncated Fourier Series (Transformers) (Hua et al., 23 Dec 2024):
- Per attention head, precompute a per-dimension set of frequencies (the first follows RoPE's schedule, the rest are random; zero out under-trained low frequencies).
- Initialize the per-frequency coefficient tensors.
- At each forward step, compute and sum the complex rotors across frequencies for each position and head, and apply them to Q/K by elementwise multiplication.
- Proceed with usual softmax attention and downstream layers.
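The following is an illustrative PyTorch sketch of the multi-frequency rotor construction, generalizing the single RoPE rotation per dimension pair to a truncated Fourier series; the function name, coefficient initialization, and random-frequency range are assumptions and not the reference implementation of (Hua et al., 23 Dec 2024).

```python
import math
import torch

def fope_rotors(seq_len, n_dims, n_freqs, base=10000.0, sigma=0.1):
    """Per dimension d: h_d(n) = sum_f a[d, f] * exp(i * w[d, f] * n)."""
    # First frequency per dimension follows the RoPE schedule; the rest are random.
    rope_w = base ** (-torch.arange(n_dims, dtype=torch.float32) / n_dims)   # (n_dims,)
    rand_w = torch.rand(n_dims, n_freqs - 1) * math.pi                        # (n_dims, n_freqs-1)
    w = torch.cat([rope_w[:, None], rand_w], dim=-1)                          # (n_dims, n_freqs)
    # Under-trained low frequencies would additionally be zeroed here, as described above.
    a = torch.randn(n_dims, n_freqs) * sigma                                  # per-frequency coefficients
    a[:, 0] = 1.0                                                             # illustrative: RoPE term dominant
    n = torch.arange(seq_len, dtype=torch.float32)
    phase = n[:, None, None] * w[None, :, :]                                  # (seq, n_dims, n_freqs)
    return (a * torch.polar(torch.ones_like(phase), phase)).sum(-1)           # (seq, n_dims), complex rotors

# Apply to queries/keys viewed as complex pairs, as in RoPE:
seq, heads, dim = 128, 4, 64
q = torch.randn(seq, heads, dim)
q_c = torch.view_as_complex(q.reshape(seq, heads, dim // 2, 2).contiguous())
rot = fope_rotors(seq, n_dims=dim // 2, n_freqs=4)
q_fope = torch.view_as_real(q_c * rot[:, None, :]).reshape(seq, heads, dim)
```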
6. Limitations, Trade-offs, and Model-Specific Considerations
- Spectral Coverage and Approximation Limits: FoPE can only represent components within the spanned frequency band. Any target with power above the frequency cutoff is irreducibly lost; in practical MLP+FoPE, persistent residual noise results from unrepresented frequencies (Ma et al., 8 Feb 2025).
- Robustness and OOD Generalization: RFF-based FoPE is more robust to OOD noise and outliers than learned positional embeddings, but often lags by 2–5 accuracy points on clean, in-distribution tasks (Zheng et al., 2023).
- Anisotropy and Data Priors: In domains with strong anisotropy or class-specific geometric structure, standard isotropic FoPE may be suboptimal. Learnable anisotropic scaling, or explicit multi-axis frequency banks, can yield further significant performance gains but add learnable parameters and, if unconstrained, risk model overfitting (Jabareen et al., 2 Sep 2025).
- Computation and Memory: While random or lattice-based FoPE with moderate embedding sizes is efficient and vectorizable, increasing frequency count or employing learnable dense matrices can increase memory and forward-pass computation, although this remains minor compared to quadratic attention or large-scale transformer layers (Li et al., 2021, Sojitra et al., 15 Sep 2025).
- Interaction with Nonlinearities and Deep Networks: Spectrum leakage from linear layers and nonlinear activations motivates the use of multi-frequency (Fourier series) expansions in deep models, especially LLMs, but an excessive number of frequencies introduces noise and instability (Hua et al., 23 Dec 2024).
7. Extensions and Future Directions
FoPE’s spectrum-centric perspective connects positional encoding to classic signal processing and kernel approximation. This opens avenues for:
- General Shifted-Basis Embeddings: Moving beyond sinusoids, the stable-rank and distance-preservation analyses apply to any bandlimited shifted basis function (including Gaussians, splines, and square waves), suggesting a broader design landscape for positional embedders (Zheng et al., 2021).
- Adaptive and Hybrid Filtering: Multiplicative or adaptive learned filters applied to raw Fourier features mitigate spectral bias and noise from under-sampled frequencies; bias-free filter MLPs (without additive terms) maintain scale invariance and can introduce intermodulation (sum/difference) components (Ma et al., 8 Feb 2025).
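As an illustration of the multiplicative-filtering idea, the sketch below applies a bias-free MLP filter elementwise to raw Fourier features; the class name and layer sizes are assumptions, not the method of (Ma et al., 8 Feb 2025).

```python
import math
import torch
import torch.nn as nn

class FilteredFourierFeatures(nn.Module):
    """Sketch: raw Fourier features modulated elementwise by a learned, bias-free filter MLP."""
    def __init__(self, in_dim, n_freqs, sigma=1.0):
        super().__init__()
        self.register_buffer("B", torch.randn(n_freqs, in_dim) * sigma)
        self.filter = nn.Sequential(nn.Linear(2 * n_freqs, 2 * n_freqs, bias=False),
                                    nn.ReLU(),
                                    nn.Linear(2 * n_freqs, 2 * n_freqs, bias=False))

    def forward(self, x):
        proj = 2 * math.pi * x @ self.B.t()
        feats = torch.cat([torch.cos(proj), torch.sin(proj)], dim=-1)
        # Products of sinusoids create sum/difference (intermodulation) components, as noted above.
        return feats * self.filter(feats)
```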
- Progressive and Pruned Embeddings: Progressive gating of frequency channels and post hoc pruning of high/low amplitude modes support resource-constrained or adaptively complex models, especially in operator learning and NeRF-style rendering (Benbarka et al., 2021).
- Long-Context Generalization in LLMs: The explicit frequency-domain perspective of FoPE in LLMs (zeroing under-trained low-frequency components, multi-harmonic modeling) establishes a robust methodology for extrapolating attention beyond the training context length, and is compatible with further extrapolation methods such as YaRN (Hua et al., 23 Dec 2024).
- Domain-Specific Priors and Structure: Initialization, structure, and regularization of frequency banks and scaling parameters based on domain knowledge (e.g., known anisotropy, shape prior, or target class geometry) can improve performance, especially in highly structured environments (e.g., medical imaging) (Jabareen et al., 2 Sep 2025).
The development and deployment of Fourier Position Embedding have reshaped the theory and practice of positional encoding in neural networks, enabling expressivity and generalization well beyond fixed or learned index-based alternatives. Its modular, computationally efficient structure positions FoPE as a continuing foundation across coordinate-based, geometric, and periodic-input learning tasks.