FourierKAN: Fourier-Based Neural Architectures
- FourierKAN architectures are neural models that embed trainable Fourier series within the Kolmogorov–Arnold framework to efficiently approximate both low- and high-frequency functions.
- They reduce parameter cost by replacing traditional B-spline activations with sinusoidal expansions and Random Fourier Features, leading to faster convergence and improved scalability.
- Applications span vision, language, audio, and collaborative filtering, with empirical results showing improved recall and accuracy, complemented by kernel approximation guarantees.
FourierKAN architectures are a principled family of neural models that embed Kolmogorov–Arnold-style superposition via parameter-efficient trainable Fourier (sinusoidal) expansions. They address the limitations of classical Kolmogorov–Arnold Networks (KANs) in parameter cost and capacity for high-frequency representation, and offer provable expressiveness paired with practical scalability across domains such as vision, language, audio, and collaborative filtering.
1. Mathematical Foundations
The core theoretical basis is the Kolmogorov–Arnold representation theorem, which asserts that any continuous function $f: [0,1]^n \to \mathbb{R}$ can be written as a finite superposition of univariate continuous functions,
$$f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right),$$
with $\phi_{q,p}$ and $\Phi_q$ denoting univariate inner and outer functions. Standard KANs implement this structure by parameterizing $\phi_{q,p}$ as B-splines, resulting in high flexibility but substantial per-layer parameter cost, especially for large grids and layer widths.
FourierKAN replaces the spline parameterization with trainable Fourier series or Random Fourier Feature (RFF) expansions. For an input coordinate $x$ and grid size $G$ (or feature count $D$), the Fourier-based mapping for each coordinate is
$$\phi(x) = \sum_{k=1}^{G} \big(a_k \cos(kx) + b_k \sin(kx)\big),$$
where $a_k, b_k$ are trainable coefficients. This allows the architecture to represent high-frequency variations with fewer parameters and simpler training than splines, while still permitting the system to approximate arbitrary continuous functions (Zhang et al., 9 Feb 2025; Xu et al., 3 Jun 2024).
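A minimal PyTorch sketch of such a layer follows; the class name, coefficient shapes, and initialization scale are illustrative assumptions rather than a reference implementation.

```python
import torch
import torch.nn as nn

class FourierKANLayer(nn.Module):
    """Sketch of a FourierKAN-style layer: each input coordinate x_j is expanded
    into cos(k*x_j), sin(k*x_j) for k = 1..G, and trainable coefficients combine
    the expansions into d_out outputs (illustrative, not the reference code)."""

    def __init__(self, d_in: int, d_out: int, grid_size: int = 8):
        super().__init__()
        self.grid_size = grid_size
        # Trainable Fourier coefficients a_k (index 0) and b_k (index 1)
        # for every (output, input, frequency) triple.
        scale = 1.0 / (d_in * grid_size) ** 0.5
        self.coeffs = nn.Parameter(scale * torch.randn(2, d_out, d_in, grid_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_in) -> angles k*x_j of shape (batch, d_in, G)
        k = torch.arange(1, self.grid_size + 1, device=x.device, dtype=x.dtype)
        angles = x.unsqueeze(-1) * k
        # Sum_k a_k cos(k x) + b_k sin(k x), then sum over input coordinates.
        y = torch.einsum('big,oig->bo', torch.cos(angles), self.coeffs[0])
        y = y + torch.einsum('big,oig->bo', torch.sin(angles), self.coeffs[1])
        return y

layer = FourierKANLayer(d_in=16, d_out=4, grid_size=8)
print(layer(torch.randn(32, 16)).shape)  # torch.Size([32, 4])
```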
Random Fourier Features further extend this capability: the feature map
$$z(\mathbf{x}) = \sqrt{\tfrac{2}{D}}\,\big[\cos(\omega_1^\top \mathbf{x} + b_1), \dots, \cos(\omega_D^\top \mathbf{x} + b_D)\big],$$
with $\omega_i \sim p(\omega)$ and $b_i \sim \mathrm{Uniform}(0, 2\pi)$, provides kernel approximation guarantees.
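For the RFF variant, a minimal sketch of the feature map (standard Rahimi–Recht construction; parameter names and defaults are illustrative):

```python
import torch

def random_fourier_features(x, num_features=256, sigma=1.0, seed=0):
    """Random Fourier Feature map z(x) for the Gaussian (RBF) kernel with
    bandwidth sigma, in the standard Rahimi-Recht form."""
    g = torch.Generator().manual_seed(seed)
    d = x.shape[-1]
    omega = torch.randn(d, num_features, generator=g) / sigma   # omega_i ~ N(0, I / sigma^2)
    b = 2 * torch.pi * torch.rand(num_features, generator=g)    # b_i ~ Uniform(0, 2*pi)
    return (2.0 / num_features) ** 0.5 * torch.cos(x @ omega + b)

z = random_fourier_features(torch.randn(5, 3))
print(z.shape)  # torch.Size([5, 256])
```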
2. FourierKAN Module Construction and Parameter Efficiency
FourierKAN modules embed these Fourier mappings into neural architectures by:
- Replacing the univariate activations of KANs (traditionally B-splines on a grid of size $G$ with degree $k$) with the truncated sinusoidal expansion above, or with RFFs hybridized with parametric activations (such as GELU).
- Utilizing a single merged weight matrix, obtained by matrix associativity, obviating the need for the dual-matrix structure of standard KANs. For a layer with $d_{\mathrm{in}}$ inputs and $d_{\mathrm{out}}$ outputs, this reduces parameter complexity from $O\big(d_{\mathrm{in}}\, d_{\mathrm{out}}\, (G + k)\big)$ for splines to $O\big(d_{\mathrm{in}}\, (d_{\mathrm{out}} + G)\big)$ for FourierKAN, where $G$ is the spline grid count and $k$ the spline degree (Zhang et al., 9 Feb 2025).
- In hybrid GELU–Fourier activations, the layer outputs a learned combination of the two branches,
$$y = w_{1} \odot \mathrm{GELU}(x) + w_{2} \odot \mathrm{RFF}(x),$$
where $w_1, w_2$ are learned channel-wise scaling factors and $\mathrm{RFF}(\cdot)$ is a small random projection followed by sinusoidal activations (a sketch follows below).
This dramatically reduces parameter overhead while ensuring a smooth and adaptable function space that covers both low- and high-frequency modes. Learned scaling parameters dynamically shift representation focus as training progresses.
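A hedged sketch of the hybrid GELU–Fourier block described above, assuming a fixed random projection for the oscillatory branch and learned per-channel scales (module name, shapes, and initialization are illustrative, not the KAF reference code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridGELUFourier(nn.Module):
    """Sketch of a hybrid GELU + Random-Fourier-Feature block with learned
    channel-wise scaling factors (illustrative assumptions throughout)."""

    def __init__(self, d_in: int, d_out: int, num_features: int = 64, sigma: float = 1.0):
        super().__init__()
        # Fixed random projection and phases for the RFF branch.
        self.register_buffer('omega', torch.randn(d_in, num_features) / sigma)
        self.register_buffer('phase', 2 * torch.pi * torch.rand(num_features))
        self.proj_base = nn.Linear(d_in, d_out)
        self.proj_rff = nn.Linear(num_features, d_out, bias=False)
        # Learned channel-wise scales; start with emphasis on the smooth branch.
        self.w_base = nn.Parameter(torch.ones(d_out))
        self.w_fourier = nn.Parameter(0.1 * torch.ones(d_out))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        smooth = F.gelu(self.proj_base(x))                                   # low-frequency branch
        rff = (2.0 / self.omega.shape[1]) ** 0.5 * torch.cos(x @ self.omega + self.phase)
        oscillatory = self.proj_rff(rff)                                     # high-frequency branch
        return self.w_base * smooth + self.w_fourier * oscillatory

block = HybridGELUFourier(d_in=32, d_out=32)
print(block(torch.randn(8, 32)).shape)  # torch.Size([8, 32])
```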
3. Integration within Deep Learning Architectures
FourierKAN modules have been applied in:
Graph Collaborative Filtering (FourierKAN-GCF)
Within Graph Collaborative Filtering, FourierKAN replaces the MLP in interaction-feature transforms during graph convolution. In NGCF-style message passing, the MLP term
$$\mathbf{m}_{u \leftarrow i} = \frac{1}{\sqrt{|\mathcal{N}_u||\mathcal{N}_i|}}\Big(W_1 \mathbf{e}_i + W_2\big(\mathbf{e}_i \odot \mathbf{e}_u\big)\Big)$$
becomes a Fourier mapping,
$$\mathbf{m}_{u \leftarrow i} = \frac{1}{\sqrt{|\mathcal{N}_u||\mathcal{N}_i|}}\Big(\mathrm{FourierKAN}_1(\mathbf{e}_i) + \mathrm{FourierKAN}_2\big(\mathbf{e}_i \odot \mathbf{e}_u\big)\Big).$$
During propagation, embeddings are updated via
$$\mathbf{e}_u^{(l+1)} = \mathrm{LeakyReLU}\Big(\mathbf{m}_{u \leftarrow u} + \sum_{i \in \mathcal{N}_u} \mathbf{m}_{u \leftarrow i}\Big).$$
Analogous updates apply for item nodes. Message and node dropout are integrated for regularization (Xu et al., 3 Jun 2024).
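The following is an illustrative sketch of one such propagation step for user nodes, assuming a `fourier_kan` module mapping embeddings of dimension $d$ to $d$ (e.g., the layer sketched in Section 1); the edge-list layout, degree tensors, and dropout handling are assumptions, not the reference code:

```python
import torch
import torch.nn.functional as F

def fourierkan_gcf_step(e_user, e_item, edges, deg_u, deg_i, fourier_kan, msg_dropout=0.1):
    """One NGCF-style propagation step for user embeddings with a FourierKAN
    interaction transform. edges: LongTensor (E, 2) of (user, item) index pairs;
    deg_u, deg_i: float tensors of node degrees."""
    u, i = edges[:, 0], edges[:, 1]
    norm = 1.0 / torch.sqrt(deg_u[u] * deg_i[i])                  # 1 / sqrt(|N_u| |N_i|)
    interact = e_item[i] * e_user[u]                              # element-wise e_i ⊙ e_u
    msg = norm.unsqueeze(-1) * (fourier_kan(e_item[i]) + fourier_kan(interact))
    msg = F.dropout(msg, p=msg_dropout, training=True)            # message dropout
    agg = torch.zeros_like(e_user)
    agg.index_add_(0, u, msg)                                     # aggregate over N_u
    return F.leaky_relu(fourier_kan(e_user) + agg)                # self-message + neighbor messages
```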
Text Classification Heads (FR-KAN)
In classification head fine-tuning, the FR-KAN head consumes transformer embeddings $\mathbf{h} \in \mathbb{R}^{d}$ and applies vectorized Fourier expansions per feature coordinate, followed by learnable outer functions. The class logits are
$$o_c = \sum_{j=1}^{d} \Phi_{c,j}\big(\phi_j(h_j)\big), \qquad \phi_j(h_j) = \sum_{k=1}^{G} \big(a_{jk}\cos(k h_j) + b_{jk}\sin(k h_j)\big),$$
where $\phi_j$ is the per-feature Fourier series expansion and $\Phi_{c,j}$ are learnable outer activations (Imran et al., 16 Aug 2024).
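A minimal sketch of such a head, with the outer functions simplified to a learnable linear map (class name, shapes, and initialization are illustrative, not the FR-KAN reference code):

```python
import torch
import torch.nn as nn

class FRKANHead(nn.Module):
    """Sketch of an FR-KAN-style classification head: per-feature Fourier
    expansion of a transformer embedding, then a learnable outer map to logits."""

    def __init__(self, d_model: int, num_classes: int, grid_size: int = 8):
        super().__init__()
        self.grid_size = grid_size
        # Inner Fourier coefficients a_{jk}, b_{jk} per feature coordinate j.
        self.a = nn.Parameter(0.01 * torch.randn(d_model, grid_size))
        self.b = nn.Parameter(0.01 * torch.randn(d_model, grid_size))
        # Learnable outer map from expanded features to class logits.
        self.outer = nn.Linear(d_model, num_classes)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, d_model); angles: (batch, d_model, grid_size)
        k = torch.arange(1, self.grid_size + 1, device=h.device, dtype=h.dtype)
        angles = h.unsqueeze(-1) * k
        phi = (self.a * torch.cos(angles) + self.b * torch.sin(angles)).sum(-1)
        return self.outer(phi)  # class logits o_c

head = FRKANHead(d_model=768, num_classes=4)
print(head(torch.randn(2, 768)).shape)  # torch.Size([2, 4])
```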
Large-Scale and Modular Architectures
FourierKAN blocks have replaced feedforward or "mixer" components in architectures such as ResNet, ViT, and GPT-2, maintaining or improving accuracy and reducing training cost (Zhang et al., 9 Feb 2025).
4. Expressiveness, Kernel Approximation, and Spectral Adaptivity
FourierKAN networks inherit the universal approximation property of the Kolmogorov superposition, extended by RFF-based kernel methods. The RFF embedding provides, via Bochner's theorem, an unbiased estimate of translation-invariant kernels,
$$k(\mathbf{x} - \mathbf{y}) = \mathbb{E}_{\omega \sim p(\omega)}\big[\cos\big(\omega^\top (\mathbf{x} - \mathbf{y})\big)\big] \approx z(\mathbf{x})^\top z(\mathbf{y}),$$
with statistical concentration as the number of features $D$ grows.
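A self-contained numerical check of this kernel estimate (Gaussian kernel, standard RFF construction; dimensions and bandwidth are chosen only for illustration):

```python
import torch

torch.manual_seed(0)
d, D, sigma = 8, 4096, 1.0
omega = torch.randn(d, D) / sigma                  # omega_i ~ N(0, I / sigma^2)
phase = 2 * torch.pi * torch.rand(D)               # b_i ~ Uniform(0, 2*pi)

def z(v):
    """RFF feature map; z(x)^T z(y) estimates the Gaussian kernel k(x - y)."""
    return (2.0 / D) ** 0.5 * torch.cos(v @ omega + phase)

x = torch.randn(d)
y = x + 0.3 * torch.randn(d)                       # a nearby point
approx = z(x) @ z(y)                               # Monte Carlo kernel estimate
exact = torch.exp(-((x - y) ** 2).sum() / (2 * sigma ** 2))
print(float(approx), float(exact))                 # agree up to O(1/sqrt(D)) error
```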
The hybrid GELU–Fourier activation ensures that both smooth (low-frequency) and highly oscillatory (high-frequency) functions can be represented, with training dynamics enabling a shift in spectral emphasis through the learned scaling parameters. Proper RFF scaling, such as choosing the frequency initialization scale to match the spectrum of the GELU branch, is required to eliminate frequency distortion (Zhang et al., 9 Feb 2025).
5. Empirical Performance and Efficiency
Across modalities, FourierKAN modules match or outperform baseline MLP and spline-based KAN models at matched parameter budgets.
Graph Recommendation Benchmarks
On the MOOC and Games datasets for collaborative filtering, FourierKAN-GCF attains the highest Recall@K and NDCG@K among all baselines; on MOOC, Recall@10 is 0.2595 (FourierKAN-GCF) versus 0.2453 for the best baseline, with all improvements statistically significant (Xu et al., 3 Jun 2024).
NLP Classification
FR-KAN heads consistently outperform MLPs by approximately +10 percentage points in accuracy and +11 percentage points in F1 score, with similar or lower parameter count and 3–5× faster convergence during fine-tuning. For example, averaged across seven transformer backbones and four classification types:
- MLP: 58.2% accuracy, 55.9% F1
- FR-KAN: 67.2% accuracy, 66.9% F1 (Imran et al., 16 Aug 2024)
Vision, Audio, and Scientific Benchmarks
FourierKAN modules match or exceed the accuracy of MLP/GELU blocks on vision datasets (e.g., CIFAR-10 accuracy 91.72% vs. 91.19%), NLP (e.g., WikiText GPT-2 perplexity 180.9 vs. 184.5 for MLP), audio, tabular, and PDE-solving tasks. Replacement of all mixer blocks in ViT or MLP-Mixer models yields consistent improvements (Zhang et al., 9 Feb 2025).
Ablations confirm that dropping the RFF components or the learned scaling, or using purely random frequency initializations, degrades performance.
Computational Complexity
For FourierKAN-GCF, the Fourier mapping dominates the per-edge, per-layer cost, which scales linearly with the grid size and embedding dimension; the overall epoch cost is therefore linear in the number of edges and propagation layers (Xu et al., 3 Jun 2024). FR-KAN heads match MLPs in per-step compute and memory, but converge in 3–5× fewer epochs (Imran et al., 16 Aug 2024). KAF modules provide a strict parameter-cost reduction over B-spline KANs.
6. Limitations, Regularization, and Future Directions
FourierKAN architectures require careful hyperparameter tuning for grid/frequency size, RFF initialization, and scaling to avoid spectrum mismatch or instability. The use of trigonometric functions increases computational cost per operation relative to piecewise polynomials, and interpretability can be reduced compared to B-spline-based maps.
Regularization strategies provide robustness: message and node dropout in GCF, and an early emphasis on the low-frequency (GELU) branch via the learned scaling in hybrid activations.
Current research directions include adaptive frequency modulation, spectrum-aware and data-driven RFF initialization, and the integration of FourierKAN blocks into transformer and convolutional backbones for universal, parameter-efficient nonlinear representation (Zhang et al., 9 Feb 2025, Xu et al., 3 Jun 2024).
7. Applications and Generalization
FourierKAN and its variants have proven effective in:
| Domain | Task Example | Advantage |
|---|---|---|
| Collaborative Filtering | User–item recommendation | Higher accuracy/recall at reduced cost |
| Text Classification | Transformer head fine-tuning | +10 pt accuracy, 3–5× faster convergence |
| Vision & Audio | CIFAR-10, SpeechCommand | Best accuracy per parameter budget |
| Scientific ML | PDE/Function approximation | Lowest RMSE, robust to multi-frequency data |
The architecture’s plug-and-play capacity, parameter efficiency, and theoretically grounded kernel approximation suggest extensibility to high-dimensional time-series, physical simulation, and any scenario demanding effective nonlinear aggregation of features. A plausible implication is broader adoption as a universal MLP alternative in both frozen and end-to-end finetuned models (Imran et al., 16 Aug 2024, Zhang et al., 9 Feb 2025).