
FourierKAN Architectures Overview

Updated 2 March 2026
  • FourierKAN architectures are neural models that integrate Kolmogorov–Arnold networks with Fourier-based methods to capture high-frequency and multi-resolution features.
  • They employ techniques such as Fourier series expansions, random Fourier features, and adaptive spectral conditioning to boost expressivity and sample efficiency.
  • Applications span time series forecasting, audio processing, image reconstruction, and graph filtering, demonstrating improved performance over traditional approaches.

FourierKAN architectures constitute a family of neural models that merge the universal approximation properties of Kolmogorov–Arnold Networks (KAN) with explicit or learned frequency-domain parameterizations, leveraging Fourier analysis, random Fourier features, and spectral conditioning. These architectures are designed for a diverse set of data modalities—including time series, audio signals, images, graphs, tabular data, and spatio-temporal fields—and consistently demonstrate improved sample efficiency, expressivity, and spectral fidelity compared to traditional MLPs and spline-based KANs. Variants have been proposed for forecasting, representation learning, operator learning, and graph collaborative filtering, each introducing distinct mechanisms for frequency selection, high-frequency representation, parameter efficiency, and model conditioning.

1. Foundational Principles

FourierKAN architectures extend the Kolmogorov–Arnold representation theorem, which states that any continuous function $f : [0,1]^d \to \mathbb{R}$ can be decomposed as

$$f(x_1, \ldots, x_d) = \sum_{q=0}^{2d} P_q\!\left(\sum_{p=1}^{d} \phi_{q,p}(x_p)\right)$$

with continuous univariate functions $\phi_{q,p}$ and $P_q$. KANs realize this by parameterizing univariate activations with splines, B-splines, or rational functions and arranging them in two-layer structures of substantial width ($2d+1$). FourierKANs replace or augment these univariate activations with trigonometric bases, random Fourier features, or explicit Fourier series, thereby enhancing the model's ability to capture oscillatory, high-frequency, or multi-resolution structure in the data.

Several variants introduce the following core components:

  • Fourier-expansion of univariate activations: Replacing splines with truncated Fourier series, e.g.,

$$\phi_{ij}^{\text{FR}}(x) = \sum_{k=0}^{G} \left[a_{ij,k}\cos(kx) + b_{ij,k}\sin(kx)\right]$$

as used in FR-KAN classification heads (Imran et al., 2024).

  • Learned or content-adaptive Fourier kernels: Allowing the first layer to learn basis functions that are sinusoidal, windowed, onset- or band-specific, or adaptively chosen for each input (Verma, 2023).
  • Random Fourier Feature (RFF) embedding: Insertion of trainable Fourier random projections,

$$\phi(x; W_r, b) = \sqrt{2/m}\,\cos(W_r x + b)$$

with $W_r$ and $b$ learned end-to-end to enhance spectral bias and parameter efficiency (Zhang et al., 9 Feb 2025).

  • Spectral conditioning and attention: Conditioning spectral layers on global context via KAN-encoded global tokens (e.g., SpectraKAN (Cheng et al., 5 Feb 2026)), enabling non-stationary, input-modulated frequency mixing.
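The Fourier-expansion and RFF components above can be sketched in a few lines of numpy. This is a minimal illustration with invented shapes and names, not code from any of the cited papers:

```python
import numpy as np

def fourier_activation(x, a, b):
    """Truncated Fourier-series activation: sum_k a_k cos(kx) + b_k sin(kx).

    x: scalar inputs, shape (n,); a, b: learnable coefficients, shape (G+1,).
    """
    k = np.arange(a.shape[0])                # frequencies 0..G
    kx = np.outer(x, k)                      # (n, G+1)
    return np.cos(kx) @ a + np.sin(kx) @ b   # (n,)

def rff_embedding(x, W_r, b):
    """Random Fourier feature map sqrt(2/m) * cos(W_r x + b);
    W_r and b would be trained end-to-end rather than fixed."""
    m = W_r.shape[1]
    return np.sqrt(2.0 / m) * np.cos(x @ W_r + b)

rng = np.random.default_rng(0)
phi = fourier_activation(np.linspace(-np.pi, np.pi, 8),
                         rng.normal(size=6), rng.normal(size=6))   # G = 5
z = rff_embedding(rng.normal(size=(4, 3)),
                  rng.normal(size=(3, 16)), rng.normal(size=16))   # (4, 16)
```

In a full layer these univariate maps are applied per input–output channel pair and summed, mirroring the KAN decomposition above.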

2. Architectural Variants and Layer Composition

FourierKANs appear in several design patterns:

a) KFS: KAN-based Adaptive Frequency Selection for Time Series

KFS integrates a FreK module for energy-based frequency denoising and a multi-scale KAN backbone for sequence modeling (Wu et al., 1 Aug 2025):

  • FreK module: Performs a DFT, ranks frequency energies $E_k$, and reconstructs the signal from the top-K components satisfying $\sum_{i=1}^{K} E_{(i)} / \sum_k E_k > \delta$ (e.g., $\delta = 0.9$). Non-dominant frequencies are zeroed before the inverse FFT.
  • Per-scale KAN blocks: After FreK, each timescale is embedded, augmented by adaptive trainable vectors, and passed through two-layer group-rational KANs.
  • Timestamp alignment: Provides temporal context by linear projections of downsampled timestamps.
  • Feature mixing: Combines data-driven and temporal features with residual KAN blocks.
  • Fusion: Final prediction aggregates all scales via averaging and a linear readout.
  • Hybrid loss: Combines MSE and frequency alignment loss for both time and top-K frequency components.
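The energy-based frequency selection in the FreK module can be sketched with numpy's FFT routines. This is a hedged reconstruction of the described mechanism, not the authors' implementation:

```python
import numpy as np

def frek_denoise(x, delta=0.9):
    """Keep the smallest top-K set of frequencies whose cumulative energy
    fraction exceeds delta; zero the rest before the inverse FFT."""
    X = np.fft.rfft(x)
    energy = np.abs(X) ** 2
    order = np.argsort(energy)[::-1]           # frequencies by descending energy
    frac = np.cumsum(energy[order]) / energy.sum()
    K = np.searchsorted(frac, delta) + 1       # smallest K reaching the threshold
    mask = np.zeros_like(X)
    mask[order[:K]] = 1.0
    return np.fft.irfft(X * mask, n=len(x))

t = np.linspace(0, 1, 256, endpoint=False)
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.1 * np.random.default_rng(0).normal(size=t.size)
denoised = frek_denoise(noisy, delta=0.9)      # retains the dominant 5 Hz bin
```

The denoised series then feeds the per-scale KAN blocks at each timescale.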

b) FR-KAN: Fourier-KAN for Lightweight Classification Heads

FR-KAN layers employ truncated Fourier expansions for all “inner” univariate activations and are deployed as shallow, head-only models (Imran et al., 2024).

  • Each hyperspace mixing channel computes sums of sines and cosines of the transformer embedding coordinates, with a single output per class.
  • Parameter and compute counts are comparable to MLP heads but yield smoother functions and faster convergence.

c) KAF: Kolmogorov–Arnold–Fourier Networks via Matrix Merging and RFF

KAF addresses KAN parameter explosion and spectral bias by:

  • Matrix merging: Collapsing the KAN’s dual-matrix structure ($W_2\,\phi(W_1 x)$) into a single weight matrix $W$ when possible, reducing parameter scaling from $O(d_{\text{in}} d_{\text{out}} (G+K+3))$ to $O(d_{\text{in}} d_{\text{out}})$ (Zhang et al., 9 Feb 2025).
  • Learnable RFF mapping: Each layer applies $\sqrt{2/m}\,\cos(W_r x + b)$, with $W_r$ and $b$ trained jointly with the main weights, preserving high-frequency fidelity.
  • Hybrid activation: Each channel output is a convex blend of GELU and RFF, with trainable mixing coefficients $(\alpha, \beta)$ that adapt to the data’s frequency content over training.
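A KAF-style layer combining these three ideas might look as follows. Shapes and the exact placement of the RFF map are assumptions for illustration, not taken from the paper:

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def kaf_layer(x, W, W_r, b_r, alpha, beta):
    """Single merged weight matrix W (replacing the dual-matrix W2 @ phi(W1 x))
    followed by a trainable blend of GELU and a learnable RFF map."""
    h = x @ W                                       # merged linear transform
    m = W_r.shape[1]
    rff = np.sqrt(2.0 / m) * np.cos(h @ W_r + b_r)  # learnable RFF of the pre-activation
    return alpha * gelu(h) + beta * rff             # hybrid activation
```

With $\beta = 0$ the layer degenerates to a plain GELU MLP layer; training $(\alpha, \beta)$ lets each channel shift toward the RFF branch when high-frequency content dominates.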

d) SpectraKAN: Neural Operators with KAN-Conditioned Spectral Attention

SpectraKAN introduces input-adaptive spectral layers:

  • Multi-scale FNO trunk: Axis-separable Fourier neural operator modules process the input at multiple resolutions.
  • KAN encoder for modulation: Extracts a global context token from the input history.
  • Global modulation via cross-attention: Applies a single-query cross-attention on the spectral representations, with keys/values derived from spectral features and query from the global KAN-encoded token, yielding input-conditioned, resolution-independent integral operators (Cheng et al., 5 Feb 2026).
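The single-query cross-attention step can be sketched as below. Dimensions and weight names are illustrative assumptions; the actual SpectraKAN layer operates on complex spectral features inside an FNO trunk:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def global_modulation(spectral_feats, context_token, Wq, Wk, Wv):
    """Single-query cross-attention: the query comes from the global
    KAN-encoded context token; keys/values come from spectral features."""
    q = context_token @ Wq                         # (d_k,) single query
    K = spectral_feats @ Wk                        # (n_modes, d_k)
    V = spectral_feats @ Wv                        # (n_modes, d_v)
    attn = softmax(K @ q / np.sqrt(q.shape[-1]))   # (n_modes,) weights over modes
    return attn @ V                                # (d_v,) modulation vector
```

Because the query is a single global token, the resulting modulation is independent of the spatial resolution of the input grid.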

e) FourierKAN-GCF for Graph Collaborative Filtering

A Fourier KAN replaces the feature-transformation MLPs in GCN-based collaborative filtering, applying a dense Fourier feature map to elementwise embeddings, $\Phi(x) = [\cos(k x_i), \sin(k x_i) : 1 \leq i \leq d,\ 1 \leq k \leq g] \in \mathbb{R}^{2dg}$, with coefficients $a_{ik}, b_{ik}$ for scalar output, dramatically reducing per-layer parameter counts (Xu et al., 2024).
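The feature map and scalar readout above amount to the following (a direct transcription of the formula; function names are illustrative):

```python
import numpy as np

def fourier_feature_map(x, g):
    """Phi(x) = [cos(k x_i), sin(k x_i)] for i = 1..d, k = 1..g, in R^{2dg}."""
    k = np.arange(1, g + 1)
    kx = np.outer(x, k).ravel()                       # (d*g,)
    return np.concatenate([np.cos(kx), np.sin(kx)])   # (2*d*g,)

def fourierkan_score(x, coeffs, g):
    """Scalar output: learned coefficients a_ik, b_ik dotted with Phi(x),
    replacing an O(d^2) MLP transform with O(dg) parameters."""
    return fourier_feature_map(x, g) @ coeffs
```

Since typically $g \ll d$, the per-layer parameter count $2dg$ is far below the $d^2$ of a dense MLP transform.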

3. Training Objectives, Losses, and Spectral Alignment

FourierKAN models optimize various standard and spectral-aware objectives:

  • Regression and forecasting: Mean squared error (MSE) is standard. KFS adds a frequency alignment loss $\mathcal{L}_F$ on dominant Fourier coefficients.
  • Classification: Cross-entropy loss is used in FR-KAN heads and KAF classifiers.
  • Operator learning/PDEs: RMSE between predicted and ground truth fields.
  • Spectral alignment: Many variants utilize explicit frequency alignment losses (Fourier-domain L1/L2) or regularization to promote energy preservation in specified spectral bands.
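A hybrid time/frequency objective of the kind described above can be sketched as follows; the top-k selection rule and weighting are assumptions for illustration:

```python
import numpy as np

def spectral_alignment_loss(pred, target, top_k, lam=0.5):
    """MSE plus an L1 penalty on the top-k dominant target Fourier
    coefficients, promoting energy preservation in those bands."""
    mse = np.mean((pred - target) ** 2)
    P, T = np.fft.rfft(pred), np.fft.rfft(target)
    idx = np.argsort(np.abs(T))[::-1][:top_k]   # dominant target frequencies
    freq = np.mean(np.abs(P[idx] - T[idx]))     # Fourier-domain L1 on those bins
    return mse + lam * freq
```

The frequency term penalizes errors concentrated in the bands that carry most of the target's energy, which a plain MSE can underweight.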

Ablation studies show that omitting frequency-based modules or reverting to standard MLPs typically degrades both predictive accuracy and ability to capture high-frequency components.

4. Empirical Performance and Parameter Efficiency

FourierKAN architectures consistently outperform MLP, vanilla KAN, and Transformer/CNN baselines across diverse tasks:

  • KFS: State-of-the-art in long-term forecasting on ETT, Weather, and Electricity datasets, with 10–20% lower MSE/MAE than the best Transformer baselines, and resource footprints (≈116 MB, 21 ms/step, 1.66 GFLOPs) comparable or superior to alternatives (Wu et al., 1 Aug 2025).
  • FR-KAN text heads: 10% higher accuracy and 11% higher F1 than MLP heads on seven transformers (DistilBERT, RoBERTa, XLNet, etc.) and four NLP benchmarks. Converges in fewer epochs and with equal or fewer parameters (Imran et al., 2024).
  • KAF: Outperforms KAN, FAN, and GPKAN on MNIST, CIFAR, CoLA, AG News, SpeechCommands, and PDE regression, with stable convergence, smaller error on high-frequency or discontinuous functions, and strictly reduced parameter counts (e.g., from $O(10^5)$ in KAN to $O(10^3)$ in KAF for similar performance) (Zhang et al., 9 Feb 2025).
  • SpectraKAN: Lowers RMSE by up to 49% versus Fourier neural operators and prior PDE models (Cheng et al., 5 Feb 2026).
  • FourierKAN-GCF: Improves Recall@K and NDCG@K, reduces parameter count per layer from $O(d^2)$ (MLP) to $O(dg)$, and demonstrates enhanced robustness to dropout (Xu et al., 2024).
  • Neural Fourier transform architectures: Adaptive learned kernels (via content-adaptive routers) boost task-specific performance, especially in polyphonic pitch estimation and acoustic scene classification (Verma, 2023).

5. Theoretical Analysis and Model Properties

FourierKAN architectures offer several formally established or empirically substantiated properties:

  • Universal approximation: Truncated Fourier expansions (as $G \to \infty$) and dense sines/cosines retain the universal function approximation of classical KANs. Theoretical results verify that as the number of Fourier modes or RFFs increases, the representable function class grows to encompass all continuous univariate functions (Imran et al., 2024).
  • Spectral adaptation: By making kernels or basis functions trainable and, in some variants, context-adaptive, the models overcome fixed-frequency biases and can emphasize or suppress frequency content in response to data-driven or task-driven demands (Verma, 2023).
  • Parameter efficiency: Techniques such as matrix merging, low-dimensional Fourier feature maps, and content-adaptive routing reduce the parameter burden relative to the polynomial or spline parameterizations of original KAN layers (Zhang et al., 9 Feb 2025, Xu et al., 2024).
  • Lipschitz control and regularization: For modulation tokens (SpectraKAN), spline-based KAN encoders permit explicit Lipschitz bounds on the conditioning map, supporting more stable training and mesh-independent operator learning (Cheng et al., 5 Feb 2026).
  • Ablation robustness: Across all cited works, ablations confirm the necessity of Fourier-based activations, spectral selection, or adaptive basis construction for matching peak accuracy, fast convergence, and reliable training.
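The universal-approximation property can be checked numerically: a least-squares fit with a growing truncated Fourier basis drives the error on a discontinuous, high-frequency target toward zero. This is a toy illustration, not an experiment from the cited papers:

```python
import numpy as np

def fit_fourier(x, y, G):
    """Least-squares fit of a truncated Fourier series of order G on [0, 2*pi)."""
    k = np.arange(G + 1)
    A = np.hstack([np.cos(np.outer(x, k)), np.sin(np.outer(x, k))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # handles the rank-deficient sin(0*x) column
    return A @ coef

x = np.linspace(0, 2 * np.pi, 400, endpoint=False)
y = np.sign(np.sin(3 * x))                 # square wave: discontinuous target
err = {G: np.mean((fit_fourier(x, y, G) - y) ** 2) for G in (2, 10, 40)}
```

At $G = 2$ no harmonic of the square wave (at $3, 9, 15, \ldots$) is representable and the error stays near the signal power; each increase in $G$ captures more harmonics and shrinks the residual.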

6. Applications Across Modalities

FourierKAN variants have demonstrated empirical superiority and practicality in:

  • Time series forecasting: Multi-scale energy selection and denoising for long-horizon predictions.
  • Audio and signal processing: Learned spectral front ends for polyphonic pitch tracking, timbre classification, onset detection, noise rejection, and content-adaptive filtering.
  • Natural language processing: Efficient and expressive classification heads for transformer-based architectures under frozen or partially frozen settings.
  • Image and tomographic reconstruction: Fourier domain encodings (FDE) for autoregressive completion and encoder–decoder regression in ill-posed Fourier coefficient recovery (Buchholz et al., 2021).
  • Graph collaborative filtering: Feature transformation layers in message passing GCNs, drastically lowering model complexity on common benchmarks.
  • Neural operator learning: Conditioning spectral operators on global system context for PDE prediction and spatio-temporal modeling.

7. Limitations, Design Trade-offs, and Future Directions

While offering clear advantages in spectral representation, FourierKANs exhibit nontrivial trade-offs:

  • Spectral truncation and overfitting: Truncation order (grid size) controls frequency capacity and convergence; overlarge expansions risk overfitting, as confirmed in FR-KAN grid-size ablation (Imran et al., 2024).
  • Computational cost: Explicit FFT modules and large Fourier bases can increase compute or memory per example, though optimized implementations (e.g., in KFS, KAF) match or beat Transformer/CNN alternatives.
  • Brittleness in high-dimensional regimes: The original KAN architectures suffered from parameter explosion, motivating matrix-merge and RFF schemes in KAF (Zhang et al., 9 Feb 2025).
  • Domain- and task-specific tuning: Hyperparameters such as energy thresholds ($\delta$), Fourier order, kernel sizes, and frequency grids require empirical tuning for each domain.

Future research is likely to focus on further parameter reductions, better theoretically controlled spectral conditioning, extension to structured data (e.g., manifolds), and deeper integration with kernel methods and operator regression.


In summary, FourierKAN architectures generalize and strengthen the Kolmogorov–Arnold Network paradigm by explicitly representing, learning, and conditioning on frequency content, enabling both theoretical flexibility and practical efficiency across a broad range of high-dimensional prediction, reconstruction, and operator learning tasks (Zhang et al., 9 Feb 2025, Wu et al., 1 Aug 2025, Imran et al., 2024, Cheng et al., 5 Feb 2026, Xu et al., 2024, Verma, 2023, Buchholz et al., 2021).
