Fourier-KAN: Spectral Neural Architecture

Updated 9 March 2026

Fourier-KAN is a framework integrating parameterized Fourier expansions with Kolmogorov–Arnold Networks to enhance expressivity and training efficiency.
It leverages adaptive frequency selection and matrix association to reduce parameter complexity and improve convergence across diverse applications.
The framework demonstrates versatility in tasks such as time-series analysis, graph filtering, operator learning, and anomaly detection, offering robust spectral representations.

The Fourier-KAN framework is a family of architectures that integrates Kolmogorov–Arnold representation networks (KANs) with Fourier or spectral basis expansions for the purpose of efficient and expressive function approximation—especially in time-series analysis, graph collaborative filtering, signal representation, operator learning, and beyond. These models generalize classical neural networks by replacing standard affine or spline-based transformations with parameterized or adaptive Fourier expansions, resulting in improved spectral expressivity, parameter efficiency, robustness, and fast convergence across diverse domains.

1. Theoretical Foundations: Kolmogorov–Arnold Decomposition and Fourier Basis Integration

The central analytical tool underpinning the Fourier-KAN framework is the Kolmogorov–Arnold representation theorem, which states that any continuous multivariate function $f: [0,1]^n \to \mathbb{R}$ can be written as

$f(x_1, \ldots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\left(\sum_{p=1}^n \phi_{q,p}(x_p)\right)$

with each $\phi_{q,p}$ and $\Phi_q$ univariate continuous functions. Kolmogorov–Arnold Networks (KANs) instantiate this superposition construct: each connection is parameterized not by a single scalar, but by a learnable univariate nonlinear transformation, classically instantiated as B-splines but, in Fourier-KAN, replaced by truncated Fourier or related expansions.

The essential innovation is to define each such univariate function $\phi(u)$ as a parameterized finite Fourier expansion, e.g.,

$\phi_{F}(u) = \sum_{k=1}^{g} \left[a_k \cos(k u) + b_k \sin(k u)\right]$

where the frequency grid size $g$ controls the bandwidth and $a_k, b_k$ are trainable coefficients. Multivariate interactions (such as element-wise feature combinations in graphs) can be mapped to higher-order Fourier interactions, supporting rich, globally supported nonlinearities and efficient spectral decompositions (Xu et al., 2024, Zhang et al., 9 Feb 2025).

2. Framework Variants and Architectural Instantiations

Fourier-KAN principles have been instantiated in diverse architectures, including:

Graph Collaborative Filtering (FourierKAN-GCF):

Replaces interaction feature transformations within graph convolutional networks (GCNs), specifically supplanting MLP blocks in NGCF with single-layer Fourier-KAN activations over elementwise user-item embeddings. The approach eliminates unnecessary linear transforms, employs lightweight parameter counts $O(dg)$ , incorporates per-edge message and node dropout, and concatenates multi-layer outputs for final representations (Xu et al., 2024).

Kolmogorov–Arnold Fourier Networks (KAF):

Generalizes the KAN block to high-dimensional tasks by adopting trainable random Fourier feature (RFF) embeddings in place of B-splines, and hybridizes with GELU activations to allow dynamic, learnable spectral mixing. Parameter explosion is mitigated by matrix association (merging KAN’s inner and outer matrices), substantially reducing parameter complexity. For input $x$ , the spectral mapping is:

$z(x; W, b) = \sqrt{\frac{2}{m}} [\cos(W^\top x + b);\ \sin(W^\top x + b)] \in \mathbb{R}^{2m}$

Key to training stability is the initialization of $W$ (frequency matrix) to match the data’s spectral statistics, typically $\mathcal{N}(0, \sigma^2/d)$ with $\sigma \approx 1.64$ for GELU harmonics (Zhang et al., 9 Feb 2025).

Operator Learning with SpectraKAN:

Integrates a KAN-encoded global token into a multi-scale spectral (Fourier) trunk to condition spectral operators on the input itself, as opposed to the typical static Fourier kernel in Fourier Neural Operators (FNOs). This enables nonuniform, regime-dependent, and globally modulated spectral operator behavior, implemented via KAN-based single-query cross-attention over spatial fields. The architecture attains both theoretical mesh invariance and explicit Lipschitz-controlled modulation (Cheng et al., 5 Feb 2026).

Time-Series Anomaly Detection (Fourier-KAN and Fourier-KAN-Mamba):

Fourier-KAN replaces B-spline bases by truncated Fourier expansions in the KAN pipeline for both detection and forecasting, emphasizing global trends and improving resilience to local noise or outliers (Zhou et al., 2024, Wang et al., 19 Nov 2025). The Fourier-KAN-Mamba variant hybrids a multi-scale Fourier feature extraction module with a KAN block, followed by a gated/state-space “Mamba” module for long-sequence modeling, with detailed anomaly scoring and gating mechanisms.

3. Advantages, Expressivity, and Parameter Efficiency

Shifting from B-spline or polynomial univariate basis functions to Fourier (global) or RFF (data-adaptive) bases yields several notable properties:

Spectral expressivity: Truncated Fourier expansions capture large-scale and oscillatory patterns, while learnable frequency spectra via RFFs allow for flexible, adaptive resolution.
Parameter efficiency: Matrix association and frequency selection reduce parameter counts from $O(d^2)$ or worse to $O(dg)$ or $O(d m)$ , where $g$ and $m$ are frequency resolution hyperparameters.
Robustness: Fourier or RFF-based KANs are less prone to overfitting local anomalies, as basis functions are globally supported and the capacity to interpolate noise is diminished.
Universal approximation: These networks, by combining Kolmogorov–Arnold and Bochner's theorems, retain universal function approximation capability for continuous mappings in high dimension (Zhang et al., 9 Feb 2025).
Improved training dynamics: Fourier basis and RFFs avoid optimization plateaus sometimes encountered with splines and produce superior gradient flow, leading to faster convergence.

4. Core Methodology and Training Techniques

Frequency selection and adaptive spectrum: Some Fourier-KAN frameworks (e.g., KFS) feature explicit energy-based selection of dominant frequency components using Parseval’s theorem, reconstructing denoised input via top- $K$ inverse FFT before downstream KAN modeling (Wu et al., 1 Aug 2025).
Hybrid activation functions: KAF architectures introduce hybrid GELU-Fourier activations:

$H(x) = a \odot \mathrm{GELU}(x) + b \odot \varphi(x)$

where $a, b$ are learnable scalings, $\varphi(x)$ is the RFF embedding, and contributions shift progressively towards high-frequency features during training (Zhang et al., 9 Feb 2025).

Dropout and regularization: Message and node dropout (FourierKAN-GCF), contrastive scoring (Fourier-KAN-Mamba), and $L_2$ -regularization on Fourier coefficients (KAN-AD) are emphasized to prevent overfitting and enhance robustness (Xu et al., 2024, Zhou et al., 2024, Wang et al., 19 Nov 2025).

5. Empirical Results and Benchmarks

Fourier-KAN methods consistently improve performance and efficiency across modalities:

Graph recommendation (MOOC, Amazon Games): FourierKAN-GCF improves Recall@20 by 5–25% over LightGCN and NGCF, with substantially fewer parameters and faster convergence (Xu et al., 2024).
Vision and NLP: KAF achieves higher accuracy than MLP and (B-spline) KAN models on MNIST, CIFAR-10/100, and GPT-2 benchmarks, while requiring fewer or comparable parameters (Zhang et al., 9 Feb 2025).
Time series anomaly detection: KAN-AD achieves +15% average Event-F1 over the best prior methods with only $\sim$ 300 parameters and $10\times – 50\times$ faster inference, robust under high anomaly ratio and label-free conditions (Zhou et al., 2024, Wang et al., 19 Nov 2025).
Neural operator learning: SpectraKAN reduces RMSE by up to 49% on compressible Navier–Stokes, diffusion–reaction, Darcy flow, and shallow water PDEs, remaining stable under mesh refinement unlike many alternatives (Cheng et al., 5 Feb 2026).
Audio implicit representation: Fourier-KAN achieves competitive SNR/LSD for music and speech without position encoding or task-specific tuning, outperforming B-spline KAN baselines (Li et al., 10 Jan 2026).
Forecasting: KFS sets state-of-the-art on ETTh/m, Weather, and other long-term forecasting benchmarks, combining energy-based denoising and rational function KANs (Wu et al., 1 Aug 2025).

6. Interpretability and Theoretical Properties

Spectral interpretability: Explicit parameterization of frequency bands (as in KAF’s $a$ / $b$ scaling vectors) enables inspection and interpretation of low- vs high-frequency model capacity and focus (Zhang et al., 9 Feb 2025).
Lipschitz and mesh invariance: SpectraKAN provides formal guarantees on global modulation smoothness and resolution-independent operator convergence given regularity of KAN/Lipschitz components (Cheng et al., 5 Feb 2026).
Intrinsic robustness: Fixing basis functions and limiting learning to their coefficients, as done in KAN-AD and KFS, prevents local overfitting and preserves generalization across noisy or contaminated time series (Zhou et al., 2024, Wu et al., 1 Aug 2025).

7. Limitations and Future Directions

While Fourier-KAN and its variants represent a significant advance, certain phenomena remain challenging:

Hyperparameter tuning: The optimal frequency budget (grid size $g$ , RFF dimension $m$ , RFF bandwidth $\sigma$ ) is domain- and task-dependent, often requiring empirical adjustment (Zhang et al., 9 Feb 2025).
Expressivity vs interpretability trade-off: Rational and Fourier bases are interpretable spectrally but less so in terms of localized features; learnable spectral bases (wavelets, chirplets) may bridge this trade-off.
FLOP/memory costs: RFF and hybrid activation introduce nontrivial computational overhead compared to plain MLPs, although savings over deep KANs are substantial.
Further extensions: Promising avenues include bilevel-optimization of spectral hyperparameters, fusion with attention/convolutional mechanisms, domain-specific spectral bases, and deeper theoretical analysis of sample complexity vs spectral resolution (Zhang et al., 9 Feb 2025, Cheng et al., 5 Feb 2026).