Kolmogorov-Arnold-Fourier Networks (KAF)
- KAFs are neural architectures that integrate Kolmogorov–Arnold principles with Fourier series to ensure universal approximation and parameter efficiency.
- They employ learnable Fourier coefficients and hybrid activations that dynamically adapt spectral modes, enhancing expressivity in tasks like PDE solving and graph modeling.
- Empirical results show KAFs deliver superior performance and compression across image, graph, and scientific ML benchmarks compared to standard neural approaches.
Kolmogorov-Arnold-Fourier Networks (KAF) constitute a class of neural architectures that merge the universal function approximation framework of Kolmogorov–Arnold Networks (KANs) with spectral representations based on Fourier series and learnable random Fourier features. KAF models move beyond conventional multilayer perceptrons (MLPs) by deploying edge-wise, learnable, spectrally adaptable nonlinearities parameterized in Fourier bases, yielding improved expressivity, spectral control, and parameter efficiency for scientific machine learning, implicit representations, signal modeling, and structured data domains.
1. Mathematical and Theoretical Foundations
The foundation of KAFs is the classical Kolmogorov–Arnold representation theorem, which asserts that any continuous function $f : [0,1]^n \to \mathbb{R}$ can be decomposed as:

$$f(x_1, \ldots, x_n) = \sum_{q=0}^{2n} \Phi_q\left( \sum_{p=1}^{n} \varphi_{q,p}(x_p) \right),$$

where $\Phi_q$ and $\varphi_{q,p}$ are univariate continuous functions. KANs instantiate this structure by assigning learnable univariate basis functions to each (node, edge) pair in a network layer, replacing the role of fixed matrix weights found in standard MLPs.
KAFs build on this by constraining or initializing these univariate edge functions using Fourier representations:
- Truncated Fourier series: $\varphi(x) = \sum_{k=1}^{K} \left[ a_k \cos(kx) + b_k \sin(kx) \right]$, with learnable coefficients $a_k, b_k$.
- Random Fourier features (RFF): $\varphi(x) = \sum_{i=1}^{D} c_i \cos(\omega_i x + b_i)$, with $\omega_i$ and $b_i$ as learnable (or randomly initialized) frequency and phase vectors.
Theoretical advances establish that KAFs retain the universal approximation properties of KANs, while leveraging Fourier convergence theorems (e.g., Carleson–Fefferman) to guarantee that, for any continuous target $f$ and any $\varepsilon > 0$, a KAF with appropriate mode count $K$ and depth can satisfy $|\hat{f}(x) - f(x)| < \varepsilon$ almost everywhere (Li et al., 2024).
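As a concrete illustration, here is a minimal PyTorch sketch of a truncated-Fourier edge function with learnable coefficients; the function and variable names are ours, chosen for exposition, not from the cited papers.

```python
import torch

def fourier_edge(x: torch.Tensor, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Truncated Fourier edge map: phi(x) = sum_k [a_k cos(kx) + b_k sin(kx)]."""
    K = a.shape[0]
    k = torch.arange(1, K + 1, dtype=x.dtype, device=x.device)  # modes 1..K
    angles = x.unsqueeze(-1) * k                                # (..., K)
    return (a * torch.cos(angles) + b * torch.sin(angles)).sum(dim=-1)

# Example: K = 8 learnable modes evaluated on a 1-D grid.
a = torch.randn(8, requires_grad=True)
b = torch.randn(8, requires_grad=True)
y = fourier_edge(torch.linspace(-torch.pi, torch.pi, 100), a, b)
```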
2. Architectural Principles and Variants
KAF network layers generalize KANs by parameterizing each edge’s univariate map as a learnable (possibly random) Fourier expansion:
- Edge activation: $\varphi_{q,p}(x) = \sum_{k=1}^{K} \left[ a_{q,p,k} \cos(kx) + b_{q,p,k} \sin(kx) \right]$ for classical KAFs (Li et al., 2024, Zhang et al., 9 Feb 2025).
- Hybrid activation (RFF): $\varphi(x) = w_g \, \mathrm{GELU}(x) + w_r \, \mathrm{RFF}(x)$, with $w_g, w_r$ trainable, offering adaptive and stable low/high-frequency coverage (Zhang et al., 9 Feb 2025).
Each layer’s output combines edgewise activations, typically summed per node:

$$y_q = \sum_{p=1}^{d_{\mathrm{in}}} \varphi_{q,p}(x_p),$$

where layerwise parameter efficiency is achieved by merging weight matrices when moving from spline-based KANs to KAFs, scaling dominant parameters as $O(d_{\mathrm{in}} d_{\mathrm{out}})$ instead of $O(d_{\mathrm{in}} d_{\mathrm{out}} G)$ for spline grid size $G$ (Zhang et al., 9 Feb 2025).
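A minimal PyTorch sketch of such a layer follows, assuming one merged weight matrix per branch and trainable scalar mixing weights; the class name, shapes, and exact mixing scheme are illustrative assumptions rather than the precise formulation of the cited papers.

```python
import math
import torch
import torch.nn as nn

class KAFLayer(nn.Module):
    """Hybrid GELU + learnable random-Fourier-feature layer (illustrative sketch)."""

    def __init__(self, d_in: int, d_out: int, num_features: int = 32):
        super().__init__()
        # Learnable RFF frequencies and phases; variance-matched initialization.
        self.freq = nn.Parameter(torch.randn(d_in, num_features) / math.sqrt(d_in))
        self.phase = nn.Parameter(torch.zeros(num_features))
        # Merged weight matrices replace per-edge spline grids.
        self.w_rff = nn.Linear(2 * num_features, d_out, bias=False)
        self.w_gelu = nn.Linear(d_in, d_out)
        # Trainable mixing weights; the GELU branch dominates at initialization.
        self.w_g = nn.Parameter(torch.tensor(1.0))
        self.w_r = nn.Parameter(torch.tensor(0.1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = x @ self.freq + self.phase                      # (batch, num_features)
        rff = torch.cat([torch.cos(z), torch.sin(z)], -1)   # (batch, 2*num_features)
        return self.w_g * self.w_gelu(nn.functional.gelu(x)) + self.w_r * self.w_rff(rff)
```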
KAF mechanisms now appear across scalar regression MLPs (Zhang et al., 9 Feb 2025), message-passing GNNs (Li et al., 2024), spectral graph Transformers (Ai et al., 2024), and PINN-like architectures for PDEs (Noorizadegan et al., 28 Oct 2025).
3. Spectral Adaptivity and Frequency Learning
A defining property of KAFs is the ability to adaptively allocate network capacity across spectral modes:
- Learning all Fourier coefficients end-to-end allows the model to shift spectral bias during training, focusing representational power on those frequencies that most reduce reconstruction error for a given task (Mehrabian et al., 2024).
- Fixed (non-learnable) Fourier or sinusoidal bases (as in conventional Fourier feature MLPs) yield inferior empirical performance when the dataset’s spectral energy departs from the assumed basis (Mehrabian et al., 2024, Zhang et al., 9 Feb 2025).
- The hybrid RFF+GELU activation in KAFs allows models to begin with low-frequency trends (weighted by the GELU branch, initially dominant) and progressively activate higher-frequency structure as the RFF weights increase during optimization (Zhang et al., 9 Feb 2025).
In high-dimensional or noisy settings, entropy minimization of Fourier coefficients and “gravitational” terms can additionally be introduced to compress edge representations onto sparse, interpretable Fourier bases, as in Projective KANs (P-KANs) (Poole et al., 24 Sep 2025).
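A sketch of such an entropy term, assuming it is computed over the normalized magnitudes of each edge's Fourier coefficients (the normalization and reduction below are our assumptions; the P-KAN objective may differ in detail):

```python
import torch

def coefficient_entropy(coeffs: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Entropy of |coefficient| mass along the last (mode) axis.

    Minimizing this penalty concentrates each edge's spectrum on a few
    dominant modes, encouraging sparse, interpretable representations.
    """
    p = coeffs.abs() / (coeffs.abs().sum(dim=-1, keepdim=True) + eps)
    return -(p * torch.log(p + eps)).sum(dim=-1).mean()

# Usage: loss = task_loss + lam * coefficient_entropy(edge_coeffs)
```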
4. Empirical Performance and Practical Implementation
Empirical evaluations demonstrate significant advantages for KAFs across diverse domains:
| Model/Class | Benchmark/Dataset | Main Metric | KAF Result | Baseline Result(s) |
|---|---|---|---|---|
| FKAN (image INR) | Kodak RGB (512×768) | PSNR, SSIM | 37.91 dB, 0.939 | SIREN: 33.13 dB |
| KAF (ResNet-18 FFN block) | CIFAR-10 | Top-1 Acc. | 91.72% | Standard: 91.19% |
| KA-GNN (molecular) | BACE, HIV, … | ROC-AUC | SOTA (BACE: 0.890, HIV: 0.821) | Prev. SOTA: 0.873 |
| P-KAN (fiber placement) | Noisy scan prediction | RMSE | 0.08 | LSTM-based: higher RMSE |
Other domains showing robust improvements include NLP (CoLA, AG-NEWS), audio (SpeechCommands), tabular (Bean, Rice), and solution of PDEs with lower RMSE than MLP/fixed-basis PINNs (Zhang et al., 9 Feb 2025, Noorizadegan et al., 28 Oct 2025, Poole et al., 24 Sep 2025).
Optimization is typically performed with Adam or similar adaptive optimizers, with default or carefully tuned learning rates, layer normalization, and moderate layer depths ($L = 2$–$4$) (Zhang et al., 9 Feb 2025, Li et al., 2024). Initialization of RFF frequency matrices follows variance-matching heuristics to ensure robust training (Zhang et al., 9 Feb 2025). Hyperparameter guidelines recommend modest RFF counts ($D \leq 64$), early stopping, and spectral regularization for stability in high-frequency regimes (Noorizadegan et al., 28 Oct 2025).
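A compact sketch of this recipe, reusing the hypothetical KAFLayer from Section 2; all hyperparameter values here are illustrative defaults, not prescriptions from the cited papers.

```python
import torch
import torch.nn as nn

model = nn.Sequential(                 # L = 3 KAF layers with layer normalization
    KAFLayer(64, 128), nn.LayerNorm(128),
    KAFLayer(128, 128), nn.LayerNorm(128),
    KAFLayer(128, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1_000)
```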
5. Applications in Graph, Structured, and Scientific Domains
KAFs underpin several advances in structured domains:
- Graph Neural Networks: KA-GNN and GrokFormer integrate KAF layers into node/edge mappings and spectral message-passing, yielding SOTA accuracy on molecular property prediction benchmarks (MoleculeNet) and outperforming both MLP and traditional KAN GNN layers (Li et al., 2024, Ai et al., 2024). Fourier-parameterized activations enable learning of spectrum-adaptive filters, capturing both low- and high-frequency graph signals.
- Implicit Neural Representations (INR): FKAN shows that replacing the first linear layer of a coordinate MLP with a Fourier-Kolmogorov–Arnold block allows precise, resolution-invariant signal recovery and improved 3D scene understanding (Mehrabian et al., 2024); a sketch follows this list.
- Scientific ML & PDEs: KAFs and their spline/Fourier hybrids have accelerated convergence of PINN-like solvers and provided domain-decomposed, spectrally-tuned representations for challenging inverse problems and high-dimensional regression (Noorizadegan et al., 28 Oct 2025, Zhang et al., 9 Feb 2025, Poole et al., 24 Sep 2025).
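For the INR item above, a minimal sketch of the FKAN idea under our assumptions, standing in the Fourier-Kolmogorov–Arnold block with the hypothetical KAFLayer from Section 2; widths and depth are illustrative.

```python
import torch.nn as nn

inr = nn.Sequential(
    KAFLayer(2, 256),                # (x, y) coordinates -> spectral features
    nn.GELU(), nn.Linear(256, 256),  # standard MLP trunk
    nn.GELU(), nn.Linear(256, 3),    # RGB output per coordinate
)
```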
KAFs have also demonstrated parameter compression (>80% parameter reduction after projection for some regression tasks) and improved noise robustness compared to standard KANs and MLPs (Poole et al., 24 Sep 2025).
6. Computational Trade-offs, Regularization, and Challenges
KAFs present several challenges and trade-offs:
- Parameter efficiency: Matrix association and basis compression reduce parameters compared to high-grid KANs, with dominant $O(d_{\mathrm{in}} d_{\mathrm{out}})$ scaling and minor RFF overhead (Zhang et al., 9 Feb 2025); see the parameter-count sketch after this list.
- Computational cost: Truncated Fourier and RFF expansions require $O(K)$ or $O(D)$ trigonometric evaluations per edge per forward pass (Noorizadegan et al., 28 Oct 2025). For practical $K$ and $D$, this remains competitive with standard dense layers.
- Smoothness and locality: Fourier bases offer global, infinitely smooth function classes, making KAFs ideal for periodic/oscillatory targets but less effective for sharp discontinuities without hybridization or domain decomposition (Noorizadegan et al., 28 Oct 2025).
- Spectral bias: KAFs reduce the spectral bias of MLPs, facilitating faster convergence of high frequencies, but may present steeper loss landscapes, requiring careful regularization and learning rate scheduling (Noorizadegan et al., 28 Oct 2025). Layernorm, weight decay, entropy/gravitational penalties, and hybrid activations mitigate instability.
- Interpretability: Projection-based frameworks (P-KANs) can auto-select optimal basis (Fourier, Chebyshev, Bessel) per edge via entropy minimization, boosting interpretability and revealing problem-specific spectral structure (Poole et al., 24 Sep 2025).
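A back-of-the-envelope comparison of the dominant parameter scalings quoted above; the formulas follow the hypothetical KAFLayer sketch from Section 2 and ignore biases and normalization parameters.

```python
def param_counts(d_in: int, d_out: int, grid: int = 8, order: int = 3,
                 num_features: int = 32) -> tuple[int, int]:
    """Dominant parameter counts: per-edge spline KAN vs. merged-matrix KAF."""
    kan = d_in * d_out * (grid + order)                      # spline coefficients per edge
    kaf = d_in * d_out + num_features * (d_in + 2 * d_out)   # merged matrix + RFF overhead
    return kan, kaf

print(param_counts(256, 256))  # (720896, 90112): roughly 8x fewer parameters
```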
Relevant open problems include theoretically quantifying the expressivity/stability trade-off between global Fourier and local bases, developing optimal basis/bandwidth selection schemas, and scaling to ultra-high-dimensional inputs (Noorizadegan et al., 28 Oct 2025, Poole et al., 24 Sep 2025).
7. Extensions and Research Directions
Recent work suggests several avenues for future development:
- Adaptive and mixed bases: Use of entropy-minimization and projection mechanisms to allow automatic discovery of optimal functional representations per edge/channel, including (but not limited to) Fourier, polynomial, and Bessel bases (Poole et al., 24 Sep 2025).
- Domain decomposition: Application of KAFs within subdomains tailored to heterogeneous physics or discontinuities, and hybridization with classical PINNs or wavelet components (Noorizadegan et al., 28 Oct 2025).
- Graph and attention integration: KAF layers as general-purpose building blocks in graph transformers, attention mechanisms, and non-Euclidean geometric learning (Ai et al., 2024).
- Scaling and regularization: Investigation of kernel fusion, sparsity, and learnable spectral distributions for high-dimensional KAFs, as well as precise analyses of frequency learning dynamics (Noorizadegan et al., 28 Oct 2025, Zhang et al., 9 Feb 2025).
- Practical toolchains: Open-source implementations and guidelines for KAF and KAN variants are actively maintained and surveyed (Noorizadegan et al., 28 Oct 2025).
A plausible implication is that continued improvements in spectral adaptation, regularization, and compression will further expand the viability of KAFs for scientific machine learning, large-scale structured prediction, and interpretable deep learning.