Kolmogorov-Arnold Networks (KAN)

Updated 27 August 2025
  • KANs are neural architectures that decompose multivariate continuous functions into nested univariate transformations based on the Kolmogorov-Arnold theorem.
  • They replace fixed weights with learnable basis expansions (e.g., B-splines, Fourier series) to enhance model expressivity and control function properties.
  • KANs are applied in scientific computing, symbolic regression, and physics-informed tasks, delivering improved efficiency, interpretability, and robustness.

Kolmogorov-Arnold Networks (KANs) are a neural architecture class rooted in the Kolmogorov-Arnold representation theorem, which guarantees that any multivariate continuous function can be decomposed as a finite sum of nested univariate functions. Leveraging this theoretical foundation, KANs replace the conventional fixed-weight and fixed-activation structure of multilayer perceptrons (MLPs) with a composition of learnable univariate transformations on the network’s edges. Their design aims to enhance both model expressivity and interpretability, with applications spanning symbolic regression, operator learning, and scientific computing. KANs have recently emerged as a prominent alternative to MLPs for both data-driven and physics-informed tasks.

1. Theoretical Foundations and Representation

The core theoretical underpinning of KAN is the Kolmogorov-Arnold representation theorem, which asserts that any continuous function $f:\mathbb{R}^n \to \mathbb{R}$ can be written in the form

$$f(x_1, \ldots, x_n) = \sum_{k=1}^{2n+1} \varphi_k \left( \sum_{j=1}^n \psi_{k,j}(x_j) \right)$$

where the functions $\varphi_k$ and $\psi_{k,j}$ are continuous and univariate for each $k, j$ (Liu et al., 30 Apr 2024, Faroughi et al., 30 Jul 2025). KAN explicitly operationalizes this insight: in network architectures, each edge between nodes is replaced by a learnable univariate function, commonly a B-spline or another basis expansion, rather than a scalar weight. The neuron becomes a summation unit, aggregating the outcomes of these univariate transformations.
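As a concrete illustration of this structure, the short Python sketch below builds a multivariate function purely from univariate maps and addition, mirroring the formula above. The specific $\psi_{k,j}$ and $\varphi_k$ are arbitrary toy choices made for this sketch, not the (generally non-smooth) functions whose existence the theorem guarantees.

```python
import numpy as np

# Toy instantiation of the Kolmogorov-Arnold structure: a multivariate map
# built only from univariate functions and addition. The psi_{k,j} and phi_k
# below are arbitrary illustrative choices.

n = 3                    # input dimension
K = 2 * n + 1            # number of outer terms, as in the theorem

def psi(k, j, t):        # inner univariate functions psi_{k,j}
    return np.sin((k + 1) * t) + 0.1 * j * t

def phi(k, s):           # outer univariate functions phi_k
    return np.tanh(s / (k + 1))

def f(x):
    # f(x_1, ..., x_n) = sum_k phi_k( sum_j psi_{k,j}(x_j) )
    return sum(phi(k, sum(psi(k, j, x[j]) for j in range(n))) for k in range(K))

print(f(np.array([0.2, -1.0, 0.7])))
```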

This architecture contrasts with MLPs, which map the input through successive layers of matrix multiplications and a fixed nonlinearity $\sigma$ (e.g., ReLU), yielding $l_i(x) = \sigma(W_i x + b_i)$. In KANs, the entire transformation is parameterized functionally: $x_{L+1} = \Phi_L x_L$, where each $(j,i)$ entry of $\Phi_L$ is replaced with a learnable function $\varphi_{L,j,i}$ (Novkin et al., 19 Mar 2025).
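A minimal sketch of this contrast, with toy per-edge cubic polynomials standing in for trained splines (all names, shapes, and coefficient choices here are illustrative assumptions), is:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 3, 2
x = rng.normal(size=d_in)

# MLP layer: l(x) = sigma(W x + b), with a fixed nonlinearity sigma (here ReLU).
W, b = rng.normal(size=(d_out, d_in)), rng.normal(size=d_out)
mlp_out = np.maximum(W @ x + b, 0.0)

# KAN layer: out_i = sum_j phi_{i,j}(x_j), one learnable univariate function
# per edge. A toy cubic per edge stands in for a trainable spline.
coeffs = rng.normal(size=(d_out, d_in, 4))          # per-edge cubic coefficients
powers = np.stack([x**0, x, x**2, x**3], axis=-1)   # (d_in, 4) basis values
kan_out = np.einsum("oib,ib->o", coeffs, powers)    # sum over inputs and basis

print(mlp_out, kan_out)
```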

By parameterizing activations on edges and employing basis expansions (e.g., B-splines, Chebyshev polynomials, Fourier series), KANs endow the network with universal approximation ability and allow direct control over both local and global function properties (Liu et al., 30 Apr 2024, Ji et al., 13 Jul 2024, Warin, 4 Oct 2024, Toscano et al., 21 Dec 2024, Krzywda et al., 10 Jan 2025, Kratsios et al., 21 Apr 2025).

2. Architectural Design and Variants

In canonical KANs, each “weight” is replaced by a one-dimensional, learnable function, typically represented as a linear combination of basis functions: $\phi(x) = \sum_{i} c_i B_i(x)$, with $\{B_i\}$ a basis such as B-splines or Chebyshev polynomials and $c_i$ trainable coefficients. In Fourier KANs (FKANs), these are replaced with sine and cosine expansions (Novkin et al., 19 Mar 2025), while FC-KAN (Ta et al., 3 Sep 2024) and DeepOKAN (Faroughi et al., 30 Jul 2025) combine different basis families (e.g., B-splines, wavelets, RBFs) using element-wise or quadratic interactions.
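The PyTorch sketch below shows one way such an edge parameterization can be made trainable, using a Fourier basis in the spirit of FKANs. The class name, initialization scale, and integer frequency grid are assumptions for illustration, not the reference implementation.

```python
import torch
import torch.nn as nn

class FourierKANLayer(nn.Module):
    """Sketch of a KAN layer with Fourier-basis edge functions. Each edge
    (j, i) carries phi_{i,j}(x_j) = sum_m a_m cos(m x_j) + b_m sin(m x_j),
    and each output unit sums its incoming edges."""

    def __init__(self, d_in: int, d_out: int, n_freq: int = 5):
        super().__init__()
        self.register_buffer("freqs", torch.arange(1, n_freq + 1).float())
        # Trainable coefficients: (d_out, d_in, n_freq) for cosine and sine parts.
        self.a = nn.Parameter(torch.randn(d_out, d_in, n_freq) * 0.1)
        self.b = nn.Parameter(torch.randn(d_out, d_in, n_freq) * 0.1)
        self.bias = nn.Parameter(torch.zeros(d_out))

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (batch, d_in)
        arg = x.unsqueeze(-1) * self.freqs                  # (batch, d_in, n_freq)
        cos, sin = torch.cos(arg), torch.sin(arg)
        # Sum over input dimension j and frequency m for each output unit i.
        out = torch.einsum("oif,bif->bo", self.a, cos) \
            + torch.einsum("oif,bif->bo", self.b, sin)
        return out + self.bias

# Usage: layers stack like nn.Linear blocks.
model = nn.Sequential(FourierKANLayer(4, 16), FourierKANLayer(16, 1))
y = model(torch.randn(8, 4))
```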

The architectural flexibility of KANs extends to various forms:

  • Activation functions can be further regularized for improved robustness, e.g., via smoothness penalties (penalizing $\int (d^2 S(x)/dx^2)^2 \, dx$ for a spline $S(x)$) and stochastic “Segment Deactivation”, which linearizes a randomly chosen spline segment during training (Cang et al., 11 Nov 2024); a minimal sketch of both appears after this list.
  • KANs can be structured either shallowly (few layers, each capturing complex univariate structure) or in deeper, modular hierarchies to capture compositional structure and richer function classes (Liu et al., 30 Apr 2024, Ji et al., 13 Jul 2024, Toscano et al., 21 Dec 2024).
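The sketch below approximates both regularizers: a grid-based discretization of the smoothness integral and a simplified, coefficient-level version of segment deactivation. Neither is taken from the cited implementation; both are stated assumptions for illustration, and the smoothness penalty assumes a uniform grid.

```python
import torch

def smoothness_penalty(edge_fn, grid: torch.Tensor) -> torch.Tensor:
    """Discrete stand-in for the integral of (S''(x))^2 dx: evaluate the
    learned univariate function on a uniform grid and penalize squared
    second differences. Spline-based KANs could compute this analytically
    from the spline coefficients instead."""
    y = edge_fn(grid)
    h = grid[1] - grid[0]
    second_diff = (y[2:] - 2 * y[1:-1] + y[:-2]) / h**2
    return (second_diff ** 2).sum() * h

def segment_deactivation(coeffs: torch.Tensor, p: float = 0.1) -> torch.Tensor:
    """Toy version of stochastic segment deactivation: with probability p,
    replace a randomly chosen run of spline coefficients with a straight-line
    interpolation between its endpoints during training."""
    if torch.rand(()) < p and coeffs.numel() > 3:
        i, j = sorted(torch.randint(0, coeffs.numel(), (2,)).tolist())
        if j - i >= 2:
            coeffs = coeffs.clone()
            coeffs[i:j + 1] = torch.linspace(coeffs[i].item(), coeffs[j].item(), j - i + 1)
    return coeffs

# Example usage with a fixed function standing in for a learned spline.
grid = torch.linspace(-1.0, 1.0, 101)
penalty = smoothness_penalty(torch.sin, grid)
```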

3. Learning, Generalization, and Robustness

KANs exhibit favorable theoretical and practical generalization properties. For spline- or RKHS-based activations, covering-number arguments yield generalization bounds that scale polynomially with the $l_1$ norms of the expansion coefficients and with the product of the Lipschitz constants of the functions in each layer, but only logarithmically with the number of nodes or basis functions. This is in contrast to MLPs, where the parameter count enters the bounds more directly (Zhang et al., 10 Oct 2024).

Empirical studies confirm that KANs often achieve lower excess loss and improved parameter efficiency compared to MLPs on both synthetic and real-world data (MNIST, CIFAR-10) (Zhang et al., 10 Oct 2024, Faroughi et al., 30 Jul 2025). Residual KANs achieve dimension-free sample complexity for learning Besov-regular functions; that is, under appropriate smoothness, the number of required samples does not scale exponentially with input dimension (Kratsios et al., 21 Apr 2025).

Robustness is also improved: KANs and hybrid KAN/MLP architectures attain lower attack success rates under PGD-type adversarial attacks, attributed to lower Lipschitz constants and the local support of spline functions. Ablation studies demonstrate that simpler spline grid choices and the presence of a strong “base” (e.g., SiLU) component are critical for both accuracy and adversarial resilience (Dong et al., 14 Aug 2024).

4. Applications Across Domains

KANs have been validated across a diverse array of tasks, including symbolic regression, operator learning, physics-informed modeling, scientific computing, and standard supervised benchmarks such as MNIST and CIFAR-10 (see Sections 3 and 6 for representative results).

5. Computational Aspects and Scalability

KANs, while expressive, historically incur greater computational costs due to the need to evaluate many spline or basis expansions per data point. Conventional implementations suffer from the sequential nature of B-spline evaluation (Cox-de Boor recursion), especially for high spline degrees.

MatrixKAN resolves this bottleneck by recasting all B-spline evaluations as precomputed matrix multiplications, which are highly parallelizable on modern compute hardware. This approach reduces the per-layer computational complexity in the spline degree $k$ from $O(Lk)$ to $O(L)$, achieving speedups of up to $40\times$ for large $k$ or large datasets (Coffman et al., 11 Feb 2025).
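To illustrate the general idea of matrix-form spline evaluation (restricted here to the uniform cubic case; this is a simplified stand-in, not MatrixKAN's actual implementation), one can precompute the standard cubic B-spline basis matrix and evaluate all points with batched matrix products instead of the Cox-de Boor recursion:

```python
import numpy as np

# Basis matrix M for uniform cubic B-splines: S(t) = [t^3 t^2 t 1] @ M @ c_local.
M = (1.0 / 6.0) * np.array([
    [-1.0,  3.0, -3.0, 1.0],
    [ 3.0, -6.0,  3.0, 0.0],
    [-3.0,  0.0,  3.0, 0.0],
    [ 1.0,  4.0,  1.0, 0.0],
])

def eval_uniform_cubic_spline(coeffs: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Evaluate a uniform cubic B-spline with control coefficients `coeffs`
    at points x in [0, 1] using only matrix products (no recursion)."""
    n_seg = len(coeffs) - 3                                  # number of segments
    seg = np.minimum((x * n_seg).astype(int), n_seg - 1)     # segment index per point
    t = x * n_seg - seg                                      # local coordinate in [0, 1)
    T = np.stack([t**3, t**2, t, np.ones_like(t)], axis=-1)  # (N, 4) power basis
    local = coeffs[seg[:, None] + np.arange(4)]              # (N, 4) local coefficients
    return np.einsum("nk,kj,nj->n", T, M, local)

coeffs = np.random.default_rng(0).normal(size=10)
print(eval_uniform_cubic_spline(coeffs, np.linspace(0, 1, 5)))
```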

KANs can be integrated into existing deep learning frameworks, but they require careful tuning of hyperparameters (basis type, grid size, spline or Fourier degree) and careful regularization to avoid overfitting, especially in noisy, high-dimensional, or data-sparse settings (Ji et al., 13 Jul 2024, Cang et al., 11 Nov 2024, Faroughi et al., 30 Jul 2025).

Recent work addresses architectural scalability by enabling KANs to select the number of basis functions dynamically (InfinityKAN), using variational inference and interpolated weights to grow or shrink model complexity during training (Alesiani et al., 3 Jul 2025).

6. Interpretability, Visualization, and Symbolic Regression

A hallmark of KANs is the transparent mapping of each edge’s transformation: after training, the learned univariate functions can be visualized, pruned (via $L_1$ or entropy penalties), or “symbolified” (e.g., mapped to canonical functions such as $\sin$, $\exp$, $\log$) (Liu et al., 30 Apr 2024, Ji et al., 13 Jul 2024, Novkin et al., 19 Mar 2025). This grants KANs natural support for scientific interpretation: mathematicians and domain scientists may “inspect” which operations are actually being applied to each input dimension.

Techniques such as interactive “symbolic snapping” and iterative fixation of the least-accurate functions into closed-form expressions are proposed to incrementally convert networks into fully symbolic models, without prohibitive loss of accuracy (Novkin et al., 19 Mar 2025).
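As an illustration of this workflow (not the cited tool's API), the sketch below fits a trained univariate edge function against a small library of candidate closed forms wrapped in an affine transform $a\,g(bx + c) + d$ and keeps the best match by $R^2$. The candidate set and function names are hypothetical choices for this sketch.

```python
import numpy as np
from scipy.optimize import curve_fit

CANDIDATES = {"sin": np.sin, "exp": np.exp, "tanh": np.tanh, "x^2": np.square}

def snap_to_symbolic(edge_fn, x: np.ndarray):
    """Return the best-fitting (name, params, r2) among candidate closed forms."""
    y = edge_fn(x)
    best = None
    for name, g in CANDIDATES.items():
        def model(x, a, b, c, d, g=g):       # affine wrapper around candidate g
            return a * g(b * x + c) + d
        try:
            params, _ = curve_fit(model, x, y, p0=[1.0, 1.0, 0.0, 0.0], maxfev=5000)
        except RuntimeError:                  # candidate failed to converge
            continue
        resid = y - model(x, *params)
        r2 = 1.0 - resid.var() / y.var()
        if best is None or r2 > best[2]:
            best = (name, params, r2)
    return best

# Example: "snap" a function that is secretly a scaled sine.
x = np.linspace(-2, 2, 200)
print(snap_to_symbolic(lambda t: 0.5 * np.sin(2.0 * t) + 0.3, x))
```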

This interpretability stands in contrast to the black-box nature of classical MLPs, and is of direct importance in scientific modeling, engineering diagnostics, and computational biomedicine (Faroughi et al., 30 Jul 2025, Ji et al., 13 Jul 2024, Toscano et al., 21 Dec 2024).

7. Future Directions and Open Challenges

Despite strong empirical and theoretical results, several challenges remain:

  • Computational cost: The high memory and compute demand per layer still limits practical scaling, particularly for high-dimensional problems or large basis expansions.
  • Hyperparameter sensitivity: The accuracy and convergence of KANs are sensitive to basis selection, grid resolution, and architecture width/depth; automating or adaptively tuning these remains a priority (Ji et al., 13 Jul 2024, Alesiani et al., 3 Jul 2025, Faroughi et al., 30 Jul 2025).
  • Integration with mainstream frameworks: Many libraries are optimized for dense matrix operations; KANs' modular and separable structure requires new engineering for efficient GPU/TPU usage.
  • Theory: While generalization and approximation results are improving, there remain open questions about convergence, expressivity, NTK dynamics, and generalization in deep KANs across arbitrary architectures (Zhang et al., 10 Oct 2024, Kratsios et al., 21 Apr 2025).
  • Robustness and regularization: Effective techniques (such as smoothness penalties or deactivation regularization) are critical for noisy data and adversarial contexts (Cang et al., 11 Nov 2024, Dong et al., 14 Aug 2024).
  • Application breadth: Ongoing work seeks to expand use cases further into operator learning, real-time scientific computing, and quantum information, with promising initial results but many unsolved scale and deployment challenges.

Looking ahead, research directions include developing hardware-aware modules, improving theoretical underpinnings (e.g., NTK analyses, approximation guarantees), automating function/basis selection, hybridizing with Fourier or convolutional operators, and advancing distributed or parallelized implementations (Coffman et al., 11 Feb 2025, Alesiani et al., 3 Jul 2025, Faroughi et al., 30 Jul 2025).


KANs thus represent a theoretically principled and practically flexible architecture bridging universal function approximation, interpretability, and efficiency across modern scientific and machine learning practice. Their ongoing evolution is poised to address key limitations of traditional MLPs and to catalyze new modes of interpretable, data-efficient deep learning in scientific and engineering domains.