
Kolmogorov–Arnold Networks Overview

Updated 10 November 2025
  • Kolmogorov–Arnold Networks are neural architectures that replace fixed edge weights with learnable univariate functions for adaptive and interpretable function approximation.
  • They employ spline, Gaussian RBF, and analytic bases to capture complex behaviors in tasks such as regression, PDE modeling, and graph learning while reducing parameter count.
  • Empirical evaluations reveal steep error decay and enhanced interpretability, although training deeper KANs may require specialized regularization and initialization techniques.

Kolmogorov–Arnold Networks (KANs) are a class of neural architectures constructed to directly embody the Kolmogorov–Arnold representation theorem. Where conventional multi-layer perceptrons deploy fixed, node-based nonlinearities and static matrix weights, KANs replace each edge weight with a learnable univariate function—typically parameterized by a spline, polynomial, or a composition of fast analytic functions. This edge-centric formulation enables adaptive, local, and highly expressive function approximation, offering advantages in parameter efficiency, scaling, and interpretability. KANs have proven effective across diverse domains, including function regression, partial differential equation (PDE) modeling, time-series analysis, graph learning, and physics-informed neural computing. The framework admits a rapidly evolving ecosystem of architectural variants, learning strategies, and basis-function choices, and it is actively influencing next-generation research on hybrid, interpretable, and efficient deep learning.

1. Mathematical Foundations and the Kolmogorov–Arnold Theorem

The theoretical underpinning of KANs is the Kolmogorov–Arnold superposition theorem, formalized as follows: for any continuous function $f:[0,1]^n\rightarrow\mathbb{R}$, there exist continuous univariate functions $\phi_{q,p}:[0,1]\to\mathbb{R}$ (inner maps) and $\Phi_q:\mathbb{R}\to\mathbb{R}$ (outer maps), with $q=0,\ldots,2n$ and $p=1,\ldots,n$, such that

$$f(x_1,\dots,x_n) = \sum_{q=0}^{2n} \Phi_q\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right).$$

This theorem provides a constructive decomposition of any multivariate continuous function into a finite sum of univariate operations and additions. Modern KANs generalize this representation to deep architectures, but the summation–composition structure remains central (Kiamari et al., 10 Jun 2024, Basina et al., 15 Nov 2024, Liu et al., 30 Apr 2024). Because the decomposition remains finite and explicit regardless of input dimension, it lays the groundwork for architectures that can in principle evade the curse of dimensionality.
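As a toy illustration of this summation–composition structure (an illustrative sketch only, not part of the cited constructions), multiplication of two positive inputs can already be expressed using nothing but univariate maps and addition:

```python
import math

# f(x, y) = x * y for positive x, y, written in Kolmogorov–Arnold form
# Phi(phi_1(x) + phi_2(y)) with inner maps phi_1 = phi_2 = log and outer map Phi = exp.
def f_via_superposition(x: float, y: float) -> float:
    inner = math.log(x) + math.log(y)   # inner univariate maps, combined by addition
    return math.exp(inner)              # outer univariate map

assert abs(f_via_superposition(3.0, 4.0) - 12.0) < 1e-9
```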

2. Core KAN Architecture and Model Specification

A KAN replaces all static, scalar (matrix) weights in a neural layer with learnable univariate functions on each edge. The canonical KAN layer takes an input vector $x_l\in\mathbb{R}^{n_l}$ and outputs

$$x_{l+1,j} = \sum_{i=1}^{n_l} \phi_{l,j,i}(x_{l,i}), \qquad j=1,\ldots,n_{l+1},$$

where each $\phi_{l,j,i}:\mathbb{R}\to\mathbb{R}$ is a learnable (often spline-parameterized) function specific to the edge from input $i$ to output $j$ (Liu et al., 30 Apr 2024, Basina et al., 15 Nov 2024, Emelianova et al., 7 Aug 2025).

A typical parameterization for each univariate edge function is the residual-spline form $\phi(x) = w_b\,b(x) + w_s \sum_{m=1}^{g} c_m B_m^{(k)}(x)$, where $b(x)$ is a fixed basis function (e.g., $\mathrm{SiLU}(x)$ or linear), $B_m^{(k)}(x)$ are B-spline basis functions of degree $k$ over $g$ intervals, $c_m$ are trainable coefficients, and $w_b$, $w_s$ are trainable mixing weights (Liu et al., 30 Apr 2024, Kiamari et al., 10 Jun 2024). During backpropagation, all basis coefficients and mixing weights are updated end-to-end.
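A minimal PyTorch sketch of this edgewise parameterization is given below; to keep the code short it substitutes degree-1 (piecewise-linear) B-splines on a shared uniform grid for the general degree-$k$ case, and the class name, grid settings, and initialization are illustrative assumptions rather than a reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KANLayer(nn.Module):
    """Simplified KAN layer: one learnable univariate function per edge,
    phi(x) = w_b * SiLU(x) + w_s * sum_m c_m B_m(x), with degree-1
    (piecewise-linear) B-splines on a shared uniform grid."""
    def __init__(self, in_dim, out_dim, grid_size=8, grid_range=(-2.0, 2.0)):
        super().__init__()
        knots = torch.linspace(grid_range[0], grid_range[1], grid_size)
        self.register_buffer("knots", knots)            # shared knot grid (simplifying assumption)
        self.h = (grid_range[1] - grid_range[0]) / (grid_size - 1)
        # Trainable spline coefficients c_m and mixing weights w_b, w_s per edge (j, i).
        self.coef = nn.Parameter(0.1 * torch.randn(out_dim, in_dim, grid_size))
        self.w_b = nn.Parameter(torch.ones(out_dim, in_dim))
        self.w_s = nn.Parameter(torch.ones(out_dim, in_dim))

    def forward(self, x):                               # x: (batch, in_dim)
        # Degree-1 B-spline (hat-function) basis evaluated at every input coordinate.
        dist = (x.unsqueeze(-1) - self.knots) / self.h  # (batch, in_dim, grid_size)
        basis = torch.clamp(1.0 - dist.abs(), min=0.0)
        # Spline term: sum over inputs i and basis functions m of w_s * c_m * B_m(x_i).
        spline = torch.einsum("big,oig->bo", basis, self.w_s.unsqueeze(-1) * self.coef)
        base = F.silu(x) @ self.w_b.t()                 # residual base term w_b * SiLU(x)
        return base + spline                            # (batch, out_dim)
```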

The network may be shallow, as in the literal Kolmogorov–Arnold decomposition (two layers with widths $[n, 2n+1, 1]$), or deep with multiple stacked layers, each preserving the edgewise, local univariate functional structure.
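Continuing the sketch above, the literal Kolmogorov–Arnold shape and a deeper stack can be assembled as follows (widths and depths are illustrative):

```python
n = 4  # input dimension (illustrative)
# Literal Kolmogorov–Arnold decomposition: widths [n, 2n + 1, 1].
shallow_kan = nn.Sequential(KANLayer(n, 2 * n + 1), KANLayer(2 * n + 1, 1))
# Deeper variant: every layer keeps the edgewise univariate structure.
deep_kan = nn.Sequential(KANLayer(n, 16), KANLayer(16, 16), KANLayer(16, 1))
y = deep_kan(torch.randn(32, n))   # output shape: (32, 1)
```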

KANs contrast with MLPs and GCNs, where edge contributions are fixed scalars and nonlinearity arises solely from nodewise activations. Here, all nonlinearity, locality, and expressivity reside on the edges (Basina et al., 15 Nov 2024).

3. Basis Function Choices and Computational Strategies

KAN flexibility is fundamentally controlled via the choice of univariate basis for edge activations. Prominent bases include:

  • B-splines: The standard, locally-supported, piecewise-polynomial choice affording high-order smoothness and local adaptivity (Liu et al., 30 Apr 2024, Basina et al., 15 Nov 2024).
  • Gaussian RBFs: FastKAN replaces third-order B-splines with scaled Gaussians, resulting in RBF networks that have similar expressivity but dramatically improved evaluation/transformation efficiency on GPU hardware (Li, 10 May 2024).
  • Fast analytical bases: Recent work explores combinations of ReLU, $\sin$, $\cos$, and $\arctan$, which are directly optimized in deep learning frameworks and compatible with tensor-core evaluation, leading to 2–5× speed-ups per evaluation with minimal loss in accuracy on standard benchmarks (Ta et al., 16 Aug 2025).
  • Chebyshev, Jacobi, Fourier, and wavelet bases: Support efficient global approximations in specialized domains (e.g., periodic functions, spectral operator learning).

The choice of basis ensemble and the way bases are aggregated (sum vs. product) can dramatically alter both accuracy and training/inference time (Ta et al., 16 Aug 2025). B-splines provide the most interpretable, local expansions, while analytic bases optimize for computational throughput (Noorizadegan et al., 28 Oct 2025).
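To make the basis swap concrete, the sketch below replaces the piecewise-linear basis of the illustrative KANLayer above with scaled Gaussians in the spirit of FastKAN; it is an approximation for exposition, not the FastKAN implementation itself.

```python
class RBFKANLayer(KANLayer):
    """Gaussian-RBF variant of the earlier sketch (FastKAN-inspired):
    the piecewise-linear basis is replaced by Gaussians on the same grid."""
    def forward(self, x):                               # x: (batch, in_dim)
        dist = (x.unsqueeze(-1) - self.knots) / self.h  # (batch, in_dim, grid_size)
        basis = torch.exp(-dist.pow(2))                 # Gaussian RBF basis
        spline = torch.einsum("big,oig->bo", basis, self.w_s.unsqueeze(-1) * self.coef)
        return F.silu(x) @ self.w_b.t() + spline
```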

4. Regularization, Training Dynamics, and Theoretical Guarantees

KANs support a variety of regularization techniques tailored for structured, edge-wise parameterizations:

  • $\ell_1$ or entropy penalties on spline coefficients drive sparsity and contribution diversity across the edge basis functions (Liu et al., 30 Apr 2024, Bodner et al., 19 Jun 2024); a sketch of such a penalty follows this list.
  • Ridge/L2 on all base weights for stability.
  • Grid extension mechanisms and batch/layer normalization to accommodate out-of-domain or nonstationary input statistics (Bodner et al., 19 Jun 2024).
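A minimal sketch of such a sparsity penalty, written against the illustrative KANLayer above (the exact functional form varies across the cited papers):

```python
def kan_sparsity_penalty(layer, lam_l1=1e-3, lam_entropy=1e-3):
    """Illustrative L1 + entropy regularizer on per-edge spline coefficients.
    The L1 term drives coefficients toward zero; the entropy term encourages
    a small number of edges to carry most of each layer's contribution."""
    edge_mag = layer.coef.abs().mean(dim=-1)            # (out_dim, in_dim) per-edge magnitude
    l1_term = edge_mag.sum()
    p = edge_mag / (edge_mag.sum() + 1e-8)              # normalize to a distribution over edges
    entropy_term = -(p * (p + 1e-8).log()).sum()
    return lam_l1 * l1_term + lam_entropy * entropy_term
```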

Empirically, KANs exhibit less stable training dynamics than MLPs, with higher variance in gradient norms and test accuracy unless initialization and optimizer protocols are tuned carefully (e.g., Kaiming-normal initialization, low learning rates, Adam with warm-up, moderate dropout, residual connections) (Sohail, 8 Nov 2024). Beyond four layers, KANs may overfit and exhibit degraded stability unless architectural or training refinements are introduced.
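The snippet below sketches part of this recommended training protocol for the illustrative layers above; all hyperparameter values are assumptions chosen for exposition, not prescriptions from the cited study.

```python
# Illustrative training setup: Kaiming-normal initialization, moderate dropout,
# Adam with a low learning rate and a linear warm-up schedule.
model = nn.Sequential(KANLayer(8, 32), nn.Dropout(0.1), KANLayer(32, 1))
for p in model.parameters():
    if p.dim() >= 2:                        # spline coefficients and mixing weights
        nn.init.kaiming_normal_(p)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1, total_iters=500)
```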

Theoretical generalization bounds for KANs have been established for both basis-function and low-rank RKHS parameterizations. In contrast to conventional VC-dimension arguments, the dominant dependencies are on $\ell_1$ norms of coefficient matrices and Lipschitz constants, with only logarithmic dependence on layer widths or basis-function counts. This confirms the practical possibility of wide KANs without incurring capacity explosion, provided norms and operator smoothness are controlled (Zhang et al., 10 Oct 2024).

For functions in Besov spaces, KANs with sufficient residual layers and variable-smoothness splines achieve optimal $N^{-(s-\alpha)/d}$ rates in weaker Besov norms, and sample complexity bounds that are dimension-robust under high smoothness (Kratsios et al., 21 Apr 2025).

5. Architectural Variants and Integration with Other Neural Models

Numerous architectural innovations have extended the reach of KANs:

  • Temporal-KAN: Incorporates explicit time-dependence for time-series forecasting.
  • PDE-KAN: Utilizes physics-informed loss design to enforce residual and boundary constraints, often employing augmented Lagrangian or residual-based attention mechanisms for robustness in scientific computing (Noorizadegan et al., 28 Oct 2025).
  • Graph KAN (GKAN): Applies KAN principles to graphs, replacing GCN scalar weight matrices with learnable univariate edgewise functions, providing improved accuracy and parameter efficiency for node classification and graph inference tasks (Kiamari et al., 10 Jun 2024); a minimal sketch follows this list.
  • Convolutional KAN (CKAN): Embeds edgewise spline functions into convolutional layers for image recognition, achieving up to 50% reduction in parameter count compared to conventional CNNs while maintaining comparable test accuracy (Bodner et al., 19 Jun 2024).
  • Autoencoder and Operator Learning: KANs used as autoencoders yield more interpretable latent features and competitive or lower reconstruction error relative to standard CNN or MLP autoencoders (Moradi et al., 2 Oct 2024); for operator-learning tasks, hybrid KKAN architectures outperform both MLPs and original KANs (Toscano et al., 21 Dec 2024).
  • Hybrid and Multifidelity KANs: Integrating KAN with MLP inner blocks (KKAN), domain decomposition, or multifidelity training further extend expressivity and data efficiency, particularly in physical simulation or high-dimensional regression (Toscano et al., 21 Dec 2024, Howard et al., 18 Oct 2024).
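As a concrete illustration of the GKAN idea referenced in the list above, the sketch below substitutes the edgewise KANLayer from Section 2 for the weight matrix in a GCN-style propagation step; the dense-adjacency mean aggregation and class name are simplifying assumptions.

```python
class GKANLayer(nn.Module):
    """Sketch of a graph KAN layer: neighbourhood aggregation followed by an
    edgewise-KAN transform in place of the usual GCN weight matrix."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.kan = KANLayer(in_dim, out_dim)

    def forward(self, x, adj):                  # x: (num_nodes, in_dim), adj: dense (num_nodes, num_nodes)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        agg = (adj @ x) / deg                   # mean over neighbours
        return self.kan(agg)                    # per-node output: (num_nodes, out_dim)
```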

6. Empirical Results, Scaling, and Interpretability

Empirical evaluations of KANs across regression, classification, PDE solving, and graph tasks consistently demonstrate steep error decay with grid refinement, reduced parameter counts relative to MLP, CNN, or GCN baselines at comparable accuracy, and enhanced interpretability of the learned univariate edge functions, with the caveat that deeper KANs may require the regularization and initialization measures discussed above.

7. Challenges, Limitations, and Future Directions

While KANs offer multiple technical advances, challenges remain:

  • Computational overhead: Spline evaluation, regularization, and dynamic grid extension slow training relative to standard neural networks; limited GPU kernel support for nontrivial bases increases wall-clock times (Bodner et al., 19 Jun 2024, Emelianova et al., 7 Aug 2025).
  • Training stability: Deeper KANs can be unstable without specific architectural and initialization safeguards (Sohail, 8 Nov 2024).
  • Extension beyond medium-scale problems: Empirical studies have chiefly centered on toy, synthetic, or moderate-scale application datasets; further work is needed on large-scale, high-dimensional, or multimodal tasks (Ta et al., 16 Aug 2025, Noorizadegan et al., 28 Oct 2025).
  • Open theory: Deeper theoretical understanding is needed regarding finite-width regimes, mixed bases across layers, basis selection, compositional expressivity, and identifiability (Noorizadegan et al., 28 Oct 2025).
  • Ecosystem integration: Ongoing research targets seamless integration with convolutional, recurrent, or transformer-based systems, as well as the synthesis of efficient open-source libraries for practical deployment (Noorizadegan et al., 28 Oct 2025).

Continued work is motivated by the prospects of efficient symbolic regression, robust operator learning, data-efficient scientific modeling, and the unification of interpretability and expressiveness in deep neural architectures.
