Kolmogorov-Arnold Networks (KAN)

Updated 27 August 2025
  • KANs are neural architectures that decompose multivariate continuous functions into nested univariate transformations based on the Kolmogorov-Arnold theorem.
  • They replace fixed weights with learnable basis expansions (e.g., B-splines, Fourier series) to enhance model expressivity and control function properties.
  • KANs are applied in scientific computing, symbolic regression, and physics-informed tasks, delivering improved efficiency, interpretability, and robustness.

Kolmogorov-Arnold Networks (KANs) are a neural architecture class rooted in the Kolmogorov-Arnold representation theorem, which guarantees that any multivariate continuous function can be decomposed as a finite sum of nested univariate functions. Leveraging this theoretical foundation, KANs replace the conventional fixed-weight and fixed-activation structure of multilayer perceptrons (MLPs) with a composition of learnable univariate transformations on the network’s edges. Their design aims to enhance both model expressivity and interpretability, with applications spanning symbolic regression, operator learning, and scientific computing. KANs have recently emerged as a prominent alternative to MLPs for both data-driven and physics-informed tasks.

1. Theoretical Foundations and Representation

The core theoretical underpinning of KAN is the Kolmogorov-Arnold representation theorem, which asserts that any continuous function $f:\mathbb{R}^n \to \mathbb{R}$ can be written in the form

$$f(x_1, \ldots, x_n) = \sum_{k=1}^{2n+1} \varphi_k \left( \sum_{j=1}^n \psi_{k,j}(x_j) \right)$$

where the functions $\varphi_k$ and $\psi_{k,j}$ are continuous and univariate for each $k, j$ (Liu et al., 30 Apr 2024, Faroughi et al., 30 Jul 2025). KAN explicitly operationalizes this insight: in network architectures, each edge between nodes is replaced by a learnable univariate function, commonly a B-spline or another basis expansion, rather than a scalar weight. The neuron becomes a summation unit, aggregating the outcomes of these univariate transformations.
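As a concrete illustration of this structure, the short Python sketch below builds a multivariate function purely from univariate maps and addition, mirroring the formula above. The specific $\psi_{k,j}$ and $\varphi_k$ are arbitrary toy choices made for this sketch, not the (generally non-smooth) functions whose existence the theorem guarantees.

```python
import numpy as np

# Toy instantiation of the Kolmogorov-Arnold structure: a multivariate map
# built only from univariate functions and addition. The psi_{k,j} and phi_k
# below are arbitrary illustrative choices.

n = 3                    # input dimension
K = 2 * n + 1            # number of outer terms, as in the theorem

def psi(k, j, t):        # inner univariate functions psi_{k,j}
    return np.sin((k + 1) * t) + 0.1 * j * t

def phi(k, s):           # outer univariate functions phi_k
    return np.tanh(s / (k + 1))

def f(x):
    # f(x_1, ..., x_n) = sum_k phi_k( sum_j psi_{k,j}(x_j) )
    return sum(phi(k, sum(psi(k, j, x[j]) for j in range(n))) for k in range(K))

print(f(np.array([0.2, -1.0, 0.7])))
```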

This architecture contrasts with MLPs, which map the input through successive layers of matrix multiplications and a fixed nonlinearity $\sigma$ (e.g., ReLU), yielding $l_i(x) = \sigma(W_i x + b_i)$. In KANs, the entire transformation is parameterized functionally: $x_{L+1} = \Phi_L x_L$, where each $(j,i)$ entry of $\Phi_L$ is replaced with a learnable function $\varphi_{L,j,i}$ (Novkin et al., 19 Mar 2025).
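A minimal sketch of this contrast, with toy per-edge cubic polynomials standing in for trained splines (all names, shapes, and coefficient choices here are illustrative assumptions), is:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 3, 2
x = rng.normal(size=d_in)

# MLP layer: l(x) = sigma(W x + b), with a fixed nonlinearity sigma (here ReLU).
W, b = rng.normal(size=(d_out, d_in)), rng.normal(size=d_out)
mlp_out = np.maximum(W @ x + b, 0.0)

# KAN layer: out_i = sum_j phi_{i,j}(x_j), one learnable univariate function
# per edge. A toy cubic per edge stands in for a trainable spline.
coeffs = rng.normal(size=(d_out, d_in, 4))          # per-edge cubic coefficients
powers = np.stack([x**0, x, x**2, x**3], axis=-1)   # (d_in, 4) basis values
kan_out = np.einsum("oib,ib->o", coeffs, powers)    # sum over inputs and basis

print(mlp_out, kan_out)
```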

By parameterizing activations on edges and employing basis expansions (e.g., B-splines, Chebyshev polynomials, Fourier series), KANs endow the network with universal approximation ability and allow direct control over both local and global function properties (Liu et al., 30 Apr 2024, Ji et al., 13 Jul 2024, Warin, 4 Oct 2024, Toscano et al., 21 Dec 2024, Krzywda et al., 10 Jan 2025, Kratsios et al., 21 Apr 2025).

2. Architectural Design and Variants

In canonical KANs, each “weight” is replaced by a one-dimensional, learnable function, typically represented as a linear combination of basis functions: $\phi(x) = \sum_{i} c_i B_i(x)$, with $\{B_i\}$ a basis such as B-splines or Chebyshev polynomials and $c_i$ trainable coefficients. In Fourier KANs (FKANs), these are replaced with sine and cosine expansions (Novkin et al., 19 Mar 2025), while FC-KAN (Ta et al., 3 Sep 2024) and DeepOKAN (Faroughi et al., 30 Jul 2025) combine different basis families (e.g., B-splines, wavelets, RBFs) using element-wise or quadratic interactions.
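The PyTorch sketch below shows one way such an edge parameterization can be made trainable, using a Fourier basis in the spirit of FKANs. The class name, initialization scale, and integer frequency grid are assumptions for illustration, not the reference implementation.

```python
import torch
import torch.nn as nn

class FourierKANLayer(nn.Module):
    """Sketch of a KAN layer with Fourier-basis edge functions. Each edge
    (j, i) carries phi_{i,j}(x_j) = sum_m a_m cos(m x_j) + b_m sin(m x_j),
    and each output unit sums its incoming edges."""

    def __init__(self, d_in: int, d_out: int, n_freq: int = 5):
        super().__init__()
        self.register_buffer("freqs", torch.arange(1, n_freq + 1).float())
        # Trainable coefficients: (d_out, d_in, n_freq) for cosine and sine parts.
        self.a = nn.Parameter(torch.randn(d_out, d_in, n_freq) * 0.1)
        self.b = nn.Parameter(torch.randn(d_out, d_in, n_freq) * 0.1)
        self.bias = nn.Parameter(torch.zeros(d_out))

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (batch, d_in)
        arg = x.unsqueeze(-1) * self.freqs                  # (batch, d_in, n_freq)
        cos, sin = torch.cos(arg), torch.sin(arg)
        # Sum over input dimension j and frequency m for each output unit i.
        out = torch.einsum("oif,bif->bo", self.a, cos) \
            + torch.einsum("oif,bif->bo", self.b, sin)
        return out + self.bias

# Usage: layers stack like nn.Linear blocks.
model = nn.Sequential(FourierKANLayer(4, 16), FourierKANLayer(16, 1))
y = model(torch.randn(8, 4))
```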

The architectural flexibility of KANs extends to various forms:

  • Activation functions can be further regularized for improved robustness, e.g., via smoothness penalties (penalizing $\int (d^2 S(x)/dx^2)^2 \, dx$ for a spline $S(x)$) and stochastic “Segment Deactivation”, which linearizes a randomly chosen spline segment during training (Cang et al., 11 Nov 2024); a minimal sketch of both appears after this list.
  • KANs can be structured either shallowly (few layers, each capturing complex univariate structure) or in deeper, modular hierarchies to capture compositional structure and richer function classes (Liu et al., 30 Apr 2024, Ji et al., 13 Jul 2024, Toscano et al., 21 Dec 2024).
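The sketch below approximates both regularizers: a grid-based discretization of the smoothness integral and a simplified, coefficient-level version of segment deactivation. Neither is taken from the cited implementation; both are stated assumptions for illustration, and the smoothness penalty assumes a uniform grid.

```python
import torch

def smoothness_penalty(edge_fn, grid: torch.Tensor) -> torch.Tensor:
    """Discrete stand-in for the integral of (S''(x))^2 dx: evaluate the
    learned univariate function on a uniform grid and penalize squared
    second differences. Spline-based KANs could compute this analytically
    from the spline coefficients instead."""
    y = edge_fn(grid)
    h = grid[1] - grid[0]
    second_diff = (y[2:] - 2 * y[1:-1] + y[:-2]) / h**2
    return (second_diff ** 2).sum() * h

def segment_deactivation(coeffs: torch.Tensor, p: float = 0.1) -> torch.Tensor:
    """Toy version of stochastic segment deactivation: with probability p,
    replace a randomly chosen run of spline coefficients with a straight-line
    interpolation between its endpoints during training."""
    if torch.rand(()) < p and coeffs.numel() > 3:
        i, j = sorted(torch.randint(0, coeffs.numel(), (2,)).tolist())
        if j - i >= 2:
            coeffs = coeffs.clone()
            coeffs[i:j + 1] = torch.linspace(coeffs[i].item(), coeffs[j].item(), j - i + 1)
    return coeffs

# Example usage with a fixed function standing in for a learned spline.
grid = torch.linspace(-1.0, 1.0, 101)
penalty = smoothness_penalty(torch.sin, grid)
```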

3. Learning, Generalization, and Robustness

KANs exhibit favorable theoretical and practical generalization properties. For spline- or RKHS-based activations, covering-number arguments yield generalization bounds that scale polynomially with the $l_1$ norms of the expansion coefficients and with the product of the Lipschitz constants of the functions in each layer, but only logarithmically with the number of nodes or basis functions. This is in contrast to MLPs, where the parameter count enters the bounds more directly (Zhang et al., 10 Oct 2024).

Empirical studies confirm that KANs often achieve lower excess loss and improved parameter efficiency compared to MLPs on both synthetic and real-world data (MNIST, CIFAR-10) (Zhang et al., 10 Oct 2024, Faroughi et al., 30 Jul 2025). Residual KANs achieve dimension-free sample complexity for learning Besov-regular functions; that is, under appropriate smoothness, the number of required samples does not scale exponentially with input dimension (Kratsios et al., 21 Apr 2025).

Robustness is also improved: KANs and hybrid KAN/MLP architectures attain lower attack success rates under PGD-type adversarial attacks, attributed to lower Lipschitz constants and the local support of spline functions. Ablation studies demonstrate that simpler spline grid choices and the presence of a strong “base” (e.g., SiLU) component are critical for both accuracy and adversarial resilience (Dong et al., 14 Aug 2024).

4. Applications Across Domains

KANs have been validated across a diverse array of tasks, including symbolic regression, operator learning, physics-informed modeling, scientific computing, and standard supervised benchmarks such as MNIST and CIFAR-10 (see Sections 3 and 6 for representative results).

5. Computational Aspects and Scalability

KANs, while expressive, historically incur greater computational costs due to the need to evaluate many spline or basis expansions per data point. Conventional implementations suffer from the sequential nature of B-spline evaluation (Cox-de Boor recursion), especially for high spline degrees.

MatrixKAN resolves this bottleneck by recasting all B-spline evaluations as precomputed matrix multiplications, which are highly parallelizable on modern compute hardware. This approach reduces the per-layer computational complexity in the spline degree $k$ from $O(Lk)$ to $O(L)$, achieving speedups of up to $40\times$ for large $k$ or large datasets (Coffman et al., 11 Feb 2025).
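To illustrate the general idea of matrix-form spline evaluation (restricted here to the uniform cubic case; this is a simplified stand-in, not MatrixKAN's actual implementation), one can precompute the standard cubic B-spline basis matrix and evaluate all points with batched matrix products instead of the Cox-de Boor recursion:

```python
import numpy as np

# Basis matrix M for uniform cubic B-splines: S(t) = [t^3 t^2 t 1] @ M @ c_local.
M = (1.0 / 6.0) * np.array([
    [-1.0,  3.0, -3.0, 1.0],
    [ 3.0, -6.0,  3.0, 0.0],
    [-3.0,  0.0,  3.0, 0.0],
    [ 1.0,  4.0,  1.0, 0.0],
])

def eval_uniform_cubic_spline(coeffs: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Evaluate a uniform cubic B-spline with control coefficients `coeffs`
    at points x in [0, 1] using only matrix products (no recursion)."""
    n_seg = len(coeffs) - 3                                  # number of segments
    seg = np.minimum((x * n_seg).astype(int), n_seg - 1)     # segment index per point
    t = x * n_seg - seg                                      # local coordinate in [0, 1)
    T = np.stack([t**3, t**2, t, np.ones_like(t)], axis=-1)  # (N, 4) power basis
    local = coeffs[seg[:, None] + np.arange(4)]              # (N, 4) local coefficients
    return np.einsum("nk,kj,nj->n", T, M, local)

coeffs = np.random.default_rng(0).normal(size=10)
print(eval_uniform_cubic_spline(coeffs, np.linspace(0, 1, 5)))
```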

KANs can be integrated into existing deep learning frameworks, but they require careful tuning of hyperparameters (basis type, grid size, spline or Fourier degree) and careful regularization to avoid overfitting, especially in noisy, high-dimensional, or data-sparse settings (Ji et al., 13 Jul 2024, Cang et al., 11 Nov 2024, Faroughi et al., 30 Jul 2025).

Recent work addresses architectural scalability by enabling KANs to select the number of basis functions dynamically (InfinityKAN), using variational inference and interpolated weights to grow or shrink model complexity during training (Alesiani et al., 3 Jul 2025).

6. Interpretability, Visualization, and Symbolic Regression

A hallmark of KANs is the transparent mapping of each edge’s transformation: after training, the learned univariate functions can be visualized, pruned (via $L_1$ or entropy penalties), or “symbolified” (e.g., mapped to canonical functions such as $\sin$, $\exp$, $\log$) (Liu et al., 30 Apr 2024, Ji et al., 13 Jul 2024, Novkin et al., 19 Mar 2025). This grants KANs natural support for scientific interpretation: mathematicians and domain scientists may “inspect” which operations are actually being applied to each input dimension.

Techniques such as interactive “symbolic snapping” and iterative fixation of the least-accurate functions into closed-form expressions are proposed to incrementally convert networks into fully symbolic models, without prohibitive loss of accuracy (Novkin et al., 19 Mar 2025).
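As an illustration of this workflow (not the cited tool's API), the sketch below fits a trained univariate edge function against a small library of candidate closed forms wrapped in an affine transform $a\,g(bx + c) + d$ and keeps the best match by $R^2$. The candidate set and function names are hypothetical choices for this sketch.

```python
import numpy as np
from scipy.optimize import curve_fit

CANDIDATES = {"sin": np.sin, "exp": np.exp, "tanh": np.tanh, "x^2": np.square}

def snap_to_symbolic(edge_fn, x: np.ndarray):
    """Return the best-fitting (name, params, r2) among candidate closed forms."""
    y = edge_fn(x)
    best = None
    for name, g in CANDIDATES.items():
        def model(x, a, b, c, d, g=g):       # affine wrapper around candidate g
            return a * g(b * x + c) + d
        try:
            params, _ = curve_fit(model, x, y, p0=[1.0, 1.0, 0.0, 0.0], maxfev=5000)
        except RuntimeError:                  # candidate failed to converge
            continue
        resid = y - model(x, *params)
        r2 = 1.0 - resid.var() / y.var()
        if best is None or r2 > best[2]:
            best = (name, params, r2)
    return best

# Example: "snap" a function that is secretly a scaled sine.
x = np.linspace(-2, 2, 200)
print(snap_to_symbolic(lambda t: 0.5 * np.sin(2.0 * t) + 0.3, x))
```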

This interpretability stands in contrast to the black-box nature of classical MLPs, and is of direct importance in scientific modeling, engineering diagnostics, and computational biomedicine (Faroughi et al., 30 Jul 2025, Ji et al., 13 Jul 2024, Toscano et al., 21 Dec 2024).

7. Future Directions and Open Challenges

Despite strong empirical and theoretical results, several challenges remain:

  • Computational cost: The high memory and compute demand per layer still limits practical scaling, particularly for high-dimensional problems or large basis expansions.
  • Hyperparameter sensitivity: The accuracy and convergence of KANs are sensitive to basis selection, grid resolution, and architecture width/depth; automating or adaptively tuning these remains a priority (Ji et al., 13 Jul 2024, Alesiani et al., 3 Jul 2025, Faroughi et al., 30 Jul 2025).
  • Integration with mainstream frameworks: Many libraries are optimized for dense matrix operations; KANs' modular and separable structure requires new engineering for efficient GPU/TPU usage.
  • Theory: While generalization and approximation results are improving, there remain open questions about convergence, expressivity, NTK dynamics, and generalization in deep KANs across arbitrary architectures (Zhang et al., 10 Oct 2024, Kratsios et al., 21 Apr 2025).
  • Robustness and regularization: Effective techniques (such as smoothness penalties or deactivation regularization) are critical for noisy data and adversarial contexts (Cang et al., 11 Nov 2024, Dong et al., 14 Aug 2024).
  • Application breadth: Ongoing work seeks to expand use cases further into operator learning, real-time scientific computing, and quantum information, with promising initial results but many unsolved scale and deployment challenges.

Looking ahead, research directions include developing hardware-aware modules, improving theoretical underpinnings (e.g., NTK analyses, approximation guarantees), automating function/basis selection, hybridizing with Fourier or convolutional operators, and advancing distributed or parallelized implementations (Coffman et al., 11 Feb 2025, Alesiani et al., 3 Jul 2025, Faroughi et al., 30 Jul 2025).


KANs thus represent a theoretically principled and practically flexible architecture bridging universal function approximation, interpretability, and efficiency across modern scientific and machine learning practice. Their ongoing evolution is poised to address key limitations of traditional MLPs and to catalyze new modes of interpretable, data-efficient deep learning in scientific and engineering domains.