Kolmogorov–Arnold Networks (KANs)

Updated 22 November 2025
  • Kolmogorov–Arnold Networks (KANs) are neural architectures that replace scalar weights with learnable univariate spline functions to represent multivariate functions via additive aggregation.
  • By decomposing complex multivariate functions into O(n²) learnable univariate components, KANs offer parameter efficiency and built-in interpretability.
  • Empirical studies show that KANs can outperform or match conventional MLPs and CNNs across tasks like time series forecasting, computer vision, and scientific modeling.

Kolmogorov–Arnold Networks (KANs) are a class of neural architectures that replace conventional scalar weights with learnable univariate functions on each network edge, typically implemented as spline basis expansions. This design is rooted in the Kolmogorov–Arnold representation theorem, which guarantees that any continuous multivariate function can be exactly represented as a finite sum of compositions of univariate functions and additions. The KAN architecture enables parameter-efficient, universally approximating models with strong intrinsic interpretability and has been empirically demonstrated to outperform or match conventional multilayer perceptrons (MLPs) and convolutional neural networks (CNNs) on a variety of scientific, engineering, and machine learning tasks (Liu et al., 30 Apr 2024, Noorizadegan et al., 28 Oct 2025, Vaca-Rubio et al., 14 May 2024, Liu et al., 16 Jun 2024, Krzywda et al., 10 Jan 2025).

1. Theoretical Foundations and Representation Theorem

At the core of KANs is the Kolmogorov–Arnold superposition theorem. For any continuous $f \colon [0,1]^n \to \mathbb{R}$, there exist continuous univariate inner functions $\Psi_{p,q}$ and outer functions $\Phi_q$ such that

$$f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left(\sum_{p=1}^{n} \Psi_{p,q}(x_p)\right).$$

This decomposition reduces high-dimensional function learning to the training of $\mathcal{O}(n^2)$ univariate mappings and additive aggregation. In practice, KAN implementations further expand these univariate functions as B-splines or other bases, enabling efficient and controlled universal approximation (Liu et al., 30 Apr 2024, Noorizadegan et al., 28 Oct 2025).
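
For example, with $n = 2$ the theorem guarantees a representation using $2n + 1 = 5$ outer functions and $n(2n + 1) = 10$ inner functions,

$$f(x_1, x_2) = \sum_{q=0}^{4} \Phi_q\big(\Psi_{1,q}(x_1) + \Psi_{2,q}(x_2)\big),$$

so a two-variable function is expressed entirely through 15 univariate mappings and additions.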

2. Architecture and Parameterization

In KANs, every connection (edge) in the neural network graph is equipped with a learnable univariate function, $f_e(x)$, parameterized by a flexible basis such as cubic B-splines:

$$f_{e}(x) = \sum_{i=1}^{P} c_{e,i}\, B_{i}(x),$$

where $\{B_i\}_{i=1}^{P}$ are fixed B-spline basis functions on a uniform grid, and $c_{e,i}$ are learnable coefficients (Liu et al., 16 Jun 2024). At each node, incoming values are summed:

$$y_v = \sum_{e \to v} f_{e}(x_e),$$

with no further nonlinearity unless an explicit outer function (e.g., in certain KAN variants) is applied.
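
To make this edge-plus-sum structure concrete, the following is a minimal NumPy sketch of a single KAN layer, assuming a uniform knot grid extended beyond $[0,1]$ and random coefficients; it is an illustration of the parameterization above, not a reference implementation (libraries add grid adaptation, regularization, and pruning).

```python
# Minimal sketch of one KAN layer: a learnable cubic B-spline on every edge,
# plain summation at every node.
import numpy as np

def bspline_basis(x, knots, k=3):
    """All order-k B-spline basis functions at points x (Cox-de Boor recursion).
    Returns an array of shape (len(x), len(knots) - k - 1)."""
    x = np.asarray(x)[:, None]          # (N, 1)
    t = np.asarray(knots)[None, :]      # (1, n_knots)
    B = ((x >= t[:, :-1]) & (x < t[:, 1:])).astype(float)   # degree-0 indicators
    for d in range(1, k + 1):
        left = (x - t[:, :-(d + 1)]) / (t[:, d:-1] - t[:, :-(d + 1)]) * B[:, :-1]
        right = (t[:, d + 1:] - x) / (t[:, d + 1:] - t[:, 1:-d]) * B[:, 1:]
        B = left + right
    return B

class KANLayer:
    """y_j = sum_i f_ij(x_i), each f_ij a spline expansion on a shared grid."""
    def __init__(self, n_in, n_out, n_grid=8, k=3, seed=0):
        h = 1.0 / n_grid
        # Uniform knots extended k intervals beyond [0, 1] -> n_grid + k bases per edge.
        self.knots = np.arange(-k, n_grid + k + 1) * h
        self.k = k
        rng = np.random.default_rng(seed)
        # One coefficient vector per edge: shape (n_in, n_grid + k, n_out).
        self.coef = rng.normal(0.0, 0.1, size=(n_in, n_grid + k, n_out))

    def __call__(self, x):               # x: (N, n_in), assumed scaled to [0, 1]
        out = 0.0
        for i in range(x.shape[1]):
            B = bspline_basis(x[:, i], self.knots, self.k)   # (N, n_grid + k)
            out = out + B @ self.coef[i]                      # accumulate (N, n_out)
        return out

layer = KANLayer(n_in=4, n_out=8)
print(layer(np.random.default_rng(1).uniform(size=(32, 4))).shape)   # (32, 8)
```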

Typical KANs use spline grid sizes of $P = 5$–$12$ with cubic ($k = 3$) B-splines, though other bases such as Chebyshev and Jacobi polynomials, ReLU-power functions, and radial basis functions have been investigated (Noorizadegan et al., 28 Oct 2025).
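
The edge-plus-sum structure is agnostic to the basis; for instance, a Chebyshev edge function (assuming inputs rescaled to $[-1, 1]$) can be sketched as:

```python
# Chebyshev edge function f_e(x) = sum_i c_i T_i(x): an illustrative alternative
# to the B-spline parameterization; assumes x has been rescaled to [-1, 1].
import numpy as np

def cheby_edge(x, coef):
    basis = np.polynomial.chebyshev.chebvander(x, deg=len(coef) - 1)  # [T_0(x), ..., T_P(x)]
    return basis @ coef

print(cheby_edge(np.linspace(-1.0, 1.0, 5), np.array([0.0, 1.0, 0.5])))
```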

Compared to MLPs, the parameter count for a KAN layer with $n_{in}$ inputs, $n_{out}$ outputs, and $G$ basis elements per function is $\mathcal{O}(n_{in} n_{out} G)$, but effective model sizes are smaller due to increased expressivity per parameter and empirical parameter savings after pruning and basis selection (Liu et al., 30 Apr 2024, Krzywda et al., 10 Jan 2025).
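
As a concrete count: a layer with $n_{in} = 4$ inputs, $n_{out} = 16$ outputs, and $G = 8$ basis functions per edge carries $4 \times 16 \times 8 = 512$ spline coefficients, versus $4 \times 16 + 16 = 80$ weights and biases in a dense MLP layer of the same width; the trade-off is that each spline edge is far more expressive than a scalar weight, so fewer nodes, layers, or (after pruning) edges are typically required overall.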

3. Empirical Performance and Application Domains

KANs have been systematically benchmarked in diverse scientific and engineering tasks:

  • Time Series Analysis and Forecasting: KANs outperform same-depth MLPs in neural forecasting tasks (e.g., satellite traffic), achieving lower errors with up to threefold fewer parameters. Edgewise spline activations give KANs the ability to quickly adapt to local patterns in the data (Vaca-Rubio et al., 14 May 2024, Vaca-Rubio et al., 19 Oct 2025).
  • Computer Vision and Signal Processing: In image tasks such as metal surface defect classification and Fashion-MNIST, KAN-based models surpass CNN baselines in the accuracy–parameter efficiency tradeoff, converge faster, and fit small datasets more robustly. Extensions such as convolutional KANs further broaden applicability (Bodner et al., 19 Jun 2024, Krzywda et al., 10 Jan 2025, Drokin, 1 Jul 2024).
  • Feature Extraction for Sequential Data: For IMU-based human activity recognition, KAN-based feature extractors yield 1–5% higher Macro-F1 scores than CNNs while using 5–20× fewer parameters, with best results on complex multi-IMU datasets (Liu et al., 16 Jun 2024).
  • Scientific Modeling and Constitutive Learning: Input-convex KANs capture hyperelastic material laws, ensuring polyconvexity, interpretability, and robust finite element integration. Closed-form constitutive relations can be extracted via symbolic regression of spline activations (Thakolkaran et al., 7 Mar 2025).
  • Materials Science and Property Prediction: KANs provide explicit, symbolic surrogate models for high-dimensional structure–property mappings in thermoelectric materials design, enabling both accurate prediction and transparent reverse engineering (Fronzi et al., 3 Oct 2025).
  • Imbalanced Data and Intrusion Detection: KANs can outperform MLPs on raw, severely imbalanced classification data (higher F1 and balanced accuracy), though they are highly sensitive to classical resampling (SMOTE, Tomek) and focal loss techniques, which degrade KAN performance (Yadav et al., 18 Jul 2025). In IoT intrusion detection, KANs achieve perfect recall and unique symbolic interpretability relative to standard MLPs and tree-based ensemble methods (Emelianova et al., 7 Aug 2025).
  • Reinforcement Learning: KANs replace standard actor/critic MLPs in PPO, achieving comparable on-policy learning performance with two orders of magnitude fewer parameters, making them attractive for memory- and compute-constrained deployments (Kich et al., 9 Aug 2024).

4. Expressivity, Scaling Laws, and Theoretical Properties

The spline-based edge parametrization of KANs yields strong neural scaling laws. For $k$-th order smooth activations, the test RMSE scales as $\mathcal{O}(G^{-(k+1)})$ in the number of spline grid intervals $G$, independent of the input dimension. Empirical studies find that KANs achieve the fastest reported scaling exponent ($\alpha = k + 1$ for $k$-th order splines) (Liu et al., 30 Apr 2024). Minimax-optimal convergence rates $\mathcal{O}(n^{-2r/(2r+1)})$ are achieved for functions in Sobolev spaces of smoothness $r$ (Liu et al., 24 Sep 2025).
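
Concretely, for cubic splines ($k = 3$) this predicts test RMSE $\propto G^{-4}$: doubling the number of grid intervals should cut the error by roughly a factor of $2^{4} = 16$, provided the target function is sufficiently smooth and optimization reaches the approximation floor.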

Smoothness, domain priors, and symmetry can be imposed explicitly (e.g., input-convexity, structural knowledge, permutation equivariance) to further constrain KAN expressivity and improve data efficiency (Samadi et al., 18 May 2024, Elbaz et al., 29 Sep 2025). Notably, KANs achieve expressive power at finite width with $\mathcal{O}(n^2)$ univariate components, while interpretability is retained through transparent one-dimensional function visualization and symbolic regression (Fronzi et al., 3 Oct 2025, Liu et al., 16 Jun 2024).

5. Architectural Variants, Initialization Schemes, and Software

Several KAN variants have been developed for scientific and engineering tasks:

  • Multi-Exit KANs (ME-KANs): Attach prediction heads to each layer to provide deep supervision and automatic model parsimony selection. Multi-exit architectures consistently outperform single-exit KANs and reveal that many problems can be solved by shallower, more interpretable KANs (Bagrow et al., 3 Jun 2025).
  • Permutation Equivariant / Invariant KANs: Function sharing across group orbits implements exact equivariance/invariance for arbitrary permutation groups, matching parameter-sharing MLP expressivity while improving generalization in low-data domains (Elbaz et al., 29 Sep 2025); a minimal invariance sketch appears after this list.
  • Input-Convex and Polyconvex KANs: Enforce monotonicity and convexity of spline activations, producing physically admissible models for PDE-constrained learning and material law discovery (Thakolkaran et al., 7 Mar 2025).
  • Multifidelity and Physics-Informed KANs: Incorporate low-fidelity and physics-based priors via architectural composition, reducing high-fidelity sample needs and improving out-of-sample robustness (Howard et al., 18 Oct 2024).
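
To illustrate the function-sharing idea behind the permutation-invariant variant, the sketch below ties every input to one shared inner function and applies a shared outer function to the pooled sum; it is a minimal construction under those assumptions, not the architecture of the cited paper, and plain polynomials stand in for spline edge functions for brevity.

```python
# Permutation-invariant KAN-style block via function sharing: one inner function
# psi shared across all inputs, one outer function Phi applied to the pooled sum.
import numpy as np

def univariate(x, coef):
    """Shared univariate function, here an ordinary polynomial in x."""
    return np.polynomial.polynomial.polyval(x, coef)

def invariant_block(x, inner_coef, outer_coef):
    """y = Phi( sum_p psi(x_p) ); sharing psi across inputs gives exact invariance."""
    pooled = univariate(x, inner_coef).sum(axis=-1)   # (N,)
    return univariate(pooled, outer_coef)             # (N,)

rng = np.random.default_rng(0)
x = rng.uniform(size=(5, 6))
inner, outer = rng.normal(size=4), rng.normal(size=4)
# Reordering the input columns leaves the output unchanged.
print(np.allclose(invariant_block(x, inner, outer),
                  invariant_block(x[:, ::-1], inner, outer)))   # True
```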

Initialization strategies are critical: Glorot-inspired variance balancing and empirically tuned power-law initialization families outperform naive or LeCun-initialized splines, yielding better neural tangent kernel conditioning and faster convergence (Rigas et al., 3 Sep 2025).
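
A minimal reading of the variance-balancing idea is sketched below: shrink the initial coefficient scale with fan-in and basis count, Glorot-style. The specific rule is an illustrative assumption, not the tuned scheme of the cited work.

```python
# Glorot-style variance scaling for edge-spline coefficients (illustrative heuristic).
import numpy as np

def init_edge_coefficients(n_in, n_out, n_basis, gain=1.0, seed=0):
    # Scale the coefficient std down with fan-in and number of basis functions per edge.
    std = gain / np.sqrt(n_in * n_basis)
    return np.random.default_rng(seed).normal(0.0, std, size=(n_in, n_basis, n_out))

coef = init_edge_coefficients(n_in=16, n_out=32, n_basis=11)
print(round(float(coef.std()), 3))   # close to 1 / sqrt(16 * 11) ~= 0.075
```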

Efficient implementations use Gaussian RBFs as surrogates for B-splines (FastKAN) (Li, 10 May 2024), and open-source frameworks exist for diverse KAN architectures and bases, including PyKAN, TorchKAN, FastKAN, JAX-KAN, ChebyKAN, ReLU-KAN, and modular KAN-convolutional libraries (Noorizadegan et al., 28 Oct 2025, Drokin, 1 Jul 2024).
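
The RBF-surrogate idea behind FastKAN can be sketched in a few lines: replace the B-spline basis with Gaussian bumps centred on a uniform grid, avoiding the Cox–de Boor recursion entirely (a minimal sketch, not the FastKAN source).

```python
# Gaussian-RBF edge function: a cheap, smooth surrogate for a B-spline expansion.
import numpy as np

def rbf_edge(x, centers, coef, bandwidth):
    """f_e(x) = sum_i c_i * exp(-((x - g_i) / bandwidth)^2) with grid centers g_i."""
    basis = np.exp(-((x[:, None] - centers[None, :]) / bandwidth) ** 2)   # (N, n_basis)
    return basis @ coef                                                   # (N,)

centers = np.linspace(0.0, 1.0, 8)
coef = np.random.default_rng(0).normal(size=8)
y = rbf_edge(np.linspace(0.0, 1.0, 100), centers, coef, bandwidth=centers[1] - centers[0])
print(y.shape)   # (100,)
```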

6. Limitations, Practical Guidelines, and Open Challenges

KANs fill a distinct niche in scientific machine learning: parameter-efficient, interpretable, and compositional universal function approximators. Key limitations are (1) higher per-epoch computational cost (particularly for dense spline evaluation), (2) slower training relative to MLPs or CNNs, (3) incompatibility with generic data augmentation and resampling for imbalanced tasks, and (4) challenges in mixing KAN and conventional architectures; KAN+CNN hybrids currently underperform pure KAN feature extractors on time-series data (Liu et al., 16 Jun 2024, Yadav et al., 18 Jul 2025).

For practical use, cubic B-splines on a uniform grid with grid extension at boundaries are the default, but alternative bases (Chebyshev polynomials, RBF, wavelets) are available to match function smoothness and locality requirements. Adaptive grid, basis selection, and domain decomposition further boost practical efficiency. For large-scale or high-throughput inference, RBF and ReLU-power bases are recommended for implementation efficiency (Noorizadegan et al., 28 Oct 2025, Li, 10 May 2024).
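
As a usage illustration, the snippet below shows a typical small configuration under the PyKAN interface (cubic splines, uniform grid); argument and method names vary across PyKAN versions, so treat this as an assumed sketch rather than canonical usage.

```python
# Assumed PyKAN-style usage (API details differ across versions).
import torch
from kan import KAN, create_dataset

model = KAN(width=[2, 5, 1], grid=5, k=3, seed=0)   # 2 inputs -> 5 hidden -> 1 output, cubic splines
f = lambda x: torch.exp(torch.sin(torch.pi * x[:, [0]]) + x[:, [1]] ** 2)
dataset = create_dataset(f, n_var=2)                # dict with train/test inputs and labels
pred = model(dataset['train_input'])                # forward pass on the training inputs
# model.fit(dataset, opt="LBFGS", steps=20)         # training call; named `train` in older versions
```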

Further research challenges include: principled basis–problem matching, non-asymptotic generalization analyses, automatic basis/grid selection, hardware-adapted kernels for spline evaluation, and a rigorous theory of KAN interpretability and statistical identifiability.

7. Interpretability, Symbolic Regression, and Scientific Discovery

A salient property of KANs is their intrinsic interpretability. Each trainable edge function is a univariate mapping that can be visualized post-training and, if sufficiently sparse or smooth, identified with an analytic or symbolic form. This has enabled closed-form discovery in scientific applications, such as analytic constitutive laws for hyperelasticity (Thakolkaran et al., 7 Mar 2025), symbolic surrogates for thermoelectric properties (Fronzi et al., 3 Oct 2025), logic-extractable decision rules for intrusion detection (Emelianova et al., 7 Aug 2025), and automatic feature attribution in high-dimensional regressions (Liu et al., 30 Apr 2024).

KANs have demonstrated collaborative potential in AI+Science contexts. For instance, scientific users can prune and symbolically “snap” edge splines to known functional forms, iteratively refining the learned model towards physically meaningful representations (Liu et al., 30 Apr 2024, Fronzi et al., 3 Oct 2025). In computational biomedicine and complex materials science, such direct human-AI interaction promises data-efficient, audit-ready models beyond the reach of traditional black-box neural network architectures.
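
A minimal version of this snapping step can be sketched as follows: sample a trained edge spline, fit each candidate closed form by least squares, and keep the best match. Library implementations also fit affine transforms of the argument and use curated candidate sets; the candidate list below is only an example.

```python
# Snap a learned univariate edge function to the best-fitting symbolic candidate.
import numpy as np

def snap_to_symbolic(xs, ys, candidates):
    best = None
    for name, g in candidates.items():
        # Fit ys ~ a * g(xs) + b by ordinary least squares, then score with R^2.
        A = np.stack([g(xs), np.ones_like(xs)], axis=1)
        (a, b), *_ = np.linalg.lstsq(A, ys, rcond=None)
        r2 = 1.0 - np.sum((ys - (a * g(xs) + b)) ** 2) / np.sum((ys - ys.mean()) ** 2)
        if best is None or r2 > best[1]:
            best = (name, r2, a, b)
    return best

candidates = {"x^2": np.square, "sin": np.sin, "exp": np.exp}
xs = np.linspace(0.0, 1.0, 200)
ys = 1.7 * xs ** 2 + 0.3            # stand-in for a trained edge spline's samples
print(snap_to_symbolic(xs, ys, candidates))   # ('x^2', ~1.0, ~1.7, ~0.3)
```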

