Kolmogorov-Arnold Networks (KANs)
- KANs are a neural network class grounded in the Kolmogorov-Arnold representation theorem, featuring learnable spline-based functions on network edges.
- They reframe function approximation by replacing fixed neuron activations with adaptable univariate edge functions, boosting expressivity.
- KANs deliver superior scaling, accuracy, and parameter efficiency for applications like PDE solving, forecasting, and reinforcement learning.
Kolmogorov-Arnold Networks (KANs) are a neural network class inspired by the Kolmogorov-Arnold representation theorem; they are characterized by learnable, spline-based activation functions attached to network edges rather than the nodes. KANs provide an alternative to the widely used Multi-Layer Perceptrons (MLPs) by rethinking function approximation as compositions of univariate functions, resulting in architectures with enhanced interpretability, adaptivity, and, frequently, stronger approximation properties and parameter efficiency.
1. Foundations: Kolmogorov-Arnold Representation Theorem
The theoretical basis for KANs is the Kolmogorov-Arnold representation theorem, which states that any continuous multivariate function $f : [0,1]^n \to \mathbb{R}$ can be written as a finite superposition of continuous univariate functions:
$$ f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right), $$
where the $\Phi_q$ and $\phi_{q,p}$ are continuous univariate functions. This representation lays the groundwork for building neural architectures that approximate complex functions through compositions of learnable univariate operations, rather than through fixed, global nonlinear activations (Liu et al., 30 Apr 2024).
KANs leverage this theorem by assigning a distinct learnable univariate function, parameterized by spline coefficients, to each connection (edge) between nodes, in place of scalar weights. Thus, the core operation within a KAN layer is
$$ x_{l+1,j} = \sum_{i=1}^{n_l} \phi_{l,j,i}(x_{l,i}), $$
where each $\phi_{l,j,i}$ is a spline-parametrized function sitting on the edge from node $i$ of layer $l$ to node $j$ of layer $l+1$. This shifting of nonlinearity from nodes to edges enables a finer adaptation to data and underlies the improved expressivity and interpretability that characterize KANs.
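To make the edge-sum concrete, the following toy snippet evaluates a single 3-input, 2-output KAN-style layer in which each edge carries its own univariate function; the particular functions are hand-picked stand-ins for learned splines, purely for illustration.

```python
# Toy illustration of x_{l+1,j} = sum_i phi_{l,j,i}(x_{l,i}) with hand-fixed
# univariate edge functions (a real KAN would learn these as splines).
import numpy as np

x = np.array([0.3, -1.2, 0.7])                    # one sample with 3 inputs

# phi[j][i] is the function on the edge from input i to output j.
phi = [
    [np.sin, np.tanh, lambda t: t ** 2],          # edges into output 0
    [np.exp, np.abs, lambda t: 0.5 * t],          # edges into output 1
]

y = np.array([sum(phi[j][i](x[i]) for i in range(3)) for j in range(2)])
print(y)                                          # shape (2,): the layer's outputs
```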
2. Architectural Design and Implementation
Unlike conventional MLPs, which apply a fixed activation function to linear combinations at each neuron, KANs have no conventional weights; every edge is associated with a learnable activation, most commonly modeled with B-splines. The forward pass through a KAN layer is performed by applying each spline-based function to its respective input and summing across the relevant dimensions:
$$ \mathbf{x}_{l+1} = \Phi_l(\mathbf{x}_l), \qquad (\Phi_l)_{j,i} = \phi_{l,j,i}, $$
with $\Phi_l$ representing a function matrix whose entries are the per-edge univariate, learnable activations.
In practical implementations, splines are parameterized by their degree $k$ and grid resolution $G$, with grid extension techniques available to dynamically increase expressivity. The B-spline coefficients are learned via backpropagation, updating the function shapes and allowing the network to adapt to local data structure in a highly granular manner (Liu et al., 30 Apr 2024). This methodology bundles the roles of the linear transformation and the nonlinear mapping within a single operation, which is both computationally direct and fundamentally more interpretable than weight-based layers.
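A minimal PyTorch sketch of such a layer is given below; it assumes a uniform, shared knot grid, omits the residual (SiLU) branch and the grid-extension machinery of the reference implementation, and uses illustrative names (`KANLayer`, `bspline_basis`) rather than the pykan API.

```python
# Minimal sketch of a KAN layer with learnable B-spline edge activations.
# Illustrative toy, not the reference pykan code.
import torch
import torch.nn as nn


def bspline_basis(x, grid, k):
    """Evaluate all degree-k B-spline basis functions at x via Cox-de Boor.

    x:    (batch, in_dim) inputs
    grid: (in_dim, G + 2k + 1) knot vector per input dimension
    returns: (batch, in_dim, G + k) basis values
    """
    x = x.unsqueeze(-1)                                   # (batch, in_dim, 1)
    # Degree-0 (piecewise-constant) bases: indicator of each knot interval.
    bases = ((x >= grid[..., :-1]) & (x < grid[..., 1:])).to(x.dtype)
    for d in range(1, k + 1):
        left = (x - grid[..., : -(d + 1)]) / (grid[..., d:-1] - grid[..., : -(d + 1)])
        right = (grid[..., d + 1:] - x) / (grid[..., d + 1:] - grid[..., 1:-d])
        bases = left * bases[..., :-1] + right * bases[..., 1:]
    return bases


class KANLayer(nn.Module):
    def __init__(self, in_dim, out_dim, grid_size=5, degree=3, x_range=(-1.0, 1.0)):
        super().__init__()
        self.k = degree
        h = (x_range[1] - x_range[0]) / grid_size
        # Uniform knots extended by `degree` on each side, shared across inputs.
        knots = torch.arange(-degree, grid_size + degree + 1, dtype=torch.float32) * h + x_range[0]
        self.register_buffer("grid", knots.expand(in_dim, -1).contiguous())
        # One coefficient vector per edge (in_dim x out_dim edges).
        self.coef = nn.Parameter(0.1 * torch.randn(in_dim, out_dim, grid_size + degree))

    def forward(self, x):                                  # x: (batch, in_dim)
        basis = bspline_basis(x, self.grid, self.k)        # (batch, in_dim, G+k)
        # phi_{j,i}(x_i) = sum_b coef[i, j, b] * basis[:, i, b]; then sum over i.
        return torch.einsum("bif,iof->bo", basis, self.coef)
```

Stacking such layers, e.g. `nn.Sequential(KANLayer(2, 5), KANLayer(5, 1))`, yields a small KAN whose spline coefficients are trained by ordinary backpropagation.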
More recent work demonstrates that the third-order B-spline bases in KANs can be closely approximated by Gaussian radial basis functions (RBFs), yielding "FastKAN" models that run roughly three times faster in common cases (Li, 10 May 2024).
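A hedged sketch of that substitution follows: each edge's activation becomes a learnable combination of Gaussian bumps on fixed centers. The class name, the center placement, and the bandwidth heuristic are assumptions for illustration, not the FastKAN reference code.

```python
# FastKAN-style layer: replace per-edge B-spline bases with Gaussian RBFs.
import torch
import torch.nn as nn


class RBFKANLayer(nn.Module):
    def __init__(self, in_dim, out_dim, num_centers=8, x_range=(-1.0, 1.0)):
        super().__init__()
        self.register_buffer("centers", torch.linspace(*x_range, num_centers))
        self.gamma = (num_centers - 1) / (x_range[1] - x_range[0])   # ~1 / bump width
        self.coef = nn.Parameter(0.1 * torch.randn(in_dim, out_dim, num_centers))

    def forward(self, x):                                   # x: (batch, in_dim)
        # Gaussian RBF features per input coordinate: (batch, in_dim, num_centers)
        feats = torch.exp(-(self.gamma * (x.unsqueeze(-1) - self.centers)) ** 2)
        return torch.einsum("bif,iof->bo", feats, self.coef)
```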
3. Expressivity, Accuracy, and Scaling Laws
KANs are not only universal approximators (by virtue of their connection with the Kolmogorov-Arnold theorem), but their architectural design also imparts superior scaling behavior for function approximation. When the target function has compositional or univariate-dominant structure, the error for a KAN with spline basis of degree $k$ on a grid of size $G$ decreases at a rate proportional to $G^{-(k+1)}$. With cubic splines ($k = 3$), this yields fourth-power scaling, substantially outpacing the typical convergence rates of MLPs (Liu et al., 30 Apr 2024).
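As a back-of-the-envelope illustration (assuming the idealized rate holds exactly and ignoring optimization error), doubling the grid resolution of a cubic-spline KAN should reduce the approximation error by roughly a factor of sixteen:
$$ \frac{\varepsilon(2G)}{\varepsilon(G)} \approx \frac{(2G)^{-(k+1)}}{G^{-(k+1)}} = 2^{-(k+1)} = 2^{-4} = \frac{1}{16}. $$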
Empirical results corroborate these theoretical gains. In tasks ranging from synthetic regression and special-function approximation to PDE solving (e.g., the Poisson equation using PINN techniques), small KANs have achieved accuracy and parameter efficiency surpassing MLPs by orders of magnitude. For example, a width-10, two-layer KAN has been reported to be roughly 100 times more accurate and 100 times more parameter-efficient than a considerably larger MLP baseline (Liu et al., 30 Apr 2024).
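To illustrate how a KAN slots into a PINN-style workflow, the sketch below trains a tiny two-layer KAN-style model (reusing the RBF-edge construction from Section 2, redefined here so the snippet is standalone) on the 1D Poisson problem $-u''(x) = \pi^2 \sin(\pi x)$ with $u(0) = u(1) = 0$; the layer sizes, optimizer settings, and collocation scheme are illustrative assumptions, not the configuration of the cited experiments.

```python
# PINN-style training of a small KAN-like model on a 1D Poisson problem
# whose exact solution is u(x) = sin(pi x). Illustrative sketch only.
import math
import torch
import torch.nn as nn


class RBFEdgeLayer(nn.Module):
    """Each edge applies a learnable combination of Gaussian bumps."""
    def __init__(self, in_dim, out_dim, centers=16):
        super().__init__()
        self.register_buffer("c", torch.linspace(-1.5, 1.5, centers))
        self.coef = nn.Parameter(0.1 * torch.randn(in_dim, out_dim, centers))

    def forward(self, x):
        feats = torch.exp(-4.0 * (x.unsqueeze(-1) - self.c) ** 2)
        return torch.einsum("bif,iof->bo", feats, self.coef)


model = nn.Sequential(RBFEdgeLayer(1, 8), RBFEdgeLayer(8, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(2000):
    x = torch.rand(128, 1, requires_grad=True)             # interior collocation points
    u = model(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    residual = -d2u - math.pi ** 2 * torch.sin(math.pi * x)
    xb = torch.tensor([[0.0], [1.0]])                       # boundary points
    loss = (residual ** 2).mean() + 10.0 * (model(xb) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```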
KANs also demonstrate strong parameter efficiency in time series forecasting and reinforcement learning; for example, KANs applied within Proximal Policy Optimization (PPO) have achieved performance matching or exceeding that of MLPs with an order of magnitude fewer parameters (Kich et al., 9 Aug 2024). In autoencoding and transfer learning, KANs have matched or modestly improved upon established baselines, converging in fewer epochs and exhibiting improved generalization in cases where the feature-label mapping is strongly nonlinear (Moradi et al., 2 Oct 2024, Shen et al., 12 Sep 2024).
4. Interpretability and Symbolic Recovery
A distinguishing feature of KANs is the direct interpretability of their learned functions. Since each edge is associated with a visualizable activation, KANs allow for inspection of how input variables are transformed at each intermediate step. This structure facilitates techniques such as pruning and "snapping" to symbolic forms, whereby the learned spline functions can be replaced or approximated by explicit algebraic expressions. Such symbolic recovery has been demonstrated in scientific use cases, including re-discovery of known mathematical and physical laws (e.g., symbolic form of the reaction term in the Fisher-KPP equation or the relation between knot invariants in knot theory) (Liu et al., 30 Apr 2024, Koenig et al., 5 Jul 2024).
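The snippet below illustrates the spirit of such "snapping" on a toy target: sample a learned univariate edge function, fit each candidate symbolic form by least squares, and keep the best match by $R^2$. The candidate library and the stand-in "learned" function are assumptions for illustration; pykan's symbolic utilities are not reproduced here.

```python
# Toy symbolic "snapping": fit candidate closed forms to a sampled edge function.
import numpy as np

xs = np.linspace(-1, 1, 200)
phi = np.sin(np.pi * xs) + 0.01 * np.random.randn(xs.size)  # stand-in for a learned spline

candidates = {
    "a*x + b": lambda x: np.stack([x, np.ones_like(x)], axis=1),
    "a*x^2 + b": lambda x: np.stack([x ** 2, np.ones_like(x)], axis=1),
    "a*sin(pi x) + b": lambda x: np.stack([np.sin(np.pi * x), np.ones_like(x)], axis=1),
    "a*exp(x) + b": lambda x: np.stack([np.exp(x), np.ones_like(x)], axis=1),
}

best = None
for name, design in candidates.items():
    A = design(xs)
    coef, *_ = np.linalg.lstsq(A, phi, rcond=None)           # least-squares fit
    r2 = 1 - np.sum((A @ coef - phi) ** 2) / np.sum((phi - phi.mean()) ** 2)
    if best is None or r2 > best[1]:
        best = (name, r2, coef)

print(f"snapped to {best[0]} with R^2 = {best[1]:.4f}, coefficients {best[2]}")
```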
This edge-centric interpretability also benefits scientific discovery workflows, enabling models that "collaborate" with human experts by giving them the tools to visualize, prune, and symbolically post-process the internal mechanisms of the learned model.
5. Extensions and Application Domains
The adaptability of KANs lends itself to numerous scientific and applied domains:
- Time Series and Forecasting: KANs have delivered more accurate, less complex, and more robust forecasting than MLPs in realistic satellite traffic prediction scenarios, and show strong potential for meteorological and financial forecasting (Vaca-Rubio et al., 14 May 2024); a minimal forecasting sketch follows this list.
- Physics-Informed Learning: KANs underpin high-precision solutions for PDEs, especially in physics-informed neural network (PINN) settings. Their structure supports extraction of interpretable source terms and symbolic laws, as demonstrated for PDEs such as Fisher-KPP and Burgers' equations (Koenig et al., 5 Jul 2024).
- Molecular Dynamics and Potentials: KANs reinterpret common molecular potentials (LJ, EAM, ANN) as compositions of univariate splines, introducing improved computational efficiency and accuracy for force evaluation and simulation in materials science (Nagai et al., 25 Jul 2024).
- Multifidelity and Physics-Informed Models: Multifidelity KANs combine inexpensive low-fidelity models with a small amount of high-fidelity data, enabling accurate models with reduced costly data requirements. This approach has been shown to benefit classical supervised regression, physics-informed learning, and extrapolation, with architecture explicitly distinguishing between linear and nonlinear corrections (Howard et al., 18 Oct 2024).
- Reinforcement and Transfer Learning: KANs, when integrated into standard reinforcement learning pipelines, serve as highly parameter-efficient function approximators—sometimes with unmatched sample efficiency, though at a computational cost (Kich et al., 9 Aug 2024). In transfer learning, substituting linear probes with KANs improves the capacity to capture non-linear feature-label relations, though the benefits depend on dataset complexity (Shen et al., 12 Sep 2024).
- Computer Vision: Convolutional and hybrid vision models that embed KAN layers (i.e., convolutional KANs) have matched or approached CNN accuracy with far fewer parameters, though achieving state-of-the-art results on more complex datasets has required increasing depth or including more expressive edge-based functions (Azam et al., 13 Jun 2024, Bodner et al., 19 Jun 2024, Krzywda et al., 10 Jan 2025).
- Imbalanced and Structured Data: While KANs naturally handle certain imbalanced data scenarios better than MLPs without resampling, standard imbalance remedies (resampling, focal loss) are incompatible with the mathematical structure of KANs and can degrade performance (Yadav et al., 18 Jul 2025).
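Referring back to the time series bullet above, the following is a minimal one-step-ahead forecasting sketch with a small KAN-style model on a synthetic series; the lag window, layer sizes, and RBF-edge layer (repeated from the Section 2 sketch so this snippet runs standalone) are assumptions, not the setup of the cited satellite-traffic study.

```python
# One-step-ahead forecasting with a tiny KAN-style model on a synthetic series.
import torch
import torch.nn as nn


class RBFEdgeLayer(nn.Module):          # same FastKAN-style layer as in Section 2
    def __init__(self, in_dim, out_dim, centers=12):
        super().__init__()
        self.register_buffer("c", torch.linspace(-2.0, 2.0, centers))
        self.coef = nn.Parameter(0.1 * torch.randn(in_dim, out_dim, centers))

    def forward(self, x):
        feats = torch.exp(-2.0 * (x.unsqueeze(-1) - self.c) ** 2)
        return torch.einsum("bif,iof->bo", feats, self.coef)


# Synthetic series and lagged (window -> next value) training pairs.
t = torch.arange(0, 400, dtype=torch.float32)
series = torch.sin(0.1 * t) + 0.05 * torch.randn_like(t)
window = 16
X = torch.stack([series[i : i + window] for i in range(len(series) - window)])
y = series[window:].unsqueeze(-1)

model = nn.Sequential(RBFEdgeLayer(window, 8), RBFEdgeLayer(8, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for epoch in range(300):
    loss = nn.functional.mse_loss(model(X), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```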
6. Recent Innovations, Limitations, and Future Directions
The KAN framework is undergoing active research and refinement. Key innovations and challenges include:
- Alternative Basis Functions: Rational KANs (rKANs) explore rational and Padé-based functions as alternatives to splines, demonstrating improved effectiveness in approximating functions with asymptotic or singular behavior (Aghaei, 20 Jun 2024); a generic rational-edge sketch follows this list.
- Efficient Implementations: Recasting KANs as RBF networks (FastKAN) or integrating spline interpolation for speed are prominent directions for improving computational efficiency (Li, 10 May 2024, Nagai et al., 25 Jul 2024). However, KANs remain significantly slower to train than MLPs, emphasizing the need for further optimization and hardware-aware design.
- Advanced Architectures: Multi-exit KANs enable adaptive, depth-flexible learning by providing prediction branches at multiple layers, yielding both improved accuracy and parsimony in learned representations (Bagrow et al., 3 Jun 2025). Variational KANs (InfinityKAN) allow the number of basis functions per activation to be adaptively learned, enabling flexible model expressivity (Alesiani et al., 3 Jul 2025).
- Mathematical Guarantees: Recent theoretical work provides optimal approximation rates for functions in Besov spaces and sample complexity guarantees for learning derivatives (Kratsios et al., 21 Apr 2025). However, generalizations for arbitrary smooth function classes (especially with smooth, finite KANs) encounter limitations inherent in classical results by Vitushkin.
- Domain-Specific Extensions and Integration: KANs are being woven into graph learning, recurrent networks, transformer-based models, and hybrid PDE-operator networks, with demonstrated synergies as modular, efficient components (Somvanshi et al., 9 Nov 2024).
- Interaction with Imbalance Methods and Resource Efficiency: KANs are mathematically and algorithmically sensitive to imbalance corrections such as focal loss and data resampling. These techniques may disrupt the function approximation guarantees and, in practice, can negate performance advantages relative to substantially less expensive MLPs (Yadav et al., 18 Jul 2025).
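As a companion to the rKAN bullet above, the following is a generic sketch of a rational (Padé-style) edge activation: each edge learns numerator and denominator polynomial coefficients, with the denominator kept strictly positive to avoid spurious poles. The degrees, the $1 + Q(x)^2$ safeguard, and the class name are illustrative assumptions rather than the exact parameterization of the cited work.

```python
# Generic rational-edge KAN layer: phi(x) = P(x) / (1 + Q(x)^2) per edge.
import torch
import torch.nn as nn


class RationalKANLayer(nn.Module):
    def __init__(self, in_dim, out_dim, num_degree=3, den_degree=2):
        super().__init__()
        self.p = nn.Parameter(0.1 * torch.randn(in_dim, out_dim, num_degree + 1))
        self.q = nn.Parameter(0.1 * torch.randn(in_dim, out_dim, den_degree))

    def forward(self, x):                                   # x: (batch, in_dim)
        # Powers of each input coordinate: x^0 .. x^(max degree)
        deg = max(self.p.shape[-1], self.q.shape[-1] + 1)
        powers = torch.stack([x ** d for d in range(deg)], dim=-1)   # (batch, in, deg)
        num = torch.einsum("bid,iod->bio", powers[..., : self.p.shape[-1]], self.p)
        qx = torch.einsum("bid,iod->bio", powers[..., 1 : self.q.shape[-1] + 1], self.q)
        den = 1.0 + qx ** 2                                 # strictly positive denominator
        return (num / den).sum(dim=1)                       # sum over input edges
```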
Future research priorities include:
- KAN-specific architectural modifications for imbalanced and structured data.
- Enhanced optimization methods for spline/edge function learning.
- Hardware- and memory-efficient implementations, especially for large-scale, high-dimensional tasks.
- Deeper theoretical investigations regarding function class representability and learning in the presence of noise or data sparsity.
- Further development of symbolic recovery and collaborative human-in-the-loop model refinement.
KANs continue to shape the landscape of interpretable, adaptive, and efficient neural architectures for deep learning, promising important advances in scientific discovery, physics-informed computing, time series forecasting, and parameter-efficient learning frameworks (Somvanshi et al., 9 Nov 2024).