Kolmogorov-Arnold Network Architectures
- Kolmogorov-Arnold Networks are neural architectures that decompose complex multivariate functions into sums of learnable univariate spline functions.
- They replace traditional scalar weights with adaptive, localized nonlinearities, enhancing interpretability and parameter efficiency across various tasks.
- KANs achieve competitive performance through deep, compositional structures while presenting challenges in training stability, computational overhead, and regularization.
Kolmogorov-Arnold Network (KAN) Architectures
Kolmogorov-Arnold Networks (KANs) are a class of neural architectures directly inspired by the Kolmogorov-Arnold representation theorem, which states that any multivariate continuous function can be decomposed as a finite composition and summation of univariate functions. Unlike conventional neural networks, which employ fixed parametric activation functions and scalar weights, KANs place a learnable function—typically parameterized by compactly supported splines—on each directed edge of the network graph. This architectural paradigm enables highly adaptive, interpretable representations while maintaining universal function approximation capabilities.
1. Theoretical Foundations and Kolmogorov–Arnold Representation
The foundational justification for KANs is the Kolmogorov–Arnold representation theorem:

$$ f(x_1, \ldots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\left( \sum_{p=1}^{n} \varphi_{q,p}(x_p) \right) $$

where:
- $\varphi_{q,p}$ are univariate continuous "inner" functions,
- $\Phi_q$ are univariate continuous "outer" functions,
- $f$ is any continuous function on a bounded subset of $\mathbb{R}^n$.
This decomposition demonstrates that high-dimensional mappings can be constructed through summations and compositions of lower-dimensional functions. The KAN architecture implements this representation directly: each directed edge in the network corresponds to a learnable univariate function parameterized by adaptive basis sets (most commonly B-splines), and aggregation is performed through summation at each node (Liu et al., 2024, Sohail, 2024).
The universality of KANs has been established formally: given sufficient grid resolution and spline degree, the composition of such spline-based edge functions can achieve dimension-independent (i.e., non-cursed) rates for sup-norm approximation on function classes with prescribed smoothness (Kratsios et al., 21 Apr 2025).
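Concretely, the theorem's two-layer structure (inner sums feeding outer univariates) can be evaluated directly. The following minimal NumPy sketch (function names are illustrative, not from any KAN library) composes user-supplied inner and outer functions and checks the structure on a toy identity, $x_1 x_2 = \frac{1}{4}\left[(x_1+x_2)^2 - (x_1-x_2)^2\right]$:

```python
import numpy as np

def ka_representation(x, inner_fns, outer_fns):
    """Evaluate f(x) = sum_q Phi_q( sum_p phi_{q,p}(x_p) ).

    x         : array of shape (n,), the multivariate input
    inner_fns : inner_fns[q][p] is the univariate phi_{q,p}
    outer_fns : outer_fns[q] is the univariate Phi_q
    """
    total = 0.0
    for q, Phi in enumerate(outer_fns):
        s = sum(inner_fns[q][p](x[p]) for p in range(len(x)))
        total += Phi(s)
    return total

# Toy example: x1*x2 = ((x1+x2)^2 - (x1-x2)^2) / 4, written as two
# "outer" univariates applied to sums of "inner" univariates.
inner = [[lambda t: t, lambda t: t],    # q=0 computes x1 + x2
         [lambda t: t, lambda t: -t]]   # q=1 computes x1 - x2
outer = [lambda s: s**2 / 4.0,          # Phi_0
         lambda s: -s**2 / 4.0]         # Phi_1
x = np.array([3.0, 2.0])
# ka_representation(x, inner, outer) reproduces x1 * x2 = 6.0
```

This toy uses smooth polynomial inner/outer functions for readability; the theorem itself only guarantees continuous (generally non-smooth) ones, which is why practical KANs instead *learn* the univariates from adaptive spline bases.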
2. Architectural Structure and Edge-Based Parameterization
Edge-Based Nonlinearities
The canonical KAN replaces all scalar weights in fully connected or convolutional architectures with univariate functions—localized, adaptive, and parameterized (usually as B-splines):

$$ \varphi(x) = w_b\, b(x) + w_s \sum_{i} c_i\, B_i(x) $$

where:
- $b$ is a fixed nonpolynomial base (often SiLU: $b(x) = x / (1 + e^{-x})$),
- $w_b$, $w_s$, and $c_i$ are learned scalars,
- $B_i$ are degree-$k$ B-spline basis functions,
- $G$ is the grid size (number of knots).
Each layer is thus a matrix of univariate functions, and activations at the next layer are strict sums of edge evaluations, $x_j^{(l+1)} = \sum_i \varphi_{j,i}^{(l)}(x_i^{(l)})$, in contrast to conventional affine transforms (Liu et al., 2024). Residual and skip connections are increasingly standard, further enhancing convergence and regularity (Kratsios et al., 21 Apr 2025).
Spline Parameterization
B-splines provide local compact support, computational efficiency, and control over smoothness. Knot vectors can be uniform or slightly adaptive; variable grid extension methods handle out-of-domain activations (Gaonkar et al., 15 Jan 2026). Other bases, such as wavelets, Chebyshev polynomials, Fourier series, or radial basis functions (as in BSRBF-KAN), have been explored for domain-specific expressivity (Bozorgasl et al., 2024, Novkin et al., 19 Mar 2025, Ta, 2024).
Edge-based nonlinearities introduce parameter redundancy compared to classic MLPs, but the representation is flexible and more interpretable, exposing structure at the function and feature level (Gaonkar et al., 15 Jan 2026, Sohail, 2024).
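A single spline-parameterized edge function of the form $\varphi(x) = w_b\, b(x) + w_s \sum_i c_i B_i(x)$ can be sketched with SciPy's `BSpline` as below. This is a minimal illustration, not a training-ready implementation: coefficients are randomly initialized rather than learned, the knot vector is uniform and clamped, and the helper name `make_edge_fn` is our own.

```python
import numpy as np
from scipy.interpolate import BSpline

def silu(x):
    # SiLU base function b(x) = x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def make_edge_fn(grid_size=5, degree=3, domain=(-1.0, 1.0), rng=None):
    """Build one KAN edge function phi(x) = w_b*silu(x) + w_s*spline(x)
    over a clamped, uniform knot vector on `domain` (illustrative sketch)."""
    if rng is None:
        rng = np.random.default_rng(0)
    k = degree
    interior = np.linspace(domain[0], domain[1], grid_size + 1)
    # Repeat boundary knots k extra times -> clamped (open) knot vector.
    t = np.concatenate([[domain[0]] * k, interior, [domain[1]] * k])
    n_coef = len(t) - k - 1
    c = rng.normal(scale=0.1, size=n_coef)   # spline coefficients c_i
    w_b, w_s = 1.0, 1.0                      # path scalars (trainable in practice)
    spline = BSpline(t, c, k, extrapolate=True)
    return lambda x: w_b * silu(x) + w_s * spline(x)

phi = make_edge_fn()
y = phi(np.linspace(-1.0, 1.0, 8))  # evaluate the edge function at 8 points
```

In a full KAN layer, each input/output pair gets its own such `phi`, and the `c_i`, `w_b`, `w_s` would be optimized jointly by gradient descent; `extrapolate=True` stands in for the grid-extension mechanisms discussed above.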
3. Training Methodologies and Empirical Characteristics
KAN training leverages standard stochastic gradient descent, but highlights several unique considerations:
- Initialization: Kaiming-normal for spline and path scalars is generally superior (Sohail, 2024).
- Optimizer: Adam with a low learning rate stabilizes convergence; plain SGD is less robust.
- Regularization: Dropout (p=0.2), L1 pathwise penalty, and entropy-based sparsification improve statistical efficiency and overfitting resistance (Liu et al., 2024, Bagrow et al., 13 Dec 2025).
- Activation Choices: GELU consistently outperforms SiLU and ELU in both inner and outer functions; raising the B-spline degree yields gains up to roughly degree 7, beyond which overfitting dominates (Sohail, 2024).
- Backpropagation-Free Techniques: Alternatives such as the HSIC Bottleneck can be employed for direct dependence maximization; these offer slightly smoother, but not more accurate, training (Sohail, 2024).
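The L1 pathwise penalty and entropy-based sparsification can be sketched as a single regularizer over per-edge activation magnitudes. The helper below is illustrative (our own name and weighting constants); in Liu et al. (2024) the per-edge L1 norms are batch averages of $|\varphi(x)|$:

```python
import numpy as np

def kan_sparsity_penalty(edge_l1, lam_l1=1e-3, lam_ent=1e-3, eps=1e-12):
    """Sparsification penalty in the spirit of Liu et al. (2024).

    edge_l1[i] approximates |phi_i|_1, the mean |phi_i(x)| over a batch.
    Returns lam_l1 * sum_i |phi_i|_1 + lam_ent * H(p), where
    p_i = |phi_i|_1 / sum_j |phi_j|_1 and H is the Shannon entropy.
    Minimizing the entropy term pushes activation mass onto few edges."""
    edge_l1 = np.asarray(edge_l1, dtype=float)
    l1 = edge_l1.sum()
    p = edge_l1 / (l1 + eps)
    entropy = -np.sum(p * np.log(p + eps))
    return lam_l1 * l1 + lam_ent * entropy
```

For equal total activation mass, a layer spreading importance uniformly across edges is penalized more than one concentrating it on a single edge, which is the mechanism that drives post-hoc prunability.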
Empirical studies show that small KANs can reach competitive or superior performance relative to MLPs, often with improved early-epoch parameter efficiency. However, run-to-run accuracy stability is lower, and sensitive hyperparameter tuning is required as depth increases (Sohail, 2024). Residual blocks, normalization, and adaptive connectivity are useful for scaling to deeper architectures (Bagrow et al., 13 Dec 2025).
4. Comparative Analysis, Parameter Efficiency, and Interpretability
Functional Efficiency
Benchmark evaluations consistently indicate that, for function regression, time series prediction, and classification, KANs can attain higher predictive accuracy with reduced computational cost (FLOPs) relative to parameter-matched MLPs (Gaonkar et al., 15 Jan 2026). The architectural structure enables the “curse of dimensionality” to be mitigated for suitably regular target functions, as the multivariate approximation problem is reduced to a composition of univariate splines (Liu et al., 2024, Kratsios et al., 21 Apr 2025).
Interpretability
KANs offer a direct form of mechanistic interpretability: each edge function, learned as a spline or other basis sum, can be visualized and, in some frameworks, converted to closed-form symbolic expressions via posttraining regression. This allows for the recovery of explicit domain formulas (e.g., in transistor modeling or scientific discovery) and supports human-in-the-loop model examination (Liu et al., 2024, Novkin et al., 19 Mar 2025, Bagrow et al., 13 Dec 2025).
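As a toy illustration of such post-training symbolic conversion (the function names and candidate library here are hypothetical, not any framework's API), one can project a learned edge function onto a small library of candidate univariates by least squares and read off the dominant terms:

```python
import numpy as np

def symbolic_fit(phi, candidates, xs):
    """Post-training symbolic regression sketch: least-squares projection
    of a learned univariate phi onto a candidate function library.
    Returns (coefficients, residuals)."""
    A = np.stack([g(xs) for g in candidates], axis=1)  # design matrix
    y = phi(xs)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, A @ coef - y

# Suppose the learned edge function is (approximately) sin(x) + 0.5*x^2:
phi = lambda x: np.sin(x) + 0.5 * x**2
library = [np.sin, np.cos, lambda x: x, lambda x: x**2]
xs = np.linspace(-2.0, 2.0, 101)
coef, resid = symbolic_fit(phi, library, xs)
# coef ~ [1.0, 0.0, 0.0, 0.5], recovering "sin(x) + 0.5*x^2"
```

Real frameworks combine such projections with sparsity thresholds and affine input/output transforms before accepting a symbolic candidate; this sketch shows only the core fitting step.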
Parameter Efficiency and Scaling
In shallow regimes or for tasks with compositional structure, KANs can exceed MLPs in parameter efficiency by 10–20%, converging to high accuracy in fewer training epochs and with sparser representations after end-to-end differentiable sparsification (Bagrow et al., 13 Dec 2025). Scaling performance in high-noise or high-dimensional data regimes is an active area of research, with regularization and architectural overprovisioning plus pruning being effective strategies (Sohail, 2024, Bagrow et al., 13 Dec 2025).
5. Variants, Extensions, and Hybridizations
KAN architectures are highly extensible, with multiple active research directions:
- Basis Generalization: Beyond B-splines, KANs can employ Fourier (FKAN), wavelets (Wav-KAN), Chebyshev, radial basis, and mixed bases (BSRBF-KAN) to better adapt to input characteristics, frequency content, and noise robustness (Bozorgasl et al., 2024, Ta, 2024, Novkin et al., 19 Mar 2025).
- Convolutional KANs: Convolutional layers can adopt learnable edge activations, i.e., each kernel position applies an adaptive spline, yielding gains in parameter efficiency and expressivity over fixed-kernel CNNs on vision tasks (Bodner et al., 2024, Ferdaus et al., 2024, Cang et al., 2024).
- Operator and Temporal Learning: PDE-KAN and Temporal-KAN incorporate architectures suitable for operator learning and time series forecasting, exploiting KAN adaptivity for physical and dynamical systems (Somvanshi et al., 2024, Cang et al., 2024).
- InfinityKAN: Variational KAN architectures treat basis size as a learnable random variable, enabling data-driven, per-layer order selection and soft truncation of infinite expansions (Alesiani et al., 3 Jul 2025).
- KKAN and symbolically regularized KANs: Alternatives such as KKAN replace or augment spline-based univariates with MLPs or other basis expansions, enhancing expressivity and facilitating physics-informed or scientific learning tasks (Toscano et al., 2024, Bagrow et al., 13 Dec 2025).
- Quantum, Photonic, and Hardware-Accelerated KANs: Quantum KANs (QuKAN) exploit parameterized quantum circuits for univariate function realization (Werner et al., 27 Jun 2025); photonic KANs implement nonlinear transfer via ring-assisted Mach–Zehnder-based devices for energy and area efficiency (Peng et al., 2024). Custom accelerators and system co-design for B-spline evaluation, such as MatrixKAN, KAN-SAs, and analog-compute-in-memory circuits, address KAN runtime bottlenecks on large-scale hardware (Huang et al., 7 Sep 2025, Coffman et al., 11 Feb 2025, Errabii et al., 20 Nov 2025).
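To make the basis-generalization idea concrete, a Fourier-basis edge function in the spirit of FKAN can be sketched as below (an illustrative construction with our own names, not the published implementation):

```python
import numpy as np

def fourier_edge_fn(a, b, omega=1.0):
    """Fourier-basis edge function (FKAN-style sketch):
        phi(x) = sum_{k=1..K} a_k*cos(k*omega*x) + b_k*sin(k*omega*x)
    a, b : coefficient sequences of equal length K (trainable in practice)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    k = np.arange(1, len(a) + 1)
    def phi(x):
        x = np.asarray(x, dtype=float)[..., None]  # broadcast over harmonics k
        return (a * np.cos(k * omega * x) + b * np.sin(k * omega * x)).sum(-1)
    return phi

# phi(x) = sin(x) + cos(2x): first-harmonic sine, second-harmonic cosine.
phi = fourier_edge_fn(a=[0.0, 1.0], b=[1.0, 0.0])
```

Compared with compactly supported B-splines, each Fourier coefficient influences the function globally, which tends to suit periodic or band-limited targets but weakens the locality that makes spline KANs easy to prune and inspect.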
6. Limitations, Challenges, and Future Directions
While KANs excel in flexibility and interpretability, several open challenges exist:
- Training Instability and Sensitivity: Increased architectural flexibility introduces greater sensitivity to initialization, optimizer, and hyperparameters, with risk of overfitting and stability issues in deeper networks (Sohail, 2024).
- Computational Overhead: Spline function evaluation, especially at high degrees or large grid sizes, incurs higher computational and memory cost than scalar weight matrices, though parallelization strategies such as MatrixKAN and hardware LUTs can alleviate bottlenecks (Coffman et al., 11 Feb 2025, Huang et al., 7 Sep 2025).
- Scalability: Scaling to massive data and high-dimensional tasks requires sparse, regularized, or overprovisioned/pruned architectures; convolutional and hybrid KANs show promise in vision pipelines, but native graph, attention, or transformer-style KANs remain underexploited (Cang et al., 2024, Ferdaus et al., 2024, Bagrow et al., 13 Dec 2025).
- Regularization and Generalization: Techniques such as smoothness penalties, segment deactivation, and entropy-based sparsification are necessary to ensure robustness and prevent oscillatory or overfitted representations, especially under label noise or data scarcity (Cang et al., 2024, Bagrow et al., 13 Dec 2025).
- Interpretability Quantification: While the structure of learned edge functions is analyzable, objective quantification of symbolic or mechanistic interpretability remains an open area.
Future investigations focus on integrating KANs with state-of-the-art transformer, operator, and graph neural architectures; expanding scalable, automated basis selection; refining embedded and hardware-accelerated KAN computations; and developing principled, theory-backed regularization and architecture search strategies (Somvanshi et al., 2024, Kratsios et al., 21 Apr 2025, Bagrow et al., 13 Dec 2025).