
Kolmogorov–Arnold Network Architecture

Updated 18 June 2026
  • KAN Architecture is defined by its use of trainable univariate nonlinear edge functions based on the Kolmogorov–Arnold theorem to decompose continuous functions.
  • Its design achieves high expressivity and interpretability with significant parameter and FLOP reductions alongside universal approximation and minimax optimality guarantees.
  • Variants such as convolutional, temporal, and hybrid KANs have demonstrated robust performance in scientific machine learning, time series analysis, and hardware acceleration.

Kolmogorov–Arnold Networks (KANs) are a class of neural network architectures that directly operationalize the Kolmogorov–Arnold representation theorem by replacing fixed linear weights and fixed activations with a trainable composition of univariate nonlinear edge functions, typically parameterized as adaptive splines or other 1D basis expansions. This approach provides a framework that is both highly expressive and interpretable, with demonstrated parameter efficiency and substantial practical impact on scientific machine learning, time series analysis, and engineering modeling. KANs have given rise to a diversity of architectural variants (e.g., convolutional KANs, temporal KANs, variational and quantum-KANs, and hybrid models), and their theoretical properties have been extensively studied, including universal approximation results, minimax optimality, and efficient hardware realizations.

1. Mathematical Foundation and Theoretical Guarantees

At the core of KANs is the Kolmogorov–Arnold superposition theorem, which states that any continuous function $f: [0,1]^n \rightarrow \mathbb{R}$ can be written as

$$f(x_1, \ldots, x_n) = \sum_{q = 1}^{2n + 1} \Phi_q\left(\sum_{p = 1}^n \phi_{q,p}(x_p)\right)$$

with univariate continuous functions $\phi_{q,p}$ (“inner” ridge functions) and $\Phi_q$ (“outer” combiners). KANs instantiate this decomposition as a two-layer neural architecture in which every edge from $x_p$ to node $q$ represents a learnable univariate map, and each output node applies a learnable nonlinear transformation to the sum of its incoming signals.
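To make the superposition structure concrete, the following minimal sketch evaluates a sum of outer functions applied to sums of inner functions. The particular inner and outer functions are assumptions chosen by hand so that the result reproduces the product of two inputs via the polarization identity; they are not the functions constructed in the theorem's proof, and only two of the available outer terms are used. Indices are zero-based in the code.

```python
# Illustrative only: hand-picked univariate functions that fit the
# superposition form for f(x, y) = x * y. Not a general construction.

def inner(q, p, x):
    """phi_{q,p}: inner univariate functions (assumed for this example)."""
    # For q = 1 the second input is negated; every other edge is the identity.
    return -x if (q == 1 and p == 1) else x


def outer(q, s):
    """Phi_q: outer univariate functions (assumed for this example)."""
    return s * s / 4.0 if q == 0 else -s * s / 4.0


def ka_superposition(xs, num_outer=2):
    """Evaluate sum_q Phi_q( sum_p phi_{q,p}(x_p) )."""
    return sum(
        outer(q, sum(inner(q, p, x) for p, x in enumerate(xs)))
        for q in range(num_outer)
    )


# ((x + y)^2 - (x - y)^2) / 4 equals x * y
value = ka_superposition([0.3, 0.7])
```

A KAN replaces the hand-picked `inner` and `outer` functions with trainable univariate maps.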

KANs generalize this two-stage structure into an arbitrary-depth feedforward network by stacking such function-matrix layers, and further augment expressivity by parameterizing each edge function as a flexible 1D basis expansion. Minimax optimality of KANs for function approximation in Besov spaces and explicit PAC-style sample complexity bounds hold under modest smoothness assumptions, and residual KAN architectures match the best known nonlinear N-term rates for high-dimensional function approximation, without excess logarithmic factors or dependence on input dimensionality (Kratsios et al., 21 Apr 2025).

2. Architectural Design and Parameterization

A canonical KAN layer with input $x \in \mathbb{R}^{n_{\ell}}$ produces output $y \in \mathbb{R}^{n_{\ell+1}}$ by evaluating

$$y_j = \sum_{i=1}^{n_{\ell}} \varphi_{j,i}(x_i)$$

where each $\varphi_{j,i}$ is a univariate trainable function on the edge from input node $i$ to output node $j$. These edge functions are typically parameterized as

$$\varphi_{j,i}(x) = w_b\, b(x) + w_s \sum_{k} c_k B_k(x)$$

where $B_k$ are B-spline basis functions (or variants: GRBFs, polynomials, Fourier modes), $b$ is a fixed smooth activation (e.g., SiLU), and $w_b$, $w_s$, and $c_k$ are trainable scalars. The knot grid for each spline can be dynamically extended or rescaled to maintain expressivity across the range of activations (Hoever et al., 16 Jun 2026, Liu et al., 2024).
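The layer described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation of any particular library: each edge combines a SiLU term with a B-spline expansion, the knot grid is uniform, and the names `make_knots`, `edge_fn`, and `kan_layer`, along with the grid size, spline degree, and initialization scale, are hypothetical.

```python
import math
import random


def bspline_basis(x, knots, k):
    """Values of all degree-k B-spline basis functions at x (Cox-de Boor)."""
    # Degree 0: indicator of each half-open knot interval.
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
             for i in range(len(knots) - 1)]
    for d in range(1, k + 1):
        nxt = []
        for i in range(len(knots) - d - 1):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            left = (x - knots[i]) / left_den * basis[i] if left_den > 0 else 0.0
            right = ((knots[i + d + 1] - x) / right_den * basis[i + 1]
                     if right_den > 0 else 0.0)
            nxt.append(left + right)
        basis = nxt
    return basis  # length: len(knots) - k - 1


def make_knots(lo, hi, grid, k):
    """Uniform knots on [lo, hi], padded by k intervals on each side."""
    h = (hi - lo) / grid
    return [lo + (i - k) * h for i in range(grid + 2 * k + 1)]


def silu(x):
    return x / (1.0 + math.exp(-x))


def edge_fn(x, params, knots, k):
    """One edge: scaled fixed activation plus scaled spline expansion."""
    w_b, w_s, coeffs = params
    basis = bspline_basis(x, knots, k)
    return w_b * silu(x) + w_s * sum(c * b for c, b in zip(coeffs, basis))


def kan_layer(x, edges, knots, k):
    """y_j = sum_i edge_{j,i}(x_i); edges[j][i] holds that edge's parameters."""
    return [sum(edge_fn(xi, edges[j][i], knots, k) for i, xi in enumerate(x))
            for j in range(len(edges))]


random.seed(0)
K, GRID, N_IN, N_OUT = 3, 5, 2, 3  # illustrative sizes
knots = make_knots(-1.0, 1.0, GRID, K)
edges = [[(1.0, 1.0, [random.gauss(0.0, 0.1) for _ in range(GRID + K)])
          for _ in range(N_IN)] for _ in range(N_OUT)]
y = kan_layer([0.2, -0.5], edges, knots, K)
```

Inputs outside the grid range receive no spline contribution in this sketch, which is one reason grid extension or rescaling matters in practice.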

Unlike MLPs, which use weight matrices and fixed pointwise nonlinearities, KANs dispense with all explicit linear weights in hidden layers, instead assembling their expressivity entirely from locally parameterized univariate functions on the edges. This imparts substantial interpretability: after training, each edge can be visualized or even “symbolified” as a closed-form analytic function, facilitating direct human analysis and post-hoc model reduction (Liu et al., 2024, Jacobs et al., 27 Jan 2026).

3. Basis Choices, Computational Strategies, and Efficiency

Basis selection is crucial in KANs as it governs both approximation quality and computational overhead:

  • B-spline KANs: Local, compact support and excellent stability. Standard in most implementations due to their modularity and efficiency on modern hardware.
  • GRBF KANs: Smoother, Gaussian radial basis expansions offering simpler (recursion-free) implementations that accelerate computation (Hoever et al., 16 Jun 2026).
  • Chebyshev/Jacobi/Fourier KAN variants: Global spectral bases suited for functions with extensive smoothness or periodicity (Noorizadegan et al., 28 Oct 2025).
  • Rational, wavelet, sinc, and ReLU-power basis variants: Tailored to sharp fronts, multiscale signals, and discontinuities.

MatrixKAN (Coffman et al., 11 Feb 2025) demonstrates that B-spline basis evaluation can be realized via batched matrix operations, enabling up to ∼40× speedup over recursive Cox–de Boor implementations, especially as spline order increases. KANLib (Hoever et al., 16 Jun 2026) aggregates advances from PyKAN, EfficientKAN, and FastKAN into an extensible PyTorch ecosystem supporting vectorized basis evaluation, on-the-fly grid extension, and rapid experimentation with new edge-function families.
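The idea of swapping recursion for a matrix product can be illustrated with the standard uniform cubic B-spline identity. The sketch below is not MatrixKAN's code: it assumes unit-spaced knots and degree 3, and the names `CUBIC_M` and `cubic_basis_matrix` are hypothetical. It checks that a power vector multiplied by a fixed 4×4 matrix reproduces the four nonzero Cox-de Boor values on one knot interval.

```python
def cox_de_boor(x, knots, k):
    """Recursive reference: all degree-k basis values at x."""
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
             for i in range(len(knots) - 1)]
    for d in range(1, k + 1):
        nxt = []
        for i in range(len(knots) - d - 1):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            left = (x - knots[i]) / left_den * basis[i] if left_den > 0 else 0.0
            right = ((knots[i + d + 1] - x) / right_den * basis[i + 1]
                     if right_den > 0 else 0.0)
            nxt.append(left + right)
        basis = nxt
    return basis


# Rows index powers (1, t, t^2, t^3); columns index the four active bases.
CUBIC_M = [
    [1.0, 4.0, 1.0, 0.0],
    [-3.0, 0.0, 3.0, 0.0],
    [3.0, -6.0, 3.0, 0.0],
    [-1.0, 3.0, -3.0, 1.0],
]


def cubic_basis_matrix(t):
    """Four nonzero uniform cubic basis values at local offset t in [0, 1)."""
    powers = [1.0, t, t * t, t * t * t]
    return [sum(powers[r] * CUBIC_M[r][c] for r in range(4)) / 6.0
            for c in range(4)]


# On integer knots 0..7, x = 3.25 lies in [3, 4) with local offset 0.25.
recursive = cox_de_boor(3.25, list(range(8)), 3)
matrix_form = cubic_basis_matrix(0.25)
```

Because the matrix is fixed for a given degree, the same product can be applied to a whole batch of offsets at once, which is the property that makes the matrix form attractive on parallel hardware.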

Compared to MLPs, KANs obtain equivalent or superior predictive accuracy with 10–100× reduction in floating-point operations (FLOPs) and comparable or reduced parameter counts due to more adaptive edge nonlinearities (Gaonkar et al., 15 Jan 2026).
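When checking claims of this kind, it helps to count parameters under an explicit convention. The sketch below assumes each KAN edge stores `grid + k` spline coefficients plus two scale scalars, and that the MLP uses dense layers with biases; the function names and the example widths are hypothetical and are not taken from the cited benchmarks.

```python
def kan_param_count(widths, grid, k):
    """Parameters in a KAN with the given layer widths.

    Assumes each edge holds (grid + k) spline coefficients and two scalars.
    """
    per_edge = grid + k + 2
    edges = sum(a * b for a, b in zip(widths[:-1], widths[1:]))
    return edges * per_edge


def mlp_param_count(widths):
    """Parameters in a dense MLP with biases."""
    return sum(a * b + b for a, b in zip(widths[:-1], widths[1:]))


# Illustrative widths only: a narrow KAN versus a wider MLP.
kan_total = kan_param_count([2, 5, 1], grid=5, k=3)
mlp_total = mlp_param_count([2, 64, 64, 1])
```

Under this convention a single KAN edge costs more parameters than a single MLP weight, so any overall saving has to come from using narrower or shallower networks.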

4. Regularization, Optimization, and Training Dynamics

KANs exhibit higher per-unit capacity and are therefore more prone to overfitting and training instability in deep or wide settings (Sohail, 2024). Empirical best practices include:

  • Kaiming-Normal initialization for basis coefficients and spline weights to ensure proper scaling.
  • Adam or AdamW optimizers with cautious learning rates, cosine annealing, or adaptive decay.
  • L2-penalties on spline coefficients and dropout (p ≈ 0.2) on edge outputs to regularize over-oscillation and improve generalization.
  • Early stopping and validation-based checkpointing.
  • Batch-normalization and grid rescaling for stability with large or heterogeneous input ranges.
  • Segment deactivation (replacing splines with linear interpolants stochastically during training) and explicit smoothness penalties are effective in computer vision applications (Cang et al., 2024).
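Three of the practices above are simple enough to sketch directly: a cosine-annealed learning rate, an L2 penalty on spline coefficients, and a smoothness penalty. The smoothness term here is assumed to be a sum of squared second differences of adjacent coefficients, which is one common choice and not necessarily the one used in the cited work; the learning-rate bounds in the usage lines are placeholders, not recommended values.

```python
import math


def cosine_lr(step, total_steps, lr_max, lr_min=0.0):
    """Cosine annealing from lr_max at step 0 to lr_min at total_steps."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


def l2_penalty(coeffs):
    """Sum of squared spline coefficients for one edge."""
    return sum(c * c for c in coeffs)


def smoothness_penalty(coeffs):
    """Sum of squared second differences; zero for linearly spaced coefficients."""
    return sum((coeffs[i + 1] - 2.0 * coeffs[i] + coeffs[i - 1]) ** 2
               for i in range(1, len(coeffs) - 1))


def regularized_loss(data_loss, all_coeffs, lam_l2, lam_smooth):
    """Data loss plus weighted penalties summed over every edge."""
    reg_l2 = sum(l2_penalty(c) for c in all_coeffs)
    reg_smooth = sum(smoothness_penalty(c) for c in all_coeffs)
    return data_loss + lam_l2 * reg_l2 + lam_smooth * reg_smooth


# Placeholder hyperparameters, for illustration only.
lr_start = cosine_lr(0, 100, lr_max=1e-3, lr_min=1e-5)
lr_end = cosine_lr(100, 100, lr_max=1e-3, lr_min=1e-5)
loss = regularized_loss(0.5, [[0.0, 1.0, 2.0, 3.0], [1.0, 0.0, 1.0]], 0.01, 0.1)
```

The second-difference term penalizes oscillating coefficient sequences while leaving linear trends untouched, which targets the over-oscillation mentioned above without shrinking every edge toward zero.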

Variational KAN (InfinityKAN) further automates selection of basis set size per edge by treating it as a latent variable optimized via variational inference, enhancing robustness and obviating brittle hyperparameter sweeps (Alesiani et al., 3 Jul 2025).

5. Applications, Variants, and Hybrid Architectures

Recent literature demonstrates the versatility and extensibility of KANs across domains:

  • Time series and signal processing: Temporal-KAN (T-KAN), Signature-KAN, and their multi-task variants leverage KAN adaptive edge functions to model temporal dynamics and perform robust forecasting, outperforming LSTMs and RNNs in multi-task and adversarial settings (Dong et al., 2024).
  • Convolutional architectures: Convolutional KANs (CKAN) replace classic kernels with nonlinear edge functions, reducing parameter counts by up to 50% while matching or exceeding CNN performance on vision benchmarks such as Fashion-MNIST (Bodner et al., 2024, Cang et al., 2024).
  • Physics-informed and scientific ML: PDE-KANs and Physics-Informed KANs achieve higher accuracy and lower sample complexity for forward/inverse PDE tasks by integrating domain-geometric priors and physics-based losses directly into adaptive spline frameworks (Kratsios et al., 21 Apr 2025, Noorizadegan et al., 28 Oct 2025).
  • Operator- and graph-learning: Graph-KANs, DeepONet variants, and graph hybridizations admit robust approximation for combinatorial and functional operator learning (Toscano et al., 2024).
  • Variational, analog, and quantum architectures: Physical analog KANs are realized via hardware-native, reconfigurable edge nonlinearities in silicon RNPUs for ultra-low power, area, and latency inference (Escudero et al., 7 Feb 2026); QKANs use quantum subroutines to implement KAN blocks efficiently in the block-encoding formalism (Ivashkov et al., 2024); Hybrid KKANs combine MLP-style inner maps with linear outer basis expansions to further mitigate spectral bias and enhance universality (Toscano et al., 2024).
  • Materials science, device modeling, and explainability: KANs have produced closed-form, human-interpretable regression formulas competitive with hand-crafted models, even at parameter counts <50 (Jacobs et al., 27 Jan 2026, Novkin et al., 19 Mar 2025).

6. Hardware and Scalability Considerations

KANs’ reliance on univariate edge function evaluation, often parameterized by B-splines, introduces unique opportunities for hardware acceleration as well as distinctive circuit design challenges (Huang et al., 7 Sep 2025, Escudero et al., 7 Feb 2026):

  • Algorithm-hardware co-design: Hardware-aware quantization, alignment-symmetry, PowerGap LUT encoding, and sparsity-aware mapping optimize area and energy utilization in RRAM-ACIM KAN accelerators, with demonstrated area and power growing far more slowly than linearly as parameter count increases.
  • Analog implementations via RNPUs: Edge functions are realized physically, providing a path to single-cycle, sub-μJ inferences with high parameter efficiency. Calibration routines fit device-level I–V characteristics to differentiable surrogates, enabling end-to-end gradient-based training before flashing to hardware.
  • MatrixKAN: Matrix-based basis evaluation for high-degree splines removes the sequential recursion of the Cox–de Boor scheme, whose depth per edge grows with spline order, making large-scale or high-order networks tractable on modern GPUs (Coffman et al., 11 Feb 2025).

Empirically, the scaling profile of KANs facilitates their deployment in edge inference, embedded systems, and domain-specific acceleration while minimizing accuracy degradation under aggressive quantization.
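Because each edge is a univariate function, it can in principle be tabulated and quantized independently. The sketch below uses a plain uniform lookup table with symmetric integer codes as a stand-in; it is not the PowerGap encoding or any scheme from the cited accelerator work, and the names `build_lut` and `lut_eval`, the table size, and the bit width are assumptions.

```python
import math


def build_lut(fn, lo, hi, entries=256, bits=8):
    """Tabulate fn on a uniform grid and quantize values to signed integers."""
    xs = [lo + (hi - lo) * i / (entries - 1) for i in range(entries)]
    ys = [fn(x) for x in xs]
    max_abs = max(abs(y) for y in ys) or 1.0  # avoid a zero scale
    qmax = 2 ** (bits - 1) - 1
    scale = max_abs / qmax
    codes = [round(y / scale) for y in ys]
    return {"lo": lo, "hi": hi, "scale": scale, "codes": codes}


def lut_eval(lut, x):
    """Nearest-entry lookup with inputs clamped to the table range."""
    n = len(lut["codes"])
    x = min(max(x, lut["lo"]), lut["hi"])
    idx = round((x - lut["lo"]) / (lut["hi"] - lut["lo"]) * (n - 1))
    return lut["codes"][idx] * lut["scale"]


# Stand-in for a trained edge function.
lut = build_lut(math.sin, -1.0, 1.0)
samples = [-1.0 + 2.0 * i / 999 for i in range(1000)]
max_err = max(abs(lut_eval(lut, x) - math.sin(x)) for x in samples)
```

The approximation error has two sources, the grid spacing and the value quantization, so table size and bit width can be traded against each other per edge.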

7. Interpretability, Symbolic Regression, and Practical Guidelines

A distinguishing feature of KANs is their inherent interpretability. Each trained edge encapsulates a univariate function that can be visualized, sparsified, or regressed to elementary analytic forms (polynomial, trigonometric, exponential), yielding fully-closed-form surrogate expressions post-training (Jacobs et al., 27 Jan 2026, Novkin et al., 19 Mar 2025). This affordance bridges black-box deep learning and traditional mechanistic modeling, especially in high-stakes scientific domains.
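The regression step can be sketched as a search over a small library of candidate functions, fitting an affine map `a * g(x) + b` to samples of a trained edge by least squares and keeping the candidate with the smallest residual. This is a simplified illustration: the candidate library, the function names, and the synthetic "edge" samples are assumptions, and practical tools typically also fit an affine transformation of the input.

```python
import math

# Hypothetical candidate library of elementary forms.
CANDIDATES = {
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "sin": math.sin,
    "exp": math.exp,
}


def fit_affine(g_vals, ys):
    """Least-squares fit of ys ~ a * g + b; returns (a, b, sum of squared errors)."""
    n = len(ys)
    mean_g = sum(g_vals) / n
    mean_y = sum(ys) / n
    var_g = sum((g - mean_g) ** 2 for g in g_vals)
    cov = sum((g - mean_g) * (y - mean_y) for g, y in zip(g_vals, ys))
    a = cov / var_g if var_g > 0 else 0.0
    b = mean_y - a * mean_g
    sse = sum((a * g + b - y) ** 2 for g, y in zip(g_vals, ys))
    return a, b, sse


def symbolify(xs, ys):
    """Pick the candidate whose affine fit has the lowest residual."""
    best = None
    for name, g in CANDIDATES.items():
        a, b, sse = fit_affine([g(x) for x in xs], ys)
        if best is None or sse < best[3]:
            best = (name, a, b, sse)
    return best


# Synthetic samples standing in for a trained edge function.
xs = [-2.0 + 4.0 * i / 99 for i in range(100)]
ys = [1.5 * math.sin(x) + 0.2 for x in xs]
name, a, b, sse = symbolify(xs, ys)
```

Applying this per edge and substituting the selected forms back into the network is what yields a closed-form surrogate; the residual gives a simple check on whether the substitution is justified.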

Practitioners are advised to:

  • Select the basis family and grid size to balance expressivity and parameter/memory constraints (e.g., cubic splines, Gaussian RBFs, Chebyshev/Fourier for spectral tasks).
  • Leverage regularization, appropriate optimizer settings, and grid extension to avoid overfitting and instability.
  • Employ domain decomposition and multi-scale strategies for heterogeneous or highly non-smooth targets.
  • Use model pruning, symbolification, and analytic regression post hoc to extract compact, transparent models.
  • Benchmark against MLPs and classic architectures using both accuracy metrics (MSE, RMSE, classification accuracy) and efficiency metrics (FLOPs, parameter count, inference time).
  • For hardware deployment, exploit alignment and sparsity-aware mapping and tailored quantization for resource-constrained platforms.

For comprehensive implementation and methodological guidance, frameworks and curated collections of open-source architectures and benchmarks, such as KANLib (Hoever et al., 16 Jun 2026) and A Practitioner's Guide (Noorizadegan et al., 28 Oct 2025), provide extensible support for research and application.


KANs thus represent a principled, expressive, and highly interpretable paradigm for neural modeling, directly connecting functional approximation theory to practical deep learning and spanning a rapidly expanding spectrum of architectures and application domains.
