Kolmogorov–Arnold Network (KAN)
- KAN is a neural network paradigm inspired by the Kolmogorov–Arnold theorem, decomposing multivariate functions into sums of learnable univariate mappings.
- Its architecture uses inner and outer univariate spline-based functions to achieve high expressivity, interpretability, and parameter efficiency.
- KANs excel in diverse applications such as function approximation, time series forecasting, vision, and scientific computing, supported by robust theoretical backing.
Kolmogorov–Arnold Network (KAN) is a neural-network paradigm directly motivated by the Kolmogorov–Arnold superposition theorem, which shows that any continuous multivariate function can be represented as the composition and summation of univariate functions. KANs depart fundamentally from conventional multilayer perceptrons (MLPs) by instantiating all weights as learnable, often spline-based univariate functions rather than fixed affine parameters. This approach endows KANs with strong expressivity, enhanced interpretability, and, in many settings, superior parameter and computational efficiency, as evidenced by both theoretical foundations and empirical performance across function approximation, time series analysis, operator learning, vision, scientific computing, and hardware deployment contexts.
1. Theoretical Foundation: Kolmogorov–Arnold Superposition
KANs are grounded in the Kolmogorov–Arnold representation theorem. For any continuous mapping $f : [0,1]^n \to \mathbb{R}$, there exist continuous univariate functions $\phi_{q,p}$ and $\Phi_q$ such that

$$f(x_1, \ldots, x_n) = \sum_{q=0}^{2n} \Phi_q\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right).$$

This result, established by Kolmogorov and refined by Arnold, guarantees that any $n$-variate continuous function reduces, with a known explicit upper bound of $2n+1$ outer terms, to a sum of compositions of univariate transformations (inner functions) and outer univariate nonlinearities. The KAN architecture faithfully mirrors this structure by replacing each edge in a network layer with a separate, learnable univariate mapping, precisely capturing the superposition required for universal approximation (Liu et al., 2024, Gaonkar et al., 15 Jan 2026, Ji et al., 2024).
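As a concrete illustration of the superposition form (a hand-constructed decomposition, not a trained KAN), the product $f(x, y) = xy$ can be written exactly as a sum of outer univariate functions applied to inner sums of univariate functions:

```python
import numpy as np

# Exact Kolmogorov–Arnold-style superposition of f(x, y) = x * y:
#   x * y = Phi_1(x + y) + Phi_2(x) + Phi_3(y)
# with outer functions Phi_1(z) = z^2 / 2 and Phi_2(z) = Phi_3(z) = -z^2 / 2,
# and identity (or zero) inner functions on each coordinate.

def f_superposition(x, y):
    phi_outer = [lambda z: 0.5 * z**2,    # applied to inner sum x + y
                 lambda z: -0.5 * z**2,   # applied to inner sum x
                 lambda z: -0.5 * z**2]   # applied to inner sum y
    inner_sums = [x + y, x, y]
    return sum(Phi(z) for Phi, z in zip(phi_outer, inner_sums))

rng = np.random.default_rng(0)
x, y = rng.uniform(-1, 1, size=(2, 1000))
assert np.allclose(f_superposition(x, y), x * y)
```

The expansion $(x+y)^2/2 - x^2/2 - y^2/2 = xy$ shows how a genuinely multivariate interaction is absorbed entirely into univariate nonlinearities, which is the structural trick KANs learn from data.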
2. Network Architecture and Model Parameterization
The canonical KAN implements each layer as a two-stage composition:
- For each output channel $q$, compute the inner sum $z_q = \sum_{p=1}^{n} \phi_{q,p}(x_p)$ via learnable univariate functions ("inner layer").
- Apply a flexible univariate mapping $\Phi_q$ ("outer layer") to $z_q$.
- Sum over $q$ to produce the network output.
Each $\phi_{q,p}$ and $\Phi_q$ is typically parameterized as a spline (e.g., a B-spline of degree $k$ on a grid of size $G$), Chebyshev polynomial, radial basis function, or other functional basis, with learnable coefficients. Modern KANs extend this architecture to multiple layers, deep compositions, and even hybrid structures, such as convolutional or attention-based KANs (Noorizadegan et al., 28 Oct 2025, Warin, 2024, Cang et al., 2024).
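A minimal sketch of one such layer follows, using a Gaussian RBF basis (FastKAN-style) rather than B-splines for brevity; the class name, shapes, and initialization are illustrative, not drawn from any cited implementation:

```python
import numpy as np

class KANLayer:
    """Minimal KAN layer sketch: every edge (p -> q) carries its own
    learnable univariate function, here parameterized by a Gaussian
    RBF basis (FastKAN-style) instead of B-splines for simplicity."""

    def __init__(self, n_in, n_out, n_basis=8, x_min=-1.0, x_max=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.centers = np.linspace(x_min, x_max, n_basis)  # shared 1D grid
        self.width = (x_max - x_min) / (n_basis - 1)
        # One coefficient vector per edge: shape (n_out, n_in, n_basis)
        self.coef = 0.1 * rng.standard_normal((n_out, n_in, n_basis))

    def basis(self, x):
        # x: (batch, n_in) -> (batch, n_in, n_basis) Gaussian bumps
        d = x[..., None] - self.centers
        return np.exp(-(d / self.width) ** 2)

    def forward(self, x):
        # y[b, q] = sum_p sum_k coef[q, p, k] * basis_k(x[b, p])
        return np.einsum("bpk,qpk->bq", self.basis(x), self.coef)

layer = KANLayer(n_in=3, n_out=2)
y = layer.forward(np.random.default_rng(1).uniform(-1, 1, (5, 3)))
print(y.shape)  # (5, 2)
```

Stacking such layers gives a deep KAN; swapping the `basis` method for B-spline, Chebyshev, or finite-element evaluations recovers the variants listed below.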
Parameter counts in KAN are determined by basis size, grid resolution, and network width. A KAN of depth $L$ and width $N$ carries $O(N^2 L (G + k))$ parameters, with $G$ the grid (basis) size and $k$ the spline degree, versus $O(N^2 L)$ for an MLP of the same shape; the net parameter efficiency over comparably expressive MLPs arises because a KAN typically matches a given functional capacity with substantially smaller $N$ and $L$ (Noorizadegan et al., 28 Oct 2025).
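A back-of-envelope comparison under the assumed per-edge cost of $G + k$ spline coefficients (the specific numbers below are arbitrary illustrations, not benchmark configurations):

```python
# Parameter-count arithmetic, assuming each KAN edge stores (G + k)
# spline coefficients (grid size G, spline degree k) while each MLP
# edge stores a single scalar weight.
N, L, G, k = 10, 3, 5, 3          # width, depth, grid size, spline degree
kan_params = L * N * N * (G + k)  # N*N edges per layer, (G + k) coefs per edge
mlp_params = L * N * N            # one weight per edge
print(kan_params, mlp_params)     # 2400 300
```

Per layer a KAN carries a factor $(G + k)$ more coefficients, so any end-to-end savings must come from reaching the target accuracy at a much smaller width and depth, as the scaling results in Section 5 suggest.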
Variants include:
- Standard KAN: univariate B-spline bases for all edge activations.
- FastKAN: radial basis functions or Gaussian approximations for fast evaluation.
- MatrixKAN: matrix-form evaluation of B-spline activations for parallelized inference and significant speedup at high degrees (Coffman et al., 11 Feb 2025).
- P1-KAN: fixed-support piecewise-linear finite-element basis for improved robustness on irregular functions (Warin, 2024).
- KKAN: inner univariate MLPs with outer basis expansions for universality and improved operator learning (Toscano et al., 2024).
- InfinityKAN: variational inference to adaptively grow the basis size within each univariate block during training (Alesiani et al., 3 Jul 2025).
3. Training Methodologies and Optimization
All KAN parameters—including basis coefficients, grid parameters, and any hyperparameters governing the composition—are trained jointly via backpropagation with gradient-based optimization (most commonly Adam, L-BFGS, or SGD with momentum) (Noorizadegan et al., 28 Oct 2025, Gaonkar et al., 15 Jan 2026). Regularization strategies leverage weight decay, smoothness penalties on spline second derivatives, dropout on spline knots or channels, and entropy-based sparsification to enhance generalization or interpretability (Noorizadegan et al., 28 Oct 2025, Cang et al., 2024).
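The following toy example sketches this recipe for a single univariate KAN edge function: gradient descent on a mean-squared error plus a second-difference smoothness penalty on the basis coefficients. The RBF basis, learning rate, and penalty weight are illustrative choices, not values from the cited works:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = np.sin(np.pi * x) + 0.05 * rng.standard_normal(200)  # noisy 1D target

# Univariate edge function phi(x) = sum_k c_k * exp(-((x - t_k) / w)^2)
centers = np.linspace(-1, 1, 12)
w = centers[1] - centers[0]
B = np.exp(-((x[:, None] - centers) / w) ** 2)  # (200, 12) design matrix
c = np.zeros(12)

# Second-difference matrix D: penalizing ||D c||^2 discourages wiggly fits,
# a cheap stand-in for a smoothness penalty on the spline second derivative.
D = np.diff(np.eye(12), n=2, axis=0)
lam, lr = 1e-3, 0.5
for _ in range(3000):
    resid = B @ c - y
    grad = B.T @ resid / len(x) + lam * D.T @ (D @ c)  # MSE + smoothness grads
    c -= lr * grad

mse = np.mean((B @ c - y) ** 2)
print(f"final MSE: {mse:.4f}")
```

In a full KAN every edge holds such a coefficient vector and all of them are updated jointly by backpropagation; here the loss is quadratic in `c`, so plain gradient descent converges to near the noise floor.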
For regression, the mean squared error is standard; for classification, cross-entropy loss is used. In physics-informed and operator learning applications, domain-specific losses—incorporating residuals of partial differential equations, physics constraints, or operator norms—are integrated (Kratsios et al., 21 Apr 2025). Adaptive schemes extend to grid refinement, basis selection, and, in InfinityKAN, direct variational optimization of basis cardinality (Alesiani et al., 3 Jul 2025).
KANs typically require careful tuning of grid size, spline degree, regularization weight, and, where applicable, the specific basis type to match the local or global regularity of the target, handle discontinuities, or control overfitting. Empirical scaling laws demonstrate superior neural scaling exponents (e.g., test loss decaying as $\ell \propto N^{-(k+1)}$ for order-$k$ splines in KAN, i.e., $N^{-4}$ for cubic splines, compared to lower exponents for MLPs) (Liu et al., 2024).
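The spline approximation rate behind these exponents can be checked numerically in the simplest case, $k = 1$ piecewise-linear interpolation, where doubling the grid resolution should cut the worst-case error by roughly $2^{k+1} = 4$:

```python
import numpy as np

# Grid-refinement check of the O(G^{-(k+1)}) spline approximation rate
# for the simplest case, k = 1: piecewise-linear interpolation of sin.
def max_interp_error(n_points):
    grid = np.linspace(0, np.pi, n_points)
    xs = np.linspace(0, np.pi, 10_000)
    return np.max(np.abs(np.interp(xs, grid, np.sin(grid)) - np.sin(xs)))

e1, e2 = max_interp_error(17), max_interp_error(33)  # 16 vs. 32 intervals
print(e1 / e2)  # ≈ 4, the predicted 2**(k+1) rate for k = 1
```

Higher-degree splines steepen this decay to $2^{k+1}$ per refinement, which is what drives the favorable scaling exponents reported for KANs.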
4. Applications Across Scientific and Engineering Domains
Function Approximation and Symbolic Regression
KANs have shown state-of-the-art results in function regression, discovering closed-form or symbolic laws across canonical benchmarks, outperforming MLP and even genetic programming baselines in formula discovery. Interpretable, formula-extractable representations from fitted splines enable direct domain insight (Jacobs et al., 27 Jan 2026, Liu et al., 2024).
Time Series Forecasting
Temporal-KAN (T-KAN), Multi-Task KAN (MT-KAN), and SigKAN have been deployed for single- and multi-task sequence modeling, outperforming LSTM or MLP baselines in financial volatility prediction and time-series classification (Dong et al., 2024, Gaonkar et al., 15 Jan 2026, Ji et al., 2024).
Graph Learning and Operator Regression
Graph KAN (GKAN) replaces fixed graph convolutional filters with edgewise univariate functions, achieving improved accuracy and parameter efficiency on node classification and large molecular datasets (Ji et al., 2024, Toscano et al., 2024).
Vision and Transfer Learning
Convolutional KAN (CKAN) and KAN-based output heads in CNNs enable improved accuracy or robustness under strict parameter budgets, though naive KAN deployments exhibit sensitivity to input noise that can be mitigated by smoothness regularization and "segment deactivation" dropout (Cang et al., 2024, Shen et al., 2024).
Physics-Informed Machine Learning and Scientific Discovery
KANs have been applied to partial differential equation solving, turbulence modeling, and quantum control, with extensions such as rKAN, SincKAN, and operator-KAN (DeepOKAN) tailored for singularities, sharp gradients, or operator-valued regression (Noorizadegan et al., 28 Oct 2025, Kratsios et al., 21 Apr 2025).
Hardware Acceleration and Edge Deployment
Algorithm–hardware co-design for KANs enables hardware-friendly spline evaluation, quantization, and crossbar implementation, with significant area and power savings established in large-scale KAN silicon prototypes (Huang et al., 7 Sep 2025).
5. Theoretical Analysis, Approximation, and Generalization
KANs are proven universal approximators, with the ability to approximate any continuous function on $[0,1]^n$ to arbitrary accuracy with $2n+1$ hidden units if inner/outer univariate bases are sufficiently rich (Liu et al., 2024, Toscano et al., 2024). For smooth functions in Besov or Sobolev spaces, KAN achieves optimal nonlinear approximation rates regardless of ambient dimension, outperforming MLPs in high-dimensional regimes (Kratsios et al., 21 Apr 2025). Approximation error decays as $O(G^{-(k+1)})$ for degree-$k$ splines on a grid of size $G$, with error asymptotically independent of the input dimension $n$ under compositional structural assumptions.
Data-dependent generalization bounds have been established for deep KANs using norms of basis coefficients and layer-wise Lipschitz constants, showing only logarithmic dependence on network width and parameter count, in contrast to polynomial dependencies characteristic of MLPs (Zhang et al., 2024). In low-noise regimes and with appropriate regularization, these bounds are predictive of practical generalization gaps throughout training.
6. Interpretability, Symbolic Extraction, and Robustness
A defining feature of KANs is explicit interpretability: all nonlinear processing is restricted to univariate spline (or similar) blocks, which are directly visualizable, can be sparsified or pruned for simplicity, and mapped to human-interpretable formulas via symbolic regression. This enables transparent scientific collaboration and automated discovery of underlying laws, notably in materials science and physics (Liu et al., 2024, Jacobs et al., 27 Jan 2026, Novkin et al., 19 Mar 2025). Adversarial robustness is enhanced by KANs' controlled Lipschitz constants and smooth activation structure, with empirical evidence showing improved resistance relative to MLPs in time series and vision tasks (Dong et al., 2024, Cang et al., 2024).
7. Challenges, Research Directions, and Limitations
Key limitations include higher per-parameter evaluation costs due to spline or polynomial basis computation, sensitivity to hyperparameters (basis size, grid, regularization), and difficulties modeling highly discontinuous or topologically complex targets. Training stability and convergence speed can suffer in very high dimensions or with large grids (Warin, 2024, Coffman et al., 11 Feb 2025). However, ongoing developments in architecture (FastKAN, P1-KAN, MatrixKAN), optimization (adaptive basis growth, variational approaches), regularization, and hardware-algorithm co-design are rapidly advancing the practical scalability and theoretical understanding of KANs (Noorizadegan et al., 28 Oct 2025, Alesiani et al., 3 Jul 2025, Huang et al., 7 Sep 2025).
Open research areas include standardized KAN component libraries, predictive design theory for basis selection and mixed-composition networks, optimization theory beyond neural tangent kernels, regularization for generalization control, deeper analysis of interpretability and identifiability, and expanded domains such as quantum-classical hybrids and operator learning for complex scientific systems (Noorizadegan et al., 28 Oct 2025, Ji et al., 2024).
KANs thus provide a universal, interpretable, and highly flexible framework for machine learning and scientific modeling, combining the power of classical approximation theory with the engineering advantages of modern neural architectures.