Kolmogorov-Arnold Networks
- Kolmogorov-Arnold Networks are neural architectures that decompose multivariate functions using adaptive univariate functions based on the Kolmogorov–Arnold theorem.
- KANs employ diverse basis functions like splines, radial basis, and sinusoidal forms to achieve high accuracy and parameter efficiency.
- KAN frameworks integrate with physics-informed methods, operator learning, graph networks, and time-series models, offering robust alternatives to traditional MLPs.
Kolmogorov-Arnold Networks (KANs) are a family of neural architectures directly inspired by the Kolmogorov–Arnold representation theorem, which guarantees that any continuous multivariate function can be decomposed into a finite sum of univariate functions composed and added in a prescribed structure. This principle fundamentally reorients the design of neural function approximators by shifting the locus of learnability from the weights of traditional feedforward networks to parameterized univariate functions on the edges (connections) of the architecture. The KAN framework encompasses a wide spectrum of adaptations, including spline-based, radial basis, polynomial, sinusoidal, and variational formulations, with proven advantages in accuracy, interpretability, and parameter efficiency across data-driven, physics-informed, and operator learning paradigms.
1. Mathematical and Theoretical Foundations
The core of the KAN paradigm is the Kolmogorov–Arnold superposition theorem, which, in its most utilized form for KANs, asserts that every continuous function $f : [0,1]^n \to \mathbb{R}$ admits the representation
$$ f(x_1, \dots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right), $$
where each $\Phi_q$ and each $\phi_{q,p}$ is a continuous univariate function. KANs realize this structural decomposition, replacing the traditional matrix-vector multiplication and fixed activation of multilayer perceptrons with learnable univariate transformations via adaptive basis functions (e.g., splines, polynomials, radial basis, sinusoidal functions) (Liu et al., 30 Apr 2024, Basina et al., 15 Nov 2024, Gleyzer et al., 1 Aug 2025).
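As a concrete illustration of this decomposition (a standard textbook example rather than one drawn from the cited works), multiplication of two positive inputs already has the required form of an outer univariate function applied to a sum of inner univariate functions:
$$ f(x_1, x_2) = x_1 x_2 = \exp\bigl(\ln x_1 + \ln x_2\bigr), \qquad x_1, x_2 > 0, $$
so a KAN fitting this target only needs to learn accurate univariate approximations of $\ln$ on the inner edges and $\exp$ on the outer edge.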
In the KAN architecture, each edge between nodes is associated with its own function parameterized as a linear combination of suitable bases, such as cubic B-splines. The nodes themselves execute simple summation over incoming edges. This approach, compared to MLPs, is justified theoretically by both (i) representation theorems showing all continuous functions are within the model class, and (ii) analyses demonstrating that the approximation and generalization bounds scale with function smoothness and grid resolution rather than the input’s ambient dimension (Zhang et al., 10 Oct 2024, Kratsios et al., 21 Apr 2025).
2. Architectural Design and Activations
The minimal KAN instantiation is two-layered: inputs are transformed by inner univariate functions, aggregated, and then remapped by outer univariate functions before final summation. This is generalized to deeper and wider networks for modern tasks (Liu et al., 30 Apr 2024, Basina et al., 15 Nov 2024).
A canonical layer’s edge function is
$$ \phi(x) = w_b\, b(x) + w_s\, \mathrm{spline}(x), $$
where $\mathrm{spline}(x)$ is a spline expansion and $b(x)$ may be a simple nonlinearity such as SiLU. Spline activations are commonly represented as
$$ \mathrm{spline}(x) = \sum_i c_i\, B_i(x), $$
with $B_i$ a B-spline of chosen order and the coefficients $c_i$ trainable.
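To make the edge-function construction concrete, the following is a minimal NumPy sketch of a single spline-based KAN layer implementing $\phi(x) = w_b\,\mathrm{SiLU}(x) + w_s \sum_i c_i B_i(x)$ on every edge, with nodes summing their incoming edges. The grid layout, initialization scale, and the Cox–de Boor basis routine are illustrative assumptions, not the reference implementation of any cited work.

```python
# Minimal NumPy sketch of one spline-based KAN layer: every edge applies
# phi(x) = w_b * SiLU(x) + w_s * sum_i c_i * B_i(x), and each output node sums its edges.
import numpy as np


def bspline_basis(x, knots, k):
    """All order-k B-spline bases on `knots` evaluated at x via the Cox-de Boor recursion.
    x: (N,) points; knots: (G + 2k + 1,) uniform knot vector; returns (N, G + k)."""
    x = x[:, None]
    # Degree-0 bases: indicators of each knot interval (zero for points outside the grid).
    B = ((x >= knots[:-1]) & (x < knots[1:])).astype(float)
    for d in range(1, k + 1):
        left = (x - knots[:-(d + 1)]) / (knots[d:-1] - knots[:-(d + 1)]) * B[:, :-1]
        right = (knots[d + 1:] - x) / (knots[d + 1:] - knots[1:-d]) * B[:, 1:]
        B = left + right
    return B


def silu(x):
    return x / (1.0 + np.exp(-x))


class SplineKANLayer:
    """Each of the n_in * n_out edges carries its own univariate spline; nodes only sum."""

    def __init__(self, n_in, n_out, grid_size=5, spline_order=3, x_range=(-1.0, 1.0), seed=None):
        rng = np.random.default_rng(seed)
        self.k = spline_order
        h = (x_range[1] - x_range[0]) / grid_size
        # Uniform knot vector extended by `spline_order` knots on each side.
        self.knots = x_range[0] + h * np.arange(-spline_order, grid_size + spline_order + 1)
        n_basis = grid_size + spline_order
        self.coef = rng.normal(0.0, 0.1, size=(n_in, n_out, n_basis))  # spline coefficients c_i
        self.w_b = np.ones((n_in, n_out))                              # residual (SiLU) weights w_b
        self.w_s = np.ones((n_in, n_out))                              # spline weights w_s

    def forward(self, x):
        """x: (batch, n_in) -> (batch, n_out). Inputs outside the grid get only the residual term."""
        basis = np.stack([bspline_basis(x[:, j], self.knots, self.k)
                          for j in range(x.shape[1])], axis=1)          # (batch, n_in, n_basis)
        spline_part = np.einsum('bip,iop->bio', basis, self.coef)       # per-edge spline outputs
        edge_out = self.w_b * silu(x)[:, :, None] + self.w_s * spline_part
        return edge_out.sum(axis=1)                                     # each node sums its incoming edges


# Usage: a two-layer [2 -> 5 -> 1] KAN evaluated on random inputs in [-1, 1].
if __name__ == "__main__":
    net = [SplineKANLayer(2, 5, seed=0), SplineKANLayer(5, 1, seed=1)]
    x = np.random.default_rng(2).uniform(-1.0, 1.0, size=(4, 2))
    for layer in net:
        x = layer.forward(x)
    print(x.shape)  # (4, 1)
```

Stacking such layers gives the deeper networks described above; in this sketch the trainable quantities are the spline coefficients `coef` and the edge weights `w_b` and `w_s`.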
Alternative KAN formulations replace splines with other bases:
- Sinusoidal activations (Gleyzer et al., 1 Aug 2025): edge activations built from sums of sinusoids, $\phi(x) = \sum_k a_k \sin(\omega_k x + \varphi_k)$, with learnable frequencies $\omega_k$.
- Radial Basis (Li, 10 May 2024): Spline bases are replaced by appropriately parameterized Gaussian RBFs (a minimal sketch of this basis swap follows this list).
- Chebyshev polynomials, wavelets, and others appear in operator and physics-informed variants (Faroughi et al., 30 Jul 2025).
- Variational KANs (InfinityKAN) treat the number of basis functions as a latent variable, adaptively selected during learning by variational inference (Alesiani et al., 3 Jul 2025).
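As an illustration of how easily the basis can be swapped (e.g., for the Gaussian-RBF variant noted above), here is a hedged sketch of an RBF feature map that could stand in for `bspline_basis` in the previous snippet; the number of centers and the bandwidth are illustrative choices rather than details from the cited work.

```python
# Hedged sketch of a Gaussian-RBF feature map that can replace `bspline_basis`
# in the layer above (FastKAN-style); centers and bandwidth are illustrative choices.
import numpy as np


def rbf_basis(x, centers, bandwidth):
    """x: (N,) points; centers: (M,) RBF centers; returns an (N, M) feature matrix."""
    return np.exp(-(((x[:, None] - centers[None, :]) / bandwidth) ** 2))


# Example: 8 evenly spaced centers on [-1, 1], bandwidth equal to the center spacing.
centers = np.linspace(-1.0, 1.0, 8)
features = rbf_basis(np.array([-0.5, 0.0, 0.5]), centers, bandwidth=centers[1] - centers[0])
print(features.shape)  # (3, 8)
```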
KANs are often integrated as edge-wise modules within more complex architectures (e.g., autoencoders (Moradi et al., 2 Oct 2024), graph networks (Kiamari et al., 10 Jun 2024), time-series models (Liu et al., 15 Jan 2025), or operator learners (Toscano et al., 21 Dec 2024)).
3. Theoretical Analysis: Approximation and Generalization
KANs are universal approximators, inheriting this property from the Kolmogorov–Arnold theorem and reinforced in network settings through constructive proofs. Recent works establish that KANs can optimally approximate any function in a Besov space $B^{s}_{p,q}$ at the corresponding rates in weaker Besov norms, even on fractal domains (Kratsios et al., 21 Apr 2025). The error rate of a spline-based KAN with order-$k$ splines for a sufficiently smooth target $f$ is
$$ \| f - f_{\mathrm{KAN}} \|_{\infty} = O\!\bigl(G^{-(k+1)}\bigr), $$
with $G$ the grid resolution (number of spline knots).
The generalization bounds for KANs, whether with basis expansions or RKHS-based activations, scale with norms of the coefficient matrices, the Lipschitz constants of the activations (per layer), and (in low-rank cases) the product of effective layer ranks, but critically not with the number of nodes beyond logarithmic factors (Zhang et al., 10 Oct 2024). This supports the empirical finding that KANs can be more parameter-efficient and robust to overfitting than comparably performant MLPs.
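As a schematic way to quantify this parameter efficiency (under the usual assumption of order-$k$ B-splines on $G$ grid intervals per edge; a back-of-the-envelope count rather than a bound quoted from the cited works), a depth-$L$, width-$N$ spline KAN has
$$ \#\text{parameters} = O\!\bigl(N^2 L\,(G + k)\bigr), $$
so, combined with the grid-resolution error rate above, accuracy is refined by enlarging $G$ at a cost that grows only linearly in $G$, with no exponential dependence on the input dimension.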
4. Empirical Performance and Applications
KANs consistently demonstrate advantages over MLPs, and in many cases also over convolutional networks, in a variety of domains:
- Function regression and PDE solving: Compact KANs outperform significantly larger MLPs in accuracy and parameter efficiency (Liu et al., 30 Apr 2024, Kratsios et al., 21 Apr 2025). In PDE scenarios, PIKAN formulations exploit the flexible basis to encode physical laws and constraints (Faroughi et al., 30 Jul 2025).
- Graph learning: GKANs replace fixed-weight edge transformations in GCNs with edge-adaptive univariate functions, achieving higher accuracy on node classification tasks with similar parameter counts (Kiamari et al., 10 Jun 2024).
- Time series and causal inference: Time-KAN and KANGCI introduce autoregressive and sparsity-penalized variants for tasks such as Granger causality in nonlinear and high-dimensional time series (Liu et al., 15 Jan 2025).
- Operator learning: Combined with DeepONet-type architectures, KAN and KKAN backbones show improved accuracy and convergence for learning mappings between function spaces (Toscano et al., 21 Dec 2024).
- Industrial inspection: KANs, leveraging spline-based function approximation, offer improved test accuracy and parameter efficiency in defect classification from images, as observed on the NEU and Severstal datasets (Krzywda et al., 10 Jan 2025).
KANs also provide interpretability, as learned univariate activation functions can be visualized or even "snapped" to symbolic expressions, supporting human-in-the-loop model development (Liu et al., 30 Apr 2024).
5. Training Dynamics and Implementation Considerations
KANs introduce nuances in training owing to their high parameterization per unit and strong local adaptivity:
- Initialization and Optimization: Kaiming-Normal initialization, adaptive optimizers (e.g., Adam), and small learning rates are recommended for stability (Sohail, 8 Nov 2024).
- Overfitting and Regularization: KANs can overfit early and are highly sensitive to initialization and learning rate choice. Incorporating dropout or complexity penalties (as informed by theoretical bounds) improves generalization (Zhang et al., 10 Oct 2024, Sohail, 8 Nov 2024).
- Training Dynamics: The "diffusion" training phase (as elucidated by information bottleneck theory in KKANs) is associated with high signal-to-noise ratio and optimal generalization (Toscano et al., 21 Dec 2024).
- Efficiency: Spline-based KANs can incur higher computational costs due to spline evaluations; FastKAN replaces splines with Gaussian RBFs for faster inference (Li, 10 May 2024). Other speedups utilize precomputed lookup tables for spline and derivative evaluation (Nagai et al., 25 Jul 2024), as sketched after this list, and ensemble/boosting-inspired methods for probabilistic output estimation (Polar et al., 2021).
- Scalability: While error scaling is favorable (dimension-independent under smoothness assumptions), practical scalability depends on efficient kernel and basis representations, hardware support, and exploiting separability (Basina et al., 15 Nov 2024, Faroughi et al., 30 Jul 2025).
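To illustrate the lookup-table speedup referenced above, here is a minimal sketch of the idea under illustrative assumptions: `phi` below is an arbitrary stand-in for a trained edge activation, and the table size is a hedged choice, not a detail of the cited implementation.

```python
# Hedged sketch of the lookup-table idea: tabulate a trained univariate edge function once
# on a fine grid, then evaluate it at inference time by linear interpolation instead of
# re-evaluating the spline basis.
import numpy as np


def phi(x):
    """Stand-in for a trained spline edge function."""
    return 0.5 * x ** 3 - x + np.sin(2.0 * x)


# Precompute the table once after training.
table_x = np.linspace(-2.0, 2.0, 4096)
table_y = phi(table_x)


def phi_fast(x):
    """Table lookup with linear interpolation (np.interp clamps outside the table range)."""
    return np.interp(x, table_x, table_y)


x = np.random.default_rng(0).uniform(-2.0, 2.0, size=10_000)
print(np.max(np.abs(phi_fast(x) - phi(x))))  # small interpolation error with a 4096-point table
```

A second table for the derivative handles gradient evaluation in the same spirit.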
Variational and adaptive approaches (e.g., InfinityKAN) allow the number of bases in each univariate activation to be learned automatically, mitigating a key design challenge (Alesiani et al., 3 Jul 2025).
6. Variants, Extensions, and Future Research
The KAN framework has seen rapid diversification, including:
- Multifidelity KAN (MFKAN): Decomposes model learning into low- and high-fidelity blocks, with linear and nonlinear corrections, useful for integrating coarse simulations and high-precision data, and enabling data-lean physics-informed learning (Howard et al., 18 Oct 2024).
- KKAN (Kurkova-Kolmogorov-Arnold Network): Two-block structures with deep MLPs for inner function learning and flexible basis-combination outer blocks; demonstrates robust performance in regression and operator learning (Toscano et al., 21 Dec 2024).
- Sinusoidal and wavelet KANs: These replace spline bases with oscillatory functions, validated by universal approximation theorems and empirical success on rapidly varying functions (Gleyzer et al., 1 Aug 2025).
- Probabilistic KANs: Methods such as Divisive Data Re-sorting (DDR) provide empirical output distributions for aleatoric uncertainty quantification (Polar et al., 2021).
Open research directions identified in the surveyed literature include:
- Theoretical extension of representation and scaling laws for deeper architectures (Liu et al., 30 Apr 2024, Basina et al., 15 Nov 2024).
- Automated basis selection and adaptive hyperparameter tuning for efficiency and generalization (Alesiani et al., 3 Jul 2025, Faroughi et al., 30 Jul 2025).
- Numerical stability and full integration into mainstream deep learning libraries (Faroughi et al., 30 Jul 2025).
- Development of meta-models (e.g., "kansformers") to transfer KAN strengths (adaptivity and interpretability) to Transformer-like architectures (Liu et al., 30 Apr 2024).
- Enhanced residual and attention-based mechanisms for optimization and signal preservation (Toscano et al., 21 Dec 2024).
7. Comparative Analysis and Position within Machine Learning
KANs stand apart from MLPs by aligning network structure with the intrinsic compositionality of the function class, enabling dimension-agnostic error scaling and vastly superior interpretability (Basina et al., 15 Nov 2024, Kratsios et al., 21 Apr 2025). Empirical studies suggest that KANs can often achieve a desired accuracy with orders of magnitude fewer parameters than MLPs, with added benefits in spectral representation and robustness in scientific and engineering tasks (Faroughi et al., 30 Jul 2025, Liu et al., 30 Apr 2024).
However, challenges remain regarding computational cost, hyperparameter selection, and training instability in high-dimensional or noisy domains. The theoretical impetus for KANs has inspired a broad wave of hybrid and domain-adapted architectures, and ongoing research focuses on optimizing their integration and generalization properties within both scientific machine learning and broader deep learning contexts.