Kolmogorov-Arnold Networks

Updated 28 August 2025
  • Kolmogorov-Arnold Networks are neural architectures that decompose multivariate functions using adaptive univariate functions based on the Kolmogorov–Arnold theorem.
  • KANs employ diverse basis functions, such as splines, radial basis functions, and sinusoids, to achieve high accuracy and parameter efficiency.
  • KAN frameworks integrate with physics-informed methods, operator learning, graph networks, and time-series models, offering robust alternatives to traditional MLPs.

Kolmogorov-Arnold Networks (KANs) are a family of neural architectures directly inspired by the Kolmogorov–Arnold representation theorem, which guarantees that any continuous multivariate function can be decomposed into a finite sum of univariate functions composed and added in a prescribed structure. This principle fundamentally reorients the design of neural function approximators by shifting the locus of learnability from the weights of traditional feedforward networks to parameterized univariate functions on the edges (connections) of the architecture. The KAN framework encompasses a wide spectrum of adaptations, including spline-based, radial basis, polynomial, sinusoidal, and variational formulations, with proven advantages in accuracy, interpretability, and parameter efficiency across data-driven, physics-informed, and operator learning paradigms.

1. Mathematical and Theoretical Foundations

The core of the KAN paradigm is the Kolmogorov–Arnold superposition theorem, which, in its most utilized form for KANs, asserts:

$$f(x_1, \ldots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\!\left( \sum_{p=1}^{n} \psi_{qp}(x_p) \right)$$

where each $\psi_{qp}$ and $\Phi_q$ is a continuous univariate function. KANs realize this structural decomposition, replacing the traditional matrix-vector multiplication and fixed activation of multilayer perceptrons with learnable univariate transformations via adaptive basis functions (e.g., splines, polynomials, radial basis, sinusoidal functions) (Liu et al., 30 Apr 2024, Basina et al., 15 Nov 2024, Gleyzer et al., 1 Aug 2025).
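
A simple, standard illustration of this compositional structure (a textbook-style example, not drawn from the cited papers) is exact multiplication of positive inputs, expressed as an outer univariate function applied to a sum of inner univariate functions:

$$f(x_1, x_2) = x_1 x_2 = \exp\!\left(\ln x_1 + \ln x_2\right), \qquad x_1, x_2 > 0$$

In a KAN, such inner and outer maps are not fixed closed forms but are learned from data via the basis expansions described below.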

In the KAN architecture, each edge between nodes is associated with its own function parameterized as a linear combination of suitable bases, such as cubic B-splines. The nodes themselves execute simple summation over incoming edges. This approach, compared to MLPs, is justified theoretically by both (i) representation theorems showing all continuous functions are within the model class, and (ii) analyses demonstrating that the approximation and generalization bounds scale with function smoothness and grid resolution rather than the input’s ambient dimension (Zhang et al., 10 Oct 2024, Kratsios et al., 21 Apr 2025).

2. Architectural Design and Activations

The minimal KAN instantiation is two-layered: inputs are transformed by inner univariate functions, aggregated, and then remapped by outer univariate functions before final summation. This is generalized to deeper and wider networks for modern tasks (Liu et al., 30 Apr 2024, Basina et al., 15 Nov 2024).

A canonical layer’s edge function is:

$$\varphi(x) = w_b \cdot b(x) + w_s \cdot S(x)$$

where $S(x)$ is a spline expansion and $b(x)$ may be a simple nonlinearity such as SiLU. Spline activations are commonly represented as

$$S(x) = \sum_{i=1}^{n} c_i B_i(x)$$

with $B_i(x)$ a B-spline of chosen order and $\{c_i\}$ trainable.
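
To make this parameterization concrete, below is a minimal sketch of a spline-based KAN layer in PyTorch, with a SiLU base term, a uniform knot grid, and trainable B-spline coefficients on each edge. The class name `SplineKANLayer`, the grid range, and the default hyperparameters are illustrative assumptions, not any paper's reference implementation.

```python
import torch
import torch.nn as nn


class SplineKANLayer(nn.Module):
    """Minimal KAN layer sketch: each edge computes w_b * silu(x) + w_s * sum_i c_i B_i(x)."""

    def __init__(self, in_dim, out_dim, grid_size=8, spline_order=3, grid_range=(-1.0, 1.0)):
        super().__init__()
        self.in_dim, self.out_dim, self.k = in_dim, out_dim, spline_order
        # Uniform knot grid over grid_range, extended by `spline_order` knots on each side.
        h = (grid_range[1] - grid_range[0]) / grid_size
        grid = torch.arange(-spline_order, grid_size + spline_order + 1) * h + grid_range[0]
        self.register_buffer("grid", grid)
        n_basis = grid_size + spline_order
        # One spline coefficient vector and one (w_b, w_s) pair per edge (in_dim x out_dim edges).
        self.coef = nn.Parameter(0.1 * torch.randn(in_dim, out_dim, n_basis))
        self.w_b = nn.Parameter(torch.ones(in_dim, out_dim))
        self.w_s = nn.Parameter(torch.ones(in_dim, out_dim))

    def b_splines(self, x):
        # x: (batch, in_dim) -> B-spline bases of shape (batch, in_dim, n_basis) via Cox-de Boor recursion.
        g = self.grid
        x = x.unsqueeze(-1)
        bases = ((x >= g[:-1]) & (x < g[1:])).to(x.dtype)  # degree-0 indicator bases
        for k in range(1, self.k + 1):
            left = (x - g[: -(k + 1)]) / (g[k:-1] - g[: -(k + 1)]) * bases[..., :-1]
            right = (g[k + 1 :] - x) / (g[k + 1 :] - g[1:-k]) * bases[..., 1:]
            bases = left + right
        return bases

    def forward(self, x):
        # Base (SiLU) term plus spline term on every edge; each output node sums its incoming edges.
        silu = torch.nn.functional.silu(x)                                   # (batch, in_dim)
        spline = torch.einsum("bik,iok->bio", self.b_splines(x), self.coef)  # (batch, in_dim, out_dim)
        edge = self.w_b * silu.unsqueeze(-1) + self.w_s * spline
        return edge.sum(dim=1)


# Usage: a small two-layer KAN mapping R^2 -> R (inputs kept inside the grid range [-1, 1];
# outside the grid only the SiLU base term contributes, which is acceptable for a sketch).
model = nn.Sequential(SplineKANLayer(2, 5), SplineKANLayer(5, 1))
y = model(torch.rand(16, 2) * 2 - 1)
```

Note that each edge carries its own coefficient vector, so the parameter count scales with (input width) × (output width) × (number of basis functions), which is what grid-resolution choices trade off against accuracy.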

Alternative KAN formulations replace splines with other bases:

  • Sinusoidal activations (Gleyzer et al., 1 Aug 2025): $f(x) \approx \sum_{k=0}^{N} A_k \sin(\omega_k x + \phi_k)$, with learnable frequencies.
  • Radial basis functions (Li, 10 May 2024): Spline bases are replaced by appropriately parameterized Gaussian RBFs; a minimal RBF-based sketch appears after this list.
  • Chebyshev polynomials, wavelets, and others appear in operator and physics-informed variants (Faroughi et al., 30 Jul 2025).
  • Variational KANs (InfinityKAN) treat the number of basis functions as a latent variable, adaptively selected during learning by variational inference (Alesiani et al., 3 Jul 2025).
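
As referenced in the list above, here is a minimal sketch of the radial-basis variant, where the spline expansion $S(x)$ is swapped for Gaussian RBFs on a fixed grid of centers, in the spirit of FastKAN (Li, 10 May 2024). The class name, bandwidth heuristic, and defaults are assumptions for illustration, not the paper's exact implementation.

```python
import torch
import torch.nn as nn


class RBFKANLayer(nn.Module):
    """KAN layer sketch with the spline expansion replaced by Gaussian RBFs (FastKAN-style)."""

    def __init__(self, in_dim, out_dim, num_centers=8, grid_range=(-1.0, 1.0)):
        super().__init__()
        self.register_buffer("centers", torch.linspace(grid_range[0], grid_range[1], num_centers))
        # Bandwidth tied to center spacing -- a simple heuristic assumed here for illustration.
        self.gamma = ((num_centers - 1) / (grid_range[1] - grid_range[0])) ** 2
        self.coef = nn.Parameter(0.1 * torch.randn(in_dim, out_dim, num_centers))
        self.w_b = nn.Parameter(torch.ones(in_dim, out_dim))
        self.w_s = nn.Parameter(torch.ones(in_dim, out_dim))

    def forward(self, x):
        # x: (batch, in_dim); Gaussian bases: (batch, in_dim, num_centers). No recursion is needed,
        # which is the main source of the speedup over spline evaluation.
        bases = torch.exp(-self.gamma * (x.unsqueeze(-1) - self.centers) ** 2)
        spline = torch.einsum("bik,iok->bio", bases, self.coef)
        edge = self.w_b * torch.nn.functional.silu(x).unsqueeze(-1) + self.w_s * spline
        return edge.sum(dim=1)
```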

KANs are often integrated as edge-wise modules within more complex architectures (e.g., autoencoders (Moradi et al., 2 Oct 2024), graph networks (Kiamari et al., 10 Jun 2024), time-series models (Liu et al., 15 Jan 2025), or operator learners (Toscano et al., 21 Dec 2024)).

3. Theoretical Analysis: Approximation and Generalization

KANs are universal approximators, inheriting this property from the Kolmogorov–Arnold theorem and reinforced in network settings through constructive proofs. Recent works establish that KANs can optimally approximate any function in the Besov space $B^s_{p,q}(\mathcal{X})$ at corresponding rates in weaker Besov norms, even on fractal domains (Kratsios et al., 21 Apr 2025). The error rate of a spline-based KAN for a $C^{k+1}$ target is:

$$\|f - (\Phi^G_{L-1} \circ \cdots \circ \Phi^G_0)(x)\|_{C^m} \leq C\, G^{-(k+1)+m}$$

with $G$ the grid resolution (number of spline knots).
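
As a concrete reading of this rate (a worked illustration, not a result from the cited paper): with cubic splines ($k = 3$) and error measured in the sup norm ($m = 0$), the bound scales as $G^{-4}$, so doubling the grid resolution shrinks the guaranteed error by a factor of

$$\frac{C\,(2G)^{-(k+1)+m}}{C\,G^{-(k+1)+m}} = 2^{-(k+1)+m} = 2^{-4} = \frac{1}{16}.$$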

The generalization bounds for KANs, whether with basis expansions or RKHS-based activations, scale with the $\ell_1$-norm of coefficient matrices, Lipschitz constants of the activations (per layer), and (in low-rank cases) the product of effective layer ranks, but critically not with the number of nodes outside of logarithmic factors (Zhang et al., 10 Oct 2024). This supports the empirical finding that KANs can be more parameter-efficient and robust to overfitting than comparably performant MLPs.

4. Empirical Performance and Applications

KANs consistently demonstrate advantages over MLPs, and in many cases also over convolutional networks, in a variety of domains:

  • Function regression and PDE solving: Compact KANs outperform significantly larger MLPs in accuracy and parameter efficiency (Liu et al., 30 Apr 2024, Kratsios et al., 21 Apr 2025). In PDE scenarios, PIKAN formulations exploit the flexible basis to encode physical laws and constraints (Faroughi et al., 30 Jul 2025); a minimal physics-informed loss sketch follows this list.
  • Graph learning: GKANs replace fixed-weight edge transformations in GCNs with edge-adaptive univariate functions, achieving higher accuracy on node classification tasks with similar parameter counts (Kiamari et al., 10 Jun 2024).
  • Time series and causal inference: Time-KAN and KANGCI introduce autoregressive and sparsity-penalized variants for tasks such as Granger causality in nonlinear and high-dimensional time series (Liu et al., 15 Jan 2025).
  • Operator learning: Combined with DeepONet-type architectures, KAN and KKAN backbones show improved accuracy and convergence for learning mappings between function spaces (Toscano et al., 21 Dec 2024).
  • Industrial inspection: KANs, leveraging spline-based function approximation, offer improved test accuracy and parameter efficiency in defect classification from images, as observed in NEU and Severstal datasets (Krzywda et al., 10 Jan 2025).
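
As flagged in the first bullet, the following is a minimal physics-informed residual loss sketch for a 1-D Poisson problem. A small MLP stands in for the network so the snippet is self-contained; in a PIKAN the same loss would be applied to a KAN model, and the specific equation, collocation scheme, and hyperparameters are illustrative assumptions rather than the setup of the cited work.

```python
import math
import torch
import torch.nn as nn

# 1-D Poisson problem u''(x) = f(x) on (0, 1) with u(0) = u(1) = 0; the exact solution is
# u(x) = sin(pi * x). A tiny MLP stands in for the network to keep the sketch self-contained.
net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
f = lambda x: -(math.pi ** 2) * torch.sin(math.pi * x)


def pde_loss(model, n_collocation=128):
    x = torch.rand(n_collocation, 1, requires_grad=True)
    u = model(x)
    du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    d2u = torch.autograd.grad(du, x, torch.ones_like(du), create_graph=True)[0]
    residual = d2u - f(x)                           # interior PDE residual
    boundary = model(torch.tensor([[0.0], [1.0]]))  # Dirichlet boundary values should be zero
    return residual.pow(2).mean() + boundary.pow(2).mean()


opt = torch.optim.Adam(net.parameters(), lr=5e-4)
for step in range(2000):
    opt.zero_grad()
    loss = pde_loss(net)
    loss.backward()
    opt.step()
```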

KANs also provide interpretability, as learned univariate activation functions can be visualized or even "snapped" to symbolic expressions, supporting human-in-the-loop model development (Liu et al., 30 Apr 2024).
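
As a rough illustration of this "snapping" step (a sketch under assumptions, not the symbolic-regression utility of any cited library), one can sample a learned univariate activation on a grid and select the best-fitting candidate symbolic form by least squares:

```python
import numpy as np
from scipy.optimize import curve_fit

# Candidate symbolic forms to "snap" a learned univariate activation to (an illustrative list).
CANDIDATES = {
    "a*sin(b*x) + c": lambda x, a, b, c: a * np.sin(b * x) + c,
    "a*x**2 + b*x + c": lambda x, a, b, c: a * x**2 + b * x + c,
    "a*exp(b*x) + c": lambda x, a, b, c: a * np.exp(b * x) + c,
}


def snap_to_symbolic(activation, x_min=-1.0, x_max=1.0, n=256):
    """Sample a learned 1-D activation on a grid and return the best-fitting symbolic candidate."""
    xs = np.linspace(x_min, x_max, n)
    ys = np.array([activation(x) for x in xs])
    best_name, best_r2 = None, -np.inf
    for name, fn in CANDIDATES.items():
        try:
            params, _ = curve_fit(fn, xs, ys, maxfev=5000)
        except RuntimeError:
            continue  # fit did not converge; skip this candidate
        r2 = 1.0 - np.sum((ys - fn(xs, *params)) ** 2) / np.sum((ys - ys.mean()) ** 2)
        if r2 > best_r2:
            best_name, best_r2 = name, r2
    return best_name, best_r2


# Usage on a toy "learned" activation (a noisy sine stands in for a trained KAN edge function).
name, r2 = snap_to_symbolic(lambda x: 0.8 * np.sin(1.5 * x) + 0.02 * np.random.randn())
print(name, round(r2, 3))
```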

5. Training Dynamics and Implementation Considerations

KANs introduce nuances in training owing to their high parameterization per unit and strong local adaptivity:

  • Initialization and Optimization: Kaiming-Normal initialization, adaptive optimizers (e.g., Adam), and small learning rates ($\leq 5 \cdot 10^{-4}$) are recommended for stability (Sohail, 8 Nov 2024); a minimal training-setup sketch follows this list.
  • Overfitting and Regularization: KANs can overfit early and are highly sensitive to initialization and learning rate choice. Incorporating dropout or complexity penalties (as informed by theoretical bounds) improves generalization (Zhang et al., 10 Oct 2024, Sohail, 8 Nov 2024).
  • Training Dynamics: The "diffusion" training phase (as elucidated by information bottleneck theory in KKANs) is associated with high signal-to-noise ratio and optimal generalization (Toscano et al., 21 Dec 2024).
  • Efficiency: Spline-based KANs can incur higher computational costs due to spline evaluations; FastKAN replaces splines with Gaussian RBFs for faster inference (Li, 10 May 2024). Other speedups utilize precomputed lookup tables for spline and derivative evaluation (Nagai et al., 25 Jul 2024), and ensemble/boosting-inspired methods for probabilistic output estimation (Polar et al., 2021).
  • Scalability: While error scaling is favorable (dimension-independent under smoothness assumptions), practical scalability depends on efficient kernel and basis representations, hardware support, and exploiting separability (Basina et al., 15 Nov 2024, Faroughi et al., 30 Jul 2025).
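
The following is a minimal training-setup sketch reflecting the recommendations above (Kaiming-Normal initialization, Adam, a learning rate of 5e-4, and a weight-decay penalty as a simple complexity regularizer). The `SplineKANLayer` it refers to is the illustrative layer sketched in Section 2, and the specific hyperparameter values are assumptions rather than prescriptions from the cited papers.

```python
import torch
import torch.nn as nn


def kaiming_init(module):
    # Simple policy: Kaiming-Normal on every parameter with >= 2 dimensions in this submodule.
    # (A coarse assumption for illustration; real implementations may treat spline coefficients
    # and mixing weights differently.)
    for p in module.parameters(recurse=False):
        if p.dim() >= 2:
            nn.init.kaiming_normal_(p)


def train_kan(model, loader, epochs=200, lr=5e-4, weight_decay=1e-4):
    """Adam with a small learning rate and weight decay, per the recommendations above.

    `model` is any nn.Module mapping (batch, in_dim) -> (batch, out_dim), e.g. the
    SplineKANLayer sketched in Section 2; `loader` yields (inputs, targets) batches.
    """
    model.apply(kaiming_init)
    opt = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    loss_fn = nn.MSELoss()
    for epoch in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
    return model
```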

Variational and adaptive approaches (e.g., InfinityKAN) allow the number of bases in each univariate activation to be learned automatically, mitigating a key design challenge (Alesiani et al., 3 Jul 2025).

6. Variants, Extensions, and Future Research

The KAN framework has seen rapid diversification, including:

  • Multifidelity KAN (MFKAN): Decomposes model learning into low- and high-fidelity blocks, with linear and nonlinear corrections, useful for integrating coarse simulations and high-precision data, and enabling data-lean physics-informed learning (Howard et al., 18 Oct 2024).
  • KKAN (Kurkova-Kolmogorov-Arnold Network): Two-block structures with deep MLPs for inner function learning and flexible basis-combination outer blocks; demonstrates robust performance in regression and operator learning (Toscano et al., 21 Dec 2024).
  • Sinusoidal and wavelet KANs: These replace spline bases with oscillatory functions, validated by universal approximation theorems and empirical success on rapidly varying functions (Gleyzer et al., 1 Aug 2025).
  • Probabilistic KANs: Methods such as Divisive Data Re-sorting (DDR) provide empirical output distributions for aleatoric uncertainty quantification (Polar et al., 2021).

Open research directions identified in the surveyed literature include reducing the computational cost of basis evaluation, principled selection of bases and hyperparameters, improving training stability in high-dimensional or noisy settings, and tighter integration with hybrid and domain-adapted architectures.

7. Comparative Analysis and Position within Machine Learning

KANs stand apart from MLPs by aligning network structure with the intrinsic compositionality of the function class, enabling dimension-agnostic error scaling and vastly superior interpretability (Basina et al., 15 Nov 2024, Kratsios et al., 21 Apr 2025). Empirical studies suggest that KANs can often achieve a desired accuracy with orders of magnitude fewer parameters than MLPs, with added benefits in spectral representation and robustness in scientific and engineering tasks (Faroughi et al., 30 Jul 2025, Liu et al., 30 Apr 2024).

However, challenges remain regarding computational cost, hyperparameter selection, and training instability in high-dimensional or noisy domains. The theoretical impetus for KANs has inspired a broad wave of hybrid and domain-adapted architectures, and ongoing research focuses on optimizing their integration and generalization properties within both scientific machine learning and broader deep learning contexts.
