Kolmogorov-Arnold Network (KAN)

Updated 20 November 2025
  • KAN is a neural network architecture that employs learnable univariate functions instead of scalar weights, leveraging the Kolmogorov–Arnold theorem for function representation.
  • KAN achieves high expressive power with orders-of-magnitude fewer parameters compared to conventional MLPs, using B-spline basis functions for efficient nonlinearity.
  • KAN enables improved interpretability and hardware acceleration through direct extraction of learned function transformations and optimized LUT-based computation.

A Kolmogorov–Arnold Network (KAN) is a neural network architecture that implements the Kolmogorov–Arnold superposition theorem in a deep, trainable framework. By replacing fixed scalar weights and activations with learnable univariate functions, most commonly parametrized by B-splines, KANs provide a parameter-efficient, highly expressive, and interpretable alternative to traditional multilayer perceptrons (MLPs). This structure achieves comparable or superior expressive power with orders-of-magnitude fewer parameters, while allowing direct extraction and analysis of the functional transformations learned on each edge. The design and hardware implementation of KANs present novel algorithmic and system-level challenges, especially in the context of large-scale deployment and efficient analog or in-memory acceleration.

1. Kolmogorov–Arnold Theoretical Foundations and Mathematical Formulation

KANs are directly inspired by the Kolmogorov–Arnold representation theorem, which states that any continuous multivariate function $f : [0,1]^n \to \mathbb{R}$ can be written as a finite superposition of continuous univariate functions, $f(x_1, \ldots, x_n) = \sum_{k=1}^{2n+1} \Phi_k\left(\sum_{i=1}^n \phi_{k,i}(x_i)\right)$, where both the inner functions $\phi_{k,i}$ and the outer functions $\Phi_k$ are continuous. The representation effectively reduces the problem of learning a high-dimensional function to learning a collection of one-dimensional functions.
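
As a concrete illustration (a standard example from the KAN literature rather than a quoted result), the function below is already in Kolmogorov–Arnold form with a single outer function and two inner functions:

```latex
% One outer function \Phi(u) = e^{u} and two inner functions
% \phi_{1}(x_1) = \sin(\pi x_1), \phi_{2}(x_2) = x_2^{2}.
f(x_1, x_2) = \exp\!\bigl(\sin(\pi x_1) + x_2^{2}\bigr)
            = \Phi\bigl(\phi_{1}(x_1) + \phi_{2}(x_2)\bigr)
```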

KAN generalizes this construction into a layered neural network in which each edge corresponds to a learnable univariate function rather than a scalar weight. The canonical KAN layer is defined as $y^{+}(x) = W_b\, b(x) + \sum_{i=0}^{K+G-1} c_i' B_i(x)$, where $b(x)$ is a standard activation function (e.g., ReLU), $B_i(x)$ are B-spline basis functions of order $K$ over a knot grid of size $G$, and the $c_i'$ are trainable coefficients (often quantized for hardware deployment) (Huang et al., 7 Sep 2025).
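
The following is a minimal NumPy sketch of a single KAN edge under the definitions above, using the Cox–de Boor recursion to evaluate the B-spline basis and SiLU as one common choice for the base function $b(x)$. The knot layout, initialization, and function names are illustrative assumptions, not the implementation of the cited work.

```python
import numpy as np

def bspline_basis(x, knots, k):
    """Evaluate all B-spline basis functions of degree k at points x via the
    Cox-de Boor recursion. Returns an array of shape (len(x), G + k) for a
    clamped knot vector with G interior intervals."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    t = np.asarray(knots, dtype=float)
    # Degree-0 basis: indicator functions of the knot intervals.
    B = np.array([(x >= t[i]) & (x < t[i + 1]) for i in range(len(t) - 1)],
                 dtype=float).T
    # Raise the degree one step at a time (Cox-de Boor recursion).
    for d in range(1, k + 1):
        B_new = np.zeros((len(x), len(t) - d - 1))
        for i in range(len(t) - d - 1):
            left = (x - t[i]) / (t[i + d] - t[i]) if t[i + d] > t[i] else 0.0
            right = ((t[i + d + 1] - x) / (t[i + d + 1] - t[i + 1])
                     if t[i + d + 1] > t[i + 1] else 0.0)
            B_new[:, i] = left * B[:, i] + right * B[:, i + 1]
        B = B_new
    return B

def kan_edge(x, coeffs, knots, k=3, w_b=1.0):
    """One KAN edge: y(x) = w_b * b(x) + sum_i c_i * B_i(x), with b = SiLU."""
    x = np.asarray(x, dtype=float)
    silu = x / (1.0 + np.exp(-x))            # base activation b(x)
    spline = bspline_basis(x, knots, k) @ np.asarray(coeffs)
    return w_b * silu + spline

# Toy usage: a cubic spline (k = 3) over a uniform grid of G = 8 intervals.
G, k = 8, 3
interior = np.linspace(-1.0, 1.0, G + 1)
knots = np.concatenate([[interior[0]] * k, interior, [interior[-1]] * k])
coeffs = np.random.randn(G + k)              # the G + k trainable coefficients
print(kan_edge(np.linspace(-0.9, 0.9, 5), coeffs, knots, k))
```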

This architecture admits substantial parameter reduction: KANs with B-splines of modest order and grid size achieve expressiveness equivalent to that of MLPs while using on the order of $10^3$–$10^5$ fewer parameters per layer (Noorizadegan et al., 28 Oct 2025). Theoretical and empirical analyses demonstrate that KANs attain faster scaling laws and approximation rates than classical MLPs, up to $\ell \propto N^{-(k+1)}$ for degree-$k$ splines, whereas MLPs are limited by their fixed pointwise nonlinearities (Liu et al., 30 Apr 2024, Noorizadegan et al., 28 Oct 2025).
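
For example, under this stated rate a cubic-spline KAN ($k = 3$) would see its approximation error bound shrink by a factor of $2^{4} = 16$ every time the number of grid parameters $N$ is doubled; the arithmetic, shown for illustration, follows directly from the formula above:

```latex
\ell(N) \propto N^{-(k+1)}
\quad\Longrightarrow\quad
\frac{\ell(2N)}{\ell(N)} = 2^{-(k+1)} = \frac{1}{16}
\quad \text{for } k = 3.
```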

2. KAN Architecture, Basis Function Parameterization, and Variants

In KANs, each univariate function is typically expanded in a basis, $\varphi_{q,p}^{(\ell)}(x) = \sum_{k=1}^K c_{q,p,k}^{(\ell)} B_k(x)$, with the choice of basis critically influencing smoothness, locality, regularization, and computational cost (Noorizadegan et al., 28 Oct 2025). Common basis choices include cubic B-splines (for compact support and locality), Chebyshev or Jacobi polynomials (for spectral properties), ReLU polynomials, Gaussian RBFs, Fourier series, and bandlimited Sinc expansions. Advanced variants layer multiple basis types, allow grid/knot adaptation, or enable rational or fractional-order warping (e.g., rKAN, fKAN).
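
As a contrast with the B-spline parameterization, the sketch below shows a polynomial-basis edge of the kind used in Chebyshev-KAN variants; the $\tanh$ input squashing and the coefficient handling are generic illustrative choices, not a specific published implementation.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

def chebyshev_edge(x, coeffs):
    """One polynomial-basis KAN edge: phi(x) = sum_k c_k * T_k(tanh(x)).

    tanh maps inputs into [-1, 1], the natural domain of the Chebyshev
    polynomials T_0, ..., T_K; coeffs has length K + 1."""
    z = np.tanh(np.asarray(x, dtype=float))
    V = cheb.chebvander(z, len(coeffs) - 1)   # basis values T_k(z), shape (len(x), K+1)
    return V @ np.asarray(coeffs)

# Toy usage with polynomial degree K = 5.
print(chebyshev_edge(np.linspace(-3, 3, 5), np.random.randn(6)))
```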

The most widely adopted form is the cubic B-spline KAN, with grid size $G \approx 32$ as a robust baseline. For discontinuous or highly oscillatory functions, Sinc or rational Jacobi basis functions are preferred (Noorizadegan et al., 28 Oct 2025).

KANs can be extended with residual connections, gating mechanisms, hybrid MLP-KAN stacks, or domain-decomposition strategies (such as FBKAN and SPIKAN). The architectural choices, including basis selection and adaptation, network depth, and parameter sharing, are application- and domain-specific.

3. Algorithm-Hardware Co-Design and Hardware Acceleration

The central hardware challenge in KAN deployment arises from the complexity of B-spline evaluation, which involves the recursive Cox–de Boor computation with many divisions and multiplications. Standard practice for hardware mapping is LUT-based evaluation, in which values of $B_i(x)$ are precomputed and accessed via address decoding and multiplexing, a strategy that, while fast, incurs large area and energy overheads (Huang et al., 7 Sep 2025).
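
A simplified software model of this LUT strategy is sketched below: basis values are tabulated over a uniformly quantized input range so that inference reduces to an address lookup plus a dot product. The table size, address computation, and the reuse of the bspline_basis helper from the earlier edge sketch are illustrative assumptions, not the circuit described in the cited work.

```python
import numpy as np

def build_basis_lut(knots, k, x_min, x_max, n_bits=8):
    """Tabulate all B-spline basis values B_i(x) on a 2**n_bits uniform grid.

    Assumes the bspline_basis(x, knots, k) helper from the earlier edge sketch."""
    xs = np.linspace(x_min, x_max, 2 ** n_bits)
    return bspline_basis(xs, knots, k)        # shape (2**n_bits, G + k)

def lut_edge(x, coeffs, lut, x_min, x_max, w_b=1.0):
    """LUT-based edge evaluation: quantize x to a table address, fetch the
    precomputed basis row, and dot it with the stored coefficients."""
    x = np.asarray(x, dtype=float)
    n_entries = lut.shape[0]
    addr = np.clip(np.round((x - x_min) / (x_max - x_min) * (n_entries - 1)),
                   0, n_entries - 1).astype(int)
    silu = x / (1.0 + np.exp(-x))             # base activation, kept in digital
    return w_b * silu + lut[addr] @ np.asarray(coeffs)
```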

To address this, algorithm–hardware co-design strategies have emerged:

  • ASP-KAN-HAQ (Alignment-Symmetry & PowerGap KAN Hardware-Aware Quantization): Enforces knot-quantization alignment and leverages LUT symmetry, allowing storage sharing and significant reductions in decoder and multiplexer circuitry. This yields up to $44.2\times$ area and $7.1\times$ energy reductions over naive post-training quantization for grid sizes up to 64.
  • KAN-SAM (KAN Sparsity-Aware Mapping): Maps basis coefficients onto array rows in descending order of activation criticality (a function of activation statistics such as mean and variance, and of coefficient importance), optimizing placement for robustness to IR-drop in RRAM-ACIM arrays. This strategy reduces accuracy loss by $2.8\times$–$5.3\times$ compared to uniform mapping; a toy sketch of the ordering idea follows this list.
  • N:1 Time-Modulation Dynamic-Voltage Input Generator (TM-DV-IG): A hybrid digital-analog input-encoding scheme that combines voltage- and pulse-width-based quantization to balance speed, resolution, noise margin, and circuit area. This technique achieves up to a $4\times$–$10\times$ joint improvement in area/power/latency compared to pure voltage- or PWM-based schemes.
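
The toy sketch below illustrates only the ordering idea behind sparsity-aware mapping: score each coefficient by a simple importance heuristic and assign the most critical ones to the array rows assumed to be least affected by IR-drop. The scoring formula and the row-index convention are invented for illustration and do not reproduce the criticality metric of the cited work.

```python
import numpy as np

def criticality_ordered_mapping(coeffs, act_mean, act_var):
    """Assign each spline coefficient to an RRAM array row so that higher-scored
    (more critical) coefficients land on lower row indices, which this toy model
    assumes suffer the least IR-drop.

    The score (|c| weighted by simple activation statistics) is a placeholder."""
    coeffs = np.asarray(coeffs, dtype=float)
    score = np.abs(coeffs) * (np.abs(act_mean) + np.sqrt(act_var))
    order = np.argsort(-score)                # coefficient indices, most critical first
    row_of_coeff = np.empty(len(coeffs), dtype=int)
    row_of_coeff[order] = np.arange(len(coeffs))
    return row_of_coeff                       # row_of_coeff[i] = row assigned to c_i

# Toy usage: 35 coefficients (G = 32, k = 3) with per-coefficient activation stats.
rows = criticality_ordered_mapping(np.random.randn(35),
                                   act_mean=np.random.rand(35),
                                   act_var=np.random.rand(35))
print(rows[:10])
```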

At the circuit level, analog compute-in-memory multiply-accumulate (ACIM MAC) with RRAM cells encodes bit-sliced coefficients, and incorporates mitigation techniques for partial sum deviations due to process variation and IR-drop (Huang et al., 7 Sep 2025).

4. Empirical Evaluation, Scaling, and Model Efficiency

Large-scale validation of these algorithm–hardware co-design strategies is presented on commercial 22 nm RRAM-ACIM technology (Huang et al., 7 Sep 2025). Key benchmarks include collaborative filtering (CF-KAN) recommendation models with parameter budgets of 39 MB and 63 MB, representing a $5 \times 10^5$ to $8 \times 10^5$ increase in parameter count over tiny models.

Scaling metrics include:

  • Area Overhead: Increases by a factor of only $2.8 \times 10^4$ to $4.1 \times 10^4$ despite the $5 \times 10^5$ to $8 \times 10^5$ increase in parameter count.
  • Power Consumption: Rises moderately (by a factor of $5.1 \times 10^1$ to $9.4 \times 10^1$).
  • Accuracy Degradation: Minimal ($0.11\%$–$0.23\%$), far less than typical scaling penalties in classical DNN frameworks.
  • End-to-End Latency: Measured at $3.6\,\mu\text{s}$–$4.4\,\mu\text{s}$ for inference.

These results establish the feasibility of scaling KANs to large inference workloads with disciplined co-design, while maintaining the parameter efficiency and interpretability intrinsic to the architecture (Huang et al., 7 Sep 2025).

5. Comparison with Deep Neural Network Architectures and Parameter Efficiency

In contrast to conventional DNNs, where each layer involves a large weight matrix followed by fixed nonlinearities, KAN replaces linear weights with a sparse, learnable ensemble of univariate nonlinear transformations. This results in:

  • Dramatic Parameter Reduction: For a layer of size $M \times N$, KANs require only $(G+K)$ trainable coefficients per channel, as opposed to $M \times N$ weights, yielding typical reductions of three to five orders of magnitude.
  • Interpretability: Each basis function is visible and can be visualized or locked to analytic forms, enabling post-training symbolic regression and human-driven discovery (Liu et al., 30 Apr 2024).
  • Spectral and Scaling Behavior: KANs empirically achieve higher scaling exponents ($\alpha$) in the error versus parameter-count power law and demonstrate weaker spectral bias (the ability to fit high-frequency modes efficiently), as confirmed by Neural Tangent Kernel analyses (Noorizadegan et al., 28 Oct 2025).
  • Convergence and Robustness: KANs exhibit robust convergence properties with proper regularization, though high-degree splines demand careful training and numerical engineering, especially in large-scale or high-dimensional settings.

6. Advanced Topics: Grid Adaptation, Quantization, and Practical Considerations

Optimal KAN deployment in hardware involves grid adaptation (dynamically increasing the number of knot intervals as validation loss decreases), grid alignment with quantization, LUT sharing, and selection of local/global decoder splits to minimize storage and logic (Huang et al., 7 Sep 2025). Quantization (commonly to 8 bits) is essential for deploying trained coefficients in memory arrays.
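
A minimal sketch of generic symmetric 8-bit post-training quantization of trained spline coefficients is shown below; it is not the alignment-symmetry, hardware-aware scheme of the cited work, only the baseline operation that such schemes refine.

```python
import numpy as np

def quantize_coeffs(coeffs, n_bits=8):
    """Symmetric per-edge quantization of spline coefficients to signed integers."""
    coeffs = np.asarray(coeffs, dtype=np.float32)
    qmax = 2 ** (n_bits - 1) - 1              # 127 for 8 bits
    scale = float(np.max(np.abs(coeffs))) / qmax or 1.0   # avoid a zero scale
    q = np.clip(np.round(coeffs / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def dequantize_coeffs(q, scale):
    """Recover approximate floating-point coefficients for evaluation."""
    return q.astype(np.float32) * scale

# Toy usage: quantize the G + k = 35 coefficients of one cubic-spline edge.
c = np.random.randn(35).astype(np.float32)
q, s = quantize_coeffs(c)
print(np.max(np.abs(dequantize_coeffs(q, s) - c)))   # worst-case rounding error
```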

Mapping decisions in RRAM-based MAC arrays must account for the non-idealities of analog hardware, including device variability and IR-drop, necessitating data-driven, sparsity-aware placement (KAN-SAM) and architecture-aware optimization. All metric and mapping decisions require detailed empirical validation within the constraints of the target hardware platform.

7. Outlook and Continuing Research Directions

KANs are now an established direction for parameter- and power-efficient machine learning, with empirically validated scaling in large-scale hardware. Open research directions include further reduction of inference latency, integration with domain-decomposition and symbolic discovery workflows, formal theory of basis function selection and adaptation, and extension to more diverse analog and in-memory platforms.

Ongoing work aims to standardize robust component libraries for each major basis family, formalize the connections between basis complexity and generalization, and expand the class of smooth and non-smooth functions efficiently realizable within the KAN framework. Hardware-level research will continue to focus on LUT optimization, low-power quantization, and efficient mapping of KANs to hybrid digital-analog inference accelerators (Huang et al., 7 Sep 2025, Noorizadegan et al., 28 Oct 2025).
