FastKAN: Accelerated Kolmogorov–Arnold Networks
- FastKAN is an efficient variant of Kolmogorov–Arnold Networks that replaces adaptive B-spline activations with fixed radial basis function kernels for enhanced computational speed.
- FastKAN achieves significant speedups (3–40×) compared to spline-based KANs by enabling parallelizable, vectorized evaluations that reduce memory overhead.
- FastKAN maintains the expressive power and interpretability of traditional KANs while supporting diverse tasks including reinforcement learning and scientific inference.
FastKAN is a computationally efficient variant of the Kolmogorov–Arnold Network (KAN). KANs, inspired by the Kolmogorov–Arnold superposition theorem, replace the fixed, node-based nonlinear activations of standard neural networks with adaptive, learnable, edge-wise univariate functions—typically parameterized as B-splines. However, the recursive evaluation of spline bases, especially at moderate-to-high orders and grid densities, imposes substantial computational and memory overhead. FastKAN addresses these performance bottlenecks by exchanging the original spline activations for fixed, parallelizable kernel expansions, principally using radial basis functions (RBFs, typically Gaussians), or, in some implementations, nonrecursive matrix representations. This enables substantial acceleration of both training and inference while maintaining the expressive power and interpretability characteristic of KANs.
1. Mathematical Foundation and Architectural Principles
In the canonical KAN layer, each output coordinate at layer $l+1$ is defined by a sum over input coordinates via learnable univariate maps:
$$x^{(l+1)}_j = \sum_{i=1}^{n_l} \phi^{(l)}_{j,i}\big(x^{(l)}_i\big),$$
where each $\phi^{(l)}_{j,i}$ is typically realized as a spline expansion
$$\phi(x) = \sum_{m} c_m\, B_m(x),$$
with $\{B_m\}$ the B-spline basis of fixed order on a uniform knot grid and $c_m$ learnable coefficients.
The defining change in FastKAN is to replace the spline basis with an RBF dictionary, yielding
$$\phi(x) = w_b\, b(x) + \sum_{m=1}^{K} w_m \exp\!\left(-\frac{(x - c_m)^2}{2h^2}\right),$$
where $b$ is a fixed base activation (e.g., SiLU), $\{c_m\}$ are fixed RBF centers, $h$ is the common width, and $w_b, w_m$ are trainable. Typically, centers are uniformly distributed over the normalized input range to ensure input coverage and moderate overlap of the Gaussian windows (Villagómez et al., 4 Aug 2025, Coffman et al., 11 Feb 2025, Li, 10 May 2024, Shi et al., 8 Dec 2025).
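The edge parameterization above maps directly to a few tensor operations. The following is a minimal PyTorch sketch of a FastKAN-style layer under the Gaussian-RBF-plus-SiLU formulation; names and hyperparameters (`RBFKANLayer`, `num_centers`, the width heuristic) are illustrative, not the reference implementation's API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RBFKANLayer(nn.Module):
    """Sketch of a FastKAN-style layer: each edge function is a linear
    combination of K fixed Gaussian RBFs plus a SiLU base term."""

    def __init__(self, in_dim, out_dim, num_centers=8, x_min=-2.0, x_max=2.0):
        super().__init__()
        centers = torch.linspace(x_min, x_max, num_centers)
        self.register_buffer("centers", centers)            # fixed, not trained
        self.h = (x_max - x_min) / (num_centers - 1)         # common RBF width
        self.rbf_weight = nn.Linear(in_dim * num_centers, out_dim, bias=False)
        self.base_weight = nn.Linear(in_dim, out_dim)        # SiLU base path

    def forward(self, x):                                    # x: (batch, in_dim)
        # Vectorized Gaussian basis over all edges: (batch, in_dim, num_centers)
        rbf = torch.exp(-((x.unsqueeze(-1) - self.centers) ** 2) / (2 * self.h ** 2))
        rbf = rbf.flatten(start_dim=1)                       # (batch, in_dim * K)
        return self.base_weight(F.silu(x)) + self.rbf_weight(rbf)
```

Because the centers and width are fixed buffers, the whole forward pass reduces to one broadcasted exponential plus two dense matrix multiplications, which is what makes the layer straightforward to batch and fuse.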
2. Complexity, Vectorization, and Performance
The computational bottleneck in original KANs is the recursive evaluation of B-splines via the Cox–de Boor algorithm, whose per-call cost grows with both the number of grid intervals $G$ and the spline degree $k$. FastKAN removes this by adopting a fixed RBF grid. Each forward pass in a FastKAN layer is dominated by the evaluation of $K$ (typically no more than 8) Gaussian kernels per edge, yielding a total cost on the order of $d_{\text{in}} \cdot d_{\text{out}} \cdot K$ per sample. This supports full batching and vectorization on modern hardware. Memory usage is reduced because RBF activations are computed once per batch and reused, and there is no per-sample adaptive knot processing (Villagómez et al., 4 Aug 2025, Li, 10 May 2024, Shi et al., 8 Dec 2025). Empirical results on V100/A100 GPUs confirm speedup factors of 3–40× over efficient spline-based KANs, growing with spline degree and dataset size (Coffman et al., 11 Feb 2025, Li, 10 May 2024).
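To make the contrast concrete, the sketch below shows a plain Cox–de Boor recursion for the B-spline basis that original KAN edges require: each degree level depends on the previous one, which is exactly the sequential dependency the fixed Gaussian dictionary above avoids. This is a pedagogical NumPy sketch, not an optimized routine.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Cox-de Boor recursion for all B-spline basis functions of `degree`
    evaluated at points x. Level k depends on level k-1 (recursive depth
    proportional to the spline degree)."""
    x = np.asarray(x, dtype=float)
    # Degree-0 bases: indicator functions of the knot intervals.
    B = np.array([(knots[i] <= x) & (x < knots[i + 1])
                  for i in range(len(knots) - 1)], dtype=float)
    for k in range(1, degree + 1):
        B_next = []
        for i in range(len(knots) - k - 1):
            left_den = knots[i + k] - knots[i]
            right_den = knots[i + k + 1] - knots[i + 1]
            left = (x - knots[i]) / left_den * B[i] if left_den > 0 else 0.0
            right = (knots[i + k + 1] - x) / right_den * B[i + 1] if right_den > 0 else 0.0
            B_next.append(left + right)
        B = np.array(B_next)
    return B   # shape: (number of bases, len(x))
```

In the RBF layer, by contrast, every basis value is produced by a single elementwise exponential with no cross-level dependency, so all edges and all batch elements can be evaluated in parallel.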
3. Regularization, Training, and Optimization Strategies
FastKAN is agnostic to the loss objective, supporting supervised (classification, regression) and unsupervised (e.g., autoencoding) tasks. For autoencoders, the objective is the mean squared reconstruction error
$$\mathcal{L} = \frac{1}{N}\sum_{n=1}^{N}\big\| x_n - D\big(E(x_n)\big)\big\|_2^2,$$
where $E$ and $D$ are the FastKAN encoder and decoder (Villagómez et al., 4 Aug 2025).
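A minimal sketch of this objective, assuming the illustrative `RBFKANLayer` from Section 1 is in scope; the layer widths are placeholders, not values from the cited work.

```python
import torch
import torch.nn as nn

# Hypothetical encoder/decoder built from the earlier RBFKANLayer sketch.
encoder = nn.Sequential(RBFKANLayer(52, 16), RBFKANLayer(16, 8))
decoder = nn.Sequential(RBFKANLayer(8, 16), RBFKANLayer(16, 52))

def reconstruction_loss(x):
    """Mean squared reconstruction error || x - D(E(x)) ||^2, averaged over the batch."""
    return nn.functional.mse_loss(decoder(encoder(x)), x)
```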
Regularization is typically achieved via decoupled weight decay, applied through AdamW or similar optimizers, and occasionally by imposing finite-difference or Hessian penalties on spline/edge coefficient smoothness (especially for overparameterized or low-data regimes) (Lee et al., 9 May 2025, Pozdnyakov et al., 8 Sep 2025). Grid search or Tree-structured Parzen Estimator (TPE) methods are used for hyperparameter selection. Batch normalization or layer normalization is standard before RBF activations to keep feature scales compatible with the fixed RBF windowing (Li, 10 May 2024, Pozdnyakov et al., 8 Sep 2025).
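The following sketch illustrates these training choices under the same assumptions as the autoencoder example above: decoupled weight decay via AdamW, normalization placed ahead of the fixed RBF windows, and an optional finite-difference smoothness penalty on one layer's RBF coefficients. The penalty form is an assumption consistent with the finite-difference regularization described above, not a quoted implementation.

```python
import torch
import torch.nn as nn

# Decoupled weight decay (AdamW) over the encoder/decoder parameters.
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.AdamW(params, lr=1e-3, weight_decay=1e-4)

# Normalization before the RBF layer keeps feature scales within the fixed windows.
normalized_block = nn.Sequential(nn.LayerNorm(16), RBFKANLayer(16, 8))

def smoothness_penalty(layer, strength=1e-5):
    """Second differences across adjacent RBF coefficients approximate curvature;
    penalizing them discourages wiggly edge functions in low-data regimes."""
    w = layer.rbf_weight.weight                          # (out_dim, in_dim * K)
    w = w.view(w.shape[0], -1, layer.centers.numel())    # (out_dim, in_dim, K)
    d2 = w[..., 2:] - 2 * w[..., 1:-1] + w[..., :-2]
    return strength * d2.pow(2).mean()
```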
4. Benchmarks and Empirical Results
FastKAN has been benchmarked across various domains:
- Fault Detection in Chemical Processes (Autoencoder Setting):
- On the Tennessee Eastman Process, the average fault detection rate (FDR) of FastKAN-AE rises to 100% for back-to-control faults and above 90% for uncontrollable faults with large training sets, while the FDR for controllable faults drops to false-alarm levels, as expected for anomaly detection.
- The parameter count (10,074) matches that of the OAE baseline, but FastKAN-AE achieves richer edge-wise nonlinearity at a comparable memory footprint (Villagómez et al., 4 Aug 2025).
| Training samples | Controllable FDR | Back-to-Control FDR | Uncontrollable FDR |
|---|---|---|---|
| 625 | 7.1% | 95.4% | 65.2% |
| 5,000 | 5.9% | 98.2% | 80.6% |
| 51,500 | 4.6% | 100% | 90.1% |
| 250,000 | 4.6% | 100% | 92.3% |
- Model-based Reinforcement Learning (RL):
- Integrated into DreamerV3, FastKAN layers yield training and inference throughput commensurate with MLPs (latency within 10–20% for reward/continue predictors at identical parameter counts) and maintain parity in downstream policy sample efficiency and asymptotic performance (Shi et al., 8 Dec 2025).
- Visual encoding with FastKAN is less effective than CNNs in image reconstruction, indicating limitations in capturing spatial correlations without inductive biases.
- Emulation for Scientific Inference:
- In global 21 cm cosmology signal emulation, 21cmKAN (a FastKAN implementation) achieves sub-mK emulation error, trains faster than LSTM benchmarks, and evaluates in 3.7 ms per sample at comparable accuracy (Jones et al., 15 Aug 2025).
- General Function Approximation and Classification:
- On MNIST, FastKAN delivers faster forward and forward+backward passes than efficient-KAN with no measurable accuracy loss (Li, 10 May 2024).
- In MBRL, as a drop-in regressor for scalar prediction (reward, continue), FastKAN matches baseline sample efficiency and wall-clock performance (Shi et al., 8 Dec 2025).
5. Hardware Acceleration and Implementation Details
Accelerating B-spline evaluation on hardware presents unique challenges due to recursive dependencies and local support. FastKAN, by design, eliminates these recursive dependencies and supports fully vectorized, fused-kernel GPU implementations as well as efficient CPU/GPU backends via BLAS and cuBLAS:
- High-performance JAX and PyTorch kernels leverage batched exponential and matrix multiplication primitives (Shi et al., 8 Dec 2025, Coffman et al., 11 Feb 2025).
- MatrixKAN (closely related) formalizes a nonrecursive matrix evaluation of B-splines: on each knot interval, the spline segment is written as a power basis in the local coordinate multiplied by a precomputed basis matrix. This replaces the degree-dependent recursion of Cox–de Boor with batched matrix multiplications, at the price of a one-time initialization cost for constructing the basis matrix (Coffman et al., 11 Feb 2025); see the sketch after this list.
- On systolic-array hardware, nonrecursive/lookup-based spline units enable single-cycle, high-utilization multiply–accumulate, achieving lower latency for equal-area implementations and utilization up to 100% (Errabii et al., 20 Nov 2025).
- For low-dimensional settings, kernel lookups and sum-aggregation are cheap enough to allow for code generation or dedicated kernels (e.g., CUDA) (Pozdnyakov et al., 8 Sep 2025).
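The MatrixKAN-style matrix form referenced above can be illustrated with the standard uniform cubic B-spline basis matrix; this is a minimal sketch of the nonrecursive evaluation pattern (power basis times a precomputed matrix), not the cited implementation, which generalizes the matrix to arbitrary degree.

```python
import numpy as np

# Standard basis matrix of the uniform cubic B-spline, for power order [u^3, u^2, u, 1].
M3 = (1.0 / 6.0) * np.array([
    [-1.0,  3.0, -3.0, 1.0],
    [ 3.0, -6.0,  3.0, 0.0],
    [-3.0,  0.0,  3.0, 0.0],
    [ 1.0,  4.0,  1.0, 0.0],
])

def cubic_bspline_segment(u, coeffs):
    """Evaluate one cubic B-spline segment at local parameter u in [0, 1)
    as a power basis times a precomputed matrix -- no recursion involved.
    coeffs: the 4 control coefficients whose support covers this segment."""
    powers = np.stack([u**3, u**2, u, np.ones_like(u)], axis=-1)   # (batch, 4)
    return powers @ (M3 @ coeffs)                                  # (batch,)
```

Because `M3 @ coeffs` can be precomputed and the power basis is a pure elementwise operation, the whole evaluation maps onto GEMM-style primitives, which is what makes it attractive for GPUs and systolic arrays.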
6. Interpretability and Model Transparency
Each FastKAN edge function is a linear combination of fixed Gaussian basis functions (or, in matrix-spline variants, nonrecursive B-spline segments), facilitating post hoc analysis:
- Learned coefficients can be visualized to interpret latent-feature sensitivity as a function of input region (Villagómez et al., 4 Aug 2025, Jones et al., 15 Aug 2025); see the sketch after this list.
- In scientific emulation, inspecting edgewise spline shapes exposes domain-variable salience directly, enabling sensitivity analysis and parameter importance quantification without external tools (Jones et al., 15 Aug 2025).
- The regular structure of basis expansions—whether Gaussian or polynomial—enables symbolic metamodel discovery and smoothness control via explicit regularization (Lee et al., 9 May 2025, Li, 10 May 2024).
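Because the edge function is just the closed-form combination given in Section 1, it can be reconstructed from the learned weights and traced over the input range for inspection. The sketch below assumes the Gaussian-RBF-plus-SiLU parameterization from earlier; the weight values shown are illustrative placeholders.

```python
import numpy as np

def edge_function(x, w_base, w_rbf, centers, h):
    """Reconstruct a single FastKAN edge function from its learned weights
    so it can be plotted or inspected for input-region sensitivity."""
    silu = x / (1.0 + np.exp(-x))
    rbf = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2.0 * h ** 2))
    return w_base * silu + rbf @ w_rbf

# Example: trace the edge response over the normalized input range.
xs = np.linspace(-2.0, 2.0, 200)
centers = np.linspace(-2.0, 2.0, 8)
phi = edge_function(xs, w_base=0.5, w_rbf=np.random.randn(8), centers=centers, h=0.5)
```

Plotting `phi` against `xs`, or comparing the magnitudes of `w_rbf` across edges, gives the kind of input-region sensitivity and variable-salience reading described above.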
7. Limitations, Trade-offs, and Extensions
While FastKAN provides algorithmic and hardware acceleration, several constraints persist:
- The fixed RBF/lookup grids can limit expressivity in high-noise or highly nonstationary domains compared to fully adaptive spline placements (Villagómez et al., 4 Aug 2025, Pozdnyakov et al., 8 Sep 2025).
- On vision tasks, lack of spatial priors limits visual encoder performance; CNN-style prior or “Convolutional KAN” blocks are an active research direction (Shi et al., 8 Dec 2025).
- For large spline degrees or massive batch/edge counts, temporary storage costs or GEMM overheads may become the bottleneck.
- Extensions include hybrid lmKAN (multivariate spline tables), adaptive grid placement, Hessian regularization for curvature suppression, and direct hardware implementation via lookup/memory arrays (Pozdnyakov et al., 8 Sep 2025, Errabii et al., 20 Nov 2025).
Further, recent variants such as PowerMLP (Qiu et al., 18 Dec 2024) place closed-form, non-iterative nonlinearities at nodes but structurally diverge from FastKAN by focusing on ReLU-powers rather than edgewise spline or RBF expansions.
FastKAN represents a pivotal advance in the practicality of Kolmogorov–Arnold superposition-based neural architectures, delivering substantial speedup via parallelized, nonrecursive basis expansions with RBF dictionaries or fast matrix methods, without sacrificing accuracy, transparency, or universality in its core function-approximation setting (Villagómez et al., 4 Aug 2025, Coffman et al., 11 Feb 2025, Li, 10 May 2024, Shi et al., 8 Dec 2025, Errabii et al., 20 Nov 2025, Jones et al., 15 Aug 2025).