
Free-Knots Kolmogorov-Arnold Networks

Updated 20 April 2026
  • Free-Knots Kolmogorov-Arnold Networks (FR-KAN) are neural architectures that turn fixed B-spline grids into adaptive, trainable knot placements based on data-driven curvature.
  • They employ curvature-based importance density functions to allocate knots dynamically, enhancing expressivity and statistical efficiency while maintaining training stability.
  • FR-KAN demonstrates state-of-the-art performance in diverse applications such as symbolic regression, scientific machine learning, and image classification, with improved interpretability and efficiency.

Free-Knots Kolmogorov-Arnold Networks (FR-KAN) are a class of neural architectures derived from Kolmogorov-Arnold Networks (KAN), extending them by promoting the knot locations in the B-spline parameterization to trainable variables. This dynamic allocation of knot positions enables adaptive grid refinement according to the target function's geometric complexity, offering enhanced expressivity, statistical efficiency, and training stability compared to fixed-grid KAN variants. FR-KANs have demonstrated state-of-the-art empirical performance across symbolic regression, scientific machine learning, and diverse real-world data modalities (Rigas et al., 26 Jan 2026, Zheng et al., 16 Jan 2025, Liu et al., 2024).

1. Theoretical Foundation and Formulation

Kolmogorov-Arnold Networks are based on the Kolmogorov-Arnold representation theorem, which states that any continuous multivariate function $f:[0,1]^n\to\mathbb{R}$ can be expressed as a superposition of univariate functions:

$$f(x_1,\dots,x_n) = \sum_{q=0}^{2n} \Psi_q\left(\sum_{p=1}^n \Phi_{q,p}(x_p)\right)$$

KAN instantiates each univariate map $\Phi$ by a learnable B-spline of order $K$ over a fixed grid of knots, optionally augmented with a SiLU shortcut. In its free-knot generalization, FR-KAN, the knot vector $\kappa_i=(\kappa_{i,1},\dots,\kappa_{i,G+1})$ for each input dimension is optimized jointly with the spline coefficients.

A typical FR-KAN layer is parameterized as:

$$f(x;\theta,\kappa)_n = \sum_{i=1}^{n_{\rm in}} \sum_{m=1}^{G+K} b_{n,i,m} B_m^{(K)}(x_i;\kappa_i) + r_{n,i}\,{\rm SiLU}(x_i)$$

where $B_m^{(K)}(\cdot;\kappa_i)$ denotes the $m$-th order-$K$ B-spline basis function with knot vector $\kappa_i$, and both the weights $\theta$ and knots $\kappa$ are trainable parameters (Rigas et al., 26 Jan 2026, Zheng et al., 16 Jan 2025).
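The layer formula can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`extend_knots`, `bspline_basis`, `frkan_layer`) are hypothetical, the SiLU term is read as sitting inside the sum over $i$, and padding the $G+1$ knots with $K$ extra knots on each side (so that exactly $G+K$ basis functions exist) is an assumed convention.

```python
import math

def extend_knots(knots, K):
    """Pad G+1 sorted knots with K extra knots per side (assumed padding scheme)."""
    h_lo = knots[1] - knots[0]
    h_hi = knots[-1] - knots[-2]
    left = [knots[0] - h_lo * j for j in range(K, 0, -1)]
    right = [knots[-1] + h_hi * j for j in range(1, K + 1)]
    return left + list(knots) + right

def bspline_basis(x, t, K):
    """Cox-de Boor recursion: all degree-K basis values at x for knot vector t."""
    # Degree 0: indicator functions of the half-open knot spans.
    B = [1.0 if t[j] <= x < t[j + 1] else 0.0 for j in range(len(t) - 1)]
    for d in range(1, K + 1):
        nxt = []
        for j in range(len(t) - d - 1):
            den_l = t[j + d] - t[j]
            den_r = t[j + d + 1] - t[j + 1]
            left = (x - t[j]) / den_l * B[j] if den_l > 0 else 0.0
            right = (t[j + d + 1] - x) / den_r * B[j + 1] if den_r > 0 else 0.0
            nxt.append(left + right)
        B = nxt
    return B  # len(t) - K - 1 = G + K values

def silu(x):
    return x / (1.0 + math.exp(-x))

def frkan_layer(x, b, r, knots, K):
    """x: n_in inputs; b[n][i][m]: spline weights; r[n][i]: SiLU weights;
    knots[i]: sorted list of G+1 knots for input dimension i."""
    out = []
    for n in range(len(b)):
        total = 0.0
        for i, xi in enumerate(x):
            basis = bspline_basis(xi, extend_knots(knots[i], K), K)
            total += sum(bm * Bm for bm, Bm in zip(b[n][i], basis))
            total += r[n][i] * silu(xi)
        out.append(total)
    return out

# Non-uniform knots on [0, 1]: G = 3 intervals, cubic splines (K = 3).
knots = [0.0, 0.1, 0.5, 1.0]
basis = bspline_basis(0.3, extend_knots(knots, 3), 3)
```

Because the knot vector is an ordinary argument of `bspline_basis`, an autodiff framework could differentiate the output with respect to the knots as well as the coefficients, which is the property the free-knot formulation relies on.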

2. Knot Adaptation via Importance Density Functions

FR-KAN formulates knot allocation as a density estimation problem and introduces Importance Density Functions (IDF) to guide the knot distribution. Given a batch of samples $\{x_j\}_{j=1}^{N}$, importance weights $w_j$ are computed, typically using curvature-based surrogates. The empirical probability mass function for each input point is then $p_j = w_j / \sum_{k=1}^{N} w_k$.

To adapt to geometric complexity, the curvature-based IDF derives each sample's importance weight from the local curvature of the target. In the continuous limit this yields a normalized density, in which a small constant prevents the density from vanishing (Rigas et al., 26 Jan 2026). The knot locations are optimized so that the empirical cumulative distribution function (CDF) of the induced grid matches the quantiles of the IDF. The grid-matching loss penalizes the mismatch between the knot positions and the IDF quantiles taken at uniform quantile levels. Its gradient is derived explicitly, which makes backpropagation through the knot updates efficient.
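The quantile-matching idea can be illustrated with a short Python sketch. Everything specific here is an assumption made for illustration rather than the formulation of Rigas et al.: the weight is taken as $|f''(x)| + \varepsilon$ estimated by central differences, the target knots are weighted empirical quantiles at uniform levels, the grid-matching loss is a mean squared error, and the function names are hypothetical.

```python
import math

def curvature_weights(f, xs, h=1e-3, eps=1e-2):
    """Assumed IDF surrogate: |f''(x)| via central differences, plus eps."""
    return [abs(f(x + h) - 2.0 * f(x) + f(x - h)) / h ** 2 + eps for x in xs]

def weighted_quantiles(xs, ws, levels):
    """Invert the weighted empirical CDF of the sorted samples xs."""
    total = sum(ws)
    cdf, acc = [], 0.0
    for w in ws:
        acc += w / total
        cdf.append(acc)
    out = []
    for q in levels:
        # First sample whose cumulative mass reaches level q.
        idx = next((j for j, c in enumerate(cdf) if c >= q - 1e-12), len(xs) - 1)
        out.append(xs[idx])
    return out

def grid_matching_loss(knots, targets):
    """Assumed form: mean squared distance between knots and target quantiles."""
    return sum((k - t) ** 2 for k, t in zip(knots, targets)) / len(knots)

# A steep transition at x = 0.5 concentrates curvature, and hence knots, there.
f = lambda x: math.tanh(20.0 * (x - 0.5))
xs = [j / 200 for j in range(201)]
G = 8
levels = [m / G for m in range(G + 1)]
targets = weighted_quantiles(xs, curvature_weights(f, xs), levels)
uniform = [m / G for m in range(G + 1)]
loss = grid_matching_loss(uniform, targets)
```

With this target function, all interior target knots fall close to the transition at $x = 0.5$, so a uniform grid incurs a nonzero `loss` that gradient steps on the knots would reduce.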

3. Optimization and Stability Mechanisms

The training objective for FR-KAN is augmented with a grid-matching term:

$$\mathcal{L} = \mathcal{L}_{\rm pred} + \lambda\,\mathcal{L}_{\rm grid}$$

where $\mathcal{L}_{\rm pred}$ is the prediction loss (e.g., MSE, cross-entropy, or a physics-informed residual for PDEs), $\mathcal{L}_{\rm grid}$ is the grid-matching loss, and $\lambda$ controls the adaptation strength (Rigas et al., 26 Jan 2026). Training optimizes both spline weights and knot positions: Adam is used for the weights $\theta$, and SGD with a smaller learning rate for the knots $\kappa$ is empirically effective.
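The two-optimizer scheme can be sketched without any framework. The following is a toy illustration, not the authors' training code: the quadratic losses, the learning rates, the value of the adaptation weight `lam`, and the function names are all placeholders chosen so the example runs on its own.

```python
import math

def total_loss(pred_loss, grid_loss, lam):
    """Prediction loss plus weighted grid-matching term."""
    return pred_loss + lam * grid_loss

def sgd_step(kappa, grad, lr):
    """Plain SGD update, used here for the knot positions."""
    return [k - lr * g for k, g in zip(kappa, grad)]

def adam_step(theta, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Standard Adam update with bias correction, used here for the weights."""
    state["t"] += 1
    t = state["t"]
    new = []
    for j, (p, g) in enumerate(zip(theta, grad)):
        state["m"][j] = b1 * state["m"][j] + (1 - b1) * g
        state["v"][j] = b2 * state["v"][j] + (1 - b2) * g * g
        m_hat = state["m"][j] / (1 - b1 ** t)
        v_hat = state["v"][j] / (1 - b2 ** t)
        new.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new

# Toy stand-ins: L_pred = (theta - 2)^2 and L_grid = (kappa - 0.5)^2.
theta, kappa = [0.0], [0.0]
state = {"t": 0, "m": [0.0], "v": [0.0]}
lam = 0.1
for _ in range(500):
    g_theta = [2.0 * (theta[0] - 2.0)]        # d L_pred / d theta
    g_kappa = [lam * 2.0 * (kappa[0] - 0.5)]  # lam * d L_grid / d kappa
    theta = adam_step(theta, g_theta, state, lr=0.05)
    kappa = sgd_step(kappa, g_kappa, lr=0.01)
final = total_loss((theta[0] - 2.0) ** 2, (kappa[0] - 0.5) ** 2, lam)
```

Keeping the knots on a slower, momentum-free update means the grid drifts gradually while the spline weights adapt quickly around it, which is one plausible reading of why the split is reported as effective.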

To ensure smoothness and mitigate oscillations, FR-KAN introduces a continuity regularizer, enforced via finite differences at grid points. A wide initialization range for the grid is critical for avoiding NaN divergence and reducing grid clustering. Spline parameters are grouped among neurons to reduce parameter overhead and match standard MLPs for scalability (Zheng et al., 16 Jan 2025).
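As a rough illustration of a finite-difference smoothness penalty evaluated at grid points, the sketch below computes the mean squared second divided difference of spline values on a possibly non-uniform grid. It is a generic penalty chosen for illustration and is not claimed to be the regularizer of Zheng et al.; the function name is hypothetical.

```python
def second_difference_penalty(values, knots):
    """Mean squared second divided difference of values sampled at sorted knots."""
    total = 0.0
    for j in range(1, len(knots) - 1):
        h0 = knots[j] - knots[j - 1]
        h1 = knots[j + 1] - knots[j]
        d0 = (values[j] - values[j - 1]) / h0   # slope on the left interval
        d1 = (values[j + 1] - values[j]) / h1   # slope on the right interval
        # Non-uniform-grid estimate of the second derivative at knots[j].
        total += (2.0 * (d1 - d0) / (h0 + h1)) ** 2
    return total / (len(knots) - 2)

knots = [0.0, 0.1, 0.5, 1.0]
flat = second_difference_penalty([3.0 * k + 1.0 for k in knots], knots)
curved = second_difference_penalty([k * k for k in knots], knots)
```

A linear activation incurs no penalty, while the quadratic one is penalized in proportion to its squared curvature, so adding such a term to the loss discourages oscillating edge functions.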

4. Empirical Performance and Comparison

FR-KANs have been evaluated on synthetic function fitting, regression tasks drawing from the Feynman equations, time series prediction, image classification (MNIST, CIFAR-10/100, STL-10), text (AG News), and multimodal datasets (AVMNIST, MIMIC-III) (Zheng et al., 16 Jan 2025).

Performance benchmarks highlight:

  • On 10 synthetic functions, curvature-based FR-KAN reduces the median relative error by 25.3% over the input-density baseline (Wilcoxon test).
  • On 15 Feynman regression tasks, it reduces the error by 9.4%.
  • On the 2D Helmholtz PDE, the average relative error drops by 23.3%, with per-frequency drops of 34.2%, 3.15%, 27.7%, and 28.1% (Rigas et al., 26 Jan 2026).
  • On real datasets, FR-KAN matches or exceeds ReLU-MLP accuracy, attaining a 3–5% improvement on CIFAR-100/STL-10 and halving RMSEs on Feynman regression tasks.

Ablation analyses demonstrate that free-knot allocation (versus a fixed grid) increases the maximum number of knots per layer and thus expressive power, while continuity regularization cuts accuracy variance in half and enhances convergence reliability (Zheng et al., 16 Jan 2025). Expanded interval grids stabilize training, allowing deeper networks to avoid instability and grid collapse.

5. Computational Complexity and Implementation

Curvature-based IDF estimation at each training step requires Hessian diagonal evaluations, which can be performed via automatic differentiation or finite differences. Quantile computation for grid updating adds a further cost per knot group. Empirical wall-clock overhead is typically 5–15% above the input-density baseline, but this amortizes for deeper or wider tasks:

  • Synthetic function: 10.8% overhead (15.47 s → 17.13 s)
  • Feynman regression: 14.8% overhead (15.36 s → 17.62 s)
  • Helmholtz PDE: 5% overhead (61.21 s → 64.30 s) (Rigas et al., 26 Jan 2026).

Grouping spline parameters across neurons allows the per-layer parameter count to remain near $O(n_{\rm in} n_{\rm out})$, comparable to standard MLPs, and below the original KAN's $O(n_{\rm in} n_{\rm out}(G+K))$ scaling.

6. Theoretical Properties and Scalability

KANs, and by extension FR-KANs, admit formal universal approximation guarantees. Theoretically, for a KAN with splines of a given degree, the approximation error decays as a power of the number of grid intervals, with an exponent set by the spline degree. Empirically, KAN losses follow a steeper scaling law than those of ReLU MLPs, which saturate at a slower rate (Liu et al., 2024). FR-KAN's knot adaptivity further concentrates resolution where needed, suggesting improved parameter efficiency. Spline-knot bounds are theoretically established, with the maximum knot count per layer raised by free-knot mechanisms (Zheng et al., 16 Jan 2025). Continuity regularization and wide initialization ranges ensure smooth, stable activations.

7. Interpretability and Extensions

FR-KANs remain interpretable: each edge function is a 1D spline map, and trainable knots facilitate direct visualization of function adaptivity and learning strategy. Extensions such as group-shared splines, regularizers for interpretability and sparsity, and alternative kernelizations (Fourier, Rational, RBF) enable broad applicability and further hybridization (Zheng et al., 16 Jan 2025, Liu et al., 2024).

A plausible implication is that FR-KANs, by tightly coupling grid adaptation to geometric features of the target, offer a principled path for scientific machine learning applications demanding accuracy, efficiency, and transparency in low- and moderate-dimensional settings.
