Free-Knots Kolmogorov-Arnold Networks

Updated 20 April 2026

Free-Knots Kolmogorov-Arnold Networks (FR-KAN) are neural architectures that turn fixed B-spline grids into adaptive, trainable knot placements based on data-driven curvature.
They employ curvature-based importance density functions to allocate knots dynamically, enhancing expressivity and statistical efficiency while maintaining training stability.
FR-KAN demonstrates state-of-the-art performance in diverse applications such as symbolic regression, scientific machine learning, and image classification, with improved interpretability and efficiency.

Free-Knots Kolmogorov-Arnold Networks (FR-KAN) are a class of neural architectures derived from Kolmogorov-Arnold Networks (KAN), extending them by promoting the knot locations in the B-spline parameterization to trainable variables. This dynamic allocation of knot positions enables adaptive grid refinement according to the target function's geometric complexity, offering enhanced expressivity, statistical efficiency, and training stability compared to fixed-grid KAN variants. FR-KANs have demonstrated state-of-the-art empirical performance across symbolic regression, scientific machine learning, and diverse real-world data modalities (Rigas et al., 26 Jan 2026, Zheng et al., 16 Jan 2025, Liu et al., 2024).

1. Theoretical Foundation and Formulation

Kolmogorov-Arnold Networks are based on the Kolmogorov-Arnold representation theorem, which states that any continuous multivariate function $f:[0,1]^n\to\mathbb{R}$ can be expressed as a superposition of univariate functions: $f(x_1,\dots,x_n) = \sum_{q=0}^{2n} \Psi_q\left(\sum_{p=1}^n \Phi_{q,p}(x_p)\right)$. KAN instantiates each univariate map $\Phi$ by a learnable B-spline of order $K$ over a fixed grid of knots, optionally augmented with a SiLU shortcut. In its free-knot generalization, FR-KAN, the knot vector $\kappa_i=(\kappa_{i,1},\dots,\kappa_{i,G+1})$ for each input dimension is optimized jointly with the spline coefficients.
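As a small illustration of what "superposition of univariate functions" means, the sketch below writes the product of two positive numbers using only one-dimensional maps and addition. This is a special single-term case chosen for readability, not the general $2n+1$ term construction of the theorem; the function names are hypothetical.

```python
import math

def inner(x):
    # Univariate inner map Phi (the same map is used for both inputs here).
    return math.log(x)

def outer(s):
    # Univariate outer map Psi.
    return math.exp(s)

def product_via_superposition(x1, x2):
    # f(x1, x2) = Psi(Phi(x1) + Phi(x2)); valid only for x1, x2 > 0.
    return outer(inner(x1) + inner(x2))
```

In a KAN the maps playing the role of `inner` and `outer` are not fixed in advance but learned as splines.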

A typical FR-KAN layer is parameterized as: $f(x;\theta,\kappa)_n = \sum_{i=1}^{n_{\rm in}} \left[ \sum_{m=1}^{G+K} b_{n,i,m} B_m^{(K)}(x_i;\kappa_i) + r_{n,i}\,{\rm SiLU}(x_i) \right]$, where $B_m^{(K)}(\cdot;\kappa_i)$ denotes the $m$-th order-$K$ B-spline basis function with knot vector $\kappa_i$, and both the weights $\theta = \{b_{n,i,m}, r_{n,i}\}$ and the knots $\kappa = \{\kappa_i\}$ are trainable parameters (Rigas et al., 26 Jan 2026, Zheng et al., 16 Jan 2025).
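The layer formula above can be sketched in plain Python as follows. This is a minimal forward pass under stated assumptions, not the authors' implementation: $K$ is treated as the polynomial degree, the $G+1$ knots are extended by repeating each endpoint $K$ times (a clamped knot vector, which yields the $G+K$ basis functions in the formula), and the helper names are hypothetical.

```python
import math

def silu(x):
    return x / (1.0 + math.exp(-x))

def extend_knots(kappa, K):
    # Assumption: clamp the grid by repeating each endpoint K extra times.
    return [kappa[0]] * K + list(kappa) + [kappa[-1]] * K

def bspline_basis(x, t, K):
    # Cox-de Boor recursion; returns len(t) - K - 1 basis values at x.
    n0 = len(t) - 1
    B = [1.0 if t[j] <= x < t[j + 1] else 0.0 for j in range(n0)]
    if x == t[-1]:
        # Assign the right endpoint to the last non-empty interval.
        for j in range(n0 - 1, -1, -1):
            if t[j] < t[j + 1]:
                B[j] = 1.0
                break
    for d in range(1, K + 1):
        nxt = []
        for j in range(n0 - d):
            left_den = t[j + d] - t[j]
            right_den = t[j + d + 1] - t[j + 1]
            # Terms with a zero denominator (repeated knots) are taken as 0.
            left = (x - t[j]) / left_den * B[j] if left_den > 0 else 0.0
            right = (t[j + d + 1] - x) / right_den * B[j + 1] if right_den > 0 else 0.0
            nxt.append(left + right)
        B = nxt
    return B

def fr_kan_layer(x, b, r, kappa, K):
    # x: n_in inputs; b[n][i][m]: spline coefficients; r[n][i]: SiLU weights;
    # kappa[i]: sorted (non-uniform, trainable) knots for input dimension i.
    out = []
    for n in range(len(b)):
        total = 0.0
        for i, xi in enumerate(x):
            basis = bspline_basis(xi, extend_knots(kappa[i], K), K)
            total += sum(c * v for c, v in zip(b[n][i], basis)) + r[n][i] * silu(xi)
        out.append(total)
    return out
```

Because `kappa[i]` is an ordinary argument, moving a knot changes the basis functions directly, which is what makes the knots trainable alongside `b` and `r`.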

2. Knot Adaptation via Importance Density Functions

FR-KAN formulates knot allocation as a density estimation problem and introduces Importance Density Functions (IDF) to guide the knot distribution. Given a batch of samples $\{x_j\}_{j=1}^{N}$, importance weights $w_j$ are computed, typically using curvature-based surrogates. The empirical probability mass function for each input point is then $p_j = w_j / \sum_{k=1}^{N} w_k$.
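The step from samples to a probability mass function can be sketched as below. The specific surrogate used here (absolute central second difference plus a small floor `eps`) is an assumption made for illustration; the source only states that the weights are curvature-based.

```python
def curvature_weights(f, xs, h=1e-3, eps=1e-6):
    # Assumed surrogate: |central second difference| as a curvature estimate.
    # eps keeps every weight strictly positive.
    return [abs(f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h) + eps for x in xs]

def to_pmf(weights):
    # Normalize importance weights into a probability mass function.
    total = sum(weights)
    return [w / total for w in weights]
```

For example, with `f = lambda x: x ** 3` the weights grow with `|6 * x|`, so samples far from the origin receive more mass than samples near it.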

To adapt to geometric complexity, the curvature-based IDF assigns each sample an importance weight that grows with the magnitude of the local curvature, offset by a small constant $\epsilon$ that keeps the density from vanishing in flat regions; in the continuous limit, normalizing this weight over the input domain gives the corresponding density (Rigas et al., 26 Jan 2026). The knot locations are optimized so that the induced grid's empirical cumulative distribution function (CDF) matches the quantiles of the IDF: the grid-matching loss penalizes the mismatch between the IDF's cumulative mass at each knot and the corresponding uniform quantile level. The gradient of this loss is derived explicitly, so knot updates can be backpropagated efficiently.
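The quantile-matching idea can be sketched as follows: build the weighted empirical CDF, read off knots at uniform quantile levels, and score any candidate grid by how far its cumulative mass is from those levels. The squared-error form of `grid_matching_loss` is an assumption for illustration and is only evaluated here, not differentiated; the function names are hypothetical.

```python
def weighted_cdf(xs, pmf):
    # Sort samples and accumulate their probability mass.
    pairs = sorted(zip(xs, pmf))
    cdf, acc = [], 0.0
    for _, p in pairs:
        acc += p
        cdf.append(acc)
    return [x for x, _ in pairs], cdf

def quantile_knots(xs, pmf, G):
    # Place G+1 knots at the uniform quantile levels k/G of the weighted CDF.
    sx, cdf = weighted_cdf(xs, pmf)
    knots = []
    for k in range(G + 1):
        u = k / G
        # First sample whose cumulative mass reaches u (tolerance for rounding).
        idx = next((j for j, c in enumerate(cdf) if c >= u - 1e-12), len(sx) - 1)
        knots.append(sx[idx])
    return knots

def cdf_at(t, sx, cdf):
    # Weighted empirical CDF at t: total mass of samples <= t.
    val = 0.0
    for x, c in zip(sx, cdf):
        if x <= t:
            val = c
        else:
            break
    return val

def grid_matching_loss(knots, xs, pmf):
    # Assumed form: mean squared gap between CDF(knot_k) and the level k/G.
    sx, cdf = weighted_cdf(xs, pmf)
    G = len(knots) - 1
    return sum((cdf_at(t, sx, cdf) - k / G) ** 2 for k, t in enumerate(knots)) / (G + 1)
```

When the mass is concentrated in one region, the knots returned by `quantile_knots` cluster there and score a lower loss than a uniform grid with the same number of knots.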

3. Optimization and Stability Mechanisms

The training objective for FR-KAN is augmented with a grid-matching term: $\mathcal{L} = \mathcal{L}_{\rm pred} + \lambda\,\mathcal{L}_{\rm grid}$, where $\mathcal{L}_{\rm pred}$ is the prediction loss (e.g., MSE, cross-entropy, or a physics-informed residual for PDEs), $\mathcal{L}_{\rm grid}$ is the grid-matching loss, and $\lambda$ controls the adaptation strength (Rigas et al., 26 Jan 2026). Training optimizes both spline weights and knot positions: using Adam for the weights $\theta$ and SGD with a smaller learning rate for the knots $\kappa$ is empirically effective.
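The two-optimizer scheme can be sketched on a toy problem as below. The quadratic stand-ins for the prediction and grid-matching losses, the learning rates, and all names are assumptions chosen only to show the mechanics: one parameter group is updated with Adam, the other with plain SGD at a smaller step size.

```python
import math

def total_loss(pred_loss, grid_loss, lam):
    # Combined objective: prediction loss plus lam times the grid-matching term.
    return pred_loss + lam * grid_loss

class Adam:
    def __init__(self, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, b1, b2, eps
        self.m = self.v = None
        self.t = 0

    def step(self, params, grads):
        if self.m is None:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        out = []
        for j, (p, g) in enumerate(zip(params, grads)):
            self.m[j] = self.b1 * self.m[j] + (1 - self.b1) * g
            self.v[j] = self.b2 * self.v[j] + (1 - self.b2) * g * g
            m_hat = self.m[j] / (1 - self.b1 ** self.t)  # bias correction
            v_hat = self.v[j] / (1 - self.b2 ** self.t)
            out.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return out

def sgd_step(params, grads, lr):
    return [p - lr * g for p, g in zip(params, grads)]

def train_toy(steps=2000, lam=1.0):
    # Hypothetical targets standing in for "good" weights and knots.
    theta, kappa = [0.0, 0.0], [0.0, 1.0]
    theta_star, kappa_star = [1.0, -2.0], [0.2, 0.8]
    adam = Adam(lr=1e-2)

    def losses(th, ka):
        pred = sum((a - b) ** 2 for a, b in zip(th, theta_star))
        grid = sum((a - b) ** 2 for a, b in zip(ka, kappa_star))
        return pred, grid

    start = total_loss(*losses(theta, kappa), lam)
    for _ in range(steps):
        g_theta = [2.0 * (a - b) for a, b in zip(theta, theta_star)]
        g_kappa = [lam * 2.0 * (a - b) for a, b in zip(kappa, kappa_star)]
        theta = adam.step(theta, g_theta)          # weights: Adam
        kappa = sgd_step(kappa, g_kappa, lr=1e-3)  # knots: SGD, smaller step
    return start, total_loss(*losses(theta, kappa), lam)
```

Calling `train_toy()` returns the combined loss before and after training. In a real model the gradients would come from automatic differentiation rather than the closed-form expressions used here.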

To ensure smoothness and mitigate oscillations, FR-KAN introduces a $C^2$ continuity regularizer, enforced via finite differences at grid points. A wide initialization range for the grid is critical for avoiding NaN divergence and reducing grid clustering. Spline parameters are shared across groups of neurons to reduce parameter overhead and match the parameter scale of standard MLPs (Zheng et al., 16 Jan 2025).
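One way to picture a finite-difference smoothness penalty at grid points is sketched below. It is an assumed form, not the regularizer from the paper: the second derivative is estimated just to the left and just to the right of each interior knot, and the squared jump between the two estimates is penalized.

```python
def second_diff(phi, c, h):
    # Central second difference of phi centered at c.
    return (phi(c - h) - 2.0 * phi(c) + phi(c + h)) / (h * h)

def c2_jump_penalty(phi, interior_knots, h=1e-3):
    # Sum of squared jumps in the estimated second derivative across each knot.
    total = 0.0
    for t in interior_knots:
        left = second_diff(phi, t - h, h)   # uses points t-2h, t-h, t
        right = second_diff(phi, t + h, h)  # uses points t, t+h, t+2h
        total += (right - left) ** 2
    return total
```

A function whose second derivative is continuous at the knots, such as `x * x`, gives a penalty near zero, while a function whose curvature switches on at a knot, such as `x * x if x > 0 else 0.0` at `0.0`, gives a penalty near 4.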

4. Empirical Performance and Comparison

FR-KANs have been evaluated on synthetic function fitting, regression tasks drawing from the Feynman equations, time series prediction, image classification (MNIST, CIFAR-10/100, STL-10), text (AG News), and multimodal datasets (AVMNIST, MIMIC-III) (Zheng et al., 16 Jan 2025).

Performance benchmarks highlight:

On 10 synthetic functions, curvature-based FR-KAN reduces the median relative $L^2$ error by 25.3% (Wilcoxon test) over the input-density baseline.
On 15 Feynman regression tasks, it reduces the error by 9.4%.
On the 2D Helmholtz PDE, the average relative error drops by 23.3% (34.2%, 3.15%, 27.7%, and 28.1% across the tested frequencies) (Rigas et al., 26 Jan 2026).
On real datasets, FR-KAN matches or exceeds ReLU-MLP accuracy, attaining a 3–5% improvement on CIFAR-100/STL-10 and halving RMSE on Feynman regression tasks.

Ablation analyses demonstrate that free-knot allocation (versus a fixed grid) increases the maximum number of knots per layer and thus expressive power, while $C^2$ regularization cuts accuracy variance in half and enhances convergence reliability (Zheng et al., 16 Jan 2025). Expanded-interval grids improve stability, enabling deeper networks to avoid instability and grid collapse.

5. Computational Complexity and Implementation

Curvature-based IDF estimation at each training step requires Hessian diagonal evaluations, which can be obtained via automatic differentiation or finite differences. Grid updating additionally requires a quantile computation per knot group. Empirical wall-clock overhead is typically 5–15% above the input-density baseline, and it amortizes on larger, more expensive tasks:

Synthetic function: 10.8% overhead (15.47s → 17.13s)
Feynman regression: 14.8% overhead (15.36s → 17.62s)
Helmholtz PDE: 5% overhead (61.21s → 64.30s) (Rigas et al., 26 Jan 2026).
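The overhead figures can be recomputed from the quoted timings as a quick consistency check. Because the timings are rounded to two decimals, the recomputed values land within about 0.1 percentage point of the quoted percentages rather than matching them exactly.

```python
# (baseline seconds, curvature-based seconds), as quoted above.
timings = {
    "Synthetic function": (15.47, 17.13),
    "Feynman regression": (15.36, 17.62),
    "Helmholtz PDE": (61.21, 64.30),
}

def overhead_percent(base, adapted):
    # Relative wall-clock overhead of the adapted run over the baseline.
    return 100.0 * (adapted - base) / base

overheads = {name: overhead_percent(*t) for name, t in timings.items()}
```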

Grouping spline parameters across neurons allows the per-layer parameter count to remain near $O(n_{\rm in} n_{\rm out})$, comparable to standard MLPs, and below the original KAN's $O(n_{\rm in} n_{\rm out}(G+K))$ scaling, where $n_{\rm out}$ is the number of output neurons.
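A rough per-layer count makes the comparison concrete. The KAN count follows from one set of $G+K$ spline coefficients plus one shortcut weight per edge; the grouped count is a hypothetical accounting (one scalar weight per edge plus one shared set of coefficients and knots per group) and should not be read as the exact scheme of the paper.

```python
def mlp_params(n_in, n_out):
    # Dense layer weights (biases ignored).
    return n_in * n_out

def kan_params(n_in, n_out, G, K):
    # G+K spline coefficients plus one SiLU shortcut weight per edge.
    return n_in * n_out * (G + K) + n_in * n_out

def grouped_params(n_in, n_out, G, K, groups):
    # Hypothetical grouping: one weight per edge, plus G+K coefficients
    # and G+1 knots shared within each group.
    return n_in * n_out + groups * ((G + K) + (G + 1))
```

For a 64-by-64 layer with `G=5`, `K=3`, and 8 groups, these give 4096, 36864, and 4208 parameters respectively.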

6. Theoretical Properties and Scalability

KANs, and by extension FR-KANs, admit formal approximation guarantees. Theoretically, for a KAN with splines of degree $K$ and $G$ grid intervals, the $C^m$-norm approximation error scales as $\|f - f_G\|_{C^m} \le C\,G^{-K-1+m}$, where $f_G$ is the KAN approximant on the grid and $C$ is a constant independent of $G$. Empirically, the loss of KANs scales as roughly $\ell \propto N^{-4}$ in the parameter count $N$, outperforming ReLU MLPs, which saturate at a slower rate (Liu et al., 2024). FR-KAN's knot adaptivity further concentrates resolution where needed, suggesting improved parameter efficiency. Bounds on the number of spline knots are theoretically established, with the maximum knot count per layer raised by the free-knot mechanism (Zheng et al., 16 Jan 2025). $C^2$ regularization and wide ranges ensure smooth, stable activations.
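To get a feel for the grid-size dependence $G^{-K-1+m}$ in the KAN approximation bound (Liu et al., 2024), the sketch below computes the factor by which the bound shrinks when the grid is refined. The function name is hypothetical, and the result describes the bound only, not any measured error.

```python
def bound_ratio(K, m, refine):
    # Factor applied to the bound C * G**(-(K + 1 - m)) when G -> refine * G.
    return float(refine) ** (-(K + 1 - m))
```

For cubic splines (`K=3`) measured in the sup norm (`m=0`), doubling the grid gives `bound_ratio(3, 0, 2) == 0.0625`, a 16-fold reduction of the bound.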

7. Interpretability and Extensions

FR-KANs remain interpretable: each edge function is a 1D spline map, and trainable knots facilitate direct visualization of function adaptivity and learning strategy. Extensions such as group-shared splines, regularizers for interpretability and sparsity, and alternative kernelizations (Fourier, Rational, RBF) enable broad applicability and further hybridization (Zheng et al., 16 Jan 2025, Liu et al., 2024).

A plausible implication is that FR-KANs, by tightly coupling grid adaptation to geometric features of the target, offer a principled path for scientific machine learning applications demanding accuracy, efficiency, and transparency in low- and moderate-dimensional settings.

References (3)

A Dynamic Framework for Grid Adaptation in Kolmogorov-Arnold Networks (2026)

Free-Knots Kolmogorov-Arnold Network: On the Analysis of Spline Knots and Advancing Stability (2025)

KAN: Kolmogorov-Arnold Networks (2024)
