Free-Knots Kolmogorov-Arnold Networks
- Free-Knots Kolmogorov-Arnold Networks (FR-KAN) are neural architectures that turn fixed B-spline grids into adaptive, trainable knot placements based on data-driven curvature.
- They employ curvature-based importance density functions to allocate knots dynamically, enhancing expressivity and statistical efficiency while maintaining training stability.
- FR-KAN demonstrates state-of-the-art performance in diverse applications such as symbolic regression, scientific machine learning, and image classification, with improved interpretability and efficiency.
Free-Knots Kolmogorov-Arnold Networks (FR-KAN) are a class of neural architectures derived from Kolmogorov-Arnold Networks (KAN), extending them by promoting the knot locations in the B-spline parameterization to trainable variables. This dynamic allocation of knot positions enables adaptive grid refinement according to the target function's geometric complexity, offering enhanced expressivity, statistical efficiency, and training stability compared to fixed-grid KAN variants. FR-KANs have demonstrated state-of-the-art empirical performance across symbolic regression, scientific machine learning, and diverse real-world data modalities (Rigas et al., 26 Jan 2026, Zheng et al., 16 Jan 2025, Liu et al., 2024).
1. Theoretical Foundation and Formulation
Kolmogorov-Arnold Networks are based on the Kolmogorov-Arnold representation theorem, which states that any continuous multivariate function $f:[0,1]^n \to \mathbb{R}$ can be expressed as a superposition of continuous univariate functions:
$$f(x_1,\dots,x_n) = \sum_{q=1}^{2n+1} \Phi_q\!\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right).$$
KAN instantiates each univariate map by a learnable B-spline of order $k$ over a fixed grid of knots, optionally augmented with a SiLU shortcut. In its free-knot generalization, FR-KAN, the knot vector for each input dimension is optimized jointly with the spline coefficients.
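A typical FR-KAN layer parameterizes each univariate map as a B-spline expansion:
$$\phi(x) = \sum_{i} c_i\, B_{i,k}(x;\mathbf{t}),$$
where $B_{i,k}(\cdot\,;\mathbf{t})$ denotes the $i$-th order-$k$ B-spline basis function with knot vector $\mathbf{t}$, and both the weights $c_i$ and the knots $\mathbf{t}$ are trainable parameters (Rigas et al., 26 Jan 2026, Zheng et al., 16 Jan 2025).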
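As a minimal sketch of this parameterization, the NumPy snippet below evaluates a spline activation with the Cox-de Boor recursion while treating the knot vector as an ordinary array that an optimizer could update. The function names `bspline_basis` and `spline_activation`, the re-sorting of the knots, and the example knot values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Evaluate every B-spline basis function of the given degree at points x.

    Uses the Cox-de Boor recursion. Returns an array of shape
    (len(x), len(knots) - degree - 1). `knots` must be non-decreasing.
    """
    x = np.asarray(x, dtype=float)[:, None]
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicator of the half-open interval [t_i, t_{i+1})
    B = ((x >= t[:-1]) & (x < t[1:])).astype(float)
    for d in range(1, degree + 1):
        left_den = t[d:-1] - t[:-(d + 1)]    # t_{i+d} - t_i
        right_den = t[d + 1:] - t[1:-d]      # t_{i+d+1} - t_{i+1}
        # Repeated knots give zero denominators; those terms are defined as 0
        left = np.where(left_den > 0,
                        (x - t[:-(d + 1)]) / np.where(left_den > 0, left_den, 1.0), 0.0)
        right = np.where(right_den > 0,
                         (t[d + 1:] - x) / np.where(right_den > 0, right_den, 1.0), 0.0)
        B = left * B[:, :-1] + right * B[:, 1:]
    return B

def spline_activation(x, coeffs, knots, degree=3):
    """Spline activation sum_i c_i B_i(x); knots are re-sorted so they stay ordered
    even after an unconstrained gradient update (an assumption of this sketch)."""
    return bspline_basis(x, np.sort(knots), degree) @ np.asarray(coeffs, dtype=float)

# Non-uniform (free) knots; values are arbitrary and chosen for illustration only
knots = np.array([0.0, 0.1, 0.15, 0.4, 0.5, 0.7, 0.9, 1.0, 1.2, 1.3])
coeffs = np.ones(len(knots) - 3 - 1)
x = np.linspace(0.4, 0.89, 5)
y = spline_activation(x, coeffs, knots)
```

With all coefficients equal to one, the output is constant on the interior interval because B-spline bases form a partition of unity there, which is a convenient sanity check for any knot configuration.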
2. Knot Adaptation via Importance Density Functions
FR-KAN formulates knot allocation as a density estimation problem and introduces Importance Density Functions (IDF) to guide the knot distribution. Given a batch of samples $\{x_j\}_{j=1}^{N}$, importance weights $w_j \ge 0$ are computed, typically using curvature-based surrogates. The empirical probability mass function for each input point is then $p_j = w_j / \sum_{l=1}^{N} w_l$.
To adapt to geometric complexity, the curvature-based IDF assigns each sample a weight driven by the local curvature of the target. In the continuous limit this corresponds to a normalized density that includes a small constant $\epsilon$ to avoid vanishing density (Rigas et al., 26 Jan 2026). The knot locations are optimized so that the empirical cumulative distribution function (CDF) of the induced grid matches the quantiles of the IDF. The grid-matching loss penalizes the mismatch between the knot positions and the IDF quantiles taken at uniform quantile levels. Its gradient is derived explicitly, which gives efficient backpropagation for knot updates.
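The pipeline described above (weights, probability mass, CDF, quantiles, loss) can be sketched in NumPy as follows. This is an illustration under stated assumptions: the curvature surrogate is taken to be the absolute finite-difference second derivative plus a small constant `eps`, the quantile levels are midpoints of a uniform partition of [0, 1], and the grid-matching loss is a mean squared difference. None of these exact forms is confirmed by the source, and all function names are hypothetical.

```python
import numpy as np

def curvature_weights(x, y, eps=1e-3):
    """Assumed curvature surrogate: |finite-difference second derivative| + eps."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    second = np.gradient(np.gradient(y, x), x)
    return x, np.abs(second) + eps

def quantile_knots(x, w, num_knots):
    """Place knots at uniform quantile levels of the weighted empirical CDF."""
    p = w / w.sum()                                   # empirical probability mass
    cdf = np.cumsum(p)                                # strictly increasing since w > 0
    levels = (np.arange(num_knots) + 0.5) / num_knots # assumed uniform quantile levels
    return np.interp(levels, cdf, x)

def grid_matching_loss(knots, target_knots):
    """Assumed squared-error form of the grid-matching penalty."""
    return np.mean((np.sort(knots) - target_knots) ** 2)

# A steep tanh has its curvature concentrated near the origin
x = np.linspace(-1.0, 1.0, 401)
xs, w = curvature_weights(x, np.tanh(10 * x))
targets = quantile_knots(xs, w, num_knots=16)
uniform = np.linspace(-1.0, 1.0, 16)
loss = grid_matching_loss(uniform, targets)  # positive: a uniform grid ignores curvature
```

For the steep `tanh` target, the quantile knots cluster around the origin, while the `eps` floor keeps a nonzero density in the flat regions.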
3. Optimization and Stability Mechanisms
The training objective for FR-KAN is augmented with a grid-matching term:
$$\mathcal{L} = \mathcal{L}_{\text{pred}} + \lambda\, \mathcal{L}_{\text{grid}},$$
where $\mathcal{L}_{\text{pred}}$ is the prediction loss (e.g., MSE, cross-entropy, or a physics-informed residual for PDEs) and $\lambda$ controls the adaptation strength (Rigas et al., 26 Jan 2026). Training optimizes both spline weights and knot positions: using Adam for the spline weights and SGD with a smaller learning rate for the knot positions is empirically effective.
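A toy version of such a two-optimizer loop is sketched below. To stay dependency-free it makes several simplifying assumptions that are not from the source: the spline is a degree-1 (piecewise-linear) interpolant, gradients are approximated by central finite differences instead of backpropagation, the grid-matching term is a squared error against fixed targets `q`, and the learning rates and the weight `lam` are arbitrary.

```python
import numpy as np

def total_loss(c, t, x, y, q, lam):
    """Prediction MSE plus lam times an assumed squared grid-matching penalty."""
    ts = np.sort(t)
    pred = np.interp(x, ts, c)             # degree-1 spline through (t_i, c_i)
    return np.mean((pred - y) ** 2) + lam * np.mean((ts - q) ** 2)

def num_grad(f, p, h=1e-5):
    """Central finite-difference gradient of a scalar function f at p."""
    g = np.zeros_like(p)
    for i in range(p.size):
        e = np.zeros_like(p)
        e[i] = h
        g[i] = (f(p + e) - f(p - e)) / (2 * h)
    return g

def train(x, y, q, lam=0.1, steps=200, lr_c=0.05, lr_t=0.005):
    t = np.linspace(x.min(), x.max(), q.size)   # knots start on a uniform grid
    c = np.zeros(q.size)                        # spline weights
    m, v = np.zeros_like(c), np.zeros_like(c)   # Adam moment estimates
    b1, b2, eps = 0.9, 0.999, 1e-8
    history = []
    for k in range(1, steps + 1):
        g_c = num_grad(lambda cc: total_loss(cc, t, x, y, q, lam), c)
        g_t = num_grad(lambda tt: total_loss(c, tt, x, y, q, lam), t)
        # Adam update for the spline weights
        m = b1 * m + (1 - b1) * g_c
        v = b2 * v + (1 - b2) * g_c ** 2
        c = c - lr_c * (m / (1 - b1 ** k)) / (np.sqrt(v / (1 - b2 ** k)) + eps)
        # Plain SGD with a smaller learning rate for the knot positions
        t = t - lr_t * g_t
        history.append(total_loss(c, t, x, y, q, lam))
    return c, t, history

x = np.linspace(-1.0, 1.0, 200)
y = np.tanh(5 * x)
q = np.linspace(-0.5, 0.5, 8)   # hypothetical IDF quantile targets
c, t, history = train(x, y, q)
```

Keeping the knot learning rate small means the grid drifts slowly relative to the weights, which is one plausible reading of why a slower optimizer for knots helps stability.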
To ensure smoothness and mitigate oscillations, FR-KAN introduces a $C^2$ continuity regularizer, enforced via finite differences at grid points. A wide initialization range for the grid is critical for avoiding NaN divergence and reducing grid clustering. Spline parameters are grouped among neurons to reduce parameter overhead and match the parameter scale of standard MLPs (Zheng et al., 16 Jan 2025).
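One way a finite-difference continuity penalty at grid points could look is sketched below. It assumes the quantity being regularized is the jump in the second derivative across each interior knot, estimated with one-sided second differences. The function name, the step size `h`, and the squared-jump form are illustrative assumptions and not the regularizer defined by the authors.

```python
import numpy as np

def second_derivative_jump_penalty(f, knots, h=1e-3):
    """Mean squared jump of a finite-difference second derivative across interior knots.

    `f` is any vectorized 1D function (for example a spline activation).
    """
    t = np.sort(np.asarray(knots, dtype=float))[1:-1]   # interior knots only
    # One-sided second differences that use points on a single side of each knot
    left = (f(t) - 2 * f(t - h) + f(t - 2 * h)) / h ** 2
    right = (f(t + 2 * h) - 2 * f(t + h) + f(t)) / h ** 2
    return np.mean((right - left) ** 2)

knots = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
smooth = second_derivative_jump_penalty(np.sin, knots)                    # close to zero
kinked = second_derivative_jump_penalty(lambda x: x * np.abs(x), knots)   # large
```

The function `x * |x|` has a continuous first derivative but its second derivative jumps from -2 to 2 at the origin, so the penalty separates it clearly from the smooth sine.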
4. Empirical Performance and Comparison
FR-KANs have been evaluated on synthetic function fitting, regression tasks drawing from the Feynman equations, time series prediction, image classification (MNIST, CIFAR-10/100, STL-10), text (AG News), and multimodal datasets (AVMNIST, MIMIC-III) (Zheng et al., 16 Jan 2025).
Performance benchmarks highlight:
- On 10 synthetic functions, curvature-based FR-KAN reduces the median relative error by 25.3% over the input-density baseline (significance assessed with a Wilcoxon test).
- On 15 Feynman regression tasks, it reduces the error by 9.4%.
- On the 2D Helmholtz PDE, the average drop in relative error is 23.3% (34.2%, 3.15%, 27.7%, and 28.1% at the different frequencies tested) (Rigas et al., 26 Jan 2026).
- On real datasets, FR-KAN matches or exceeds ReLU-MLP accuracy, attaining +3–5% improvement on CIFAR-100/STL-10 and halving RMSEs on Feynman regression tasks.
Ablation analyses demonstrate that free-knot allocation (versus a fixed grid) increases the maximum number of knots per layer and thus expressive power, while $C^2$ regularization cuts accuracy variance in half and improves convergence reliability (Zheng et al., 16 Jan 2025). The stability gained from expanded grid intervals lets deeper networks avoid instability and grid collapse.
5. Computational Complexity and Implementation
Curvature-based IDF estimation at each training step requires Hessian diagonal evaluations, which can be performed via automatic differentiation or finite differences. Quantile computation for grid updating is carried out per knot group. Empirical wall-clock overhead is typically 5–15% above the input-density baseline, but this amortizes for deeper and wider tasks:
- Synthetic function: 10.8% overhead (15.47 s → 17.13 s)
- Feynman regression: 14.8% (15.36 s → 17.62 s)
- Helmholtz PDE: 5% (61.21 s → 64.30 s) (Rigas et al., 26 Jan 2026).
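The finite-difference route to the Hessian diagonal mentioned above can be sketched as follows. The function name `hessian_diagonal_fd`, the step size, and the test function are assumptions made for illustration. For inputs of dimension `d`, this version evaluates the function `2d + 1` times per batch.

```python
import numpy as np

def hessian_diagonal_fd(f, X, h=1e-3):
    """Central-difference estimate of d^2 f / dx_i^2 at every row of X.

    `f` maps an (N, d) array to an (N,) array. Returns an (N, d) array.
    """
    X = np.asarray(X, dtype=float)
    f0 = f(X)
    diag = np.empty_like(X)
    for i in range(X.shape[1]):
        step = np.zeros(X.shape[1])
        step[i] = h
        diag[:, i] = (f(X + step) - 2 * f0 + f(X - step)) / h ** 2
    return diag

# Example target with a known Hessian diagonal: (-sin(x0), 2 * x0)
f = lambda Z: np.sin(Z[:, 0]) + Z[:, 0] * Z[:, 1] ** 2
X = np.array([[0.3, -1.2], [1.0, 0.5]])
D = hessian_diagonal_fd(f, X)
```

The absolute values of `D` could then serve as curvature-based importance weights, one column per input dimension.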
Grouping spline parameters across neurons keeps the per-layer parameter count comparable to that of a standard MLP layer and below the scaling of the original KAN, whose parameter count also grows with the grid size.
6. Theoretical Properties and Scalability
KANs, and by extension FR-KANs, admit formal universal approximation guarantees. Theoretically, for a KAN with splines of degree $k$ and $G$ grid intervals, the $C^m$-norm approximation error scales as
$$\left\| f - \mathrm{KAN}_G \right\|_{C^m} \le C\, G^{-(k+1)+m},$$
and empirically the loss of KANs scales approximately as $N^{-4}$ in the parameter count $N$ (for cubic splines, $k = 3$), outperforming ReLU MLPs, whose scaling saturates earlier (Liu et al., 2024). FR-KAN's knot adaptivity further concentrates resolution where needed, suggesting improved parameter efficiency. Spline-knot bounds are theoretically established, with the maximum knot count per layer raised by free-knot mechanisms (Zheng et al., 16 Jan 2025). $C^2$ regularization and wide grid ranges promote smooth, stable activations.
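Scaling exponents of this kind are usually estimated with a straight-line fit in log-log space. The sketch below shows that procedure on synthetic, noiseless data generated with an exponent of 4 purely for illustration. The numbers are not measurements from any of the cited papers, and `fit_scaling_exponent` is a hypothetical helper name.

```python
import numpy as np

def fit_scaling_exponent(n_params, losses):
    """Fit loss ~ c * N^(-alpha) by least squares in log-log space; returns (alpha, c)."""
    slope, intercept = np.polyfit(np.log(n_params), np.log(losses), 1)
    return -slope, np.exp(intercept)

# Synthetic data that follows an exact power law with alpha = 4 (illustrative only)
n_params = np.array([1e2, 1e3, 1e4, 1e5])
losses = 3.0 * n_params ** -4.0
alpha, c = fit_scaling_exponent(n_params, losses)
```

On real training curves the fit should be restricted to the range before the loss plateaus, since saturation flattens the slope and biases the estimated exponent downward.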
7. Interpretability and Extensions
FR-KANs remain interpretable: each edge function is a 1D spline map, and trainable knots facilitate direct visualization of function adaptivity and learning strategy. Extensions such as group-shared splines, regularizers for interpretability and sparsity, and alternative kernelizations (Fourier, Rational, RBF) enable broad applicability and further hybridization (Zheng et al., 16 Jan 2025, Liu et al., 2024).
A plausible implication is that FR-KANs, by tightly coupling grid adaptation to geometric features of the target, offer a principled path for scientific machine learning applications demanding accuracy, efficiency, and transparency in low- and moderate-dimensional settings.