- The paper proposes BSRBF-KAN, which integrates learnable B-splines and Gaussian RBFs as activation functions within Kolmogorov-Arnold Networks.
- The experiments on MNIST show that BSRBF-KAN reaches perfect training accuracy in its best runs, with competitive validation performance and greater stability than the other models tested.
- The ablation study shows that removing key components such as base output or layer normalization severely degrades performance, underscoring their critical role.
This paper introduces BSRBF-KAN (B-splines and Radial Basis Functions in Kolmogorov-Arnold Networks), a novel architecture that combines B-splines and Radial Basis Functions (RBFs), specifically Gaussian RBFs, as the learnable activation functions within a Kolmogorov-Arnold Network (KAN) structure.
The core idea behind KANs is inspired by the Kolmogorov-Arnold Representation Theorem (KART), which states that any multivariate continuous function can be represented as a finite sum of compositions of continuous single-variable functions and addition. Unlike traditional MLPs, which use fixed activation functions at the nodes, KANs place learnable univariate functions on the edges connecting nodes. BSRBF-KAN leverages this paradigm by designing each edge function $\phi(x)$ as a combination of a base output $b(x)$ and a sum of B-spline and RBF components, modulated by weight matrices $w_b$ and $w_s$:
$$\phi(x) = w_b\,b(x) + w_s\big(\phi_{BS}(x) + \phi_{RBF}(x)\big)$$
Here, $b(x)$ is a base function (such as $\mathrm{silu}(x)$, as in the original KAN), $\phi_{BS}(x)$ is a linear combination of B-splines, and $\phi_{RBF}(x)$ is a sum of Gaussian radial basis functions. The combination aims to leverage the strengths of both B-splines (smoothness, local control) and RBFs (approximation capability) for approximating the required univariate functions in the KAN structure. Gaussian RBFs are chosen for their popularity; they take the form $e^{-\epsilon r^2}$, where $r$ is the distance of the input from a center, often written in KAN contexts as $\exp\!\left(-\frac{1}{2}\left(\frac{x-c}{h}\right)^2\right)$ with center $c$ and width $h$.
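To make the two basis families concrete, the following is a minimal sketch (not the authors' code; the knot layout, spline degree, and tensor shapes are illustrative assumptions) of how each basis can be evaluated in PyTorch, with the B-spline basis computed by the Cox-de Boor recursion in the style of EfficientKAN:

```python
import torch

def gaussian_rbf_basis(x: torch.Tensor, centers: torch.Tensor, h: float) -> torch.Tensor:
    """Gaussian RBF basis exp(-0.5 * ((x - c) / h)^2) for every center c.

    x: (batch, in_features); centers: (num_centers,) -> (batch, in_features, num_centers)
    """
    return torch.exp(-0.5 * ((x.unsqueeze(-1) - centers) / h) ** 2)

def bspline_basis(x: torch.Tensor, grid: torch.Tensor, k: int = 3) -> torch.Tensor:
    """Degree-k B-spline basis via the Cox-de Boor recursion (EfficientKAN-style).

    x: (batch, in_features); grid: (num_knots,) sorted knot positions
    -> (batch, in_features, num_knots - k - 1)
    """
    x = x.unsqueeze(-1)
    # Degree-0 splines are indicator functions of the knot intervals.
    bases = ((x >= grid[:-1]) & (x < grid[1:])).to(x.dtype)
    for d in range(1, k + 1):
        left = (x - grid[: -(d + 1)]) / (grid[d:-1] - grid[: -(d + 1)]) * bases[..., :-1]
        right = (grid[d + 1 :] - x) / (grid[d + 1 :] - grid[1:-d]) * bases[..., 1:]
        bases = left + right
    return bases
```

A learnable linear combination of these basis values per input feature yields the $\phi_{BS}(x) + \phi_{RBF}(x)$ term of the edge activation above.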
The paper implements BSRBF-KAN similarly to the original KAN and EfficientKAN, using a residual connection where the activation function is the sum of a base function output and a component learned via B-splines and RBFs. Learnable scales for activation functions and Kaiming uniform initialization are adopted from EfficientKAN. Layer normalization is also employed, following the FastKAN approach, to keep inputs within a suitable domain for the basis functions.
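A minimal single-layer sketch along these lines, assuming the basis helpers from the previous snippet are in scope (the layer sizes, grid range, and grid size are illustrative choices, not the paper's exact configuration), could look like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BSRBFLikeLayer(nn.Module):
    """One KAN layer in the BSRBF spirit: phi(x) = w_b * b(x) + w_s * (phi_BS(x) + phi_RBF(x))."""

    def __init__(self, in_features: int, out_features: int,
                 grid_size: int = 5, spline_order: int = 3, grid_range=(-2.0, 2.0)):
        super().__init__()
        self.layer_norm = nn.LayerNorm(in_features)  # FastKAN-style: keep inputs in the basis domain
        self.base_weight = nn.Linear(in_features, out_features, bias=False)   # w_b acting on b(x) = silu(x)
        num_basis = grid_size + spline_order                                  # basis functions per input feature
        self.spline_weight = nn.Linear(in_features * num_basis, out_features, bias=False)  # w_s
        nn.init.kaiming_uniform_(self.base_weight.weight, a=5 ** 0.5)         # EfficientKAN-style init
        nn.init.kaiming_uniform_(self.spline_weight.weight, a=5 ** 0.5)
        # Uniform knot grid extended by spline_order on each side, plus RBF centers on the same range.
        h = (grid_range[1] - grid_range[0]) / grid_size
        grid = torch.arange(-spline_order, grid_size + spline_order + 1) * h + grid_range[0]
        self.register_buffer("grid", grid)
        self.register_buffer("centers", torch.linspace(grid_range[0], grid_range[1], num_basis))
        self.h = h
        self.spline_order = spline_order

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.layer_norm(x)
        base = self.base_weight(F.silu(x))                                    # w_b * b(x)
        phi = (bspline_basis(x, self.grid, self.spline_order)
               + gaussian_rbf_basis(x, self.centers, self.h))                 # phi_BS + phi_RBF
        return base + self.spline_weight(phi.flatten(start_dim=1))            # + w_s * (...)
```

Stacking such layers, for example with widths (784, 64, 10), gives a network analogous to the one evaluated below.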
Experiments were conducted on the MNIST dataset to evaluate BSRBF-KAN against other popular KAN variants (FastKAN, FasterKAN, EfficientKAN, GottliebKAN) and a standard MLP. All models except GottliebKAN used a (784, 64, 10) structure (input, hidden, output layers), while GottliebKAN used (784, 64, 64, 10) due to its potentially shallower effective depth. Common hyperparameters like batch size (64), learning rate (1e-3), weight decay (1e-4), AdamW optimizer, and CrossEntropy loss were used. Each model was trained 5 times to assess stability.
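A sketch of this training setup, reusing the layer class above (the epoch count, data path, and flattening transform are assumptions not stated in this summary; the reported hyperparameters are batch size 64, learning rate 1e-3, weight decay 1e-4, AdamW, and cross-entropy loss), might be:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Flatten each 28x28 MNIST image into a 784-dimensional vector.
transform = transforms.Compose([transforms.ToTensor(), transforms.Lambda(lambda t: t.view(-1))])
train_loader = DataLoader(datasets.MNIST("./data", train=True, download=True, transform=transform),
                          batch_size=64, shuffle=True)
val_loader = DataLoader(datasets.MNIST("./data", train=False, download=True, transform=transform),
                        batch_size=64)

model = nn.Sequential(BSRBFLikeLayer(784, 64), BSRBFLikeLayer(64, 10))   # (784, 64, 10) structure
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

for epoch in range(15):                       # epoch count is an assumption, not taken from the paper
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

    model.eval()
    correct = 0
    with torch.no_grad():
        for images, labels in val_loader:
            correct += (model(images).argmax(dim=1) == labels).sum().item()
    print(f"epoch {epoch}: val acc {correct / len(val_loader.dataset):.4f}")
```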
Key findings from the experiments:
- Accuracy: GottliebKAN achieved the highest best validation accuracy (97.78%), followed closely by MLP (97.69%) and BSRBF-KAN (97.63%) across 5 runs. In average validation accuracy over the 5 runs, MLP was slightly better (97.62%) than BSRBF-KAN (97.55%), but BSRBF-KAN reached perfect training accuracy (100%) in both its average and best runs, indicating strong fitting capacity.
- Stability: BSRBF-KAN and MLP demonstrated greater stability across the 5 training runs compared to other KANs, particularly GottliebKAN, which showed higher variance in average metrics despite having the best single-run performance.
- Convergence: BSRBF-KAN showed a sharper decrease in training loss, and initially in validation loss, than the other models, suggesting good convergence properties, although GottliebKAN ultimately achieved the lowest validation loss.
- Training Time: BSRBF-KAN had a longer training time (avg 231s) compared to FastKAN (avg 101s), FasterKAN (avg 93s), and EfficientKAN (avg 120s) but was comparable to MLP (avg 181s) and GottliebKAN (avg 221s). This aligns with the expectation that combining functions might increase computational cost.
An ablation study further investigated the contribution of BSRBF-KAN's components. Removing either the RBFs or the B-splines individually had only a minor negative impact on performance. Removing both resulted in a model resembling an MLP within the BSRBF-KAN framework, which surprisingly yielded slightly better average validation accuracy and F1 score than the full BSRBF-KAN model, although the full model showed better convergence behavior. Crucially, removing the base output or the layer normalization significantly degraded performance, highlighting their essential role in the network's effectiveness; removing both at once produced the worst metrics of all.
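One way such an ablation can be reproduced is by exposing each component as a constructor flag; the sketch below extends the earlier layer class with hypothetical flags (the names and interface are illustrative, not the authors'):

```python
class AblatableBSRBFLayer(BSRBFLikeLayer):
    """BSRBF-style layer whose components can be switched off for ablation experiments."""

    def __init__(self, *args, use_base=True, use_bsplines=True, use_rbf=True, use_norm=True, **kwargs):
        super().__init__(*args, **kwargs)
        self.use_base, self.use_bsplines, self.use_rbf, self.use_norm = use_base, use_bsplines, use_rbf, use_norm

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.use_norm:
            x = self.layer_norm(x)                      # dropping layer norm degraded results in the paper
        out = self.base_weight(F.silu(x)) if self.use_base else 0.0   # dropping the base output also hurt
        phi = 0.0
        if self.use_bsplines:
            phi = phi + bspline_basis(x, self.grid, self.spline_order)
        if self.use_rbf:
            phi = phi + gaussian_rbf_basis(x, self.centers, self.h)
        if isinstance(phi, torch.Tensor):               # skip the spline branch when both bases are removed
            out = out + self.spline_weight(phi.flatten(start_dim=1))
        return out
```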
The authors conclude that BSRBF-KAN enhances stability and convergence compared to single-function KANs, though it requires more training time. The experiments on MNIST showed competitive performance with MLP and other KANs, but the inherent benefits of KANs over MLP (like interpretability, although not evaluated in this paper) were not clearly demonstrated solely based on accuracy and training speed on this specific dataset. The combination of basis functions is presented as a promising direction for future KAN research.