
FC-KAN: Function Combinations in Kolmogorov-Arnold Networks

Published 3 Sep 2024 in cs.LG and cs.CL | arXiv:2409.01763v3

Abstract: In this paper, we introduce FC-KAN, a Kolmogorov-Arnold Network (KAN) that leverages combinations of popular mathematical functions such as B-splines, wavelets, and radial basis functions on low-dimensional data through element-wise operations. We explore several methods for combining the outputs of these functions, including sum, element-wise product, the addition of sum and element-wise product, representations of quadratic and cubic functions, concatenation, linear transformation of the concatenated output, and others. In our experiments, we compare FC-KAN with a multi-layer perceptron network (MLP) and other existing KANs, such as BSRBF-KAN, EfficientKAN, FastKAN, and FasterKAN, on the MNIST and Fashion-MNIST datasets. Two variants of FC-KAN, which use a combination of outputs from B-splines and Difference of Gaussians (DoG) and from B-splines and linear transformations in the form of a quadratic function, outperformed overall other models on the average of 5 independent training runs. We expect that FC-KAN can leverage function combinations to design future KANs. Our repository is publicly available at: https://github.com/hoangthangta/FC_KAN.


Summary

  • The paper demonstrates that combining mathematical functions in Kolmogorov-Arnold Networks significantly improves performance on low-dimensional data.
  • It details a novel architecture employing element-wise summation, concatenation, and quadratic operations to optimize data representation.
  • Benchmarking on MNIST and Fashion-MNIST shows that FC-KAN outperforms conventional KANs and MLPs through diverse function combinations.

Introduction

FC-KAN leverages combinations of functions within the Kolmogorov-Arnold Network (KAN) framework to improve performance on low-dimensional data. By combining the outputs of B-splines, wavelets, and radial basis functions through element-wise operations, FC-KAN departs from conventional KANs that rely on a single function type. Comparative evaluations against MLPs and several existing KAN variants on MNIST and Fashion-MNIST demonstrate the effectiveness of this approach.

Kolmogorov-Arnold Representation in Neural Networks

Kolmogorov's representation theorem underpins the KAN architecture: any multivariate continuous function can be expressed as a superposition of continuous univariate functions and addition. KANs build on this result by replacing the fixed activation functions of MLPs with learnable univariate functions on the network's edges, and the general framework supports stacking these layers into deeper and wider networks to match more complex problem spaces (Figure 1).

Figure 1: Left: the structure of KAN(2,3,1). Right: a simulation of how ϕ_{1,1,1} is calculated.
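The theorem behind this construction can be stated compactly: any continuous function f on [0,1]^n can be written as

```latex
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
```

where the outer functions Φ_q and inner functions ϕ_{q,p} are continuous and univariate. KANs make both sets of univariate functions learnable, which is what the edge functions in Figure 1 depict.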

Function Combinations in FC-KAN

FC-KAN's core contribution is combining the outputs of multiple mathematical functions, such as B-splines and wavelets in the form of a Difference of Gaussians (DoG), on low-dimensional data. The architecture supports element-wise operations, concatenation, and a linear transformation of the combined output, allowing the data representation to be tailored to task-specific requirements (Figure 2).

Figure 2: The structure of FC-KAN and the three types of combined outputs: element-wise, concatenation, and linearization.
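A minimal sketch of this combination scheme, assuming simplified stand-ins for the basis functions (the actual FC-KAN implementation uses learnable PyTorch layers; `bspline_like` and the exact polynomial forms here are illustrative only, not the authors' code):

```python
import numpy as np

def bspline_like(x):
    # stand-in for a B-spline basis output (a simple hat function)
    return np.clip(1.0 - np.abs(x), 0.0, None)

def dog(x, s1=1.0, s2=2.0):
    # Difference of Gaussians: a narrow Gaussian minus a wider one
    return np.exp(-x**2 / (2 * s1**2)) - np.exp(-x**2 / (2 * s2**2))

def combine(a, b, mode="sum"):
    # element-wise combinations of two function outputs
    if mode == "sum":
        return a + b
    if mode == "product":
        return a * b
    if mode == "sum_product":
        return a + b + a * b
    if mode == "quadratic":
        # one possible quadratic form, (a + b)^2 expanded element-wise;
        # the paper's exact polynomial may differ
        return a**2 + 2 * a * b + b**2
    raise ValueError(f"unknown mode: {mode}")

x = np.linspace(-2.0, 2.0, 5)
out = combine(bspline_like(x), dog(x), mode="sum_product")
```

The key design point is that every mode operates element-wise on outputs of the same shape, so combinations add negligible parameters compared with concatenation followed by a learned linear map.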

Experimental Evaluation

In comprehensive benchmarking, FC-KAN demonstrated superior capability by outperforming both traditional KANs and MLPs. Comparisons conducted on the MNIST and Fashion-MNIST datasets underline the model's efficacy, with FC-KAN leveraging combined function outputs to yield higher validation accuracies. For instance, when combining DoG and B-splines with a quadratic function at the output, FC-KAN achieved notable accuracy gains (Figure 3).

Figure 3: The logarithmic values of training losses for the models over 25 epochs on MNIST and 35 epochs on Fashion-MNIST.

Combination Methodologies

The study investigated several output-combination techniques, including element-wise summation, element-wise products, and quadratic-function representations. Element-wise combinations performed best, suggesting that they capture more data features than the alternatives. The quadratic-function representation consistently yielded the highest validation accuracies, albeit at increased computational cost; cubic functions behaved similarly, but their accuracy gains plateaued relative to the quadratic form (Figure 4).

Figure 4: Various data combinations performed using element-wise operations (addition + and multiplication ⊙) over two given outputs.
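For the concatenation and linearization variants mentioned above, a hedged sketch (the random matrix `W` stands in for a learnable linear layer; names and shapes are illustrative, not taken from the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def combine_concat_linear(a, b, W=None):
    # concatenate two function outputs along the feature axis,
    # then project back to the original width with a linear map
    # (random here; learnable in a real network)
    z = np.concatenate([a, b], axis=-1)
    if W is None:
        W = rng.standard_normal((z.shape[-1], a.shape[-1]))
    return z @ W

# batch of 4 samples, 8 features per function output
a = rng.standard_normal((4, 8))
b = rng.standard_normal((4, 8))
out = combine_concat_linear(a, b)
```

Unlike the element-wise modes, this variant doubles the feature width before projecting back, so it trades extra parameters and compute for a learned mixing of the two function outputs.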

Empirical Insights and Performance Implications

Empirical analysis suggests that FC-KAN requires strategic selection of function combinations to maximize performance gains. Basic linear combinations serve as a baseline, whereas more complex operations like addition with products or higher-degree models provide nuanced improvements. Nevertheless, the scalability of these operations depends on data dimensionality, necessitating computational trade-offs (Figure 5).

Figure 5: The validation accuracy values of the models across various data subsets.

Conclusion

FC-KAN successfully advances the application of function combinations in KAN architectures, achieving significant performance improvements across standard image classification tasks. The research emphasizes the need for targeted combination strategies to harness KAN's theoretical properties effectively. Future research directions should consider dynamic adaptation techniques for function selection and combinations to further optimize neural architecture designs. By doing so, practical implementation of function combinations in neural network models can be further refined for broader applications.
