
Activation Space Selectable Kolmogorov-Arnold Networks

Published 15 Aug 2024 in cs.LG | (2408.08338v1)

Abstract: The multilayer perceptron (MLP), a fundamental paradigm in current artificial intelligence, is widely applied in fields such as computer vision and natural language processing. However, the recently proposed Kolmogorov-Arnold Network (KAN), based on nonlinear additive connections, has been proven to achieve performance comparable to MLPs with significantly fewer parameters. Despite this potential, the use of a single activation function space results in reduced performance of KAN and related works across different tasks. To address this issue, we propose an activation space Selectable KAN (S-KAN). S-KAN employs an adaptive strategy to choose the possible activation mode for data at each feedforward KAN node. Our approach outperforms baseline methods in seven representative function fitting tasks and significantly surpasses MLP methods with the same level of parameters. Furthermore, we extend the structure of S-KAN and propose an activation space selectable Convolutional KAN (S-ConvKAN), which achieves leading results on four general image classification datasets. Our method mitigates the performance variability of the original KAN across different tasks and demonstrates through extensive experiments that feedforward KANs with selectable activations can achieve or even exceed the performance of MLP-based methods. This work contributes to the understanding of the data-centric design of new AI paradigms and provides a foundational reference for innovations in KAN-based network architectures.

Citations (4)

Summary

  • The paper introduces S-KANs to adaptively select activation functions, thereby enhancing performance in both function fitting and image classification tasks.
  • It employs a three-phase training strategy—full training, selective training, and pruning—to optimize activation function weights and reduce computational complexity.
  • S-ConvKAN extends the approach to CNNs by integrating dynamic activation selection for robust image feature extraction, achieving lower error rates on benchmark datasets.

Introduction

Kolmogorov-Arnold Networks (KANs) represent a compelling alternative to traditional neural architectures, specifically Multilayer Perceptrons (MLPs), by leveraging the nonlinear additive connections suggested by the Kolmogorov-Arnold representation theorem. The framework posits that KANs can match or exceed MLP performance with fewer parameters, addressing constraints of computational efficiency and complexity in artificial intelligence systems. However, relying on a single activation function space limits the adaptability of KANs across diverse tasks. This paper introduces Activation Space Selectable KANs (S-KANs), which provide an adaptive mechanism for activation function selection, thereby reducing performance variability across tasks and applications.
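To make the additive-connection principle concrete: in a KAN layer, each edge carries its own learnable univariate function, and each output is the sum of those functions applied to the inputs. The following is a minimal illustrative sketch, using a simple polynomial basis in place of the richer bases (such as B-splines) used in actual KANs; all names are ours, not the paper's.

```python
import numpy as np

class ToyKANLayer:
    """Minimal KAN-style layer: each edge (i, j) applies its own learnable
    univariate function phi_ij, and each output j sums phi_ij(x_i) over i.
    Here phi_ij is a cubic polynomial for simplicity; real KANs use
    richer bases such as B-splines."""

    def __init__(self, in_dim, out_dim, degree=3, seed=0):
        rng = np.random.default_rng(seed)
        # One coefficient vector per edge: shape (out_dim, in_dim, degree + 1)
        self.coeffs = rng.normal(0.0, 0.1, (out_dim, in_dim, degree + 1))
        self.degree = degree

    def __call__(self, x):
        # x: (batch, in_dim) -> powers: (batch, in_dim, degree + 1)
        powers = np.stack([x**k for k in range(self.degree + 1)], axis=-1)
        # Sum phi_ij(x_i) over input edges i (and basis index p)
        return np.einsum("bip,jip->bj", powers, self.coeffs)

layer = ToyKANLayer(in_dim=2, out_dim=3)
y = layer(np.array([[0.5, -1.0]]))
print(y.shape)  # (1, 3)
```

Contrast this with an MLP layer, where the edge operation is a fixed linear weight and the nonlinearity is applied only after the sum.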

Methodology

Selectable Activation Spaces

The proposed S-KAN architecture extends the traditional KAN design with a selectable pool of activation functions. This pool, containing nonlinear function-fitting bases such as B-splines, Chebyshev polynomials, and wavelet transforms, enables a dynamic selection process at each node of the network. Selection follows an adaptive strategy, governed by weighted combinations and pruning, to identify the activation functions that minimize fitting error and improve generalization.
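One way to read the weighted-combination step is as a softmax-gated mixture over a candidate pool, with the gate weights learned alongside the network. This is a sketch under that assumption; the candidate functions and gating details here are illustrative stand-ins, not the paper's exact formulation.

```python
import numpy as np

def chebyshev(x, order=4):
    # Sum of Chebyshev polynomials T_0..T_order as one candidate basis.
    x = np.clip(x, -1.0, 1.0)
    return sum(np.cos(k * np.arccos(x)) for k in range(order + 1))

def morlet_wavelet(x):
    # A simple wavelet-style candidate.
    return np.cos(5 * x) * np.exp(-x**2 / 2)

# Illustrative activation pool (stand-ins for the paper's B-spline /
# Chebyshev / wavelet spaces).
CANDIDATES = [chebyshev, morlet_wavelet, np.tanh]

def selectable_activation(x, logits):
    """Weighted combination of candidate activations for one node.
    logits: learnable selection scores, shape (len(CANDIDATES),)."""
    weights = np.exp(logits) / np.exp(logits).sum()  # softmax gate
    return sum(w * f(x) for w, f in zip(weights, CANDIDATES))

x = np.linspace(-1.0, 1.0, 5)
out = selectable_activation(x, logits=np.array([2.0, 0.0, -1.0]))
print(out.shape)  # (5,)
```

Because the gate is differentiable, the selection scores can be trained by ordinary backpropagation before any hard pruning takes place.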

S-KAN Training Strategy

Training S-KAN follows a three-phase methodology: full training, selective training, and pruning. First, all model parameters are trained to establish a baseline. Selective training then refines the activation-selection weights, prioritizing the candidates most beneficial for task-specific accuracy. The final pruning phase eliminates underperforming activation functions, consolidating the network into a streamlined structure optimized for computational efficiency and fitting precision.
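The pruning phase can be sketched as collapsing each node's pool to its highest-weighted candidate once selective training has converged. A minimal illustration, assuming softmax-style selection scores per node (the exact pruning criterion in the paper may differ):

```python
import numpy as np

def prune_to_best(node_logits):
    """Pruning phase (sketch): for each node, keep only the candidate
    activation with the largest selection weight and discard the rest."""
    weights = np.exp(node_logits) / np.exp(node_logits).sum(axis=-1, keepdims=True)
    return weights.argmax(axis=-1)  # index of the surviving candidate per node

# Selection scores for 2 nodes over a pool of 3 candidates (illustrative)
node_logits = np.array([[1.2, -0.3, 0.1],
                        [-0.5, 2.0, 0.0]])
kept = prune_to_best(node_logits)
print(kept)  # [0 1]
```

After pruning, each node evaluates a single activation function at inference time, so the mixture's extra compute cost is paid only during training.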

S-ConvKAN for Image Classification

Building upon the advantages of S-KAN, the paper proposes an extension to convolutional architectures, termed S-ConvKAN. This structure applies selectable activation functions within convolutional filters, enabling robust image feature extraction beyond what fixed-activation CNN layers provide. By replacing the linear multiply-accumulate of standard convolution with nonlinear mappings drawn from selectable activation spaces, S-ConvKAN aims to outperform existing models on general image classification tasks.
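The core idea of a KAN-style convolution can be sketched as follows: each kernel position applies its own univariate function to the input pixel, and the results are summed, replacing the usual weight multiplication. This toy version uses fixed stand-in functions where S-ConvKAN would learn and select them; it is an illustration of the principle, not the paper's implementation.

```python
import numpy as np

def kan_conv2d(image, phi_grid):
    """Toy KAN-style 2D convolution (valid padding, stride 1): each kernel
    position (u, v) applies its own univariate function phi_grid[u][v] to
    the overlapped pixels, and the results are summed -- replacing the
    multiply-by-weight of a standard convolution with a nonlinear mapping."""
    kh, kw = len(phi_grid), len(phi_grid[0])
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for u in range(kh):
        for v in range(kw):
            patch = image[u:u + out.shape[0], v:v + out.shape[1]]
            out += phi_grid[u][v](patch)
    return out

# Illustrative 3x3 grid of per-position activations (stand-ins for the
# selected functions an S-ConvKAN would learn).
phi = [[np.tanh, np.sin, np.tanh],
       [np.sin, np.abs, np.sin],
       [np.tanh, np.sin, np.tanh]]
img = np.arange(25, dtype=float).reshape(5, 5) / 25.0
feat = kan_conv2d(img, phi)
print(feat.shape)  # (3, 3)
```

In S-ConvKAN, each of these per-position functions would itself be chosen from the selectable activation pool via the same weighted-combination-and-pruning mechanism described above.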

Experimental Results

Function Fitting

S-KAN exhibits significant improvement over existing KAN and MLP architectures in function fitting tasks, evidenced by the lower mean squared error in several benchmark scenarios. The evaluation spans a range of intricate fitting functions, underscoring S-KAN's capability for robust nonlinear adaptability and superior precision.

Image Classification

In computer vision applications, S-ConvKAN surpasses CNN architectures of comparable size on datasets including MNIST, Fashion MNIST, CIFAR-10, and CIFAR-100. The adaptive selection mechanism within S-ConvKAN yields strong feature representations and classification performance, demonstrating its applicability as a potent feature extraction paradigm across varied datasets.

Conclusion

Activation Space Selectable Kolmogorov-Arnold Networks offer a compelling advance in the design and deployment of AI systems, pivoting from fixed activation patterns to dynamically adaptable strategies. The paper illustrates the versatility and efficiency gains attainable through selective activation schemes, with enhanced performance across both function fitting and image classification. Future work will concentrate on further optimizing computational efficiency, expanding selection mechanisms to hidden layers, and exploring parallel processing to accelerate KAN-based designs.
