Low Tensor-Rank Adaptation of Kolmogorov--Arnold Networks

Published 10 Feb 2025 in cs.LG and cs.AI | (2502.06153v2)

Abstract: Kolmogorov--Arnold networks (KANs) have demonstrated their potential as an alternative to multi-layer perceptions (MLPs) in various domains, especially for science-related tasks. However, transfer learning of KANs remains a relatively unexplored area. In this paper, inspired by Tucker decomposition of tensors and evidence on the low tensor-rank structure in KAN parameter updates, we develop low tensor-rank adaptation (LoTRA) for fine-tuning KANs. We study the expressiveness of LoTRA based on Tucker decomposition approximations. Furthermore, we provide a theoretical analysis to select the learning rates for each LoTRA component to enable efficient training. Our analysis also shows that using identical learning rates across all components leads to inefficient training, highlighting the need for an adaptive learning rate strategy. Beyond theoretical insights, we explore the application of LoTRA for efficiently solving various partial differential equations (PDEs) by fine-tuning KANs. Additionally, we propose Slim KANs that incorporate the inherent low-tensor-rank properties of KAN parameter tensors to reduce model size while maintaining superior performance. Experimental results validate the efficacy of the proposed learning rate selection strategy and demonstrate the effectiveness of LoTRA for transfer learning of KANs in solving PDEs. Further evaluations on Slim KANs for function representation and image classification tasks highlight the expressiveness of LoTRA and the potential for parameter reduction through low tensor-rank decomposition.

Abstract PDF Upgrade to Chat

Summary

The paper introduces LoTRA, a method leveraging Tucker decomposition to adapt KANs efficiently by reducing parameters while maintaining model performance.
It demonstrates that low tensor-rank structures enable faster convergence and improved efficiency in physics-informed machine learning tasks, especially in solving PDEs.
Experimental findings reveal that slim KANs with LoTRA achieve significant compression ratios and enhanced generalization for function approximation and image classification.

Low Tensor-Rank Adaptation of Kolmogorov--Arnold Networks

Introduction

"Low Tensor-Rank Adaptation of Kolmogorov--Arnold Networks" presents a novel approach for fine-tuning Kolmogorov--Arnold Networks (KANs) by leveraging low tensor-rank adaptation (LoTRA). Inspired by Tucker decomposition and the observed low tensor-rank structures in KAN parameter updates, LoTRA aims to enhance the efficiency of training and reduce storage requirements while maintaining model expressiveness.

KANs have emerged as effective alternatives to traditional MLPs in diverse scientific domains, especially in physics-informed machine learning. They are based on the Kolmogorov--Arnold representation theorem, allowing representation of multivariate continuous functions through sums of univariate functions. The architecture of KANs surpasses traditional limitations of KART-based networks by offering deeper layers and arbitrary widths, enabling improved function approximation capabilities.

Despite their inherent advantages, transferring learning for KANs remains largely unexplored. This paper addresses this gap by introducing LoTRA to fine-tune KANs efficiently, demonstrating robust performance in solving partial differential equations (PDEs) and reducing model size through Slim KANs.

Low Tensor-Rank Adaptation

LoTRA leverages Tucker decomposition, a powerful technique for compressing third-order tensors into smaller core tensors augmented with transformation matrices. The method adapts KAN parameters through efficient updates, reducing computational load and storage requirements. The adaptation formula in LoTRA is inspired by evidence on low tensor-rank structures within KANs, providing for each KAN parameter tensor:

$\mathcal{A}_{\text{ft}} = \mathcal{A}_{\text{pt}} + \mathcal{G} \times_{1} U^{(1)} \times_{2} U^{(2)} \times_{3} U^{(3)}$

Applying LoTRA in physics-informed machine learning, the approach yields efficient training and storage across a class of PDEs with different parameters. By retaining shared information across tasks, LoTRA achieves faster convergence during fine-tuning, eliminating redundant learning.

Figure 1: Singular values for the pre-trained model parameters and fine-tuned updates, demonstrating the low Tucker rank structure.

Theoretical Insights

The paper provides a theoretical analysis of LoTRA's expressiveness and efficient training capabilities. Based on Tucker decomposition, the fine-tuned model’s ability to approximate target functions depends on the magnitude of omitted singular values in error tensors.

The theoretical framework proposes an optimal learning rate selection strategy for LoTRA, ensuring efficient gradient descent training. Using identical learning rate scales for all components leads to inefficiency, necessitating differentiated rates based on tensor ranks and component attributes.

Figure 2: Fine-tuning trajectories of Chebyshev KANs using LoTRA under four strategies of learning rate selection (denoted as LR-1" toLR-4") for solving elliptic equations.

Applications

Physics-Informed KANs

KANs excel in physics-informed machine learning, particularly in approximating PDE solutions. With LoTRA, KANs can efficiently handle variations in PDEs with different physical parameters by retaining shared patterns and adapting to task-specific information using low tensor-rank structures.

Experiments illustrate that KANs with LoTRA consistently match or surpass the performance of full fine-tuning models, demonstrating faster convergence and effective storage reduction through compression ratios.

Slim KANs

Slim KANs harness LoTRA’s low tensor-rank properties to achieve parameter-efficient models without requiring a pre-trained basis. The approach retains model expressiveness and mitigates overfitting through built-in regularization, yielding competitive performance in function representation and image classification tasks.

Figure 3: Visualization of the generated function by slim KAN models with different compression ratios for representing a trigonometric function. The figures illustrate that models with a larger compression ratio tend to overfit the noise, producing less smooth results. In contrast, models with smaller compression ratios generate smoother functions, reflecting better generalization.

Conclusion

LoTRA offers a transformative approach to fine-tuning KANs, enabling efficient training and significant parameter reduction while maintaining high performances across PDE solving tasks. The theoretical insights, coupled with robust experimental validations, position LoTRA as a promising method for scalable KAN applications in physics-informed machine learning and beyond. Future work may extend LoTRA’s theoretical foundations and explore integration into complex deep learning architectures for broader applications.