Continual Low-Rank Adaptation for Rehearsal-Free Class-Incremental Learning
The paper "CL-LoRA: Continual Low-Rank Adaptation for Rehearsal-Free Class-Incremental Learning" introduces a novel approach to tackle Class-Incremental Learning (CIL) with pre-trained models. The authors propose a dual-adapter architecture named CL-LoRA, integrating the strengths of both task-shared and task-specific adapters. This paper concerns itself with the challenges presented by CIL—specifically, catastrophic forgetting—and aims to provide solutions within the framework of parameter-efficient tuning.
Problem Statement
Class-Incremental Learning, in which new classes are learned sequentially, poses a significant challenge: maintaining performance on previously learned classes. Traditional methods often rely on storing past data for rehearsal, which is impractical due to privacy and storage constraints. Pre-trained models combined with Parameter-Efficient Fine-Tuning (PEFT) have shown promise, but existing adapter-based approaches suffer from parameter redundancy and limited knowledge sharing across tasks.
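As background, the standard LoRA formulation (not specific to this paper) freezes a pre-trained weight matrix $W_0$ and learns only a low-rank update,

$$
W = W_0 + BA, \qquad B \in \mathbb{R}^{d \times r},\ A \in \mathbb{R}^{r \times k},\ r \ll \min(d, k),
$$

so each task adds only the small factors $A$ and $B$. Accumulating a new adapter pair per task, as adapter-based CIL methods typically do, is what leads to the parameter redundancy mentioned above.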
Contributions
The paper introduces a Continual Low-Rank Adaptation (CL-LoRA) framework with the following contributions:
- Dual-Adapter Architecture: A combination of task-shared adapters that retain cross-task knowledge and task-specific adapters that learn features unique to each new task. The shared adapters use random orthogonal matrices, while block-wise weights with orthogonal constraints are introduced for the task-specific adapters to mitigate inter-task interference (see the architecture sketch after this list).
- Knowledge Distillation and Gradient Reassignment: Knowledge distillation is applied through an early-exit strategy at the transition point between the shared and task-specific adapters. In addition, gradient reassignment based on the L2 norm of the weight vectors of the task-shared adapters promotes reliable knowledge transfer (see the loss sketch after this list).
- Efficient and Scalable Approach: The design keeps both training and inference computation low, establishing a more efficient paradigm for continual learning with pre-trained models and addressing the limitations of existing adapter-based methods.
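To make the dual-adapter idea concrete, below is a minimal PyTorch sketch, not the authors' code: the class name `DualLoRALinear` and its details are illustrative. It assumes the random orthogonal matrix serves as the frozen down-projection of the task-shared adapter, that its up-projection is learnable, and that one task-specific low-rank pair is appended per task while earlier pairs are frozen.

```python
import torch
import torch.nn as nn


class DualLoRALinear(nn.Module):
    """Frozen pre-trained linear layer + task-shared and task-specific LoRA branches (sketch)."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pre-trained weights stay frozen

        d_in, d_out = base.in_features, base.out_features

        # Task-shared adapter: frozen random orthogonal down-projection, learnable up-projection.
        shared_A = torch.linalg.qr(torch.randn(d_in, rank))[0]  # orthonormal columns
        self.register_buffer("shared_A", shared_A)
        self.shared_B = nn.Parameter(torch.zeros(rank, d_out))

        # Task-specific adapters: one (A_t, B_t) pair appended per task.
        self.task_A = nn.ParameterList()
        self.task_B = nn.ParameterList()

    def add_task(self):
        d_in, d_out = self.base.in_features, self.base.out_features
        rank = self.shared_B.shape[0]
        self.task_A.append(nn.Parameter(torch.randn(d_in, rank) * 0.01))
        self.task_B.append(nn.Parameter(torch.zeros(rank, d_out)))
        # Only the newest task-specific pair is trainable; earlier ones are frozen.
        for A, B in zip(self.task_A[:-1], self.task_B[:-1]):
            A.requires_grad = False
            B.requires_grad = False

    def forward(self, x):
        out = self.base(x) + (x @ self.shared_A) @ self.shared_B
        for A, B in zip(self.task_A, self.task_B):
            out = out + (x @ A) @ B
        return out
```

In practice such a module would replace selected projections inside the transformer blocks, with shared adapters in some blocks and task-specific adapters in others, per the split described above.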
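The distillation and constraint terms can likewise be pictured with a short loss sketch. This is a hedged illustration, assuming (i) the "early exit" distills hidden states at the block where the shared adapters end, against a frozen copy of the previous model, and (ii) the orthogonal constraint penalizes overlap between the current and previous task-specific down-projections; the exact losses, the block-wise weighting, and the gradient-reassignment rule in the paper may differ.

```python
import torch
import torch.nn.functional as F


def early_exit_distillation(features_now, features_prev, split_idx):
    """Distill the hidden states at the block where the shared adapters end.

    features_now / features_prev: lists of per-block hidden states from the
    current model and from a frozen copy trained on previous tasks (hypothetical API).
    """
    return F.mse_loss(features_now[split_idx], features_prev[split_idx].detach())


def orthogonality_penalty(current_A, previous_As):
    """Discourage the new task-specific adapter from reusing directions
    already occupied by earlier tasks (squared Frobenius norm of cross-products)."""
    penalty = torch.zeros((), device=current_A.device)
    for A in previous_As:
        penalty = penalty + (current_A.T @ A.detach()).pow(2).sum()
    return penalty


# Illustrative total objective (weights alpha, beta are hypothetical):
# total_loss = ce_loss \
#            + alpha * early_exit_distillation(h_cur, h_prev, split_idx) \
#            + beta * orthogonality_penalty(A_new, old_As)
```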
Results and Analysis
Experiments were conducted on multiple benchmarks, including CIFAR-100, ImageNet-R, ImageNet-A, and VTAB. CL-LoRA achieves high final and average accuracy while significantly reducing the number of trainable parameters compared to existing state-of-the-art methods. On the more challenging ImageNet-R and ImageNet-A benchmarks in particular, CL-LoRA shows robust improvements, indicating that it handles large distribution shifts effectively.
Implications and Future Directions
The proposed CL-LoRA framework delivers practical improvements in rehearsal-free continual learning and highlights the importance of combining shared knowledge retention with task-specific adaptation. The findings suggest that leveraging shared knowledge can reduce reliance on task identity during inference, potentially paving the way for advances in online learning and learning with blurry task boundaries.
Future work could explore other LoRA configurations, such as adapting different components of the MHSA layers and further optimizing the placement of shared adapters within the transformer blocks. Developing methods that dynamically adjust the split between shared and task-specific adapters based on task complexity and data characteristics also remains an open challenge. The paper lays a valuable foundation for these investigations and points toward scalable, efficient continual learning with pre-trained models.