- The paper introduces an adapter merging strategy using EMA to mitigate parameter-level forgetting in incremental tasks.
- The paper proposes a training-free self-refined adapter retrieval mechanism that efficiently corrects retrieval errors during inference.
- The paper presents a two-stage ensemble approach that balances stability and plasticity, achieving superior performance on multiple benchmark datasets.
Overview of "MOS: Model Surgery for Pre-Trained Model-Based Class-Incremental Learning"
The paper "MOS: Model Surgery for Pre-Trained Model-Based Class-Incremental Learning" presents an innovative approach to address the persistent challenge of catastrophic forgetting in Class-Incremental Learning (CIL). In CIL, models are required to learn new classes incrementally while retaining the knowledge of previously learned classes. This research leverages pre-trained models (PTMs) and proposes a technique called MOdel Surgery (MOS) to enhance the model's ability to resist forgetting, a phenomenon that occurs at both parameter and retrieval levels.
Key Contributions
The authors make three main contributions:
- Adapter Merging Strategy: To mitigate parameter-level forgetting, the paper introduces an adapter merging strategy based on an Exponential Moving Average (EMA). The merge retains task-specific information while keeping adapters from different tasks sufficiently aligned: previously learned task-specific adapters are updated with knowledge from the current task, which bridges the gap between tasks and limits parameter drift (a minimal EMA-merge sketch follows this list).
- Self-Refined Adapter Retrieval Mechanism: To address retrieval-level forgetting, the paper proposes a training-free, self-refined adapter retrieval mechanism. Without any additional training, the model refines and corrects its adapter choice at inference time, mitigating the performance decay caused by retrieving an irrelevant module for a given input (a retrieval sketch also follows the list).
- Two-Stage Model Ensemble: To balance stability and plasticity, MOS employs a two-stage model ensemble that combines the rapid, general pattern recognition obtained from the initial incremental task with the deeper, task-specific processing of the progressively merged adapters. The ensemble keeps the model's generalization strong while leaving it adaptable to new information (a generic ensemble sketch follows the list as well).
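To make the EMA merge concrete, here is a minimal sketch rather than the authors' released code. It assumes each task's adapter is stored as a PyTorch state dict and that the merge is a per-parameter exponential moving average; the coefficient name `alpha` and its value are assumptions for illustration.

```python
# Illustrative sketch of EMA-style adapter merging (not the paper's exact code).
# `old_adapter_state` holds the adapter parameters accumulated up to the previous
# task; `new_adapter_state` holds the adapter trained on the current task.

import torch


def ema_merge(old_adapter_state: dict, new_adapter_state: dict, alpha: float = 0.9) -> dict:
    """Merge adapter parameters with an exponential moving average.

    merged = alpha * old + (1 - alpha) * new, applied per tensor, so the merged
    adapter keeps most of the previously accumulated knowledge while absorbing
    a fraction of the current task's update.
    """
    merged = {}
    for name, old_param in old_adapter_state.items():
        new_param = new_adapter_state[name]
        merged[name] = alpha * old_param + (1.0 - alpha) * new_param
    return merged


# Toy usage: two adapters with a single (hypothetical) weight tensor each.
old_state = {"down_proj.weight": torch.ones(4, 8)}
new_state = {"down_proj.weight": torch.zeros(4, 8)}
print(ema_merge(old_state, new_state, alpha=0.9)["down_proj.weight"][0, 0])  # tensor(0.9000)
```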
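The summary does not spell out the retrieval loop, so the sketch below is one plausible reading of a training-free, self-refined retrieval: the model starts from a default adapter, maps its predicted class back to the task that introduced that class, re-runs inference with that task's adapter, and stops once the choice is self-consistent. The helpers `predict_with_adapter` and `task_of_class` are hypothetical.

```python
# Hypothetical sketch of a self-refined adapter retrieval loop at inference time.
# `predict_with_adapter(x, task_id)` and `task_of_class(class_id)` are assumed
# helpers, not part of the paper's released code.

def self_refined_retrieval(x, predict_with_adapter, task_of_class,
                           initial_task_id: int = 0, max_rounds: int = 3) -> int:
    """Iteratively correct the adapter choice using the model's own prediction.

    Start from a default adapter, map the predicted class back to the task that
    introduced it, and re-run inference with that task's adapter until the
    retrieved adapter no longer changes (or a round budget is exhausted).
    """
    task_id = initial_task_id
    for _ in range(max_rounds):
        predicted_class = predict_with_adapter(x, task_id)
        refined_task_id = task_of_class(predicted_class)
        if refined_task_id == task_id:  # retrieval is self-consistent; stop
            break
        task_id = refined_task_id
    return task_id


# Toy usage: 2 tasks with 5 classes each; a dummy predictor that always outputs class 7.
chosen = self_refined_retrieval(
    x=None,
    predict_with_adapter=lambda x, task_id: 7,
    task_of_class=lambda class_id: class_id // 5,
)
print(chosen)  # 1, because class 7 belongs to the second task
```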
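Finally, the two-stage ensemble can be pictured as a weighted blend of the two predictors' outputs. The sketch below is a generic illustration under that assumption; the fixed blend weight and softmax-space averaging are not claimed to be the paper's exact formulation.

```python
import torch


def two_stage_ensemble(logits_first_stage: torch.Tensor,
                       logits_merged_adapters: torch.Tensor,
                       weight: float = 0.5) -> torch.Tensor:
    """Blend the two stages' class probabilities with a fixed weight.

    `logits_first_stage` stands for the model adapted on the initial incremental
    task (stable, general features); `logits_merged_adapters` stands for the
    progressively EMA-merged adapters (plastic, task-specific features).
    """
    probs_a = torch.softmax(logits_first_stage, dim=-1)
    probs_b = torch.softmax(logits_merged_adapters, dim=-1)
    return weight * probs_a + (1.0 - weight) * probs_b
```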
Experimental Validation
The paper validates MOS on seven benchmark datasets with several pre-trained backbone weights, showing that it consistently outperforms existing state-of-the-art CIL methods. The reported results show clear gains in both average incremental accuracy and final accuracy, indicating that the approach is robust across a variety of challenging scenarios.
Implications and Future Directions
The implications of this research are manifold. Practically, MOS offers a scalable and efficient solution for deploying robust CIL systems using pre-trained models, which are increasingly common in many artificial intelligence applications. Theoretically, this work adds to the growing discourse on balancing generalization and adaptation in machine learning systems, illuminating pathways for future research in domain-specific adapter design and retrieval mechanisms.
Looking ahead, the techniques introduced in this paper may inspire further explorations into few-shot class-incremental learning and its applicability to more diverse domains. There's also potential for developing more sophisticated adapter algorithms that could exploit additional contextual information from PTMs, further fine-tuning the model's ability to adapt to rapidly changing data distributions.
In summary, the MOS framework provides a promising direction for enhancing the performance of PTM-based CIL systems. Through the careful orchestration of adapter merging and retrieval, this research navigates the challenges of CIL by maintaining a fine balance between retaining old knowledge and acquiring new insights.