Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters (2403.11549v2)

Published 18 Mar 2024 in cs.CV

Abstract: Continual learning can empower vision-language models to continuously acquire new knowledge, without the need for access to the entire historical dataset. However, mitigating the performance degradation in large-scale models is non-trivial due to (i) parameter shifts throughout lifelong learning and (ii) significant computational burdens associated with full-model tuning. In this work, we present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models. Our approach involves the dynamic expansion of a pre-trained CLIP model, through the integration of Mixture-of-Experts (MoE) adapters in response to new tasks. To preserve the zero-shot recognition capability of vision-language models, we further introduce a Distribution Discriminative Auto-Selector (DDAS) that automatically routes in-distribution and out-of-distribution inputs to the MoE Adapter and the original CLIP, respectively. Through extensive experiments across various settings, our proposed method consistently outperforms previous state-of-the-art approaches while concurrently reducing parameter training burdens by 60%. Our code is available at https://github.com/JiazuoYu/MoE-Adapters4CL

Summary of "Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters"

The paper "Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters" presents a parameter-efficient framework specifically designed to enhance the continual learning capabilities of large-scale vision-language models such as CLIP. It targets the long-term forgetting and computational burdens typically associated with continual learning (CL) systems.

The authors introduce a novel architecture featuring Mixture-of-Experts (MoE) adapters, which allow the model to expand dynamically for new tasks while preserving previously learned knowledge, so that it adapts efficiently to both seen and unseen data. A key component of this architecture is the Distribution Discriminative Auto-Selector (DDAS), which performs automatic task recognition and preserves zero-shot capabilities by routing in-distribution inputs to the MoE adapters and out-of-distribution inputs to the original, frozen CLIP model.
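
To make the routing concrete, the following is a minimal, hypothetical sketch of DDAS-style dispatch rather than the authors' implementation. It assumes one in-distribution scorer per learned task (the scoring mechanism itself is an assumption here), one MoE-adapted prediction branch per task, the original zero-shot CLIP head as a fallback, and a hand-chosen threshold.

```python
import torch

def ddas_route(image_feat, task_scorers, moe_branches, frozen_clip_head, threshold):
    """Hypothetical DDAS-style dispatch; names and interface are illustrative.

    task_scorers     -- one callable per learned task, returning a scalar
                        in-distribution score for the input feature
    moe_branches     -- one MoE-adapted prediction branch per learned task
    frozen_clip_head -- the original zero-shot CLIP classifier (fallback)
    threshold        -- scores below this are treated as out-of-distribution
    """
    scores = torch.stack([scorer(image_feat) for scorer in task_scorers])
    best = int(torch.argmax(scores))
    if scores[best] < threshold:
        # No learned task matches the input distribution:
        # keep CLIP's original zero-shot prediction.
        return frozen_clip_head(image_feat)
    # In-distribution: use the fine-tuned MoE branch of the selected task.
    return moe_branches[best](image_feat)
```

Because the fallback path is the untouched CLIP model, routing out-of-distribution inputs this way is what preserves zero-shot behaviour over the course of continual training.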

Key Contributions

  1. Parameter-Efficient MoE-Adapters: By leveraging MoE structures with dynamic router mechanisms, the authors propose a training framework that reduces the parameter-training burden by 60% compared with existing state-of-the-art methods. Training follows a novel activate-freeze strategy in which specific experts are activated based on task-related features, supporting intra-task learning while enabling inter-task knowledge sharing among experts (a minimal sketch of such an adapter layer follows this list).
  2. Distribution Discriminative Auto-Selector (DDAS): The authors propose DDAS to automatically determine the task identity by predicting data distribution variations. This mechanism ensures effective routing of input data, either to exploit the fine-tuned expertise encapsulated in MoE adapters or to retain the zero-shot generalization abilities of the frozen CLIP model.
  3. Extensive Evaluation: Empirical results indicate that the proposed method consistently surpasses prior state-of-the-art solutions across multiple continual learning benchmarks. Notably, the approach remains robust under few-shot settings, significantly improving retention of past tasks while maintaining compelling zero-shot performance.

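As a rough illustration of the first contribution, below is a minimal PyTorch-style sketch of an MoE adapter layer: a handful of low-rank adapter experts gated by a learned router and added residually on top of a frozen backbone feature. The expert count, rank, top-k routing, and feature width are illustrative assumptions, not the paper's configuration; an activate-freeze schedule would additionally freeze experts favoured by earlier tasks and train only the newly activated ones.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEAdapter(nn.Module):
    """Sketch of a Mixture-of-Experts adapter: low-rank experts plus a router.

    Hyperparameters here are placeholders, not the paper's settings.
    """

    def __init__(self, dim, num_experts=4, rank=16, top_k=2):
        super().__init__()
        # Each expert is a small bottleneck adapter (down-project, GELU, up-project).
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, rank), nn.GELU(), nn.Linear(rank, dim))
            for _ in range(num_experts)
        ])
        self.router = nn.Linear(dim, num_experts)  # per-expert gating logits
        self.top_k = top_k

    def forward(self, x):
        # x: (batch, dim) features from a frozen CLIP layer.
        gate_logits = self.router(x)                         # (batch, num_experts)
        weights, idx = gate_logits.topk(self.top_k, dim=-1)  # sparse expert choice
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return x + out  # residual: frozen feature plus the adapter update

# Example with an assumed feature width:
# adapter = MoEAdapter(dim=512); y = adapter(torch.randn(8, 512))
```
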
Implications and Future Prospects

This paper contributes significantly to the field of continual learning by demonstrating that appropriate architectural modifications can substantially enhance parameter efficiency and task adaptability in large-scale models. The incorporation of MoE-based dynamic adjustment strategies showcases a promising direction towards addressing both the catastrophic forgetting and generalization challenges inherent to lifelong learning settings.

From a theoretical perspective, this work exemplifies the potential of leveraging sparse expert models for balancing task-specific and generalizable knowledge in neural architectures. Practically, such advancements could be vital in deploying AI models across dynamic and resource-constrained environments, where adaptive learning and fast scalability are critical.

Looking forward, potential extensions of this work include investigating the impact of varying the number of experts, exploring alternative selection strategies in the MoE frameworks, and refining the automatic data distribution mechanisms to further streamline task recognition without manual threshold adjustments or identity references. Additionally, the approach's application to domains beyond vision-language tasks, such as NLP and reinforcement learning, could be explored to evaluate its adaptability and effectiveness in different data paradigms.

The framework presented in this paper creates pathways for efficient lifelong learning systems, promoting a shift towards more robust and scalable AI applications capable of seamlessly integrating new information over extended operational lifespans.

Authors (7)
  1. Jiazuo Yu
  2. Yunzhi Zhuge
  3. Lu Zhang
  4. Dong Wang
  5. Huchuan Lu
  6. You He
  7. Ping Hu
Citations (39)