Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models (2402.12048v1)

Published 19 Feb 2024 in cs.CL

Abstract: Catastrophic forgetting emerges as a critical challenge when fine-tuning multi-modal LLMs (MLLMs), where improving performance on unseen tasks often leads to a significant performance drop on the original tasks. This paper presents a comprehensive analysis of catastrophic forgetting in MLLMs and introduces a post-training adjustment method called Model Tailor. Our method primarily preserves the pre-trained parameters while replacing a small number (≤ 10%) of fine-tuned parameters, maintaining ~99% effectiveness on original tasks versus pre-training, and achieving ~97% on new tasks compared to standard fine-tuning. Specifically, we derive a sparse mask to identify the "model patch", based on a fusion strategy that integrates salience and sensitivity analysis. Subsequently, a compensation mechanism is introduced to "decorate the patch", enhancing the model's performance on both target and original tasks. Additionally, our method is adaptable to multi-task scenarios. Through extensive experiments on InstructBLIP and LLaVA-1.5 in both image captioning and visual question answering tasks, our approach demonstrates significant task adaptability while preserving inherent pre-trained capabilities.
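The sketch below illustrates the "model patch" idea described in the abstract: keep the pre-trained weights everywhere except a small (≤ 10%) set of fine-tuned parameters selected by a per-parameter importance score. It is not the authors' implementation; the scoring function (absolute weight change) is an assumed stand-in for the paper's fused salience-and-sensitivity ranking, and the compensation step is omitted.

```python
# Minimal sketch of splicing a sparse "model patch" of fine-tuned parameters
# into an otherwise pre-trained model. Assumptions: PyTorch state dicts with
# matching keys, and |w_ft - w_pre| as a proxy importance score.
import torch

def apply_model_patch(pretrained: dict, finetuned: dict, patch_ratio: float = 0.10) -> dict:
    """Return a state dict that keeps pre-trained weights except where the
    per-parameter score falls in the top `patch_ratio` fraction."""
    patched = {}
    for name, w_pre in pretrained.items():
        w_ft = finetuned[name]
        score = (w_ft - w_pre).abs()                      # assumed proxy for salience/sensitivity
        k = max(1, int(patch_ratio * score.numel()))
        threshold = torch.topk(score.flatten(), k).values.min()
        mask = score >= threshold                         # sparse mask = the "model patch"
        patched[name] = torch.where(mask, w_ft, w_pre)    # fine-tuned inside the patch, pre-trained elsewhere
    return patched

# Usage (hypothetical models sharing an architecture):
# patched_state = apply_model_patch(model_pre.state_dict(), model_ft.state_dict())
# model_ft.load_state_dict(patched_state)
```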

Authors (8)
  1. Didi Zhu (19 papers)
  2. Zhongyi Sun (5 papers)
  3. Zexi Li (26 papers)
  4. Tao Shen (87 papers)
  5. Ke Yan (102 papers)
  6. Shouhong Ding (90 papers)
  7. Kun Kuang (114 papers)
  8. Chao Wu (137 papers)
Citations (13)