Modality-Inconsistent Continual Learning of Multimodal Large Language Models (2412.13050v1)

Published 17 Dec 2024 in cs.LG, cs.AI, cs.CL, cs.CV, cs.SD, and eess.AS

Abstract: In this paper, we introduce Modality-Inconsistent Continual Learning (MICL), a new continual learning scenario for Multimodal LLMs (MLLMs) that involves tasks with inconsistent modalities (image, audio, or video) and varying task types (captioning or question-answering). Unlike existing vision-only or modality-incremental settings, MICL combines modality and task type shifts, both of which drive catastrophic forgetting. To address these challenges, we propose MoInCL, which employs a Pseudo Targets Generation Module to mitigate forgetting caused by task type shifts in previously seen modalities. It also incorporates Instruction-based Knowledge Distillation to preserve the model's ability to handle previously learned modalities when new ones are introduced. We benchmark MICL using a total of six tasks and conduct experiments to validate the effectiveness of our proposed MoInCL. The experimental results highlight the superiority of MoInCL, showing significant improvements over representative and state-of-the-art continual learning baselines.
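The abstract does not spell out how Instruction-based Knowledge Distillation is computed. As a rough illustration of the general idea only, the sketch below shows a generic logit-level distillation loss: a frozen copy of the previous model (the "teacher") and the model being trained (the "student") score the same token positions, and the student is penalized by the temperature-softened KL divergence from the teacher. The function names, the temperature value, and the choice of KL over logits are all assumptions made for illustration, not the formulation used by MoInCL.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over one vector of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Generic KL(teacher || student), averaged over token positions.

    Hypothetical helper: each argument is a list of per-token logit
    vectors produced for the same instruction and input. This is a
    standard distillation loss, not the paper's exact objective.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must cover the same positions")
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        log_p = log_softmax(t_row, temperature)  # teacher distribution
        log_q = log_softmax(s_row, temperature)  # student distribution
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return (temperature ** 2) * total / len(teacher_logits)


# Toy logits for two token positions over a three-word vocabulary.
teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
drifted_student = [[0.0, 1.5, 0.5], [2.0, 0.0, 0.1]]

loss_same = distillation_loss(teacher, teacher)
loss_drifted = distillation_loss(teacher, drifted_student)
```

In a continual-learning loop, a term like this would typically be added to the new task's training loss so that outputs on instructions tied to earlier modalities stay close to those of the previous model.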

Authors (5)

Tweets

https://twitter.com/YukinaLoveLisa/status/1880605177373499752
