Preserving Knowledge in Large Language Models with Model-Agnostic Self-Decompression (2406.11354v2)
Abstract: Humans can retain old knowledge while learning new information, but Large Language Models (LLMs) often suffer from catastrophic forgetting when post-pretrained or supervised fine-tuned (SFT) on domain-specific data. Moreover, for Multimodal LLMs (MLLMs), which are composed of an LLM base and a visual projector (e.g., LLaVA), a significant decline in performance on language benchmarks is observed compared to their single-modality counterparts. To address these challenges, we introduce a novel model-agnostic self-decompression method, Tree Generation (TG), that decompresses knowledge within LLMs into the training corpus. This paper focuses on TG-SFT, which synthetically generates SFT data for the instruction-tuning step. By incorporating the dumped corpus during SFT for MLLMs, we significantly reduce the forgetting problem.
- Zilun Zhang
- Yutao Sun
- Tiancheng Zhao
- Leigang Sha
- Ruochen Xu
- Kyusong Lee
- Jianwei Yin
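
The abstract only describes the approach at a high level. As a rough, hedged illustration of the self-decompression idea, the sketch below expands a set of seed topics into a small topic tree by prompting a language model, collects synthetic instruction-response pairs at the leaves (the "dumped" corpus), and mixes that corpus into a domain SFT set. Everything here is assumed for illustration: the prompts, the tree depth, the mixing ratio, the `gpt2` placeholder model, and the helper names (`llm_generate`, `expand_topics`, `dump_corpus`, `mix_for_sft`) are not the paper's actual TG-SFT procedure.

```python
"""Illustrative sketch of a self-decompression-style data dump.

NOT the paper's TG-SFT implementation: prompts, depth, sampling settings,
and the mixing ratio are placeholder assumptions.
"""
import random

from transformers import pipeline

# Any instruction-following causal LM could stand in here; "gpt2" is used
# only so the sketch runs out of the box, not because the paper uses it.
generator = pipeline("text-generation", model="gpt2")


def llm_generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Assumed helper: return only the model's continuation of the prompt."""
    out = generator(prompt, max_new_tokens=max_new_tokens, do_sample=True,
                    num_return_sequences=1)
    return out[0]["generated_text"][len(prompt):].strip()


def expand_topics(topic: str, n_children: int = 3) -> list[str]:
    """One tree-expansion step: ask the LLM to split a topic into subtopics."""
    prompt = f"List {n_children} subtopics of '{topic}', one per line:\n"
    lines = llm_generate(prompt).splitlines()
    return [ln.strip("-* ").strip() for ln in lines if ln.strip()][:n_children]


def dump_corpus(root_topics: list[str], depth: int = 2,
                pairs_per_leaf: int = 2) -> list[dict]:
    """Walk the topic tree and collect synthetic instruction-response pairs,
    i.e. "decompress" the LLM's own knowledge into a training corpus."""
    frontier = list(root_topics)
    for _ in range(depth):
        frontier = [child for t in frontier for child in expand_topics(t)]

    corpus = []
    for leaf in frontier:
        for _ in range(pairs_per_leaf):
            question = llm_generate(f"Write one question about {leaf}:\n")
            answer = llm_generate(f"Question: {question}\nAnswer:")
            corpus.append({"instruction": question, "response": answer})
    return corpus


def mix_for_sft(domain_sft: list[dict], dumped: list[dict],
                dump_ratio: float = 0.3) -> list[dict]:
    """Mix the dumped language-only corpus into the (e.g. multimodal) SFT set.
    The 30% ratio is an arbitrary placeholder, not a value from the paper."""
    n_dump = int(len(domain_sft) * dump_ratio)
    mixed = domain_sft + random.sample(dumped, min(n_dump, len(dumped)))
    random.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    dumped = dump_corpus(["world history", "basic physics"], depth=1)
    print(f"Generated {len(dumped)} synthetic instruction-response pairs.")
```

The key design point the sketch tries to mirror is that the synthetic corpus comes from the base LLM itself, so mixing it back in during SFT rehearses the model's pre-existing language knowledge alongside the new domain or multimodal data.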