
Preserving Knowledge in Large Language Model with Model-Agnostic Self-Decompression (2406.11354v2)

Published 17 Jun 2024 in cs.CL, cs.AI, and cs.CV

Abstract: Humans can retain old knowledge while learning new information, but LLMs often suffer from catastrophic forgetting when post-pretrained or supervised fine-tuned (SFT) on domain-specific data. Moreover, for Multimodal LLMs (MLLMs) which are composed of the LLM base and visual projector (e.g. LLaVA), a significant decline in performance on language benchmarks was observed compared to their single-modality counterparts. To address these challenges, we introduce a novel model-agnostic self-decompression method, Tree Generation (TG), that decompresses knowledge within LLMs into the training corpus. This paper focuses on TG-SFT, which can synthetically generate SFT data for the instruction tuning steps. By incorporating the dumped corpus during SFT for MLLMs, we significantly reduce the forgetting problem.
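To make the "self-decompression" idea concrete, below is a minimal sketch of how a TG-SFT-style pipeline could look with a Hugging Face causal LM: the base LLM is queried to expand topics into a tree and then to write instruction-response pairs about the leaves, and the resulting synthetic corpus is later mixed into multimodal SFT. The model name, prompt templates, tree depth/branching, and mixing strategy here are illustrative assumptions, not the paper's exact recipe.

```python
# Sketch of self-decompression (tree generation) into an SFT corpus.
# Assumptions: a Hugging Face causal LM; prompts and tree parameters are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-chat-hf"  # assumed LLM base of the MLLM

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")

def ask(prompt: str, max_new_tokens: int = 256) -> str:
    """Query the base LLM itself; its outputs form the 'decompressed' corpus."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=True)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

def expand_topics(topic: str, breadth: int = 3, depth: int = 2) -> list[str]:
    """Recursively expand a topic into a tree of subtopics (tree generation)."""
    if depth == 0:
        return [topic]
    reply = ask(f"List {breadth} distinct subtopics of '{topic}', one per line.")
    subtopics = [line.strip("-• ").strip()
                 for line in reply.splitlines() if line.strip()][:breadth]
    leaves = []
    for sub in subtopics:
        leaves.extend(expand_topics(sub, breadth, depth - 1))
    return leaves

def decompress_to_sft(root_topics: list[str]) -> list[dict]:
    """Turn the LLM's own knowledge into (instruction, response) SFT pairs."""
    corpus = []
    for root in root_topics:
        for leaf in expand_topics(root):
            instruction = ask(f"Write one concrete question about: {leaf}", 64)
            response = ask(instruction)
            corpus.append({"instruction": instruction, "response": response})
    return corpus

# The dumped corpus would then be mixed with the multimodal SFT data so the
# language ability of the LLM base is rehearsed during MLLM instruction tuning.
synthetic_sft = decompress_to_sft(["history", "mathematics", "programming"])
```

The key design point the sketch tries to capture is that the rehearsal data comes from the model itself (model-agnostic, no external language corpus), so mixing it into the MLLM's instruction tuning replays the base LLM's knowledge and mitigates catastrophic forgetting on language benchmarks.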

Authors (7)
  1. Zilun Zhang (12 papers)
  2. Yutao Sun (18 papers)
  3. Tiancheng Zhao (48 papers)
  4. Leigang Sha (3 papers)
  5. Ruochen Xu (35 papers)
  6. Kyusong Lee (16 papers)
  7. Jianwei Yin (71 papers)