
SAPT: A Shared Attention Framework for Parameter-Efficient Continual Learning of Large Language Models (2401.08295v3)

Published 16 Jan 2024 in cs.CL

Abstract: The continual learning (CL) ability is vital for deploying LLMs in the dynamic world. Existing methods devise a learning module to acquire task-specific knowledge with a parameter-efficient tuning (PET) block and a selection module to pick out the corresponding one for the test input, aiming to handle the challenges of catastrophic forgetting and knowledge transfer in CL. However, these methods tend to address only one of the challenges, ignoring the potential of aligning the two modules to effectively address catastrophic forgetting and knowledge transfer simultaneously. To this end, we propose a novel Shared Attention Framework (SAPT) that aligns PET learning and selection via a Shared Attentive Learning & Selection module. Extensive experiments on two CL benchmarks demonstrate the superiority of SAPT. Moreover, SAPT consistently demonstrates its superiority when scaled to different model sizes (from 770M to 13B), different model architectures (T5 and LLaMA-2), and unseen tasks.
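To make the abstract's core idea concrete, below is a minimal conceptual sketch of sharing one set of attention weights over a pool of task-specific PET blocks, so the same weights that compose knowledge during learning also select the relevant blocks for a test input. This is based only on the abstract: the choice of soft prompts as the PET blocks, the dot-product attention, and all module names and dimensions are illustrative assumptions, not the paper's actual SAPT implementation.

```python
# Sketch of a shared attentive learning & selection pool over PET blocks.
# Assumptions (not from the paper): PET blocks are soft prompts, attention is
# scaled dot-product between an input query and one learnable key per block.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedAttentivePETPool(nn.Module):
    def __init__(self, num_tasks: int, prompt_len: int, hidden_dim: int):
        super().__init__()
        # One soft-prompt PET block per task seen so far.
        self.prompts = nn.Parameter(torch.randn(num_tasks, prompt_len, hidden_dim) * 0.02)
        # One learnable key per PET block; queries come from the input representation.
        self.keys = nn.Parameter(torch.randn(num_tasks, hidden_dim) * 0.02)
        self.query_proj = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, input_repr: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        """input_repr: (batch, hidden_dim) pooled encoding of the input."""
        query = self.query_proj(input_repr)                          # (B, H)
        scores = query @ self.keys.T / self.keys.shape[-1] ** 0.5    # (B, num_tasks)
        weights = F.softmax(scores, dim=-1)                          # shared attention weights
        # The same weights both select relevant tasks and mix their PET blocks.
        mixed_prompt = torch.einsum("bt,tlh->blh", weights, self.prompts)
        return mixed_prompt, weights


# Usage: prepend the mixed prompt to the frozen LLM's input embeddings.
pool = SharedAttentivePETPool(num_tasks=5, prompt_len=10, hidden_dim=768)
pooled = torch.randn(4, 768)      # stand-in for a pooled input encoding
prompt, attn = pool(pooled)
print(prompt.shape, attn.shape)   # torch.Size([4, 10, 768]) torch.Size([4, 5])
```

Because learning and selection share these weights, the signal used to acquire task-specific knowledge and the signal used to route a test input are aligned, which is the framing the abstract gives for tackling forgetting and transfer together.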

Authors (9)
  1. Weixiang Zhao (21 papers)
  2. Shilong Wang (20 papers)
  3. Yulin Hu (37 papers)
  4. Yanyan Zhao (39 papers)
  5. Bing Qin (186 papers)
  6. Xuanyu Zhang (34 papers)
  7. Qing Yang (138 papers)
  8. Dongliang Xu (19 papers)
  9. Wanxiang Che (152 papers)
Citations (4)