Analyzing and Reducing Catastrophic Forgetting in Parameter Efficient Tuning (2402.18865v1)

Published 29 Feb 2024 in cs.LG, cs.AI, and cs.CL

Abstract: Existing research has shown that LLMs exhibit remarkable performance in language understanding and generation. However, when LLMs are continually fine-tuned on complex and diverse domain-specific downstream tasks, inference performance on historical tasks decreases dramatically, a problem known as catastrophic forgetting. A trade-off must be struck between learning plasticity and memory stability. Many existing works have explored strategies such as memory replay, regularization, and parameter isolation, but little is known about the geometric connections between adjacent minima in continual LLM fine-tuning. In this work, we investigate the geometric connections between different minima through the lens of mode connectivity, meaning that different minima can be connected by a low-loss valley. Through extensive experiments, we uncover the mode connectivity phenomenon in the LLM continual learning scenario and find that it can strike a balance between plasticity and stability. Building upon these findings, we propose a simple yet effective method called Interpolation-based LoRA (I-LoRA), which constructs a dual-memory experience replay framework based on LoRA parameter interpolations. Extensive experiments and analysis on eight domain-specific CL benchmarks demonstrate that I-LoRA consistently yields significant improvements over previous state-of-the-art approaches, with up to $11\%$ performance gains, providing a strong baseline and insights for future research on the LLM continual learning problem. Our code is available at \url{https://github.com/which47/LLMCL}.
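The abstract describes I-LoRA's dual-memory design only at a high level, so the sketch below is a minimal, hypothetical illustration of the idea: a "fast" (working-memory) LoRA adapter is trained on the current task with experience replay, while a "slow" (long-term-memory) adapter tracks it by linear parameter interpolation, so the evaluated weights stay near the low-loss valley connecting adjacent task minima. The class and function names, rank/alpha defaults, and interpolation coefficient are assumptions for illustration, not the authors' implementation (see the linked repository for that).

```python
# Hypothetical sketch of the I-LoRA idea: a fast LoRA adapter trained on the
# current task, plus a slow adapter updated by parameter interpolation.
# All names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer augmented with a trainable low-rank (LoRA) update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # only the LoRA factors are trained
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base (frozen) projection plus scaled low-rank correction.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling


@torch.no_grad()
def interpolate_lora(slow: nn.Module, fast: nn.Module, lam: float = 0.95) -> None:
    """Pull the slow adapter along the line toward the fast adapter:
    slow <- lam * slow + (1 - lam) * fast (only trainable LoRA parameters)."""
    for p_slow, p_fast in zip(slow.parameters(), fast.parameters()):
        if p_slow.requires_grad:  # skip the frozen base weights
            p_slow.mul_(lam).add_(p_fast, alpha=1.0 - lam)


# Usage sketch: after each optimizer step on the fast adapter (trained with
# replayed samples from a small memory buffer), interpolate the slow adapter,
# which is the one used for evaluation on historical tasks.
fast = LoRALinear(nn.Linear(512, 512))
slow = LoRALinear(nn.Linear(512, 512))
slow.load_state_dict(fast.state_dict())  # start both adapters from the same point
# ... one training step updating `fast` ...
interpolate_lora(slow, fast, lam=0.95)
```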

Authors (5)
  1. Weijieying Ren (11 papers)
  2. Xinlong Li (3 papers)
  3. Lei Wang (975 papers)
  4. Tianxiang Zhao (26 papers)
  5. Wei Qin (68 papers)
Citations (20)