More Than Catastrophic Forgetting: Integrating General Capabilities For Domain-Specific LLMs (2405.17830v2)

Published 28 May 2024 in cs.CL

Abstract: The performance of LLMs on general tasks decreases after they are fine-tuned on domain-specific tasks, a phenomenon known as Catastrophic Forgetting (CF). This paper presents a further challenge for the real-world application of domain-specific LLMs beyond CF, called General Capabilities Integration (GCI), which requires integrating both general capabilities and domain knowledge within a single instance. The objective of GCI is not merely to retain previously acquired general capabilities alongside new domain knowledge, but to harmonize and apply both sets of skills in a cohesive manner to enhance performance on domain-specific tasks. Taking the legal domain as an example, we carefully design three groups of practical training and testing tasks and construct the corresponding datasets. To better incorporate general capabilities into domain-specific scenarios, we introduce ALoRA, which adds a multi-head attention module on top of LoRA, enabling direct information transfer from preceding tokens to the current one. This enhancement allows the representation to dynamically switch between domain-specific knowledge and general competencies according to the attention. Extensive experiments on the proposed tasks demonstrate both the significance of our setting and the effectiveness of our method.
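
The abstract describes ALoRA only at a high level: a multi-head attention module placed on top of LoRA so that information from preceding tokens can modulate the current token's update. The sketch below is a minimal, hedged PyTorch illustration of that idea, not the paper's implementation; the `ALoRALinear` wrapper, the rank, the number of heads, and the exact way the attention output is mixed into the low-rank update are all assumptions made for illustration.

```python
# Hedged sketch of the ALoRA idea from the abstract: a LoRA adapter whose
# low-rank update is modulated by causal multi-head attention over preceding
# tokens. Names, rank, head count, and the mixing rule are illustrative
# assumptions, not the paper's reference code.
import torch
import torch.nn as nn


class ALoRALinear(nn.Module):
    def __init__(self, base_linear: nn.Linear, rank: int = 8, num_heads: int = 4):
        super().__init__()
        self.base = base_linear                 # frozen pretrained weight (general capabilities)
        for p in self.base.parameters():
            p.requires_grad = False
        d_in, d_out = base_linear.in_features, base_linear.out_features
        # Standard LoRA factors (domain-specific knowledge)
        self.lora_A = nn.Linear(d_in, rank, bias=False)
        self.lora_B = nn.Linear(rank, d_out, bias=False)
        nn.init.zeros_(self.lora_B.weight)      # start as a no-op, as in LoRA
        # Multi-head attention in the low-rank space; a causal mask restricts
        # information flow to preceding tokens, as the abstract describes.
        self.attn = nn.MultiheadAttention(rank, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_in)
        h = self.lora_A(x)                       # (batch, seq_len, rank)
        T = x.size(1)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        ctx, _ = self.attn(h, h, h, attn_mask=causal)   # context from earlier tokens
        # The attention output modulates how the domain update is applied per token.
        return self.base(x) + self.lora_B(h + ctx)


# Minimal usage: wrap a projection layer and run a dummy batch.
layer = ALoRALinear(nn.Linear(768, 768), rank=8, num_heads=4)
out = layer(torch.randn(2, 16, 768))
print(out.shape)  # torch.Size([2, 16, 768])
```

In this sketch the frozen base projection stands in for the general capabilities, the low-rank path for domain knowledge, and the causal attention supplies the per-token signal that decides how strongly the domain update is applied, mirroring the dynamic switching the abstract attributes to ALoRA.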

References (35)
  1. Llemma: An Open Language Model for Mathematics.
  2. Qwen Technical Report.
  3. Language Models are Few-Shot Learners.
  4. CodeTF: One-Stop Transformer Library for State-of-the-Art Code LLM.
  5. Evaluating Large Language Models Trained on Code.
  6. Continual Pre-Training Mitigates Forgetting in Language and Vision.
  7. QLoRA: Efficient Finetuning of Quantized LLMs.
  8. Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models' Memories.
  9. LoRA: Low-Rank Adaptation of Large Language Models.
  10. LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition.
  11. MathPrompter: Mathematical Reasoning Using Large Language Models.
  12. Mistral 7B.
  13. Understanding Catastrophic Forgetting and Remembering in Continual Learning with Optimal Relevance Mapping.
  14. Continual Pre-training of Language Models.
  15. The Power of Scale for Parameter-Efficient Prompt Tuning.
  16. Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning.
  17. Improving Large Language Model Fine-tuning for Solving Math Problems.
  18. An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning.
  19. OpenAI. 2023. GPT-4 Technical Report. CoRR, abs/2303.08774.
  20. Instruction Tuning with GPT-4. arXiv preprint arXiv:2304.03277.
  21. AdapterFusion: Non-Destructive Task Composition for Transfer Learning.
  22. Chengwei Qin and Shafiq Joty. 2022. LFPT5: A Unified Framework for Lifelong Few-shot Language Learning Based on Prompt Tuning of T5.
  23. Code Llama: Open Foundation Models for Code.
  24. ConPET: Continual Parameter-Efficient Tuning for Large Language Models. CoRR, abs/2309.14763.
  25. Stanford Alpaca: An Instruction-Following LLaMA Model. https://github.com/tatsu-lab/stanford_alpaca.
  26. LLaMA: Open and Efficient Foundation Language Models.
  27. Llama 2: Open Foundation and Fine-Tuned Chat Models.
  28. HuaTuo: Tuning LLaMA Model with Chinese Medical Knowledge.
  29. Orthogonal Subspace Learning for Language Model Continual Learning.
  30. Robust Fine-Tuning of Zero-Shot Models.
  31. DoctorGLM: Fine-tuning Your Chinese Doctor is Not a Herculean Task. arXiv preprint arXiv:2304.01097.
  32. Baichuan 2: Open Large-Scale Language Models.
  33. YangMu Yu. 2023. Cornucopia-LLaMA-Fin-Chinese.
  34. GLM-130B: An Open Bilingual Pre-trained Model. In The Eleventh International Conference on Learning Representations (ICLR).
  35. Investigating the Catastrophic Forgetting in Multimodal Large Language Models.
Citations (4)
