More Than Catastrophic Forgetting: Integrating General Capabilities For Domain-Specific LLMs (2405.17830v2)
Abstract: The performance of LLMs on general tasks decreases after they are fine-tuned on domain-specific tasks, a phenomenon known as Catastrophic Forgetting (CF). However, this paper presents a further challenge for the real-world application of domain-specific LLMs beyond CF, called General Capabilities Integration (GCI), which necessitates the integration of both general capabilities and domain knowledge within a single instance. The objective of GCI is not merely to retain previously acquired general capabilities alongside new domain knowledge, but to harmonize and utilize both sets of skills cohesively to enhance performance on domain-specific tasks. Taking the legal domain as an example, we carefully design three groups of training and testing tasks that remain practically relevant, and construct the corresponding datasets. To better incorporate general capabilities into domain-specific scenarios, we introduce ALoRA, which adds a multi-head attention module on top of LoRA, enabling direct information transfer from preceding tokens to the current one. This enhancement allows the representation to switch dynamically between domain-specific knowledge and general competencies according to the attention weights. Extensive experiments on the proposed tasks demonstrate both the significance of our setting and the effectiveness of our method.
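The abstract only outlines ALoRA at a high level: a multi-head attention module layered on top of LoRA so that each token can draw information directly from preceding tokens and blend the frozen general-capability path with the low-rank domain update. Below is a minimal PyTorch sketch of that idea; the class name, hyperparameters, and the exact way the attention branch is combined with the LoRA branch are assumptions for illustration, not the paper's actual ALoRA implementation.

```python
import torch
import torch.nn as nn


class ALoRALinearSketch(nn.Module):
    """Hypothetical sketch of a LoRA-augmented linear layer with an added
    multi-head attention branch over preceding tokens (names and wiring
    are illustrative assumptions, not the paper's exact ALoRA design)."""

    def __init__(self, base: nn.Linear, rank: int = 8, n_heads: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():       # freeze the pretrained weights
            p.requires_grad = False
        d_in, d_out = base.in_features, base.out_features
        # Standard LoRA factors; B starts at zero so the update is initially inactive.
        self.lora_A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(d_out, rank))
        # Attention module that lets the current token read from earlier ones.
        self.attn = nn.MultiheadAttention(d_out, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_in)
        general = self.base(x)                       # frozen general-capability path
        domain = x @ self.lora_A.T @ self.lora_B.T   # low-rank domain-specific update
        h = general + domain
        # Causal mask: each position may only attend to itself and preceding tokens.
        seq_len = x.size(1)
        causal = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        context, _ = self.attn(h, h, h, attn_mask=causal)
        # The attention weights decide how much information from earlier tokens
        # is mixed into the current representation.
        return h + context


# Toy usage: the output dimension (here 512) must be divisible by n_heads.
layer = ALoRALinearSketch(nn.Linear(512, 512), rank=8, n_heads=4)
out = layer(torch.randn(2, 16, 512))   # -> (2, 16, 512)
```

In this sketch the attention operates on the summed general and domain features, so how much each position draws from earlier tokens is determined entirely by the learned attention weights; the paper's actual formulation may combine the two paths differently.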
- Llemma: An open language model for mathematics.
- Qwen technical report.
- Language models are few-shot learners.
- CodeTF: One-stop transformer library for state-of-the-art code LLM.
- Evaluating large language models trained on code.
- Continual pre-training mitigates forgetting in language and vision.
- QLoRA: Efficient finetuning of quantized LLMs.
- Mixture-of-domain-adapters: Decoupling and injecting domain knowledge to pre-trained language models memories.
- LoRA: Low-rank adaptation of large language models.
- LoraHub: Efficient cross-task generalization via dynamic LoRA composition.
- MathPrompter: Mathematical reasoning using large language models.
- Mistral 7B.
- Understanding catastrophic forgetting and remembering in continual learning with optimal relevance mapping.
- Continual pre-training of language models.
- The power of scale for parameter-efficient prompt tuning.
- Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning.
- Improving large language model fine-tuning for solving math problems.
- An empirical study of catastrophic forgetting in large language models during continual fine-tuning.
- GPT-4 technical report. CoRR, abs/2303.08774.
- Instruction tuning with GPT-4. arXiv preprint arXiv:2304.03277.
- AdapterFusion: Non-destructive task composition for transfer learning.
- LFPT5: A unified framework for lifelong few-shot language learning based on prompt tuning of T5.
- Code llama: Open foundation models for code.
- ConPET: Continual parameter-efficient tuning for large language models. CoRR, abs/2309.14763.
- Stanford Alpaca: An instruction-following LLaMA model. https://github.com/tatsu-lab/stanford_alpaca.
- LLaMA: Open and efficient foundation language models.
- Llama 2: Open foundation and fine-tuned chat models.
- HuaTuo: Tuning LLaMA model with Chinese medical knowledge.
- Orthogonal subspace learning for language model continual learning.
- Robust fine-tuning of zero-shot models.
- DoctorGLM: Fine-tuning your Chinese doctor is not a herculean task. arXiv preprint arXiv:2304.01097.
- Baichuan 2: Open large-scale language models.
- Cornucopia-LLaMA-Fin-Chinese.
- GLM-130B: An open bilingual pre-trained model. In The Eleventh International Conference on Learning Representations (ICLR).
- Investigating the catastrophic forgetting in multimodal large language models.