Role Prompting Guided Domain Adaptation with General Capability Preserve for Large Language Models (2403.02756v1)

Published 5 Mar 2024 in cs.CL

Abstract: The growing interest in LLMs for specialized applications has revealed a significant challenge: when tailored to specific domains, LLMs tend to experience catastrophic forgetting, compromising their general capabilities and leading to a suboptimal user experience. Additionally, crafting a versatile model for multiple domains simultaneously often results in a decline in overall performance due to confusion between domains. In response to these issues, we present the RolE Prompting Guided Multi-Domain Adaptation (REGA) strategy. This novel approach manages multi-domain LLM adaptation through three key components: 1) Self-Distillation constructs and replays general-domain exemplars to alleviate catastrophic forgetting. 2) Role Prompting assigns a central prompt to the general domain and a unique role prompt to each specific domain to minimize inter-domain confusion during training. 3) Role Integration reuses and integrates a small portion of domain-specific data into the general-domain data, which is trained under the guidance of the central prompt. The central prompt alone is used at inference time, removing the need to switch prompts for different domains. Empirical results demonstrate that REGA effectively alleviates catastrophic forgetting and inter-domain confusion, leading to improved domain-specific performance compared to standard fine-tuned models while preserving robust general capabilities.
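
The three components described in the abstract map naturally onto a data-construction step that runs before fine-tuning. The sketch below is only an illustration of that idea, not the authors' implementation: the prompt wordings, the `general_model` callable, the example format, and the 10% integration ratio are all assumptions made for demonstration.

```python
# Illustrative sketch of a REGA-style training mixture (assumptions only;
# prompt texts, data format, and the 10% ratio are not taken from the paper).
import random

CENTRAL_PROMPT = "You are a helpful general-purpose assistant."  # assumed wording
ROLE_PROMPTS = {                                                  # assumed wording
    "medical": "You are a medical domain expert.",
    "legal": "You are a legal domain expert.",
}

def self_distill(general_queries, general_model):
    """Self-Distillation: replay the base model's own answers to general
    queries so general capabilities are rehearsed during adaptation."""
    return [
        {"prompt": CENTRAL_PROMPT, "query": q, "response": general_model(q)}
        for q in general_queries
    ]

def build_mixture(general_queries, domain_data, general_model, integrate_ratio=0.1):
    """domain_data maps a domain name to a list of
    {"query": ..., "response": ...} examples."""
    # 1) Self-distilled general exemplars, trained under the central prompt.
    mixture = self_distill(general_queries, general_model)

    for domain, examples in domain_data.items():
        # 2) Role Prompting: each domain trains under its own role prompt.
        mixture += [{"prompt": ROLE_PROMPTS[domain], **ex} for ex in examples]

        # 3) Role Integration: a small slice of domain data is also trained
        #    under the central prompt so one prompt suffices at inference.
        k = max(1, int(integrate_ratio * len(examples))) if examples else 0
        mixture += [{"prompt": CENTRAL_PROMPT, **ex}
                    for ex in random.sample(examples, k)]

    random.shuffle(mixture)
    return mixture
```

Under this framing, only CENTRAL_PROMPT is needed at inference time, which is what lets the method avoid per-domain prompt switching.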
