A Self-enhancement Approach for Domain-specific Chatbot Training via Knowledge Mining and Digest (2311.10614v1)
Abstract: Large language models (LLMs), despite their strength in language generation, often struggle with intricate, knowledge-intensive queries in specific domains. This paper introduces a novel approach to enhance LLMs by extracting relevant knowledge from domain-specific textual sources and adaptively training a chatbot on domain-specific inquiries. Our two-step approach starts with training a knowledge miner, LLMiner, which autonomously extracts question-answer (QA) pairs from relevant documents through a chain-of-thought reasoning process. We then blend the mined QA pairs with a conversational dataset to fine-tune the LLM as a chatbot, enriching both its domain-specific expertise and its conversational ability. We also develop a new evaluation benchmark comprising four domain-specific text corpora and associated human-crafted QA pairs for testing. Our model shows remarkable performance improvements over a generally aligned LLM and surpasses domain-adapted models fine-tuned directly on the domain corpora. Notably, LLMiner achieves this with minimal human intervention, requiring only 600 seed instances, offering a pathway toward self-improvement of LLMs through model-synthesized training data.
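To make the two-step recipe concrete, here is a minimal Python sketch of the pipeline the abstract describes: a miner pass that prompts an LLM with a chain-of-thought instruction to emit QA pairs from domain passages, followed by a blending step that mixes the mined pairs with conversational data into one fine-tuning file. The prompt wording, the `generate` callable (a stand-in for any LLM inference call), and the JSONL output schema are all assumptions for illustration; the paper's actual LLMiner prompt and data formats are not given in the abstract.

```python
import json
import random
from typing import Callable, Dict, List

# Hypothetical chain-of-thought mining prompt; the paper's actual LLMiner
# prompt and output format are not specified in the abstract.
MINER_PROMPT = (
    "Read the passage below. First reason step by step about which facts "
    "are worth asking about, then emit question-answer pairs, one per "
    "line, formatted as 'Q: ... || A: ...'.\n\nPassage:\n{passage}\n"
)

def mine_qa_pairs(passages: List[str],
                  generate: Callable[[str], str]) -> List[Dict[str, str]]:
    """Step 1: run the miner model over domain passages and parse its
    output into QA records. `generate` stands in for any LLM inference
    call (local model or hosted API)."""
    qa_pairs = []
    for passage in passages:
        output = generate(MINER_PROMPT.format(passage=passage))
        for line in output.splitlines():
            # Keep only lines that match the expected 'Q: ... || A: ...' shape.
            if line.startswith("Q:") and "|| A:" in line:
                q, a = line.split("|| A:", 1)
                qa_pairs.append({"question": q[2:].strip(),
                                 "answer": a.strip()})
    return qa_pairs

def blend_for_finetuning(qa_pairs: List[Dict[str, str]],
                         conversations: List[Dict[str, str]],
                         out_path: str,
                         seed: int = 0) -> None:
    """Step 2: mix mined QA pairs with a conversational dataset and write
    one shuffled instruction-tuning file (JSONL, one example per line)."""
    examples = [{"instruction": p["question"], "response": p["answer"]}
                for p in qa_pairs]
    examples += conversations
    random.Random(seed).shuffle(examples)  # interleave the two sources
    with open(out_path, "w", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```

Shuffling the two sources together, rather than training on them sequentially, is one plausible way to realize the "blend" the abstract mentions, so the chatbot sees domain knowledge and conversational style in the same fine-tuning pass.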
- Ruohong Zhang
- Luyu Gao
- Chen Zheng
- Zhen Fan
- Guokun Lai
- Zheng Zhang
- Fangzhou Ai
- Yiming Yang
- Hongxia Yang