A Self-enhancement Approach for Domain-specific Chatbot Training via Knowledge Mining and Digest (2311.10614v1)
Abstract: Large language models (LLMs), despite their strength in language generation, often struggle with intricate, knowledge-intensive queries in specific domains. This paper introduces a novel approach to enhance LLMs by extracting relevant knowledge from domain-specific textual sources and adaptively training a chatbot on domain-specific inquiries. Our two-step approach starts with training a knowledge miner, LLMiner, which autonomously extracts question-answer (QA) pairs from relevant documents through a chain-of-thought reasoning process. We then blend the mined QA pairs with a conversational dataset to fine-tune the LLM as a chatbot, enriching both its domain-specific expertise and its conversational ability. We also develop a new evaluation benchmark comprising four domain-specific text corpora and associated human-crafted QA pairs for testing. Our model shows remarkable performance improvements over a generally aligned LLM and surpasses domain-adapted models fine-tuned directly on the domain corpora. Notably, LLMiner achieves this with minimal human intervention, requiring only 600 seed instances, offering a pathway toward self-improvement of LLMs through model-synthesized training data.
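To make the two-step recipe concrete, here is a minimal Python sketch of the pipeline the abstract describes: a miner pass that prompts an LLM with a chain-of-thought instruction to emit QA pairs from domain passages, followed by a blending step that mixes the mined pairs with conversational data into one fine-tuning file. The prompt wording, the `generate` callable (a stand-in for any LLM inference call), and the JSONL output schema are all assumptions for illustration; the paper's actual LLMiner prompt and data formats are not given in the abstract.

```python
import json
import random
from typing import Callable, Dict, List

# Hypothetical chain-of-thought mining prompt; the paper's actual LLMiner
# prompt and output format are not specified in the abstract.
MINER_PROMPT = (
    "Read the passage below. First reason step by step about which facts "
    "are worth asking about, then emit question-answer pairs, one per "
    "line, formatted as 'Q: ... || A: ...'.\n\nPassage:\n{passage}\n"
)

def mine_qa_pairs(passages: List[str],
                  generate: Callable[[str], str]) -> List[Dict[str, str]]:
    """Step 1: run the miner model over domain passages and parse its
    output into QA records. `generate` stands in for any LLM inference
    call (local model or hosted API)."""
    qa_pairs = []
    for passage in passages:
        output = generate(MINER_PROMPT.format(passage=passage))
        for line in output.splitlines():
            # Keep only lines that match the expected 'Q: ... || A: ...' shape.
            if line.startswith("Q:") and "|| A:" in line:
                q, a = line.split("|| A:", 1)
                qa_pairs.append({"question": q[2:].strip(),
                                 "answer": a.strip()})
    return qa_pairs

def blend_for_finetuning(qa_pairs: List[Dict[str, str]],
                         conversations: List[Dict[str, str]],
                         out_path: str,
                         seed: int = 0) -> None:
    """Step 2: mix mined QA pairs with a conversational dataset and write
    one shuffled instruction-tuning file (JSONL, one example per line)."""
    examples = [{"instruction": p["question"], "response": p["answer"]}
                for p in qa_pairs]
    examples += conversations
    random.Random(seed).shuffle(examples)  # interleave the two sources
    with open(out_path, "w", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```

Shuffling the two sources together, rather than training on them sequentially, is one plausible way to realize the "blend" the abstract mentions, so the chatbot sees domain knowledge and conversational style in the same fine-tuning pass.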
- Ruohong Zhang
- Luyu Gao
- Chen Zheng
- Zhen Fan
- Guokun Lai
- Zheng Zhang
- Fangzhou Ai
- Yiming Yang
- Hongxia Yang