Injecting New Knowledge into Large Language Models via Supervised Fine-Tuning (2404.00213v2)
Abstract: In recent years, Large Language Models (LLMs) have shown remarkable performance in generating human-like text, proving to be a valuable asset across various applications. However, adapting these models to incorporate new, out-of-domain knowledge remains a challenge, particularly for facts and events that occur after the model's knowledge cutoff date. This paper investigates the effectiveness of Supervised Fine-Tuning (SFT) as a method for knowledge injection in LLMs, specifically focusing on the domain of recent sporting events. We compare different dataset generation strategies, token-based and fact-based scaling, to create training data that helps the model learn new information. Our experiments on GPT-4 demonstrate that while token-based scaling can lead to improvements in Q&A accuracy, it may not provide uniform coverage of new knowledge. Fact-based scaling, on the other hand, offers a more systematic approach to ensure even coverage across all facts. We present a novel dataset generation process that leads to more effective knowledge ingestion through SFT, and our results show considerable performance improvements in Q&A tasks related to out-of-domain knowledge. This study contributes to the understanding of domain adaptation for LLMs and highlights the potential of SFT in enhancing the factuality of LLM responses in specific knowledge domains.
- Nick Mecklenburg
- Yiyou Lin
- Xiaoxiao Li
- Daniel Holstein
- Leonardo Nunes
- Sara Malvar
- Bruno Silva
- Ranveer Chandra
- Vijay Aski
- Pavan Kumar Reddy Yannam
- Tolga Aktas
- Todd Hendry
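
To make the abstract's contrast between the two dataset-scaling strategies concrete, here is a minimal sketch. It is an illustrative assumption, not the paper's actual pipeline: the `Fact` type, the `paraphrase` stub, and the `token_budget` / `variants_per_fact` parameters are hypothetical stand-ins, and a real implementation would presumably generate paraphrases with an LLM rather than a string template.

```python
# A minimal, hypothetical sketch of the two dataset-scaling strategies
# contrasted in the abstract. All names here are illustrative assumptions,
# not the paper's API.
from dataclasses import dataclass


@dataclass
class Fact:
    statement: str  # e.g. "Team X beat Team Y 3-1 in the 2023 final"


def paraphrase(fact: Fact, n: int) -> list[str]:
    # Stand-in for an LLM-based rewriter: emit n surface variants of
    # the same underlying fact to serve as SFT training samples.
    return [f"{fact.statement} (variant {i})" for i in range(n)]


def token_based_scaling(facts: list[Fact], token_budget: int) -> list[str]:
    # Keep adding samples until a raw token budget is exhausted. The
    # stopping criterion ignores which facts have been covered, so
    # per-fact coverage can be uneven when the budget runs out mid-pass.
    corpus: list[str] = []
    used, i = 0, 0
    while used < token_budget:
        sample = paraphrase(facts[i % len(facts)], 1)[0]
        corpus.append(sample)
        used += len(sample.split())  # crude whitespace token count
        i += 1
    return corpus


def fact_based_scaling(facts: list[Fact], variants_per_fact: int) -> list[str]:
    # Allocate the same number of samples to every fact, so coverage
    # is uniform across the fact set by construction.
    corpus: list[str] = []
    for fact in facts:
        corpus.extend(paraphrase(fact, variants_per_fact))
    return corpus


if __name__ == "__main__":
    facts = [
        Fact("Team X beat Team Y 3-1 in the 2023 final"),
        Fact("Player Z was the tournament's top scorer with 8 goals"),
    ]
    print(len(token_based_scaling(facts, token_budget=120)))
    print(len(fact_based_scaling(facts, variants_per_fact=10)))
```

Under this framing, the design difference the abstract highlights falls out directly: fact-based scaling exposes per-fact coverage as an explicit knob, while token-based scaling only controls total corpus size, which is why it can leave some facts under-represented.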