Enhancing Domain Specific Retrieval-Augmented Generation with RAFT
Introduction
The adaptation of pre-trained LLMs to specific domains or applications remains a pivotal challenge in NLP. This challenge is particularly pronounced in scenarios where LLMs need to integrate new, possibly domain-specific knowledge post-training. The standard approaches involve fine-tuning LLMs with additional data or using Retrieval-Augmented Generation (RAG) techniques to augment LLM capabilities dynamically. However, the optimal strategy for efficiently and effectively imbuing LLMs with new knowledge remains an open research question.
RAFT Methodology
We introduce Retrieval Augmented Fine Tuning (RAFT), a novel training recipe designed to optimize LLMs' performance when answering questions in domain-specific RAG settings. At its core, RAFT trains models to differentiate relevant documents from non-essential ones (distractors) within a set of retrieved documents and to disregard the latter, improving precision in answering questions. A key innovation in RAFT is training the model to cite directly from the relevant documents, facilitating a chain-of-thought-style reasoning process. This approach is intended not only to enhance the model's reasoning capabilities but also to improve its ability to leverage contextual information effectively.
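One plausible way to assemble such training examples can be sketched as follows. The function, parameter names, and the choice to sometimes omit the relevant (oracle) document from the context are illustrative assumptions, not the paper's exact implementation:

```python
import random

def build_raft_example(question, oracle_docs, distractor_docs,
                       cot_answer, p_oracle=0.8, k_distractors=3):
    """Assemble one RAFT-style training example (illustrative sketch).

    With probability p_oracle, the oracle (relevant) documents are kept
    in the context alongside sampled distractors; otherwise the context
    holds distractors only. Mixing in distractors teaches the model to
    ignore non-essential documents when composing its answer.
    """
    distractors = random.sample(distractor_docs, k_distractors)
    if random.random() < p_oracle:
        context = oracle_docs + distractors
    else:
        context = list(distractors)
    random.shuffle(context)  # remove positional cues to the oracle document

    prompt = "\n\n".join(f"Document: {d}" for d in context)
    prompt += f"\n\nQuestion: {question}"
    # The target is a chain-of-thought answer that quotes the relevant
    # document before stating the final answer.
    return {"prompt": prompt, "completion": cot_answer}
```

A fine-tuning corpus would then be built by applying this function over every question in the domain-specific dataset, pairing each prompt with its citation-grounded chain-of-thought completion.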
Experimental Setup
Our evaluation of RAFT spans a variety of datasets, including PubMed, HotpotQA, and Gorilla, which together encapsulate a wide range of domain-specific knowledge, from biomedical research to software development frameworks. Across these benchmarks, models fine-tuned with RAFT consistently outperform their counterparts trained with standard supervised fine-tuning, both with and without RAG at inference time. RAFT's advantage holds across the evaluation metrics, illustrating its robustness and versatility as a fine-tuning approach for domain-specific RAG.
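QA evaluations of this kind commonly score predictions with normalized exact match. Below is a minimal sketch of such a metric, offered as an assumption about the scoring, not the paper's actual evaluation code:

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference answer
    after simple normalization (lowercasing, whitespace collapsing)."""
    def norm(s):
        return " ".join(s.lower().split())
    correct = sum(norm(p) == norm(r) for p, r in zip(predictions, references))
    return correct / len(references)
```

Running the same metric over outputs from a RAFT-tuned model and a standard supervised fine-tuning baseline, on identical retrieved contexts, yields the kind of head-to-head comparison described above.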
Implications and Future Directions
The RAFT training methodology has significant practical and theoretical implications for the field of natural language processing and AI at large. Practically, RAFT offers an efficient approach to imbue LLMs with domain-specific knowledge, enhancing their applicability and performance in specialized settings. Theoretically, the success of RAFT raises interesting questions about the role of distractor documents in model training and the importance of chain-of-thought reasoning for domain-specific knowledge integration.
Looking ahead, we anticipate that domain-specific RAG will garner increasing interest, both in academic research and in industrial applications. Current trends suggest a shift toward smaller, domain-specialized models that can efficiently handle specific tasks, as opposed to larger, more general-purpose models. RAFT represents a significant step forward in this paradigm, offering a viable pathway to harnessing the full potential of LLMs in domain-specific applications.
Conclusion
RAFT provides a promising new avenue for fine-tuning LLMs to enhance performance in domain-specific RAG tasks. By effectively leveraging distractor documents and incorporating chain-of-thought reasoning, RAFT allows models to utilize contextual information more accurately and efficiently. As we continue to explore the capabilities and limitations of LLMs, methodologies like RAFT will play a crucial role in unlocking the next generation of AI-driven applications, tailored to specific domains and challenges.