Introduction to Prompt-RAG
The paper in focus presents Prompt-RAG, a novel natural language prompt-based retrieval-augmented generation method expressly designed for niche domains, exemplified through its application in Korean Medicine (KM). Conventional Retrieval-Augmented Generation (RAG) models rely on vector embeddings to retrieve the information needed to generate responses. However, embeddings derived from generic LLMs may not faithfully capture specialized knowledge, an issue that is particularly pronounced in niche fields. Prompt-RAG distinguishes itself from its predecessors by operating without vector embeddings, potentially bypassing this limitation.
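For context, the embedding-based retrieval step that conventional RAG relies on can be sketched as follows. The `embed` function here is a toy bag-of-words stand-in for a real learned embedding model, so the example runs without any external service; it is illustrative only, not the paper's setup.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a learned embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    # Rank document chunks by embedding similarity to the query.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

With a real embedding model, chunks whose vectors fail to capture domain-specific meaning would be ranked poorly here, which is exactly the failure mode the paper attributes to niche domains like KM.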
Methodology Overview
Prompt-RAG comprises three steps: preprocessing, heading selection, and retrieval-augmented generation. The methodology begins by creating a Table of Contents (ToC) from the target document(s), which becomes the basis for retrieval. A large pre-trained generative model then assesses the ToC in conjunction with a user query and selects the most pertinent headings. The content associated with these headings is gathered to construct a contextual reference, from which the generative model produces a response to the query. Underlying this process is the advanced natural language understanding of modern LLMs.
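The three steps can be sketched as below. This is a minimal illustration, not the paper's implementation: it assumes a callable `llm` (e.g. a chat-completion API wrapper) is supplied, and the prompt wording is hypothetical.

```python
def select_headings(llm, toc: list[str], query: str) -> list[str]:
    # Step 2: the generative model reads the ToC plus the query
    # and names the most pertinent headings.
    prompt = (
        "Table of contents:\n" + "\n".join(toc) +
        f"\n\nQuery: {query}\n"
        "Return the headings most relevant to the query, one per line."
    )
    return [h.strip() for h in llm(prompt).splitlines() if h.strip()]

def answer(llm, sections: dict[str, str], query: str) -> str:
    # Step 1 (preprocessing) is assumed done: `sections` maps each
    # ToC heading to its body text.
    chosen = select_headings(llm, list(sections), query)
    # Step 3: gather the selected sections into a contextual
    # reference and generate the final response from it.
    reference = "\n\n".join(sections[h] for h in chosen if h in sections)
    return llm(f"Reference:\n{reference}\n\nAnswer the query: {query}")
```

Note that no vector store appears anywhere: retrieval is delegated entirely to the model's reading of the ToC.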
Experimentation and Findings
To assess Prompt-RAG's efficacy, the researchers built a Question-Answering (QA) chatbot for KM. They found that for KM documents, vector-embedding similarity correlated more strongly with token overlap than with human-assessed document relatedness, a trend not observed for Conventional Medicine (CM) embeddings. When Prompt-RAG's QA chatbot was compared against ChatGPT and conventional RAG models, it performed better in relevance and informativeness, albeit with certain challenges such as content structuring and increased response latency.
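The correlation check described above can be sketched as follows: for each document pair, compute a token-overlap score (Jaccard here, as one plausible choice) and compare it against embedding cosine similarities or human ratings via Pearson correlation. The specific metrics are illustrative assumptions, not necessarily the paper's.

```python
import math

def jaccard(a: str, b: str) -> float:
    # Token-overlap score between two texts: |A ∩ B| / |A ∪ B|.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def pearson(xs: list[float], ys: list[float]) -> float:
    # Pearson correlation coefficient between two score lists,
    # e.g. embedding similarities vs token-overlap scores.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0
```

A high correlation between embedding similarity and token overlap, paired with a weak correlation against human relatedness judgments, would reproduce the KM pattern the authors report.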
Implications for Future Research and Application
The findings of this investigation suggest that Prompt-RAG has considerable potential for applications in specialized domains. By leveraging the linguistic capabilities of LLMs, Prompt-RAG circumvents the limitations of conventional RAG models. Its utility is demonstrated in the domain of KM but is not confined to it. The authors anticipate that as the generative abilities of LLMs progress and the associated costs decrease, Prompt-RAG could become a powerful tool for information retrieval in a variety of other domains. Despite current challenges such as document structuring requirements and longer response times, the researchers remain optimistic about the model's evolving practicality.