Retrieval Augmented Generation Systems: Automatic Dataset Creation, Evaluation and Boolean Agent Setup (2403.00820v1)
Abstract: Retrieval Augmented Generation (RAG) systems have become hugely popular for augmenting Large Language Model (LLM) outputs with domain-specific and time-sensitive data. Very recently, a shift has been underway from simple RAG setups that query a vector database for additional information with every user input toward more sophisticated forms of RAG. However, the different concrete approaches currently compete on mostly anecdotal evidence. In this paper we present a rigorous dataset creation and evaluation workflow to quantitatively compare different RAG strategies. We use a dataset created this way for the development and evaluation of a Boolean agent RAG setup: a system in which an LLM can decide whether or not to query a vector database, thus saving tokens on questions that can be answered with internal knowledge. We publish our code and generated dataset online.
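As a rough illustration of the Boolean agent idea described in the abstract, the following Python sketch shows one way such a decision step could be wired up. It is not the authors' implementation; `call_llm` and `retrieve` are hypothetical placeholders for an LLM API call and a vector database query.

```python
# Minimal sketch of a Boolean agent RAG loop (illustrative only).
# `call_llm` and `retrieve` are hypothetical placeholders, not part
# of the paper's published code.
from typing import Callable, List


def boolean_agent_rag(
    question: str,
    call_llm: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
) -> str:
    """Answer `question`, querying the vector database only when the
    LLM judges its internal knowledge to be insufficient."""
    # Step 1: ask the LLM for a Boolean retrieval decision.
    decision_prompt = (
        "Can you answer the following question reliably from your own "
        "knowledge? Reply with exactly YES or NO.\n\n"
        f"Question: {question}"
    )
    needs_retrieval = call_llm(decision_prompt).strip().upper().startswith("NO")

    if not needs_retrieval:
        # Step 2a: answer directly and skip the retrieval step entirely.
        return call_llm(question)

    # Step 2b: retrieve supporting chunks and answer with context.
    chunks = retrieve(question)
    context = "\n\n".join(chunks)
    rag_prompt = (
        "Answer the question using the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(rag_prompt)
```

The design trades a single short decision call against the cost of embedding, retrieving, and injecting context on every question, which is where the token savings mentioned in the abstract would come from.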
- Tristan Kenneweg
- Philip Kenneweg
- Barbara Hammer