Retrieval Augmented Generation Systems: Automatic Dataset Creation, Evaluation and Boolean Agent Setup (2403.00820v1)
Abstract: Retrieval Augmented Generation (RAG) systems have become hugely popular for augmenting Large Language Model (LLM) outputs with domain-specific and time-sensitive data. Very recently, a shift has been underway from simple RAG setups that query a vector database for additional information with every user input toward more sophisticated forms of RAG. However, the different concrete approaches currently compete on mostly anecdotal evidence. In this paper we present a rigorous dataset creation and evaluation workflow to quantitatively compare different RAG strategies. We use a dataset created this way for the development and evaluation of a Boolean agent RAG setup: a system in which an LLM can decide whether or not to query a vector database, thus saving tokens on questions that can be answered with internal knowledge. We publish our code and generated dataset online.
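As a rough illustration of the Boolean agent idea described in the abstract, the following Python sketch shows one way such a decision step could be wired up. It is not the authors' implementation; `call_llm` and `retrieve` are hypothetical placeholders for an LLM API call and a vector database query.

```python
# Minimal sketch of a Boolean agent RAG loop (illustrative only).
# `call_llm` and `retrieve` are hypothetical placeholders, not part
# of the paper's published code.
from typing import Callable, List


def boolean_agent_rag(
    question: str,
    call_llm: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
) -> str:
    """Answer `question`, querying the vector database only when the
    LLM judges its internal knowledge to be insufficient."""
    # Step 1: ask the LLM for a Boolean retrieval decision.
    decision_prompt = (
        "Can you answer the following question reliably from your own "
        "knowledge? Reply with exactly YES or NO.\n\n"
        f"Question: {question}"
    )
    needs_retrieval = call_llm(decision_prompt).strip().upper().startswith("NO")

    if not needs_retrieval:
        # Step 2a: answer directly and skip the retrieval step entirely.
        return call_llm(question)

    # Step 2b: retrieve supporting chunks and answer with context.
    chunks = retrieve(question)
    context = "\n\n".join(chunks)
    rag_prompt = (
        "Answer the question using the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(rag_prompt)
```

The design trades a single short decision call against the cost of embedding, retrieving, and injecting context on every question, which is where the token savings mentioned in the abstract would come from.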
- Tristan Kenneweg
- Philip Kenneweg
- Barbara Hammer