RAGged Edges: The Double-Edged Sword of Retrieval-Augmented Chatbots (2403.01193v3)
Abstract: Large language models (LLMs) such as ChatGPT demonstrate the remarkable progress of artificial intelligence. However, their tendency to hallucinate -- to generate plausible but false information -- poses a significant challenge. The issue is critical, as seen in recent court cases in which ChatGPT's use led to citations of non-existent legal rulings. This paper explores how Retrieval-Augmented Generation (RAG) can counter hallucinations by integrating external knowledge into prompts. We empirically evaluate RAG against standard LLMs using prompts designed to induce hallucinations. Our results show that RAG increases accuracy in some cases, but that it can still be misled when prompts directly contradict the model's pre-trained understanding. These findings highlight the complex nature of hallucinations and the need for more robust solutions to ensure LLM reliability in real-world applications. We offer practical recommendations for RAG deployment and discuss implications for the development of more trustworthy LLMs.
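To make the RAG mechanism described above concrete, here is a minimal sketch of the retrieve-then-prompt pattern: relevant passages are fetched from an external corpus and prepended to the user's query before the LLM is called. The function names, the keyword-overlap retriever, and the `llm.generate` call are illustrative assumptions, not the paper's actual pipeline; a production system would typically use dense-embedding retrieval.

```python
from typing import List

def retrieve(query: str, documents: List[str], top_k: int = 3) -> List[str]:
    """Toy keyword-overlap retriever (assumption; real RAG systems use embeddings)."""
    query_terms = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_rag_prompt(query: str, documents: List[str]) -> str:
    """Combine retrieved context with the user query, the core RAG step."""
    context = "\n".join(f"- {passage}" for passage in retrieve(query, documents))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

# Hypothetical usage:
# prompt = build_rag_prompt("Which court issued the cited ruling?", corpus)
# answer = llm.generate(prompt)  # 'llm' stands in for whatever chat model is used
```

The grounding instruction in the prompt template is what the paper probes: it helps in some cases, yet the model can still defer to (or be misled away from) its pre-trained knowledge when the prompt contradicts it.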
Authors:
- Shimei Pan
- Philip Feldman
- James R. Foulds