RAGged Edges: The Double-Edged Sword of Retrieval-Augmented Chatbots (2403.01193v3)

Published 2 Mar 2024 in cs.CL and cs.AI

Abstract: LLMs like ChatGPT demonstrate the remarkable progress of artificial intelligence. However, their tendency to hallucinate -- generate plausible but false information -- poses a significant challenge. This issue is critical, as seen in recent court cases where ChatGPT's use led to citations of non-existent legal rulings. This paper explores how Retrieval-Augmented Generation (RAG) can counter hallucinations by integrating external knowledge with prompts. We empirically evaluate RAG against standard LLMs using prompts designed to induce hallucinations. Our results show that RAG increases accuracy in some cases, but can still be misled when prompts directly contradict the model's pre-trained understanding. These findings highlight the complex nature of hallucinations and the need for more robust solutions to ensure LLM reliability in real-world applications. We offer practical recommendations for RAG deployment and discuss implications for the development of more trustworthy LLMs.
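The mechanism the abstract describes, retrieving external passages and prepending them to the prompt before generation, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`retrieve`, `build_prompt`, `generate`), the toy corpus, and the token-overlap scorer are all hypothetical stand-ins; a real deployment would use an embedding-based retriever and an actual LLM call.

```python
# Minimal RAG sketch (hypothetical; not the paper's code).
# A real system would replace the token-overlap retriever with dense
# embeddings and the `generate` stub with an actual LLM call.

corpus = [
    "In Mata v. Avianca (2023), lawyers cited fabricated cases produced by ChatGPT.",
    "Retrieval-augmented generation grounds LLM output in retrieved source text.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by the number of lowercase tokens shared with the query."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Prepend retrieved passages so the model is asked to answer from them."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}"

def generate(prompt: str) -> str:
    # Placeholder: a real deployment would send `prompt` to an LLM here.
    return f"[LLM response to a prompt of {len(prompt)} characters]"

query = "What happened in the Avianca court case?"
print(generate(build_prompt(query, retrieve(query, corpus))))
```

Note that this pipeline only constrains the prompt; as the paper's results suggest, the model can still override retrieved context when it conflicts with pre-trained knowledge, which is why prompt grounding alone is not a complete fix for hallucination.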

Authors (3)
  1. Shimei Pan (28 papers)
  2. Philip Feldman (19 papers)
  3. James R. Foulds (12 papers)