Investigating the Role of Prompting and External Tools in Hallucination Rates of Large Language Models
Abstract: Large language models (LLMs) are powerful computational models trained on extensive corpora of human-readable text, enabling general-purpose language understanding and generation. They have garnered significant attention in both industry and academia due to their exceptional performance across a wide range of NLP tasks. Despite these successes, LLMs often produce factually incorrect or unsupported output, commonly referred to as hallucinations. Prompt engineering, the process of designing and formulating instructions that direct an LLM to perform a specific task, has emerged as a key approach to mitigating hallucinations. This paper provides a comprehensive empirical evaluation of prompting strategies and frameworks aimed at reducing hallucinations in LLMs. Various prompting techniques are applied to a broad set of benchmark datasets to assess the accuracy and hallucination rate of each method. Additionally, the paper investigates the influence of tool-calling agents (LLMs augmented with external tools that extend their capabilities beyond language generation) on hallucination rates across the same benchmarks. The findings demonstrate that the optimal prompting technique depends on the type of problem, and that simpler techniques often outperform more complex methods in reducing hallucinations. Furthermore, LLM agents can exhibit significantly higher hallucination rates due to the added complexity introduced by external tool usage.
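As a rough illustration of the kind of comparison the abstract describes, the sketch below wraps one benchmark question in several prompting templates and scores judged responses for hallucination. The template names and wording are illustrative assumptions (loosely following published chain-of-thought and chain-of-verification prompting), not the paper's exact prompts or metric.

```python
# Hypothetical sketch: render the same benchmark question under different
# prompting strategies, and compute a simple hallucination rate from
# per-response judgements. All templates here are illustrative assumptions.

PROMPT_TEMPLATES = {
    # Direct prompting: ask for the answer with no extra scaffolding.
    "direct": "Answer the question.\n\nQuestion: {question}\nAnswer:",
    # Chain-of-thought style: elicit step-by-step reasoning before the answer.
    "chain_of_thought": (
        "Answer the question. Think step by step before giving the final "
        "answer.\n\nQuestion: {question}\nReasoning:"
    ),
    # Chain-of-verification style: draft, self-check, then answer.
    "chain_of_verification": (
        "Draft an answer, list verification questions that check the draft, "
        "answer them, then give a final verified answer."
        "\n\nQuestion: {question}"
    ),
}

def build_prompt(strategy: str, question: str) -> str:
    """Render a benchmark question under the chosen prompting strategy."""
    return PROMPT_TEMPLATES[strategy].format(question=question)

def hallucination_rate(judgements: list[bool]) -> float:
    """Fraction of responses judged hallucinated (True = hallucination)."""
    return sum(judgements) / len(judgements) if judgements else 0.0
```

In an actual evaluation, each rendered prompt would be sent to the model under test and the responses graded against benchmark ground truth; the sketch omits the model call since no specific API is named here.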