Why Does ChatGPT Fall Short in Providing Truthful Answers?

(2304.10513)
Published Apr 20, 2023 in cs.CL and cs.AI

Abstract

Recent advancements in LLMs, such as ChatGPT, have demonstrated significant potential to impact various aspects of human life. However, ChatGPT still faces challenges in providing reliable and accurate answers to user questions. To better understand the model's particular weaknesses in providing truthful answers, we embark on an in-depth exploration of open-domain question answering. Specifically, we undertake a detailed examination of ChatGPT's failures, which we categorize into four types: comprehension, factuality, specificity, and inference. We further pinpoint factuality as the largest contributor to these failures and identify two critical abilities associated with it: knowledge memorization and knowledge recall. Through experiments focusing on factuality, we propose several potential enhancement strategies. Our findings suggest that augmenting the model with granular external knowledge and cues for knowledge recall can enhance the model's factuality in answering questions.

