
On Large Language Models' Hallucination with Regard to Known Facts (2403.20009v2)

Published 29 Mar 2024 in cs.CL and cs.LG

Abstract: LLMs are successful in answering factoid questions but are also prone to hallucination. We investigate the phenomenon of LLMs possessing correct answer knowledge yet still hallucinating from the perspective of inference dynamics, an area not previously covered in studies on hallucinations. We are able to conduct this analysis via two key ideas. First, we identify factual questions that query the same triplet knowledge but result in different answers. The difference between the model's behaviors on the correct and incorrect outputs hence suggests the patterns under which hallucinations happen. Second, to measure these patterns, we utilize mappings from the residual streams to vocabulary space. We reveal the different dynamics of the output token probabilities along the depths of layers between the correct and hallucinated cases. In hallucinated cases, the output token's information rarely demonstrates abrupt increases and consistent superiority in the later stages of the model. Leveraging the dynamic curve as a feature, we build a classifier capable of accurately detecting hallucinatory predictions with an 88% success rate. Our study sheds light on understanding the reasons for LLMs' hallucinations on their known facts and, more importantly, on accurately predicting when they are hallucinating.
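
The layer-wise probability curve the abstract refers to can be approximated with a logit-lens-style probe: project each layer's residual stream through the model's unembedding matrix and record the probability assigned to the answer token at every depth. The sketch below is a minimal illustration of that idea, not the authors' code; the model choice (gpt2), the GPT-2-specific `ln_f` access, and the crude late-layer threshold standing in for the paper's trained classifier are all assumptions.

```python
# Minimal logit-lens sketch: track P(answer token) across layers.
# Assumptions: GPT-2 as a stand-in model; a fixed threshold instead of the
# trained classifier described in the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM exposing hidden states works similarly
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "The capital of France is"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)
    answer_id = out.logits[0, -1].argmax().item()     # greedy answer token

    unembed = model.get_output_embeddings().weight    # (vocab_size, hidden_size)
    final_ln = model.transformer.ln_f                 # GPT-2-specific final layer norm
    curve = []
    for h in out.hidden_states:                       # embeddings + one state per layer
        logits = final_ln(h[0, -1]) @ unembed.T       # map residual stream to vocab space
        curve.append(torch.softmax(logits, dim=-1)[answer_id].item())

# The paper feeds such curves to a trained classifier; as a crude stand-in,
# flag curves that never show strong late-layer dominance of the answer token.
late = curve[len(curve) * 2 // 3 :]
print("layer-wise P(answer):", [round(p, 4) for p in curve])
print("suspicious (no late-layer dominance)?", max(late) < 0.5)
```

In the paper, curves like this are collected over many same-triplet questions with correct and hallucinated answers, and the curve itself serves as the feature vector for the hallucination detector rather than a fixed threshold.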

Authors (9)
  1. Che Jiang
  2. Biqing Qi
  3. Xiangyu Hong
  4. Dayuan Fu
  5. Yang Cheng
  6. Fandong Meng
  7. Mo Yu
  8. Bowen Zhou
  9. Jie Zhou
Citations (6)
