
DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations (2410.18860v1)

Published 24 Oct 2024 in cs.CL and cs.AI

Abstract: LLMs often hallucinate, producing unfaithful or factually incorrect outputs by misrepresenting the provided context or incorrectly recalling internal knowledge. Recent studies have identified specific attention heads within the Transformer architecture, known as retrieval heads, responsible for extracting relevant contextual information. We hypothesise that masking these retrieval heads can induce hallucinations and that contrasting the outputs of the base LLM and the masked LLM can reduce hallucinations. To this end, we propose Decoding by Contrasting Retrieval Heads (DeCoRe), a novel training-free decoding strategy that amplifies information found in the context and model parameters. DeCoRe mitigates potentially hallucinated responses by dynamically contrasting the outputs of the base LLM and the masked LLM, using conditional entropy as a guide. Our extensive experiments confirm that DeCoRe significantly improves performance on tasks requiring high contextual faithfulness, such as summarisation (XSum by 18.6%), instruction following (MemoTrap by 10.9%), and open-book question answering (NQ-Open by 2.4% and NQ-Swap by 5.5%).

Overview of DeCoRe: Mitigating Hallucinations in LLMs

The paper "DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations" introduces a novel strategy aimed at addressing hallucinations in LLMs. Hallucinations, defined as unfaithful or factually incorrect outputs, pose a significant challenge in the deployment of LLMs in critical applications. This research leverages insights into retrieval heads within Transformer architectures to propose a decoding method that mitigates hallucinated generations.

Key Concepts and Methodology

The authors focus on specific attention heads known as "retrieval heads," identified by prior work as responsible for extracting relevant contextual knowledge. The hypothesis driving this research is that masking these retrieval heads induces hallucinations, so that contrasting the outputs of the base LLM with those of the masked, hallucination-prone variant improves output faithfulness. The proposed method, DeCoRe, is training-free and dynamically enhances the model's reliability at decoding time.
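
To make the head-masking step concrete, the sketch below shows one way to zero out the contribution of chosen attention heads in a single layer, operating on the per-head attention outputs before they are recombined by the output projection. This is an illustrative reconstruction rather than the authors' implementation: the function name, tensor layout, and head indices are assumptions, and in practice the retrieval heads would first be identified (e.g. via a retrieval-score analysis) and masked inside the model's attention modules at every relevant layer.

    import torch

    def mask_retrieval_heads(head_outputs: torch.Tensor, retrieval_heads: list) -> torch.Tensor:
        # head_outputs: per-head attention outputs of one layer, shaped
        # (batch, n_heads, seq_len, head_dim), taken before the heads are
        # concatenated and passed through the output projection.
        # retrieval_heads: indices of the heads to silence in this layer
        # (hypothetical values here; in DeCoRe these come from a prior
        # retrieval-head identification step).
        masked = head_outputs.clone()
        masked[:, retrieval_heads, :, :] = 0.0
        return masked

    # Toy usage: 2 sequences, 8 heads, 16 tokens, head dimension 64.
    x = torch.randn(2, 8, 16, 64)
    x_masked = mask_retrieval_heads(x, retrieval_heads=[1, 5])
    assert torch.all(x_masked[:, [1, 5]] == 0)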

Key elements of the DeCoRe methodology include:

  • Masking Retrieval Heads: By selectively masking retrieval heads, the model is intentionally made to generate hallucinations, setting a foundation for contrastive analysis.
  • Contrastive Decoding: The method contrasts the outputs of the base LLM and the hallucinating masked variant, using conditional entropy to guide this process. A dynamic scaling factor, derived from this entropy, adjusts the strength of the contrast (a sketch of this step follows the list).
  • Dynamic Conditioning: Conditional entropy serves not only to mitigate hallucinations but also to assess model uncertainty, playing a pivotal role in improving contextual adherence.
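
A minimal sketch of the contrast-and-scale step is given below. It assumes the common contrastive-decoding form (1 + α)·log p_base − α·log p_masked, with α scaled by the normalised predictive entropy of the base model so that the contrast is strongest when the base model is uncertain. The function and parameter names (decore_next_token_scores, alpha_max) are illustrative, and the paper's exact entropy-based weighting may differ in detail.

    import torch
    import torch.nn.functional as F

    def decore_next_token_scores(base_logits, masked_logits, alpha_max=1.0):
        # base_logits, masked_logits: next-token logits of shape (vocab_size,)
        # from the base model and the retrieval-head-masked model.
        log_p_base = F.log_softmax(base_logits, dim=-1)
        log_p_masked = F.log_softmax(masked_logits, dim=-1)

        # Normalised entropy of the base distribution, in [0, 1]:
        # high entropy means the base model is uncertain about the next token.
        p_base = log_p_base.exp()
        entropy = -(p_base * log_p_base).sum()
        entropy_norm = entropy / torch.log(torch.tensor(float(base_logits.numel())))

        # Dynamic contrast strength: contrast harder when uncertainty is high.
        alpha = alpha_max * entropy_norm
        return (1 + alpha) * log_p_base - alpha * log_p_masked

    # Toy usage with random logits over a 50,000-token vocabulary.
    vocab_size = 50_000
    scores = decore_next_token_scores(torch.randn(vocab_size), torch.randn(vocab_size))
    next_token = scores.argmax()

At generation time, these contrasted scores would replace the base logits at every decoding step, with greedy or sampled decoding applied to the result.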

Experimental Evaluation

The authors conduct extensive experiments across datasets requiring faithfulness and factuality. Notable improvements are highlighted in tasks such as summarisation (XSum), instruction following (MemoTrap), and open-book QA (NQ-Open and NQ-Swap). Gains of 18.6% on XSum, 10.9% on MemoTrap, 2.4% on NQ-Open, and 5.5% on NQ-Swap exemplify the method's effectiveness.

Additionally, the DeCoRe approach is examined in multi-hop reasoning tasks using Chain of Thought (CoT) prompting. Results reveal superior accuracy compared to existing techniques, showcasing DeCoRe's robust performance across various model families, including Llama3, Mistral, and Qwen2.

Implications and Future Directions

The implications of this research extend to both the theoretical understanding and the practical deployment of LLMs. By linking hallucination mechanisms to retrieval heads, DeCoRe provides a framework applicable in domains where reliability is paramount. The research also opens paths for further exploration into entropy-based dynamic adjustments and more fine-grained retrieval mechanisms in LLM architectures.

While the DeCoRe framework demonstrates significant improvements, its training-free, complementary nature suggests avenues for future enhancement. For example, integrating DeCoRe with additional uncertainty quantification methods or domain-specific fine-tuning remains an open direction for further increasing model robustness.

In conclusion, this paper contributes a compelling decoding strategy that harnesses intrinsic model components to mitigate a fundamental issue in LLMs. DeCoRe stands as a progressive step in advancing reliable and contextually faithful natural language generation.

Authors (8)
  1. Aryo Pradipta Gema (18 papers)
  2. Chen Jin (18 papers)
  3. Ahmed Abdulaal (6 papers)
  4. Tom Diethe (26 papers)
  5. Philip Teare (8 papers)
  6. Beatrice Alex (21 papers)
  7. Pasquale Minervini (88 papers)
  8. Amrutha Saseendran (5 papers)