
DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations (2410.18860v1)

Published 24 Oct 2024 in cs.CL and cs.AI

Abstract: LLMs often hallucinate, producing unfaithful or factually incorrect outputs by misrepresenting the provided context or incorrectly recalling internal knowledge. Recent studies have identified specific attention heads within the Transformer architecture, known as retrieval heads, responsible for extracting relevant contextual information. We hypothesise that masking these retrieval heads can induce hallucinations and that contrasting the outputs of the base LLM and the masked LLM can reduce hallucinations. To this end, we propose Decoding by Contrasting Retrieval Heads (DeCoRe), a novel training-free decoding strategy that amplifies information found in the context and model parameters. DeCoRe mitigates potentially hallucinated responses by dynamically contrasting the outputs of the base LLM and the masked LLM, using conditional entropy as a guide. Our extensive experiments confirm that DeCoRe significantly improves performance on tasks requiring high contextual faithfulness, such as summarisation (XSum by 18.6%), instruction following (MemoTrap by 10.9%), and open-book question answering (NQ-Open by 2.4% and NQ-Swap by 5.5%).

Overview of DeCoRe: Mitigating Hallucinations in LLMs

The paper "DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations" introduces a novel strategy aimed at addressing hallucinations in LLMs. Hallucinations, defined as unfaithful or factually incorrect outputs, pose a significant challenge in the deployment of LLMs in critical applications. This research leverages insights into retrieval heads within Transformer architectures to propose a decoding method that mitigates hallucinated generations.

Key Concepts and Methodology

The authors focus on specific attention heads known as "retrieval heads," identified by prior work as responsible for extracting relevant contextual knowledge. The hypothesis driving this research is that masking these retrieval heads induces hallucinations, so that contrasting the outputs of the base LLM with those of the masked, hallucination-prone variant improves output faithfulness. The proposed method, DeCoRe, is training-free and dynamically enhances the model's reliability at decoding time.
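
To make the head-masking step concrete, the sketch below shows one way to zero out the contribution of chosen attention heads in a single layer, operating on the per-head attention outputs before they are recombined by the output projection. This is an illustrative reconstruction rather than the authors' implementation: the function name, tensor layout, and head indices are assumptions, and in practice the retrieval heads would first be identified (e.g. via a retrieval-score analysis) and masked inside the model's attention modules at every relevant layer.

    import torch

    def mask_retrieval_heads(head_outputs: torch.Tensor, retrieval_heads: list) -> torch.Tensor:
        # head_outputs: per-head attention outputs of one layer, shaped
        # (batch, n_heads, seq_len, head_dim), taken before the heads are
        # concatenated and passed through the output projection.
        # retrieval_heads: indices of the heads to silence in this layer
        # (hypothetical values here; in DeCoRe these come from a prior
        # retrieval-head identification step).
        masked = head_outputs.clone()
        masked[:, retrieval_heads, :, :] = 0.0
        return masked

    # Toy usage: 2 sequences, 8 heads, 16 tokens, head dimension 64.
    x = torch.randn(2, 8, 16, 64)
    x_masked = mask_retrieval_heads(x, retrieval_heads=[1, 5])
    assert torch.all(x_masked[:, [1, 5]] == 0)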

Key elements of the DeCoRe methodology include:

  • Masking Retrieval Heads: By selectively masking retrieval heads, the model is intentionally made to generate hallucinations, setting a foundation for contrastive analysis.
  • Contrastive Decoding: The method contrasts the outputs of the base LLM and the hallucinating masked variant, using conditional entropy to guide this process. A dynamic scaling factor, derived from this entropy, adjusts the strength of the contrast (a sketch of this step follows the list).
  • Dynamic Conditioning: Conditional entropy serves not only to mitigate hallucinations but also to assess model uncertainty, playing a pivotal role in improving contextual adherence.
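
A minimal sketch of the contrast-and-scale step is given below. It assumes the common contrastive-decoding form (1 + α)·log p_base − α·log p_masked, with α scaled by the normalised predictive entropy of the base model so that the contrast is strongest when the base model is uncertain. The function and parameter names (decore_next_token_scores, alpha_max) are illustrative, and the paper's exact entropy-based weighting may differ in detail.

    import torch
    import torch.nn.functional as F

    def decore_next_token_scores(base_logits, masked_logits, alpha_max=1.0):
        # base_logits, masked_logits: next-token logits of shape (vocab_size,)
        # from the base model and the retrieval-head-masked model.
        log_p_base = F.log_softmax(base_logits, dim=-1)
        log_p_masked = F.log_softmax(masked_logits, dim=-1)

        # Normalised entropy of the base distribution, in [0, 1]:
        # high entropy means the base model is uncertain about the next token.
        p_base = log_p_base.exp()
        entropy = -(p_base * log_p_base).sum()
        entropy_norm = entropy / torch.log(torch.tensor(float(base_logits.numel())))

        # Dynamic contrast strength: contrast harder when uncertainty is high.
        alpha = alpha_max * entropy_norm
        return (1 + alpha) * log_p_base - alpha * log_p_masked

    # Toy usage with random logits over a 50,000-token vocabulary.
    vocab_size = 50_000
    scores = decore_next_token_scores(torch.randn(vocab_size), torch.randn(vocab_size))
    next_token = scores.argmax()

At generation time, these contrasted scores would replace the base logits at every decoding step, with greedy or sampled decoding applied to the result.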

Experimental Evaluation

The authors conduct extensive experiments across datasets requiring faithfulness and factuality. Notable improvements are highlighted in tasks such as summarisation (XSum), instruction following (MemoTrap), and open-book QA (NQ-Open and NQ-Swap). Gains of 18.6% on XSum, 10.9% on MemoTrap, 2.4% on NQ-Open, and 5.5% on NQ-Swap exemplify the method's effectiveness.

Additionally, the DeCoRe approach is examined in multi-hop reasoning tasks using Chain of Thought (CoT) prompting. Results reveal superior accuracy compared to existing techniques, showcasing DeCoRe's robust performance across various model families, including Llama3, Mistral, and Qwen2.

Implications and Future Directions

The implications of this research extend to both the theoretical understanding and the practical deployment of LLMs. By linking hallucination mechanisms to retrieval heads, DeCoRe provides a framework applicable in domains where reliability is paramount. The research also opens paths for further exploration into entropy-based dynamic adjustments and more fine-grained retrieval mechanisms in LLM architectures.

While the DeCoRe framework demonstrates significant improvements, its training-free, complementary nature suggests avenues for future enhancement. For example, integrating DeCoRe with additional uncertainty quantification methods or domain-specific fine-tuning remains an open direction for further increasing model robustness.

In conclusion, this paper contributes a compelling decoding strategy that harnesses intrinsic model components to mitigate a fundamental issue in LLMs. DeCoRe stands as a progressive step in advancing reliable and contextually faithful natural language generation.

Authors (8)
  1. Aryo Pradipta Gema (18 papers)
  2. Chen Jin (18 papers)
  3. Ahmed Abdulaal (6 papers)
  4. Tom Diethe (26 papers)
  5. Philip Teare (8 papers)
  6. Beatrice Alex (21 papers)
  7. Pasquale Minervini (88 papers)
  8. Amrutha Saseendran (5 papers)