Trusting Your Evidence: Hallucinate Less with Context-aware Decoding (2305.14739v1)

Published 24 May 2023 in cs.CL

Abstract: Language models (LMs) often struggle to pay enough attention to the input context, and generate texts that are unfaithful or contain hallucinations. To mitigate this issue, we present context-aware decoding (CAD), which follows a contrastive output distribution that amplifies the difference between the output probabilities when a model is used with and without context. Our experiments show that CAD, without additional training, significantly improves the faithfulness of different LM families, including OPT, GPT, LLaMA and FLAN-T5 for summarization tasks (e.g., 14.3% gain for LLaMA in factuality metrics). Furthermore, CAD is particularly effective in overriding a model's prior knowledge when it contradicts the provided context, leading to substantial improvements in tasks where resolving the knowledge conflict is essential.

Authors (6)
  1. Weijia Shi (55 papers)
  2. Xiaochuang Han (23 papers)
  3. Mike Lewis (78 papers)
  4. Yulia Tsvetkov (142 papers)
  5. Luke Zettlemoyer (225 papers)
  6. Scott Wen-tau Yih (5 papers)
Citations (154)

Summary

Context-aware Decoding for Mitigating LLM Hallucinations

The paper "Trusting Your Evidence: Hallucinate Less with Context-aware Decoding" introduces a novel approach termed Context-aware Decoding (CAD) aimed at enhancing the fidelity of LLMs (LMs) to provided input contexts. The methodology is devised to tackle a prevalent issue where LMs generate unfaithful content by neglecting, or hallucinating beyond, the context they are given.

Problem Statement and Motivation

Existing literature has highlighted the tendency of LMs to produce outputs containing incorrect information or hallucinations, especially in summarization tasks. The problem is amplified when the context contradicts the model's prior knowledge, which is acquired from training data that may be outdated. These limitations motivate decoding strategies that keep generated content grounded in the supplied context.

Context-aware Decoding (CAD) Approach

The CAD strategy proposed in the paper adjusts the standard autoregressive generation process by contrasting the LM's predictions with and without the context. Specifically, CAD reweights the likelihood of candidate tokens so that differences between the context-conditioned and context-free output distributions are amplified. This contrastive formulation effectively weights tokens by the pointwise mutual information (PMI) between the context and the generation, upweighting continuations that are better supported by the evidence.
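Concretely, this description corresponds to reweighting the context-conditioned distribution by a PMI-style ratio. The notation below is a paraphrase of that description (not necessarily the paper's exact symbols), with $c$ the context, $x$ the input query, $y_{<t}$ the tokens generated so far, and $\alpha$ the amplification hyperparameter:

$$
p_{\text{CAD}}(y_t \mid c, x, y_{<t}) \;\propto\; p_\theta(y_t \mid c, x, y_{<t}) \left( \frac{p_\theta(y_t \mid c, x, y_{<t})}{p_\theta(y_t \mid x, y_{<t})} \right)^{\alpha}
$$

Setting $\alpha = 0$ recovers standard decoding; larger $\alpha$ pushes the distribution further toward tokens whose probability rises when the context is supplied.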

In logit space, this amounts to amplifying the gap between the context-conditioned and context-free predictions by a factor controlled by the hyperparameter α, which demotes the influence of the LM's prior knowledge when it conflicts with the context. Importantly, the method requires no additional training and can be applied directly to off-the-shelf LMs.
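The per-step adjustment is straightforward to prototype. Below is a minimal, illustrative sketch of a single CAD decoding step using Hugging Face transformers; the model name, prompts, and α value are placeholder assumptions rather than settings from the paper, and a full decoder would repeat this step autoregressively, appending each selected token to both the with-context and without-context inputs.

```python
# Minimal sketch of one context-aware decoding (CAD) step.
# Model, prompts, and alpha are illustrative choices, not values from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM works; the paper evaluates OPT, GPT-Neo, LLaMA, FLAN-T5
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

context = "Argentina won the 2022 FIFA World Cup."   # evidence the model should trust
query = "Who won the 2022 FIFA World Cup? Answer:"
alpha = 0.5  # amplification strength; alpha = 0 recovers standard decoding

with torch.no_grad():
    # Next-token logits WITH the context prepended.
    with_ctx = tokenizer(context + " " + query, return_tensors="pt")
    logits_with = model(**with_ctx).logits[:, -1, :]

    # Next-token logits WITHOUT the context.
    no_ctx = tokenizer(query, return_tensors="pt")
    logits_without = model(**no_ctx).logits[:, -1, :]

    # CAD: amplify the difference between context-conditioned and
    # context-free predictions before selecting the next token.
    cad_logits = (1 + alpha) * logits_with - alpha * logits_without
    next_token_id = cad_logits.argmax(dim=-1)

print(tokenizer.decode(next_token_id))
```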

Experimental Setup

The authors validated CAD using various LMs, including OPT, GPT-Neo, LLaMA, and FLAN-T5, across tasks that require strict context adherence, such as summarization and knowledge conflict resolution. The paper utilizes two datasets, CNN-DM and XSUM, to evaluate summarization, while MemoTrap and NQ-Swap datasets assess knowledge conflict resolution.

The evaluation employs standard metrics like ROUGE-L and BERT-Precision for summarization quality and factual consistency, and Exact Match (EM) for knowledge conflict tasks.
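For orientation, here is a small, hypothetical illustration of how two of these metrics can be computed; the texts are invented, and the `rouge-score` package is one common choice rather than necessarily the paper's exact evaluation code.

```python
# Illustrative metric computation; example texts are made up.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
reference = "The company reported record profits in the third quarter."
generated = "Record profits were reported by the company in the third quarter."
print(scorer.score(reference, generated)["rougeL"].fmeasure)

# Exact Match for the knowledge-conflict tasks reduces to a normalized string comparison.
def exact_match(prediction: str, gold: str) -> bool:
    return prediction.strip().lower() == gold.strip().lower()

print(exact_match(" Argentina ", "argentina"))  # True
```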

Results and Implications

Empirical results underscore CAD's effectiveness, demonstrating substantial improvements in both the fidelity and factuality of LM outputs. For instance, application to LLaMA-30B on the CNN-DM dataset yields a 21% increase in ROUGE-L score and a 14.3% gain in factual consistency metrics. Similarly, CAD shows significant enhancements in conflicting knowledge tasks, with a 2.9x improvement on LLaMA-30B when dealing with contradictory information.

The analysis indicates that CAD is particularly effective for larger models, whose prior knowledge exerts a stronger pull because of their more extensive training data. This supports the utility of CAD for keeping model outputs aligned with the supplied context, especially in domains that require up-to-date information.

Future Directions

The implications of CAD extend to the potential for improving the reliability of LMs in real-time applications where dynamic and contextually accurate information is critical—such as in news summarization or automated customer service. Future research may explore adaptive hyperparameter tuning (α) for different model architectures and contexts or integrate CAD with retrieval-augmented models to further enhance faithfulness to external knowledge sources.

Conclusion

This work advances an essential decoding innovation aimed at mitigating hallucinations in LLM outputs by reinforcing model attention toward contextually grounded information. Through CAD, practitioners can expect robust improvements in the alignment of LMs with the provided context, thereby enhancing the accuracy and dependability of automated text generation systems.