Lookback Lens: Detecting and Mitigating Contextual Hallucinations in LLMs Using Only Attention Maps
Introduction and Key Contributions
LLMs have shown remarkable proficiency in generating coherent, context-aware text. However, a prevalent issue remains: contextual hallucinations, where the model generates content that is not supported by the provided context. The paper "Lookback Lens: Detecting and Mitigating Contextual Hallucinations in LLMs Using Only Attention Maps" addresses this problem with a method that uses only the model's attention maps to detect and mitigate such hallucinations.
Methodology
The essence of the proposed approach is the Lookback Lens, a model designed to detect contextual hallucinations based on the attention patterns of LLMs. The authors introduce the concept of the "lookback ratio," defined as the ratio of attention weights allocated to context tokens versus newly generated tokens. This metric is calculated for each attention head at every layer of the transformer model.
- Lookback Ratio Computation: For each head h in layer l at time step t, the lookback ratio is the average attention weight placed on the context tokens divided by the sum of the average attention on the context tokens and the average attention on the newly generated tokens, i.e., LR(l, h, t) = A_ctx / (A_ctx + A_new). This quantifies how strongly the head is "looking back" at the context rather than at the model's own output; a minimal sketch of the computation appears after this list.
- Linear Classifier: A logistic regression classifier is trained on these lookback ratios, aggregated over spans of generated text, to classify each span as either factual or hallucinated. This classifier, termed the Lookback Lens, performs comparably to more complex detection systems that rely on hidden states or entailment models.
- Generalization Across Tasks and Models: One of the significant findings is the robustness and transferability of the Lookback Lens detector. It exhibits strong performance even when transferred from one task to another (e.g., from summarization to QA) or across different model sizes (e.g., from a 7B parameter model to a 13B parameter model).
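Below is a minimal, illustrative sketch of the two steps above, assuming the attention maps have already been extracted from the model (e.g., by running generation with attention outputs enabled). The helper names, toy tensor shapes, and the exact handling of the current token in the "new tokens" average are assumptions made for illustration, not the authors' released implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def lookback_ratio_features(attn, context_len):
    """Per-step lookback ratios for every (layer, head) pair.

    attn: array of shape (num_layers, num_heads, gen_len, context_len + gen_len),
          where attn[l, h, t] holds the attention of the t-th generated token over
          the context tokens followed by the generated tokens.
    Returns: (gen_len, num_layers * num_heads) feature matrix.
    """
    num_layers, num_heads, gen_len, _ = attn.shape
    feats = np.empty((gen_len, num_layers * num_heads))
    for t in range(gen_len):
        ctx = attn[:, :, t, :context_len].mean(axis=-1)                     # mean attention on context tokens
        new = attn[:, :, t, context_len:context_len + t + 1].mean(axis=-1)  # mean attention on tokens generated so far
        feats[t] = (ctx / (ctx + new + 1e-9)).reshape(-1)                   # lookback ratio per (layer, head)
    return feats

def span_features(step_feats, spans):
    """Average the per-step ratios over each annotated (start, end) span."""
    return np.stack([step_feats[s:e].mean(axis=0) for s, e in spans])

# Toy usage with random stand-ins for real attention maps (causal masking is ignored here):
rng = np.random.default_rng(0)
attn = rng.dirichlet(np.ones(48), size=(32, 32, 16))     # 32 layers, 32 heads, 16 generated tokens, 32 context tokens
steps = lookback_ratio_features(attn, context_len=32)
X = span_features(steps, spans=[(0, 4), (4, 8), (8, 12), (12, 16)])
y = np.array([1, 0, 1, 0])                               # toy labels: 1 = factual, 0 = hallucinated
lookback_lens = LogisticRegression(max_iter=1000).fit(X, y)
print(lookback_lens.predict_proba(X)[:, 1])              # per-span probability of being factual
```

In practice, attn would come from the LLM's attention weights during decoding, and the span boundaries and labels from annotated training examples.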
Experimental Setup and Results
The experiments are primarily conducted on summarization (CNN/DM, XSum), QA (Natural Questions), and multi-turn conversation (MT-Bench). Key highlights include:
- Detection Performance:
The Lookback Lens achieves strong AUROC scores for detecting contextual hallucinations. It surpasses hidden-state-based detectors and is on par with large-scale text entailment models, despite being trained on a far smaller dataset (~1k examples versus ~731k).
- Mitigation Through Guided Decoding:
The authors integrate the Lookback Lens into a guided decoding strategy: during generation, multiple candidate chunks are sampled, and the chunk the Lookback Lens scores as most factual is committed before decoding continues (a minimal sketch of this loop appears after this list). This method reduces hallucinations effectively:
- In the XSum summarization task, the approach reduces hallucinated summaries by 9.6%.
- In the Natural Questions task, there is a noticeable improvement in factual answer generation.
- Cross-Model Transfer:
The Lookback Lens also demonstrates promising results when transferred to larger model architectures without retraining. For instance, a detector trained on LLaMA-2-7B-Chat reduces hallucinations in LLaMA-2-13B-Chat by 3.2% in the XSum task.
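To make the guided-decoding loop concrete, here is a minimal sketch of Lookback-Lens-guided chunk selection under stated assumptions: sample_chunk is a hypothetical stand-in for sampling one candidate continuation from the LLM and returning its averaged lookback-ratio features, the stub below replaces the model entirely, and the chunk and candidate counts are illustrative rather than the paper's settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def lookback_guided_decode(sample_chunk, lens, num_chunks=4, num_candidates=8):
    """Greedy chunk-level decoding guided by a fitted Lookback Lens classifier.

    sample_chunk(prefix) -> (chunk_text, span_feature): returns one candidate
    continuation of `prefix` plus its averaged lookback-ratio feature vector.
    """
    prefix = ""
    for _ in range(num_chunks):
        candidates = [sample_chunk(prefix) for _ in range(num_candidates)]
        feats = np.stack([feat for _, feat in candidates])
        scores = lens.predict_proba(feats)[:, 1]           # probability each candidate span is factual
        best_text, _ = candidates[int(np.argmax(scores))]
        prefix += best_text                                # commit the highest-scoring chunk and continue
    return prefix

# Toy usage: a lens fitted on random features and a stub sampler stand in for the real model.
rng = np.random.default_rng(1)
lens = LogisticRegression(max_iter=1000).fit(rng.normal(size=(20, 8)), np.tile([0, 1], 10))
stub = lambda prefix: (f"[chunk {len(prefix)}] ", rng.normal(size=8))
print(lookback_guided_decode(stub, lens, num_chunks=3, num_candidates=4))
```

The key design choice mirrored here is that candidates compete only within each chunk: the highest-scoring chunk is appended to the running output, and the next batch of candidates is conditioned on that committed prefix.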
Discussion
The paper provides a nuanced understanding of how attention maps can be harnessed to enhance the reliability of LLMs. By focusing on lookback ratios, a relatively simple feature derived from attention weights, the authors present a potent tool that is both interpretable and computationally efficient.
Limitations:
The paper acknowledges certain limitations, including:
- Inference time increases because multiple candidate chunks must be sampled and scored at each step of the guided decoding.
- The detector requires annotated examples of factual and hallucinated spans for training, which may not always be available.
Future Prospects:
Future research could explore real-time applications of the Lookback Lens in interactive AI systems, ways to reduce the decoding-time computational overhead, and integration with other detection mechanisms to handle more complex hallucinations.
Conclusion
This paper presents an effective method for detecting and mitigating contextual hallucinations in LLMs through the innovative use of attention maps. The Lookback Lens showcases the potential to improve the fidelity of LLM outputs across different tasks and models, marking a significant step towards more reliable AI-generated content.