Knowledge Overshadowing Causes Amalgamated Hallucination in Large Language Models (2407.08039v1)

Published 10 Jul 2024 in cs.CL

Abstract: Hallucination is often regarded as a major impediment for using LLMs, especially for knowledge-intensive tasks. Even when the training corpus consists solely of true statements, LLMs still generate hallucinations in the form of amalgamations of multiple facts. We coin this phenomenon as "knowledge overshadowing": when we query knowledge from a LLM with multiple conditions, some conditions overshadow others, leading to hallucinated outputs. This phenomenon partially stems from training data imbalance, which we verify on both pretrained models and fine-tuned models, over a wide range of LM model families and sizes. From a theoretical point of view, knowledge overshadowing can be interpreted as over-generalization of the dominant conditions (patterns). We show that the hallucination rate grows with both the imbalance ratio (between the popular and unpopular condition) and the length of dominant condition description, consistent with our derived generalization bound. Finally, we propose to utilize overshadowing conditions as a signal to catch hallucination before it is produced, along with a training-free self-contrastive decoding method to alleviate hallucination during inference. Our proposed approach showcases up to 82% F1 for hallucination anticipation and 11.2% to 39.4% hallucination control, with different models and datasets.

Knowledge Overshadowing and Hallucination in LLMs

The paper presents a focused examination of a phenomenon termed "knowledge overshadowing" and its relationship with hallucinations in LLMs. Hallucinations refer to the generation of incorrect or unfounded information, which is particularly problematic in knowledge-intensive tasks. Even when trained solely on accurate statements, LLMs can still produce misleading outputs when queried under multiple conditions.

Core Findings and Methodology

The paper identifies that when LLMs are queried under multiple concurrent conditions, dominant conditions can overshadow less prevalent ones. The result is amalgamated hallucination: the model's output merges facts without properly accounting for all of the given conditions. This overshadowing effect is attributed to imbalance in the training data, which causes certain conditions to be prioritized over others. The paper investigates the phenomenon across a wide range of model families and sizes, using both pretrained and fine-tuned models.
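
To make the failure mode concrete, the following minimal sketch probes a causal language model with a query that pairs a common association with a rarer qualifying condition; under overshadowing, the completion tends to reflect the dominant association alone. The model choice, prompt, and expected behavior are illustrative assumptions, not the paper's evaluation setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM; a small model is used purely for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# Two conditions: "capital" (the query) and "state of New York" (the rarer qualifier).
# Overshadowing would surface the dominant association (New York -> New York City)
# instead of the jointly correct answer (Albany).
prompt = "The capital of the U.S. state of New York is the city of"

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        **inputs, max_new_tokens=5, do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
completion = tokenizer.decode(
    output_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(prompt, "->", completion.strip())
```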

The researchers quantify how the hallucination rate correlates with the imbalance ratio between popular and less popular conditions, finding that the rate increases with both the imbalance ratio and the length of the dominant condition's description. Their theoretical analysis derives a generalization bound for auto-regressive language modeling that is consistent with these empirical observations.
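
The two quantities this analysis relates can be computed straightforwardly; the sketch below shows one way to do so, where the function names, data layout, and matching heuristic are illustrative assumptions rather than the paper's evaluation code.

```python
from collections import Counter

def imbalance_ratio(condition_labels):
    """condition_labels: one condition label per training statement."""
    counts = Counter(condition_labels)
    return max(counts.values()) / min(counts.values())

def hallucination_rate(predictions, joint_answers, dominant_answers):
    """Fraction of multi-condition queries answered with the dominant condition's
    fact instead of the jointly correct one (an 'amalgamated' hallucination)."""
    hallucinated = sum(
        1 for pred, gold, dom in zip(predictions, joint_answers, dominant_answers)
        if dom.lower() in pred.lower() and gold.lower() not in pred.lower()
    )
    return hallucinated / len(predictions)

# Toy usage: a 100:1 corpus and one hallucinated prediction.
print(imbalance_ratio(["in 2019"] * 100 + ["in 2020"]))                       # 100.0
print(hallucination_rate(["New York City"], ["Albany"], ["New York City"]))   # 1.0
```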

Experimentation and Results

The team conducted extensive experiments spanning several condition types, including time-event and location-event relations, gender pronoun resolution, and negation queries. Their results show that the hallucination rate rises consistently with the imbalance ratio (tested up to 100:1) and confirm that knowledge overshadowing is widespread.
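
A controlled corpus of this kind can be synthesized by repeating statements that carry the popular condition; the sketch below illustrates the idea for a time-event relation, with templates, entities, and ratios chosen as assumptions for illustration rather than taken from the paper's data.

```python
import random

def build_corpus(popular, rare, ratio,
                 template="{cond}, the concert was held in {ans}."):
    """Mix statements so the popular condition appears `ratio` times as often as the rare one."""
    pop_cond, pop_ans = popular
    rare_cond, rare_ans = rare
    corpus = [template.format(cond=pop_cond, ans=pop_ans)] * ratio
    corpus += [template.format(cond=rare_cond, ans=rare_ans)]
    random.shuffle(corpus)
    return corpus

# Imbalance ratios in the range explored above, for a synthetic time-event relation.
for ratio in (1, 10, 50, 100):
    corpus = build_corpus(("In 2019", "Berlin"), ("In 2020", "Oslo"), ratio)
    print(f"ratio {ratio:>3}:1 -> {len(corpus)} statements")
```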

The proposed training-free self-contrastive decoding method demonstrates notable improvements, reducing hallucinations by 11.2% to 39.4% across different models and datasets. The approach leverages overshadowing conditions as a signal, allowing hallucinations to be anticipated before they are produced with up to 82% F1.
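
The sketch below illustrates the general idea as a generic self-contrastive decoding step: next-token scores from the full multi-condition query are contrasted against scores the same model assigns given only the dominant condition, down-weighting tokens driven purely by the dominant pattern. The contrast formula, the alpha weight, the prompts, and the model choice are assumptions for illustration, not the paper's exact algorithm.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative choice; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def contrastive_next_token(full_prompt, dominant_only_prompt, alpha=1.0):
    """Greedy next token under log p(t | full) - alpha * log p(t | dominant only)."""
    with torch.no_grad():
        full_ids = tokenizer(full_prompt, return_tensors="pt").input_ids
        dom_ids = tokenizer(dominant_only_prompt, return_tensors="pt").input_ids
        full_logprobs = model(full_ids).logits[0, -1].log_softmax(dim=-1)
        dom_logprobs = model(dom_ids).logits[0, -1].log_softmax(dim=-1)
    # Tokens favored mainly by the dominant pattern are down-weighted by the contrast.
    scores = full_logprobs - alpha * dom_logprobs
    return tokenizer.decode([int(scores.argmax())])

# The rarer condition ("state of") should survive the contrast against the
# dominant "New York -> New York City" association.
print(contrastive_next_token(
    full_prompt="The capital of the state of New York is",
    dominant_only_prompt="New York is",
))
```

Practical contrastive-decoding schemes usually also restrict the contrast to tokens that are already plausible under the full prompt; that refinement is omitted here for brevity.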

Implications and Future Directions

The research offers important insights into how LLMs handle multi-condition queries and identifies data imbalance as a critical factor behind hallucination. This understanding underscores the need for more balanced training datasets to counteract overshadowing effects, potentially improving the accuracy and trustworthiness of LLM outputs.

The paper opens avenues for further refinement of inference-time interventions and model architectures that can inherently manage condition imbalances. Future work could extend these concepts to larger, more complex datasets and model classes, testing the universality and scalability of proposed solutions across diverse AI applications.

In conclusion, the investigation not only deepens our comprehension of the intrinsic hallucination issues in LLMs but also provides practical methodologies for improving the models' fidelity by mitigating knowledge overshadowing effects.

Authors (8)
  1. Yuji Zhang (14 papers)
  2. Sha Li (42 papers)
  3. Jiateng Liu (13 papers)
  4. Pengfei Yu (20 papers)
  5. Yi R. Fung (31 papers)
  6. Jing Li (621 papers)
  7. Manling Li (47 papers)
  8. Heng Ji (266 papers)
Citations (4)