Shadows in the Attention: Contextual Perturbation and Representation Drift in the Dynamics of Hallucination in LLMs
The paper "Shadows in the Attention: Contextual Perturbation and Representation Drift in the Dynamics of Hallucination in LLMs" by Zeyu Wei et al. presents a comprehensive investigation into the mechanics of hallucination in LLMs. Hallucinations, defined as plausible yet erroneous outputs, pose a significant challenge to the reliability of LLM deployments, especially in domains that demand strict factual accuracy. The research traces how gradual context injection drives internal representation drift and how that drift, in turn, gives rise to hallucinated outputs.
The methodology subjects six open-source LLMs to a "titration" process built on the TruthfulQA dataset. For each question, two types of context snippet (relevant and misleading) are incrementally injected over 16 rounds. Hallucination propensity is tracked with dedicated detection metrics, while internal-state drift is measured via cosine similarity, entropy, Jensen-Shannon (JS) divergence, and Spearman rank correlation. The results show hallucination frequency rising monotonically before plateauing after several rounds of context injection. A pivotal finding is an "attention-locking" threshold, identified by the convergence of JS-Drift (~0.69) and Spearman-Drift (~0), beyond which hallucinations become self-sustaining and resistant to correction.
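As a concrete illustration of these drift measures, the sketch below computes per-round drift statistics against a round-0 baseline. The summary does not specify which hidden states or attention distributions the authors compare, so the inputs (a pooled hidden-state vector and a probability distribution per round) and all function names here are assumptions rather than the paper's implementation.

```python
import numpy as np
from scipy.stats import entropy, spearmanr

def js_divergence(p, q):
    """Jensen-Shannon divergence in nats (bounded above by ln 2 ≈ 0.693)."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    m = 0.5 * (p + q)
    # scipy.stats.entropy(p, m) is the KL divergence D_KL(p || m), natural log.
    return 0.5 * entropy(p, m) + 0.5 * entropy(q, m)

def drift_metrics(h_base, h_round, p_base, p_round):
    """Drift of one titration round relative to the round-0 baseline.

    h_base, h_round : pooled hidden-state vectors (e.g. a final-layer mean)
    p_base, p_round : probability distributions of equal length
                      (e.g. attention weights or next-token probabilities)
    """
    cos = np.dot(h_base, h_round) / (np.linalg.norm(h_base) * np.linalg.norm(h_round))
    rho, _ = spearmanr(p_base, p_round)      # rank agreement between distributions
    return {
        "cosine_drift": 1.0 - cos,           # 0 = identical direction
        "entropy": entropy(p_round),         # dispersion of the current distribution
        "js_drift": js_divergence(p_base, p_round),
        "spearman_drift": rho,
    }
```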
Several notable claims and observations arise from this paper:
- Contextual Influence: Relevant contexts drive deeper semantic assimilation, producing highly self-consistent hallucinations. Irrelevant contexts, by contrast, induce errors through attention re-routing, which propagates topic drift.
- Internal Representation Drift: Larger models exhibit greater context sensitivity, yielding higher hallucination rates under irrelevant contexts. This suggests that susceptibility to contextual perturbation scales with model capacity.
- Error Mode Dynamics: The paper highlights a compensatory mechanism between semantic assimilation and attention diffusion that modulates hallucination generation. Larger models tend to produce high-confidence yet erroneous outputs because they assimilate context more effectively.
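The per-round statistics sketched earlier also make the reported attention-locking criterion easy to operationalize. JS divergence computed with the natural logarithm is bounded above by ln 2 ≈ 0.693, so a JS-Drift of ~0.69 indicates near-maximal divergence from the baseline, while a Spearman-Drift near 0 indicates that the rank structure of the distribution has been lost. The detector below is a minimal sketch assuming those two cutoffs; the tolerance on the Spearman term and the helper names are assumptions, not the authors' code.

```python
def attention_locked(js_drift, spearman_drift, js_thresh=0.69, rho_tol=0.05):
    """True when JS-Drift sits near its ln(2) ceiling and rank correlation has collapsed."""
    return js_drift >= js_thresh and abs(spearman_drift) <= rho_tol

def first_locked_round(trajectory):
    """First titration round whose metrics cross the attention-locking threshold.

    `trajectory` is a sequence of per-round dicts as returned by drift_metrics().
    Returns the round index, or None if locking never occurs within the run.
    """
    for i, metrics in enumerate(trajectory):
        if attention_locked(metrics["js_drift"], metrics["spearman_drift"]):
            return i
    return None
```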
Empirically, the research offers foundational insight into predicting hallucination risk from the dynamics of representation shifts. The implications extend to context-aware mitigation strategies and LLM architectures that account for these tendencies while improving error detection.
Future work may focus on optimizing attention mechanisms to curb these erroneous generation patterns and on architectural modifications that strengthen robustness against misleading contexts. The findings could also inform real-time adaptation mechanisms for deployed LLMs, helping ensure reliable outputs in critical and sensitive applications.
This paper makes a significant contribution to the understanding of LLM hallucination dynamics, providing a detailed empirical basis for further work on model reliability and oversight in real-world applications.