Mechanistic Analysis of Contextual Entrainment and Distraction in LLMs
The paper by Jingcheng Niu et al. presents a detailed exploration of contextual entrainment in large language models (LMs): the phenomenon whereby models assign disproportionately high weight to tokens that have previously appeared in the context prompt, regardless of their semantic relevance. Observed across a range of LMs and configurations, this behavior points to an underlying mechanistic tendency that can lead to distraction, where irrelevant contextual information skews model outputs toward incorrect conclusions.
Contextual Entrainment Phenomenon
Contextual entrainment rests on the observation that LMs show a marked inclination to reproduce tokens that have already appeared in the given context. This tendency holds even when those tokens are randomly chosen and semantically unrelated to the query. The paper demonstrates this through systematic experiments across different LM architectures, including GPT-2 XL and LLaMA variants. The results consistently show that LMs, regardless of scale or sophistication, exhibit this bias: they elevate the probabilities of tokens that appear in the prompt's context even when those tokens contribute nothing to answering the query.
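The effect can be probed directly by comparing the probability a model assigns to a token with and without that token appearing earlier in the prompt. The following is a minimal sketch of such a probe, not the authors' evaluation protocol; the model choice, prompts, and distractor token are illustrative assumptions.

```python
# Illustrative probe: does a token's probability rise merely because it
# appeared earlier in the context? (Not the paper's exact setup.)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2-xl"  # one of the architectures studied; any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def next_token_prob(prompt: str, target: str) -> float:
    """Probability the model assigns to `target` as the next token after `prompt`."""
    input_ids = tok(prompt, return_tensors="pt").input_ids
    target_id = tok(" " + target.strip(), add_special_tokens=False).input_ids[0]
    with torch.no_grad():
        logits = model(input_ids).logits[0, -1]
    return torch.softmax(logits, dim=-1)[target_id].item()

query = "The capital of France is"
distractor = "Berlin"  # irrelevant to the query

p_plain = next_token_prob(query, distractor)
p_entrained = next_token_prob(f"I visited {distractor} last summer. {query}", distractor)
print(f"P({distractor}) without distractor in context: {p_plain:.6f}")
print(f"P({distractor}) with distractor in context:    {p_entrained:.6f}")
# Contextual entrainment predicts p_entrained > p_plain even though the
# distractor does not help answer the query.
```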
Semantic Influence and Counterfactual Prompts
While the core of contextual entrainment is mechanistic, the paper also finds that semantic factors modulate its intensity. Counterfactual prompts containing misinformation induce stronger distortions than random distractors: LMs mechanistically latch onto contextual tokens, and semantic discrepancies with the query amplify the effect. This highlights a concrete misinformation risk, since an LM may reinforce inaccurate statements simply because they appear in its context, facilitating the propagation of disinformation.
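Reusing the next_token_prob helper sketched above, one could contrast a random distractor with a counterfactual one; the prompts below are illustrative assumptions, not the paper's stimuli.

```python
# Compare a random distractor against a counterfactual (misinformation) distractor.
query = "The capital of France is"
conditions = {
    "no distractor": query,
    "random distractor": f"I saw the word Berlin in a crossword puzzle. {query}",
    "counterfactual distractor": f"The capital of France is Berlin. {query}",
}
for name, prompt in conditions.items():
    print(f"{name:>25}: P(Berlin) = {next_token_prob(prompt, 'Berlin'):.6f}")
# The paper's finding predicts both distractor conditions raise P(Berlin),
# with the counterfactual prompt producing the larger shift.
```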
Identification of Entrainment Heads
Niu et al. postulate that a small subset of attention heads, which they term "entrainment heads," is primarily responsible for contextual entrainment. Using a differentiable masking approach, they identify these heads and show that switching them off substantially reduces contextual entrainment without severely degrading the model's other capabilities. This analysis sharpens our understanding of how specific model components give rise to, or can suppress, distracted inference, opening avenues for refining context sensitivity in LLMs.
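The general idea behind differentiable head masking can be sketched as follows, assuming a GPT-2-style model whose Hugging Face forward pass accepts a head_mask argument; the objective, prompts, and regularization weight here are illustrative assumptions, not the paper's exact method.

```python
# Simplified sketch of differentiable attention-head masking.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2-xl")          # use "gpt2" for a lighter demo
model = AutoModelForCausalLM.from_pretrained("gpt2-xl")
model.eval()
for p in model.parameters():
    p.requires_grad_(False)                             # only the mask is trained

n_layers, n_heads = model.config.n_layer, model.config.n_head
mask_logits = torch.nn.Parameter(torch.full((n_layers, n_heads), 3.0))  # sigmoid(3) ~ 0.95: heads start "on"
opt = torch.optim.Adam([mask_logits], lr=0.05)

prompt = "I visited Berlin last summer. The capital of France is"
distractor_id = tok(" Berlin", add_special_tokens=False).input_ids[0]
answer_id = tok(" Paris", add_special_tokens=False).input_ids[0]
input_ids = tok(prompt, return_tensors="pt").input_ids

for step in range(100):
    head_mask = torch.sigmoid(mask_logits)              # soft mask in (0, 1) per head
    logits = model(input_ids, head_mask=head_mask).logits[0, -1]
    logp = torch.log_softmax(logits, dim=-1)
    # Suppress the entrained distractor, keep the correct answer likely,
    # and prefer masking as few heads as possible (sparsity term).
    loss = logp[distractor_id] - logp[answer_id] + 0.05 * (1 - head_mask).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

entrainment_heads = (torch.sigmoid(mask_logits) < 0.5).nonzero().tolist()
print("Candidate entrainment heads (layer, head):", entrainment_heads)
```

Heads whose learned mask falls toward zero are candidates for the "entrainment heads" role; setting their mask to zero at inference time is the hard-ablation analogue of the paper's "turning them off" experiment.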
Theoretical and Practical Implications
The implications of this research are manifold. Theoretically, framing contextual entrainment as a mechanistic yet semantically modulated phenomenon helps reconcile the inductive biases of neural networks with their emergent information-processing capabilities. It further suggests that in-context learning and sequence copying may not be sharply distinct phenomena but rather overlap with contextual entrainment, with 'induction'-style reuse of context tokens serving as a common thread linking these behaviors in LLMs.
Practically, the identification of entrainment heads offers a tangible target for improving robustness to distracting context in real-world applications. Retrieval-augmented generation (RAG) systems, for instance, can use these insights to tailor retrieval and prompting strategies that counteract entrainment, thereby improving resilience to irrelevant or misleading retrieved passages.
Future Research Directions
This work suggests several pathways for future research. Enhancing LLM architectures to detect and mitigate entrainment on their own could bolster robustness against misinformation. Further exploration of interaction effects between attention heads might yield training regimes that reduce reliance on irrelevant context. The work may also inspire novel architectures with feedback loops that dynamically adjust focus, promoting a finer balance between inductive and deductive reasoning processes within AI systems.
In conclusion, this inquiry into LLM distraction through contextual entrainment not only elucidates inherent biases in token prioritization but also lays the groundwork for refined interpretability and robustness in AI language processing. As LMs increasingly underpin complex information systems, such insights are indispensable for designing future models that are not just powerful but also contextually discerning and reliable.