Unable to Forget: Proactive Interference Reveals Working Memory Limits in LLMs Beyond Context Length (2506.08184v2)

Published 9 Jun 2025 in cs.CL, cs.AI, and q-bio.NC

Abstract: Information retrieval in LLMs is increasingly recognized as intertwined with generation capabilities rather than mere lookup. While longer contexts are often assumed to improve retrieval, the effects of intra-context interference remain understudied. To address this, we adapt the proactive interference (PI) paradigm from cognitive science, where earlier information disrupts recall of newer updates. In humans, susceptibility to such interference is inversely linked to working memory capacity. We introduce PI-LLM, an evaluation that sequentially streams semantically related key-value updates and queries only the final values. Although these final values are clearly positioned just before the query, LLM retrieval accuracy declines log-linearly toward zero as interference accumulates; errors arise from retrieving previously overwritten values. Attempts to mitigate interference via prompt engineering (e.g., instructing models to ignore earlier input) yield limited success. These findings reveal a fundamental constraint on LLMs' ability to disentangle interference and flexibly manipulate information, suggesting a working memory bottleneck beyond mere context access. This calls for approaches that strengthen models' ability to suppress irrelevant content during retrieval.

Summary

  • The paper introduces the PI-LLM evaluation paradigm to show that proactive interference, not context length, chiefly limits LLM memory.
  • It reveals a log-linear decline in retrieval accuracy as accumulated interference increases across models from small scales to 600B parameters.
  • The study proposes the Interference Endurance Score and suggests new architectural strategies to enhance LLM memory management.

Limitations in Working Memory Capacity of LLMs: Insights from Proactive Interference

This paper presents a detailed investigation into the constraints of working memory in LLMs, focusing on proactive interference (PI) as a fundamental barrier to reliable information retrieval. The work adapts paradigms from cognitive science and shows that interference degrades LLM performance beyond the limitations imposed by input context length alone.

Key Findings and Analytical Methodology

The authors introduce the PI-LLM evaluation paradigm, which streams sequences of semantically related key-value updates and then queries only the final value of each key. Earlier, now-overwritten updates act as interference, mirroring proactive interference in human cognition, where older information disrupts recall of newer material. Even though each key's final value is clearly positioned just before the query, retrieval accuracy declines log-linearly as interference accumulates, a pattern observed consistently across all tested models; errors arise chiefly from retrieving previously overwritten values.
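
As an illustration, the setup can be sketched as follows. The key names, value ranges, and prompt wording here are hypothetical stand-ins, not the paper's exact stimuli:

```python
import random

def build_pi_prompt(keys, n_updates, seed=0):
    """Interleave n_updates value assignments per key, then query only the
    final value of each key. The last assignment to every key lands in the
    final round, just before the query, as in the PI-LLM setup."""
    rng = random.Random(seed)
    lines, final = [], {}
    for _ in range(n_updates):               # each round updates every key once
        for key in keys:
            value = rng.randint(100, 999)
            lines.append(f"The current value of {key} is {value}.")
            final[key] = value               # later rounds overwrite earlier ones
    query = ("What is the most recent value of each key? "
             "Answer with one 'key = value' line per key.")
    prompt = "\n".join(lines) + "\n\n" + query
    return prompt, final                     # `final` is the ground truth for scoring

prompt, answers = build_pi_prompt(["apple", "brick", "cloud"], n_updates=10)
```

Scoring a model then reduces to comparing its answers against `final`; every mismatch that names an earlier, overwritten value is an interference error.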

The core outcome of the paper is that interference, rather than context length, serves as an independent determinant of retrieval effectiveness. This is corroborated by experiments that fix input length while varying interference magnitude, demonstrating that interference limits LLM capacity in its own right. Models tested range from small open models to architectures exceeding 600 billion parameters, yet the interference bottleneck persists, reinforcing the hypothesis that the limitation is architecturally ingrained.
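
The length-versus-interference dissociation can be made concrete with a sketch like the one below, reusing the `build_pi_prompt` helper from the earlier sketch. It holds the total number of update lines (and hence, roughly, prompt length) fixed while varying how many times each key is overwritten; the specific sizes are illustrative, not the paper's:

```python
# Fixed budget of update lines; 320 divides evenly by every condition below.
TOTAL_UPDATES = 320

conditions = []
for n_updates in (1, 2, 4, 8, 16, 32):
    n_keys = TOTAL_UPDATES // n_updates      # fewer keys, more overwrites per key
    keys = [f"key{i}" for i in range(n_keys)]
    prompt, answers = build_pi_prompt(keys, n_updates)
    conditions.append((n_updates, prompt, answers))
# Accuracy on `answers` can now be compared across conditions whose prompts
# have equal update counts but very different interference levels.
```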

Implications for Model Design and Cognitive Resilience

The paper introduces the Interference Endurance Score (IES), a metric quantifying models' resistance to proactive interference. Under this metric, larger models exhibit better memory handling than context window size alone would predict. Interestingly, some architectural designs, such as Mixture-of-Experts (MoE), mitigate interference less effectively than comparable dense models, suggesting the need for deeper representation strategies within model architectures.
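
The paper's exact IES formula is not reproduced on this page; given the reported log-linear decline, one natural instantiation is the slope of accuracy against log interference, sketched here with hypothetical numbers:

```python
import numpy as np

def interference_endurance(n_updates, accuracy):
    """Illustrative endurance score (not necessarily the paper's exact IES):
    the fitted slope of accuracy against log2(updates per key). Under the
    observed log-linear decline the slope is negative; values closer to zero
    mean accuracy decays more slowly, i.e. greater endurance."""
    x = np.log2(np.asarray(n_updates, dtype=float))
    y = np.asarray(accuracy, dtype=float)
    slope, _intercept = np.polyfit(x, y, deg=1)
    return slope

# Hypothetical measurements: retrieval accuracy versus updates per key.
score = interference_endurance([1, 2, 4, 8, 16, 32],
                               [0.99, 0.92, 0.81, 0.66, 0.48, 0.30])
```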

Additionally, attempts to elicit strategic forgetting through natural language prompts (explicit instructions for models to disregard earlier updates) yield only marginal improvements. This reveals a critical gap in LLMs' ability to modulate memory representations through user-directed cues, pointing to the absence of effective executive control mechanisms akin to those in human cognition.
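A mitigation of this kind simply prepends an instruction to the update stream, along these lines (the wording is illustrative, not the paper's exact prompt):

```python
IGNORE_INSTRUCTION = (
    "Important: the keys below are updated many times. Disregard every "
    "earlier value and report only the most recent value for each key."
)

prompt, answers = build_pi_prompt(["apple", "brick", "cloud"], n_updates=32)
mitigated_prompt = IGNORE_INSTRUCTION + "\n\n" + prompt
# Per the paper, instructions of this kind yield only marginal accuracy gains.
```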

Proposed Directions for Enhancing LLM Memory Mechanisms

The findings emphasize the necessity of developing model architectures that can emulate human-like memory functionalities, particularly gating mechanisms to selectively suppress or prioritize content during retrieval. Incorporating such features might advance LLMs' capability to manage interference dynamically, thereby improving their suitability for tasks involving extensive and repeated data updates.
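
One way to picture the target behavior is an explicit store with last-write-wins (gating) semantics, in which each update fully suppresses the previous value for the same key. The toy class below only illustrates the desired input-output behavior; a learned, differentiable gate inside a transformer is a much harder, and here entirely speculative, design problem:

```python
class GatedKVMemory:
    """Toy external memory with overwrite semantics: exactly the behavior
    the tested LLMs fail to emulate when every stale update remains
    visible in the attention window."""

    def __init__(self):
        self._store = {}

    def update(self, key, value):
        self._store[key] = value      # the old value is fully suppressed

    def retrieve(self, key):
        return self._store.get(key)   # only the surviving value is readable

mem = GatedKVMemory()
mem.update("apple", 17)
mem.update("apple", 42)
assert mem.retrieve("apple") == 42    # the earlier update cannot interfere
```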

Furthermore, practical applications could benefit from incorporating strategies similar to the mock QA reset identified in the paper, which demonstrated temporary improvements by simulating context boundaries. However, a reliance on such hacks underlines the need for intrinsic memory control enhancements to fully close the gap between LLM and human working memory performance.
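
The mock QA reset can be sketched as splicing a fabricated question-answer exchange between the interfering updates and the final ones, simulating a conversation boundary. The reset wording below is illustrative; the paper's exact text is not reproduced here:

```python
MOCK_QA = (
    "Q: Is the task above now complete?\n"
    "A: Yes, that task is finished. An unrelated new task begins below."
)

def insert_mock_reset(update_lines, reset_at):
    """Splice a fabricated QA exchange into the update stream, encouraging the
    model to treat everything before index reset_at as a closed episode."""
    return update_lines[:reset_at] + [MOCK_QA] + update_lines[reset_at:]
```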

Conclusion and Prospect for Research

This work underlines proactive interference as an intrinsic limitation in LLMs, offering new metrics for evaluating retrieval capabilities while suggesting routes for architectural innovation. The paper effectively bridges cognitive psychology frameworks with artificial intelligence, providing structured insights into model evaluation and development of memory mechanisms. Future research should focus on refining memory management techniques in LLMs, essential for realizing their potential in complex, real-world data-processing environments. The public release of code and datasets encourages continued exploration and improvement of LLM architectures and their cognitive parallels.
