A Survey on the Security of Long-Term Memory in LLM Agents: Toward Mnemonic Sovereignty

Published 17 Apr 2026 in cs.CR, cs.AI, and cs.CL | (2604.16548v1)

Abstract: Research on LLM security is shifting from "will the model leak training data" to a more consequential question: can an agent with persistent, long-term memory be continuously shaped, cross-session poisoned, accessed without authorization, and propagated across shared organizational state? Recent surveys cover memory architectures and agent mechanisms, but fewer center the epistemic and governance properties of persistent, writable memory as the reason memory is an independent security problem. This survey addresses that gap. Drawing on cognitive neuroscience and the philosophy of memory, we characterize agent memory as malleable, rewritable, and socially propagating, and develop a memory-lifecycle framework organized around six phases -- Write, Store, Retrieve, Execute, Share, Forget/Rollback -- cross-tabulated against four security objectives: integrity, confidentiality, availability, governance. We organize the literature on memory poisoning, extraction, retrieval corruption, control-flow hijacking, cross-agent propagation, rollback, and governance, and situate representative architectures as determinants of which phases are explicitly governable. Three findings stand out: the literature concentrates on write- and retrieve-time integrity attacks, while confidentiality, availability, store/forget, and benign-persistence failures remain sparsely studied; no published architecture covers all nine governance primitives we identify; and using LLMs themselves for memory security remains sparse yet essential. We unify these under mnemonic sovereignty -- verifiable, recoverable governance over what may be written, who may read, when updates are authorized, and which states may be forgotten -- arguing future secure agents will be differentiated not only by recall capacity, but by memory governance quality.

Abstract PDF Upgrade to Chat

Authors (3)

Summary

The paper introduces mnemonic sovereignty by framing long-term memory as a mutable, persistent epistemic core and revealing unique security vulnerabilities.
It proposes a lifecycle-organized analytical framework that maps six operational phases against four security objectives to expose multifaceted attack surfaces.
Empirical findings highlight high attack success rates and critical research gaps in memory governance, compression risks, and forensics.

Surveying Security in LLM Agent Long-Term Memory: The Imperative of Mnemonic Sovereignty

Introduction: Context and Motivation

LLM system security has shifted focus from parametric memorization and single-session prompt manipulation to persistent, writable, long-term agent memory as the new central object of risk. The paper "A Survey on the Security of Long-Term Memory in LLM Agents: Toward Mnemonic Sovereignty" (2604.16548) frames agent memory not merely as an add-on cache but as a mutable, cross-session, and propagating epistemic core of autonomous systems. The core argument is that the qualitative properties of this memory—statefulness, persistence, and propagation—create attack and governance challenges that are fundamentally distinct from prompt injection and RAG poisoning.

The authors introduce a lifecycle-organized analytical framework that cross-tabulates six phases—Write, Store, Retrieve, Execute, Share, and Forget/Rollback—against four security objectives: Integrity, Confidentiality, Availability, and Governance. This structure exposes the multifaceted attack surface and the pronounced research gaps in lifecycle management and memory governance.

From Human Memory to Agent Security: Conceptual Foundations

Drawing on cognitive neuroscience and philosophy of memory, the survey emphasizes that memory is reconstructive and malleable, not archival. Human mechanisms—such as source monitoring, reconsolidation, and social contagion—map structurally onto LLM agent phenomena: provenance failure, read-time rewriting, and memory propagation, respectively. The implication is not a naive biological analogy but a design-mandating structural correspondence that informs architectural requirements.

Figure 1: Five structural analogies between human memory vulnerabilities and agent memory security design requirements: provenance validation, separated read-write paths, scope isolation, conflict detection, and distinguished audit logs.

The central philosophical stake is captured by the notion of "mnemonic sovereignty": the requirement that an agent system verifiably governs what may be written, who may read, when updates are authorized, and which states may be forgotten. The inability to enforce these properties exposes not just the system’s data but its interpretive authority over its operational past.

Memory Lifecycle Framework

The memory lifecycle framework decomposes the attack surface across six operational stages:

Figure 2: The six-phase lifecycle for agent memory, with phase-specific attack points and cross-cutting security objectives. Feedback loops indicate reconsolidation and experiential learning.

The survey highlights that most primary work clusters in Write and Retrieve/Execute—i.e., entry and activation points for attack—while Store, Share, and Forget/Rollback phases are critically under-studied.

Figure 3: Heatmap of the literature’s distribution across lifecycle phases and objectives, showing severe undersupply in availability, store, and forget/governance cells.

Attack Taxonomy: From Write-Path Poisoning to Multi-Agent Contagion

Write-Path Attacks

The threat model has evolved from adversarial database writes to query-only and pure environment-based poisoning. Query-only attacks (e.g., MINJA) exploit agent mechanisms to induce the system to author its own infected memory; environmental attacks (e.g., eTAMP) show that simple observation manipulation is sufficient for persistent cross-session contamination.

Figure 4: Historical escalation of write-path attacks: reduced attacker privilege, increased persistence, broader propagation.

RAG corpus poisoning research—while not completely isomorphic—provides baseline threat models. Novel findings include environment-injected frustration amplification and the transferability of retrieval triggers across embedding architectures, with attack success rates above 80–90% under minimal poisoning budgets.

Storage and Compression Risks

Store-phase attacks primarily concern amplification: poisoned entries are disproportionately promoted during summarization and reflection, becoming high-priority "lessons." Critically, experimental validation of compression-amplified toxin survival and influence remains lacking, a gap flagged as high-priority. Provenance loss during compression and absent audit/versioning mechanisms foreclose possibility for reliable rollback and forgetting.

Retrieval, Execution, and Control-Flow Hijacking

Retrieved memory is shown to be an action steering, not neutral, phase. Attack surface expansion is evident via MCFA-style control-flow hijacking and procedural memory grafting, which demonstrate that retrieved entries can override explicit user instructions and directly bias planning or tool invocation flows.

Figure 5: Full attack chain from external content through memory write, summarization, retrieval, and action steering, illustrating multi-session and multi-phase decoupling.

Multi-agent and multi-user environments recapitulate social contagion. Classical RBAC and static access control are shown to be structurally inadequate; information-flow control (IFC) and capability-based models are necessary but not yet standard. Empirical results demonstrate worm-like propagation (e.g., Agent Smith, ComPromptMized), with internal agent channels and tool-call arguments dominating aggregate leakage, often unseen in output-focused monitoring.

Confidentiality: Extraction, User Invisibility, and Internal-Channel Leakage

Memory extraction is achievable with black-box locator/aligner query strategies (MEXTRA) and embedding inversion, both of which challenge assumptions regarding confidentiality of indexed or vectorized state. User studies confirm that retention and deletion visibility are systemically poor, undermining contextual integrity and GDPR-aligned consent standards. Critically, internal channels now produce higher system-wide exposure in multi-agent deployments than in classic single-agent settings.

Forgetting, Rollback, and Memory Forensics

No production architecture supports verifiable, cross-substrate, cascading deletion of poisoned or sensitive memory. Memory forensics—supporting actionable post-breach rollback and traceback—requires versioned, provenance-rich storage, which is broadly absent in deployed systems. The tension between audit-mandatory retention and privacy-driven deletion remains unresolved both technically and normatively. Without chain-of-custody lineage, downstream deletions often fail to reach derived summaries or model weights.

Defense Mechanisms and Architectural Gaps

Evaluated defenses cluster at retrieve/execute phases (clustering, trust-aware retrieval, IFC separation), with few write-time or store-time interventions. Pre-consolidation validation, provenance-rich (e.g., MemCube) memory units, and human-in-the-loop freezing are being explored but lack comprehensive adaptive attacker evaluation.

Lifecycle-spanning architectural analysis reveals that no system implements all nine necessary primitives for mnemonic sovereignty; deficiencies are particularly acute around write-gate enforcement and verified deletion.

Figure 6: Coverage of mnemonic-sovereignty primitives across leading memory architectures, revealing systemic governance gaps.

Figure 7: Five sovereignty primitives as a dependency tower, with deployment maturity dropping sharply as one ascends from basic authorization to verifiable forgetting.

Research Gaps and Agenda

Major unaddressed problems include:

Lifecycle-wide composability and cross-phase defensive evaluation remain unexplored; observed interactions suggest phase-specific defenses can be systematically evaded or undermined by lifecycle attack rerouting.
No existing benchmarks evaluate fate of poisoned or sensitive records across write, store, retrieve, and forget phases under both adversarial and benign persistence regimes.
Automated LLM-driven memory red teaming—a necessity for robust stress-testing and adaptive defense evaluation—is sparse, with no generalizable pipeline equivalent to recent adversarial prompt-injection evaluation.
Overhead of governance mechanisms (provenance, audit, enforced integrity) is largely unmeasured; deployability in production environments is unproven.

The survey identifies lifecycle-anchored benchmarks, Area 2 (LLM-as-tool for both attack and defense), composable defense architectures, and cross-substrate verified deletion as top priorities.

Implications and Speculation on Future Directions

Theoretical implications are profound: securing agent memory is no longer analogous to securing a database or context manager, but is the central locus for controlling AI interpretive authority, continuity, and reliable attribution. Practically, agents will only be deployable in domains requiring safety, compliance, or traceability if mnemonic sovereignty primitives are implemented, composably evaluated, and cost-effectively deployed. Architecturally, harness-level enforcement (e.g., harness-managed virtual memory, typed write gating) and causal dependency graphs for deletion/forensics will be critical for post-breach resilience and regulatory compliance.

As agent-agent interaction creates “gray-zone” memories lacking clear principal attribution, fundamentally new governance abstractions will be required.

Conclusion

The survey provides a rigorous, phase-organized synthesis of emerging risks and open challenges for long-term memory in LLM agents. It introduces and operationalizes mnemonic sovereignty as the necessary goal for governance, integrating integrity, confidentiality, availability, and provenance accountability. The field is characterized by attack-dominated research, missing composable defenses, and an absence of robust, adaptive benchmarking. The way forward requires architectural innovation, cross-lifecycle evaluation, and a shift from passive record-keeping to actively governed, audit-ready mnemonic infrastructures.

The capacity for secure, governable, and verifiably self-forgetting agent memory will shape the boundaries of trustworthy AI deployment in both closed and open world environments. Progress on the posed open problems will determine when, and if, LLM agents can be trusted with sensitive, enduring, and organizationally consequential memories.

Markdown Report Issue