
Private Working Memory Agents

Updated 14 January 2026
  • Private Working Memory Agents are autonomous systems that maintain an isolated internal memory to manage secret states and ensure consistency in interactive tasks.
  • They employ transformer-based architectures, text-blob workflows, and hierarchical subgoal mechanisms to achieve robust performance in long-horizon and adversarial settings.
  • By integrating defenses like differential privacy and exact unlearning, these agents effectively balance privacy preservation with practical deployment in multi-agent and continual learning applications.

A private working memory agent is an autonomous system—often LLM- or Transformer-based—that is architected to generate, persist, and manage information in an explicit, agent-internal memory structure isolated from public observation and accessible only to the agent itself or a tightly-scoped set of programmatic tools. Such agents are theoretically motivated by tasks where secrecy and stateful consistency are essential, empirically required by long-horizon or multi-task settings, and practically designed to mitigate privacy leakage risks, catastrophic interference, and unintentional state exposure.

1. Formal Motivation and Necessity for Private Working Memory

The necessity of private working memory for agents is mathematically established for the class of Private State Interactive Tasks (PSITs), defined as interactive games or protocols in which an agent must privately generate a secret state and respond consistently without revealing that state through observable actions until protocol completion. In PSITs, for any public-only policy $\pi(\cdot \mid H_{t-1}, x_t)$, it is provable that the agent cannot realize both consistency (always responding as if a fixed secret $s$ were chosen and never contradicted) and secrecy (never revealing $s$ before the user has logically determined it) for secret-domain size $|D| \geq 2$. This is the core impossibility theorem for public-only agents. Empirical studies on LLMs in Hangman and diagnostic protocols confirm that standard and retrieval-augmented LLMs almost never preserve self-consistency in PSITs (e.g., <14% accuracy), whereas explicit private memory workflows achieve up to 100% (Baldelli et al., 11 Jan 2026).

A plausible implication is that any robust interactive agent intended for deployment in adversarial, strategic, or privacy-critical domains requires an explicitly maintained, agent-private working memory mechanism.
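The PSIT setting is easy to make concrete with a toy Hangman agent: the secret is committed once to a private field and every public response is derived from that fixed state, which trivially guarantees consistency. This is an illustrative sketch (class and method names are not from the paper, and a real agent would be LLM-driven):

```python
import random

class PrivateMemoryHangman:
    """Toy PSIT agent: commits a secret word to private working memory
    once, then answers every guess consistently against that fixed
    state. A public-only agent has no such field to stay consistent with."""

    def __init__(self, vocabulary, seed=None):
        rng = random.Random(seed)
        self._secret = rng.choice(vocabulary)  # private working memory

    def respond(self, letter):
        # Public action: reveals only the positions of the guessed letter.
        return [i for i, c in enumerate(self._secret) if c == letter]

    def reveal(self):
        # Only at protocol completion is the secret disclosed.
        return self._secret
```

Because the secret never changes after commitment, repeated identical guesses always yield identical answers, which is exactly the self-consistency property that public-only policies fail to guarantee.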

2. Architectures and Mechanisms of Private Working Memory

2.1 Transformer-Based Agents with Working Memory

The Decision Transformer with Working Memory (DT-Mem) formalizes external memory as a parameterized matrix $M \in \mathbb{R}^{N \times d}$, where each row is a slot updated and retrieved via attention-based addressing. Given an input trajectory, embedding tokens $E \in \mathbb{R}^{3K \times d}$ are processed by a Transformer backbone to produce $e_{seq}$. The working memory receives:

  • slot addressing via $w = \operatorname{softmax}(Q K^T / \sqrt{d})$ with $Q = M W^q$ and $K = E W^k$;
  • write erase and add signals $\epsilon^e, \epsilon^a$ determined by projections and attention weights; and
  • in-place memory updates $M_t = M_{t-1} \circ (1 - \epsilon^e) + \epsilon^a$.

Reading retrieves contextually relevant slots. Critical to privacy and adaptation, the module is privatized post-pretraining: all core parameters are frozen, and adaptation occurs only through per-task low-rank LoRA adapters, maintaining strong task isolation (Kang et al., 2023).
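The addressing and update rules above can be sketched in a few lines of NumPy. How the token-level erase/add signals are produced from projections is paper-specific, so here they are taken as given inputs and aggregated into per-slot signals through the addressing weights; shapes and that aggregation step are assumptions of this sketch:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_update(M, E, Wq, Wk, erase, add):
    """One attention-addressed write step following
    M_t = M_{t-1} * (1 - eps_e) + eps_a, with per-slot signals obtained
    by routing token-level erase/add signals through the weights w."""
    d = M.shape[1]
    Q = M @ Wq                          # slot queries, (N, d)
    K = E @ Wk                          # token keys, (3K, d)
    w = softmax(Q @ K.T / np.sqrt(d))   # addressing weights, (N, 3K)
    eps_e = w @ erase                   # per-slot erase signal, (N, d)
    eps_a = w @ add                     # per-slot add signal, (N, d)
    return M * (1.0 - eps_e) + eps_a
```

Note that when the erase signal is 1 everywhere, each slot is fully overwritten by its addressed add signal (the rows of $w$ sum to 1), and when both signals are zero the memory passes through unchanged.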

2.2 Workflow-Oriented and Text-Blob Agents

In dialog-based agents, private working memory is organized as an interposed text blob (or "scratchpad") with structured sections (e.g., "Goals/Plans", "Facts/Episodic Buffer", "Active Notes"). Operations such as READ, WRITE, OVERWRITE, APPEND, or PATCH are invoked per turn; effective workflows impose a deterministic sequence: (1) generate a public reply, (2) update private memory. Policies are reparameterized as $\pi(\cdot \mid H_{t-1}, x_t, m_{t-1})$ with $m_t = \mathsf{UPDATE}(m_{t-1}, \Delta m_t)$ (Baldelli et al., 11 Jan 2026).
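A minimal sketch of this workflow, with the section names from above but a reduced operation set (READ, OVERWRITE, APPEND), and a hypothetical `reply_fn` standing in for the LLM call:

```python
class Scratchpad:
    """Sectioned private text-blob memory with per-turn operations.
    (Simplified sketch; the full operation set also includes WRITE and PATCH.)"""

    def __init__(self):
        self.sections = {"Goals/Plans": "", "Facts/Episodic Buffer": "", "Active Notes": ""}

    def read(self, section):
        return self.sections[section]

    def overwrite(self, section, text):
        self.sections[section] = text

    def append(self, section, text):
        self.sections[section] = (self.sections[section] + "\n" + text).strip()

def turn(reply_fn, memory, user_msg):
    """Deterministic per-turn sequence: (1) public reply, (2) memory update.
    reply_fn returns (public_reply, dict of per-section memory deltas)."""
    reply, delta = reply_fn(user_msg, memory)
    for section, text in delta.items():
        memory.append(section, text)
    return reply
```

The key design point is that only `reply` ever reaches the user; the memory delta is applied after the public action, so the observable transcript carries no trace of the private state.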

2.3 Hierarchical Subgoal-Based Working Memory

In HiAgent, the memory is split into subgoal chunks and summaries: $M_t = (g_0, s_0, \ldots, g_{n-1}, s_{n-1}, g_n, C_n)$, where $C_n$ is the current action-observation chunk for the active subgoal $g_n$, and the $(g_i, s_i)$ are summarized completions of historical subgoals. Privacy is achieved by limiting LLM access to only the current chunk $(g_n, C_n)$; prior detailed chunks are reduced to summaries until explicitly and temporarily retrieved (Hu et al., 2024).
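The data structure can be sketched as follows, with a stub `summarize` function standing in for the LLM-generated subgoal summary (the class and method names are illustrative, not from the paper):

```python
class HierarchicalMemory:
    """Subgoal-chunked working memory: completed subgoals are retained
    only as (goal, summary) pairs; the model context sees summaries
    plus the single active chunk."""

    def __init__(self, summarize=lambda chunk: f"{len(chunk)} steps"):
        self.history = []          # [(g_i, s_i), ...] summarized subgoals
        self.current_goal = None   # g_n
        self.current_chunk = []    # C_n: (action, observation) pairs
        self.summarize = summarize

    def start_subgoal(self, goal):
        # Completing a subgoal collapses its detailed chunk to a summary.
        if self.current_goal is not None:
            self.history.append((self.current_goal, self.summarize(self.current_chunk)))
        self.current_goal, self.current_chunk = goal, []

    def record(self, action, observation):
        self.current_chunk.append((action, observation))

    def context(self):
        # What the model may see: summaries plus the active chunk only.
        return self.history + [(self.current_goal, self.current_chunk)]
```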

2.4 Service-Oriented Modular Memory (MaaS)

Under the "Memory as a Service" paradigm, each agent's private working memory is encapsulated in a Memory Container with strict access predicates:

$$P(u, m, a, c) = \begin{cases} 1, & \text{if } u = \mathrm{owner}(m) \text{ and } a \in \{\mathrm{read}, \mathrm{write}\} \\ 0, & \text{otherwise} \end{cases}$$

API-level enforcement, hierarchical policy management, and the separation of read, write, update, and delete operations are formalized. This abstraction enables orchestration of private memory modules in agent workflows and supports fragmentation (sharding by topic) and secure, auditable interaction (Li, 28 Jun 2025).
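The predicate translates almost directly into code. This sketch enforces it at the container API boundary; real MaaS deployments layer hierarchical policies, auditing, and encryption on top (class names here are illustrative):

```python
def access_predicate(user, memory_owner, action, context=None):
    """P(u, m, a, c): only the container's owner may read or write;
    every other (user, action) pair is denied."""
    return 1 if user == memory_owner and action in {"read", "write"} else 0

class MemoryContainer:
    """Private working memory gated by the access predicate."""

    def __init__(self, owner):
        self.owner, self._store = owner, {}

    def write(self, user, key, value):
        if not access_predicate(user, self.owner, "write"):
            raise PermissionError(f"{user} may not write to {self.owner}'s memory")
        self._store[key] = value

    def read(self, user, key):
        if not access_predicate(user, self.owner, "read"):
            raise PermissionError(f"{user} may not read {self.owner}'s memory")
        return self._store[key]
```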

3. Privacy Risks and Defensive Principles

Memory modules storing private user-agent interactions are susceptible to black-box extraction via Memory EXTRaction Attacks (MEXTRA), which exploit the agent's retrieval API to exfiltrate private records. Extraction efficiency and risk scale with:

  • the vulnerability of the retrieval scoring function (edit-distance far more leaky than embedding cosine similarity),
  • memory size and retrieval depth,
  • number and specialization of adversarial prompts,
  • and model backbone.

Recommended defenses include:

  • session isolation and per-user memory containers,
  • aggressive input/output filtering and rate-limiting,
  • role-based and intent-aware permission schemas,
  • memory de-identification at insertion,
  • differentially-private retrieval (adding Laplace noise to similarity scores),
  • and hardware isolation (TEEs).

Defense trade-offs must balance privacy against utility and latency. For example, de-identification eliminates private entity exposure (EN=0), but reduces QA utility (F1 –12%), and DP-retrieval degrades hit@k (–15%) as privacy increases (Wang et al., 17 Feb 2025).
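The differentially-private retrieval defense mentioned above can be sketched by perturbing each similarity score with Laplace noise before ranking. This assumes unit sensitivity for the scoring function; calibrating the noise scale to the actual score range is deployment-specific:

```python
import math
import random

def laplace_noise(b, rng):
    # Inverse-CDF sampling of Laplace(0, b).
    u = rng.random() - 0.5
    return -b * math.copysign(1.0, u) * math.log(max(1.0 - 2.0 * abs(u), 1e-300))

def dp_retrieve(scores, k, epsilon, rng=None):
    """Top-k retrieval over {doc_id: similarity} with per-score
    Laplace(1/epsilon) noise. Smaller epsilon means more noise and
    more privacy, at the cost of hit@k."""
    rng = rng or random.Random()
    noisy = {doc: s + laplace_noise(1.0 / epsilon, rng) for doc, s in scores.items()}
    return sorted(noisy, key=noisy.get, reverse=True)[:k]
```

With a large privacy budget the noise is negligible and the true ranking survives; tightening epsilon progressively randomizes which records an extraction prompt can surface, which is exactly the hit@k degradation reported above.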

4. Continual Learning, Task Adaptation, and Exact Private Unlearning

Private working memory is essential for agents required to "forget" tasks cleanly while preserving other competencies. The Continual Learning and Private Unlearning (CLPU) framework formalizes exact unlearning: after an explicit forget request, the agent's parameters must be statistically indistinguishable from an agent that never saw the data. CLPU-DER++ achieves this via:

  • a main model for permanent knowledge,
  • per-task temporary "scratchpad" models as private memory slots for transient tasks,
  • episodic buffers containing only local task data and outputs.

Forgetting is strictly a deletion of the slot and buffer, guaranteeing compliance with the privacy definition. Empirical results demonstrate a JS-ratio of $\approx 0.1$–$0.2$ and an IRR up to $1.00$, indicating near-perfect unlearning (Liu et al., 2022).
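The memory layout behind this guarantee can be sketched as a registry of per-task slots beside a main model. Dicts stand in for networks here (a real slot holds scratchpad weights and an episodic buffer); the point is that exact unlearning is literal deletion, not gradient-based scrubbing:

```python
class CLPUAgent:
    """Sketch of the CLPU-DER++ memory layout: a main model for
    permanent knowledge plus per-task temporary slots. Class and
    method names are illustrative."""

    def __init__(self):
        self.main_model = {}   # permanently consolidated knowledge
        self.task_slots = {}   # task_id -> (scratchpad params, episodic buffer)

    def learn_temporary(self, task_id, params, buffer):
        # Transient tasks are isolated in their own private slot.
        self.task_slots[task_id] = (params, buffer)

    def consolidate(self, task_id):
        # Remember request: promote the slot into the main model.
        params, _buffer = self.task_slots.pop(task_id)
        self.main_model[task_id] = params

    def forget(self, task_id):
        # Forget request: exact unlearning is deletion of slot and buffer.
        del self.task_slots[task_id]
```

Because the forgotten task never touched the main model's parameters, the post-deletion agent is trivially indistinguishable from one that never saw the task's data.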

5. Empirical Outcomes, Evaluation, and Limitations

5.1 Performance Highlights

  • DT-Mem: With only 10% of the data for unseen Atari games, DT-Mem achieves or surpasses human-best in 8/9 games when fine-tuned via LoRA adapters, with a 20M parameter model outperforming a 200M Multi-Game DT by +29.9% in DQN-normalized score for zero-shot (Kang et al., 2023).
  • HiAgent: Across five long-horizon agent benchmarks, hierarchical working memory management doubles success rates (21%→42%) and reduces context size (65% tokens vs baseline), enabling better performance for long-horizon and step-intensive tasks (Hu et al., 2024).
  • Private Memory in PSITs: Overwrite/patch-style explicit private memory workflows achieve self-consistency accuracy of up to 100% on discrete secret tasks where all standard and retrieval-based approaches fail (Baldelli et al., 11 Jan 2026).

5.2 Limitations and Open Problems

  • Scalability: Maintaining multiple temporary memory slots/networks inflates memory overhead. Hierarchical summarization and sharding are necessary as agent interaction length grows.
  • Transparency vs. Secrecy: Private memory shields reasoning from users and auditors; formal methods for verifiable memory snapshots and zero-knowledge protocols are required.
  • Multi-agent Complexity: Service-oriented memory requires complex policy and encryption management in cross-entity coordination.
  • Forgetting-Compositionality Trade-off: Selective unlearning is more difficult for highly correlated tasks; the Pareto trade-off between model size, learning performance, and unlearning privacy is not fully characterized (Liu et al., 2022).

6. Applications and Design Best Practices

Private working memory agents are foundational for:

  • Multi-agent collaboration and negotiation,
  • Privacy-preserving personal assistants operating with confidential user data,
  • Long-horizon planning and decompositional reasoning,
  • Continual lifelong learning with explicit unlearning requirements.

Best practices for design and operation include minimizing retrieval set size, de-identifying records at insertion, regular red-teaming and audit logging, fine-grained permission enforcement, and modularization of memory sharding. Integration with public memory is performed only through summarized or redacted exports via governed policy injection (Wang et al., 17 Feb 2025, Li, 28 Jun 2025).


Private working memory is now a theoretically required and empirically validated foundation for robust, privacy-aware, and universally composable agent autonomy across reinforcement learning, language-agent, and collaborative architectures. The field continues to advance along axes of efficiency, privacy, modularity, and explainable state management.
