MemoryLLM: Self-Updatable Memory Pools
- MemoryLLM is a self-updatable memory pool architecture that enables large language models to dynamically absorb and integrate new information.
- It employs a split design with static backbone weights and mutable memory parameters updated via an injection function to maintain long-term retention and controlled forgetting.
- Empirical evaluations demonstrate superior model editing, long-context handling, and operational robustness even after nearly a million memory updates.
A self-updatable memory pool, as exemplified by MemoryLLM, is a mechanism that endows LLMs with the capacity to dynamically absorb and retain new knowledge throughout deployment. Rather than remaining static after training, such models can self-modify a substantial subset of their parameters (the memory pool), thereby enabling knowledge injection, long-term retention, robustness under continual updates, and controllable forgetting. This paradigm directly addresses the limitations of fixed parametric memory and opens new avenues for long-context reasoning, model editing, and real-world knowledge assimilation.
1. MemoryLLM Architecture and Latent Space Memory Pools
MemoryLLM augments a transformer-based LLM (notably Llama2) with a fixed-size memory pool embedded within the latent space of every transformer layer. In each layer $l$, the memory pool is represented as $N$ hidden vectors of dimension $d$, denoted $\theta_l$. The complete model comprises static backbone weights $\phi$ and the self-updatable memory parameters $\theta = \{\theta_l\}_{l=1}^{L}$, forming the composite model $(\phi, \theta)$.
During the forward pass, both the input sequence tokens and all memory pool tokens are processed simultaneously by the self-attention modules, allowing each token to attend to the compressed knowledge stored in the layer’s memory pool. This explicit division between persistent parameters and mutable memory ensures that new knowledge can be integrated at multiple abstraction levels without perturbing the core backbone.
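As a concrete illustration of this design, the minimal sketch below shows one way a transformer layer can expose a per-layer memory pool to self-attention by prepending the memory tokens to the keys and values; the class and parameter names (`MemoryAugmentedLayer`, `n_mem`, `d_model`) are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class MemoryAugmentedLayer(nn.Module):
    """Sketch of a transformer layer whose self-attention also attends to
    a per-layer memory pool theta_l (n_mem vectors of size d_model)."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, n_mem: int = 256):
        super().__init__()
        # Self-updatable memory pool for this layer: n_mem hidden vectors.
        self.memory = nn.Parameter(torch.randn(n_mem, d_model) * 0.02)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, d_model) hidden states of the input tokens.
        q = self.norm1(h)
        mem = self.memory.unsqueeze(0).expand(h.size(0), -1, -1)  # (batch, n_mem, d_model)
        kv = torch.cat([mem, q], dim=1)      # memory tokens prepended to the input tokens
        attn_out, _ = self.attn(q, kv, kv)   # queries: input tokens; keys/values include memory
        h = h + attn_out
        return h + self.ffn(self.norm2(h))
```

In this reading, the attention and feed-forward weights belong to the frozen backbone $\phi$, while only the memory parameters $\theta_l$ change at update time.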
Upon encountering new textual knowledge $x$, the entire memory pool is updated via an injection function $U$: $\theta \leftarrow U(\theta, x)$. For a sequence of updates $x_1, x_2, \dots, x_T$, the memory evolves as $\theta_t = U(\theta_{t-1}, x_t)$ for $t = 1, \dots, T$.
This persistent yet bounded latent memory architecture creates a split-knowledge substrate: static truths in $\phi$, updatable facts in $\theta$.
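As a minimal sketch of this interface (the function name `inject` is a stand-in for the injection function $U$ and is not the released API), sequential updates amount to folding $U$ over the incoming documents:

```python
def absorb_stream(memory_pool, documents, inject):
    """Apply the injection function sequentially: theta_t = U(theta_{t-1}, x_t)."""
    for doc in documents:
        memory_pool = inject(memory_pool, doc)  # theta <- U(theta, x)
    return memory_pool
```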
2. Self-Update Mechanism and Controlled Forgetting
The update mechanism for each layer $l$ is incremental, touching only part of the memory pool per update. For new context $x$, the process is as follows (a minimal code sketch follows this list):
- Extract the last $K$ memory tokens from $\theta_l$.
- Concatenate these tokens with the hidden states of $x$ at layer $l$.
- Pass this extended sequence through transformer layer $l$.
- Drop $K$ tokens from the existing memory at random (so each stored token survives an update with probability $1 - K/N$) and append the last $K$ output tokens as fresh memory; older knowledge is gradually “pushed out” as updates are ingested.
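A minimal sketch of one such per-layer update step is given below, assuming a PyTorch-style callable for the transformer layer; the function and argument names (`self_update_layer`, `new_hidden`, `k`) are illustrative, not the released implementation.

```python
import torch

def self_update_layer(memory_l: torch.Tensor,   # (N, d): memory pool theta_l
                      new_hidden: torch.Tensor, # (L, d): hidden states of new context x
                      layer,                    # callable transformer layer l
                      k: int) -> torch.Tensor:
    """One self-update step: absorb new knowledge into layer l's memory pool."""
    n = memory_l.size(0)
    last_k = memory_l[-k:]                             # last K memory tokens
    extended = torch.cat([last_k, new_hidden], dim=0)  # (K + L, d)
    out = layer(extended.unsqueeze(0)).squeeze(0)      # pass through layer l
    fresh = out[-k:]                                   # last K outputs become new memory
    # Drop K existing tokens at random (each survives with probability 1 - K/N),
    # then append the K fresh tokens; older knowledge is gradually pushed out.
    keep = torch.randperm(n)[: n - k].sort().values
    return torch.cat([memory_l[keep], fresh], dim=0)   # still (N, d)
```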
This update rule yields exponential memory decay. After each update, a fraction $K/N$ of the memory is replaced; thus, the expected retention of a given memory token after $n$ updates is $(1 - K/N)^n$. After $N/K$ updates, this retention approaches $1/e$, closely mapping to the Ebbinghaus Forgetting Curve and allowing both efficient freshness and graceful forgetting. Knowledge is thus neither immediately erased nor irreversibly compressed into the parametrization, but managed under explicit capacity constraints.
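A quick arithmetic check of this claim, using illustrative values of $N$ and $K$ (not the paper's configuration):

```python
import math

N, K = 12800, 256                      # assumed pool size and tokens replaced per update
n_updates = N // K                     # N/K = 50 updates
retention = (1 - K / N) ** n_updates   # expected survival probability of a memory token
print(f"retention ≈ {retention:.4f}, 1/e ≈ {1 / math.e:.4f}")  # ~0.3642 vs ~0.3679
```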
3. Empirical Evaluation and Benchmarks
MemoryLLM’s empirical properties are established via standard and custom benchmarks:
- Model Editing: On zsRE and CounterFact, memory editing performance is evaluated on three axes: efficacy (recall of newly injected facts), generalization (response accuracy on rephrased/related queries), and specificity (preservation of unrelated facts). MemoryLLM with editing enforcement (w/EF) surpasses fine-tuning, ROME, IKE, and other baselines.
- Long Context Handling: On LongBench, MemoryLLM maintains or improves F1 as sequence lengths increase, outperforming backbone models by leveraging its memory pool for persistent knowledge recall.
- Retention Trajectory: On SQuAD and NaturalQA, after 20 sequential injections, theoretical and measured knowledge retention are compared. The model approaches the exponential decay upper bound, retaining substantial knowledge beyond natural parametric memory capacity.
- Operational Robustness: Over 650,000–1,000,000 memory updates, no degradation in updated-fact recall or overall prediction performance is observed, confirming the regularized integrity of memory self-updates.
| Benchmark | Metric | MemoryLLM (w/EF) Performance |
|---|---|---|
| zsRE, CounterFact | Efficacy | Stronger than FT, ROME, IKE, etc. |
| LongBench (long context) | F1 score | Maintained/improved with context length |
| SQuAD/NaturalQA (retention) | Recall | Approaches exponential decay upper bound |
4. Applications and Deployment Considerations
MemoryLLM’s self-updatable design is applicable to a range of knowledge-intensive and long-context scenarios:
- Dynamic Fact Integration: Enables continual knowledge injection for domains with rapid knowledge change (e.g., real-time news, scientific updates).
- Model Editing: Supports surgical updates and corrections without full retraining, applicable to chatbots, virtual assistants, and interactive systems requiring factual grounding.
- Long-context and Multi-turn QA: Maintains long-range coherence across large context windows, supporting summarization and agent dialog.
- Continual Learning and Robustness: Demonstrated integrity after updates makes it suitable for persistent deployments in environments with streaming, evolving data.
Crucially, self-updatable memory pools can be integrated with other transformer architectures or extended to multi-modal settings. The released model code and checkpoints (available at the cited repository) facilitate reproducibility and adaptation to different domains.
5. Open Source Release and Accessibility
The MemoryLLM codebase, including self-update algorithms, memory pool management, and training routines, is openly available. This enables:
- Replication of empirical results and independent verification of memory update integrity.
- Adaptation and extension to other LLMs or specialized architectures.
- Community-driven research into dynamic memory integration and efficient long-context retention.
Such openness is intended to accelerate both benchmark-oriented development and practical deployment of self-updatable LLMs.
6. Theoretical and Practical Implications
The MemoryLLM framework embodies several critical theoretical and engineering advances:
- Split Knowledge Representation: Decouples persistent and adaptive knowledge, enabling explicit operational control over updatable regions.
- Efficient Exponential Forgetting: Balances freshness and retention, minimizing catastrophic forgetting while avoiding stale knowledge accumulation.
- Operational Stability: Achieves robust continual operation at scale, which is a prerequisite for persistent, adaptive AI agents.
- Practical Deployment: Supports robust, just-in-time knowledge updates with minimal overhead, suitable for real-world production systems.
Integration of self-updatable memory pools in LLMs signals a shift from static, monolithic modeling toward incremental, governable AI systems capable of adapting autonomously to dynamic information flows, extensive contexts, and evolving requirements.