
Self-Updatable Memory Pools

Updated 19 December 2025
  • Self-updatable memory pools are dynamic architectures that allow for controlled post-deployment updates, supporting continual learning and system resilience.
  • They separate static core logic from dynamic memory segments, enabling rapid updates, controlled forgetting, and efficient parameter management.
  • Applications in neural models (MemoryLLM) and cloud systems (Vmem) demonstrate improvements in performance, scalability, and operational stability.

Self-updatable memory pools are a class of architectural and algorithmic solutions that permit components or systems to dynamically update, augment, and manage a pool of memory or parameters post-deployment, either for purposes of continual learning (as in neural models) or for online reliability, elasticity, and maintainability (as in high-availability cloud infrastructure). Recent advances span domains ranging from transformer-based neural architectures with “latent” memory tokens to OS-level cloud server memory management with live upgradeability. Implementations prioritize low-overhead, fine-grained updatability, operational stability under high rates of change, and preserved or improved task performance.

1. Architectural Principles and Core Design

Self-updatable memory pool designs separate the “static” core logic from memory regions or parameter pools that admit controlled, dynamic updates without disruptive retraining or rebooting. Two primary instantiations are prominent.

In neural architectures (MemoryLLM):

A transformer (e.g., Llama2-7B, denoted $\varphi$) is augmented with a set of fixed-size memory token pools $\theta$: for each transformer layer $l \in \{1,\dots,L\}$, the memory pool consists of $N$ tokens $\theta_l \in \mathbb{R}^{N \times d}$. At inference ("generation"), the model's hidden states attend jointly to themselves and all memory tokens. At update ("self-update"), only a subset of the memory pool is processed to integrate new information, inducing an approximately exponential decay over older memories. No gradients are used for the update step at inference time, supporting rapid, low-latency inserts and deletions (Wang et al., 7 Feb 2024).

In cloud infrastructure (Vmem):

A reserved physical memory pool ($M_{\mathrm{reserved}}$) is managed by a two-module kernel architecture. The stable interface (vmem.ko) exposes device semantics (e.g., /dev/vmem), while the core logic module (vmem_mm.ko) manages allocations, reservations, and fast mapping. Hot-upgradeability is enabled via atomic function-pointer swaps and RCU-protected update mechanisms; the core logic can be updated or replaced without interrupting service, and existing allocations remain valid across versions (Zheng et al., 13 Nov 2025).
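
The indirection pattern at the heart of this design can be illustrated with a minimal Python sketch. The class and method names here are invented for illustration; the kernel implementation uses C function-pointer tables guarded by RCU, not Python objects.

# Minimal sketch of the stable-interface / swappable-core pattern.
# Names (VmemFacade, CoreV1, CoreV2) are illustrative, not from the paper.

class CoreV1:
    def alloc(self, size):
        return f"v1: allocated {size} bytes"

class CoreV2:
    def alloc(self, size):
        return f"v2: allocated {size} bytes (fast path)"

class VmemFacade:
    """Analogue of the stable interface (vmem.ko); never reloaded."""
    def __init__(self, core):
        self._core = core                 # analogue of an ops pointer table

    def alloc(self, size):
        return self._core.alloc(size)     # every call indirects via _core

    def hot_upgrade(self, new_core):
        # A single reference assignment is atomic in CPython; the kernel
        # achieves the same effect with an RCU-protected pointer swap.
        self._core = new_core

dev = VmemFacade(CoreV1())
print(dev.alloc(4096))      # served by the old core
dev.hot_upgrade(CoreV2())   # upgrade without interrupting service
print(dev.alloc(4096))      # served by the new core; handle stays valid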

2. Formal Definition, Data Structures, and Update Algorithms

MemoryLLM Formalization:

  • For each layer $l$, $\theta_l \in \mathbb{R}^{N \times d}$ is trainable; initialization is random or from a pre-trained pool.
  • At inference, attention at layer $l$ is computed over the concatenation $[h_l; \theta_l]$:

$$Q = h_l W_Q,\quad K = [h_l; \theta_l] W_K,\quad V = [h_l; \theta_l] W_V$$

with the layer output $h_{l+1} = \operatorname{softmax}\!\left(\frac{QK^\top}{\sqrt{d}}\right)V$ (a toy sketch follows this list).

  • At update:
    1. Extract the last $K$ memory tokens $e_\theta^l = \theta_l[N-K:N]$.
    2. Concatenate them with $h_l$ and propagate through layer $\varphi_l$.
    3. The resulting $K$ outputs replace $K$ randomly selected old memory tokens, maintaining pool size $N$.
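
The inference-time attention above can be checked with a toy NumPy sketch. Dimensions are scaled down from the paper's $d = 4096$, weights are random placeholders, and single-head attention is assumed for brevity.

# Toy single-head attention over [h_l; theta_l], per the equations above.
# Shapes are scaled down for illustration; weights are random placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_c, N, d = 4, 8, 16                     # context tokens, memory tokens, dim
h_l = rng.normal(size=(n_c, d))          # hidden states of the current input
theta_l = rng.normal(size=(N, d))        # layer-l memory pool
W_Q, W_K, W_V = (rng.normal(size=(d, d)) for _ in range(3))

kv = np.concatenate([h_l, theta_l])      # keys/values cover hidden + memory
Q, K, V = h_l @ W_Q, kv @ W_K, kv @ W_V
scores = Q @ K.T / np.sqrt(d)            # queries come from hidden states only
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True) # row-wise softmax
h_next = attn @ V
print(h_next.shape)                      # (4, 16): one output per hidden state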

Vmem Formalization:

  • Reserved memory $M_{\mathrm{reserved}}$ is divided into 2 MB slices, tracked per NUMA node by a one-byte-per-slice state array.
  • Each VM maintains a "fastmap" structure for rapid virtual-to-physical address translation, with each entry mapping a contiguous memory segment (both structures are sketched below).
  • Hot-upgrade proceeds by atomically swapping interface function pointers, iteratively updating VMA/file ops via RCU, and updating reference counts for safe core module unload.
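
A Python sketch of the two data structures described above, with invented field and function names; the production versions are C kernel structures.

# Sketch of the per-node slice array and per-VM fastmap described above.
# Field and function names are illustrative, not from the paper.
import bisect

SLICE = 2 * 1024 * 1024                           # 2 MB slice granularity

# One byte of state per slice, per NUMA node (0 = free, 1 = reserved);
# a 384 GB host needs only ~192 KB of such state.
node_slices = bytearray(384 * 1024**3 // SLICE)

class FastmapEntry:
    """Maps one contiguous guest-virtual segment to physical memory."""
    def __init__(self, virt_start, phys_start, length):
        self.virt_start, self.phys_start, self.length = virt_start, phys_start, length

class Fastmap:
    def __init__(self):
        self.entries = []                         # kept sorted by virt_start
        self.starts = []

    def insert(self, entry):
        i = bisect.bisect_left(self.starts, entry.virt_start)
        self.starts.insert(i, entry.virt_start)
        self.entries.insert(i, entry)

    def translate(self, vaddr):
        """Virtual-to-physical translation via the covering segment."""
        i = bisect.bisect_right(self.starts, vaddr) - 1
        if i >= 0:
            e = self.entries[i]
            if vaddr < e.virt_start + e.length:
                return e.phys_start + (vaddr - e.virt_start)
        raise KeyError(f"unmapped address {vaddr:#x}")

fm = Fastmap()
fm.insert(FastmapEntry(0x0, 0x4000_0000, 8 * SLICE))
print(hex(fm.translate(0x10_0000)))               # inside the first segment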

Pseudocode Example (MemoryLLM self-update):

function SELF_UPDATE(x_c, θ):
    h_1 ← embed(x_c)                        # n_c hidden states for the new text
    for l in 1…L do:
        e_l ← θ[l][N-K : N]                 # last K memory tokens of layer l
        inp ← concatenate(h_l, e_l)         # hidden states first, then memory
        out ← φ_l(inp)                      # one transformer-layer forward pass
        h_{l+1} ← out[0 : n_c]              # updated hidden states
        e_l' ← out[n_c : n_c+K]             # K freshly encoded memory tokens
        keep_idx ← random_subset(0…N-1, size=N-K)
        θ_l_new ← concatenate(θ[l][keep_idx], e_l')   # pool size stays N
    return θ_new = {θ_l_new}_{l=1}^L
(Wang et al., 7 Feb 2024)

Pseudocode Example (Vmem hot-upgrade):

function hot_upgrade(vmem, old_mod, new_mod):
    insmod(new_mod)                                    # load the new core logic module
    atomic_replace(vmem.cdev.ops, new_mod.cdev_ops)    # swap the device ops table
    call_rcu(grace_period, () => {                     # once current readers drain:
      for f in vmem_fastmap_list:
        for v in f.vmas:
          rcu_assign_pointer(v->vm_ops, new_mod.vm_ops)
          rcu_assign_pointer(v->vm_file->f_op, new_mod.file_ops)
    })
    new_mod.refcnt += old_mod.refcnt                   # migrate outstanding references
    old_mod.refcnt = 0
    synchronize_rcu()                                  # wait out the grace period
    module_put(old_mod)                                # now safe to unload the old core
(Zheng et al., 13 Nov 2025)

3. Training Objectives, Performance Metrics, and Evaluation Protocols

MemoryLLM (Neural Model)

  • Training Losses:
    • $L_{\text{new}}$: next-token prediction after a self-update, evaluated with and without gradients flowing through the newly inserted memory tokens.
    • $L_{\text{cont}}$: loss over sequential document updates, measuring long-range integration.
    • $L_{\text{forget}}$: alternating main/side document updates to quantify controlled forgetting.
  • Evaluation:
    • Model editing: ZsRE and CounterFactual benchmarks, reporting efficacy, generalization, specificity, and their harmonic mean (score).
    • Long-context QA: LongBench, F1 score at context lengths up to 100k tokens.
    • Retention: SQuAD and NaturalQA; accuracy after repeated unrelated updates is compared against the exponential decay bound (a worked example follows this list):

$$a_t^{\mathrm{bound}} = (a_1 - a_{\mathrm{base}})\left(1-\frac{K}{N}\right)^{t-1} + a_{\mathrm{base}}$$

    • Integrity: accuracy on just-injected items after 650,000 updates, to detect drift or catastrophic forgetting.
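
As a worked example of this bound under the Section 5 hyperparameters ($K = 256$, $N = 7680$), the following sketch reproduces the ~0.53 figure from the table in Section 4; the floor accuracy $a_{\mathrm{base}}$ is not stated explicitly above and is an assumed value chosen for illustration.

# Worked check of the exponential retention bound.
# a_base is an assumed floor accuracy for illustration only.
K, N = 256, 7680            # updated slots per self-update / pool size
a1, a_base = 0.80, 0.23     # a1: SQuAD accuracy from Section 4; a_base: assumed
t = 20                      # number of unrelated updates
decay = (1 - K / N) ** (t - 1)
bound = (a1 - a_base) * decay + a_base
print(f"decay={decay:.3f}  bound={bound:.2f}")   # decay≈0.525, bound≈0.53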

Vmem (Cloud Memory Pool)

  • Performance Metrics:
    • Sellable memory increase ($\Delta M \approx 2\%$).
    • VM boot time (373 GB VM: 100 s with Hugetlb vs. 0.6 s with Vmem).
    • Network throughput ($+10\%$ on DPU-accelerated VMs).
    • Metadata overhead ($O_{\mathrm{Vmem}} \approx 5$ MB vs. $O_{\mathrm{OS}} \approx 6$ GB on a 384 GB host).
  • Upgrade Latency:
    • Mean 2.1 μs, 99th-percentile 3.5 μs per hot-upgrade event.

4. Comparative Quantitative Results

| Model / System | Benchmark | Legacy Baseline | Prior Art | Self-Updatable Pool Variant |
|----------------|-----------|-----------------|-----------|-----------------------------|
| MemoryLLM-7B | ZsRE (score) | Llama2-7B: 55.6 | ROME: 69.3 | 79.2 |
| MemoryLLM-7B | CounterFactual (score) | Llama2-7B: 20.7 | ROME: 69.2 | 75.3 |
| MemoryLLM-7B | Knowledge retention ($a_1$) | | | SQuAD: 0.80 / NatQA: 0.75 |
| MemoryLLM-7B | Accuracy after 20 unrelated updates | | | ~0.50 (bound ~0.53) |
| Vmem | Sellable memory ($\Delta M$) | | | ~2% increase |
| Vmem | VM boot time (373 GB VM) | 100 s (Hugetlb) | | 0.6 s |
| Vmem | Hot-upgrade latency | | | 2.1–3.5 μs |
| Vmem | Metadata overhead (384 GB host) | 6 GB (struct page) | | 5 MB (<0.0013× OS) |

(Wang et al., 7 Feb 2024; Zheng et al., 13 Nov 2025)

5. Hyperparameters, Scalability, and Trade-Offs

MemoryLLM (hyperparameters for 7B backbone):

  • Layers $L = 32$, hidden size $d = 4096$, $N \approx 7680$ memory tokens per layer, $K = 256$ slots updated per input ($K/N \approx 0.0333$).
  • Update frequency: once per injected paragraph. Updates are pure insertions/deletions at inference time (no learning rate).

Vmem (scaling characteristics):

  • Slice tracking per node: 2 MB granularity. Metadata: ~120 B per VM, 24 B per segment.
  • Overhead grows linearly with VM count and slice count, yet remains ≲5 MB on large hosts, even at fleet scale (e.g., 300,000 servers, hundreds of millions of VMs); a rough estimate follows this list.
  • Upgrade overhead remains at μs scale. Data-structure compatibility must be rigorously maintained; fields can be added only into reserved padding, limiting flexibility for large structural changes.
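
A back-of-the-envelope estimator built from the per-VM, per-segment, and per-slice figures above; the VM and segment counts, and the additive composition of the total, are assumptions for illustration.

# Rough metadata estimate from the figures above.
# The VM and segment counts are assumed for illustration.
PER_VM_B, PER_SEG_B, PER_SLICE_B = 120, 24, 1   # bytes per VM / segment / slice

def vmem_metadata_bytes(n_vms, n_segments, host_bytes):
    n_slices = host_bytes // (2 * 1024**2)       # 2 MB slice granularity
    return n_vms * PER_VM_B + n_segments * PER_SEG_B + n_slices * PER_SLICE_B

host = 384 * 1024**3                             # 384 GB host
est = vmem_metadata_bytes(n_vms=1000, n_segments=100_000, host_bytes=host)
print(f"{est / 2**20:.1f} MiB")                  # a few MiB, vs ~6 GB of struct page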

Trade-offs include explicit forgetting (MemoryLLM, via replacement and exponential decay), complexity in testing hot-swap code paths (Vmem), limited extension flexibility for complex changes, and concurrency contention (Vmem) during upgrade.

6. Application Domains and Significance

LLMs and Continual Learning:

Self-updatable memory pools such as those in MemoryLLM enable post-deployment injection of new knowledge and long-term information retention, thus bridging the gap between static pre-trained models and dynamically updatable knowledge bases. Empirically, such models surpass existing architectural and model-editing baselines on efficacy, generalization, specificity, and integrity after repeated updates (Wang et al., 7 Feb 2024). This supports scalable, practical deployment in settings with evolving knowledge requirements.

Cloud Infrastructure:

Vmem demonstrates how self-updatable memory pools can enable production cloud platforms to maximize sellable memory, decrease VM start latency, improve network throughput, and support live module upgrades without disrupting running VMs. The separation of a stable interface and a swappable core logic module, combined with fast metadata and mapping techniques, supports highly elastic, stable, and scalable operations in environments serving hundreds of millions of VMs (Zheng et al., 13 Nov 2025).

These architectures illustrate convergent innovation in self-updatable pools for both AI systems and systems infrastructure, emphasizing design patterns of hierarchical separation, fine-grained memory tracking, upgrade-safe logic indirection, and lightweight metadata management.

7. Future Directions and Open Considerations

The design of self-updatable memory pools points to future work in several areas:

  • For neural models: integration of more sophisticated memory selection/replacement policies, adaptive memory windowing, and mechanisms to mitigate information loss beyond exponential decay.
  • For cloud platforms: increasing extension flexibility without sacrificing upgrade latency or memory overhead, enhancing compatibility across heterogeneous infrastructure, and further reducing contention during massive concurrent operations.

A plausible implication is that the separation-of-concerns and indirection techniques used in both LLM and cloud domains may serve as a template for designing further self-updating, low-downtime, and high-availability subsystems across a range of computational platforms. Current evidence demonstrates that with appropriate architecture and update strategies, self-updatable memory pools can jointly achieve high efficiency, flexibility, and operational stability (Wang et al., 7 Feb 2024; Zheng et al., 13 Nov 2025).
