
Memory Composer in Multi-Agent Systems

Updated 25 February 2026
  • Memory Composer is a specialized module that customizes and separates memory representations to mitigate homogenization and information overload in complex systems.
  • It employs lightweight Transformer networks and ILP-based cache partitioning to generate fixed-length, high-relevance memory tokens or allocated cache sets.
  • Empirical results demonstrate significant gains in accuracy, inference efficiency, and cache performance across LLM-powered multi-agent systems and embedded multiprocessor platforms.

A Memory Composer is a specialized architectural or algorithmic module that enforces separation, customization, or compositionality of memory representations for individual agents, tasks, or components within multi-agent systems or parallel processor environments. The intent is to mitigate key bottlenecks such as homogenization, information overload, or inter-component interference, thereby providing predictable, efficient, and high-utility memory interfaces between data stores and computation modules. Memory Composers have emerged prominently in LLM-powered multi-agent systems (MAS) and in compositional cache design for embedded multiprocessor architectures, with rigorous quantitative validation in both settings (Fu et al., 3 Feb 2026, 0710.4658).

1. Motivation and Definition

Memory Composers address critical bottlenecks in scalable intelligent systems. In LLM-powered MAS, conventional memory architectures result in:

  • Memory homogenization, where agents or roles share undifferentiated memory, precluding specialization;
  • Information overload, arising from large, fine-grained, or unboundedly long memory entries, rapidly exhausting model context windows.

In real-time parallel processing on multiprocessors, traditional shared caches are non-compositional: tasks evict each other's data unpredictably, making per-task timing and system integration difficult. Compositionality here refers to a property wherein system-level performance can be derived solely from the properties of the individual components, regardless of system configuration (0710.4658).

A Memory Composer serves as an explicit, learnable (or programmable in hardware) module that mediates between a non-parametric store of experiences (trajectories or cache sets) and a computational core (LLM backbone or processing core). It produces specialized, fixed-length, and high-relevance memory representations for each agent or task, either as latent vectors in the LLM setting (Fu et al., 3 Feb 2026) or as allocated cache sets in the hardware setting (0710.4658).
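As a minimal illustration of this shared contract (the class and method names below are hypothetical, not drawn from either paper), a Memory Composer can be viewed as an interface that maps a component identity plus raw store entries to a fixed-size memory view:

```python
from abc import ABC, abstractmethod
from typing import Any, Sequence


class MemoryComposer(ABC):
    """Mediates between a raw experience/cache store and a compute core.

    Given a component identity (agent role or task ID) and retrieved raw
    entries, it must return a *fixed-size* specialized memory view, so the
    consumer's resource usage is bounded regardless of store size.
    """

    @abstractmethod
    def compose(self, component_id: int, raw_entries: Sequence[Any]) -> Any:
        """Return a fixed-size memory representation for this component."""


class TruncatingComposer(MemoryComposer):
    """Toy instance: keep only the most recent `fixed_length` entries."""

    def __init__(self, fixed_length: int):
        self.fixed_length = fixed_length

    def compose(self, component_id: int, raw_entries: Sequence[Any]) -> list:
        # The fixed output budget mirrors L' latent tokens in the LLM
        # setting and c(t_i) allocated cache sets in the hardware setting.
        return list(raw_entries)[-self.fixed_length:]
```

The toy truncation policy stands in for whatever specialization logic a concrete composer learns or implements; only the bounded-output contract is essential.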

2. Architectural Implementations

2.1 Latent Memory Composer in MAS

Within LatentMem, the Memory Composer is instantiated as a lightweight Transformer network σ_φ, initialized from the target LLM but kept compact via LoRA adaptation (rank r=16, scaling α=32). It takes as input:

  • Retrieved trajectories T_q = {τ_i}, from a non-parametric experience bank, passed as text;
  • Agent role profile γ_k, which is a discrete identifier encoded as a learnable embedding.

For an agent step j, the composer synthesizes:

m_j = \sigma_\phi(\gamma_{\alpha_j}, T_q) \in \mathbb{R}^{L' \times D}

with L′ (latent length) typically set to 8 and D determined by LLM hidden size (4096 or 8192). m_j is then prepended as latent tokens to the agent's prompt, with the LLM backbone kept entirely frozen (Fu et al., 3 Feb 2026).
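The shape contract of this forward pass can be sketched in plain numpy. This is not the paper's architecture (which is a LoRA-adapted copy of the target LLM); it is a toy cross-attention pooling that maps a role embedding plus a variable-length retrieved trajectory into exactly L′ latent tokens of width D:

```python
import numpy as np

rng = np.random.default_rng(0)

L_PRIME, D, N_ROLES, VOCAB = 8, 64, 4, 100  # toy sizes (paper: L'=8, D=4096/8192)

# Learnable parameters (randomly initialized here; trained via LMPO in the paper)
role_embed = rng.normal(size=(N_ROLES, D))   # gamma_k role embeddings
queries = rng.normal(size=(L_PRIME, D))      # L' latent query slots
token_embed = rng.normal(size=(VOCAB, D))    # stand-in for a text encoder


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def compose(role_id, trajectory_tokens):
    """Map (role, retrieved trajectory) -> fixed-length latent memory m_j."""
    kv = token_embed[trajectory_tokens]                         # (T, D)
    kv = np.concatenate([role_embed[role_id:role_id + 1], kv])  # prepend role
    attn = softmax(queries @ kv.T / np.sqrt(D))                 # (L', T+1)
    return attn @ kv                                            # (L', D)


m_j = compose(role_id=2, trajectory_tokens=[5, 17, 42, 7])
assert m_j.shape == (L_PRIME, D)  # fixed length regardless of trajectory size
```

The key property the sketch preserves is that the output size is (L′, D) whatever the retrieved trajectory length, which is what bounds the context cost of the prepended latent tokens.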

2.2 Compositional Memory Composer in Multiprocessors

In embedded parallel processors, the Memory Composer operates at the cache allocator level. The last-level cache is partitioned into exclusive sets per task and buffer:

  • Tasks {t_i} and buffers {b_j} are allocated c(t_i) and c(b_j) sets, respectively, with \sum_i c(t_i) + \sum_j c(b_j) = S;
  • Load/store requests include a task- or buffer-ID, remapped via hardware address translation to ensure exclusivity
  • Predictable hit/miss behavior results: e.g., if c(b_j) matches buffer size, all post-cold accesses hit.

Allocation variables (x_{i,k}, y_{j,k}) select one cache-set size from a discrete candidate set for each entity (0710.4658).
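A toy Python model of the address-translation idea is sketched below (field widths, entity names, and allocation values are illustrative, not the paper's): each load/store carries an entity ID, and its set index is remapped into that entity's exclusive range of cache sets.

```python
# Hypothetical per-entity allocation table: entity ID -> (base set, # sets).
# The c(t_i) / c(b_j) values must sum to the total number of sets S.
S = 1024                      # total L2 sets
alloc = {                     # entity_id: (first_set, num_sets)
    "t0": (0, 256),
    "t1": (256, 256),
    "b0": (512, 512),
}
assert sum(n for _, n in alloc.values()) == S

LINE_BITS = 6                 # 64-byte cache lines


def remap_set_index(entity_id, address):
    """Translate an address's natural set index into the entity's partition."""
    base, num_sets = alloc[entity_id]
    natural_set = (address >> LINE_BITS) % S   # conventional set indexing
    return base + natural_set % num_sets       # confined to the entity's sets


# Two tasks touching the same address can never evict each other's line:
a = 0xDEADBEEF
assert remap_set_index("t0", a) != remap_set_index("t1", a)
```

Because every entity's requests land only in its own set range, per-entity hit/miss behavior measured in isolation carries over unchanged to concurrent execution, which is exactly the compositionality property.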

3. Optimization and Learning Methodologies

3.1 Latent Memory Policy Optimization (LMPO)

LMPO propagates task-level optimization gradients through the latent memory composer σ_φ, leaving the underlying LLM policies π_{θ_{α_j}} entirely frozen for modularity. Grouped rollouts are evaluated under the current composer, rewards R(\hat τ_i) are computed, and standardized group advantages are formed. A token-level PPO-style surrogate loss is used:

\mathcal{J}_{\mathrm{LMPO}}(\phi) = \mathbb{E}_{q,\{\hat\tau_i\}}\Biggl[\, \frac{1}{\sum_{i,j,t} 1} \sum_{i,j,t} \min\Bigl(r_{i,j,t}(\phi)\,\hat A_i,\ \mathrm{clip}\bigl(r_{i,j,t}(\phi),\, 1-\varepsilon,\, 1+\varepsilon\bigr)\,\hat A_i\Bigr)\Biggr],

where r_{i,j,t}(\phi) is the importance ratio between new and old latent memories (Fu et al., 3 Feb 2026).
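The surrogate itself is straightforward to compute. The numpy sketch below evaluates the clipped objective for a flat batch of tokens with illustrative numbers (variable names are ours, not the paper's):

```python
import numpy as np


def lmpo_surrogate(logp_new, logp_old, advantages, eps=0.2):
    """Token-level clipped surrogate J_LMPO.

    logp_new / logp_old: (N_tokens,) log-probs of rollout tokens under the
    new / old composer parameters phi; advantages: (N_tokens,) standardized
    group advantage A_i, broadcast to each token of rollout i.
    """
    ratio = np.exp(logp_new - logp_old)          # r_{i,j,t}(phi)
    clipped = np.clip(ratio, 1 - eps, 1 + eps)
    # Mean over all tokens = (1 / sum_{i,j,t} 1) * sum min(...)
    return np.minimum(ratio * advantages, clipped * advantages).mean()


logp_old = np.array([-1.0, -2.0, -0.5])
logp_new = np.array([-0.9, -2.5, -0.5])
adv = np.array([1.0, 1.0, -1.0])                 # standardized advantages
j = lmpo_surrogate(logp_new, logp_old, adv)
```

Maximizing this objective in φ moves the composer toward latent memories whose rollouts earned above-average group reward, while the clip keeps each update close to the old composer.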

3.2 ILP-based Cache Partitioning

The hardware Memory Composer uses an integer-linear programming (ILP) approach to optimize the allocation of cache sets. The objective is either the minimization of total L2 misses or the maximization of throughput, subject to constraints on cache size and unique assignment per entity. Performance is directly predictable from per-task and per-partitioned buffer measurements given the compositionality guarantee. The analytical formulations enable efficient, optimal division of limited cache resources (0710.4658).
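The paper solves this with an ILP solver; at toy scale the same selection problem can be sketched by exhaustive search over the discrete candidate sizes, using hypothetical per-entity miss counts measured in isolation:

```python
from itertools import product

S = 64                                    # total cache sets (toy scale)
candidates = [8, 16, 32, 48]              # discrete per-entity set counts

# Hypothetical measured miss counts per entity per candidate allocation
# (in practice measured in isolation; misses shrink as sets grow).
misses = {
    "t0": {8: 900, 16: 500, 32: 200, 48: 150},
    "t1": {8: 400, 16: 250, 32: 120, 48: 100},
    "b0": {8: 300, 16: 100, 32: 40, 48: 30},
}


def optimal_partition(misses, total_sets):
    """Pick one candidate size per entity, minimizing total misses
    subject to the capacity constraint sum c(e) <= total_sets."""
    entities = list(misses)
    best, best_cost = None, float("inf")
    for sizes in product(candidates, repeat=len(entities)):
        if sum(sizes) > total_sets:
            continue
        cost = sum(misses[e][c] for e, c in zip(entities, sizes))
        if cost < best_cost:
            best, best_cost = dict(zip(entities, sizes)), cost
    return best, best_cost


alloc, total_misses = optimal_partition(misses, S)
assert sum(alloc.values()) <= S
```

Compositionality is what makes this tractable: because partitions are exclusive, each entity's miss curve measured alone remains valid in the combined system, so the objective is a simple sum.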

4. Integration Workflows

4.1 MAS Agent Memory Pipeline

The system follows these steps:

  1. For a new query q, retrieve top-K trajectories T_q from the experience bank;
  2. For each agent step j:
    • Compute m_j = σ_φ(γ_{α_j}, T_q);
    • Prepend m_j to agent’s prompt embeddings;
    • Call the frozen LLM policy.
  3. Append completed trajectory τ to the experience bank for continual adaptation.

Empirically, L′=8 and retrieval size K=1 are sufficient, with the composer comprising 2–4 Transformer layers (Fu et al., 3 Feb 2026).
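The pipeline above can be sketched as a minimal loop with stub components (all names are hypothetical; the real system uses a trained Transformer composer and a frozen LLM backbone):

```python
experience_bank = []          # non-parametric store of past trajectories


def retrieve(query, k=1):
    """Top-K retrieval stub: most recent k trajectories."""
    return experience_bank[-k:]


def compose(role_id, trajectories, l_prime=8):
    """Composer stub: fixed-length memory (last l_prime steps seen)."""
    flat = [step for traj in trajectories for step in traj]
    return flat[-l_prime:]                  # bounded regardless of history


def frozen_llm(role_id, latent_memory, query):
    """Frozen-policy stub: echoes what it was conditioned on."""
    return f"role={role_id} mem={len(latent_memory)} q={query}"


def run_query(query, roles=(0, 1)):
    t_q = retrieve(query, k=1)              # step 1: retrieve T_q
    trajectory = []
    for role in roles:                      # step 2: per-agent-step loop
        m_j = compose(role, t_q)            #   synthesize m_j
        trajectory.append(frozen_llm(role, m_j, query))
    experience_bank.append(trajectory)      # step 3: continual adaptation
    return trajectory


out = run_query("what is 2+2?")
```

Note that only the composer would carry trainable parameters here; retrieval, the bank, and the LLM call are fixed infrastructure, which is what lets the scheme retrofit onto existing MAS frameworks.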

4.2 Real-Time Cache Partitioning

The partitioning procedure consists of:

  • Measuring task-specific and buffer-specific miss and execution profiles in isolation;
  • Solving the ILP for optimal assignment of sets;
  • Programming the OS and address remapping hardware with the allocation table to guarantee isolation and compositional predictability during concurrent execution (0710.4658).

5. Empirical Performance and Sensitivity

5.1 Latent Memory Composer in MAS

Across six benchmarks (TriviaQA, KodCode, StrategyQA, PopQA, BigCodeBench, PDDL) and four MAS frameworks (AutoGen, MacNet, CAMEL, DyLAN), LatentMem attains average accuracy improvements of 6–8% over single- and multi-agent memory baselines, with peak gains up to 19.36%. Key observations include:

  • Role-aware latent memories yield >2% absolute improvement;
  • Fixed-length latent tokens (L′=8) cut inference costs by ≈50% over textual memory approaches;
  • Disabling continual memory bank updates leads to performance drops of up to 7.6%, emphasizing adaptation (Fu et al., 3 Feb 2026).

5.2 Compositional Memory Composer in Multiprocessors

On the CAKE platform (4×Trimedia cores, 512KB L2), compositional partitioning:

  • Reduces L2 miss rates by ~5× on mixed apps (JPEG+Canny) and ~6.5× on MPEG-2 decoder;
  • Delivers up to 20% per-core CPI improvements for memory-bound tasks;
  • Ensures expected vs. measured miss counts deviate by ≤2%, confirming analytical predictability (0710.4658).

Table: Quantitative Memory Composer Results

| Context | Metric | Composer Gain |
|---|---|---|
| MAS (LatentMem) | Avg. accuracy increase | 6–8% vs. vanilla baselines |
| MAS (TriviaQA) | Peak gain | +16.20% (AutoGen/Qwen-3B) |
| MAS (KodCode) | Peak gain | +18.45% (AutoGen/Llama-8B) |
| Embedded HW | L2 miss reduction | ~5× (JPEG+Canny), ~6.5× (MPEG-2) |
| Embedded HW | CPI improvement | Up to 20% (memory-bound tasks) |

6. Design Principles and Insights

Key guidelines are evident across domains:

  • Role-aware customization is essential to avoid homogenization and maximize utility per agent or task;
  • Fixed-bound memory/partition size constrains both computational and inference resource consumption without sacrificing performance;
  • Compositional mapping (per-ID partitioning) is crucial for both analytical predictability and robust adaptation;
  • Continual adaptation via experience bank or partition reallocation is necessary to sustain peak performance amid changes.

In hardware, discretizing partition sizes and enforcing static task-processor binding are recommended for maximal predictability. In LLM-based MAS, lightweight composer models and frozen backbones isolate the adaptation process, facilitating retrofitting to diverse frameworks (Fu et al., 3 Feb 2026, 0710.4658).

7. Broader Implications and Extensions

The Memory Composer paradigm generalizes across domains where multi-component systems require either role-sensitive contextualization or allocation of finite, shared memory resources. Its implementations highlight the advantages of compositionality—not only in predictability and efficiency but also in enabling new research and development workflows, such as continual integration of new roles or codecs without re-tuning all downstream policies or pipelines. A plausible implication is the expansion of this architectural template into other areas such as distributed reinforcement learning, task scheduling in heterogeneous compute fabrics, or adaptive context selection for foundation models.
