
MemSkill Architecture: Adaptive Memory for LLMs

Updated 8 February 2026
  • MemSkill is a self-evolving memory-management architecture that models memory manipulation as a set of dynamic, natural-language skills.
  • It integrates a controller, executor, and designer in a closed-loop system to adaptively select, apply, and evolve memory update strategies.
  • Empirical evaluations demonstrate that MemSkill, trained with reinforcement learning, outperforms static baselines on benchmarks such as LoCoMo, LongMemEval, and ALFWorld.

MemSkill is a self-evolving memory-management architecture for LLM agents, designed to eliminate the rigidity and inefficiency of static, hand-crafted memory operations. Instead of fixed procedures, MemSkill models memory manipulation as a set of structured, evolvable "memory skills," each represented by natural-language templates and selected dynamically according to context. At its core, the architecture is governed by a closed loop in which a controller learns to select relevant skills, an executor uses these skills to alter the agent’s memory, and a designer evolves the skill set by addressing recurring failures. This system jointly optimizes both the memory update policy and the evolving repertoire of skills, enabling adaptive and generalizable memory management across a variety of agent settings and interaction regimes (Zhang et al., 2 Feb 2026).

1. Design Objectives and Foundational Paradigms

MemSkill addresses three principal design goals:

  • Minimization of human priors: Memory management behaviors emerge from learning on agent data, not from hand-designed rules or heuristics.
  • Flexible extraction granularity: Skills are applicable to arbitrary text spans, rather than operating solely at the per-turn level.
  • Compositional, context-sensitive memory construction: In each generation, a small, context-dependent subset of skills is selected and composed by the controller for application by the executor.

MemSkill maintains two persistent structures:

| Store type | Granularity | Role |
| --- | --- | --- |
| Memory bank | Per-interaction/trace | Extracts, consolidates, and prunes facts for the current trajectory |
| Skill bank | Shared across traces | Stores and evolves reusable natural-language skill templates |

These structures are dynamically updated through alternating phases of skill-use learning (controller and executor) and skill evolution (designer).

2. Core System Components

MemSkill consists of three principal modules: controller, executor, and designer, together orchestrating a continual process of skill learning and evolution.

2.1 Skill Bank

  • Initialization: Contains four primitive skills: INSERT, UPDATE, DELETE, SKIP.
  • Skill Representation: Each skill $s \in S$ encapsulates:
    • A short selection-oriented description.
    • A detailed, structured template describing its purpose, conditions for invocation, operational instructions, constraints, and action type.
  • Evolution: Over time, new skills are added or existing ones refined according to designer feedback emerging from empirical agent failures.
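The two-part skill representation can be pictured as a small record plus a rendering step; the field names and helper below are illustrative stand-ins, not the paper's exact schema:

```python
# Hypothetical sketch of one skill-bank entry. Field names ("purpose",
# "when_to_use", ...) are illustrative; the paper only specifies that each
# skill has a short selection description and a detailed structured template.
UPDATE_SKILL = {
    "name": "UPDATE",
    "description": "Revise an existing memory item when new information supersedes it.",
    "template": {
        "purpose": "Keep stored facts consistent with the latest evidence in the span.",
        "when_to_use": "The span contradicts or refines an item already in memory.",
        "instructions": "Identify the conflicting memory index, then rewrite that item.",
        "constraints": "Modify at most one item per action; preserve unrelated facts.",
        "action_type": "UPDATE",
    },
}

def render_skill_prompt(skill: dict) -> str:
    """Flatten a skill template into the text block handed to the executor LLM."""
    t = skill["template"]
    lines = [f"Skill: {skill['name']} - {skill['description']}"]
    lines += [f"  {k}: {v}" for k, v in t.items()]
    return "\n".join(lines)
```

The short `description` is what the controller embeds for selection, while the full template is what the executor sees at generation time.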

2.2 Controller (Skill-Selection Policy)

  • Context Encoding: Each processing step $t$ embeds the current input span $x_t$ and the corresponding retrieved memories $M_t$ into a shared vector space: $h_t = f_{\rm ctx}(x_t, M_t)$.
  • Skill Embedding: Skill descriptions $\mathrm{desc}(s_i)$ are embedded as $u_i = f_{\rm skill}(\mathrm{desc}(s_i))$, maintaining compatibility with a dynamically changing skill bank.
  • Skill Scoring and Selection: Compute selection scores and sample an ordered Top-$K$ skill set $A_t$ using a Gumbel-Top-$K$ strategy, with probabilities:

$$p_\theta(i \mid h_t) = \frac{\exp(z_{t,i})}{\sum_j \exp(z_{t,j})}, \quad \text{where } z_{t,i} = h_t \cdot u_i.$$
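The Gumbel-Top-K selection step fits in a few lines; the toy embeddings below are illustrative stand-ins for the learned encoders $f_{\rm ctx}$ and $f_{\rm skill}$:

```python
import numpy as np

def gumbel_top_k(h, U, k, rng):
    """Sample an ordered Top-K skill set without replacement.

    h: (d,) context encoding; U: (n_skills, d) skill-description embeddings.
    Adding i.i.d. Gumbel(0, 1) noise to the logits and taking the top-k
    indices is equivalent to sequentially sampling k distinct skills from
    the softmax distribution p(i | h) proportional to exp(h . u_i).
    """
    z = U @ h                        # selection logits z_{t,i} = h_t . u_i
    g = rng.gumbel(size=z.shape)     # Gumbel(0, 1) noise
    return np.argsort(-(z + g))[:k]  # ordered Top-K skill indices
```

Because skills enter the score only through their description embeddings, the same controller scores a skill bank of any size, which is what lets the designer add skills later without retraining the selection head from scratch.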

2.3 Executor (Skill-Conditioned Memory Construction)

  • Input: Receives the span $x_t$, current memory set $M_t$, and selected skills $\{s_{a_{t,1}}, \ldots, s_{a_{t,K}}\}$ via a fixed prompt to an LLM.
  • Action Generation: Outputs a structured sequence of skill-parameterized memory update actions in a single LLM generation step. Actions include:
    • INSERT: Creating new memory items.
    • UPDATE: Revising specific memory items by index.
    • DELETE: Removing specified memory items.
  • Effect: Updates the trace-specific memory bank according to parsed actions.
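Applying the parsed actions to the trace-specific memory bank can be sketched as below; the action schema (`op`, `index`, `text`) is illustrative, not the paper's exact format:

```python
def apply_actions(memory: list[str], actions: list[dict]) -> list[str]:
    """Apply a parsed sequence of executor actions to a memory bank.

    Each action is a dict like {"op": "INSERT" | "UPDATE" | "DELETE",
    "index": int, "text": str}. Actions are applied in order, so an
    index always refers to the memory state at the time it is applied.
    """
    memory = list(memory)  # work on a copy of the trace-specific bank
    for a in actions:
        if a["op"] == "INSERT":
            memory.append(a["text"])          # create a new memory item
        elif a["op"] == "UPDATE":
            memory[a["index"]] = a["text"]    # revise an item by index
        elif a["op"] == "DELETE":
            memory.pop(a["index"])            # remove an item by index
        # SKIP (or an unrecognized op) leaves memory unchanged
    return memory
```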

2.4 Designer (Skill Evolution Mechanism)

  • Activation: Triggered periodically (every $E$ training steps).
  • Stage 1—Hard-Case Aggregation:
    • Maintains a buffer of recent query failures with associated metadata: query, retrieved memories, answers, ground truth, scalar reward $r(q)$, and failure count $c(q)$.
    • Assigns a difficulty score: $d(q) = (1 - r(q)) \cdot c(q)$.
    • Clusters queries and samples representative problematic cases using, for instance, k-means in embedding space.
  • Stage 2—LLM-Guided Skill Update:
    • Prompts an LLM with the aggregated hard cases and the current skill bank to produce:
      • Refined templates for existing skills.
      • Proposals for new skills addressing as-yet-uncovered memory behaviors.
    • Applies at most $M$ changes per evolution cycle.
    • Reverts to earlier snapshots if validation rewards post-update do not improve.
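Stage 1 can be sketched as follows; the clustering itself (e.g. k-means on query embeddings) is assumed to happen elsewhere, and the dict keys are illustrative:

```python
def select_hard_cases(failures, clusters, per_cluster=1):
    """Pick representative hard cases for the designer's LLM prompt.

    failures: list of dicts with reward "r" (r(q) in [0, 1]) and failure
    count "c" (c(q)); clusters: parallel list of cluster ids, e.g. from
    k-means in embedding space. Difficulty is d(q) = (1 - r(q)) * c(q),
    and the hardest case(s) per cluster are returned.
    """
    by_cluster = {}
    for f, cid in zip(failures, clusters):
        f = dict(f, difficulty=(1.0 - f["r"]) * f["c"])
        by_cluster.setdefault(cid, []).append(f)
    picks = []
    for cid, group in sorted(by_cluster.items()):
        group.sort(key=lambda f: f["difficulty"], reverse=True)
        picks.extend(group[:per_cluster])
    return picks
```

Weighting by $c(q)$ means queries that keep failing across evolution cycles dominate the designer's attention, rather than one-off low-reward outliers.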

New skill adoption is encouraged by temporarily biasing controller logits, ensuring at least a fraction $T$ of total probability mass is assigned to new skills for the first $\tau$ steps after introduction.

3. Algorithmic Procedure and Data Flow

Memory processing in MemSkill is structured around sequential, span-level analysis of agent interaction traces. The procedural flow comprises the following steps:

  1. Segmentation: Each input trace is split into spans $x_1, \ldots, x_T$.
  2. Iterative Processing (for $t = 1$ to $T$):
    • Retrieve the top $R$ relevant items $M_t$ from the current trace-specific memory bank.
    • Controller computes contextual encodings and samples an ordered Top-$K$ skill set for the current span.
    • Executor LLM generates and parses memory update actions under the selected skills.
    • Memory bank is updated accordingly.
  3. Trace Finalization:
    • Evaluate memory-dependent queries to obtain a terminal reward $R$.
    • Use PPO to update the controller’s policy based on this reward.
    • Log failures to the designer’s buffer for eventual skill evolution cycles.
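The span-level data flow of steps 1-2 can be summarized in a short loop; `memory_retrieve`, `controller`, and `executor` are stand-ins for the learned modules, and only INSERT is handled here for brevity:

```python
def process_trace(spans, memory_retrieve, controller, executor):
    """One pass over a segmented trace: retrieve -> select skills -> update.

    spans: the segmented input x_1..x_T. Each callable mimics one module:
    memory_retrieve returns the top-R relevant items, controller returns an
    ordered Top-K skill set, executor returns parsed memory-update actions.
    """
    memory = []  # trace-specific memory bank, empty at the start of a trace
    for span in spans:
        retrieved = memory_retrieve(span, memory)   # top-R relevant items M_t
        skills = controller(span, retrieved)        # ordered Top-K skill set
        actions = executor(span, retrieved, skills) # one LLM generation step
        for a in actions:                           # apply parsed actions
            if a["op"] == "INSERT":
                memory.append(a["text"])
    return memory
```

A toy run with trivial stand-ins (`executor` inserts the raw span) just copies the spans into memory, which makes the data flow easy to check end to end.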

4. Training Objectives and Optimization

4.1 Reinforcement Learning for Skill-Selection

  • Reward Structure: Only the terminal step receives the episode reward ($r_T = R$); all intermediate per-span rewards are zero.
  • Return Calculation:

$$G_t = \sum_{k=t}^{T} \gamma^{k-t} r_k$$

  • Policy Loss (PPO clipped surrogate):

$$L^{\rm policy}(\theta) = \mathbb{E}_t\left[\min\left(\rho_t \hat{A}_t,\ \mathrm{clip}(\rho_t, 1-\epsilon, 1+\epsilon)\,\hat{A}_t\right)\right]$$

with likelihood ratios $\rho_t$ computed from the current and previous controller parameters.

  • Value Loss:

$$L^{\rm value}(\phi) = \mathbb{E}_t\left[(V_\phi(h_t) - G_t)^2\right]$$

  • Entropy Regularization:

$$H(\theta) = \mathbb{E}_t\left[-\sum_i p_\theta(i \mid h_t)\log p_\theta(i \mid h_t)\right]$$

  • Overall Objective: Maximize

$$L^{\rm policy} - c_v L^{\rm value} + c_H H(\theta)$$
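The three terms combine into a short numpy sketch; the advantages $\hat{A}_t$ are assumed to be precomputed, and the helper names are illustrative:

```python
import numpy as np

def returns(rewards, gamma=0.99):
    """Discounted return G_t; here only the terminal step carries reward R."""
    G = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        G[t] = running
    return G

def ppo_objective(logp_new, logp_old, adv, values, G, probs,
                  eps=0.2, c_v=0.5, c_h=0.01):
    """Clipped surrogate - c_v * value loss + c_h * entropy (to be maximized)."""
    rho = np.exp(logp_new - logp_old)                     # likelihood ratios
    policy = np.minimum(rho * adv,
                        np.clip(rho, 1 - eps, 1 + eps) * adv).mean()
    value = ((values - G) ** 2).mean()                    # L^value
    entropy = -(probs * np.log(probs)).sum(axis=1).mean() # H(theta)
    return policy - c_v * value + c_h * entropy
```

Because rewards are terminal-only, $G_t = \gamma^{T-t} R$ for every span, so the discount factor alone spreads credit backward across the trace.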

4.2 Skill Exploration Incentivization

To ensure exploration and integration of newly introduced skills following designer-driven evolution, controller logits are biased for $\tau$ steps so that newly added skills collectively receive at least $T_t = T_0 (1 - t/\tau)$ probability mass at each step. This encourages rapid evaluation of new skills before annealing exploration pressure.
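One way to realize this bias, sketched under the assumption that a uniform shift is added to every new-skill logit (the paper does not specify the exact mechanism):

```python
import numpy as np

def bias_new_skill_logits(z, is_new, t, tau, T0=0.3):
    """Shift new-skill logits so their combined softmax mass is at least
    T_t = T0 * (1 - t / tau) during the first tau steps after introduction.
    T0 = 0.3 is an arbitrary placeholder value."""
    T_t = max(0.0, T0 * (1.0 - t / tau))
    p = np.exp(z - z.max())
    p /= p.sum()
    mass = p[is_new].sum()             # current mass on newly added skills
    if T_t <= 0.0 or mass >= T_t:
        return z                       # floor already satisfied
    # Adding b to every new-skill logit scales their unnormalized mass by
    # e^b; solve for b so the renormalized new-skill mass equals T_t.
    b = np.log(T_t / (1 - T_t)) - np.log(mass / (1 - mass))
    z = z.copy()
    z[is_new] += b
    return z
```

As $t$ approaches $\tau$ the floor $T_t$ anneals to zero, so the bias vanishes once the controller has had enough gradient signal to rank the new skills on its own.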

5. Experimental Results and Representative Ablations

MemSkill was benchmarked on LoCoMo and LongMemEval (long-context dialogues), HotpotQA (question answering with domain and format shift), and ALFWorld (simulated embodied tasks). Evaluation metrics include F1, LLM-judge scores, and task-specific success metrics.

Baseline comparisons included models such as No-Memory, Chain-of-Notes, ReadAgent, MemoryBank, A-MEM, Mem0, LangMem, and MemoryOS.

Key Results

  • On all evaluation suites and using two 70B–80B LLMs, MemSkill surpasses static-memory and RL-only baselines.
  • Demonstrated strong zero-shot transfer: skills learned on LoCoMo generalize to LongMemEval and HotpotQA without fine-tuning.
  • For embodied tasks, skill-conditioned memories increase success rates on both seen and unseen ALFWorld splits.

Ablation Results (LoCoMo, LLM-judge metric):

| Experiment Variant | LLaMA | Qwen |
| --- | --- | --- |
| Full MemSkill | 50.96 | 52.07 |
| – Controller replaced by random | 45.86 | 41.24 |
| – Designer replaced by static skills | 44.11 | 34.71 |
| – Designer: refinement only (no new skills) | 44.90 | 46.97 |

  • Learning to select relevant skills provides an approximate 5-point gain.
  • Designer-driven evolution yields an additional 6–17-point improvement, especially under model distribution shift.
  • This suggests that both adaptive skill selection and ongoing skill discovery are crucial for generalizable, robust memory systems in LLM agents.

Representative Evolved Skills

  • LoCoMo: “Capture Temporal Context,” “Capture Activity Details,” “Handle Entity Relationships,” “Refine Temporal Details with Context.”
  • ALFWorld: “Capture Action Constraints,” “Track Object Location,” “Track Object Movements.”

6. Interpretation and Implications

MemSkill empirically demonstrates that (i) memory operations can be abstracted as learnable skills rather than fixed routines, (ii) RL-based learning of skill selection yields measurable improvements in handling diverse and lengthy traces, and (iii) the skill bank itself benefits from continual, data-driven evolution based on agent error. The integration of controller, executor, and designer modules into a closed loop produces a memory manager for LLM agents that adapts and generalizes without fixed heuristics or reliance on static, human-encoded priors. This architecture offers evidence supporting emergent, compositional memory management as a practical paradigm for advanced agent systems (Zhang et al., 2 Feb 2026).
