- The paper introduces a unified bi-level memory system integrating semantic, episodic, and procedural memories to overcome long-horizon task challenges.
- The methodology employs a triadic multi-agent setup with actor, critic, and memory agents for online planning, reward annotation, and automatic memory consolidation.
- Empirical results across benchmarks like AlfWorld demonstrate significant improvements in task completeness compared to ReAct and other baselines.
AdMem: A Unified Memory Architecture for LLM-Based Task-Solving Agents
Context and Motivation
LLMs have reached a significant level of proficiency in tool-use, reasoning, and decision-making across agentic settings. However, their effectiveness in complex, long-horizon tasks is fundamentally bottlenecked by memory systems that are shallow, fragmented, or narrowly scoped. Most existing frameworks focus myopically on factual or episodic recall, or relegate procedural memory to offline, post-hoc replay of successful task instances. These designs suffer in online deployment: symbolic memories are not adaptively consolidated, procedural traces are not reward-annotated, and little attention is paid to self-improvement through continuous experience and credit assignment.
"AdMem: Advanced Memory for Task-solving Agents" (2606.06787) addresses these deficiencies by proposing an agent-centric memory architecture. The design realizes a bi-level memory system that organizes semantic, episodic, and procedural stores, supported by a multi-agent scaffolding that adapts and refines memory through parallelized actor, critic, and memory agents. The emphasis is on automatic memory generation, reward-based evaluation, active consolidation, and robust scalability in task-solving contexts.
Memory Architecture and Agent Design
The AdMem framework is grounded in a cognitive taxonomy: semantic memory for facts and world-models, episodic memory for event traces, and procedural memory for decision-guiding control policies. These memories are divided into short-term (STM) and long-term (LTM) layers:
- Short-Term Memory (STM): Maintained by the actor agent for context compaction within the active task, STM leverages stack-based context management inspired by program execution semantics, allowing for efficient local planning, subgoal decomposition, and scope-based summarization to retain salient working material.
- Long-Term Memory (LTM): Managed by a dedicated memory agent, LTM stores compressed semantic, episodic, and procedural entries. Semantic and episodic recall is supported by dense retrieval. Procedural entries are annotated with reward feedback and alternative strategies.
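A minimal sketch of the LTM layer as a typed store with dense retrieval may help make the split concrete. The names `MemoryEntry`, `LongTermMemory`, and `retrieve` are hypothetical. Embeddings are assumed to come from some external encoder, and plain cosine similarity stands in for whatever retriever the paper actually uses.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MemoryEntry:
    kind: str                # "semantic", "episodic", or "procedural"
    text: str                # compressed natural-language content
    embedding: List[float]   # dense vector from an external encoder (assumed given)
    reward: Optional[float] = None                          # procedural entries only
    alternatives: List[str] = field(default_factory=list)  # procedural entries only


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LongTermMemory:
    """Hypothetical LTM store: one list, filtered by memory type at query time."""

    def __init__(self) -> None:
        self.entries: List[MemoryEntry] = []

    def add(self, entry: MemoryEntry) -> None:
        self.entries.append(entry)

    def retrieve(self, query: List[float], kind: str, k: int = 3) -> List[MemoryEntry]:
        # Restrict to one memory type, then rank by similarity to the query.
        candidates = [e for e in self.entries if e.kind == kind]
        candidates.sort(key=lambda e: cosine(query, e.embedding), reverse=True)
        return candidates[:k]
```

Filtering by `kind` before ranking keeps the three stores separate while sharing one retrieval path. A deployed system would more likely use an approximate nearest-neighbor index than a linear scan.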
A crucial aspect is the role of a critic agent, which tracks actions and outcomes, annotates procedural traces with explicit reward signals, and supports expectation-maximization (EM)-style updates of each memory's estimated efficacy. The critic performs credit assignment by comparing expected with realized outcomes, then updates the procedural store according to performance, which in turn guides future retrieval.
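The critic's comparison of expected and realized outcomes can be sketched as follows. This is an illustrative assumption, not the paper's critic: a string match replaces the LLM judgment the critic would actually perform, rewards are binary, and the names `Step`, `AnnotatedStep`, and `critique` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    action: str
    expected: str  # outcome the actor predicted when it chose the action
    observed: str  # outcome the environment actually returned


@dataclass
class AnnotatedStep:
    action: str
    reward: float  # 1.0 if the prediction held, else 0.0
    note: str      # short reflection stored alongside the procedural trace


def critique(trace: List[Step]) -> List[AnnotatedStep]:
    """Annotate each step by comparing expected vs. realized outcome.

    A case-insensitive string match stands in for an LLM judgment.
    """
    annotated = []
    for step in trace:
        matched = step.expected.strip().lower() == step.observed.strip().lower()
        if matched:
            note = "outcome matched expectation"
        else:
            note = f"expected '{step.expected}', got '{step.observed}'"
        annotated.append(AnnotatedStep(step.action, 1.0 if matched else 0.0, note))
    return annotated
```

Keeping the mismatch note next to the reward is what lets a stored procedure carry information about past errors, not just a success flag.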
Together, these agents realize a fully automatic pipeline:
- The actor agent interacts with the environment, plans and executes using STM, and retrieves relevant entries from LTM.
- The critic agent observes actions, judges them post-hoc with respect to observed outcomes, and produces reflective, reward-marked procedural memories.
- The memory agent consolidates and prunes memories using merge, replicate, evict, and rank operations according to reward statistics.
This triadic interaction mitigates catastrophic forgetting, supports hierarchical planning, and enables continual online learning.
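The memory agent's consolidation pass (merge, evict, and rank by reward statistics) might look roughly like the sketch below. The `ProcEntry` type and the thresholds `merge_threshold`, `min_trials`, and `min_reward` are hypothetical; the replicate operation is omitted, and merging simply pools success and trial counts into the first near-duplicate found.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ProcEntry:
    text: str
    embedding: List[float]
    successes: int = 0  # credited successes accumulated by the critic
    trials: int = 0     # times the entry was retrieved and credited

    @property
    def mean_reward(self) -> float:
        return self.successes / self.trials if self.trials else 0.0


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def consolidate(entries: List[ProcEntry], merge_threshold: float = 0.95,
                min_trials: int = 3, min_reward: float = 0.2) -> List[ProcEntry]:
    # Merge: fold near-duplicates into the first similar entry, pooling statistics.
    # Inputs are copied, so the caller's entries are left unchanged.
    merged: List[ProcEntry] = []
    for e in entries:
        for m in merged:
            if cosine(e.embedding, m.embedding) >= merge_threshold:
                m.successes += e.successes
                m.trials += e.trials
                break
        else:
            merged.append(ProcEntry(e.text, list(e.embedding), e.successes, e.trials))
    # Evict: drop entries that have enough evidence and still perform poorly.
    kept = [m for m in merged
            if not (m.trials >= min_trials and m.mean_reward < min_reward)]
    # Rank: best empirical mean reward first.
    kept.sort(key=lambda m: m.mean_reward, reverse=True)
    return kept
```

Requiring `min_trials` before eviction avoids discarding a new strategy on the strength of a single unlucky outcome.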
Memory Management and Retrieval
A central technical innovation is the reward-augmented management and retrieval protocol:
- Procedural memory entries are evaluated both by similarity to the current context (via embedding-space retrieval) and by an adaptive utility parameter, $v_m$, that estimates how effective the entry is when actually used to guide decisions.
- Each time an action is credited with success or failure (as determined by the critic's annotation), the contributing procedural entries are updated by EM-driven maximum-likelihood estimation, in a manner similar to multi-armed bandit methods.
- Redundant or rarely used memories are merged or evicted based on access frequency and retrieval co-occurrence statistics, with context-similarity thresholds governing pruning and consolidation.
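One way to picture reward-augmented retrieval is a score that blends embedding similarity with the adaptive utility $v_m$, plus an update that credits every contributing entry with the critic's verdict. As an explicitly labeled substitute for the paper's EM procedure, this sketch keeps Beta-Bernoulli success/failure counts per entry and uses their posterior mean as $v_m$, a standard bandit-style estimate. The blend `weight` and all names (`Entry`, `score`, `retrieve`, `credit`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    text: str
    embedding: List[float]
    alpha: float = 1.0  # prior pseudo-count + credited successes
    beta: float = 1.0   # prior pseudo-count + credited failures

    @property
    def utility(self) -> float:
        # Posterior mean of a Beta-Bernoulli model, used here as v_m.
        return self.alpha / (self.alpha + self.beta)


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score(entry: Entry, query: List[float], weight: float = 0.5) -> float:
    # Blend context similarity with estimated usefulness.
    return weight * cosine(query, entry.embedding) + (1.0 - weight) * entry.utility


def retrieve(entries: List[Entry], query: List[float], k: int = 2,
             weight: float = 0.5) -> List[Entry]:
    return sorted(entries, key=lambda e: score(e, query, weight), reverse=True)[:k]


def credit(used: List[Entry], success: bool) -> None:
    # Every entry that contributed to the credited action shares the verdict.
    for e in used:
        if success:
            e.alpha += 1.0
        else:
            e.beta += 1.0
```

Initially the more similar entry wins; after repeated credited failures its utility drops, and a slightly less similar but more successful entry overtakes it.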
For context compaction in STM, subgoal stack management prevents overflow, and completed or abandoned subgoals trigger memory summarization and an LTM update. Semantic and episodic entries are updated via LLM-driven summarization on every turn, which keeps them human-readable and auditable.
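The subgoal stack described above can be sketched as a small class: pushing opens a scope, and popping a completed or abandoned subgoal replaces that scope's notes with a single summary in the parent frame while queuing the same summary for LTM. The `summarize` callback stands in for the LLM summarizer, `max_depth` is a hypothetical overflow guard, and `ltm_writes` is a placeholder for the hand-off to the memory agent.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Frame:
    subgoal: str
    notes: List[str] = field(default_factory=list)


class SubgoalStack:
    def __init__(self, summarize: Callable[[str, List[str], str], str],
                 max_depth: int = 8) -> None:
        self.frames: List[Frame] = [Frame("root task")]
        self.summarize = summarize
        self.max_depth = max_depth        # counts the root frame
        self.ltm_writes: List[str] = []   # summaries handed to the memory agent

    def push(self, subgoal: str) -> None:
        if len(self.frames) >= self.max_depth:
            raise RuntimeError("subgoal stack overflow")
        self.frames.append(Frame(subgoal))

    def note(self, text: str) -> None:
        self.frames[-1].notes.append(text)

    def pop(self, status: str) -> str:
        """Close the current scope; status is 'completed' or 'abandoned'."""
        if len(self.frames) == 1:
            raise RuntimeError("cannot pop the root task")
        frame = self.frames.pop()
        summary = self.summarize(frame.subgoal, frame.notes, status)
        self.frames[-1].notes.append(summary)  # parent keeps only the summary
        self.ltm_writes.append(summary)
        return summary

    def context(self) -> List[str]:
        # Flattened working context the actor would see.
        return [n for f in self.frames for n in [f"[{f.subgoal}]"] + f.notes]
```

Because a finished scope collapses to one line, the actor's working context grows with the depth of the plan rather than with the number of steps taken.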
Experimental Outcomes
Experiments span a suite of long-horizon domains: embodied tasks (AlfWorld, BabyAI), tool use (Tool-query variants), web interaction (WebShop), and text-based science experiments (ScienceWorld). Across these benchmarks, AdMem achieves higher task completeness rates and average progress than strong procedural-memory and ReAct-style baselines, especially in domains with transferable procedural knowledge. In AlfWorld, for example, AdMem reaches 63.4% completeness vs. 49.3% for ReAct and 47.0% for AWM; similar gains appear in domains where online, reward-driven procedural learning is pivotal.
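For reference, the AlfWorld figures translate into the following absolute and relative gains (simple arithmetic on the reported numbers):

```latex
\begin{aligned}
\Delta_{\text{ReAct}} &= 63.4 - 49.3 = 14.1 \text{ points}, & \frac{14.1}{49.3} &\approx 28.6\% \\
\Delta_{\text{AWM}}   &= 63.4 - 47.0 = 16.4 \text{ points}, & \frac{16.4}{47.0} &\approx 34.9\%
\end{aligned}
```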
Ablation studies confirm that combining short-term stack-based planning with reward-structured long-term procedural and semantic memory yields the best robustness and cumulative learning. Using long-term procedural recall alone, without STM or episodic summarization, sometimes degrades performance due to retrieval confusion; the architecture's strength is most evident when all components work together.
Implications and Future Directions
The AdMem architecture moves beyond conventional approaches by emphasizing reward-annotated procedural memory, automatic credit assignment, online consolidation, and multi-agent parallelism. The explicit integration of semantic, episodic, and procedural knowledge with adaptively tuned retrieval and management aligns more closely with cognitive architectures in artificial intelligence.
Practical implications include:
- Agents that robustly generalize and improve over long deployment horizons while mitigating context-window limitations and memory bloat.
- Online, continual adaptation to dynamic environments, supporting personalization and lifelong learning without manual intervention or expensive, unscalable re-training.
- Explicitly reward-driven procedural learning, enabling the agent to retain not only successful strategies but also critical information about past errors.
Theoretical implications are substantial:
- Establishes a strong blueprint for scalable agent architectures capable of credit assignment and meta-learning.
- Bridges the gap between cognitive (human-inspired) memory architectures and symbolic/algorithmic implementations in deep RL and agentic LLM systems.
Future work will likely involve extending AdMem with memory systems that natively support multi-modal inputs, long-horizon multi-agent coordination, and hierarchical reflective processing at scale. Moreover, optimizing LLM and memory-agent parallelism to reduce overhead and token costs will be necessary for deployment in real-world interactive settings. A formal treatment of memory consolidation under varied reward propagation regimes (beyond binary task-level feedback) is also a promising direction.
Conclusion
AdMem presents a unified short- and long-term memory architecture for LLM-based agents, combining semantic, episodic, and procedural components in a reward-driven, multi-agent pipeline. This framework yields strong empirical improvements in long-horizon, multi-task environments, supporting the case for comprehensive, adaptive, and credit-assigned memory systems. As agent capabilities mature, such memory architectures will be central to scalable, robust, and self-evolving decision-making in artificial agents (2606.06787).