Agentic Memory: Unified Long-Term and Short-Term Management for Large Language Model Agents
This presentation explores AgeMem, a framework that treats memory management as a learnable behavior in language model agents. Unlike prior approaches that separate long-term and short-term memory using rigid heuristics, AgeMem unifies both memory types into the agent's action space and optimizes their use through reinforcement learning. The talk covers the methodology, empirical results across five benchmarks, and the implications of this unified, agent-centered approach for building more adaptive and efficient autonomous systems.

Script
What if language model agents could learn how to manage their own memory, deciding what to remember and what to forget based on what actually helps them succeed? That fundamental question drives this work on unified memory management.
That question exposes a fundamental bottleneck in existing agent architectures. Prior methods treat long-term and short-term memory as separate systems managed by fixed rules, creating fragmented and inefficient information flow that hampers the agent's ability to reason over extended interactions.
AgeMem takes a radically different approach by unifying memory management into the agent's own decision-making process.
This unification means the agent can invoke memory operations at any decision point, treating memory management not as an external process but as part of its core reasoning strategy. The learning signal comes directly from task performance, allowing the system to discover optimal memory behaviors end-to-end.
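To make this concrete, here is a minimal sketch of what exposing memory operations as ordinary actions might look like. The tool names (`ltm_store`, `stm_compress`, and so on) and the dispatch logic are illustrative assumptions for this talk, not AgeMem's actual API: the point is that memory calls sit in the same action space as environment actions, so a learned policy can invoke them at any step.

```python
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    short_term: list = field(default_factory=list)   # working context for the current episode
    long_term: dict = field(default_factory=dict)    # persistent key-value store across episodes

    def ltm_store(self, key: str, value: str) -> str:
        """Persist a fact to long-term memory."""
        self.long_term[key] = value
        return f"stored:{key}"

    def ltm_retrieve(self, key: str) -> str:
        """Pull a fact back into the current context."""
        return self.long_term.get(key, "")

    def stm_append(self, observation: str) -> None:
        """Add a new observation to the working context."""
        self.short_term.append(observation)

    def stm_compress(self, summary: str) -> None:
        """Replace the working context with a summary, trading detail for token efficiency."""
        self.short_term = [summary]

# Memory tools and environment actions share one action space.
MEMORY_TOOLS = {"ltm_store", "ltm_retrieve", "stm_compress"}

def step(memory: AgentMemory, action: str, *args):
    """Dispatch an action: memory tools mutate memory, everything else goes to the environment."""
    if action in MEMORY_TOOLS:
        return getattr(memory, action)(*args)
    return f"env:{action}"  # placeholder for a real environment transition
```

Because the dispatcher makes no distinction between `stm_compress` and, say, a navigation command, the decision of when to compress or store becomes just another policy output that reinforcement learning can shape.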
This architectural comparison reveals the key innovation. Traditional frameworks keep short-term memory static and trigger long-term storage through fixed rules. Some recent work added agent-based control for long-term memory but still kept the two systems separate. AgeMem breaks this separation entirely, giving the agent unified control over both memory types through a coherent tool interface that can be optimized jointly.
The training methodology addresses the core challenge of sparse rewards in memory management. By structuring learning in progressive stages and using a multi-faceted reward function, AgeMem learns to balance immediate task needs with long-term memory quality, discovering when compression helps and when detailed storage matters.
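One way to picture such a multi-faceted reward is to combine the sparse task outcome with dense shaping terms. The specific terms and weights below (a memory-quality score and a token-cost penalty alongside task success) are illustrative assumptions, not the paper's actual formulation; they show how memory decisions can receive credit before the episode's final outcome arrives.

```python
def memory_reward(task_success: bool,
                  memory_quality: float,   # e.g. usefulness of retrieved memories, in [0, 1]
                  tokens_used: int,
                  token_budget: int = 4096,
                  w_task: float = 1.0,
                  w_quality: float = 0.3,
                  w_cost: float = 0.2) -> float:
    """Blend sparse task reward with dense shaping terms.

    The quality term rewards storing and retrieving useful content;
    the cost term penalizes context bloat, so compression that preserves
    task success strictly improves the reward.
    """
    task_term = w_task if task_success else 0.0
    quality_term = w_quality * memory_quality
    cost_term = w_cost * min(tokens_used / token_budget, 1.0)
    return task_term + quality_term - cost_term
```

Under a shaped reward like this, aggressive compression is only worthwhile when it does not cost task success, which is exactly the trade-off the talk describes the agent learning to navigate.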
Now let's examine how this unified approach performs in practice.
These results span five challenging benchmarks including ALFWorld, SciWorld, and HotpotQA. The consistent improvements demonstrate that learned memory management translates to better reasoning across diverse task types, while the token efficiency gains show the agent is actively preventing context bloat through strategic compression and filtering.
This ablation study quantifies exactly where the gains come from. Adding long-term memory tools provides a solid baseline improvement. Reinforcement learning further boosts performance by optimizing tool usage timing and selection. The final integration of short-term memory tools delivers the largest gains on context-heavy tasks, validating that unified control over both memory types creates synergistic benefits that isolated systems cannot achieve.
AgeMem's implications extend beyond immediate performance metrics. By treating memory as part of the policy itself, this work opens a pathway toward agents that can truly adapt their information processing strategies to the demands of complex, evolving tasks. The framework is extensible to richer tool sets and multi-agent scenarios, establishing a foundation for the next generation of autonomous reasoning systems.
AgeMem demonstrates that when agents learn to manage their own memory, they become more capable, efficient, and adaptive reasoners. Visit EmergentMind.com to explore the full technical details and dive deeper into this unified approach to agentic memory.