- The paper introduces Memp, a procedural memory framework that encodes agent trajectories into reusable templates.
- It employs build, retrieval, and update mechanisms, using query-vector matching and reflection to optimize long-horizon task performance.
- Experimental results show higher task success with fewer steps, and show that procedural memory built by a stronger model can be transferred to a weaker one.
Procedural Memory in AI Agents: Insights and Framework
Procedural memory plays a vital role in the cognitive capabilities of AI agents, much as it does in human cognition. This paper investigates procedural memory in LLM-based agents, covering strategies for building, retrieving, and continuously updating it in a dynamic environment.
Framework for Procedural Memory
Procedural Memory Model
The framework, named Memp, treats procedural memory as a high-level optimization object. It encodes past agent trajectories at granularities ranging from step-by-step instructions to script-like abstractions, providing a structured repository that evolves with new experiences.
Figure 1: The procedural memory framework consists of Build, Retrieve, and Update, which respectively involve encoding stored procedural memory, forming new procedural memories, and modifying existing ones in light of new experiences.
Build Phase
The build phase consists of capturing trajectories from previous agent interactions. Memp distills these historical trajectories into procedural templates, reusable knowledge that guides future interactions.
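The build step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the names Trajectory, MemoryEntry, as_steps, and build_memory are hypothetical, the choice to keep only successful trajectories is an assumption, and the distill hook stands in for whatever model would produce a script-like abstraction.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trajectory:
    task: str          # natural-language task description
    steps: List[str]   # actions the agent took, in order
    success: bool      # whether the episode solved the task


@dataclass
class MemoryEntry:
    task: str          # key used later for retrieval
    procedure: str     # step-by-step text or a script-like abstraction


def as_steps(traj: Trajectory) -> str:
    """Keep the trajectory verbatim as a numbered list of steps."""
    return "\n".join(f"{i}. {s}" for i, s in enumerate(traj.steps, 1))


def build_memory(
    trajs: List[Trajectory],
    distill: Callable[[Trajectory], str] = as_steps,
) -> List[MemoryEntry]:
    """Turn past trajectories into memory entries.

    Assumption: only successful trajectories are stored. `distill` is a
    hypothetical hook; an LLM-based summarizer could be passed here to
    produce a higher-level script instead of raw steps.
    """
    return [MemoryEntry(t.task, distill(t)) for t in trajs if t.success]


trajs = [
    Trajectory("heat an egg", ["go to fridge", "take egg", "heat egg in microwave"], True),
    Trajectory("clean a mug", ["go to sink"], False),
]
memory = build_memory(trajs)
print(memory[0].procedure)
```

Passing a different distill function changes the granularity of what is stored without touching the rest of the pipeline.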
Retrieval Phase
During retrieval, Memp uses strategies such as query-vector matching and keyword-vector matching to select the procedural memory most relevant to the current task, so that past experience can be reused on similar tasks.
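Query-vector matching can be illustrated with a small sketch. Everything here is an assumption made for the example: embed is a toy bag-of-words vectorizer standing in for a real embedding model, and retrieve and the sample memory are hypothetical. Keyword-vector matching would follow the same pattern, embedding extracted keywords rather than the full query.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, memory: List[Tuple[str, str]], k: int = 1) -> List[Tuple[str, str]]:
    """Return the k (task, procedure) pairs whose task text is closest to the query."""
    q = embed(query)
    ranked = sorted(memory, key=lambda entry: cosine(q, embed(entry[0])), reverse=True)
    return ranked[:k]


memory = [
    ("heat an egg and put it on the table", "find egg -> microwave -> place"),
    ("clean a mug and put it in the cabinet", "find mug -> sink -> place"),
]
top = retrieve("heat a potato and put it on the table", memory, k=1)
print(top[0][1])
```

The new task shares more vocabulary with the egg-heating entry than with the mug-cleaning one, so that procedure is ranked first.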
Memory Update Mechanisms
Memory updating is implemented through strategies such as validation filtering and reflection-based mechanisms. These let the procedural memory repository adjust dynamically: it discards obsolete or erroneous entries and incorporates valuable new information.
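The two update strategies named above could look roughly like this. The sketch is an assumption-laden illustration rather than the paper's algorithm: update_memory, used_key, and reflect are hypothetical names, reflect stands in for an LLM-based revision step, and deleting an entry when no reflection hook is available is one possible reading of "discard".

```python
from typing import Callable, Dict, Optional


def update_memory(
    memory: Dict[str, str],
    task: str,
    procedure: str,
    success: bool,
    used_key: Optional[str] = None,
    reflect: Optional[Callable[[str, str], str]] = None,
) -> Dict[str, str]:
    """Return a new memory dict after one task attempt (the input is not mutated).

    memory maps a task description to its stored procedure.
    used_key is the entry the agent retrieved for this attempt, if any.
    """
    updated = dict(memory)
    if success:
        # Validation filtering: only procedures verified by success are stored.
        updated[task] = procedure
    elif used_key is not None and used_key in updated:
        if reflect is not None:
            # Reflection: revise the entry that led to the failed attempt.
            updated[used_key] = reflect(updated[used_key], procedure)
        else:
            # Assumed fallback: drop the entry that misled the agent.
            del updated[used_key]
    return updated


memory = {"heat egg": "fridge -> microwave"}

added = update_memory(memory, "cool apple", "counter -> fridge", success=True)

revised = update_memory(
    memory, "heat mug", "fridge -> stove", success=False, used_key="heat egg",
    reflect=lambda old, failed: f"{old} (revised after failed attempt: {failed})",
)

dropped = update_memory(memory, "heat mug", "fridge -> stove", success=False, used_key="heat egg")
```

Returning a new dictionary keeps each update easy to inspect or roll back, which is convenient when comparing update strategies.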
Figure 2: Reward gain and steps reduction vs. trajectory group index with procedural memory.
Experimental Analysis
Datasets and Models
The framework was empirically validated on datasets including TravelPlanner and ALFWorld. These benchmarks test an agent's ability to handle long-horizon tasks that require procedural knowledge.
Results
With procedural memory in place, agents achieved markedly higher task success rates and completed tasks in fewer steps. The framework was instantiated with state-of-the-art LLMs such as GPT-4o, Claude-3.5-sonnet, and Qwen2.5-72B-Instruct, and outperformed approaches without memory integration.
Figure 3: With procedural memory, agents can improve both the success rate (accuracy ↑) and execution efficiency (steps ↓) when solving similar tasks.
Insights and Implications
Procedural Memory Transferability
The paper demonstrates that procedural memory transfers from stronger models to weaker ones: knowledge accumulated by a more capable model can improve the performance and efficiency of a less capable one.
Figure 4: (a) Result of transferring GPT-4o's procedural memory to Qwen2.5-14B-Instruct, evaluated on the TravelPlanner dataset. (b) The relationship between the quantity of procedural memory retrieved and GPT-4o's performance on the ALFWorld dataset.
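One way to picture the transfer is that stored procedures are plain text, so memory built by one model can be serialized and placed in another model's prompt. The sketch below assumes exactly that; export_memory, build_prompt, the JSON layout, and the sample entries are all hypothetical, and the parameter k simply caps how many procedures are included.

```python
import json
from typing import Dict, List


def export_memory(entries: List[Dict[str, str]]) -> str:
    """Serialize memory built by one model so another model can load it."""
    return json.dumps(entries, ensure_ascii=False)


def build_prompt(task: str, serialized_memory: str, k: int = 1) -> str:
    """Compose a prompt for the receiving model from the first k stored procedures.

    Assumption: entries are already ordered by relevance to the task.
    """
    entries = json.loads(serialized_memory)
    lines = [f"Task: {task}", "Relevant procedures from past experience:"]
    lines += [f"- {e['task']}: {e['procedure']}" for e in entries[:k]]
    return "\n".join(lines)


# Memory assumed to have been produced during a stronger model's runs.
strong_model_memory = export_memory([
    {"task": "plan a 3-day trip", "procedure": "book flight -> book hotel -> list attractions"},
    {"task": "plan a day trip", "procedure": "check transport -> list attractions"},
])

# The same text is handed to a weaker model as part of its prompt.
prompt = build_prompt("plan a 5-day trip", strong_model_memory, k=1)
print(prompt)
```

Raising k includes more procedures in the prompt, which is the quantity varied in panel (b) of the figure above.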
Scaling Considerations
Vector-based retrieval scales well: it lets agents search large repositories of procedural knowledge and select the entries most useful for a novel task.
Figure 5: Comparison of trajectories with and without procedural memory. Procedural memory shortens the process by 9 steps and saves 685 tokens.
Conclusion
Procedural memory significantly enhances the cognitive capabilities and adaptability of AI agents, marking an essential step toward self-improving systems. Future work can explore even more sophisticated retrieval strategies and judge-based task completion assessments, enabling agents to further refine their expertise dynamically. By fostering memory transferability and efficient update mechanisms, Memp promises noteworthy advancements in AI agent capabilities.