Learn to Memorize: Optimizing LLM-based Agents with Adaptive Memory Framework (2508.16629v1)

Published 15 Aug 2025 in cs.LG, cs.AI, cs.CL, and cs.IR

Abstract: LLM-based agents have been extensively applied across various domains, where memory stands out as one of their most essential capabilities. Previous memory mechanisms of LLM-based agents are manually predefined by human experts, leading to higher labor costs and suboptimal performance. In addition, these methods overlook the memory cycle effect in interactive scenarios, which is critical to optimizing LLM-based agents for specific environments. To address these challenges, in this paper, we propose to optimize LLM-based agents with an adaptive and data-driven memory framework by modeling memory cycles. Specifically, we design an MoE gate function to facilitate memory retrieval, propose a learnable aggregation process to improve memory utilization, and develop task-specific reflection to adapt memory storage. Our memory framework empowers LLM-based agents to learn how to memorize information effectively in specific environments, with both off-policy and on-policy optimization. In order to evaluate the effectiveness of our proposed methods, we conduct comprehensive experiments across multiple aspects. To benefit the research community in this area, we release our project at https://github.com/nuster1128/learn_to_memorize.

Summary

  • The paper introduces an adaptive memory framework that dynamically optimizes memory retrieval, utilization, and storage in LLM agents.
  • It employs a Mixture-of-Experts (MoE) gate function for retrieval and a learnable aggregation process, trained with SFT and DPO, to enhance decision-making efficiency.
  • Experimental results on tasks like HotpotQA demonstrate that on-policy optimization reduces reasoning steps and improves complex task performance.

"Learn to Memorize: Optimizing LLM-based Agents with Adaptive Memory Framework"

Introduction

The paper "Learn to Memorize: Optimizing LLM-based Agents with Adaptive Memory Framework" addresses the limitations of current memory mechanisms in LLM-based agents. Traditional memory mechanisms largely rely on predefined rules set by human experts, which leads to higher labor costs and suboptimal performance, especially in interactive scenarios. The authors propose an adaptive memory framework that optimizes LLM-based agents by modeling memory cycles. Figure 1

Figure 1: (a) In memory retrieval, the optimal weights for different aspects vary across tasks. Similarly, in memory storage, which information deserves emphasis is also task-dependent. Manual adaptation by human experts therefore incurs higher labor costs and suboptimal performance. (b) The memory cycle during interactions between agents and environments.

Methodology

The proposed framework centers on three main procedures: memory retrieval, utilization, and storage. For memory retrieval, a Mixture-of-Experts (MoE) gate function dynamically weighs different retrieval metrics depending on the task at hand. This contrasts with traditional methods, which use fixed weights and therefore often fail to optimize retrieval across diverse environments.
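
This summary does not include the authors' code, but the gating idea can be sketched roughly as follows: a small learned network maps a query embedding to weights over several retrieval metrics, and candidate memories are ranked by the weighted combination. This is a minimal, hypothetical PyTorch sketch; the class name, the choice of three metrics (similarity, recency, importance), and the MLP architecture are assumptions, not the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoERetrievalGate(nn.Module):
    """Illustrative gate mapping a query embedding to weights over retrieval metrics."""

    def __init__(self, embed_dim: int, num_metrics: int = 3):
        super().__init__()
        # Small MLP producing one logit per retrieval metric
        # (e.g., semantic similarity, recency, importance) -- an assumed set of metrics.
        self.gate = nn.Sequential(
            nn.Linear(embed_dim, 64),
            nn.ReLU(),
            nn.Linear(64, num_metrics),
        )

    def forward(self, query_emb: torch.Tensor, metric_scores: torch.Tensor) -> torch.Tensor:
        # query_emb: (batch, embed_dim); metric_scores: (batch, num_memories, num_metrics)
        weights = F.softmax(self.gate(query_emb), dim=-1)          # (batch, num_metrics)
        # Combine per-memory metric scores with the task-dependent weights.
        return torch.einsum("bm,bnm->bn", weights, metric_scores)  # (batch, num_memories)

# Usage: rank candidate memories by the combined score and keep the top-k.
gate = MoERetrievalGate(embed_dim=768)
query = torch.randn(1, 768)
scores = torch.rand(1, 10, 3)   # 10 candidate memories, 3 metric scores each
combined = gate(query, scores)
topk_indices = combined.topk(k=3, dim=-1).indices
```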

The memory utilization process involves a learnable aggregation mechanism that is optimized using both Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). This process integrates dynamically selected memories into the prompts fed to LLMs, thereby enhancing decision-making capabilities (Figure 2).

Figure 2: Overview of the memory cycle effect and adaptive memory framework.
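
As a rough illustration of how the aggregation step might be trained with DPO, the snippet below shows the standard DPO objective together with a hypothetical way to form preference pairs from two rollouts of the same task, one that succeeded and one that failed. The helper `build_preference_pair` and the pairing rule are assumptions for exposition; the paper's exact recipe may differ.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta: float = 0.1):
    """Standard DPO objective: prefer the aggregation that led to task success."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

def build_preference_pair(task: str, successful_aggregation: str, failed_aggregation: str) -> dict:
    """Hypothetical helper: turn two rollouts of the same task into one DPO training pair."""
    prompt = f"Task: {task}\nRelevant memories:\n"
    return {
        "prompt": prompt,
        "chosen": successful_aggregation,   # aggregation text that preceded a correct answer
        "rejected": failed_aggregation,     # aggregation text that preceded a wrong answer
    }
```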

For memory storage, task-specific reflections are used to adaptively store observations as memories. This adaptive process replaces rigid, manually-designed instructions with a reflection-based approach that learns from interactions to optimize what information is stored.
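
A hedged sketch of what such a reflection-based storage step could look like: an LLM is prompted with a task-specific reflection instruction (the adaptable part) to distill an observation into memory entries. The `llm` callable and the `memory_store` list below are placeholders for illustration, not the authors' interface.

```python
def reflect_and_store(llm, memory_store, observation: str, reflection_instruction: str):
    """Hypothetical storage step: a task-specific instruction guides what gets kept."""
    prompt = (
        f"{reflection_instruction}\n\n"
        f"Observation:\n{observation}\n\n"
        "Write the key facts worth remembering for this task, one per line."
    )
    reflection = llm(prompt)  # `llm` is any text-completion callable, not a specific API
    for line in reflection.splitlines():
        line = line.strip()
        if line:
            memory_store.append({"content": line, "source": observation})
    return memory_store
```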

Optimization Strategies

The authors propose both off-policy and on-policy optimization strategies to train the memory framework. Off-policy optimization leverages pre-collected interaction data to refine memory mechanisms, while on-policy optimization continuously interacts with the environment to adjust model parameters in real-time. The latter strategy, though computationally intensive, addresses potential distribution shifts encountered in the off-policy mode.
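
The distinction between the two regimes can be illustrated with a schematic training loop. All names here (`memory_model.loss`, `agent.rollout`) are placeholders; only the structure, training on pre-collected logs versus freshly generated trajectories, reflects the paper's description.

```python
def off_policy_update(memory_model, logged_trajectories, optimizer):
    """Off-policy: refine the memory modules from pre-collected interaction logs."""
    for traj in logged_trajectories:
        loss = memory_model.loss(traj)        # e.g., SFT/DPO losses over logged data
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def on_policy_update(agent, env, memory_model, optimizer, num_episodes: int = 10):
    """On-policy: interact with the environment and update on freshly generated data."""
    for _ in range(num_episodes):
        traj = agent.rollout(env, memory_model)  # new trajectory from the current policy
        loss = memory_model.loss(traj)           # same objective, but no distribution shift
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```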

Experimental Results

The experimental evaluation conducted across varied datasets such as HotpotQA (with different difficulty levels) demonstrates the effectiveness of the proposed framework. The results show that the on-policy optimized model significantly outperforms baselines, especially in harder tasks that require complex reasoning across multiple steps.

One notable observation is that models utilizing this framework tend to require fewer reasoning steps to arrive at correct answers, suggesting improved efficiency in decision-making. The results also indicate variability in performance depending on the inference models used, highlighting differences in their in-context learning capabilities.

Implications and Future Work

The introduction of an adaptive, data-driven memory framework for LLM-based agents offers a substantial improvement over traditional memory methods. Practically, this research could lead to more efficient and versatile AI systems in areas where dynamic interaction and continuous learning are paramount.

Future work should focus on extending these techniques to domains that rely on implicit memory or require different reasoning structures. There is also potential to further explore optimization strategies that mitigate the costs associated with on-policy updates.

Conclusion

This paper contributes an innovative memory framework that enables LLM-based agents to adaptively optimize memory retrieval, utilization, and storage in interactive environments. By implementing both off-policy and on-policy optimization techniques, the proposed framework enhances the learning efficiency and decision-making capability of agents, making it a valuable advancement in the development of intelligent systems.
