
Memento: Fine-tuning LLM Agents without Fine-tuning LLMs (2508.16153v2)

Published 22 Aug 2025 in cs.LG and cs.CL

Abstract: In this paper, we introduce a novel learning paradigm for Adaptive LLM agents that eliminates the need for fine-tuning the underlying LLMs. Existing approaches are often either rigid, relying on static, handcrafted reflection workflows, or computationally intensive, requiring gradient updates of LLM model parameters. In contrast, our method enables low-cost continual adaptation via memory-based online reinforcement learning. We formalise this as a Memory-augmented Markov Decision Process (M-MDP), equipped with a neural case-selection policy to guide action decisions. Past experiences are stored in an episodic memory, either differentiable or non-parametric. The policy is continually updated based on environmental feedback through a memory rewriting mechanism, whereas policy improvement is achieved through efficient memory reading (retrieval). We instantiate our agent model in the deep research setting, namely \emph{Memento}, which attains top-1 on GAIA validation ($87.88\%$ Pass@$3$) and $79.40\%$ on the test set. It reaches $66.6\%$ F1 and $80.4\%$ PM on the DeepResearcher dataset, outperforming the state-of-the-art training-based method, while case-based memory adds $4.7\%$ to $9.6\%$ absolute points on out-of-distribution tasks. Our approach offers a scalable and efficient pathway for developing generalist LLM agents capable of continuous, real-time learning without gradient updates, advancing machine learning towards open-ended skill acquisition and deep research scenarios. The code is available at https://github.com/Agent-on-the-Fly/Memento.


Summary

  • The paper introduces Memento, a memory-augmented framework that enables continual adaptation of LLM agents without fine-tuning the underlying LLMs.
  • It leverages a memory-based Markov Decision Process with both non-parametric and parametric retrieval to integrate case-based reasoning for dynamic tool use and multi-step planning.
  • Empirical results show state-of-the-art performance across benchmarks, with significant improvements in long-horizon research tasks and out-of-distribution generalization.

Memory-Augmented Continual Adaptation for LLM Agents: The Memento Framework

Introduction and Motivation

The paper introduces Memento, a learning paradigm for LLM-based agents that enables continual adaptation without fine-tuning the underlying LLM parameters. The motivation stems from the limitations of current LLM agent paradigms: static, workflow-based systems lack flexibility, while parameter fine-tuning approaches are computationally expensive and impractical for real-time, open-ended adaptation. Memento addresses this by leveraging external, episodic memory and case-based reasoning (CBR), formalized as a memory-augmented Markov Decision Process (M-MDP), to enable agents to learn from experience in a non-parametric, scalable manner.

Memory-Based Markov Decision Process and Case-Based Reasoning

Memento formalizes the agent's decision process as an M-MDP, extending the standard MDP tuple $\langle \mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma \rangle$ with a memory space $\mathcal{M}$ that stores episodic trajectories. At each timestep, the agent retrieves a relevant case from memory using a learned retrieval policy $\mu$, adapts the retrieved solution via the LLM, executes the action, and appends the new experience to memory. This process is governed by the policy:

$$\pi(a \mid s, M) = \sum_{c \in M} \mu(c \mid s, M)\, p_{\text{LLM}}(a \mid s, c)$$

where $M$ is the case bank, $c$ is a case, and $p_{\text{LLM}}$ is the LLM's action likelihood conditioned on the current state and the retrieved case (Figure 1).

Figure 1: A graphical model of the memory-based Markov Decision Process.
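
To make the read, adapt, execute, write cycle concrete, the following is a minimal sketch of one M-MDP step. The class and function names (`CaseBank`, `retrieve`, `llm_act`, `env_step`) are illustrative assumptions, not the paper's actual interfaces.

```python
# Minimal sketch of one M-MDP step; names are illustrative, not the paper's API.
from dataclasses import dataclass, field

@dataclass
class Case:
    state: str      # task or sub-task description
    action: str     # plan or tool call that was taken
    reward: float   # 1.0 if the episode succeeded, else 0.0

@dataclass
class CaseBank:
    cases: list[Case] = field(default_factory=list)

    def write(self, case: Case) -> None:
        self.cases.append(case)

def mmdp_step(state: str, memory: CaseBank, retrieve, llm_act, env_step):
    """Read -> adapt -> execute -> write, following pi(a|s,M)."""
    case = retrieve(state, memory)             # sample c ~ mu(c|s, M)
    action = llm_act(state, case)              # sample a ~ p_LLM(a|s, c)
    next_state, reward = env_step(action)      # environment feedback
    memory.write(Case(state, action, reward))  # append new experience
    return next_state, reward

# Toy usage with stub components (purely illustrative):
bank = CaseBank()
result = mmdp_step(
    state="Find the GDP of France in 2020",
    memory=bank,
    retrieve=lambda s, m: m.cases[-1] if m.cases else None,
    llm_act=lambda s, c: f"search({s!r})",
    env_step=lambda a: ("done", 1.0),
)
```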

The retrieval policy $\mu$ is optimized via maximum-entropy RL (soft Q-learning), encouraging both exploitation of high-utility cases and exploration/diversity in retrieval. The Q-function $Q(s, M, c)$ estimates the expected return of selecting case $c$ in state $s$ with memory $M$, and the optimal retrieval policy is a softmax over Q-values:

$$\mu^*(c \mid s, M) = \frac{\exp\big(Q^*(s, M, c)/\alpha\big)}{\sum_{c' \in M} \exp\big(Q^*(s, M, c')/\alpha\big)}$$

To address the challenge of high-dimensional, natural language state and case spaces, the Q-function can be approximated via kernel-based episodic control or a neural network, depending on the memory variant.
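
As a concrete illustration of the retrieval rule above, the snippet below computes $\mu^*(c \mid s, M)$ as a temperature-scaled softmax over Q-values. How the Q-values are estimated (kernel-based episodic control or a neural network) is left abstract; this is a sketch, not the authors' implementation.

```python
# Soft Q-learning case selection: mu*(c|s,M) = softmax(Q*(s,M,c)/alpha).
# q_values would come from a kernel-based estimate or a neural Q-network.
import numpy as np

def retrieval_distribution(q_values: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Temperature-scaled softmax over the Q-values of cases in memory."""
    logits = q_values / alpha
    logits -= logits.max()        # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Example: three stored cases, the second has the highest estimated utility.
q = np.array([0.2, 1.5, 0.7])
print(retrieval_distribution(q, alpha=0.5))  # peaked on case 1, but not greedy
```

Lower values of $\alpha$ sharpen the distribution toward pure exploitation of the best case, while higher values keep retrieval closer to uniform, trading off utility against diversity.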

Planner–Executor Architecture and Memory Management

Memento is instantiated as a planner–executor framework. The planner is an LLM-based CBR agent that alternates between retrieving relevant cases from memory (Read) and recording new experiences (Write), with the retrieval policy either similarity-based (non-parametric) or Q-function-based (parametric). The executor is an LLM-based client that invokes external tools via the Model Context Protocol (MCP), enabling compositional tool use and dynamic reasoning (Figure 2).

Figure 2: The architecture of Memento with parametric memory, alternating between Case-Based Planning and Tool-Based Execution.

The memory module supports both non-parametric (vectorized similarity search) and parametric (Q-function) retrieval. In the non-parametric setting, retrieval is based on cosine similarity between the current state and stored cases. In the parametric setting, the Q-function is trained online (using cross-entropy loss for binary rewards) to predict the utility of each case, and retrieval is performed by selecting the top-K cases with the highest Q-values.
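
A hedged sketch of the two read variants and the online Q-update described above is given below. The embedding inputs, the small MLP Q-network, and the exact loss wiring are assumptions for illustration and may differ from the released code.

```python
# Illustrative sketch of non-parametric vs. parametric memory reads.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNet(nn.Module):
    """Scores a (state, case) pair; trained with BCE against binary rewards."""
    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, state_emb, case_emb):
        return self.mlp(torch.cat([state_emb, case_emb], dim=-1)).squeeze(-1)

def nonparametric_topk(state_emb, case_embs, k=4):
    """Non-parametric read: cosine similarity between the state and stored cases."""
    sims = F.cosine_similarity(state_emb.unsqueeze(0), case_embs, dim=-1)
    return sims.topk(min(k, case_embs.size(0))).indices

def parametric_topk(qnet, state_emb, case_embs, k=4):
    """Parametric read: top-K cases by predicted Q-value."""
    q = qnet(state_emb.expand_as(case_embs), case_embs)
    return q.topk(min(k, case_embs.size(0))).indices

def q_update(qnet, opt, state_emb, case_emb, reward):
    """Online update: cross-entropy between sigmoid(Q) and the binary reward."""
    loss = F.binary_cross_entropy_with_logits(
        qnet(state_emb, case_emb), torch.tensor(float(reward)))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

In the non-parametric case no training is needed; in the parametric case `q_update` would be called after each episode with the binary success signal, so that subsequent reads favor cases that previously led to success.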

Tool Integration and Deep Research Scenarios

Memento is designed for deep research tasks requiring long-horizon planning, multi-step tool use, and reasoning over heterogeneous data. The MCP-based executor supports a suite of tools for web search, crawling, multimodal document processing, code execution, and mathematical computation. This enables the agent to acquire, process, and reason over external information in real time, supporting complex research workflows.
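
The sketch below illustrates the general shape of such a tool-using executor: a registry of tools and a dispatch loop over the planner's steps. The tool names and signatures are placeholders, and the real system exposes tools through MCP servers rather than in-process functions.

```python
# Purely illustrative executor loop; tool names and signatures are placeholders.
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "web_search": lambda query: f"<search results for {query!r}>",
    "crawl":      lambda url: f"<page text of {url}>",
    "run_code":   lambda code: f"<stdout of executing {len(code)} chars>",
}

def execute(plan: list[tuple[str, str]]) -> list[str]:
    """Run each (tool_name, argument) step the planner produced."""
    observations = []
    for tool_name, argument in plan:
        tool = TOOLS.get(tool_name)
        observations.append(tool(argument) if tool else f"unknown tool: {tool_name}")
    return observations

print(execute([("web_search", "GAIA benchmark"), ("crawl", "https://example.org")]))
```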

Empirical Evaluation

Memento is evaluated on four benchmarks: GAIA (long-horizon tool use), DeepResearcher (real-time web research), SimpleQA (factual precision), and HLE (long-tail academic reasoning). The agent achieves:

  • GAIA: 87.88% Pass@3 on the validation set and 79.40% on the test set, outperforming all open-source agent frameworks (Figure 3).
  • DeepResearcher: 66.6% F1 and 80.4% PM, surpassing state-of-the-art training-based systems.
  • SimpleQA: 95.0% accuracy, establishing a new state of the art for factual reliability.
  • HLE: 24.4% PM, ranking second overall and outperforming several strong baselines (Figure 4).

Figure 3: Memento vs. baselines on the GAIA validation and test sets.

Figure 4: Performance on SimpleQA and HLE, demonstrating Memento's superiority in factual and academic reasoning tasks.

Ablation studies show that both parametric and non-parametric CBR yield consistent, additive improvements across all benchmarks. Notably, case-based memory provides 4.7% to 9.6% absolute gains on out-of-distribution tasks, highlighting its role in generalization.

Continual Learning and Memory Efficiency

Memento demonstrates continual learning capability: as the case bank grows, performance improves over successive iterations, with rapid convergence observed after a few iterations due to the finite environment. The optimal number of retrieved cases is small (K=4), as larger K introduces noise and computational overhead without further gains.

The system is efficient in terms of output token usage, with most computational cost arising from integrating multi-step tool outputs as task complexity increases. The architecture is robust to hallucination and maintains concise, structured planning, with fast planners outperforming slow, deliberative ones in modular settings (Figure 5).

Figure 5: The average number of each task type per level, highlighting the dominance of code, search, and crawl tasks as difficulty level increases.

Theoretical and Practical Implications

Memento provides a principled framework for continual, real-time adaptation of LLM agents without gradient updates. By decoupling agent learning from LLM parameter updates, it enables scalable, low-cost deployment in open-ended environments. The memory-augmented MDP formalism and CBR policy optimization bridge cognitive science and RL, offering a pathway for agents to accumulate, reuse, and generalize from experience.

The strong empirical results, especially on OOD tasks, challenge the assumption that parameter fine-tuning is necessary for agent adaptation. Instead, memory-based approaches can yield comparable or superior performance with greater efficiency and flexibility.

Future Directions

Potential extensions include:

  • Scaling to larger, more diverse memory banks with advanced curation and forgetting mechanisms to mitigate retrieval swamping.
  • Integrating richer forms of memory (e.g., semantic, procedural) and more sophisticated retrieval policies.
  • Applying the framework to multi-agent and collaborative research scenarios.
  • Exploring hybrid approaches that combine memory-based adaptation with lightweight parameter-efficient fine-tuning.

Conclusion

Memento demonstrates that memory-augmented, case-based reasoning enables LLM agents to achieve continual, real-time adaptation without fine-tuning LLM parameters. The framework achieves state-of-the-art results across multiple challenging benchmarks, with strong generalization and efficiency. These findings suggest that external memory and CBR are critical components for scalable, generalist LLM agents, and motivate further research into memory-based agent architectures for open-ended AI.
