ParamMem: Augmenting Language Agents with Parametric Reflective Memory

Published 26 Feb 2026 in cs.LG and cs.MA | (2602.23320v2)

Abstract: Self-reflection enables language agents to iteratively refine solutions, yet often produces repetitive outputs that limit reasoning performance. Recent studies have attempted to address this limitation through various approaches, among which increasing reflective diversity has shown promise. Our empirical analysis reveals a strong positive correlation between reflective diversity and task success, further motivating the need for diverse reflection signals. We introduce ParamMem, a parametric memory module that encodes cross-sample reflection patterns into model parameters, enabling diverse reflection generation through temperature-controlled sampling. Building on this module, we propose ParamAgent, a reflection-based agent framework that integrates parametric memory with episodic and cross-sample memory. Extensive experiments on code generation, mathematical reasoning, and multi-hop question answering demonstrate consistent improvements over state-of-the-art baselines. Further analysis reveals that ParamMem is sample-efficient, enables weak-to-strong transfer across model scales, and supports self-improvement without reliance on stronger external model, highlighting the potential of ParamMem as an effective component for enhancing language agents.

Abstract PDF Upgrade to Chat

Summary

The paper introduces ParamMem, a novel parametric memory module that encodes cross-sample reflection patterns to enhance reflective diversity in language agents.
Methodology integrates temperature-controlled sampling with a unified memory framework, achieving superior performance on tasks like HumanEval, MBPP, MATH, and multi-hop QA.
Experimental results show a strong positive correlation between increased reflective diversity, measured by average pairwise cosine distance, and task success.

Augmenting Language Agents with Parametric Reflective Memory

Introduction

LLMs have displayed significant advancements in performing complex reasoning tasks, often employing test-time scaling to enhance performance. Reflection-based frameworks, where agents reflect on task feedback and accumulate self-reflections in episodic memory, have shown particular efficacy. However, these frameworks face limitations due to repetitive and inaccurate outputs produced by self-reflection mechanisms. This paper introduces ParamMem, a novel parametric memory module designed to enhance reflective diversity and improve reasoning performance in language agents.

Methodology

The core innovation of ParamMem is a parametric memory module that encodes cross-sample reflection patterns into the model parameters. This approach aims to generate diverse reflection outputs through temperature-controlled sampling. By integrating this module, ParamMem extends the traditional reflection-based agent framework by unifying parametric memory with episodic and cross-sample memory. The increased reflective diversity facilitates diverse reflection generation which, as demonstrated, has a positive correlation with task performance.

Empirical analyses showed a strong positive correlation between reflective diversity, measured by average pairwise cosine distance, and task success across multiple datasets (Figure 1).

Figure 1: Correlation between reflective diversity (measured by average pairwise cosine distance) and task performance across five datasets using LLaMA-3.1-8B under Reflexion, DoT, and DoT-bank.

Memory Mechanisms

ParamMem contrasts with traditional retrieval-based approaches like DoT-bank, which are limited by embedding similarity and suffer from issues like embedding collapse into low-rank subspaces. Instead, ParamMem's parametric module encodes reflective patterns within its parameters, drawing on a dataset of auxiliary reflections. The integration of parametric memory into existing systems, such as Reflexion, allows for a composite framework leveraging episodic, cross-sample, and parametric memories.

Figure 2: Comparison of memory mechanisms across different frameworks.

Experimental Results

Across various domains—programming, mathematical reasoning, and multi-hop question answering—ParamMem consistently delivered superior performance compared to state-of-the-art baselines. Notable improvements were observed in HumanEval, MBPP, MATH, HotpotQA, and 2WikiMultiHopQA datasets, with enhanced sample efficiency and the ability to self-improve without stronger external models. In particular, the parametric module demonstrated significant gains in task-specific assessments (Figure 3).

Figure 3: An output example on programming task.

Further experiments demonstrated that the reflective diversity induced by ParamMem extends beyond static settings into dynamic interactions (Figure 4). ParamMem's generated reflections provide agents with a broader set of diagnostic hypotheses, thereby increasing the likelihood of successful refinements.

Figure 4: Pairwise cosine distance distribution.

Implications and Future Work

ParamMem introduces a new paradigm for enhancing the capabilities of language agents through parametric reflective memory. Its potential lies in the lightweight nature of the parametric module, which can be integrated into existing frameworks to provide additional reflective diversity. This innovation suggests new directions for research in efficient memory-driven reasoning and diverse reflection generation in AI systems.

The implications extend to self-improvement capabilities of language agents, as ParamMem can continually enhance reasoning skills through reflective diversity, even in the absence of stronger external models. Future developments might focus on optimizing token efficiency, balancing the trade-offs between reflective diversity and computational costs.

Conclusion

ParamMem represents a significant advancement in the integration of parametric reflective memory for language agents, demonstrating substantial performance improvements across diverse domains. The unified memory framework broadens the potential for agents to achieve higher reasoning capabilities through enhanced reflective diversity. Further exploration will undoubtedly illuminate additional applications and refinements to this promising approach.

Markdown Report Issue