
Memory-Aware Test-Time Scaling (MaTTS)

Updated 5 October 2025
  • Memory-Aware Test-Time Scaling (MaTTS) is a paradigm that combines enhanced test-time computation with structured memory extraction to improve agent reasoning.
  • It uses parallel scaling to generate diverse reasoning trajectories and sequential scaling for iterative refinement, thereby distilling transferable problem-solving strategies.
  • Integrating with ReasoningBank, MaTTS creates a feedback loop that continuously updates memory, leading to improved task efficiency and generalization across domains.

Memory-Aware Test-Time Scaling (MaTTS) is a paradigm in reasoning-augmented machine learning that couples increased inference-time computation with mechanisms for extracting, consolidating, and deploying memory from prior reasoning episodes. The principal goal is to enable agents to improve their effectiveness and efficiency on continual or challenging task streams by leveraging richer, more diverse experience and distilling it into generalizable reasoning strategies. MaTTS accelerates self-evolution in LLM agents through synergistic interaction with persistent memory frameworks such as ReasoningBank, producing emergent compositional strategies and enhanced generalization.

1. Core Mechanisms of Memory-Aware Test-Time Scaling

MaTTS is articulated as the joint process of scaling up agent interaction experience at test time and constructing memory representations that are not limited to raw success/failure trajectories but distill higher-order reasoning principles. It operates via:

  • Parallel scaling: Agents generate k independent trajectories for each query, leveraging extra compute to diversify reasoning and candidate solutions. The best trajectory \tau^* = \mathrm{argmax}_{i=1,\ldots,k}\,f(\tau_i) is selected according to a self-assessment scoring function f, typically implemented via LLM self-judgment.
  • Sequential scaling: Agents iteratively refine a single trajectory, capturing reasoning corrections, verifications, and stepwise self-improvement. Each refinement offers a contrastive signal for memory extraction, increasing the depth and compositionality of the resulting memory.

This expanded trajectory pool furnishes the agent with more opportunities to extract reasoning strategies from both successes and failures, exceeding the capabilities of prior mechanisms that store only raw or successful routines.
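The parallel-scaling step above amounts to best-of-k selection. A minimal sketch, in which `generate` and `score` are toy stand-ins for the agent rollout and the LLM self-judgment function f described in the text:

```python
from typing import Callable, List

def parallel_scale(generate: Callable[[], str],
                   score: Callable[[str], float],
                   k: int = 5) -> str:
    """Generate k candidate trajectories and return the highest-scoring one.

    `generate` and `score` are placeholders for the agent rollout and the
    self-assessment scoring function; here they are toy callables.
    """
    trajectories: List[str] = [generate() for _ in range(k)]
    # tau* = argmax_i f(tau_i)
    return max(trajectories, key=score)

# Toy demonstration: candidates come from a fixed pool, scored by length.
pool = iter(["short", "a much longer trajectory", "medium path"])
best = parallel_scale(lambda: next(pool), score=len, k=3)
print(best)  # -> a much longer trajectory
```

In a real agent, `score` would itself be an LLM call, and the losing trajectories would be retained as contrastive signal for memory extraction rather than discarded.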

2. ReasoningBank: Structured Memory Framework

ReasoningBank is implemented to store distilled reasoning strategies in structured form:

Field       | Description                         | Example Content
Title       | Concise identifier                  | "Order Date Location Strategy"
Description | Brief summary of the reasoning task | "Find the earliest order date by checking history"
Content     | Detailed reasoning steps/insights   | Stepwise sequence outlining navigation decisions

Prior to agent deployment on a task, relevant ReasoningBank items are retrieved via embedding-based similarity search and used to guide interaction. Upon task completion, the agent synthesizes new memory items from generated trajectories using extraction pipelines (often involving LLM self-judgment). This process directly supports the closed-loop interaction between memory and exploration.
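The retrieval step can be sketched with a toy bag-of-words "embedding" in place of the neural encoder a real system would use; the item fields mirror the ReasoningBank schema above:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use a neural encoder."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(memory_items: list, query: str, top_k: int = 2) -> list:
    """Return the top_k ReasoningBank items most similar to the query."""
    q = embed(query)
    ranked = sorted(
        memory_items,
        key=lambda m: cosine(embed(m["title"] + " " + m["description"]), q),
        reverse=True,
    )
    return ranked[:top_k]

bank = [
    {"title": "Order Date Location Strategy",
     "description": "Find the earliest order date by checking history"},
    {"title": "Cart Checkout Strategy",
     "description": "Complete checkout by confirming the cart"},
]
print(retrieve(bank, "earliest order date", top_k=1)[0]["title"])
# -> Order Date Location Strategy
```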

3. Experience Scaling and Contrastive Memory Extraction

MaTTS exploits experience scaling, the expansion of the candidate-trajectory pool via additional compute, to generate rich contrastive signals. In parallel scaling, candidate trajectories \{\tau_1,\ldots,\tau_k\} are compared using contrastive self-assessment; in sequential scaling, the intermediate reasoning signals \{R_1,\ldots,R_T\} produced during iterative refinement are collected.

This contrast enables more precise identification of common elements in effective problem-solving and highlights pitfalls, thereby enhancing memory consolidation. Memory extraction functions E(\cdot) operate on selected trajectories and reasoning signals, yielding new memory items that are added to ReasoningBank:

\boldsymbol{M}_{\text{new}} = E(\tau^*)

\boldsymbol{M} \gets \boldsymbol{M} \cup \boldsymbol{M}_{\text{new}}
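The extraction and update steps can be sketched as follows; `extract_memory` is a stand-in for the extraction pipeline E(\cdot), which in practice would use LLM self-judgment to distill a titled strategy:

```python
def extract_memory(trajectory: str) -> dict:
    """Stand-in for E(.): distill a memory item from a trajectory.

    Here the 'strategy title' is just the text before the colon; a real
    pipeline would synthesize title/description/content with an LLM.
    """
    return {"title": trajectory.split(":")[0], "content": trajectory}

def update_bank(bank: list, trajectory: str) -> list:
    """M <- M union M_new, deduplicating on the item title."""
    new_item = extract_memory(trajectory)
    if all(item["title"] != new_item["title"] for item in bank):
        bank.append(new_item)
    return bank

bank = []
update_bank(bank, "Filter-then-sort: apply filters before sorting results")
print(len(bank))  # -> 1
```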

4. Synergy: Virtuous Cycle Between Scaling and Memory

A defining feature of MaTTS is its integrated feedback loop:

  • Memory-driven exploration: Retrieved memory items from ReasoningBank inform exploration, narrowing the trajectory search space toward promising regions.
  • Exploration-driven memory improvement: Richer, more diverse exploration (enabled by scaling) generates contrastive evidence, improving the distillation quality and generalizability of memory.
  • Emergent behaviors: Over extended interaction, the agent’s memory transitions from procedural heuristics to complex compositional reasoning, with new strategies naturally arising.

This positive feedback loop accelerates agent adaptation and results in improved efficiency (fewer interaction steps per task) and higher task success rates on diverse benchmarks.
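The full cycle described above (retrieve memory, explore with scaled compute, extract new memory) can be sketched schematically; every callable here is a toy placeholder for the corresponding component:

```python
def matts_loop(tasks, bank, retrieve, rollout, judge, extract, k=3):
    """Schematic MaTTS cycle: memory guides exploration, and exploration
    feeds distilled strategies back into memory.

    All callables are placeholders for the components described above.
    """
    results = []
    for task in tasks:
        hints = retrieve(bank, task)                # memory-driven exploration
        candidates = [rollout(task, hints) for _ in range(k)]
        best = max(candidates, key=judge)           # parallel test-time scaling
        bank.append(extract(best))                  # exploration-driven memory
        results.append(best)
    return results

# Toy run with trivial stand-ins for each component.
out = matts_loop(
    tasks=["t1", "t2"],
    bank=[],
    retrieve=lambda bank, task: bank[-1:],
    rollout=lambda task, hints: f"{task}+{len(hints)}hints",
    judge=len,
    extract=lambda traj: {"content": traj},
    k=2,
)
print(out)  # -> ['t1+0hints', 't2+1hints']
```

Note how the second task's rollout already sees one retrieved memory item, illustrating the closed loop in miniature.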

5. Empirical Results: Benchmarks and Applications

MaTTS has been empirically validated across domains including:

  • Web browsing (WebArena, Mind2Web): Parallel scaling with k=5 increased success rates (e.g., from 49.7% to 55.1%) and decreased the average number of steps on shopping navigation tasks, indicating improvements in both effectiveness and efficiency.
  • Software engineering (SWE-Bench-Verified): Scaling agent interaction during repository-level patch generation led to higher resolution rates (e.g., increasing from 54% to 57.4%) and faster convergence to correct solutions.

These results demonstrate that MaTTS, combined with ReasoningBank, consistently outperforms prior memory mechanisms—such as those storing only raw trajectories or successful routines—by efficiently extracting and reusing transferable reasoning strategies.

6. Mathematical Representation and Policy Formalism

The memory-aware agent’s policy \pi integrates both observation and memory, operating as:

\pi(o_{0:t}, a_{0:t}; \boldsymbol{M}, \mathcal{A}) \rightarrow a_{t+1}

where \boldsymbol{M} is the ReasoningBank memory and \mathcal{A} is the available action set.

Parallel scaling process:

  • Generate \tau_i for i=1,\ldots,k.
  • Select \tau^* = \mathrm{argmax}_i f(\tau_i).
  • Extract \boldsymbol{M}_{\text{new}} = E(\tau^*) and update memory.

Sequential scaling process:

  • Generate and refine reasoning signals R_j at each step j.
  • Aggregate the R_j for memory extraction.
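The sequential process can be sketched as a refinement loop that keeps the intermediate signals R_1,\ldots,R_T for later extraction; `refine` is a toy stand-in for an LLM revision step:

```python
def sequential_scale(initial: str, refine, steps: int = 3):
    """Iteratively refine one trajectory, collecting intermediate
    reasoning signals R_1..R_T for later memory extraction.

    `refine` stands in for an LLM revision step: it maps a trajectory
    to (improved_trajectory, reasoning_signal).
    """
    trajectory, signals = initial, []
    for _ in range(steps):
        trajectory, signal = refine(trajectory)
        signals.append(signal)
    return trajectory, signals

# Toy refinement: append a marker and report what was revised.
final, sigs = sequential_scale(
    "draft",
    refine=lambda t: (t + "*", f"revised {t}"),
    steps=2,
)
print(final, len(sigs))  # -> draft** 2
```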

7. Significance for Continual Self-Evolution and Agent Generalization

The MaTTS paradigm establishes a new scaling dimension—memory-driven experience scaling—enabling agents to self-evolve with emergent reasoning behaviors. It facilitates continual improvement and generalization across domains without retraining, vital for persistent real-world applications. The synergy between memory extraction and experience scaling promotes both accuracy and sample efficiency, as memory-equipped agents use contrastive signals from expanded test-time computation to accrue compositional, transferable reasoning expertise.

This approach transforms the persistent agent into a continually self-improving learner, capable of adapting its scaling and reasoning strategies through structured experience, contrastive feedback, and incremental memory consolidation across tasks and domains (Ouyang et al., 29 Sep 2025).
