ArcMemo Framework for Continual Learning

Updated 7 September 2025
  • ArcMemo is an external memory framework that stores and retrieves modular, abstract reasoning concepts in natural language.
  • It employs dedicated MemWrite and MemRead operations to capture solution traces and integrate abstract concepts into LLM prompts.
  • Empirical results on ARC-AGI show a 7.5% relative improvement over a no-memory baseline when using the Program Synthesis memory format.

ArcMemo is an external memory framework designed to augment LLMs with persistent, abstract reasoning capabilities. It facilitates the retention and reuse of key patterns and insights discovered during inference, overcoming the conventional limitation where such findings are discarded when the context window resets. By structuring memory at the concept level—storing reusable, modular abstractions distilled from solution traces in natural language—ArcMemo enables test-time continual learning without modifying model weights. The framework introduces dedicated strategies for abstraction, modular storage, and selective retrieval, yielding improved performance on reasoning-intensive tasks, as demonstrated on the ARC-AGI benchmark.

1. Framework Architecture and Memory Operations

ArcMemo is centered on two principal operations: MemWrite (concept abstraction and storage) and MemRead (selective retrieval of concepts). Memory is populated with entries representing distilled reasoning patterns, referred to as "concepts." These are maintained in natural language and organized for modular reusability. The framework provides two memory formulations:

  • Open-Ended (OE) Format: Memory entries are structured as flexible “situation–suggestion” pairs, capturing a broad range of abstraction levels.
  • Program Synthesis (PS) Format: A more rigid structure, annotating entries with fields such as titles, descriptions, parameter lists, and output typing, analogous to function signatures. This promotes modularity and recombinability.
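As a concrete illustration, the two formats might be represented roughly as follows (a minimal Python sketch; the field names are assumptions inferred from the descriptions above, not the paper's exact schema):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class OEConcept:
    """Open-Ended (OE) entry: a flexible situation–suggestion pair."""
    situation: str   # when this concept applies, in natural language
    suggestion: str  # what to try when the situation holds

@dataclass
class PSConcept:
    """Program Synthesis (PS) entry: structured like a function signature."""
    title: str                      # short name for the concept
    description: str                # what the routine does
    parameters: List[str] = field(default_factory=list)  # named inputs
    output_type: str = "unknown"    # type of the value it produces
```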

During inference, the MemRead procedure selects a subset of relevant concepts from memory, integrating them into the LLM’s prompt context to guide reasoning. Memory expansion via MemWrite occurs retrospectively, distilling new takeaways from successful or partially successful solution traces.
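A minimal sketch of this read path, reusing the OEConcept class from the sketch above and assuming a generic relevance scorer and a plain-text prompt template (both are illustrative placeholders, not the paper's implementation):

```python
from typing import Callable, List

def mem_read(query: str,
             memory: List[OEConcept],
             score: Callable[[str, OEConcept], float],
             k: int = 5) -> List[OEConcept]:
    """Select the k concepts whose situations best match the query."""
    ranked = sorted(memory, key=lambda c: score(query, c), reverse=True)
    return ranked[:k]

def build_prompt(query: str, concepts: List[OEConcept]) -> str:
    """Prepend retrieved concepts to the task so the LLM can reuse them."""
    hints = "\n".join(f"- If {c.situation}, then {c.suggestion}"
                      for c in concepts)
    return f"Potentially relevant concepts:\n{hints}\n\nTask:\n{query}"
```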

2. Concept-Level Memory and Modular Abstraction

ArcMemo establishes concept-level memory by decomposing reasoning traces into individual, modular abstractions. Each concept entry encapsulates a stand-alone insight—such as a rule, a transformation subroutine, or a generalized solution fragment—independent of the specific query context. This modularity enables the recombination and flexible reuse of concepts across distinct queries. Such disentanglement stands in contrast to instance-based memory systems, which store complete query–response pairs or tightly coupled summaries, limiting adaptability when only partial knowledge transfer is required.

The concept-level approach fosters transfer and continual agent improvement: concepts distilled from one instance apply directly to tasks that share the same underlying reasoning patterns, even when those problems look superficially dissimilar.
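To make the contrast concrete, the following hypothetical entries (illustrative, not drawn from the paper) show the same solved puzzle stored instance-level versus concept-level:

```python
# Instance-level memory: one monolithic entry tied to a whole solved query;
# it only helps if a near-identical puzzle recurs.
instance_entry = {
    "query": "<full ARC puzzle grids>",
    "response": "<full solution trace>",
}

# Concept-level memory: the same trace decomposed into standalone insights,
# each reusable on its own in otherwise unrelated puzzles.
concept_entries = [
    {"situation": "the output grid is a scaled copy of the input",
     "suggestion": "infer the scale factor from the size ratio, then tile"},
    {"situation": "exactly one cell has a unique color",
     "suggestion": "treat that cell as an anchor or marker for the transform"},
]
```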

3. Test-Time Continual Learning Without Weight Updates

A distinctive property of ArcMemo is its facilitation of test-time continual learning in LLMs without recourse to explicit weight updates, retraining, or fine-tuning. Upon solving a problem or receiving feedback (e.g., self-verification, execution outcomes), the system abstracts the trace into new concept-level entries via MemWrite. These concepts are then available for retrieval in subsequent queries, resulting in a feedback-driven memory expansion and refinement. The iterative pattern of solving, abstracting, and integrating enables the LLM to utilize accumulated experiential knowledge, adaptively improving reasoning across tasks.

This workflow circumvents the latency and resource requirements of model retraining, leveraging external, persistent memory as the locus of continual adaptation.
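A minimal sketch of this write path, again reusing the OEConcept sketch; the verify and abstract callables (e.g., execution-based checking and an LLM-driven distillation prompt) are placeholders for whatever feedback and abstraction mechanisms are available:

```python
from typing import Callable, List

def mem_write(trace: str,
              memory: List[OEConcept],
              verify: Callable[[str], bool],
              abstract: Callable[[str], List[OEConcept]]) -> None:
    """Distill a solution trace into concept entries, gated by feedback."""
    # Only abstract from traces that pass verification (e.g., execution
    # outcomes or self-checks), so failures do not pollute the memory.
    if not verify(trace):
        return
    # e.g., an LLM call that extracts situation–suggestion pairs
    memory.extend(abstract(trace))
```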

4. Performance and Evaluation Metrics

ArcMemo’s performance was evaluated on the ARC-AGI benchmark. Empirical results show that the structured Program Synthesis memory formulation yields a 7.5% relative gain over a strong no-memory baseline. Performance was assessed using oracle-based scoring protocols, notably Oracle@1 and Oracle@2; for example, the ArcMemo-PS approach achieved an Oracle@2 score of approximately 59.33, consistently outperforming baseline methods across all tested inference compute scales.

Incremental improvements with additional inference compute (such as retrying with further reasoning traces) illustrate the scalability of the method, with the integration of relevant abstract concepts in the prompt driving enhanced problem-solving depth.

Approach           | Oracle@2 Score                       | Relative Gain (%)
No-Memory Baseline | below 59.33 (exact value not stated) | reference
ArcMemo-PS         | 59.33                                | 7.5
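Taking these figures at face value, the unreported baseline can be back-computed from the relative gain: a 7.5% relative improvement to 59.33 implies a no-memory Oracle@2 of roughly 59.33 / 1.075 ≈ 55.2. This is an inference from the stated numbers, not a value reported in the source.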

5. Dynamic Updates and Self-Improvement

ArcMemo supports dynamic, test-time memory updates, enabling self-improvement as experience accumulates. Unlike fixed-memory settings, the framework continuously incorporates newly abstracted concepts after each solved problem. Empirically, this dynamic updating outperforms a static memory, indicating that on-the-fly abstraction and expansion promote adaptive generalization.

As additional puzzles are solved, the system’s concept repository grows richer, enhancing its ability to tackle novel or more challenging tasks through retrieval and recombination of accumulated reasoning patterns. This suggests the utility of adaptive, evolving memory structures in persistent agent development.

6. Implementation Methodology

Implementation details are provided via formal pseudocode algorithms, which specify the high-level workflow (a code sketch follows the list):

  1. For each query:
    • MemRead: Retrieve relevant concept abstractions.
    • Prompt Construction: Insert these concepts into the LLM input context.
    • Inference: Generate a prediction using the augmented prompt.
  2. Periodically or in batches:
    • MemWrite: Abstract new reasoning traces into memory based on feedback.
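Combining the pieces sketched in earlier sections (mem_read, build_prompt, mem_write), the workflow might look like the loop below; llm, score, verify, and abstract are placeholder callables, and the batch size is an arbitrary illustrative choice:

```python
def solve_stream(queries, memory, llm, score, verify, abstract,
                 batch_size=8):
    """Answer a stream of queries with read-before, write-after memory."""
    pending = []  # solution traces awaiting batched MemWrite
    for query in queries:
        concepts = mem_read(query, memory, score)  # 1a. MemRead
        prompt = build_prompt(query, concepts)     # 1b. prompt construction
        trace = llm(prompt)                        # 1c. inference
        pending.append(trace)
        if len(pending) >= batch_size:             # 2. batched MemWrite
            for t in pending:
                mem_write(t, memory, verify, abstract)
            pending.clear()
        yield trace
```

Batching the MemWrite step amortizes the cost of the abstraction calls; writing after every query is the limiting case with batch_size=1.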

Memory format detail:

  • OE Format: Each entry minimally consists of “situation” and “suggestion” fields, abstracted from solution traces.
  • PS Format: Entries are structured with title, description, parameter annotation, and output typing, supporting compositional abstraction akin to function prototypes.

Preprocessing incorporates techniques such as vision-language model captioning to render input features (e.g., ARC puzzle grids) as natural language descriptors, facilitating concept matching. Retrieval is framed as a selection problem: candidate concepts are scored for relevance to the current query, and the highest-scoring entries (e.g., via an argmax over the scores) are inserted into the prompt.

7. Broader Implications and Directions

ArcMemo’s architecture paves the way for further research in modular, persistent lifelong memory for reasoning systems. Potential extensions include hierarchical consolidation strategies—merging, pruning, or reorganizing concepts to reduce redundancy and improve scalability. Addressing problem order sensitivity—in which the sequence of tasks influences memory evolution—remains an open challenge.

More broadly, the principle of abstract, modular external memory underpins emerging paradigms in lifelong learning, where agents continually accrete, refine, and retrieve knowledge as they encounter diverse tasks. A plausible implication is that such frameworks could catalyze the development of agents capable of robust, self-improving performance in a variety of reasoning-intensive domains.

In summary, ArcMemo advances persistent, adaptive reasoning for LLMs by distilling solution traces into abstract, modular memory entries, enabling dynamic, test-time continual learning. This approach is substantiated by improved generation metrics on ARC-AGI and offers promising trajectories for future agent and framework designs.