Dynamic Cheatsheet (DC) Framework
- Dynamic Cheatsheet is a lightweight framework that equips language models with persistent, adaptive memory for test-time learning by curating problem-solving strategies.
- The framework pairs a Generator module with a Curator module and can integrate retrieval synthesis (DC-RS) to efficiently update its external memory during inference.
- Empirical results show that DC significantly boosts accuracy on arithmetic, reasoning, and code tasks by enabling cumulative learning without modifying model parameters.
Dynamic Cheatsheet (DC) is a lightweight framework that equips black-box LLMs with persistent, adaptive memory to achieve test-time learning—enabling models to incrementally store, curate, and apply distilled problem-solving strategies, code snippets, and heuristics across sequential inference queries. Unlike static prompting or parameter finetuning, DC supplies external, self-curated memory that evolves during inference, allowing models to recall and reuse high-impact insights and systematically improve performance over time, even without explicit labels or human supervision. This approach bridges the gap between isolated inference and cumulative, experience-driven learning characteristic of human cognition.
1. Architectural Composition
Dynamic Cheatsheet operates as a modular extension atop standard large LMs, introducing two principal modules plus an optional retrieval-augmented variant:
Generator Module (Gen):
- Receives the current query x_t and the curated memory M_{t-1}.
- Produces a candidate output y_t = Gen(x_t, M_{t-1}).
- Integrates the LM's reasoning with stored prior insights.
Curator Module (Cur):
- Evaluates each output post-inference.
- Updates memory non-parametrically: M_t = Cur(x_t, y_t, M_{t-1}).
- Curation emphasizes correctness, generality, and brevity; only essential strategies, heuristics, and executable snippets are retained.
Retrieval Synthesis (DC-RS):
- Augments the memory update by pre-selecting the top-k most similar historical input–output pairs.
- DC-RS feeds these retrieved pairs into the memory-update step, further biasing the Generator toward relevant historical solutions.
This architecture is external to the LM core; it does not modify model parameters and operates solely at inference.
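The Generator–Curator loop can be sketched in a few lines of Python. The `llm()` stub, prompt layout, and `INSIGHT:`-extraction heuristic below are illustrative assumptions, not the paper's actual prompts:

```python
# Minimal sketch of the DC inference loop. The llm() function stands in for
# a black-box LM API call; here it returns a canned response so the sketch
# is runnable. All prompt formats and marker strings are hypothetical.

def llm(prompt):
    # Stand-in for a black-box LM call (e.g., an API request).
    return "ANSWER: 42\nINSIGHT: decompose the problem before computing"

def generate(query, memory):
    # Generator: condition the LM on the query plus the curated cheatsheet.
    cheatsheet = "\n".join(memory) if memory else "(empty)"
    return llm(f"Cheatsheet:\n{cheatsheet}\n\nQuery: {query}")

def curate(output, memory):
    # Curator: extract a concise, transferable insight and merge it into
    # memory non-parametrically (no weight updates), deduplicating entries.
    for line in output.splitlines():
        if line.startswith("INSIGHT:"):
            note = line[len("INSIGHT:"):].strip()
            if note and note not in memory:
                memory.append(note)
    return memory

memory = []
for query in ["task 1", "task 2"]:      # sequential test-time queries
    output = generate(query, memory)
    memory = curate(output, memory)      # memory persists across queries
```

Note that the LM itself is never modified: all adaptation happens in the `memory` list passed between calls.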
2. Persistent, Evolving Memory
The hallmark of DC is a self-curated, persistent memory that accumulates problem-solving knowledge throughout a session:
- Memory is maintained outside the LM's weights; the model is treated as a black box.
- After each answer, Cur extracts transferable solution details—code routines, algebraic strategies, reference guides—pruning irrelevant or erroneous information.
- DC avoids context bloat from naive transcript appending; the curated memory consists of concise, implementation-ready artifacts for rapid reuse.
- Memory is dynamic: as better strategies emerge or errors are detected, content is updated; failed heuristics are discarded, correcting the LM’s behavior in subsequent inference.
This adaptive curation mechanism is analogous to human note-taking—selecting only generalizable, high-yield insights for retention.
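The contrast with naive transcript appending can be made concrete with a toy simulation (all quantities are illustrative, not measurements from the paper):

```python
# Toy comparison: full-transcript context grows linearly with the number of
# queries, while a curated cheatsheet retains only distinct reusable notes.
# Trace lengths and note names are invented for illustration.

transcript, cheatsheet = [], []
for step in range(50):
    full_trace = f"Q{step}: ...long reasoning trace..." * 20   # verbose output
    insight = f"strategy-{step % 3}"          # only a few reusable ideas recur

    transcript.append(full_trace)             # context grows without bound
    if insight not in cheatsheet:             # curation keeps distinct notes only
        cheatsheet.append(insight)

transcript_chars = sum(len(t) for t in transcript)
cheatsheet_chars = sum(len(c) for c in cheatsheet)
```

After 50 queries the transcript is orders of magnitude larger than the cheatsheet, which has stabilized at the three distinct strategies.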
3. Quantitative Impact on Performance
Dynamic Cheatsheet demonstrably yields substantial accuracy improvements across diverse tasks:
| Task | Baseline Accuracy | Accuracy with DC (or gain) |
|---|---|---|
| Game of 24 (GPT-4o) | 10% | 99% |
| AIME 2024 (Claude 3.5) | 23.3% | 50% |
| Arithmetic Balancer | 45–50% | 98–100% |
| GPQA-Diamond | — | +9% |
| MMLU-Pro | — | +8% |
- In Game of 24, DC enabled GPT-4o to discover and store a brute-force Python solver; reusing this code eliminated manual arithmetic errors, raising accuracy from 10% to 99%.
- On AIME math exams, Claude retained algebraic insights and templates, leading to more than double the baseline accuracy.
- In error-prone numerical tasks (Equation Balancer), both models approached perfect accuracy by recalling validated computational snippets.
- For knowledge-demanding tasks, performance gains were attributed to cumulative retention of reference tables and facts.
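The kind of brute-force Game of 24 solver that DC might store as a reusable memory artifact can be sketched as follows (a plausible reconstruction, not the exact code GPT-4o generated):

```python
# Brute-force Game of 24: repeatedly combine any two numbers with one of the
# four arithmetic operations until a single value remains; return a matching
# expression string, or None if no combination reaches the target.

def solve24(nums, target=24.0, eps=1e-6):
    def search(items):
        if len(items) == 1:
            val, expr = items[0]
            return expr if abs(val - target) < eps else None
        for i in range(len(items)):
            for j in range(len(items)):
                if i == j:
                    continue
                (a, ea), (b, eb) = items[i], items[j]
                rest = [items[k] for k in range(len(items)) if k not in (i, j)]
                candidates = [(a + b, f"({ea}+{eb})"),
                              (a - b, f"({ea}-{eb})"),
                              (a * b, f"({ea}*{eb})")]
                if abs(b) > eps:                      # avoid division by zero
                    candidates.append((a / b, f"({ea}/{eb})"))
                for val, expr in candidates:
                    result = search(rest + [(val, expr)])
                    if result:
                        return result
        return None
    return search([(float(n), str(n)) for n in nums])
```

For example, `solve24([4, 7, 8, 8])` finds an expression such as `(4*(7-(8/8)))`; once a routine like this sits in memory, every subsequent Game of 24 instance reduces to calling it.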
4. Comparative Analysis to Baselines
DC’s persistent memory distinguishes it from prevalent alternatives:
- Static Prompting: Baseline approaches concatenate predefined instructions; LM performance plateaus, lacking cumulative adaptation.
- Full Transcript History: Includes all prior context in each query, flooding the context window with irrelevant or redundant detail and impairing focus and efficiency.
- DC- (Structured, Non-Evolving): Uses fixed, general prompts; lacks learning-by-retention, and performance remains comparable to static baselines.
- Dynamic Retrieval (DR): Retrieves historical answers but does not curate them; it yields smaller accuracy gains than DC.
DC surpasses these by systematic retention and curation—fostering informed, error-corrected responses without internal model modification.
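The retrieval step that DR (and DC-RS) rely on can be sketched minimally, using token-overlap (Jaccard) similarity as a stand-in for whatever similarity measure an implementation might actually use:

```python
# Top-k retrieval over historical input-output pairs. The Jaccard metric and
# the example history are illustrative stand-ins, not the paper's method.

def jaccard(a, b):
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def retrieve_top_k(query, history, k=2):
    # history: list of (input, output) pairs from earlier inferences
    ranked = sorted(history, key=lambda p: jaccard(query, p[0]), reverse=True)
    return ranked[:k]

history = [
    ("balance the chemical equation", "use integer coefficients"),
    ("solve the algebra problem", "isolate the variable"),
    ("balance this equation for mass", "match atoms on both sides"),
]
top = retrieve_top_k("balance the equation", history, k=2)
```

DR stops here, injecting `top` directly into the prompt; DC-RS additionally passes the retrieved pairs through the Curator, so only distilled, validated content reaches memory.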
5. Application Domains and Use Cases
Dynamic Cheatsheet is broadly applicable across knowledge-intensive and error-prone domains:
- Mathematical Reasoning: Models store algebraic, combinatorial strategies, reusing them across exam-style problems (e.g., AIME, MMLU-Pro).
- Heuristic Puzzles/Coding: For problems reliant on algorithmic logic (Game of 24), DC propagates and reuses validated scripts.
- Arithmetic/Equation Tasks: Recalling code routines prevents repeated calculation mistakes.
- Domain Knowledge (Engineering, Physics): Retaining tables, formulas, and reference results boosts performance on general knowledge exams (GPQA).
In each use case, DC bridges isolated inferences, constructing a session-specific “cheatsheet” of distilled, actionable knowledge.
6. Self-Curation Mechanism and Error Correction
DC’s Curator module adopts a strategy of continual refinement:
- Extracts correct and reusable solution fragments.
- Prunes irrelevant, erroneous, or overly specific content.
- Curation occurs without ground-truth labels—internal model validation or heuristic checks are used for correctness.
- If a newly generated solution is found to supersede prior content (e.g., more general, less error-prone), memory is updated; if mistakes are detected, faulty heuristics are removed.
- This active self-curation prevents error propagation and enables the model to incrementally adapt and improve per-task.
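One way such label-free validation might look in practice is to execute stored code snippets against internal self-checks and discard failures. The entries, check functions, and probe value below are all hypothetical:

```python
# Hedged sketch of error-correcting curation: a stored snippet survives only
# if it passes its own internal check on a probe input; erroneous or crashing
# snippets are pruned, with no ground-truth labels required.

notes = [
    {"note": "sum via formula", "code": "result = n * (n + 1) // 2",
     "check": lambda env: env["result"] == sum(range(env["n"] + 1))},
    {"note": "buggy square", "code": "result = n + n",   # wrong for n != 2
     "check": lambda env: env["result"] == env["n"] ** 2},
]

def prune_invalid(entries, probe_n=5):
    kept = []
    for e in entries:
        env = {"n": probe_n}
        try:
            exec(e["code"], {}, env)       # run the snippet in a scratch env
            if e["check"](env):            # internal validation, no labels
                kept.append(e)
        except Exception:
            pass                           # crashing snippets are discarded
    return kept
```

Running `prune_invalid(notes)` keeps only the correct summation formula and drops the faulty squaring heuristic, preventing it from corrupting later inferences.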
7. Implications and Future Directions
Dynamic Cheatsheet advances LM test-time learning without finetuning or supervision. The framework holds several implications and open directions:
- Continuous Post-deployment Evolution: By maintaining an external, adaptively refined “cheatsheet,” models can incrementally adapt to new domains and user needs.
- Wrappers for Black-Box APIs: Approaches like DC enable intelligent augmentation of proprietary/commercial LMs absent access to internal parameters or large-scale retraining capabilities.
- Tool Use and Automation: The frequent adoption of code routines and computational heuristics suggests future integration with external APIs or tool-chains for more robust reasoning.
- Scalability: Hierarchical or domain-specialized memory architectures may facilitate efficient retention of diverse reasoning strategies.
- AI Reliability and Robustness: DC’s paradigm of session-based cumulative learning raises the standard for adaptive, reliable AI systems in real-world deployments.
Summary
Dynamic Cheatsheet (DC) implements a dual-module framework that retrofits black-box LMs with a persistent, self-curating external memory. By systematizing session-specific retention of high-impact strategies and code artifacts, DC achieves substantial, label-free test-time learning and robustness. Performance improvements are robust across mathematical reasoning, algorithmic puzzles, and knowledge-intensive tasks, demonstrating that dynamic, self-curated memory marks a promising path in augmenting LLMs with human-like, cumulative reasoning capabilities (Suzgun et al., 10 Apr 2025).