TRM: Tiny Recursive Model for Efficient Reasoning
- Tiny Recursive Model (TRM) is a recursive reasoning paradigm that uses a single, shallow network to update latent states and refine solutions iteratively.
- The model leverages full backpropagation on the final recursion to enable effective error correction and strong generalization across benchmarks like ARC and Sudoku.
- TRM achieves high performance with minimal parameters, making it a highly efficient alternative to hierarchical models, especially in settings with limited data.
A Tiny Recursive Model (TRM) is an architectural paradigm for recursive reasoning and adaptation in machine learning, characterized by a minimal parameter footprint and a shallow, single-network structure. Unlike previous hierarchical or multi-module approaches, TRMs attain strong generalization and problem-solving capability by iteratively refining a latent reasoning trace and a candidate solution through repeated internal updates. This architecture is especially notable for outperforming much larger models on hard combinatorial reasoning tasks while using orders of magnitude fewer parameters and only modest training data.
1. Model Architecture and Core Design
A TRM consists of a single, small neural network, typically with only two layers, which is applied recursively to update both a latent reasoning trace and the current solution. This sharply contrasts with hierarchical models such as the Hierarchical Reasoning Model (HRM), which use two separate networks, a low-level module and a high-level module, operating at different recursion frequencies. In TRM, all reasoning occurs within one network, which simultaneously updates both the latent reasoning feature and the current answer.
Key details:
- The TRM network may be instantiated with self-attention blocks (TRM-Att) for tasks requiring large contexts, or with multilayer perceptrons (TRM-MLP) for more localized problems (e.g., Sudoku).
- Inputs: the embedded question $x$, the current embedded solution $y$, and the latent reasoning feature $z$.
- Outputs: an updated $z$ and a refined $y$, with the answer head taking an argmax over the final solution logits.
Formally, the primary update rules can be summarized as
$$z \leftarrow f_\theta(x, y, z) \quad (\text{repeated } n \text{ times}), \qquad y \leftarrow f_\theta(y, z),$$
where $f_\theta$ denotes the single shallow network and the latent update is run for a fixed number of inner iterations $n$ before the answer is refined.
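A minimal PyTorch-style sketch of the architecture and update loop is shown below. Module names (`TinyBlock`, `TRM`), layer sizes, and the choice of combining $x$, $y$, $z$ by simple addition are illustrative assumptions, not the reference implementation:

```python
import torch
import torch.nn as nn


class TinyBlock(nn.Module):
    """One residual MLP block; a stand-in for the paper's small attention/MLP blocks."""

    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.mlp(self.norm(h))


class TRM(nn.Module):
    """Tiny Recursive Model sketch: a single shallow (2-block) network, applied
    recursively to update both the latent reasoning feature z and the answer y."""

    def __init__(self, vocab_size: int, dim: int = 128, n_latent_updates: int = 6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.net = nn.Sequential(TinyBlock(dim), TinyBlock(dim))  # the one shared network
        self.head = nn.Linear(dim, vocab_size)  # answer head (argmax at inference)
        self.n = n_latent_updates

    def recurse(self, x, y, z):
        # Latent update z <- f(x, y, z), repeated n times.
        # (Combining x, y, z by addition is a simplification of how the inputs are mixed.)
        for _ in range(self.n):
            z = self.net(x + y + z)
        # Answer update y <- f(y, z), using the refined latent.
        y = self.net(y + z)
        return y, z

    def forward(self, tokens, y, z):
        x = self.embed(tokens)      # embedded question
        y, z = self.recurse(x, y, z)
        logits = self.head(y)       # per-position answer logits
        return logits, y, z
```

At inference, $y$ and $z$ are initialized, the recursion is applied repeatedly, and the final answer is read out by an argmax over the answer-head logits.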
2. Recursive Reasoning and Training Procedure
The distinguishing feature of the TRM is its recursive update process. Rather than relying on deep, static network architectures, the TRM iteratively refines its output through multiple forward passes. In each recursion:
- the latent $z$ is updated using both the question $x$ and the current answer $y$;
- the answer $y$ is then improved using the most recent $z$.
The training procedure applies a sequence of recursions: all but the final recursion are conducted without gradient tracking; the last recursion is computed with full backpropagation through its complete chain of updates, allowing deep supervision over intermediate latent states.
This approach diverges from HRM’s “fixed-point” assumptions and implicit function theorem–based gradient shortcutting. Instead, TRM’s use of explicit deep supervision, leveraging the entire trajectory of latent updates, enables stronger credit assignment and error correction across all recursive steps.
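A simplified sketch of one deep-supervision training step under this scheme, reusing the `TRM` sketch above; the number of recursions, the loss function, and the detach-and-carry of $(y, z)$ between steps are illustrative assumptions:

```python
import torch
import torch.nn.functional as F


def deep_supervision_step(model, tokens, targets, y, z, n_recursions=3, optimizer=None):
    """One supervision step: run all but the last recursion without gradients,
    then backpropagate through the full final recursion."""
    # Detached "warm-up" recursions: refine (y, z) cheaply, with no gradient tracking.
    with torch.no_grad():
        x = model.embed(tokens)
        for _ in range(n_recursions - 1):
            y, z = model.recurse(x, y, z)

    # Final recursion with full backpropagation through its entire chain of updates.
    logits, y, z = model(tokens, y, z)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))

    if optimizer is not None:
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Detach the carried state so the next supervision step starts a fresh graph.
    return loss.item(), y.detach(), z.detach()
```

Repeating this step while carrying the detached $(y, z)$ forward gives the deep supervision over intermediate latent states described above, letting later steps correct earlier mistakes.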
3. Generalization, Empirical Results, and Task Performance
TRM achieves substantial gains in generalization and problem-solving benchmarks:
- ARC Benchmarks:
  - ARC-AGI-1: TRM-Att achieves higher test accuracy than both HRM and Gemini 2.5 Pro.
  - ARC-AGI-2: TRM-Att again surpasses HRM's test accuracy.
- Sudoku-Extreme: TRM-MLP achieves markedly higher accuracy than HRM.
- Maze-Hard: TRM-Att attains strong test accuracy.
These results indicate TRM's capability to outperform hierarchical reasoning models and state-of-the-art LLMs, often by a clear margin, while using only a small fraction of their parameters.
4. Parameter Efficiency and Computational Aspects
TRMs are exceptionally lightweight:
- Parameter counts of about $7$ million for the standard attention-based form and $5$ million for the MLP variant.
- Far lower than HRM ($27$ million) and orders of magnitude below large transformer LLMs (e.g., Deepseek R1 at $671$ billion).
- Efficiency is achieved by recursive refinement rather than traditional depth, enabling higher “effective depth” via iteration without increased parameterization.
Because the parameter count is so small, TRMs can be trained on small datasets (e.g., $1000$ examples for Sudoku) without severe overfitting, while supporting efficient training and retaining competitive performance.
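A back-of-the-envelope illustration of how recursion buys "effective depth" without extra parameters; the hyperparameter values below are hypothetical, chosen only to make the arithmetic concrete:

```python
# Illustrative effective-depth arithmetic (hypothetical hyperparameter values).
physical_layers = 2   # layers in the single shallow network
latent_updates = 6    # latent refinements per recursion (n)
recursions = 3        # recursion passes per supervision step

# Each recursion applies the network (n + 1) times: n latent updates + 1 answer update.
calls_per_recursion = latent_updates + 1
effective_depth = physical_layers * calls_per_recursion * recursions
print(effective_depth)  # 2 * 7 * 3 = 42 layer applications from only 2 physical layers
```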
5. Simplified Reasoning without Hierarchical or Biological Assumptions
TRM design is explicitly non-hierarchical and avoids complex biological inspiration. In HRM, the low-level and high-level latent states are loosely motivated by biological systems recursing at different temporal frequencies and spatial scales. TRM reinterprets these simply as the answer feature $y$ and the latent reasoning trace $z$. There is no reliance on fixed-point convergence or a neurologically plausible separation of modules.
Iterative refinement in TRM is only minimally related to biological systems; the focus is on algorithmic clarity and computational soundness rather than neurobiological analogy.
6. Challenges, Limitations, and Future Directions
While TRM establishes clear improvements, notable challenges remain:
- Increasing the recursion depth demands more memory, since backpropagation through the final recursion must store activations for every inner update; excessive recursion can cause out-of-memory (OOM) errors, and hyperparameters must be tuned per task.
- The architecture is task-sensitive: MLPs excel in small contexts, while self-attention is preferable with larger input sizes (e.g., ARC matrices).
- The current form is deterministic and supervised, which may limit applications where multiple valid outputs exist—generative extensions are a natural direction.
- Theoretical underpinnings regarding why iterative recursion delivers better generalization than increased static depth are unresolved.
7. Summary and Contextual Significance
The Tiny Recursive Model (TRM) introduces a paradigm shift in recursive reasoning and generalization, characterized by a single, shallow network that uses iterative latent refinement to achieve superior performance on hard tasks with minimal data and parameters. TRM’s recursive nonhierarchical reasoning pipeline avoids many of the complexities and limitations of earlier biologically inspired or hierarchical frameworks. The model demonstrates state-of-the-art accuracy across benchmarks while underscoring the impact of deep supervision through full backpropagation and the strategic separation of answer and reasoning latent states.
Open questions regarding optimal recursion depth, task-dependent architecture adaptation, and generative capabilities motivate further research. TRM exemplifies the potential for concise, efficient architectures in challenging cognitive domains, serving as an anchor for future developments in tiny model design and recursive inference.