Tiny Recursion Model (TRM)
- Tiny Recursion Model (TRM) is a minimalist recursive reasoning approach that uses a two-layer neural network to iteratively refine both the answer and a latent feature with shared parameters.
- It employs a recursive update loop with deep supervision, allowing the model to emulate significant effective depth without increasing parameter count, facilitating efficient iterative computation.
- TRM demonstrates superior generalization on tasks like ARC-AGI, Sudoku, and Maze challenges, outperforming much larger models (e.g., 45% versus 40.3% test accuracy on ARC-AGI-1).
The Tiny Recursion Model (TRM) is a minimalist approach to recursive reasoning and computation, emphasizing extreme parameter efficiency and simple iterative architectures. Introduced in the context of hard reasoning tasks such as ARC-AGI, Sudoku, and complex grid-based puzzles, TRM is defined by the use of a single small neural network that executes recursive updates on an answer and a latent feature. Recent formulations demonstrate that TRM attains superior generalization with drastically fewer parameters and minimal data relative to hierarchical or large neural models.
1. Minimal Architecture and Recursive Mechanics
TRM is instantiated as a two-layer neural network with a total parameter count in the range of 5M to 7M, considerably smaller than both LLMs (e.g., DeepSeek-R1, o3-mini, Gemini 2.5 Pro) and competing recursive designs such as the Hierarchical Reasoning Model (HRM, 27M parameters). The central recursion loop in TRM jointly updates two representations: the current answer y and a latent reasoning feature z. Each iteration comprises a latent refinement step
z ← net(x, y, z),
where x encodes the question/input, followed by an answer update
y ← net(y, z).
After n latent recursions and the answer update, the output head produces the tokenized answer prediction. The process may be stratified into T − 1 recursion passes without gradients (i.e., inference), then a final pass applying backpropagation for learning. Unlike HRM, which splits the reasoning pipeline into distinct networks recursing at different frequencies and relies on fixed-point approximations, TRM backpropagates through the entire recursion (the recursive chain), removing the need for theoretical equilibrium guarantees and simplifying both implementation and analysis. For large grids, self-attention can be incorporated but is omitted for small context sizes.
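To make the recursion concrete, the following is a minimal PyTorch sketch of a single recursion block under simple assumptions: `net` is the shared two-layer network, `d_model` is an illustrative hidden width, and the inputs x, y, z are combined by addition. The authors' actual combination, normalization, and token-mixing choices may differ; this is a sketch, not the reference implementation.

```python
import torch
import torch.nn as nn

d_model = 512                         # illustrative hidden width
net = nn.Sequential(                  # the single tiny two-layer network shared by all steps
    nn.Linear(d_model, 4 * d_model),
    nn.GELU(),
    nn.Linear(4 * d_model, d_model),
)

def latent_recursion(x, y, z, n=6):
    """One recursion block: n latent refinements, then a single answer update."""
    for _ in range(n):
        z = z + net(x + y + z)        # z <- net(x, y, z): refine the latent reasoning state
    y = y + net(y + z)                # y <- net(y, z): refine the current answer
    return y, z

# Example shapes for a Sudoku-style task: batch of 1, 81 cells, d_model features per cell
x = torch.randn(1, 81, d_model)
y, z = torch.zeros_like(x), torch.zeros_like(x)
y, z = latent_recursion(x, y, z, n=6)
```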
2. Recursive Reasoning Paradigm
TRM expresses recursive reasoning by repeatedly improving both solution and reasoning state. This process is algorithmically captured by:
```python
def deep_recursion(x, y, z, n, T):
    for j in range(T - 1):
        y, z = latent_recursion(x, y, z, n)  # gradient-free recursions
    y, z = latent_recursion(x, y, z, n)      # final recursive update with gradients
    return output_head(y)
```
This recursive loop enables refinement over intermediate states, akin to an iterative chain-of-thought, distinguishing between the current answer and the latent process used to derive it. Deep supervision methodology is used: each sample is presented for up to N_sup supervision iterations, with the answer and latent state from the previous step serving as initialization for the next. This approach effectively yields large network depth through parameter re-use without increasing model size.
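The deep-supervision loop can be sketched as follows, continuing the snippet above (it reuses `net`, `d_model`, and `latent_recursion`). The names `output_head` and `vocab_size`, the optimizer choice, and the value `n_supervision=16` are illustrative assumptions rather than details stated in this section, and any adaptive halting mechanism is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size = 12                                   # illustrative token vocabulary (e.g., grid symbols)
output_head = nn.Linear(d_model, vocab_size)      # maps the answer state y to answer tokens
optimizer = torch.optim.Adam(
    list(net.parameters()) + list(output_head.parameters()), lr=1e-4
)

def train_example(x, target, n=6, T=3, n_supervision=16):
    """Deep-supervision sketch: revisit one example up to n_supervision times,
    re-initializing each pass from the previous (detached) y and z."""
    y, z = torch.zeros_like(x), torch.zeros_like(x)
    for _ in range(n_supervision):
        with torch.no_grad():                     # T-1 gradient-free recursion blocks
            for _ in range(T - 1):
                y, z = latent_recursion(x, y, z, n)
        y, z = latent_recursion(x, y, z, n)       # final block, backpropagated through in full
        logits = output_head(y)                   # (batch, seq, vocab_size)
        loss = F.cross_entropy(logits.transpose(1, 2), target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        y, z = y.detach(), z.detach()             # carry state into the next supervision pass
```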
3. Generalization and Performance Metrics
TRM demonstrates strong generalization characteristics even in low-data regimes. On ARC-AGI-1, with only approximately 800 tasks and modest data augmentation, TRM achieves 45% test accuracy, outperforming HRM (40.3%) and large chain-of-thought LLMs (which do not surpass this performance despite extreme parameter counts). On ARC-AGI-2 (with about 1120 tasks and augmentation), TRM reaches 8% test accuracy, notably higher than HRM (5%). For Sudoku-Extreme with 1000 samples, TRM achieves 87.4% (MLP variant); for Maze-Hard with larger grids (30x30), TRM attains approximately 85.3%. These results emphasize parameter and data efficiency, with deep supervision and repeated recursion accounting for the ability to "unroll" effective depth far beyond the raw model size.
Table: Summary of TRM Performance
| Task | Parameters (Millions) | TRM Test Accuracy | HRM Test Accuracy |
|---|---|---|---|
| ARC-AGI-1 | ~7 | 45% | 40.3% |
| ARC-AGI-2 | ~7 | 8% | 5.0% |
| Sudoku-Extreme | ~7 | 87.4% | 55% |
| Maze-Hard | ~7 | 85.3% | (not reported) |
4. Data Regime, Augmentation, and Capacity Control
TRM is designed for environments with severe data constraints: typical training set sizes are on the order of 1,000 examples. Despite this, TRM exploits aggressive data augmentation (color permutations, group rotations, flips, translations) to enhance generalization. The intentional restriction to a tiny network (2 layers, ~7M parameters) minimizes overfitting in such sparse-data settings and permits intensive recursive training without prohibitive computational overhead.
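As an illustration of the kinds of augmentations listed above, the following NumPy sketch applies a color permutation, a random 90-degree rotation, an optional flip, and a small wrap-around translation to an ARC-style integer grid. The function and parameter names are hypothetical, and the paper's exact augmentation pipeline (in particular, applying the same transform consistently to all input/output pairs of a task) may differ.

```python
import numpy as np

def augment_grid(grid, rng, num_colors=10, max_shift=2):
    """Illustrative ARC-style augmentation: color permutation, rotation, flip, translation."""
    g = np.asarray(grid)

    # Consistently relabel the num_colors symbols with a random permutation
    perm = rng.permutation(num_colors)
    g = perm[g]

    # Rotate by a random multiple of 90 degrees, optionally flip horizontally
    g = np.rot90(g, k=int(rng.integers(4)))
    if rng.integers(2):
        g = np.fliplr(g)

    # Small wrap-around shift as a simple stand-in for padded translation
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.roll(g, shift=(int(dy), int(dx)), axis=(0, 1))

rng = np.random.default_rng(0)
grid = rng.integers(0, 10, size=(9, 9))
augmented = augment_grid(grid, rng)
```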
The recursive process enables the emulation of substantial effective depth. For instance, under deep supervision, one configuration achieves up to 42 layers per supervision step (i.e., as many function compositions as would occur in a 42-layer feedforward network), yet all parameters are shared, and only the output head is separately evaluated.
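For concreteness, assuming a configuration of T = 3 recursion blocks, n = 6 latent updates per block, and a 2-layer network (illustrative values, not stated above), each supervision step composes the network T × (n + 1) = 21 times, giving 21 × 2 = 42 effective layers, consistent with the figure quoted here.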
5. Biological Motifs and Representation Splitting
Although the predecessor HRM model was biologically motivated—supposing hierarchical processing at multiple temporal frequencies—TRM eschews these analogies. Instead, TRM represents the reasoning process through a pragmatic split:
- The answer (y): holds the current candidate solution.
- The latent feature (z): encodes the evolving chain-of-thought.
Passing both quantities through recursive steps enables retention of solution and reasoning trace, with no explicit reliance on separate frequencies or biologically motivated modules. This representation split is posited as sufficient for effective recursive reasoning, regardless of biological parallels.
6. Empirically Observed Generalization and Practical Limitations
TRM's iterative improvement mechanism, recursive updates coupled with deep supervision, emerges as the critical driver of its generalization, particularly on puzzle tasks where solutions can be logically inferred or constructed through iterative refinement. However, the precise reasons for this strong generalization are not yet fully explained. Hypotheses include avoidance of overfitting due to the tiny network size and more efficient utilization of limited data.
Choice of recursion hyperparameters (e.g., the number of latent updates n and recursion blocks T) is task-specific, and optimal values are found by empirical tuning. For small grid/sequence tasks, MLP architectures suffice; for larger or variable-context problems, self-attention-based alternatives may be necessary.
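A hypothetical per-task configuration in this spirit might look as follows. Only the MLP-versus-attention split reflects the text above; the numeric values are illustrative defaults that would be tuned empirically rather than figures taken from this section.

```python
# Hypothetical per-task settings; numeric defaults are illustrative placeholders.
trm_configs = {
    "sudoku_extreme": {"mixer": "mlp",            "n": 6, "T": 3, "layers": 2},  # small fixed 9x9 grid
    "arc_agi":        {"mixer": "self_attention", "n": 6, "T": 3, "layers": 2},  # larger 30x30 grids
    "maze_hard":      {"mixer": "self_attention", "n": 6, "T": 3, "layers": 2},  # larger 30x30 grids
}

def recursion_schedule(task):
    """Return (n, T) for a task, falling back to generic defaults."""
    cfg = trm_configs.get(task, {"n": 6, "T": 3})
    return cfg["n"], cfg["T"]
```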
TRM is deterministic: each input yields a single output. Extension to generative or probabilistic modeling, which would accommodate multiple valid solutions, is absent from current implementations and represents an avenue for future work.
7. Theoretical Minimality and Recursion Model Connections
TRM is consistent with minimalist traditions in recursion research, including connections to primitive recursive automata (arXiv:0907.4169), minimal linear recursive systems (Alves et al., 2010), and resource-efficient logic recursion (Grohe et al., 2012). The design goal—achieving recursive reasoning capability with the smallest possible network and the minimal number of architectural components—is paralleled in theoretical work aiming for Turing completeness or logical expressiveness in extremely sparse or bounded formalisms. The use of iterative latent updates reflects theoretical recursive approaches, while the avoidance of complex state management aligns with findings in minimal automata composition.
8. Open Questions and Extensions
TRM establishes new performance standards in hard reasoning tasks for tiny models, yet several open questions remain:
- The mechanism underlying substantial generalization from iterative recursion warrants deeper theoretical analysis, possibly invoking connections with proof search, logic programming, or algebraic recursion models.
- Scaling to longer contexts may necessitate hybrid architectures or dynamic recursion parameterization.
- Introducing stochastic, generative, or uncertainty-preserving outputs is an unsolved problem and is required for broader real-world applicability.
TRM's architecture and training methodology invite exploration in a variety of limited-data and reasoning-centric settings. The continued development and analysis of TRM may further clarify fundamental relationships between recursion depth, parameter minimization, and generalization in neural systems.