Explain why recursion yields superior generalization over scaling depth or size

Establish a theoretical explanation for why recursive reasoning with deep supervision, as implemented in the Tiny Recursion Model (TRM), generalizes markedly better than increasing model size or depth in conventional supervised architectures. In addition, determine whether mitigation of overfitting is the primary mechanism behind the observed gains.

Background

The paper reports that TRM, which recursively updates a latent reasoning state and iteratively refines its answer, achieves substantially higher accuracy on several puzzle benchmarks than larger or deeper non-recursive models and than the Hierarchical Reasoning Model (HRM). Despite this empirical success, the authors explicitly acknowledge a lack of theoretical understanding of why recursion confers such generalization benefits.
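To make the mechanism under discussion concrete, the following is a minimal structural sketch of TRM-style recursive reasoning with deep supervision. All names (`f`, `trm_step`, `recurse`) and dimensions are illustrative assumptions, not the paper's actual implementation: a single tiny shared network refines a latent state `z` for several inner steps, then refines the answer `y`, and the whole cycle repeats, with each cycle's answer available for a supervision loss.

```python
import numpy as np

# Illustrative sketch only: one tiny weight matrix is reused for every
# update, so depth comes from recursion rather than from stacking layers.
rng = np.random.default_rng(0)
D = 8                                        # feature dimension (assumed)
W = rng.normal(scale=0.1, size=(3 * D, D))   # single shared weight matrix

def f(x, y, z):
    """Tiny shared network: maps (input, answer, latent) to a D-dim update."""
    return np.tanh(np.concatenate([x, y, z]) @ W)

def trm_step(x, y, z, n_latent=3):
    """One recursion: refine the latent state several times, then the answer."""
    for _ in range(n_latent):
        z = f(x, y, z)   # latent reasoning updates
    y = f(x, y, z)       # answer refinement with the same shared weights
    return y, z

def recurse(x, n_cycles=4):
    """Repeat the recursion; deep supervision would attach a loss per cycle."""
    y, z = np.zeros(D), np.zeros(D)
    answers = []
    for _ in range(n_cycles):
        y, z = trm_step(x, y, z)
        answers.append(y)    # each intermediate answer can be supervised
    return answers

answers = recurse(rng.normal(size=D))
print(len(answers), answers[-1].shape)
```

Note the design point at issue: the effective computation depth grows with `n_cycles * n_latent` while the parameter count stays fixed, which is precisely the contrast with scaling width or layer count that the open question asks to explain.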

They hypothesize that reduced overfitting may be involved but state that they have no theory to support this claim, leaving both the mechanism and the conditions under which recursion outperforms scaling as unresolved questions.

References

Although we simplified and improved on deep recursion, the question of why recursion helps so much compared to using a larger and deeper network remains to be explained; we suspect it has to do with overfitting, but we have no theory to back this explanation.

Less is More: Recursive Reasoning with Tiny Networks (2510.04871 - Jolicoeur-Martineau, 6 Oct 2025) in Conclusion