
Hierarchical Reasoning Model (HRM)

Updated 1 July 2025
  • Hierarchical Reasoning Model (HRM) is an architecture that decomposes complex tasks into high-level planning and low-level execution layers for efficient sequential problem-solving.
  • HRM achieves high performance on complex reasoning benchmarks such as Sudoku and ARC-AGI using significantly less data and far fewer parameters than large language models.
  • Unlike Chain-of-Thought (CoT) approaches, HRM reasons in latent internal states rather than explicit token traces, enabling robust, dynamic reasoning with greater computational depth and efficiency.

A Hierarchical Reasoning Model (HRM) is an architectural framework for sequential decision-making and problem solving that decomposes complex reasoning tasks into nested layers of abstract planning and detailed execution. The HRM approach is motivated by the multi-level, multi-timescale structure of information processing in the human brain and is distinguished from traditional sequential models such as Transformers or Chain-of-Thought (CoT) pipelines by its explicit separation of high-level planning from low-level computation, dynamic recurrence across both levels, and computational and data efficiency (Wang et al., 26 Jun 2025).

1. Model Architecture: High-level and Low-level Hierarchies

The central feature of HRM is its division into two interdependent recurrent modules:

  • High-level Module (H): Operates at a slow timescale, performing abstract planning and maintaining a high-level hidden state (z_H). It updates its state only once per reasoning segment (cycle), integrating results from the low-level module to guide the next strategic phase.
  • Low-level Module (L): Runs at a fast timescale within each high-level cycle, executing the detailed local computations needed to implement the abstract plan. Its hidden state (z_L) is updated at every step, conditioned on the current high-level context.

The model processes an input x using these components as follows:

$$
\begin{align*}
\tilde{x} &= f_r(x;\, \theta_r) \\
z_L^{(i)} &= f_L\left(z_L^{(i-1)},\, z_H,\, \tilde{x};\, \theta_L\right) \quad \text{for } i = 1, \dots, T \\
z_H' &= f_H\left(z_H,\, z_L^{(T)};\, \theta_H\right) \\
y &= f_o\left(z_H';\, \theta_o\right)
\end{align*}
$$

where f_r is the input embedding network, f_L and f_H are encoder-only Transformer blocks (or similar), T is the number of low-level steps per high-level cycle, and f_o is the output head.

This process is repeated for N high-level cycles, yielding an N×T-step reasoning trace per forward pass.
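
For concreteness, the following PyTorch sketch implements the nested recurrence above under illustrative assumptions: the two-layer Transformer encoders standing in for f_L and f_H, the dimensions, and the additive injection of z_H and the embedded input into the low-level update are placeholders rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class HRM(nn.Module):
    """Minimal sketch of the two-timescale HRM recurrence (illustrative, not the reference code)."""

    def __init__(self, vocab_size: int, d_model: int = 256, n_heads: int = 4,
                 T: int = 8, N: int = 4):
        super().__init__()
        self.T, self.N = T, N                                  # low-level steps per cycle, high-level cycles

        def make_layer():
            return nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)

        self.f_r = nn.Embedding(vocab_size, d_model)           # input embedding network f_r
        self.f_L = nn.TransformerEncoder(make_layer(), num_layers=2)   # fast, low-level module
        self.f_H = nn.TransformerEncoder(make_layer(), num_layers=2)   # slow, high-level module
        self.f_o = nn.Linear(d_model, vocab_size)              # output head f_o
        self.z_H0 = nn.Parameter(torch.randn(1, 1, d_model))   # learned initial high-level state
        self.z_L0 = nn.Parameter(torch.randn(1, 1, d_model))   # learned initial low-level state

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, S = x.shape
        x_tilde = self.f_r(x)                                  # x~ = f_r(x; theta_r)
        z_H = self.z_H0.expand(B, S, -1)
        z_L = self.z_L0.expand(B, S, -1)
        for _ in range(self.N):                                # N high-level cycles
            for _ in range(self.T):                            # T low-level steps per cycle
                # conditioning by addition is a simplification of the paper's input injection
                z_L = self.f_L(z_L + z_H + x_tilde)            # z_L^(i) = f_L(z_L^(i-1), z_H, x~)
            z_H = self.f_H(z_H + z_L)                          # z_H' = f_H(z_H, z_L^(T))
        return self.f_o(z_H)                                   # y = f_o(z_H')
```

A call such as `HRM(vocab_size=11)(torch.randint(0, 11, (2, 81)))` would produce per-cell logits for a batch of two flattened 9x9 Sudoku grids, with 11 chosen here as an assumed vocabulary size (digits plus blank).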

Hierarchical Convergence

A distinctive property is hierarchical convergence: within each high-level cycle, L converges to a local solution, after which H integrates this and "restarts" L's state for further planning. This supports deep, structured computation that is robust to local errors and enables dynamic adjustment of reasoning depth.

2. Computational Efficiency and Training Protocol

  • Efficient Recurrence: By explicitly structuring the computation into hierarchical cycles and steps, HRM achieves a much greater effective model depth than standard feedforward or shallow recurrent models, with strong stability.
  • Adaptive Computation Time (ACT): The number of high-level cycles (i.e., “how long to think”) is not fixed, but dynamically selected using ACT strategies. A Q-learning approach determines segment halting, with deep supervision only at the high-level states.
  • One-step Gradient Learning: Rather than using full backpropagation through time (BPTT), HRM employs a biologically plausible and memory-efficient “one-step” gradient approximation:

$$
\frac{d z_H^*}{d\theta} \approx \frac{\partial f_H}{\partial \theta}
$$

This reduces memory requirements from the O(T) of BPTT to O(1), enabling large batches and stable optimization (see the sketch after this list).

  • Parameter and Data Efficiency: In experimental settings, HRM attains high performance (e.g., near-perfect on Sudoku-Extreme and complex maze tasks) with only 27 million parameters and 1,000 training samples—orders of magnitude less than leading CoT and LLM models that require pretraining and massive annotated datasets.
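
As a rough illustration of how the one-step gradient, deep supervision, and ACT-style halting from this list could fit together in PyTorch, the sketch below detaches the carried states between supervised segments so that only the final update contributes gradients; `model`, `q_head`, and the segment API are assumed names, not the paper's implementation.

```python
import torch.nn.functional as F

def train_segment(model, q_head, optimizer, x, y, z_H, z_L):
    """One deep-supervision segment with the one-step gradient approximation (sketch).

    `model(x, z_H, z_L)` is assumed to run N high-level cycles of T low-level
    steps and return the updated states plus output logits; `q_head` is a
    hypothetical small network producing a halting value for ACT.
    """
    # One-step approximation: carried states enter the segment as constants,
    # so no gradients flow across segments and memory stays O(1) rather than
    # the O(T) of full backpropagation through time.
    z_H, z_L = z_H.detach(), z_L.detach()

    z_H, z_L, logits = model(x, z_H, z_L)

    loss = F.cross_entropy(logits.transpose(1, 2), y)   # supervision applied at the segment boundary
    q_halt = q_head(z_H.mean(dim=1))                    # value used to decide whether to stop "thinking"
    # (the Q-learning target that trains q_halt is omitted from this sketch)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return z_H, z_L, q_halt
```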

3. Empirical Results and Benchmark Achievements

Sudoku-Extreme (9x9) and Maze-Hard (30x30)

  • HRM solves extremely challenging Sudoku instances (puzzles requiring an average of 22 backtracking steps) with nearly perfect accuracy. Baseline Transformers with the same parameter count and CoT models fail under these conditions, even when given significantly more data.
  • HRM solves optimal pathfinding in large 30x30 mazes with high success, while direct-prediction and CoT Transformer models perform at chance.

Abstraction and Reasoning Corpus (ARC-AGI) Benchmark

  • On ARC-AGI (puzzles on grids up to 30x30), HRM achieves 40.3% accuracy with 1,000 training examples and no pretraining, surpassing the best published results for much larger LLMs using CoT traces (e.g., o3-mini-high at 34.5%).
  • HRM's efficiency is highlighted by outperforming models with hundreds of millions of parameters and vastly larger training sets.
| Model | Parameters | Data | ARC-AGI (%) | Sudoku (%) | Maze (%) |
|---|---|---|---|---|---|
| HRM | 27M | 1k | 40.3 | ~100 | ~100 |
| o3-mini-high (CoT baseline) | ≫27M | Large | 34.5 | 0 | 0 |
| Direct-pred Transformer | 27M | 1k | 20–21 | 0 | 0 |
| Claude 3.7 8K | Large | Large | 21.2 | 0 | 0 |

4. Advantages over Chain-of-Thought and Transformers

  • Latent, Non-Linguistic Reasoning: HRM's reasoning steps are performed in ‘hidden’ computational states rather than token-level symbolic traces. This removes brittleness and linguistic bottlenecks seen in CoT and LLM prompt-based reasoning.
  • Robust, Adaptive Depth: The hierarchical recurrence and ACT make HRM robust to error accumulation and allow “thinking longer” only for difficult tasks, paralleling human cognitive flexibility.
  • General-purpose and Turing-complete: HRM, through nested recurrence and dynamic state updates, overcomes the fixed-depth computation limitations of vanilla Transformers, supporting general-purpose reasoning and universal computation.
  • Data and Model Efficiency: HRM demonstrates extremely high sample efficiency, reaching strong results on AGI-oriented benchmarks with minimal parameters and training examples.

5. Emergent Representations and Interpretability

Analysis shows that after training, high-level states exhibit much higher effective dimensionality (measured by Participation Ratio) than low-level states, mirroring observations of prefrontal cortex function. The model produces distinct, interpretable reasoning traces per task, e.g., recursive search and backtracking in Sudoku, sequential expansion and pruning in mazes, and iterative refinement in ARC tasks.
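
The Participation Ratio here is the standard effective-dimensionality measure PR = (Σᵢ λᵢ)² / Σᵢ λᵢ², where λᵢ are the eigenvalues of the covariance of the hidden states; the NumPy sketch below computes it for a matrix of collected z_H vectors (the analysis pipeline itself is an assumption, not taken from the paper).

```python
import numpy as np

def participation_ratio(states: np.ndarray) -> float:
    """Effective dimensionality of a set of hidden states.

    states: array of shape (num_samples, hidden_dim), e.g. collected z_H vectors.
    Returns PR = (sum of eigenvalues)^2 / (sum of squared eigenvalues) of the
    state covariance matrix; PR is 1 when all variance lies in one direction
    and approaches hidden_dim when variance is spread evenly across directions.
    """
    centered = states - states.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (states.shape[0] - 1)
    eigvals = np.linalg.eigvalsh(cov)          # covariance is symmetric, so eigvalsh suffices
    return float(eigvals.sum() ** 2 / np.sum(eigvals ** 2))
```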

6. Significance and Impact for Artificial Intelligence

The introduction of the HRM demonstrates a practical path to scalable, efficient, and general reasoning architectures that do not require vast pretraining or explicit intermediate supervision. Its hierarchical, brain-inspired recurrent design achieves or surpasses performance of existing LLMs and CoT techniques on AGI-relevant benchmarks, with significant gains in sample and computational efficiency.

The demonstrated ability to solve algorithmically complex tasks with minimal data and model size positions HRM as a potential blueprint for universal, general-purpose reasoning systems. This suggests future AI architectures may supersede scale-based, sequential-token models by adopting dynamically deep, hierarchically organized, and recurrence-driven approaches found in both biological and artificial intelligent systems.