Symbol-Equivariant Recurrent Reasoning Models
- SE-RRMs are neural architectures that explicitly enforce symbol permutation equivariance to improve robust reasoning performance.
- They eliminate the need for extensive symbol-permutation data augmentation, reducing training cost and enhancing scalability and generalization.
- Empirical evaluations on Sudoku, ARC-AGI, and Maze tasks demonstrate SE-RRM's superior efficiency and accuracy over traditional recurrent reasoning models.
Symbol-Equivariant Recurrent Reasoning Models (SE-RRMs) are neural architectures specifically designed for structured reasoning tasks, such as Sudoku and ARC-AGI, where symmetries over symbols (digits, colors, etc.) provide crucial inductive bias. Unlike earlier Recurrent Reasoning Models (RRMs)—notably the Hierarchical Reasoning Model (HRM) and the Tiny Recursive Model (TRM)—which enforce permutation symmetry only implicitly via extensive data augmentation, SE-RRMs achieve permutation equivariance at the architectural level. This is realized through symbol-equivariant layers, which guarantee that relabeling the input symbols relabels the model's outputs in exactly the same way. As a result, SE-RRMs produce the same solution (up to relabeling) for every permutation of the symbol set and exhibit enhanced robustness, data efficiency, and generalization across task scales and symbol sets (Freinschlag et al., 2 Mar 2026).
1. Permutation Equivariance: Formalism and Implementation
Permutation equivariance is defined as the commutativity of a function $f: \mathcal{X} \to \mathcal{Y}$ with the action of a permutation group $G$ on $\mathcal{X}$ and $\mathcal{Y}$, i.e., $f(g \cdot x) = g \cdot f(x)$ for every $g \in G$. In the context of SE-RRMs, two distinct symmetry groups are considered: $S_I$ (permutations over positions/cells) and $S_K$ (permutations over symbols or colors). Input data are encoded as three-way tensors $Z \in \mathbb{R}^{D \times I \times K}$, with symbol permutations $\sigma \in S_K$ acting as $(\sigma \cdot Z)_{d,i,k} = Z_{d,i,\sigma^{-1}(k)}$ and position permutations $\pi \in S_I$ acting as $(\pi \cdot Z)_{d,i,k} = Z_{d,\pi^{-1}(i),k}$.
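Concretely, both actions are just index permutations along different axes of the state tensor. The following minimal PyTorch sketch (the shapes and indexing convention are illustrative) also verifies that the two actions commute:

```python
import torch

D, I, K = 4, 6, 3            # feature dim, positions, symbols
Z = torch.randn(D, I, K)     # hidden-state tensor

sigma = torch.randperm(K)    # a symbol permutation in S_K
pi = torch.randperm(I)       # a position permutation in S_I

Z_sym = Z[:, :, sigma]       # symbol action: relabel the symbol axis
Z_pos = Z[:, pi, :]          # position action: shuffle the cell axis

# The two actions operate on different axes and therefore commute.
assert torch.equal(Z_sym[:, pi, :], Z_pos[:, :, sigma])
```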
SE-RRMs guarantee symbol equivariance by designing every model layer—attention, MLP, normalization, residual connections—to commute with the action of $S_K$. Specifically, for any SE-RRM block $B: \mathbb{R}^{D \times I \times K} \to \mathbb{R}^{D \times I \times K}$ and for any $\sigma \in S_K$, the relation $B(\sigma \cdot Z) = \sigma \cdot B(Z)$ holds. The output is an $I \times K$ tensor of logits, where permuting the symbol dimension corresponds exactly to relabeling.
Architecturally, equivariant linear maps $W$ satisfy $W(\sigma \cdot Z) = \sigma \cdot W(Z)$ for all $\sigma \in S_K$. These are constructed in practice via weight sharing and explicit attention operations over the symbol axis.
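To make the weight-sharing idea concrete, the sketch below shows a hypothetical DeepSets-style linear layer that is exactly $S_K$-equivariant. The paper's layers obtain the same property via shared weights and symbol-axis attention, so this particular parameterization is illustrative rather than the actual architecture:

```python
import torch
import torch.nn as nn

class SymbolEquivariantLinear(nn.Module):
    """A DeepSets-style linear map over the symbol axis: shared per-symbol
    weights plus a symbol-sum term, so relabeling symbols commutes with it."""
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.w_self = nn.Linear(d_in, d_out, bias=False)  # shared across symbols
        self.w_pool = nn.Linear(d_in, d_out, bias=False)  # acts on the symbol-sum

    def forward(self, z):                   # z: (..., K, d_in)
        pooled = z.sum(dim=-2, keepdim=True)              # S_K-invariant pooling
        return self.w_self(z) + self.w_pool(pooled)       # (..., K, d_out)

layer = SymbolEquivariantLinear(8, 8)
z = torch.randn(5, 4, 8)                    # (positions, K=4 symbols, features)
sigma = torch.randperm(4)
# Relabeling the symbols before or after the map gives the same result.
assert torch.allclose(layer(z)[:, sigma, :], layer(z[:, sigma, :]), atol=1e-6)
```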
2. Model Architecture and Computational Details
Let $I$ denote the number of positions (cells), $K$ the number of symbols, and $D$ the feature dimension. Inputs $X \in C^I$ (with $|C| = K$) are embedded into $\mathbb{R}^{D \times I \times K}$. The recurrent hidden state at time $t$ is $Z^t \in \mathbb{R}^{D \times I \times K}$.
The model operates as a fixed-point iteration for $t = 0, \dots, T-1$, with each block composed of $L$ layers. The layer operations per block are:
- Positional self-attention along positions, treating each (feature, symbol) slice as a sequence.
- Symbol self-attention along symbols, shared across positions.
- A pointwise MLP (SwiGLU) per (i, c).
- RMS normalization applied over the feature dimension.
Per layer, the operations are $B' = \mathrm{Norm}(B + T^{D,I}(B))$, $B'' = \mathrm{Norm}(B' + T^{D,K}(B'))$, and $B \leftarrow \mathrm{Norm}(B'' + m^D(B''))$, with the block output $Z^{t+1} = B$. The output projection is a linear map $W_{\text{proj}} \in \mathbb{R}^{1 \times D}$, shared across all $(i, c)$, yielding $I \times K$ logits, followed by a row-wise softmax over symbols for class probabilities.
The architectural design, in which every layer (attention, MLP, normalization) commutes with permutations of the symbol axis, is central to enforcing exact equivariance.
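A minimal PyTorch sketch of one such layer follows. The shapes, head count, and hidden width are illustrative assumptions rather than the paper's settings (`nn.RMSNorm` requires PyTorch ≥ 2.4):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Pointwise SwiGLU MLP applied independently per (position, symbol)."""
    def __init__(self, d: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(d, hidden, bias=False)
        self.up = nn.Linear(d, hidden, bias=False)
        self.down = nn.Linear(hidden, d, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

class SERRMLayer(nn.Module):
    """One SE-RRM layer: position attention, symbol attention, pointwise MLP,
    each with residual + RMSNorm. Shapes assume (batch, I, K, D)."""
    def __init__(self, d: int, n_heads: int = 4):
        super().__init__()
        self.pos_attn = nn.MultiheadAttention(d, n_heads, batch_first=True)
        self.sym_attn = nn.MultiheadAttention(d, n_heads, batch_first=True)
        self.mlp = SwiGLU(d, 4 * d)
        self.n1, self.n2, self.n3 = nn.RMSNorm(d), nn.RMSNorm(d), nn.RMSNorm(d)

    def forward(self, z):                                  # z: (B, I, K, D)
        B, I, K, D = z.shape
        # Position self-attention: K independent length-I sequences.
        x = z.permute(0, 2, 1, 3).reshape(B * K, I, D)
        x = x + self.pos_attn(x, x, x, need_weights=False)[0]
        z = self.n1(x.reshape(B, K, I, D).permute(0, 2, 1, 3))
        # Symbol self-attention: I independent length-K sequences; attention
        # without positional encodings is S_K-equivariant.
        y = z.reshape(B * I, K, D)
        y = y + self.sym_attn(y, y, y, need_weights=False)[0]
        z = self.n2(y.reshape(B, I, K, D))
        # Pointwise MLP per (position, symbol) pair.
        return self.n3(z + self.mlp(z))

# Sanity check: relabeling the symbols commutes with the layer.
layer = SERRMLayer(d=16).eval()
z = torch.randn(2, 9, 4, 16)               # I=9 cells, K=4 symbols, D=16
sigma = torch.randperm(4)
assert torch.allclose(layer(z)[:, :, sigma, :],
                      layer(z[:, :, sigma, :]), atol=1e-5)
```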
3. Training Regime and Objective Function
Deep supervision is applied at each of the $T$ unrolled steps; for each iteration $t$, a cross-entropy loss

$$\mathcal{L}_t = -\sum_{i} \log p^{(t)}_{i,\,Y_i}$$

is computed, with $Y_i$ the target symbol at position $i$. At each step, gradients are backpropagated through the current block only, detaching $Z^t$ to stabilize training. A random halting scheme, with halt probability $p_{\text{stop}}$ at each step except the last, replaces a Q-learning halting policy and reduces compute.
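The detach-and-halt logic is compact in code. The sketch below assumes a hypothetical `model` interface mapping `(state, inputs)` to `(new_state, logits)`; it illustrates the training scheme described above rather than the reference implementation:

```python
import torch

def deep_supervision_step(model, z0, inputs, targets, T, p_stop, loss_fn):
    """One training step with deep supervision: each step's loss gets
    gradients only through that step's block; the carried state is detached.
    Random halting stands in for a learned Q-halting policy."""
    z, total_loss = z0, 0.0
    for t in range(T):
        z, logits = model(z, inputs)
        total_loss = total_loss + loss_fn(logits, targets)  # per-step CE loss
        z = z.detach()                    # block gradients across iterations
        if t + 1 < T and torch.rand(()) < p_stop:
            break                         # random early halt saves compute
    return total_loss
```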
Optimization uses AdamW with weight decay and a warmup phase followed by a constant or cosine learning-rate schedule. In typical Sudoku experiments, hyperparameters include the learning rate, weight decay of 1.0, batch size, number of deep-supervision steps $T$, feature dimension $D$, and a roughly 2-million-parameter model.
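A generic optimizer setup consistent with this description is sketched below; the learning rate, warmup length, and training horizon are placeholder values (only the weight decay of 1.0 comes from the text):

```python
import math
import torch
from torch.optim.lr_scheduler import LambdaLR

params = [torch.nn.Parameter(torch.randn(10))]   # stand-in for model parameters
opt = torch.optim.AdamW(params, lr=1e-3, weight_decay=1.0)

warmup, total = 2_000, 100_000                   # placeholder step counts
def lr_lambda(step):
    if step < warmup:
        return step / warmup                     # linear warmup
    progress = (step - warmup) / (total - warmup)
    return 0.5 * (1.0 + math.cos(math.pi * progress))   # cosine decay

sched = LambdaLR(opt, lr_lambda)                 # call sched.step() after opt.step()
```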
Crucially, SE-RRM’s $S_K$-equivariance eliminates the need for symbol-permutation data augmentation: only spatial augmentations are required (e.g., dihedral symmetries in ARC-AGI), reducing augmentation needs by two orders of magnitude relative to HRM/TRM.
4. Empirical Evaluation and Benchmark Results
SE-RRM performance is evaluated across structured reasoning tasks, with primary comparisons against HRM and TRM. All results reported are as in (Freinschlag et al., 2 Mar 2026).
A. Sudoku
- Training: 1,000 base 9×9 puzzles × 1,000 symbol-permutation augmentations.
- Test: 422,786 9×9 puzzles; zero-shot generalization on 4×4, 16×16, and 25×25 puzzles.
| Model | 4×4 FSR (GPA) | 9×9 FSR (GPA) | 16×16 GPA | 25×25 GPA |
|---|---|---|---|---|
| HRM | 0% (29%) | 63.5% (86.1%) | -- | -- |
| TRM | 0% (46%) | 71.9% (89.8%) | -- | -- |
| SE-RRM | 95.5% (99.2%) | 93.7% (97.6%) | 51.9% | 31.5% |
SE-RRM dramatically outperforms prior RRMs, including near-perfect generalization to 4×4 and nontrivial zero-shot accuracy on larger unseen 16×16 (51.9%) and 25×25 (31.5%) grids, despite training solely on 9×9.
Test-time scaling with additional recurrent steps improves solution rates: the 9×9 FSR of 93.7% rises to 98.8% when the number of steps $T$ is increased at inference.
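No retraining is needed for this tradeoff: inference simply unrolls more recurrent steps. A sketch, reusing the hypothetical `model` interface from Section 3:

```python
import torch

@torch.no_grad()
def solve(model, z0, inputs, T_infer):
    """Unroll more steps than used in training to trade compute for accuracy."""
    z, logits = z0, None
    for _ in range(T_infer):              # e.g. T_infer > training-time T
        z, logits = model(z, inputs)
    return logits.argmax(dim=-1)          # predicted symbol per cell
```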
B. ARC-AGI
- Benchmarks: ARC-AGI-1 (400 puzzles), ARC-AGI-2 (120 puzzles).
- Metric: pass@2.
| Model | ARC-AGI-1 pass@2 | ARC-AGI-2 pass@2 |
|---|---|---|
| HRM | 40.3% | 5.0% |
| TRM | 44.6% | 7.8% |
| SE-RRM | 45.3% | 7.1% |
SE-RRM slightly surpasses prior results on ARC-AGI-1 and remains competitive on ARC-AGI-2, using only 8 dihedral augmentations per puzzle compared to the ~1,000 symbol-permutation augmentations required for HRM/TRM.
C. Maze
- Dataset: 1,000 train/test 30×30 mazes (path length ≥110); four distinct symbols (not treated equivariantly).
- Metric: Fully solved rate (FSR).
| Model | Maze FSR |
|---|---|
| HRM | 74.5% |
| TRM | 85.3% |
| SE-RRM | 88.8% |
SE-RRM achieves the highest FSR, even on tasks where symbol-equivariance is explicitly broken by distinct embeddings.
5. Training Workflow and Pseudocode
A typical SE-RRM training step proceeds as follows:
```
X ∈ C^I              # input symbols
Y ∈ C^I              # target symbols
T                    # max recurrence steps
p_stop               # probability of early halt
E^C                  # shared symbol embeddings
E^P                  # positional embeddings (D×I×K)
E^type(p)            # task-type embeddings (1×K)
W_proj ∈ R^{1×D}     # output projection

Compute E^C(X)       # symbol/special embeddings
E^G = E^C(X) + E^P + broadcast(E^type(p))
Initialize Z^0       # learnable constant in R^{D×I×K}

for t = 0...T-1:
    B = Z^t + (E^G if (t mod H == 0) else 0)
    for ℓ in 0...L-1:
        B′ = Norm( B  + T^{D,I}(B)  )    # position self-attention
        B″ = Norm( B′ + T^{D,K}(B′) )    # symbol self-attention
        B  = Norm( B″ + m^D(B″)     )    # pointwise SwiGLU MLP
    Z^{t+1} = B
    logits ℓ_{i,c} = W_proj ⋅ Z^{t+1}_{:,i,c}
    p_{i,c} = softmax_c(ℓ_{i,c})
    L_{t+1} = −∑_i log p_{i, Y_i}
    if random() < p_stop and t+1 < T:
        break

Total loss = ∑_{s=1}^{t+1} L_s
Backpropagate loss (local gradients for each block; Z detached between steps)
Optimizer step (AdamW, etc.)
```
This approach, particularly the detachment at each block, ensures stable and efficient training.
6. Analysis and Broader Implications
SE-RRM’s explicit architectural enforcement of symbol-permutation equivariance yields significant benefits:
- Elimination of the need for symbol-permutation augmentations, greatly reducing training sample requirements.
- Robust out-of-distribution generalization, with models trained on 9×9 Sudoku achieving competitive or superior results on 4×4, 16×16, and 25×25 puzzles.
- Improved scalability, with competitive or improved performance over HRM and TRM, often with fewer parameters (2 million) and far fewer augmentations.
- Sample efficiency, with only spatial augmentations required for most tasks.
- At inference, solution accuracy improves monotonically with the number of recurrent steps, enabling a test-time computation-accuracy tradeoff.
A plausible implication is that permutation-equivariant architectures, such as SE-RRM, are crucial for robust symbolic reasoning in domains with inherent symmetries, and their explicit symmetry handling addresses core generalization limitations of previous methods (Freinschlag et al., 2 Mar 2026).