Symbol-Equivariant Recurrent Reasoning Models

Updated 4 March 2026
  • SE-RRMs are neural architectures that explicitly enforce symbol permutation equivariance to improve robust reasoning performance.
  • They eliminate the need for extensive symbol-permutation data augmentation, reducing training cost and improving scalability and generalization.
  • Empirical evaluations on Sudoku, ARC-AGI, and Maze tasks demonstrate SE-RRM's superior efficiency and accuracy over traditional recurrent reasoning models.

Symbol-Equivariant Recurrent Reasoning Models (SE-RRMs) are neural architectures specifically designed for structured reasoning tasks, such as Sudoku and ARC-AGI, where symmetries over symbols (digits, colors, etc.) provide crucial inductive bias. Unlike earlier Recurrent Reasoning Models (RRMs)—notably Hierarchical Reasoning Model (HRM) and Tiny Recursive Model (TRM)—which enforce permutation symmetry only implicitly via extensive data augmentation, SE-RRMs achieve permutation equivariance at the architectural level. This is realized through symbol-equivariant layers, ensuring the model's outputs are invariant under any relabeling of input symbols. As a result, SE-RRMs produce identical solutions for all permutations of the symbol set and exhibit enhanced robustness, data-efficiency, and generalization across task scales and symbol sets (Freinschlag et al., 2 Mar 2026).

1. Permutation Equivariance: Formalism and Implementation

Permutation equivariance is defined as the commutativity of a function $f : X \rightarrow Y$ with the action of a permutation group $G$ on $X$ and $Y$, i.e., $f(g \cdot X) = g \cdot f(X)$ for every $g \in G$. In the context of SE-RRMs, two distinct symmetry groups are considered: $S_I$ (permutations over positions/cells) and $S_K$ (permutations over $K$ symbols or colors). Input data are encoded as three-way tensors $X \in \mathbb{R}^{D \times I \times K}$, with symbol permutations $\rho \in S_K$ acting as $(\Pi_3^\rho A)_{d,i,c} := A_{d,i,\rho^{-1}(c)}$ and position permutations $\pi \in S_I$ as $(\Pi_2^\pi A)_{d,i,c} := A_{d,\pi^{-1}(i),c}$.
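To make the two actions concrete, the following minimal sketch (PyTorch; sizes and variable names are illustrative, not from the paper) realizes $\Pi_3^\rho$ and $\Pi_2^\pi$ as index permutations on a $D \times I \times K$ tensor:

```python
import torch

D, I, K = 4, 81, 9                 # illustrative sizes (e.g., 9x9 Sudoku)
A = torch.randn(D, I, K)

rho = torch.randperm(K)            # a symbol permutation rho in S_K
pi = torch.randperm(I)             # a position permutation pi in S_I

# For a permutation p stored as a tensor, argsort(p) is its inverse p^{-1}.
rho_inv = torch.argsort(rho)
pi_inv = torch.argsort(pi)

A_sym = A[:, :, rho_inv]           # (Pi_3^rho A)_{d,i,c} = A_{d,i,rho^{-1}(c)}
A_pos = A[:, pi_inv, :]            # (Pi_2^pi A)_{d,i,c} = A_{d,pi^{-1}(i),c}
```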

SE-RRMs guarantee symbol-equivariance by designing every model layer—attention, MLP, normalization, residual connections—to commute with the action of $\Pi_3^\rho$. Specifically, for any SE-RRM block mapping $G$ and for any $\rho \in S_K$, the relation $\Pi_3^\rho\, G(E,X)(\dots, Z^t, \dots) = G(E,X)(\dots, \Pi_3^\rho Z^t, \dots)$ holds. The output is an $I \times K$ tensor, where permuting the symbol dimension corresponds exactly to relabeling.

Architecturally, equivariant linear maps $L : \mathbb{R}^{D \times I \times K} \to \mathbb{R}^{D' \times I' \times K}$ satisfy $L \circ \Pi_3^\rho = \Pi_3^\rho \circ L$. These are constructed in practice via weight sharing and explicit attention operations over the symbol axis.
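As a hedged illustration of such a map (the construction and names here are ours, not the paper's code), weight sharing across the symbol axis suffices for a pointwise linear layer, and the commutation relation can be checked numerically:

```python
import torch

D, D_out, I, K = 4, 6, 81, 9
W = torch.randn(D_out, D)          # one weight matrix shared over all (i, c)

def L_map(A):                      # (D, I, K) -> (D_out, I, K)
    return torch.einsum('ed,dik->eik', W, A)

A = torch.randn(D, I, K)
rho_inv = torch.argsort(torch.randperm(K))

lhs = L_map(A[:, :, rho_inv])      # L ∘ Pi_3^rho
rhs = L_map(A)[:, :, rho_inv]      # Pi_3^rho ∘ L
assert torch.allclose(lhs, rhs, atol=1e-5)
```

Because the same matrix acts on every symbol slice, relabeling the symbols before or after the map yields the same result.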

2. Model Architecture and Computational Details

Let $I$ denote the number of positions (cells), $K$ the number of symbols, and $D$ the feature dimension. Inputs $X \in C^I$ (with $|C| = K$) are embedded into $E^G(X) \in \mathbb{R}^{D \times I \times K}$. The recurrent hidden state at time $t$ is $Z^t \in \mathbb{R}^{D \times I \times K}$.

The model operates as a fixed-point iteration for $t = 0, \dots, T-1$, with each block $\mathcal{G}$ composed of $L$ layers. The layer operations per block are:

  • Positional self-attention $T^{D,I}$ along positions, treating each (feature, symbol) slice as a sequence.
  • Symbol self-attention $T^{D,K}$ along symbols, shared across positions.
  • A pointwise MLP $m^D$ (SwiGLU) per $(i, c)$.
  • RMS normalization applied over the feature dimension.

Operations per layer $\ell$ are:

$$\begin{aligned} B'_\ell &= \mathrm{Norm}\big[B_\ell + T^{D,I}(B_\ell)\big] \\ B''_\ell &= \mathrm{Norm}\big[B'_\ell + T^{D,K}(B'_\ell)\big] \\ B_{\ell+1} &= \mathrm{Norm}\big[B''_\ell + m^D(B''_\ell)\big] \end{aligned}$$

with the block output $Z^{t+1} = B_L$. The output projection is a linear map $W \in \mathbb{R}^{1 \times D}$, shared across all $(i, c)$, yielding $I \times K$ logits, followed by a row-wise softmax for class probabilities.

The architectural design, in which all layers (attention, MLP, normalization) commute with permutations along the symbol axis, is central to enforcing exact equivariance.
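A single layer of the block can be sketched as follows (a minimal PyTorch rendering under our own assumptions: `nn.MultiheadAttention` stands in for $T^{D,I}$ and $T^{D,K}$, a plain SiLU MLP stands in for the SwiGLU $m^D$, and `nn.RMSNorm` requires a recent PyTorch):

```python
import torch
import torch.nn as nn

class SERRMLayer(nn.Module):
    """One layer of the block; the state is held as (I, K, D)."""
    def __init__(self, d: int, heads: int = 4):
        super().__init__()
        self.attn_pos = nn.MultiheadAttention(d, heads, batch_first=True)  # T^{D,I}
        self.attn_sym = nn.MultiheadAttention(d, heads, batch_first=True)  # T^{D,K}
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.SiLU(), nn.Linear(4 * d, d))
        self.n1, self.n2, self.n3 = nn.RMSNorm(d), nn.RMSNorm(d), nn.RMSNorm(d)

    def forward(self, b: torch.Tensor) -> torch.Tensor:  # b: (I, K, D)
        # Positional attention: batch over symbols, attend along the I axis.
        x = b.permute(1, 0, 2)  # (K, I, D)
        x = x + self.attn_pos(x, x, x, need_weights=False)[0]
        b = self.n1(x.permute(1, 0, 2))
        # Symbol attention: batch over positions, attend along the K axis.
        b = self.n2(b + self.attn_sym(b, b, b, need_weights=False)[0])
        # Pointwise MLP per (i, c), followed by the final norm.
        return self.n3(b + self.mlp(b))
```

Every operation either acts pointwise on the symbol axis or attends over it with shared weights and no symbol positional encoding, so the layer commutes with $\Pi_3^\rho$.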

3. Training Regime and Objective Function

Deep supervision is applied at each of the $T$ unrolled steps: for each iteration $t$, a cross-entropy loss

$$L_t = -\sum_{i=1}^{I} \log p^t_{i, y_i}$$

is computed, with $y_i$ the target symbol at position $i$. At each step, gradients are backpropagated through the current application of $\mathcal{G}$ only, detaching $Z^t$ to stabilize training. A random halting scheme, with halt probability $p_\text{stop}$ at each step except the last, replaces a Q-learning halting policy and reduces compute.
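A minimal PyTorch sketch of this objective (function and variable names such as `block` and `readout` are ours, and the input embedding is injected at every step for brevity):

```python
import torch
import torch.nn.functional as F

def deep_supervised_loss(block, readout, z0, e, y, T=16, p_stop=0.1):
    """block: one application of G; readout: the shared 1 x D projection.
    z0: (I, K, D) initial state, e: input embedding, y: (I,) target symbols."""
    z, total = z0, 0.0
    for t in range(T):
        z = block(z + e)                            # current step through G
        logits = readout(z).squeeze(-1)             # (I, K) logits
        total = total + F.cross_entropy(logits, y)  # L_t, accumulated over steps
        z = z.detach()                              # backprop through this step only
        if t + 1 < T and torch.rand(()) < p_stop:   # random halting
            break
    return total
```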

Optimization uses AdamW with weight decay and warmup followed by a constant or cosine learning-rate schedule. In typical Sudoku experiments, hyperparameters include learning rate $5 \times 10^{-4}$, weight decay $1$, batch size $\approx 272$, $T=16$ deep-supervision steps, feature dimension $D=256$, and a 2-million-parameter model.
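For illustration, an optimizer setup consistent with these settings might look as follows (a sketch: the warmup length and total step count are assumptions, not reported values):

```python
import math
import torch

def make_optimizer(model, lr=5e-4, weight_decay=1.0, warmup=1000, total=100_000):
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    def schedule(step):                                # multiplier on the base lr
        if step < warmup:
            return (step + 1) / warmup                 # linear warmup
        frac = (step - warmup) / max(1, total - warmup)
        return 0.5 * (1.0 + math.cos(math.pi * frac))  # cosine decay
    return opt, torch.optim.lr_scheduler.LambdaLR(opt, schedule)
```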

Crucially, SE-RRM’s $S_K$-equivariance eliminates the need for symbol-permutation data augmentation: only spatial augmentations are required (e.g., dihedral symmetries in ARC-AGI), reducing augmentation needs by two orders of magnitude relative to HRM/TRM.

4. Empirical Evaluation and Benchmark Results

SE-RRM performance is evaluated across structured reasoning tasks, with primary comparisons against HRM and TRM. All reported results are from (Freinschlag et al., 2 Mar 2026).

A. Sudoku

  • Training: 1,000 base 9×9 puzzles × 1,000 symbol-permutation augmentations.
  • Test: 422,786 9×9 puzzles; zero-shot generalization on 4×4, 16×16, and 25×25 puzzles.
| Model | 4×4 FSR (GPA) | 9×9 FSR (GPA) | 16×16 GPA | 25×25 GPA |
|---|---|---|---|---|
| HRM | 0% (29%) | 63.5% (86.1%) | -- | -- |
| TRM | 0% (46%) | 71.9% (89.8%) | -- | -- |
| SE-RRM | 95.5% (99.2%) | 93.7% (97.6%) | 51.9% | 31.5% |

SE-RRM dramatically outperforms prior RRMs, with near-perfect generalization to 4×4 puzzles and non-trivial zero-shot accuracy on larger unseen grids (51.9% GPA on 16×16 and 31.5% on 25×25), despite training solely on 9×9.

Test-time scaling with an increased number of recurrent steps $T$ improves solution rates: e.g., 93.7% FSR at $T=16$, rising to 98.8% at $T=128$.

B. ARC-AGI

  • Benchmarks: ARC-AGI-1 (400 puzzles), ARC-AGI-2 (120 puzzles).
  • Metric: pass@2.
| Model | ARC-AGI-1 pass@2 | ARC-AGI-2 pass@2 |
|---|---|---|
| HRM | 40.3% | 5.0% |
| TRM | 44.6% | 7.8% |
| SE-RRM | 45.3% | 7.1% |

SE-RRM matches or slightly surpasses prior results with only 8 dihedral augmentations per puzzle, compared to ~1,000 symbol-permutation augmentations required for HRM/TRM.

C. Maze

  • Dataset: 1,000 train/test 30×30 mazes (path length ≥110); four distinct symbols (not treated equivariantly).
  • Metric: Fully solved rate (FSR).
| Model | Maze FSR |
|---|---|
| HRM | 74.5% |
| TRM | 85.3% |
| SE-RRM | 88.8% |

SE-RRM achieves the highest FSR, even on tasks where symbol-equivariance is explicitly broken by distinct embeddings.

5. Training Workflow and Pseudocode

A typical SE-RRM training step proceeds as follows:

X ∈ C^I           # input symbols
Y ∈ C^I           # target symbols
T                 # max recurrence steps
H                 # input re-injection interval
p_stop            # probability of early halt

E_emb_vectors     # shared symbol embeddings
E_pos_enc         # positional embeddings (D×I×K)
E_type_emb(p)     # task-type embeddings (1×K)
W_proj ∈ R^{1×D}  # output projection

Compute E^C(X)       # symbol/special embeddings
E^G = E^C(X) + E_pos_enc + broadcast(E_type_emb(p))
Initialize Z^0       # learnable constant in R^{D×I×K}
for t = 0...T-1:
    B = Z^t + (E^G if (t mod H == 0) else 0)
    for ℓ in 0...L-1:
        B = Norm( B + T^{D,I}(B) )
        B = Norm( B + T^{D,K}(B) )
        B = Norm( B + m^D(B) )
    Z^{t+1} = B
    logits ℓ_{i,c} = W_proj · Z^{t+1}_{:,i,c}
    p_{i,c} = softmax_c(ℓ_{i,c})
    L_{t+1} = -∑_i log p_{i,Y_i}
    Z^{t+1} = detach(Z^{t+1})    # gradients stay local to this step
    if random() < p_stop and t+1 < T: break
Total loss = ∑_{s=1}^{t+1} L_s

Backpropagate loss (local gradients for each block)
Optimizer step (AdamW, etc.)

This approach, particularly the detachment at each block, ensures stable and efficient training.

6. Analysis and Broader Implications

SE-RRM’s explicit architectural enforcement of symbol-permutation equivariance yields significant benefits:

  • Elimination of the need for $K!$ symbol-permutation augmentations, greatly reducing training sample requirements.
  • Robust out-of-distribution generalization, with models trained on 9×9 Sudoku achieving competitive or superior results on 4×4, 16×16, and 25×25 puzzles.
  • Improved scalability, with competitive or improved performance over HRM and TRM, often with fewer parameters (2 million) and far fewer augmentations.
  • Sample efficiency, with only spatial augmentations required for most tasks.
  • At inference, solution accuracy improves monotonically with the number of recurrent steps, enabling a test-time computation-accuracy tradeoff.

A plausible implication is that permutation-equivariant architectures, such as SE-RRM, are crucial for robust symbolic reasoning in domains with inherent symmetries, and their explicit symmetry handling addresses core generalization limitations of previous methods (Freinschlag et al., 2 Mar 2026).

References (1)

  • Freinschlag et al., 2 Mar 2026.
