Symbol-Equivariant Recurrent Reasoning Models
- SE-RRMs are neural architectures that explicitly enforce symbol permutation equivariance to improve robust reasoning performance.
- They eliminate the need for extensive symbol-permutation data augmentation, reducing training cost and enhancing scalability and generalization.
- Empirical evaluations on Sudoku, ARC-AGI, and Maze tasks demonstrate SE-RRM's superior efficiency and accuracy over traditional recurrent reasoning models.
Symbol-Equivariant Recurrent Reasoning Models (SE-RRMs) are neural architectures specifically designed for structured reasoning tasks, such as Sudoku and ARC-AGI, where symmetries over symbols (digits, colors, etc.) provide crucial inductive bias. Unlike earlier Recurrent Reasoning Models (RRMs)—notably the Hierarchical Reasoning Model (HRM) and the Tiny Recursive Model (TRM)—which enforce permutation symmetry only implicitly via extensive data augmentation, SE-RRMs achieve permutation equivariance at the architectural level. This is realized through symbol-equivariant layers, which guarantee that relabeling the input symbols relabels the model's outputs in exactly the same way. As a result, SE-RRMs produce the same solution (up to relabeling) for every permutation of the symbol set and exhibit enhanced robustness, data efficiency, and generalization across task scales and symbol sets (Freinschlag et al., 2 Mar 2026).
1. Permutation Equivariance: Formalism and Implementation
Permutation equivariance is defined as the commutativity of a function $f: \mathcal{X} \to \mathcal{Y}$ with the action of a permutation group $G$ on $\mathcal{X}$ and $\mathcal{Y}$, i.e., $f(g \cdot x) = g \cdot f(x)$ for every $g \in G$. In the context of SE-RRMs, two distinct symmetry groups are considered: $S_I$ (permutations over positions/cells) and $S_K$ (permutations over symbols or colors). Input data are encoded as three-way tensors $Z \in \mathbb{R}^{D \times I \times K}$, with symbol permutations $\sigma \in S_K$ acting as $(\sigma \cdot Z)_{d,i,k} = Z_{d,i,\sigma^{-1}(k)}$ and position permutations $\pi \in S_I$ acting as $(\pi \cdot Z)_{d,i,k} = Z_{d,\pi^{-1}(i),k}$.
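Concretely, both actions are just index permutations along different axes of the state tensor. The following minimal PyTorch sketch (the shapes and indexing convention are illustrative) also verifies that the two actions commute:

```python
import torch

D, I, K = 4, 6, 3            # feature dim, positions, symbols
Z = torch.randn(D, I, K)     # hidden-state tensor

sigma = torch.randperm(K)    # a symbol permutation in S_K
pi = torch.randperm(I)       # a position permutation in S_I

Z_sym = Z[:, :, sigma]       # symbol action: relabel the symbol axis
Z_pos = Z[:, pi, :]          # position action: shuffle the cell axis

# The two actions operate on different axes and therefore commute.
assert torch.equal(Z_sym[:, pi, :], Z_pos[:, :, sigma])
```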
SE-RRMs guarantee symbol equivariance by designing every model layer—attention, MLP, normalization, residual connections—to commute with the action of $S_K$. Specifically, for any SE-RRM block $B: \mathbb{R}^{D \times I \times K} \to \mathbb{R}^{D \times I \times K}$ and for any $\sigma \in S_K$, the relation $B(\sigma \cdot Z) = \sigma \cdot B(Z)$ holds. The output is an $I \times K$ tensor of logits, where permuting the symbol dimension corresponds exactly to relabeling.
Architecturally, equivariant linear maps $W$ satisfy $W(\sigma \cdot Z) = \sigma \cdot W(Z)$ for all $\sigma \in S_K$. These are constructed in practice via weight sharing and explicit attention operations over the symbol axis.
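To make the weight-sharing idea concrete, the sketch below shows a hypothetical DeepSets-style linear layer that is exactly $S_K$-equivariant. The paper's layers obtain the same property via shared weights and symbol-axis attention, so this particular parameterization is illustrative rather than the actual architecture:

```python
import torch
import torch.nn as nn

class SymbolEquivariantLinear(nn.Module):
    """A DeepSets-style linear map over the symbol axis: shared per-symbol
    weights plus a symbol-sum term, so relabeling symbols commutes with it."""
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.w_self = nn.Linear(d_in, d_out, bias=False)  # shared across symbols
        self.w_pool = nn.Linear(d_in, d_out, bias=False)  # acts on the symbol-sum

    def forward(self, z):                   # z: (..., K, d_in)
        pooled = z.sum(dim=-2, keepdim=True)              # S_K-invariant pooling
        return self.w_self(z) + self.w_pool(pooled)       # (..., K, d_out)

layer = SymbolEquivariantLinear(8, 8)
z = torch.randn(5, 4, 8)                    # (positions, K=4 symbols, features)
sigma = torch.randperm(4)
# Relabeling the symbols before or after the map gives the same result.
assert torch.allclose(layer(z)[:, sigma, :], layer(z[:, sigma, :]), atol=1e-6)
```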
2. Model Architecture and Computational Details
Let $I$ denote the number of positions (cells), $K$ the number of symbols, and $D$ the feature dimension. Inputs $X \in C^I$ (with $|C| = K$) are embedded into $\mathbb{R}^{D \times I \times K}$. The recurrent hidden state at time $t$ is $Z^t \in \mathbb{R}^{D \times I \times K}$.
The model operates as a fixed-point iteration for $t = 0, \dots, T-1$, with each block composed of $L$ layers. The layer operations per block are:
- Positional self-attention along positions, treating each (feature, symbol) slice as a sequence.
- Symbol self-attention along symbols, shared across positions.
- A pointwise MLP (SwiGLU) per (i, c).
- RMS normalization applied over the feature dimension.
Per layer, the operations are $B' = \mathrm{Norm}(B + T^{D,I}(B))$, $B'' = \mathrm{Norm}(B' + T^{D,K}(B'))$, and $B \leftarrow \mathrm{Norm}(B'' + m^D(B''))$, with the block output $Z^{t+1} = B$. The output projection is a linear map $W_{\text{proj}} \in \mathbb{R}^{1 \times D}$, shared across all $(i, c)$, yielding $I \times K$ logits, followed by a row-wise softmax over symbols for class probabilities.
The architectural design, in which every layer (attention, MLP, normalization) commutes with permutations of the symbol axis, is central to enforcing exact equivariance.
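A minimal PyTorch sketch of one such layer follows. The shapes, head count, and hidden width are illustrative assumptions rather than the paper's settings (`nn.RMSNorm` requires PyTorch ≥ 2.4):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Pointwise SwiGLU MLP applied independently per (position, symbol)."""
    def __init__(self, d: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(d, hidden, bias=False)
        self.up = nn.Linear(d, hidden, bias=False)
        self.down = nn.Linear(hidden, d, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

class SERRMLayer(nn.Module):
    """One SE-RRM layer: position attention, symbol attention, pointwise MLP,
    each with residual + RMSNorm. Shapes assume (batch, I, K, D)."""
    def __init__(self, d: int, n_heads: int = 4):
        super().__init__()
        self.pos_attn = nn.MultiheadAttention(d, n_heads, batch_first=True)
        self.sym_attn = nn.MultiheadAttention(d, n_heads, batch_first=True)
        self.mlp = SwiGLU(d, 4 * d)
        self.n1, self.n2, self.n3 = nn.RMSNorm(d), nn.RMSNorm(d), nn.RMSNorm(d)

    def forward(self, z):                                  # z: (B, I, K, D)
        B, I, K, D = z.shape
        # Position self-attention: K independent length-I sequences.
        x = z.permute(0, 2, 1, 3).reshape(B * K, I, D)
        x = x + self.pos_attn(x, x, x, need_weights=False)[0]
        z = self.n1(x.reshape(B, K, I, D).permute(0, 2, 1, 3))
        # Symbol self-attention: I independent length-K sequences; attention
        # without positional encodings is S_K-equivariant.
        y = z.reshape(B * I, K, D)
        y = y + self.sym_attn(y, y, y, need_weights=False)[0]
        z = self.n2(y.reshape(B, I, K, D))
        # Pointwise MLP per (position, symbol) pair.
        return self.n3(z + self.mlp(z))

# Sanity check: relabeling the symbols commutes with the layer.
layer = SERRMLayer(d=16).eval()
z = torch.randn(2, 9, 4, 16)               # I=9 cells, K=4 symbols, D=16
sigma = torch.randperm(4)
assert torch.allclose(layer(z)[:, :, sigma, :],
                      layer(z[:, :, sigma, :]), atol=1e-5)
```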
3. Training Regime and Objective Function
Deep supervision is applied at each of the $T$ unrolled steps; for each iteration $t$, a cross-entropy loss

$$\mathcal{L}_t = -\sum_{i} \log p^{(t)}_{i,\,Y_i}$$

is computed, with $Y_i$ the target symbol at position $i$. At each step, gradients are backpropagated through the current block only, detaching $Z^t$ to stabilize training. A random halting scheme, with halt probability $p_{\text{stop}}$ at each step except the last, replaces a Q-learning halting policy and reduces compute.
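The detach-and-halt logic is compact in code. The sketch below assumes a hypothetical `model` interface mapping `(state, inputs)` to `(new_state, logits)`; it illustrates the training scheme described above rather than the reference implementation:

```python
import torch

def deep_supervision_step(model, z0, inputs, targets, T, p_stop, loss_fn):
    """One training step with deep supervision: each step's loss gets
    gradients only through that step's block; the carried state is detached.
    Random halting stands in for a learned Q-halting policy."""
    z, total_loss = z0, 0.0
    for t in range(T):
        z, logits = model(z, inputs)
        total_loss = total_loss + loss_fn(logits, targets)  # per-step CE loss
        z = z.detach()                    # block gradients across iterations
        if t + 1 < T and torch.rand(()) < p_stop:
            break                         # random early halt saves compute
    return total_loss
```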
Optimization uses AdamW with weight decay and a warmup phase followed by a constant or cosine learning-rate schedule. In typical Sudoku experiments, hyperparameters include the learning rate, weight decay of 1.0, batch size, number of deep-supervision steps $T$, feature dimension $D$, and a roughly 2-million-parameter model.
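A generic optimizer setup consistent with this description is sketched below; the learning rate, warmup length, and training horizon are placeholder values (only the weight decay of 1.0 comes from the text):

```python
import math
import torch
from torch.optim.lr_scheduler import LambdaLR

params = [torch.nn.Parameter(torch.randn(10))]   # stand-in for model parameters
opt = torch.optim.AdamW(params, lr=1e-3, weight_decay=1.0)

warmup, total = 2_000, 100_000                   # placeholder step counts
def lr_lambda(step):
    if step < warmup:
        return step / warmup                     # linear warmup
    progress = (step - warmup) / (total - warmup)
    return 0.5 * (1.0 + math.cos(math.pi * progress))   # cosine decay

sched = LambdaLR(opt, lr_lambda)                 # call sched.step() after opt.step()
```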
Crucially, SE-RRM’s $S_K$-equivariance eliminates the need for symbol-permutation data augmentation: only spatial augmentations are required (e.g., dihedral symmetries in ARC-AGI), reducing augmentation needs by two orders of magnitude relative to HRM/TRM.
4. Empirical Evaluation and Benchmark Results
SE-RRM performance is evaluated across structured reasoning tasks, with primary comparisons against HRM and TRM. All results reported are as in (Freinschlag et al., 2 Mar 2026).
A. Sudoku
- Training: 1,000 base 9×9 puzzles × 1,000 symbol-permutation augmentations.
- Test: 422,786 9×9 puzzles; zero-shot generalization on 4×4, 16×16, and 25×25 puzzles.
| Model | 4×4 FSR (GPA) | 9×9 FSR (GPA) | 16×16 GPA | 25×25 GPA |
|---|---|---|---|---|
| HRM | 0% (29%) | 63.5% (86.1%) | -- | -- |
| TRM | 0% (46%) | 71.9% (89.8%) | -- | -- |
| SE-RRM | 95.5% (99.2%) | 93.7% (97.6%) | 51.9% | 31.5% |
SE-RRM dramatically outperforms prior RRMs, including near-perfect generalization to 4×4 and nontrivial zero-shot accuracy on larger unseen 16×16 (51.9%) and 25×25 (31.5%) grids, despite training solely on 9×9.
Test-time scaling with additional recurrent steps improves solution rates: the 9×9 FSR of 93.7% rises to 98.8% when the number of steps $T$ is increased at inference.
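No retraining is needed for this tradeoff: inference simply unrolls more recurrent steps. A sketch, reusing the hypothetical `model` interface from Section 3:

```python
import torch

@torch.no_grad()
def solve(model, z0, inputs, T_infer):
    """Unroll more steps than used in training to trade compute for accuracy."""
    z, logits = z0, None
    for _ in range(T_infer):              # e.g. T_infer > training-time T
        z, logits = model(z, inputs)
    return logits.argmax(dim=-1)          # predicted symbol per cell
```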
B. ARC-AGI
- Benchmarks: ARC-AGI-1 (400 puzzles), ARC-AGI-2 (120 puzzles).
- Metric: pass@2.
| Model | ARC-AGI-1 pass@2 | ARC-AGI-2 pass@2 |
|---|---|---|
| HRM | 40.3% | 5.0% |
| TRM | 44.6% | 7.8% |
| SE-RRM | 45.3% | 7.1% |
SE-RRM slightly surpasses prior results on ARC-AGI-1 and remains competitive on ARC-AGI-2, using only 8 dihedral augmentations per puzzle compared to the ~1,000 symbol-permutation augmentations required for HRM/TRM.
C. Maze
- Dataset: 1,000 train/test 30×30 mazes (path length ≥110); four distinct symbols (not treated equivariantly).
- Metric: Fully solved rate (FSR).
| Model | Maze FSR |
|---|---|
| HRM | 74.5% |
| TRM | 85.3% |
| SE-RRM | 88.8% |
SE-RRM achieves the highest FSR, even on tasks where symbol-equivariance is explicitly broken by distinct embeddings.
5. Training Workflow and Pseudocode
A typical SE-RRM training step proceeds as follows:
```
X ∈ C^I              # input symbols
Y ∈ C^I              # target symbols
T                    # max recurrence steps
p_stop               # probability of early halt
E^C                  # shared symbol embeddings
E^P                  # positional embeddings (D×I×K)
E^type(p)            # task-type embeddings (1×K)
W_proj ∈ R^{1×D}     # output projection

Compute E^C(X)       # symbol/special embeddings
E^G = E^C(X) + E^P + broadcast(E^type(p))
Initialize Z^0       # learnable constant in R^{D×I×K}

for t = 0...T-1:
    B = Z^t + (E^G if (t mod H == 0) else 0)
    for ℓ in 0...L-1:
        B′ = Norm( B  + T^{D,I}(B)  )    # position self-attention
        B″ = Norm( B′ + T^{D,K}(B′) )    # symbol self-attention
        B  = Norm( B″ + m^D(B″)     )    # pointwise SwiGLU MLP
    Z^{t+1} = B
    logits ℓ_{i,c} = W_proj ⋅ Z^{t+1}_{:,i,c}
    p_{i,c} = softmax_c(ℓ_{i,c})
    L_{t+1} = −∑_i log p_{i, Y_i}
    if random() < p_stop and t+1 < T:
        break

Total loss = ∑_{s=1}^{t+1} L_s
Backpropagate loss (local gradients for each block; Z detached between steps)
Optimizer step (AdamW, etc.)
```
This approach, particularly the detachment at each block, ensures stable and efficient training.
6. Analysis and Broader Implications
SE-RRM’s explicit architectural enforcement of symbol-permutation equivariance yields significant benefits:
- Elimination of the need for symbol-permutation augmentations, greatly reducing training sample requirements.
- Robust out-of-distribution generalization, with models trained on 9×9 Sudoku achieving competitive or superior results on 4×4, 16×16, and 25×25 puzzles.
- Improved scalability, with competitive or improved performance over HRM and TRM, often with fewer parameters (2 million) and far fewer augmentations.
- Sample efficiency, with only spatial augmentations required for most tasks.
- At inference, solution accuracy improves monotonically with the number of recurrent steps, enabling a test-time computation-accuracy tradeoff.
A plausible implication is that permutation-equivariant architectures, such as SE-RRM, are crucial for robust symbolic reasoning in domains with inherent symmetries, and their explicit symmetry handling addresses core generalization limitations of previous methods (Freinschlag et al., 2 Mar 2026).