Predicate-attention mechanism for length generalization in LEGO tasks
Investigate whether transformer length generalization on LEGO state-tracking tasks hinges on the robustness of attention directed to predicate clauses: specifically, whether models that maintain strong attention mass on the target predicate clause as sequence positions and lengths grow consequently achieve superior length generalization.
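One concrete way to probe this is to measure, per layer and head, how much of the query token's attention mass lands on the tokens of the target predicate clause, and track that quantity as sequence length increases. The sketch below is illustrative only and assumes attention weights are already extracted as an array; the names `predicate_attention_mass`, `predicate_mask`, and `query_pos` are hypothetical and not from the paper.

```python
import numpy as np

def predicate_attention_mass(attn, predicate_mask, query_pos):
    """Fraction of attention mass that `query_pos` places on the
    target predicate clause, per layer and head.

    attn:           float array [n_layers, n_heads, seq_len, seq_len];
                    each row sums to 1 over the key dimension.
    predicate_mask: bool array [seq_len], True on predicate-clause tokens.
    query_pos:      int, position of the queried token.
    """
    # Attention from the query position to every key position.
    rows = attn[:, :, query_pos, :]            # [n_layers, n_heads, seq_len]
    # Total mass landing on predicate-clause tokens.
    return rows[:, :, predicate_mask].sum(-1)  # [n_layers, n_heads]

if __name__ == "__main__":
    # Toy example: 2 layers, 4 heads, a 16-token sequence.
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(2, 4, 16, 16))
    attn = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)  # row-wise softmax
    mask = np.zeros(16, dtype=bool)
    mask[5:9] = True   # pretend tokens 5-8 form the target predicate clause
    mass = predicate_attention_mass(attn, mask, query_pos=15)
    print(mass.shape, mass.mean())  # (2, 4), average predicate-attention mass
```

Plotting this statistic against sequence length (and against the position of the predicate clause) would indicate whether the attention pattern stays robust beyond the training-length regime, which is the property the conjecture ties to length generalization.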
References
We conjecture that length generalization in the LEGO task hinges on a robust predicate–attention pattern.
— Transformers Provably Learn Chain-of-Thought Reasoning with Length Generalization
(arXiv:2511.07378, Huang et al., 10 Nov 2025), Section: Experiments