Predicate-attention mechanism for length generalization in LEGO tasks
Investigate whether transformer length generalization on LEGO state-tracking tasks hinges on the robustness of attention directed to predicate clauses: specifically, whether models that maintain strong attention mass on the target predicate clause as sequence positions and lengths grow consequently achieve superior length generalization.
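One concrete way to probe this is to measure, per layer and head, how much of the query token's attention mass lands on the tokens of the target predicate clause, and track that quantity as sequence length increases. The sketch below is illustrative only and assumes attention weights are already extracted as an array; the names `predicate_attention_mass`, `predicate_mask`, and `query_pos` are hypothetical and not from the paper.

```python
import numpy as np

def predicate_attention_mass(attn, predicate_mask, query_pos):
    """Fraction of attention mass that `query_pos` places on the
    target predicate clause, per layer and head.

    attn:           float array [n_layers, n_heads, seq_len, seq_len];
                    each row sums to 1 over the key dimension.
    predicate_mask: bool array [seq_len], True on predicate-clause tokens.
    query_pos:      int, position of the queried token.
    """
    # Attention from the query position to every key position.
    rows = attn[:, :, query_pos, :]            # [n_layers, n_heads, seq_len]
    # Total mass landing on predicate-clause tokens.
    return rows[:, :, predicate_mask].sum(-1)  # [n_layers, n_heads]

if __name__ == "__main__":
    # Toy example: 2 layers, 4 heads, a 16-token sequence.
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(2, 4, 16, 16))
    attn = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)  # row-wise softmax
    mask = np.zeros(16, dtype=bool)
    mask[5:9] = True   # pretend tokens 5-8 form the target predicate clause
    mass = predicate_attention_mass(attn, mask, query_pos=15)
    print(mass.shape, mass.mean())  # (2, 4), average predicate-attention mass
```

Plotting this statistic against sequence length (and against the position of the predicate clause) would indicate whether the attention pattern stays robust beyond the training-length regime, which is the property the conjecture ties to length generalization.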
References
We conjecture that length generalization in the LEGO task hinges on a robust predicate–attention pattern.
— Transformers Provably Learn Chain-of-Thought Reasoning with Length Generalization
(arXiv:2511.07378, Huang et al., 10 Nov 2025), Section: Experiments