Extrapolating recurrence depth at test time
Develop training and architectural methods for depth-recurrent transformer language models that enable reliable extrapolation to greater recurrence depths at test time, allowing the models to solve problems that are harder than those encountered during training while maintaining stability and performance.
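The core mechanic can be illustrated with a toy sketch: a weight-tied block whose parameters are reused at every recurrence step, so the unroll depth is a free knob at inference. This is not the paper's architecture; the function names, the scalar state, and the contraction-style update are illustrative assumptions chosen so that deeper test-time unrolling converges rather than diverges.

```python
import math

def recurrent_block(h, x, w=0.5, u=0.5):
    # One weight-tied recurrence step: the same (hypothetical) parameters
    # w, u are reused at every depth, so depth is decoupled from parameter count.
    return math.tanh(w * h + u * x)

def run(x, depth):
    # Unroll the shared block `depth` times; depth can differ between
    # training and test time because no per-depth weights exist.
    h = 0.0
    for _ in range(depth):
        h = recurrent_block(h, x)
    return h

train_depth, test_depth = 8, 64
out_train = run(1.0, train_depth)
out_test = run(1.0, test_depth)

# Because this toy update contracts toward a fixed point, extra test-time
# recurrence refines the output instead of destabilizing it.
print(abs(out_test - out_train) < 1e-3)
```

The sketch captures why extrapolation is plausible (shared weights make depth a test-time choice) and why it is hard in practice: nothing in a generic trained update guarantees this fixed-point-like stability at depths never seen during training, which is precisely the gap the problem statement targets.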
References
One unsolved problem is how to most effectively build depth-recurrent models that can recur deeper at test time to solve harder problems than were seen during training.
— Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
(2511.07384 - McLeish et al., 10 Nov 2025) in Discussion, Section 5