Impact of recurrent depth on transformer performance
Determine how recurrent depth affects the performance of transformer-based language models by rigorously characterizing the relationship between the number of recurrent iterations and downstream performance across tasks and conditions.
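To make "recurrent depth" concrete, below is a minimal PyTorch sketch of a weight-tied (looped) transformer in which one shared block is applied repeatedly; the loop count is the recurrent depth being varied. This is an illustrative assumption, not the architecture or code from the cited paper, and names such as `RecurrentTransformer` and `num_recurrences` are hypothetical.

```python
import torch
import torch.nn as nn

class RecurrentTransformer(nn.Module):
    """Sketch of a depth-recurrent LM: one weight-tied block looped r times."""

    def __init__(self, d_model=256, n_heads=4, vocab_size=32000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # A single shared block, reused at every recurrent iteration,
        # so effective depth changes without changing parameter count.
        self.block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids, num_recurrences=4):
        h = self.embed(input_ids)
        for _ in range(num_recurrences):  # recurrent depth = number of loop iterations
            h = self.block(h)
        return self.lm_head(h)

# Characterizing performance vs. recurrent depth amounts to evaluating
# the same fixed set of weights at several loop counts.
model = RecurrentTransformer()
tokens = torch.randint(0, 32000, (1, 16))
for r in (1, 2, 4, 8):
    logits = model(tokens, num_recurrences=r)
    print(f"recurrent depth {r}: logits shape {tuple(logits.shape)}")
```

In a real study, the loop over `r` would report a task metric (e.g., perplexity or accuracy) per depth rather than tensor shapes.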
References
While many works study the impact of depth in transformers both theoretically and practically \citep{levine2020depth,merrill2022saturated,mcleish2025gemstones,zuo2025falcon,merrill2025little,csordas2025language}, it remains an open question how recurrent depth impacts transformer performance.
— Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
(arXiv:2511.07384, McLeish et al., 10 Nov 2025), in Appendix A, Extended Related Works