Resource-rational compute allocation for LRMs
Develop a principled cost–performance framework that adapts compute allocation and halting policies for large reasoning models (LRMs) to instance difficulty and epistemic uncertainty, thereby addressing the open question of efficient reasoning control.
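To make the idea of a difficulty-aware halting policy concrete, here is a minimal sketch of one possible stopping rule. It is not a method from the cited survey. It assumes answers are sampled one at a time, uses vote agreement among samples as a crude proxy for epistemic uncertainty, and stops once a heuristic estimate of the benefit of one more sample drops below its cost. The names `adaptive_sample`, `draw_answer`, and `cost_per_sample`, as well as the marginal-gain formula, are hypothetical choices made for illustration only.

```python
from collections import Counter
from typing import Callable, Hashable, Tuple


def adaptive_sample(
    draw_answer: Callable[[int], Hashable],  # hypothetical: returns the i-th sampled answer
    cost_per_sample: float = 0.02,           # assumed price of one sample, in accuracy units
    min_samples: int = 2,
    max_samples: int = 16,
) -> Tuple[Hashable, int]:
    """Sample answers until one more sample no longer looks worth its cost."""
    votes: Counter = Counter()
    for n in range(1, max_samples + 1):
        votes[draw_answer(n - 1)] += 1
        top_answer, top_count = votes.most_common(1)[0]
        # Laplace-smoothed share of samples agreeing with the majority answer.
        agreement = (top_count + 1) / (n + 2)
        # Heuristic assumption: remaining disagreement, discounted by the
        # number of samples, stands in for the expected gain of one more draw.
        marginal_gain = (1 - agreement) / (n + 1)
        if n >= min_samples and marginal_gain < cost_per_sample:
            break  # halting: expected benefit is below the cost of another sample
    return top_answer, n


# An "easy" instance where every sample agrees halts early.
easy = adaptive_sample(lambda i: "42")
# A "hard" instance where samples keep disagreeing uses the full budget.
hard = adaptive_sample(lambda i: "abc"[i % 3])
print(easy, hard)
```

Under these assumptions the unanimous instance stops after a handful of samples, while the contested one spends the whole budget. A principled framework would replace the heuristic gain estimate with a calibrated one and would choose `cost_per_sample` from an explicit cost model.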
References
However, generalizing these approaches into a principled cost-performance trade-off remains an open question.
— A Survey of Reinforcement Learning for Large Reasoning Models
(2509.08827 - Zhang et al., 10 Sep 2025) in Section 7.4 Teaching LRMs Efficient Reasoning