
Resource-rational compute allocation for LRMs

Develop a principled cost–performance framework that adapts compute allocation and halting policies for large reasoning models (LRMs) to instance difficulty and epistemic uncertainty, thereby addressing the open question of efficient reasoning control.


Background

Inference-time scaling improves accuracy but introduces over-thinking on easy instances and under-thinking under aggressive truncation. The authors argue for adaptive compute allocation policies but note the lack of a principled trade-off framework. Solving this would let LRMs reason only for as long as the marginal utility of further computation warrants.
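One way to make the marginal-utility idea concrete is a halting rule that keeps reasoning only while the recent reduction in model uncertainty outweighs the per-step compute cost. The sketch below is illustrative only: the entropy proxy for uncertainty, the moving-window gain estimate, and all parameter names (`cost_per_step`, `utility_scale`, `window`) are assumptions of this example, not a method proposed in the survey.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution,
    used here as a simple proxy for epistemic uncertainty."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_continue(step_entropies, cost_per_step,
                    utility_scale=1.0, window=3):
    """Decide whether another reasoning step is worth its cost.

    step_entropies: per-step uncertainty estimates observed so far.
    The expected gain from one more step is proxied by the average
    uncertainty reduction over the last `window` steps; halting
    occurs when that (scaled) gain falls below `cost_per_step`.
    """
    if len(step_entropies) < window + 1:
        return True  # too little evidence yet; keep reasoning
    recent = step_entropies[-(window + 1):]
    # Average uncertainty drop per step over the recent window.
    marginal_gain = (recent[0] - recent[-1]) / window
    return utility_scale * marginal_gain > cost_per_step

# Falling uncertainty: continuing still pays off.
print(should_continue([2.0, 1.5, 1.0, 0.6], cost_per_step=0.05))
# Flat uncertainty: further steps no longer pay for themselves.
print(should_continue([0.5, 0.5, 0.5, 0.5], cost_per_step=0.05))
```

A full framework would additionally calibrate the cost term to deployment constraints and estimate difficulty up front, but even this greedy rule captures the "stop when marginal utility drops below marginal cost" criterion the authors call for.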

References

However, generalizing these approaches into a principled cost-performance trade-off remains an open question.

A Survey of Reinforcement Learning for Large Reasoning Models (arXiv:2509.08827, Zhang et al., 10 Sep 2025), Section 7.4, "Teaching LRMs Efficient Reasoning".