Confirming Pareto-optimality of reasoning models at larger parameter scales
Establish whether reasoning-supervised language models achieve Pareto-optimal accuracy–inference-FLOPs trade-offs at parameter scales above 14B. This requires experimentally evaluating models larger than 14B parameters to confirm or refute the observed trend that reasoning models approach the Pareto frontier as model size increases.
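To make the evaluation criterion concrete, the sketch below shows how Pareto-optimality could be checked over a set of (model, inference FLOPs, accuracy) points: a model is on the frontier if no other model achieves equal-or-better accuracy at equal-or-lower FLOPs. All model names and numbers are hypothetical placeholders, not results from the paper.

```python
def pareto_frontier(models):
    """Return names of models not dominated by any other model.

    A model is dominated if some other model has lower-or-equal FLOPs
    and higher-or-equal accuracy, with at least one strict inequality.
    """
    frontier = []
    for name, flops, acc in models:
        dominated = any(
            f2 <= flops and a2 >= acc and (f2 < flops or a2 > acc)
            for n2, f2, a2 in models
            if n2 != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Hypothetical (name, inference FLOPs per query, benchmark accuracy) points.
models = [
    ("base-7B",        1.4e10, 0.52),
    ("reasoning-1.5B", 3.0e10, 0.50),  # dominated by base-14B below
    ("reasoning-7B",   4.2e10, 0.61),
    ("base-14B",       2.8e10, 0.58),
    ("reasoning-14B",  8.4e10, 0.70),
]
print(pareto_frontier(models))
# → ['base-7B', 'reasoning-7B', 'base-14B', 'reasoning-14B']
```

The proposed experiment amounts to adding points for models above 14B and checking whether the reasoning-supervised variants remain on (or move onto) this frontier.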
References
Confirming this hypothesis would require experiments with models larger than 14B parameters, which we leave for future work for practical reasons.
— "When Does Reasoning Matter? A Controlled Study of Reasoning's Contribution to Model Performance" (Boizard et al., 26 Sep 2025, arXiv:2509.22193), Section 4.2, Inference Efficiency