Confirming Pareto-optimality of reasoning models at larger parameter scales
Establish whether reasoning-supervised language models achieve Pareto-optimal accuracy–inference-FLOPs trade-offs at parameter scales above 14B. This requires experimentally evaluating models larger than 14B parameters to confirm or refute the observed trend that reasoning models approach the Pareto frontier as model size increases.
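To make the evaluation criterion concrete, the sketch below shows how Pareto-optimality could be checked over a set of (model, inference FLOPs, accuracy) points: a model is on the frontier if no other model achieves equal-or-better accuracy at equal-or-lower FLOPs. All model names and numbers are hypothetical placeholders, not results from the paper.

```python
def pareto_frontier(models):
    """Return names of models not dominated by any other model.

    A model is dominated if some other model has lower-or-equal FLOPs
    and higher-or-equal accuracy, with at least one strict inequality.
    """
    frontier = []
    for name, flops, acc in models:
        dominated = any(
            f2 <= flops and a2 >= acc and (f2 < flops or a2 > acc)
            for n2, f2, a2 in models
            if n2 != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Hypothetical (name, inference FLOPs per query, benchmark accuracy) points.
models = [
    ("base-7B",        1.4e10, 0.52),
    ("reasoning-1.5B", 3.0e10, 0.50),  # dominated by base-14B below
    ("reasoning-7B",   4.2e10, 0.61),
    ("base-14B",       2.8e10, 0.58),
    ("reasoning-14B",  8.4e10, 0.70),
]
print(pareto_frontier(models))
# → ['base-7B', 'reasoning-7B', 'base-14B', 'reasoning-14B']
```

The proposed experiment amounts to adding points for models above 14B and checking whether the reasoning-supervised variants remain on (or move onto) this frontier.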
References
Confirming this hypothesis would require experiments with models larger than 14B parameters, which we leave for future work for practical reasons.
— "When Does Reasoning Matter? A Controlled Study of Reasoning's Contribution to Model Performance" (Boizard et al., 26 Sep 2025, arXiv:2509.22193), Section 4.2, Inference Efficiency