Comparison of hand-engineered inference strategies versus learned model-generated strategies
Determine how domain-specific, hand-engineered inference strategies for competitive programming compare to learned test-time reasoning strategies that large reasoning models trained end-to-end via reinforcement learning (such as OpenAI o1 and o3) generate and execute autonomously. The hand-engineered side is exemplified by the o1-ioi test-time pipeline, which partitions IOI problems into subtasks, samples large candidate sets, clusters outputs on model-generated test inputs, and reranks submissions.
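To make the cluster-and-rerank steps of the hand-engineered pipeline concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the o1-ioi implementation: candidates are modeled as plain Python callables, test inputs are integers, and cluster size stands in for the reranking score (a simple majority-vote heuristic swapped in for whatever scoring the real pipeline uses). The names `cluster_by_behavior` and `select_submissions` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# Hypothetical stand-in for a sampled candidate solution.
Program = Callable[[int], Hashable]


def cluster_by_behavior(
    candidates: Sequence[Program], test_inputs: Sequence[int]
) -> List[List[int]]:
    """Group candidate indices whose outputs agree on every test input."""
    clusters: Dict[Tuple, List[int]] = defaultdict(list)
    for idx, prog in enumerate(candidates):
        signature = []
        for x in test_inputs:
            try:
                signature.append(("ok", prog(x)))
            except Exception:
                # A crash is treated as its own kind of observable behavior.
                signature.append(("error", None))
        clusters[tuple(signature)].append(idx)
    return list(clusters.values())


def select_submissions(
    candidates: Sequence[Program], test_inputs: Sequence[int], k: int
) -> List[int]:
    """Pick one representative from each of the k largest clusters.

    Assumption: cluster size is used as the score; ties keep first-seen order.
    """
    clusters = cluster_by_behavior(candidates, test_inputs)
    ranked = sorted(clusters, key=len, reverse=True)
    return [cluster[0] for cluster in ranked[:k]]


# Toy candidates: three agree with each other, one differs, one crashes on 0.
candidates = [
    lambda x: x * x,
    lambda x: x ** 2,
    lambda x: x + x,
    lambda x: abs(x) * abs(x),
    lambda x: 1 // x,
]
test_inputs = [0, 1, 2, 3]
chosen = select_submissions(candidates, test_inputs, k=2)
```

In this toy run, the three squaring candidates form the largest cluster, so one of them is selected first. A learned approach, by contrast, would have no such external selection stage: the model itself decides how to test and revise its output during reasoning.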
References
An open question is how domain-specific, hand-engineered inference strategies compare to learned approaches that models generate and execute on their own.
Competitive Programming with Large Reasoning Models (arXiv:2502.06807, OpenAI et al., 3 Feb 2025), Section 1 (Introduction)