Autocurriculum under imperfect verification
Develop a theoretical framework for verifier-guided autocurricula when the outcome verifier is imperfect, and establish guarantees for settings where a noisy or learned reward model replaces the perfect outcome verifier assumed in the paper's current analysis.
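One concrete way to pose the problem is to replace the exact verifier with an ε-noisy one and ask which curriculum-selection guarantees survive. The Python sketch below is illustrative only and is not from the paper: it assumes a verifier that flips the true verdict with probability eps, and a threshold rule that promotes the learner once the estimated pass rate at the current difficulty is high enough. All names (noisy_verify, true_success, run_curriculum) and constants are hypothetical.

```python
import random

EPS = 0.1          # assumed verifier flip probability (noise model, not from the paper)
THRESHOLD = 0.5    # promote when the estimated pass rate exceeds this
BATCH = 200        # verified attempts per difficulty level before deciding

def true_success(skill: float, difficulty: int) -> bool:
    """Stand-in for the learner attempting a task: success prob decays with difficulty."""
    return random.random() < skill / (skill + difficulty)

def noisy_verify(outcome: bool, eps: float = EPS) -> bool:
    """Epsilon-noisy outcome verifier: reports the wrong verdict with probability eps."""
    return outcome if random.random() > eps else not outcome

def run_curriculum(levels: int = 5) -> None:
    skill = 1.0
    for difficulty in range(1, levels + 1):
        while True:
            # Estimate the pass rate at this difficulty through the noisy verifier.
            passes = sum(noisy_verify(true_success(skill, difficulty))
                         for _ in range(BATCH))
            est_rate = passes / BATCH
            if est_rate >= THRESHOLD:
                break          # curriculum promotes to the next difficulty
            skill += 0.2       # crude proxy for a training update on this level
        print(f"difficulty {difficulty}: promoted at estimated pass rate {est_rate:.2f}")

if __name__ == "__main__":
    random.seed(0)
    run_curriculum()
```

Under this flip model the estimated pass rate concentrates around (1 - 2·eps)·p + eps rather than the true rate p, so any threshold rule is implicitly miscalibrated unless it is debiased; quantifying how this bias propagates through promotion decisions is one concrete instance of the guarantees the open problem asks for.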
References
Our framework assumes access to a perfect outcome verifier, which is natural for domains with verifiable rewards (math, code), but extending the theory to noisy or learned reward models is an important open problem.
— Learning to Reason with Curriculum I: Provable Benefits of Autocurriculum (Rajaraman et al., arXiv:2603.18325, 18 Mar 2026), Discussion — Limitations and open directions: Imperfect verification