Self-play in code generation

Determine whether and how self-play can be effectively realized for large language model (LLM)-based code generation, so that models can learn reliably without human supervision, given that unit-test-based verification is brittle and susceptible to reward hacking and error propagation.

Background

The paper examines proposer–solver self-play for LLMs and highlights verification as the central obstacle: imperfect or weak verifiers can be exploited by the solver, corrupting training and curricula. Early self-play successes have occurred in domains with strong or trivial verifiers, leaving broader applicability uncertain.
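To make the failure mode concrete, below is a minimal sketch of a generic proposer-solver self-play loop with a pluggable verifier. The names (propose_task, solve_task, verify, Episode) are hypothetical placeholders for illustration, not the paper's PSV interface; the point is that the reward the solver learns from is only as trustworthy as the verifier.

```python
# Minimal sketch of a proposer-solver self-play round with a verifier.
# All names here are illustrative placeholders, not the paper's API.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Episode:
    task: str        # problem statement produced by the proposer
    solution: str    # candidate program produced by the solver
    reward: float    # verifier verdict used as the training signal


def self_play_round(
    propose_task: Callable[[], str],
    solve_task: Callable[[str], str],
    verify: Callable[[str, str], bool],
    n_episodes: int = 8,
) -> List[Episode]:
    """Run one round of proposer-solver self-play.

    If `verify` is weak (e.g. a handful of unit tests), incorrect solutions
    can be rewarded, and those corrupted episodes feed the next round's
    curriculum: the error-propagation problem described above.
    """
    episodes = []
    for _ in range(n_episodes):
        task = propose_task()            # proposer generates a problem
        solution = solve_task(task)      # solver attempts it
        passed = verify(task, solution)  # imperfect check -> exploitable
        episodes.append(Episode(task, solution, 1.0 if passed else 0.0))
    return episodes
```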

In code generation, verification typically relies on unit tests, which cover limited cases and allow incorrect programs to pass, encouraging reward hacking and enabling error propagation across iterations. For these reasons, the authors explicitly state that self-play in code generation remains an open problem, motivating their formal-verification-based Propose, Solve, Verify (PSV) framework.
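As a toy illustration (not taken from the paper) of this brittleness, the snippet below shows a deliberately incorrect "sorting" program that nonetheless passes a small unit-test suite, so a unit-test verifier would assign it full reward.

```python
# Toy example: an incorrect program passes every test in a small suite,
# so a unit-test-based verifier rewards it anyway (reward hacking).

def unit_test_verifier(program, tests):
    """Return True iff the program passes every (input, expected) pair."""
    return all(program(x) == expected for x, expected in tests)


# Small test suite a proposer or benchmark might provide.
tests = [
    ([3, 1, 2], [1, 2, 3]),
    ([2, 1], [1, 2]),
]


def bad_sort(xs):
    # Incorrect: only sorts lists of length <= 3, otherwise returns the
    # input unchanged -- yet it passes the suite above.
    return sorted(xs) if len(xs) <= 3 else xs


assert unit_test_verifier(bad_sort, tests)     # rewarded despite being wrong
assert bad_sort([4, 3, 2, 1]) != [1, 2, 3, 4]  # the latent failure
```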

References

Hence self-play in code generation remains an open problem.

Wilf et al., "Propose, Solve, Verify: Self-Play Through Formal Verification," arXiv:2512.18160, 20 Dec 2025, Section 1 (Introduction).