Reliable synthesis of natural language issue descriptions in self-play

Determine whether, and how, Self-play SWE-RL can reliably generate high-quality, unambiguous natural language issue descriptions during self-play. A reliable method must avoid two observed failure modes, collapsing to verbatim copies of test patches and producing logically incoherent, repetitive descriptions, so that agents can operate from natural language specifications rather than only formal test patches.

Background

Self-play SWE-RL (SSR) intentionally minimizes data assumptions by grounding learning in raw repositories and using formal test specifications rather than natural language issues. The authors attempted to synthesize natural language issue descriptions within the self-play framework but encountered systematic failures, leading them to focus the current work on test-patch–specified tasks.

The paper attributes the observed failures to the limited natural language capabilities of the 32B base model (CWM-sft) and to opaque reward signals that did not effectively promote issue quality or diversity. Establishing a robust method for generating natural language issues in SSR would broaden the framework's applicability and align training more closely with downstream benchmarks that use natural language problem statements.

References

While this design minimizes data assumptions and proves effective for learning, our initial attempts failed to reliably generate high-quality and unambiguous issue descriptions. The generated issues tend to copy test patches, are logically incoherent, and collapse to identical patterns.

Toward Training Superintelligent Software Agents through Self-Play SWE-RL (2512.18552 - Wei et al., 21 Dec 2025) in Discussion, Subsection "Unsuccessful attempts"