Design a dual-play LLM training framework that sustains improvement without external labels
Develop a dual-play adversarial training framework for large language models (LLMs) that sustains performance improvement across training iterations without relying on large volumes of externally labeled supervised data.
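The source does not specify the training mechanism, but a dual-play setup is commonly understood as two co-adapting roles: a proposer that generates tasks and a solver that attempts them, each updated from self-play feedback rather than external labels. The toy sketch below illustrates that loop with invented scalar "skill" and "difficulty" stand-ins for model parameters; the class names, update rules, and constants are all illustrative assumptions, not the paper's method.

```python
import random


class Proposer:
    """Generates tasks; nudged toward the solver's competence frontier (toy assumption)."""

    def __init__(self):
        self.difficulty = 1.0

    def propose(self) -> float:
        return self.difficulty

    def update(self, solved: bool):
        # Adversarial curriculum pressure: harder if solved, easier if not.
        self.difficulty = max(0.1, self.difficulty + (0.1 if solved else -0.05))


class Solver:
    """Attempts tasks; improves only from near-frontier successes (toy assumption)."""

    def __init__(self, seed: int = 0):
        self.skill = 1.0
        self.rng = random.Random(seed)

    def attempt(self, difficulty: float) -> bool:
        # Success probability falls off as difficulty exceeds skill.
        return self.rng.random() < min(1.0, self.skill / difficulty)

    def update(self, solved: bool, difficulty: float):
        if solved and difficulty > 0.8 * self.skill:
            self.skill += 0.05  # learn only from challenging wins


def dual_play(steps: int = 2000, seed: int = 0):
    """Run the label-free self-improvement loop and return final (skill, difficulty)."""
    proposer, solver = Proposer(), Solver(seed)
    for _ in range(steps):
        d = proposer.propose()
        solved = solver.attempt(d)
        solver.update(solved, d)
        proposer.update(solved)
    return solver.skill, proposer.difficulty


skill, difficulty = dual_play()
```

In this simplified dynamic, the proposer keeps raising difficulty while the solver succeeds and backs off when it fails, so the curriculum stays near the solver's frontier and both quantities grow without any externally supplied labels; sustaining that growth with actual LLMs is the open challenge the problem statement names.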
References
Thus, designing a dual-play framework for training LLMs that enables sustained improvement across training iterations without requiring large amounts of supervised training data remains an open challenge.
— Better LLM Reasoning via Dual-Play
(2511.11881 - Zhang et al., 14 Nov 2025) in Section 1 (Introduction)