Stability of recursive synthetic self-improvement
Determine whether recursively generating synthetic training data with a Large Language Model and using that data to train successor models yields genuine capability gains or results in model collapse.
References
A key open question is whether such a process would lead to genuine capability gains or result in model collapse, a degenerative process where the model overfits to its own idiosyncrasies, leading to a gradual loss of diversity and accuracy.
— Beyond the Black Box: Theory and Mechanism of Large Language Models
(arXiv:2601.02907, Gan et al., 6 Jan 2026), Section 2: Data Preparation Stage, subsubsection "Synthetic Data Generation" (Advanced Topics and Open Questions)