Stability of recursive synthetic self-improvement

Determine whether recursively generating synthetic training data with a large language model (LLM) and training successor models on that data yields genuine capability gains or leads to model collapse.
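
To make the setup concrete, here is a minimal runnable sketch of the loop in question, using a unigram distribution over tokens as a stand-in for an LLM. The `train` and `generate` helpers and all sizes are illustrative assumptions, not the paper's protocol; the point is only the structure: each generation is trained purely on its predecessor's output.

```python
import random
from collections import Counter

random.seed(0)

def train(corpus):
    """Fit the toy 'model': a unigram distribution, token -> probability."""
    counts = Counter(corpus)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def generate(model, n):
    """Sample an n-token synthetic corpus from the model."""
    tokens, probs = zip(*model.items())
    return random.choices(tokens, weights=probs, k=n)

real_corpus = random.choices(range(200), k=500)  # stand-in for real data
model = train(real_corpus)
for gen in range(1, 11):
    synthetic = generate(model, 500)  # generation g's model writes the next corpus
    model = train(synthetic)          # successor sees only model-produced data
    print(f"generation {gen}: {len(model)} distinct tokens survive")
```

Even in this toy, rare tokens that happen to go unsampled in one generation can never reappear, so the surviving vocabulary shrinks monotonically, a simple analogue of the diversity loss at issue.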

Background

The paper highlights growing interest in training successive LLM generations with model-produced synthetic data as a means of self-improvement. While synthetic augmentation can boost performance under some conditions, the authors note serious theoretical risks, most notably model collapse, where training on generated outputs degrades diversity and accuracy over generations.
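
The mechanism behind this risk can be illustrated with the standard didactic example from the model-collapse literature (not an experiment from this paper): repeatedly fit a Gaussian to a finite sample drawn from the previous generation's fit. Estimation error compounds across generations, and the fitted variance drifts toward zero, i.e., the distribution loses diversity even though each individual fit looks reasonable. All parameters below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0  # the original "real" data distribution
n = 50                # finite sample drawn at each generation
for gen in range(1, 301):
    samples = rng.normal(mu, sigma, size=n)    # generation g's synthetic output
    mu, sigma = samples.mean(), samples.std()  # generation g+1 is fit only to it
    if gen % 60 == 0:
        print(f"generation {gen}: estimated sigma = {sigma:.3f}")
```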

They summarize emerging evidence that workflow choices (e.g., accumulate vs. replace strategies) and the ratio of synthetic to real data critically affect stability, but emphasize that the fundamental question of whether recursive self-generation truly improves capabilities or drives degeneration remains unresolved.
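
A sketch of the two workflow choices, under the definitions commonly used in this literature: "accumulate" trains each generation on the real data plus every previously generated synthetic batch, while "replace" trains only on the newest synthetic batch. The toy unigram model (same `train`/`generate` helpers as above, repeated here so the snippet stands alone) and all sizes are illustrative assumptions; varying `n_synth` against the real corpus size plays the role of the synthetic-to-real ratio.

```python
import random
from collections import Counter

random.seed(1)

def train(corpus):
    counts = Counter(corpus)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def generate(model, n):
    tokens, probs = zip(*model.items())
    return random.choices(tokens, weights=probs, k=n)

def run(strategy, real_corpus, generations=10, n_synth=500):
    pool, model = list(real_corpus), train(real_corpus)
    for _ in range(generations):
        synthetic = generate(model, n_synth)
        if strategy == "accumulate":
            pool = pool + synthetic  # real data plus every past synthetic batch stays in the mix
        else:                        # "replace"
            pool = synthetic         # train only on the newest synthetic batch
        model = train(pool)
    return model

real = random.choices(range(200), k=500)
for strategy in ("accumulate", "replace"):
    print(f"{strategy}: {len(run(strategy, real))} distinct tokens survive")
```

Because the real corpus never leaves the accumulated pool, the "accumulate" run cannot lose tokens present in the real data, while the "replace" run can, which mirrors the reported sensitivity of stability to this workflow choice.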

References

"A key open question is whether such a process would lead to genuine capability gains or result in model collapse, a degenerative process where the model overfits to its own idiosyncrasies, leading to a gradual loss of diversity and accuracy."

Beyond the Black Box: Theory and Mechanism of Large Language Models (arXiv:2601.02907, Gan et al., 6 Jan 2026), in Subsubsection Synthetic Data Generation, Section 2: Data Preparation Stage (Advanced Topics and Open Questions)