When and how synthetic data improve generalization and transfer
Characterize the regimes in which synthetic data augmentation improves out‑of‑distribution generalization and transfer learning performance, including identifying beneficial types of distributional shift, quantifying the impact of generative‑model estimation error, and developing diagnostics for harmful extrapolation.
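As one illustration of what a diagnostic for harmful extrapolation could look like, the sketch below flags synthetic points that lie unusually far from every real observation. It is a simple nearest-neighbor heuristic, not a method from the cited paper. The function names `nearest_distances` and `extrapolation_flags`, the `quantile` parameter, and the simulated Gaussian data are all assumptions made for the example.

```python
import numpy as np

def nearest_distances(queries, reference, exclude_self=False):
    """Euclidean distance from each query point to its nearest reference point."""
    diffs = queries[:, None, :] - reference[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    if exclude_self:
        # Only valid when queries and reference are the same array:
        # ignore each point's zero distance to itself.
        np.fill_diagonal(dists, np.inf)
    return dists.min(axis=1)

def extrapolation_flags(real, synthetic, quantile=0.99):
    """Flag synthetic points farther from the real data than is typical
    for the real data itself (hypothetical heuristic, not from the paper)."""
    # Calibrate the threshold on leave-one-out nearest-neighbor distances
    # among the real observations.
    threshold = np.quantile(
        nearest_distances(real, real, exclude_self=True), quantile
    )
    return nearest_distances(synthetic, real) > threshold

# Simulated example (assumed data): real points from a standard Gaussian,
# synthetic points partly inside and partly far outside the real support.
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 2))
inside = rng.normal(size=(50, 2)) * 0.5
outside = rng.normal(size=(50, 2)) + 10.0
flags = extrapolation_flags(real, np.vstack([inside, outside]))
print("flagged inside:", flags[:50].mean(), "flagged outside:", flags[50:].mean())
```

A check of this kind only detects distance from the observed support. It does not say whether a given extrapolation is harmful or beneficial for a downstream task, which is the part of the problem that remains open.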
References
A central open problem is therefore to characterize when and how synthetic data improve generalization ability and transferability. This includes, but is not limited to, identifying the types of distributional shifts for which synthetic augmentation is beneficial, understanding the role of the estimation error of the generative model, and developing diagnostics to detect harmful extrapolation.
— Harnessing Synthetic Data from Generative AI for Statistical Inference
(2603.05396 - Abdel-Azim et al., 5 Mar 2026) in Section 4, Extrapolation, Generalization, and Transfer