Stable Diffusion Dataset Generation for Downstream Classification Tasks (2405.02698v1)
Abstract: Recent advances in generative artificial intelligence have enabled the creation of high-quality synthetic data that closely mimics real-world data. This paper explores the adaptation of the Stable Diffusion 2.0 model for generating synthetic datasets, using Transfer Learning, Fine-Tuning, and generation-parameter optimisation to improve the utility of the generated datasets for downstream classification tasks. We present a class-conditional version of the model that combines a Class-Encoder with optimisation of key generation parameters. Our methodology led to synthetic datasets that, in a third of cases, produced models that outperformed those trained on real datasets.
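To make the pipeline concrete, the sketch below shows one plausible way to wire class-conditional generation to generation-parameter optimisation: Hugging Face `diffusers` for Stable Diffusion 2 sampling and Optuna to search over guidance scale and step count against a downstream-utility score. This is a minimal illustration, not the authors' implementation: the model id `stabilityai/stable-diffusion-2`, the class-name prompt template (a stand-in for the paper's Class-Encoder conditioning), and the `downstream_utility` stub are all assumptions, and the paper's Transfer Learning and Fine-Tuning stages are omitted.

```python
# Hypothetical sketch: class-conditional generation with Stable Diffusion 2
# plus Optuna tuning of key generation parameters (guidance scale, steps).
# The paper's Class-Encoder is approximated here by plain class-name prompts.
import optuna
import torch
from diffusers import StableDiffusionPipeline

CLASSES = ["airplane", "automobile", "bird"]  # example label set
IMAGES_PER_CLASS = 4  # tiny for illustration; the paper generates full datasets

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2", torch_dtype=torch.float16
).to("cuda")


def generate_dataset(guidance_scale: float, num_inference_steps: int):
    """Generate a small class-conditional synthetic dataset as (image, label) pairs."""
    dataset = []
    for label in CLASSES:
        out = pipe(
            [f"a photo of a {label}"] * IMAGES_PER_CLASS,
            guidance_scale=guidance_scale,
            num_inference_steps=num_inference_steps,
        )
        dataset.extend((img, label) for img in out.images)
    return dataset


def downstream_utility(dataset) -> float:
    """Placeholder: train a classifier on `dataset` and return its validation
    accuracy on held-out real images. Stubbed here to keep the sketch short."""
    return 0.0  # replace with real classifier training + evaluation


def objective(trial: optuna.Trial) -> float:
    # Search ranges are illustrative, not the paper's.
    gs = trial.suggest_float("guidance_scale", 1.0, 15.0)
    steps = trial.suggest_int("num_inference_steps", 10, 50)
    return downstream_utility(generate_dataset(gs, steps))


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print("Best generation parameters:", study.best_params)
```

Optimising generation parameters directly against downstream classifier accuracy, rather than against image-quality metrics, is the key design choice this sketch tries to capture.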