Semantic Augmentation in Images using Language
Abstract: Deep learning models are data-hungry and require very large labeled datasets for supervised learning. When such data is scarce, these models often overfit, which limits their ability to generalize to real-world examples. Recent advances in diffusion models have enabled the generation of photorealistic images from textual inputs. Leveraging the substantial datasets used to train these diffusion models, we propose a technique that uses generated images to augment existing datasets. This paper explores various strategies for effective data augmentation to improve the out-of-domain generalization capabilities of deep learning models.