
Semantic Augmentation in Images using Language

Published 2 Apr 2024 in cs.CV, cs.AI, and cs.LG | (2404.02353v1)

Abstract: Deep learning models are data-hungry and require very large labeled datasets for supervised learning. When labeled data is limited, these models often overfit, which limits their ability to generalize to real-world examples. Recent advances in diffusion models have enabled the generation of photorealistic images from textual inputs. Leveraging the substantial datasets used to train these diffusion models, we propose a technique that uses generated images to augment existing datasets. This paper explores various strategies for effective data augmentation to improve the out-of-domain generalization capabilities of deep learning models.
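
The general idea in the abstract, adding text-conditioned generated images to an existing labeled dataset, can be illustrated with a minimal sketch. This is not the paper's implementation: the prompt template `caption_prompt`, the mixing parameter `synthetic_per_real`, and the `generate` callable are all hypothetical names introduced here, and a stub stands in for a real text-to-image diffusion model so the example runs without any model weights.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A sample is (image, label). The image type is left open on purpose,
# since this sketch does not depend on any particular image library.
Sample = Tuple[object, str]


def caption_prompt(label: str) -> str:
    """Hypothetical prompt template; the paper's actual prompts are not given here."""
    return f"a photo of a {label}"


def augment_with_generated(
    real: Sequence[Sample],
    generate: Callable[[str], object],
    synthetic_per_real: float = 0.5,
    seed: int = 0,
) -> List[Sample]:
    """Return the real samples plus generated ones, shuffled together.

    `generate` is assumed to map a text prompt to an image, for example a
    wrapper around a text-to-image diffusion model. `synthetic_per_real` is
    an assumed knob: the number of generated samples per real sample.
    """
    rng = random.Random(seed)
    n_synthetic = int(len(real) * synthetic_per_real)
    labels = [label for _, label in real]

    synthetic: List[Sample] = []
    for _ in range(n_synthetic):
        # Sampling from the observed labels keeps the class balance of the
        # real data in expectation.
        label = rng.choice(labels)
        synthetic.append((generate(caption_prompt(label)), label))

    mixed = list(real) + synthetic
    rng.shuffle(mixed)
    return mixed


def fake_generate(prompt: str) -> str:
    """Stub generator used only to make the sketch runnable."""
    return f"<generated:{prompt}>"


if __name__ == "__main__":
    real_data = [("img0", "cat"), ("img1", "dog"), ("img2", "cat"), ("img3", "bird")]
    for image, label in augment_with_generated(real_data, fake_generate):
        print(label, image)
```

In practice `fake_generate` would be replaced by a call into an actual text-to-image model, and the ratio of generated to real samples would be one of the augmentation strategies to tune.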

