
Would Deep Generative Models Amplify Bias in Future Models? (2404.03242v1)

Published 4 Apr 2024 in cs.CV

Abstract: We investigate the impact of deep generative models on potential social biases in upcoming computer vision models. As the internet witnesses an increasing influx of AI-generated images, concerns arise regarding inherent biases that may accompany them, potentially leading to the dissemination of harmful content. This paper explores whether a detrimental feedback loop, resulting in bias amplification, would occur if generated images were used as the training data for future models. We conduct simulations by progressively substituting original images in COCO and CC3M datasets with images generated through Stable Diffusion. The modified datasets are used to train OpenCLIP and image captioning models, which we evaluate in terms of quality and bias. Contrary to expectations, our findings indicate that introducing generated images during training does not uniformly amplify bias. Instead, instances of bias mitigation across specific tasks are observed. We further explore the factors that may influence these phenomena, such as artifacts in image generation (e.g., blurry faces) or pre-existing biases in the original datasets.
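To make the substitution protocol in the abstract concrete, below is a minimal sketch (not the authors' released code) of how a chosen fraction of an image-caption dataset could be replaced with Stable Diffusion generations using the Hugging Face diffusers library. The helper name `build_mixed_dataset`, the checkpoint `runwayml/stable-diffusion-v1-5`, and the use of captions as generation prompts are illustrative assumptions; the resulting mixed dataset would then be fed to OpenCLIP or a captioning model for training and bias evaluation.

```python
import random
import torch
from diffusers import StableDiffusionPipeline

def build_mixed_dataset(pairs, ratio, seed=0,
                        model_id="runwayml/stable-diffusion-v1-5"):
    """Replace a fraction `ratio` of (caption, image) pairs with
    Stable Diffusion images generated from the same captions.

    `pairs` is a list of (caption: str, image: PIL.Image) tuples;
    the captions double as prompts, mirroring the substitution
    setup described in the abstract (an assumption of this sketch).
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
    pipe = StableDiffusionPipeline.from_pretrained(
        model_id,
        torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    ).to(device)

    # Pick which indices get a synthetic replacement, reproducibly.
    rng = random.Random(seed)
    n_replace = int(len(pairs) * ratio)
    replace_idx = set(rng.sample(range(len(pairs)), n_replace))

    mixed = []
    for i, (caption, image) in enumerate(pairs):
        if i in replace_idx:
            # Generate a synthetic image conditioned on the caption.
            image = pipe(caption).images[0]
        mixed.append((caption, image))
    return mixed
```

Sweeping `ratio` from 0 to 1 would reproduce the "progressive substitution" axis of the experiment: each resulting dataset is used to train a model, and quality and bias metrics are compared across ratios.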

Authors (5)
  1. Tianwei Chen (3 papers)
  2. Yusuke Hirota (9 papers)
  3. Mayu Otani (32 papers)
  4. Noa Garcia (33 papers)
  5. Yuta Nakashima (67 papers)
Citations (8)