The Male CEO and the Female Assistant: Evaluation and Mitigation of Gender Biases in Text-To-Image Generation of Dual Subjects (2402.11089v3)

Published 16 Feb 2024 in cs.CV, cs.AI, and cs.CY

Abstract: Recent large-scale T2I models like DALLE-3 have made progress in reducing gender stereotypes when generating single-person images. However, significant biases remain when generating images with more than one person. To systematically evaluate this, we propose the Paired Stereotype Test (PST) framework, which queries T2I models to depict two individuals assigned with male-stereotyped and female-stereotyped social identities, respectively (e.g. "a CEO" and "an Assistant"). This contrastive setting often triggers T2I models to generate gender-stereotyped images. Using PST, we evaluate two aspects of gender biases -- the well-known bias in gendered occupation and a novel aspect: bias in organizational power. Experiments show that over 74% images generated by DALLE-3 display gender-occupational biases. Additionally, compared to single-person settings, DALLE-3 is more likely to perpetuate male-associated stereotypes under PST. We further propose FairCritic, a novel and interpretable framework that leverages an LLM-based critic model to i) detect bias in generated images, and ii) adaptively provide feedback to T2I models for improving fairness. FairCritic achieves near-perfect fairness on PST, overcoming the limitations of previous prompt-based intervention approaches.

Exploring Gender Biases in Text-To-Image Models: The Paired Stereotype Test

The paper "The Male CEO and the Female Assistant: Probing Gender Biases in Text-To-Image Models Through Paired Stereotype Test" presents an analysis of gender biases in text-to-image (T2I) models, with a focus on multi-person scenarios. The authors have identified a gap in the current evaluation practices of T2I systems, which predominantly rely on single-person image generations to explore biases. To address this, they introduce the Paired Stereotype Test (PST) as a novel framework to investigate complex gender biases in multi-character image generation.

Methodology

The PST framework evaluates gender stereotypes in T2I models by instructing the model to depict two individuals whose social identities are stereotypically associated with opposite genders (e.g., "a CEO" and "an Assistant"). This contrastive setup differs from the conventional practice of analyzing single-person images, which may not sufficiently surface underlying patterns of bias. The authors apply PST to two aspects of gender bias, gendered occupations and organizational power dynamics, with OpenAI's DALLE-3 as the model under study.
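To make the setup concrete, the following is a minimal sketch of how such paired, gender-unspecified prompts could be assembled. The identity lists, prompt template, and function names are illustrative assumptions, not the paper's actual stimuli.

```python
# Illustrative construction of PST-style paired prompts.
# The identity pools and prompt template below are placeholders.
from itertools import product

MALE_STEREOTYPED = ["a CEO", "a software engineer", "a construction worker"]
FEMALE_STEREOTYPED = ["an assistant", "a nurse", "a receptionist"]


def build_pst_prompts(male_ids, female_ids):
    """Pair each male-stereotyped identity with each female-stereotyped
    identity and render a two-person prompt that specifies no gender,
    so any gendered depiction comes from the T2I model itself."""
    return [
        f"A photo of {m} and {f} talking in an office."
        for m, f in product(male_ids, female_ids)
    ]


if __name__ == "__main__":
    for prompt in build_pst_prompts(MALE_STEREOTYPED, FEMALE_STEREOTYPED)[:3]:
        print(prompt)
```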

Key Findings

The paper reveals notable biases in DALLE-3 when assessed using PST. Results indicate significant bias in both gendered occupations and organizational power: male-stereotyped occupations and higher-power positions are disproportionately depicted as men, while female-stereotyped roles and lower-power positions are depicted as women. Notably, these biases become even more pronounced under PST than under single-person evaluation.

In quantitative terms, the overall stereotype test score (STS) for gendered occupation rises from 10.00 under single-person evaluation to 47.38 under PST, a substantial increase that underscores PST's effectiveness in revealing biases hidden by single-person testing. Similarly, the overall STS for organizational power rises from 4.62 to 18.98 under PST.
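The paper's exact STS definition is not reproduced in this summary. Purely as an illustration, the sketch below assumes STS measures the net rate of stereotype-congruent gender assignments (congruent minus incongruent, on a 0-100 scale where 0 means parity); the tallies used are hypothetical, not the paper's data.

```python
def stereotype_test_score(assignments):
    """Toy STS-style score, assuming each observation records whether the
    male-stereotyped role was depicted as a man and the female-stereotyped
    role as a woman ("aligned"), the reverse ("anti"), or neither.

    Returns (aligned% - anti%); 0 indicates no net stereotype bias."""
    aligned = sum(1 for a in assignments if a == "aligned")
    anti = sum(1 for a in assignments if a == "anti")
    total = len(assignments)
    if total == 0:
        return 0.0
    return 100.0 * (aligned - anti) / total


# Hypothetical tallies only, loosely shaped like the reported magnitudes.
single_person = ["aligned"] * 55 + ["anti"] * 45
paired_pst = ["aligned"] * 74 + ["anti"] * 26
print(stereotype_test_score(single_person))  # 10.0
print(stereotype_test_score(paired_pst))     # 48.0
```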

Implications and Future Directions

The findings have significant implications for the design and deployment of T2I models in real-world applications. The inherent biases identified could perpetuate harmful stereotypes if not addressed, spanning various applications from content creation to more complex multi-character scenes in videos or advertising. The research highlights an urgent need for more comprehensive bias evaluation frameworks in multimodal AI systems to ensure fairness.

For future work, the authors suggest extending the examination of biases beyond binary gender to a spectrum of gender identities, and applying similar methodologies to other influential T2I models such as Google's Imagen. On the mitigation side, the paper already proposes FairCritic, an interpretable framework in which an LLM-based critic detects bias in generated images and adaptively feeds corrective guidance back to the T2I model, achieving near-perfect fairness on PST; complementary strategies could intervene at the level of model design and training data. A rough sketch of such a critic-feedback loop follows.
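The loop below is a minimal sketch of the generate-critique-revise pattern that the abstract attributes to FairCritic; all function names, prompts, and the stopping rule are placeholders, not the paper's actual interfaces.

```python
# Sketch of a FairCritic-style generate-critique-revise loop.
# Every function here is a stub standing in for a real T2I model
# and a real LLM-based critic.

def generate_image(prompt: str) -> str:
    """Placeholder for a T2I call (e.g. DALLE-3); returns an image handle."""
    return f"<image for: {prompt}>"


def critic_review(image: str, prompt: str) -> tuple[bool, str]:
    """Placeholder critic: pretend the image becomes fair once corrective
    feedback has been folded into the prompt."""
    if "Additional instruction" in prompt:
        return True, ""
    return False, "Vary which role is depicted as a man and which as a woman."


def faircritic_loop(prompt: str, max_rounds: int = 3) -> str:
    """Generate an image, ask the critic for bias feedback, and fold the
    feedback back into the prompt until the critic accepts or rounds run out."""
    current_prompt = prompt
    image = generate_image(current_prompt)
    for _ in range(max_rounds):
        is_fair, feedback = critic_review(image, current_prompt)
        if is_fair:
            break
        current_prompt = f"{prompt} Additional instruction: {feedback}"
        image = generate_image(current_prompt)
    return image


if __name__ == "__main__":
    print(faircritic_loop("A photo of a CEO and an assistant talking in an office."))
```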

Conclusion

This paper makes a compelling case for the Paired Stereotype Test as a robust tool for uncovering complex gender biases in T2I models. By connecting generated stereotypes to real-world labor statistics, the paper provides not only evidence that biases exist but also an indication of how closely they track societal stereotypes, and its FairCritic framework shows that an LLM-based critic can substantially mitigate them. The research contributes to the broader conversation on ethics and fairness in AI, calling for closer scrutiny in the deployment of generative models and setting the stage for continued refinement of methods to address bias in AI systems.

Authors (2)
  1. Yixin Wan (19 papers)
  2. Kai-Wei Chang (292 papers)