The Male CEO and the Female Assistant: Evaluation and Mitigation of Gender Biases in Text-To-Image Generation of Dual Subjects
Abstract: Recent large-scale T2I models like DALLE-3 have made progress in reducing gender stereotypes when generating single-person images. However, significant biases remain when generating images with more than one person. To systematically evaluate this, we propose the Paired Stereotype Test (PST) framework, which queries T2I models to depict two individuals assigned male-stereotyped and female-stereotyped social identities, respectively (e.g., "a CEO" and "an Assistant"). This contrastive setting often triggers T2I models to generate gender-stereotyped images. Using PST, we evaluate two aspects of gender bias: the well-known bias in gendered occupation and a novel aspect, bias in organizational power. Experiments show that over 74% of the images generated by DALLE-3 display gender-occupational biases. Additionally, compared to single-person settings, DALLE-3 is more likely to perpetuate male-associated stereotypes under PST. We further propose FairCritic, a novel and interpretable framework that leverages an LLM-based critic model to i) detect bias in generated images, and ii) adaptively provide feedback to T2I models for improving fairness. FairCritic achieves near-perfect fairness on PST, overcoming the limitations of previous prompt-based intervention approaches.
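As a rough illustration of the paired setup described in the abstract, the sketch below builds dual-subject prompts and computes the share of images that match the stereotype. It is not the paper's implementation: the prompt template, the identity lists, the function names, and the annotation format (one perceived-gender label per depicted role) are all assumptions made for this example, and image generation and labeling are left out.

```python
from itertools import product

# Hypothetical identity lists; the abstract names only "CEO" and "Assistant".
MALE_STEREOTYPED = ["CEO"]
FEMALE_STEREOTYPED = ["assistant"]


def article(noun):
    """Pick 'a' or 'an' from the first letter (a simplification)."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def build_pst_prompts(male_ids, female_ids):
    """Pair each male-stereotyped identity with each female-stereotyped one.

    Returns (male_identity, female_identity, prompt) tuples. The prompt
    wording is an assumed template, not the one used in the paper.
    """
    return [
        (m, f, f"{article(m)} {m} and {article(f)} {f}")
        for m, f in product(male_ids, female_ids)
    ]


def stereotype_rate(annotations):
    """Fraction of images that follow the stereotype.

    `annotations` is a list of (gender_of_male_stereotyped_role,
    gender_of_female_stereotyped_role) labels, one pair per image.
    An image counts as stereotyped when the first label is "male"
    and the second is "female".
    """
    if not annotations:
        return 0.0
    hits = sum(1 for a, b in annotations if a == "male" and b == "female")
    return hits / len(annotations)


prompts = build_pst_prompts(MALE_STEREOTYPED, FEMALE_STEREOTYPED)
labels = [("male", "female"), ("female", "male"),
          ("male", "female"), ("male", "male")]
rate = stereotype_rate(labels)
```

Under these assumptions, a rate near 0.25 would be expected if the model assigned genders to the two roles independently and uniformly, so values well above that suggest stereotyped pairing. Whether the paper uses this baseline is not stated in the abstract.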