The Male CEO and the Female Assistant: Evaluation and Mitigation of Gender Biases in Text-To-Image Generation of Dual Subjects (2402.11089v3)

Published 16 Feb 2024 in cs.CV, cs.AI, and cs.CY

Abstract: Recent large-scale T2I models like DALLE-3 have made progress in reducing gender stereotypes when generating single-person images. However, significant biases remain when generating images with more than one person. To systematically evaluate this, we propose the Paired Stereotype Test (PST) framework, which queries T2I models to depict two individuals assigned with male-stereotyped and female-stereotyped social identities, respectively (e.g. "a CEO" and "an Assistant"). This contrastive setting often triggers T2I models to generate gender-stereotyped images. Using PST, we evaluate two aspects of gender biases -- the well-known bias in gendered occupation and a novel aspect: bias in organizational power. Experiments show that over 74% images generated by DALLE-3 display gender-occupational biases. Additionally, compared to single-person settings, DALLE-3 is more likely to perpetuate male-associated stereotypes under PST. We further propose FairCritic, a novel and interpretable framework that leverages an LLM-based critic model to i) detect bias in generated images, and ii) adaptively provide feedback to T2I models for improving fairness. FairCritic achieves near-perfect fairness on PST, overcoming the limitations of previous prompt-based intervention approaches.

Exploring Gender Biases in Text-To-Image Models: The Paired Stereotype Test

The paper "The Male CEO and the Female Assistant: Probing Gender Biases in Text-To-Image Models Through Paired Stereotype Test" presents an analysis of gender biases in text-to-image (T2I) models, with a focus on multi-person scenarios. The authors have identified a gap in the current evaluation practices of T2I systems, which predominantly rely on single-person image generations to explore biases. To address this, they introduce the Paired Stereotype Test (PST) as a novel framework to investigate complex gender biases in multi-character image generation.

Methodology

The PST framework evaluates gender stereotypes in T2I models by instructing the model to depict two individuals whose social identities are stereotypically associated with opposite genders (e.g., "a CEO" and "an Assistant"). This contrastive setup differs from the conventional practice of analyzing single-person images, which may not sufficiently surface underlying patterns of bias. The authors apply PST to two aspects of gender bias, gendered occupations and organizational power dynamics, with OpenAI's DALLE-3 as the model under study.
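To make the setup concrete, the following is a minimal sketch of how such paired, gender-unspecified prompts could be assembled. The identity lists, prompt template, and function names are illustrative assumptions, not the paper's actual stimuli.

```python
# Illustrative construction of PST-style paired prompts.
# The identity pools and prompt template below are placeholders.
from itertools import product

MALE_STEREOTYPED = ["a CEO", "a software engineer", "a construction worker"]
FEMALE_STEREOTYPED = ["an assistant", "a nurse", "a receptionist"]


def build_pst_prompts(male_ids, female_ids):
    """Pair each male-stereotyped identity with each female-stereotyped
    identity and render a two-person prompt that specifies no gender,
    so any gendered depiction comes from the T2I model itself."""
    return [
        f"A photo of {m} and {f} talking in an office."
        for m, f in product(male_ids, female_ids)
    ]


if __name__ == "__main__":
    for prompt in build_pst_prompts(MALE_STEREOTYPED, FEMALE_STEREOTYPED)[:3]:
        print(prompt)
```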

Key Findings

The paper reveals notable biases in DALLE-3 when assessed using PST. Results indicate significant bias in both gendered occupations and organizational power: male-stereotyped occupations and higher-power positions are disproportionately depicted as men, while female-stereotyped roles and lower-power positions are depicted as women. Notably, these biases become even more pronounced under PST than under single-person evaluation.

In quantitative terms, the overall stereotype test score (STS) for gendered occupation rises from 10.00 under single-person evaluation to 47.38 under PST, a substantial increase that underscores PST's effectiveness in revealing biases hidden by single-person testing. Similarly, the overall STS for organizational power rises from 4.62 to 18.98 under PST.
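The paper's exact STS definition is not reproduced in this summary. Purely as an illustration, the sketch below assumes STS measures the net rate of stereotype-congruent gender assignments (congruent minus incongruent, on a 0-100 scale where 0 means parity); the tallies used are hypothetical, not the paper's data.

```python
def stereotype_test_score(assignments):
    """Toy STS-style score, assuming each observation records whether the
    male-stereotyped role was depicted as a man and the female-stereotyped
    role as a woman ("aligned"), the reverse ("anti"), or neither.

    Returns (aligned% - anti%); 0 indicates no net stereotype bias."""
    aligned = sum(1 for a in assignments if a == "aligned")
    anti = sum(1 for a in assignments if a == "anti")
    total = len(assignments)
    if total == 0:
        return 0.0
    return 100.0 * (aligned - anti) / total


# Hypothetical tallies only, loosely shaped like the reported magnitudes.
single_person = ["aligned"] * 55 + ["anti"] * 45
paired_pst = ["aligned"] * 74 + ["anti"] * 26
print(stereotype_test_score(single_person))  # 10.0
print(stereotype_test_score(paired_pst))     # 48.0
```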

Implications and Future Directions

The findings have significant implications for the design and deployment of T2I models in real-world applications. The inherent biases identified could perpetuate harmful stereotypes if not addressed, spanning various applications from content creation to more complex multi-character scenes in videos or advertising. The research highlights an urgent need for more comprehensive bias evaluation frameworks in multimodal AI systems to ensure fairness.

For future work, the authors suggest extending the examination of biases beyond binary gender to a spectrum of gender identities, and applying similar methodologies to other influential T2I models such as Google's Imagen. On the mitigation side, the paper already proposes FairCritic, an interpretable framework in which an LLM-based critic detects bias in generated images and adaptively feeds corrective guidance back to the T2I model, achieving near-perfect fairness on PST; complementary strategies could intervene at the level of model design and training data. A rough sketch of such a critic-feedback loop follows.
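The loop below is a minimal sketch of the generate-critique-revise pattern that the abstract attributes to FairCritic; all function names, prompts, and the stopping rule are placeholders, not the paper's actual interfaces.

```python
# Sketch of a FairCritic-style generate-critique-revise loop.
# Every function here is a stub standing in for a real T2I model
# and a real LLM-based critic.

def generate_image(prompt: str) -> str:
    """Placeholder for a T2I call (e.g. DALLE-3); returns an image handle."""
    return f"<image for: {prompt}>"


def critic_review(image: str, prompt: str) -> tuple[bool, str]:
    """Placeholder critic: pretend the image becomes fair once corrective
    feedback has been folded into the prompt."""
    if "Additional instruction" in prompt:
        return True, ""
    return False, "Vary which role is depicted as a man and which as a woman."


def faircritic_loop(prompt: str, max_rounds: int = 3) -> str:
    """Generate an image, ask the critic for bias feedback, and fold the
    feedback back into the prompt until the critic accepts or rounds run out."""
    current_prompt = prompt
    image = generate_image(current_prompt)
    for _ in range(max_rounds):
        is_fair, feedback = critic_review(image, current_prompt)
        if is_fair:
            break
        current_prompt = f"{prompt} Additional instruction: {feedback}"
        image = generate_image(current_prompt)
    return image


if __name__ == "__main__":
    print(faircritic_loop("A photo of a CEO and an assistant talking in an office."))
```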

Conclusion

This paper makes a compelling case for the Paired Stereotype Test as a robust tool for uncovering complex gender biases in T2I models. By connecting generated stereotypes to real-world labor statistics, the paper provides not only evidence that biases exist but also an indication of how closely they track societal stereotypes, and its FairCritic framework shows that an LLM-based critic can substantially mitigate them. The research contributes to the broader conversation on ethics and fairness in AI, calling for closer scrutiny in the deployment of generative models and setting the stage for continued refinement of methods to address bias in AI systems.

Authors (2)
  1. Yixin Wan (19 papers)
  2. Kai-Wei Chang (292 papers)