ViSAGe: A Global-Scale Analysis of Visual Stereotypes in Text-to-Image Generation (2401.06310v3)
Abstract: Recent studies have shown that Text-to-Image (T2I) model generations can reflect social stereotypes present in the real world. However, existing approaches for evaluating stereotypes have a noticeable lack of coverage of global identity groups and their associated stereotypes. To address this gap, we introduce the ViSAGe (Visual Stereotypes Around the Globe) dataset to enable the evaluation of known nationality-based stereotypes in T2I models, across 135 nationalities. We enrich an existing textual stereotype resource by distinguishing between stereotypical associations that are more likely to have visual depictions, such as 'sombrero', from those that are less visually concrete, such as 'attractive'. We demonstrate ViSAGe's utility through a multi-faceted evaluation of T2I generations. First, we show that stereotypical attributes in ViSAGe are thrice as likely to be present in generated images of corresponding identities as compared to other attributes, and that the offensiveness of these depictions is notably higher for identities from Africa, South America, and South East Asia. Second, we assess the stereotypical pull of visual depictions of identity groups, which reveals how the 'default' representations of all identity groups in ViSAGe have a pull towards stereotypical depictions, and that this pull is even more prominent for identity groups from the Global South. CONTENT WARNING: Some examples contain offensive stereotypes.
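The "thrice as likely" claim compares how often stereotypical attributes appear in generated images against a baseline of other attributes. A minimal sketch of such a comparison is below; the data format (a set of detected attribute strings per image) and the function name are illustrative assumptions, not the ViSAGe schema or the authors' code.

```python
# Hedged sketch, not the authors' evaluation pipeline: compare presence rates
# of stereotypical vs. other attributes in annotated generations.

def attribute_presence_rate(images, attributes):
    """Fraction of (image, attribute) pairs where the attribute was detected.

    `images`: list of sets of detected attribute strings (assumed format).
    `attributes`: candidate attribute set being scored.
    """
    if not images or not attributes:
        return 0.0
    hits = sum(1 for img in images for a in attributes if a in img)
    return hits / (len(images) * len(attributes))

# Toy data: detected attributes per generated image for one identity group.
generated = [
    {"sombrero", "desert"},
    {"sombrero"},
    {"guitar"},
]
stereotypical = {"sombrero", "desert"}   # attributes flagged as stereotypes
other = {"guitar", "laptop"}             # comparison attributes

# Relative likelihood of stereotypical vs. other attribute presence.
ratio = (attribute_presence_rate(generated, stereotypical)
         / attribute_presence_rate(generated, other))
```

On this toy data the stereotypical attributes appear three times as often as the others; the real analysis would aggregate such ratios over all 135 nationalities.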
Authors:
- Akshita Jha
- Vinodkumar Prabhakaran
- Remi Denton
- Sarah Laszlo
- Shachi Dave
- Rida Qadri
- Chandan K. Reddy
- Sunipa Dev