ViSAGe: A Global-Scale Analysis of Visual Stereotypes in Text-to-Image Generation (2401.06310v3)
Abstract: Recent studies have shown that Text-to-Image (T2I) model generations can reflect social stereotypes present in the real world. However, existing approaches for evaluating stereotypes have a noticeable lack of coverage of global identity groups and their associated stereotypes. To address this gap, we introduce the ViSAGe (Visual Stereotypes Around the Globe) dataset to enable the evaluation of known nationality-based stereotypes in T2I models, across 135 nationalities. We enrich an existing textual stereotype resource by distinguishing between stereotypical associations that are more likely to have visual depictions, such as 'sombrero', from those that are less visually concrete, such as 'attractive'. We demonstrate ViSAGe's utility through a multi-faceted evaluation of T2I generations. First, we show that stereotypical attributes in ViSAGe are thrice as likely to be present in generated images of corresponding identities as compared to other attributes, and that the offensiveness of these depictions is notably higher for identities from Africa, South America, and South East Asia. Second, we assess the stereotypical pull of visual depictions of identity groups, which reveals how the 'default' representations of all identity groups in ViSAGe have a pull towards stereotypical depictions, and that this pull is even more prominent for identity groups from the Global South. CONTENT WARNING: Some examples contain offensive stereotypes.
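The "thrice as likely" claim compares how often stereotypical attributes appear in generated images against a baseline of other attributes. A minimal sketch of such a comparison is below; the data format (a set of detected attribute strings per image) and the function name are illustrative assumptions, not the ViSAGe schema or the authors' code.

```python
# Hedged sketch, not the authors' evaluation pipeline: compare presence rates
# of stereotypical vs. other attributes in annotated generations.

def attribute_presence_rate(images, attributes):
    """Fraction of (image, attribute) pairs where the attribute was detected.

    `images`: list of sets of detected attribute strings (assumed format).
    `attributes`: candidate attribute set being scored.
    """
    if not images or not attributes:
        return 0.0
    hits = sum(1 for img in images for a in attributes if a in img)
    return hits / (len(images) * len(attributes))

# Toy data: detected attributes per generated image for one identity group.
generated = [
    {"sombrero", "desert"},
    {"sombrero"},
    {"guitar"},
]
stereotypical = {"sombrero", "desert"}   # attributes flagged as stereotypes
other = {"guitar", "laptop"}             # comparison attributes

# Relative likelihood of stereotypical vs. other attribute presence.
ratio = (attribute_presence_rate(generated, stereotypical)
         / attribute_presence_rate(generated, other))
```

On this toy data the stereotypical attributes appear three times as often as the others; the real analysis would aggregate such ratios over all 135 nationalities.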
Authors:
- Akshita Jha
- Vinodkumar Prabhakaran
- Remi Denton
- Sarah Laszlo
- Shachi Dave
- Rida Qadri
- Chandan K. Reddy
- Sunipa Dev