
DrawL: Understanding the Effects of Non-Mainstream Dialects in Prompted Image Generation (2405.05382v1)

Published 8 May 2024 in cs.CY

Abstract: Text-to-image models are now easy to use and ubiquitous. However, prior work has found that they are prone to recapitulating harmful Western stereotypes. For example, requesting that a model generate an "African person and their house" may produce a person standing next to a straw hut. In this example, the word "African" is an explicit descriptor of the person that the prompt is seeking to depict. Here, we examine whether implicit markers, such as dialect, can also affect the portrayal of people in text-to-image outputs. We pair prompts in Mainstream American English with counterfactuals that express grammatical constructions found in dialects correlated with historically marginalized groups. We find that through minimal, syntax-only changes to prompts, we can systematically shift the skin tone and gender of people in the generated images. We conclude with a discussion of whether dialectal distribution shifts like this are harmful or are expected, possibly even desirable, model behavior.

Authors (4)
  1. Joshua N. Williams
  2. Molly FitzMorris
  3. Osman Aka
  4. Sarah Laszlo