Angry Men, Sad Women: Large Language Models Reflect Gendered Stereotypes in Emotion Attribution (2403.03121v3)

Published 5 Mar 2024 in cs.CL

Abstract: LLMs reflect societal norms and biases, especially about gender. While societal biases and stereotypes have been extensively researched in various NLP applications, there is a surprising gap for emotion analysis. However, emotion and gender are closely linked in societal discourse. E.g., women are often thought of as more empathetic, while men's anger is more socially accepted. To fill this gap, we present the first comprehensive study of gendered emotion attribution in five state-of-the-art LLMs (open- and closed-source). We investigate whether emotions are gendered, and whether these variations are based on societal stereotypes. We prompt the models to adopt a gendered persona and attribute emotions to an event like 'When I had a serious argument with a dear person'. We then analyze the emotions generated by the models in relation to the gender-event pairs. We find that all models consistently exhibit gendered emotions, influenced by gender stereotypes. These findings are in line with established research in psychology and gender studies. Our study sheds light on the complex societal interplay between language, gender, and emotion. The reproduction of emotion stereotypes in LLMs allows us to use those models to study the topic in detail, but raises questions about the predictive use of those same LLMs for emotion applications.

References (47)
  1. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc.
  2. Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334):183–186.
  3. Theory-grounded measurement of U.S. social stereotypes in English language models. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1276–1295, Seattle, United States. Association for Computational Linguistics.
  4. Marked personas: Using natural language prompts to measure stereotypes in language models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1504–1532, Toronto, Canada. Association for Computational Linguistics.
  5. Myisha Cherry and Owen Flanagan. 2017. The moral psychology of anger. Rowman & Littlefield.
  6. On measuring gender bias in translation of gender-neutral pronouns. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing, pages 173–181, Florence, Italy. Association for Computational Linguistics.
  7. Kate Crawford. 2017. The trouble with bias. In Conference on Neural Information Processing Systems (NIPS) – Keynote, Long Beach, US.
  8. Charles Darwin. 1871. The Descent of Man: and Selection in Relation to Sex. John Murray, Albemarle Street.
  9. Toxicity in chatgpt: Analyzing persona-assigned language models.
  10. Multi-dimensional gender bias classification. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 314–331, Online. Association for Computational Linguistics.
  11. Paul Ekman. 1992. An argument for basic emotions. Cognition & emotion, 6(3-4):169–200.
  12. Naomi Ellemers. 2018. Gender stereotypes. Annual review of psychology, 69:275–298.
  13. European Commission. 2023. Regulation of the European Parliament and of the Council on laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). https://www.europarl.europa.eu/doceo/document/TA-9-2023-0236_EN.html. See Amendment 52.
  14. Miranda Fricker. 2007. Epistemic injustice: Power and the ethics of knowing. Oxford University Press.
  15. Intrinsic bias metrics do not correlate with application bias. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1926–1940, Online. Association for Computational Linguistics.
  16. Anna Gotlib. 2017. The moral psychology of sadness. Rowman & Littlefield.
  17. Bias runs deep: Implicit reasoning biases in persona-assigned llms. arXiv preprint arXiv:2311.04892.
  18. The group as a basis for emergent stereotype consensus. European review of social psychology, 8(1):203–239.
  19. “you sound just like your father” commercial machine translation systems include stylistic biases. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1686–1690, Online. Association for Computational Linguistics.
  20. Dana Crowley Jack. 2011. Reflections on the silencing the self scale and its origins. Psychology of Women Quarterly, 35(3):523–529.
  21. Mistral 7b. arXiv preprint arXiv:2310.06825.
  22. IEST: WASSA-2018 implicit emotions shared task. In Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 31–42, Brussels, Belgium. Association for Computational Linguistics.
  23. Ronald F Levant and Shana Pryor. 2020. The tough standard: The hard truths about masculinity and violence. Oxford University Press, USA.
  24. Modulating language models with emotions. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 4332–4339, Online. Association for Computational Linguistics.
  25. SemEval-2018 task 1: Affect in tweets. In Proceedings of the 12th International Workshop on Semantic Evaluation, pages 1–17, New Orleans, Louisiana. Association for Computational Linguistics.
  26. StereoSet: Measuring stereotypical bias in pretrained language models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 5356–5371, Online. Association for Computational Linguistics.
  27. CrowS-pairs: A challenge dataset for measuring social biases in masked language models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1953–1967, Online. Association for Computational Linguistics.
  28. HONEST: Measuring hurtful sentence completion in language models. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2398–2406, Online. Association for Computational Linguistics.
  29. OpenAI. 2023. GPT-4 Technical Report.
  30. The gender stereotyping of emotions. Psychology of women quarterly, 24(1):81–92.
  31. Improved emotion recognition in spanish social media through incorporation of lexical knowledge. Future Generation Computer Systems, 110:1000–1008.
  32. Gender bias in coreference resolution. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 8–14, New Orleans, Louisiana. Association for Computational Linguistics.
  33. Gender bias in machine translation. Transactions of the Association for Computational Linguistics, 9:845–874.
  34. Klaus R Scherer and Harald G Wallbott. 1994. Evidence for universality and cultural variation of differential emotion response patterning. Journal of personality and social psychology, 66(2):310.
  35. Revealing persona biases in dialogue systems.
  36. The woman worked as a babysitter: On biases in language generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3407–3412, Hong Kong, China. Association for Computational Linguistics.
  37. Stephanie A Shields. 2013. Gender and emotion: What we think we know, what we need to know, and why it matters. Psychology of Women Quarterly, 37(4):423–435.
  38. Generating responses with a specific emotion in dialog. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3685–3695, Florence, Italy. Association for Computational Linguistics.
  39. Evaluating gender bias in machine translation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1679–1684, Florence, Italy. Association for Computational Linguistics.
  40. Dana Jalbert Stauffer. 2008. Aristotle’s account of the subjection of women. The Journal of Politics, 70(4):929–941.
  41. Mitigating gender bias in natural language processing: Literature review. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1630–1640, Florence, Italy. Association for Computational Linguistics.
  42. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
  43. “kelly is a warm person, joseph is a role model”: Gender biases in LLM-generated reference letters. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 3730–3748, Singapore. Association for Computational Linguistics.
  44. Are personalized stochastic parrots more dangerous? evaluating persona biases in dialogue systems. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 9677–9705, Singapore. Association for Computational Linguistics.
  45. Decodingtrust: A comprehensive assessment of trustworthiness in GPT models. arXiv preprint arXiv:2306.11698.
  46. Emotion-aware chat machine: Automatic emotional response generation for human-like emotional interaction. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM ’19, page 1401–1410, New York, NY, USA. Association for Computing Machinery.
  47. Huggingface’s transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771.

Summary

  • The paper demonstrates that LLMs replicate gender stereotypes by attributing anger to men and sadness to women using over 200K gender-event pairs.
  • The authors employ persona-based prompts with models like GPT-4 and LLaMA to reveal statistically significant biases in emotion attribution.
  • The study highlights ethical concerns in emotion analysis, urging interdisciplinary solutions to mitigate gender bias in NLP applications.

Overview of Gendered Stereotypes in Emotion Attribution in LLMs

The paper "Angry Men, Sad Women: Large Language Models Reflect Gendered Stereotypes in Emotion Attribution" investigates how LLMs replicate gender-based stereotypes when attributing emotions. The work addresses a notable gap in the NLP literature at the intersection of gender bias and emotion analysis. While societal norms and biases, particularly regarding gender, have been extensively documented across NLP applications, this paper focuses specifically on emotion attribution, providing the first comprehensive study of its kind on state-of-the-art LLMs.

The research scrutinizes five prominent LLMs, examining how these models attribute emotions when prompted with scenarios tied to gendered personas. The authors present evidence that the models attribute emotions along stereotypical lines: women are more frequently associated with sadness and men with anger, in line with established findings in psychology and gender studies. Using persona-based prompting over more than 200K gender-event pairs, the paper quantifies these stereotypical associations, as illustrated in the sketch below.
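
The paper's exact prompt wording is not reproduced in this summary, but a minimal sketch of how persona-based prompts can be assembled from gender-event pairs might look as follows; the persona list, template text, and helper name are illustrative assumptions, not the authors' verbatim setup.

```python
# Hypothetical sketch of persona-based prompt construction.
# The persona set, template wording, and example event are illustrative;
# the paper's exact prompts may differ.
PERSONAS = ["woman", "man"]  # assumed gendered personas

def build_prompt(persona: str, event: str) -> str:
    """Combine a gendered persona with an ISEAR-style event description."""
    return (
        f"Take the role of a {persona}. "
        "What is the main emotion you would feel in the following situation? "
        "Answer with a single emotion word.\n"
        f"Situation: {event}"
    )

print(build_prompt("woman", "When I had a serious argument with a dear person."))
```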

Key Findings and Methodology

The authors use events from the International Survey on Emotion Antecedents and Reactions (ISEAR) dataset alongside modern LLMs, among them LLaMA and GPT-4, to assess whether LLMs reflect societal gender stereotypes. Their methodology relies on persona-based prompts that ask a model what emotion a gendered persona would experience in a given scenario. Evaluating the responses across these contexts reveals statistically significant differences in how emotions are attributed across genders.
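
As a rough illustration of such an evaluation loop, the sketch below queries an open instruction-tuned model over gender-event pairs and tallies the attributed emotions per persona; the model name, prompt template, and decoding settings are assumptions for illustration rather than the paper's configuration.

```python
# Illustrative evaluation loop, assuming an open model served via Hugging Face
# transformers; model choice, prompt wording, and decoding settings are
# placeholders, not the paper's exact setup.
from collections import Counter
from transformers import pipeline

generator = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")

personas = ["woman", "man"]
events = ["When I had a serious argument with a dear person."]  # ISEAR-style events

counts = {p: Counter() for p in personas}
for persona in personas:
    for event in events:
        prompt = (
            f"Take the role of a {persona}. Name the main emotion you would feel "
            f"in the following situation, in one word.\nSituation: {event}\nEmotion:"
        )
        reply = generator(prompt, max_new_tokens=5, return_full_text=False)[0]["generated_text"]
        emotion = reply.strip().split()[0].lower().strip(".,!")  # first word as the label
        counts[persona][emotion] += 1

print(counts)
```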

Quantitative results show that stereotypes such as "angry men" and "sad women" persist in model outputs. The models tend to amplify these disparities rather than mirror the self-reported emotional experiences in the underlying data, which signals a potential pitfall in using LLMs for emotion-driven applications. One standard way to check whether such differences are statistically meaningful is sketched below.
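
A chi-square test of independence over an emotion-by-gender contingency table is one common way to assess whether attributed emotions depend on the persona's gender; the counts in the sketch below are placeholders, not the paper's reported numbers.

```python
# Chi-square test of independence between persona gender and attributed emotion.
# The counts are placeholders for illustration, not the paper's results.
import numpy as np
from scipy.stats import chi2_contingency

emotions = ["anger", "sadness", "fear", "joy"]
# rows: female persona, male persona; columns follow `emotions`
table = np.array([
    [120, 480, 210, 190],
    [430, 160, 180, 230],
])

chi2, p_value, dof, _expected = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p_value:.2e}")
```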

Implications and Future Perspectives

The findings carry significant implications for NLP applications involving emotion recognition or sentiment analysis, particularly in areas demanding nuanced understanding, such as mental health diagnostics and human-computer interaction. The propagation of such stereotypes raises ethical and practical concerns, as these models may unwittingly perpetuate harmful biases and produce skewed assessments of emotional states.

The authors advocate reconsidering how such bias-laden models are used in emotion-related applications. They argue that the societal norms embedded in these models could cause disproportionate harm, for example by skewing judgments and opportunities that hinge on human emotions. Emphasizing interdisciplinary approaches, they call for integrating insights from psychology and the social sciences to build more inclusive and equitable NLP systems.

Conclusion

This paper advances the discourse on gender stereotyping in AI, urging greater scrutiny in the development and deployment of LLMs, particularly for emotion attribution tasks. It prompts critical discussion of how to use AI responsibly in emotionally sensitive domains and of the importance of advancing fairness and inclusiveness within these evolving technologies. By shedding light on these recurring issues, the paper encourages broader investigation into building ethical and unbiased AI systems that foster both technical excellence and societal well-being.
