"They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations (2405.05378v1)
Abstract: LLMs have emerged as an integral part of modern societies, powering user-facing applications such as personal assistants and enterprise applications like recruitment tools. Despite their utility, research indicates that LLMs perpetuate systemic biases. Yet, prior works on LLM harms predominantly focus on Western concepts like race and gender, often overlooking cultural concepts from other parts of the world. Additionally, these studies typically investigate "harm" as a singular dimension, ignoring the various and subtle forms in which harms manifest. To address this gap, we introduce the Covert Harms and Social Threats (CHAST), a set of seven metrics grounded in social science literature. We utilize evaluation models aligned with human assessments to examine the presence of covert harms in LLM-generated conversations, particularly in the context of recruitment. Our experiments reveal that seven out of the eight LLMs included in this study generated conversations riddled with CHAST, characterized by malign views expressed in seemingly neutral language unlikely to be detected by existing methods. Notably, these LLMs manifested more extreme views and opinions when dealing with non-Western concepts like caste, compared to Western ones such as race.
- Dominic Abrams and Michael A Hogg. 2010. Social identity and self-categorization. The SAGE handbook of prejudice, stereotyping and discrimination, 1:179–193.
- Key female characters in film have more to talk about besides men: Automating the bechdel test. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 830–840.
- Casteism in India, but not racism - a study of bias in word embeddings of Indian languages. In Proceedings of the First Workshop on Language Technology and Resources for a Fair, Inclusive, and Safe Society within the 13th Language Resources and Evaluation Conference, pages 1–7, Marseille, France. European Language Resources Association.
- Just say no: Analyzing the stance of neural dialogue generation in offensive contexts. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 4846–4862, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cristian Barsu. 2017. History of medicine between tradition and modernity. Clujul Medical, 90.
- Mrinal Prakash Barua and Anita Verma. 2021. Workplace bullying in healthcare facilities: Role of caste and reservation. Indian journal of medical ethics, 6(1):1–7.
- On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM conference on fairness, accountability, and transparency, pages 610–623.
- Gerald D Berreman. 1972. Race, caste, and other invidious distinctions in social stratification. Race, 13(4):385–414.
- Stereotyping norwegian salmon: An inventory of pitfalls in fairness benchmark datasets. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1004–1015.
- The context and content of social identity threat. Sepsis, pages 35–55.
- The context and content of social identity threat. Social identity: Context, commitment, content, pages 35–58.
- Britannica. 2023. Nursing. https://www.britannica.com/science/nursing. Accessed: 2024-01-11.
- Chris Brown and Ruth Luzmore. 2021. A Brief History of Education – From Ancient Greece to the Enlightenment, pages 39–55. Emerald Publishing Limited.
- Language models are few-shot learners. Preprint, arXiv:2005.14165.
- Toxicity detection is not all you need: Measuring the gaps to supporting volunteer content moderators. arXiv preprint arXiv:2311.07879.
- Assessing cross-cultural alignment between chatgpt and human societies: An empirical study. arXiv preprint arXiv:2303.17466.
- Hatebert: Retraining bert for abusive language detection in english. arXiv preprint arXiv:2010.12472.
- Convokit: A toolkit for the analysis of conversations. arXiv preprint arXiv:2005.04246.
- How is chatgpt’s behavior changing over time? arXiv preprint arXiv:2307.09009.
- Cheng-Han Chiang and Hung-yi Lee. 2023. Can large language models be an alternative to human evaluations? arXiv preprint arXiv:2305.01937.
- De Choudhury et al. 2023. Ask me in english instead: Cross-lingual evaluation of large language models for healthcare queries. arXiv preprint arXiv:2310.13132.
- Large legal fictions: Profiling legal hallucinations in large language models. Preprint, arXiv:2401.01301.
- Paresh Dave. 2020. California accuses cisco of job discrimination based on indian employee’s caste. Reuters.
- Qlora: Efficient finetuning of quantized llms. Advances in Neural Information Processing Systems, 36.
- On measures of biases and harms in NLP. In Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022, pages 246–267, Online only. Association for Computational Linguistics.
- Down the toxicity rabbit hole: A novel framework to bias audit large language models. arXiv e-prints, pages arXiv–2309.
- Self and social identity*. Annual review of psychology, 53:161–86.
- A model of (often mixed) stereotype content: Competence and warmth respectively follow from perceived status and competition. Journal of Personality and Social Psychology, 82:878–902.
- Understanding and countering stereotypes: A computational approach to the stereotype content model. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 600–616, Online. Association for Computational Linguistics.
- Megan Fritts and Frank Cabrera. 2021. Ai recruitment algorithms and the dehumanization problem. Ethics and Information Technology, 23:791–801.
- Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned. arXiv preprint arXiv:2209.07858.
- RealToxicityPrompts: Evaluating neural toxic degeneration in language models. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 3356–3369, Online. Association for Computational Linguistics.
- Sobin George. 2015. Caste and care: is Indian healthcare delivery system favourable for Dalits?, volume 350. Institute for Social and Economic Change.
- Sobin George. 2019. Reconciliations of caste and medical power in rural public health services. Economic and Political Weekly, 54(40):43–50.
- Sourojit Ghosh and Aylin Caliskan. 2023. Chatgpt perpetuates gender bias in machine translation and ignores non-gendered pronouns: Findings across bengali and five other low-resource languages. arXiv preprint arXiv:2305.10510.
- Chatgpt outperforms crowd workers for text-annotation tasks. Proceedings of the National Academy of Sciences, 120(30):e2305016120.
- Knowledge distillation: A survey. International Journal of Computer Vision, 129:1789–1819.
- MiniLLM: Knowledge distillation of large language models. In The Twelfth International Conference on Learning Representations.
- Auto-debias: Debiasing masked language models with automated biased prompts. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1012–1023.
- Laura Hanu and Unitary team. 2020. Detoxify. Github. https://github.com/unitaryai/detoxify.
- Caitlin Harrington. 2023. Chatgpt is reshaping crowd work. https://www.wired.com/story/chatgpt-is-reshaping-crowd-work/.
- Dialect prejudice predicts ai decisions about people’s character, employability, and criminality. arXiv preprint arXiv:2403.00742.
- Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685.
- Anna Lena Hunkenschroer and Alexander Kriebitz. 2023. Is ai recruiting (un) ethical? a human rights perspective on the use of ai for hiring. AI and Ethics, 3(1):199–213.
- Anna Lena Hunkenschroer and Christoph Luetge. 2022. Ethics of ai-enabled recruiting and selection: A review and research agenda. Journal of Business Ethics, 178(4):977–1007.
- Quantification of gender representation bias in commercial films based on image analysis. Proceedings of the ACM on Human-Computer Interaction, 3(CSCW):1–29.
- The perils of using Mechanical Turk to evaluate open-ended text generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 1265–1285, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Casteist but not racist? quantifying disparities in large language model bias between india and the west. arXiv preprint arXiv:2309.08573.
- Handling and presenting harmful text in nlp research. arXiv preprint arXiv:2204.14256.
- Klaus Krippendorff. 2018. Content analysis: An introduction to its methodology. Sage publications.
- Tina Kubrak. 2020. Impact of films: Changes in young people’s attitudes after watching a movie. Behavioral sciences, 10(5):86.
- Kiran Kumbhar. 2021. The medical profession must urgently act on caste-based discrimination and harassment in their midst. Indian journal of medical ethics, 6(1):1–5.
- J. Richard Landis and Gary G. Koch. 1977. The measurement of observer agreement for categorical data. Biometrics, 33(1):159–174.
- A new generation of perspective api: Efficient multilingual character-level transformers. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 3197–3207.
- Trustworthy llms: a survey and guideline for evaluating large language models’ alignment. arXiv preprint arXiv:2308.05374.
- When my group is under attack: The development of a social identity threat scale. Group Processes & Intergroup Relations, 0(0):13684302231187857.
- Hatexplain: A benchmark dataset for explainable hate speech detection. In AAAI Conference on Artificial Intelligence.
- An empirical survey of the effectiveness of debiasing techniques for pre-trained language models. arXiv preprint arXiv:2110.08527.
- A survey on bias and fairness in machine learning. ACM computing surveys (CSUR), 54(6):1–35.
- Michael Martinez. 2019. 50 years of software. https://www.computer.org/publications/tech-news/trends/50-years-of-software. Accessed: 2024-01-11.
- Rethinking the role of demonstrations: What makes in-context learning work? Preprint, arXiv:2202.12837.
- Co-writing screenplays and theatre scripts with language models: Evaluation by industry professionals. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, pages 1–34.
- Shyamal Mishra and Preetha Chatterjee. 2023. Exploring chatgpt for toxicity detection in github. arXiv preprint arXiv:2312.13105.
- Dena F Mujtaba and Nihar R Mahapatra. 2019. Ethical considerations in ai-based recruitment. In 2019 IEEE International Symposium on Technology and Society (ISTAS), pages 1–7. IEEE.
- A human-centered evaluation of a toxicity detection api: Testing transferability and unpacking latent attributes. Trans. Soc. Comput., 6(1–2).
- Having beer after prayer? measuring cultural bias in large language models. arXiv preprint arXiv:2305.14456.
- National Fund for Workforce Solutions. 2022. Racial bias in hiring practices widens the black-white wealth disparity. https://nationalfund.org/racial-bias-in-hiring-practices-widens-the-black-white-wealth-disparity/.
- OpenAI. 2024a. Best practices for prompt engineering with the openai api. https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-the-openai-api. Accessed:2024-01-11.
- OpenAI. 2024b. Changelog - openai apia. https://platform.openai.com/docs/changelog.
- OpenAI. 2024c. Prompt engineering. https://platform.openai.com/docs/guides/prompt-engineering. Accessed:2024-01-11.
- OpenAI. 2024. Text generation models. https://platform.openai.com/docs/guides/text-generation.
- Caste identities and structures of threats: Stigma, prejudice, and social representation in indian universities. CASTE: A Global Journal on Social Exclusion, 4(1):pp. 3–23.
- Mitigating bias in algorithmic hiring: Evaluating claims and practices. In Proceedings of the 2020 conference on fairness, accountability, and transparency, pages 469–481.
- Inioluwa Deborah Raji and Joy Buolamwini. 2022. Actionable auditing revisited: Investigating the impact of publicly naming biased performance results of commercial ai products. Commun. ACM, 66(1):101–108.
- Fairness in language models beyond english: Gaps and challenges. arXiv preprint arXiv:2302.12578.
- Jessica L Roberts. 2015. Rethinking employment discrimination harms. Ind. LJ, 91:393.
- Measuring the reliability of hate speech annotations: The case of the european refugee crisis. arXiv preprint arXiv:1701.08118.
- Re-imagining algorithmic fairness in india and beyond. Preprint, arXiv:2101.09995.
- Richard T Schaefer. 2008. Encyclopedia of race, ethnicity, and society, volume 1. Sage.
- A comparison of symbolic racism theory and social dominance theory as explanations for racial policy attitude. Journal of Social Psychology - J SOC PSYCHOL, 132:377–395.
- Eric Michael Smith and Adina Williams. 2021. Hi, my name is martha: Using names to measure and mitigate bias in generative dialogue models. arXiv preprint arXiv:2109.03300.
- Walter Stephan and W.S. Cookie. 2000. An integrated threat theory of prejudice.” in stuart oskamp (ed.). Reducing Prejudice and Discrimination, pages 23–46.
- Mitigating gender bias in natural language processing: Literature review. arXiv preprint arXiv:1906.08976.
- Henri Tajfel and John C Turner. 2004. The social identity theory of intergroup behavior. In Political psychology, pages 276–293. Psychology Press.
- Petter Törnberg. 2023. Chatgpt-4 outperforms experts and crowd workers in annotating political twitter messages with zero-shot learning. arXiv preprint arXiv:2304.06588.
- Performance and risk trade-offs for multi-word text prediction at scale. In Findings of the Association for Computational Linguistics: EACL 2023, pages 2226–2242.
- Investigating hiring bias in large language models. In R0-FoMo: Robustness of Few-shot and Zero-shot Learning in Large Foundation Models.
- Sakshi Venkatraman. 2022. Big tech’s big problem is also its ‘best-kept secret’: Caste discrimination. NBC News.
- Artificial artificial artificial intelligence: Crowd workers widely use large language models for text production tasks. Preprint, arXiv:2306.07899.
- On the robustness of chatgpt: An adversarial and out-of-distribution perspective. arXiv preprint arXiv:2302.12095.
- Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837.
- Challenges in detoxifying language models. Preprint, arXiv:2109.07445.
- Ex machina: Personal attacks seen at scale. Preprint, arXiv:1610.08914.
- The earth is flat because…: Investigating llms’ belief towards misinformation via persuasive conversation. arXiv preprint arXiv:2312.09085.
- Josephine Yam and Joshua August Skorburg. 2021. From human resources to human rights: Impact assessments for hiring algorithms. Ethics and Information Technology, 23(4):611–623.
- Efficient toxic content detection by bootstrapping and distilling large language models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 21779–21787.
- Judging llm-as-a-judge with mt-bench and chatbot arena. Preprint, arXiv:2306.05685.
- Llamafactory: Unified efficient fine-tuning of 100+ language models. arXiv preprint arXiv:2403.13372.
- Lima: Less is more for alignment. arXiv preprint arXiv:2305.11206.
- On robustness of prompt-based semantic parsing with large pre-trained language model: An empirical study on codex. arXiv preprint arXiv:2301.12868.
- Preetam Prabhu Srikar Dammu (6 papers)
- Hayoung Jung (4 papers)
- Anjali Singh (19 papers)
- Monojit Choudhury (66 papers)
- Tanushree Mitra (35 papers)