
Intersectionality in Conversational AI Safety: How Bayesian Multilevel Models Help Understand Diverse Perceptions of Safety (2306.11530v1)

Published 20 Jun 2023 in cs.HC

Abstract: Conversational AI systems exhibit a level of human-like behavior that promises to have profound impacts on many aspects of daily life -- how people access information, create content, and seek social support. Yet these models have also shown a propensity for biases, offensive language, and conveying false information. Consequently, understanding and moderating safety risks in these models is a critical technical and social challenge. Perception of safety is intrinsically subjective, where many factors -- often intersecting -- could determine why one person may consider a conversation with a chatbot safe and another person could consider the same conversation unsafe. In this work, we focus on demographic factors that could influence such diverse perceptions. To this end, we contribute an analysis using Bayesian multilevel modeling to explore the connection between rater demographics and how raters report safety of conversational AI systems. We study a sample of 252 human raters stratified by gender, age group, race/ethnicity group, and locale. This rater pool provided safety labels for 1,340 human-chatbot conversations. Our results show that intersectional effects involving demographic characteristics such as race/ethnicity, gender, and age, as well as content characteristics, such as degree of harm, all play significant roles in determining the safety of conversational AI systems. For example, race/ethnicity and gender show strong intersectional effects, particularly among South Asian and East Asian women. We also find that conversational degree of harm impacts raters of all race/ethnicity groups, but that Indigenous and South Asian raters are particularly sensitive to this harm. Finally, we observe the effect of education is uniquely intersectional for Indigenous raters, highlighting the utility of multilevel frameworks for uncovering underrepresented social perspectives.
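The multilevel structure described above (varying intercepts per demographic group, an intersectional gender-by-ethnicity term, and a conversation-level harm effect feeding an ordinal safety rating) can be illustrated with a small generative sketch. Everything below is an assumption for illustration: the group names, coefficient values, and cutpoints are made up and are not the paper's fitted estimates; only the rater-pool size of 252 comes from the abstract.

```python
import numpy as np

# Minimal generative sketch of a multilevel ordinal-rating model,
# in the spirit of the paper's Bayesian multilevel analysis.
# All coefficients and cutpoints are illustrative assumptions.
rng = np.random.default_rng(42)

genders = ["woman", "man"]
ethnicities = ["South Asian", "East Asian", "White", "Black", "Indigenous"]

n_raters = 252  # rater pool size reported in the abstract
gender = rng.integers(len(genders), size=n_raters)
ethnicity = rng.integers(len(ethnicities), size=n_raters)
harm = rng.uniform(0.0, 1.0, size=n_raters)  # conversation degree of harm

# Varying intercepts per group plus an intersectional
# gender x ethnicity term -- the multilevel structure at issue.
a_gender = rng.normal(0.0, 0.5, size=len(genders))
a_eth = rng.normal(0.0, 0.5, size=len(ethnicities))
a_inter = rng.normal(0.0, 0.3, size=(len(genders), len(ethnicities)))
beta_harm = 2.0  # assumed positive: more harm -> higher unsafety rating

latent = (a_gender[gender] + a_eth[ethnicity]
          + a_inter[gender, ethnicity] + beta_harm * harm)

# Ordinal response via fixed cutpoints (0 = safe ... 4 = very unsafe).
cutpoints = np.array([-1.0, 0.0, 1.0, 2.0])
rating = (latent[:, None] > cutpoints).sum(axis=1)
```

In the paper's actual analysis the group intercepts and coefficients are estimated with brms (cited in the references) rather than fixed; the sketch only shows the direction of the data-generating assumptions.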

Authors (8)
  1. Christopher M. Homan
  2. Lora Aroyo
  3. Alicia Parrish
  4. Vinodkumar Prabhakaran
  5. Alex S. Taylor
  6. Ding Wang
  7. Greg Serapio-Garcia
  8. Mark Diaz