Centering the Margins: Outlier-Based Identification of Harmed Populations in Toxicity Detection (2305.14735v3)

Published 24 May 2023 in cs.CL, cs.AI, and cs.LG

Abstract: The impact of AI models on marginalized communities has traditionally been measured by identifying performance differences between specified demographic subgroups. Though this approach aims to center vulnerable groups, it risks obscuring patterns of harm faced by intersectional subgroups or shared across multiple groups. To address this, we draw on theories of marginalization from disability studies and related disciplines, which state that people farther from the norm face greater adversity, to consider the "margins" in the domain of toxicity detection. We operationalize the "margins" of a dataset by employing outlier detection to identify text about people with demographic attributes distant from the "norm". We find that model performance is consistently worse for demographic outliers, with mean squared error (MSE) for outliers up to 70.4% worse than for non-outliers across toxicity types. Performance is also worse for text outliers, with an MSE up to 68.4% higher for outliers than for non-outliers. We also find text and demographic outliers to be particularly susceptible to errors in the classification of severe toxicity and identity attacks. Compared to analysis of disparities using traditional demographic breakdowns, we find that our outlier analysis frequently surfaces greater harms faced by a larger, more intersectional group, which suggests that outlier analysis is particularly beneficial for identifying harms against those groups.
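
The abstract describes the core procedure only at a high level: flag examples whose demographic-attribute vectors are far from the dataset "norm" using an outlier detector, then compare a toxicity model's error on the flagged group against the rest. The sketch below illustrates that workflow under explicit assumptions; the paper does not name a specific detector or feature construction here, so Local Outlier Factor from scikit-learn, the feature layout, and all function and variable names are illustrative choices rather than the authors' implementation.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import mean_squared_error

def compare_outlier_error(demographic_features, toxicity_true, toxicity_pred,
                          contamination=0.1):
    """Flag demographic outliers and compare model error inside vs. outside
    the outlier group. All names here are illustrative, not the paper's API."""
    # Fit an outlier detector on the demographic-attribute vectors
    # (assumed layout: one row per example, one column per identity attribute).
    lof = LocalOutlierFactor(n_neighbors=20, contamination=contamination)
    labels = lof.fit_predict(demographic_features)  # -1 = outlier, 1 = inlier
    is_outlier = labels == -1

    # Compare squared error of the toxicity model on the two groups.
    mse_outlier = mean_squared_error(toxicity_true[is_outlier],
                                     toxicity_pred[is_outlier])
    mse_inlier = mean_squared_error(toxicity_true[~is_outlier],
                                    toxicity_pred[~is_outlier])
    return {
        "mse_outlier": mse_outlier,
        "mse_inlier": mse_inlier,
        "relative_gap": (mse_outlier - mse_inlier) / mse_inlier,
    }

# Toy usage with random data standing in for a real toxicity dataset.
rng = np.random.default_rng(0)
X = rng.random((1000, 8))           # demographic attribute vectors
y_true = rng.random(1000)           # gold toxicity scores in [0, 1]
y_pred = np.clip(y_true + rng.normal(0, 0.1, 1000), 0, 1)  # model scores
print(compare_outlier_error(X, y_true, y_pred))
```

The same comparison can be repeated with text embeddings in place of demographic vectors to examine text outliers, which is how the abstract frames its second set of results.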

