
Algorithmic Arbitrariness in Content Moderation (2402.16979v1)

Published 26 Feb 2024 in cs.CY, cs.LG, and cs.SI

Abstract: Machine learning (ML) is widely used to moderate online content. Despite its scalability relative to human moderation, the use of ML introduces unique challenges to content moderation. One such challenge is predictive multiplicity: multiple competing models for content classification may perform equally well on average, yet assign conflicting predictions to the same content. This multiplicity can result from seemingly innocuous choices during model development, such as random seed selection for parameter initialization. We experimentally demonstrate how content moderation tools can arbitrarily classify samples as toxic, leading to arbitrary restrictions on speech. We discuss these findings in terms of human rights set out by the International Covenant on Civil and Political Rights (ICCPR), namely freedom of expression, non-discrimination, and procedural justice. We analyze (i) the extent of predictive multiplicity among state-of-the-art LLMs used for detecting toxic content; (ii) the disparate impact of this arbitrariness across social groups; and (iii) how model multiplicity compares to unambiguous human classifications. Our findings indicate that up-scaled algorithmic moderation risks legitimizing an algorithmic leviathan, in which an algorithm disproportionately manages human rights. To mitigate such risks, our study underscores the need to identify, and to increase the transparency of, arbitrariness in content moderation applications. Since algorithmic content moderation is being fueled by pressing social concerns, such as disinformation and hate speech, our discussion of harms raises concerns relevant to policy debates. Our findings also contribute to debates on content moderation and intermediary liability laws being discussed and passed in many countries, such as the Digital Services Act in the European Union, the Online Safety Act in the United Kingdom, and the Fake News Bill in Brazil.
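The abstract's central mechanism, predictive multiplicity, can be illustrated concretely: train several classifiers that are identical except for the random seed used to initialize parameters, confirm that their average accuracy is nearly indistinguishable, and then count the individual samples on which they disagree. The sketch below does this with a small scikit-learn network on synthetic data; it is an illustrative stand-in, not the paper's code (the paper fine-tunes LLM-based toxicity classifiers), and the ambiguity measure follows the spirit of Marx et al. (reference 49 below). The model, dataset, and hyperparameters here are assumptions chosen only to make the sketch self-contained.

```python
# Minimal sketch (not the paper's code) of measuring predictive multiplicity.
# Several models share architecture, data, and hyperparameters; only the
# random seed for parameter initialization differs across them.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic binary classification data stands in for a toxicity corpus.
X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

preds, accs = [], []
for seed in range(10):
    # Identical model except for the seed controlling weight initialization.
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=200,
                        random_state=seed).fit(X_tr, y_tr)
    preds.append(clf.predict(X_te))
    accs.append(clf.score(X_te, y_te))

preds = np.stack(preds)  # shape: (n_models, n_test_samples)

# "Ambiguity" in the sense of Marx et al.: the share of samples that receive
# conflicting predictions from at least one pair of competing models.
ambiguity = np.mean(preds.min(axis=0) != preds.max(axis=0))
print(f"accuracy range across seeds: {min(accs):.3f}-{max(accs):.3f}")
print(f"ambiguity (share of conflictingly classified samples): {ambiguity:.3f}")
```

Even when the printed accuracy range is tight, a non-trivial share of samples will typically flip between classes depending only on the seed; this is exactly the kind of arbitrariness the paper quantifies for toxicity classifiers.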

References (72)
  1. Optuna: A Next-generation Hyperparameter Optimization Framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (Anchorage, AK, USA) (KDD ’19). Association for Computing Machinery, New York, NY, USA, 2623–2631. https://doi.org/10.1145/3292500.3330701
  2. United Nations General Assembly. 1966. International Covenant on Civil and Political Rights. https://www.ohchr.org/en/instruments-mechanisms/instruments/international-covenant-civil-and-political-rights
  3. United Nations General Assembly. 2022. Tackling Online Hate Speech through Content Moderation: The Legal Framework Under the International Covenant on Civil and Political Rights. https://papers.ssrn.com/abstract=4150909
  4. Legal Taxonomies of Machine Bias: Revisiting Direct Discrimination. In 2023 ACM Conference on Fairness, Accountability, and Transparency. ACM, Chicago IL USA, 1850–1858. https://doi.org/10.1145/3593013.3594121
  5. Less Discriminatory Algorithms. Washington University in St. Louis Legal Studies 2 (forthcoming). https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4590481
  6. Selective Ensembles for Consistent Predictions. In International Conference on Learning Representations. https://openreview.net/forum?id=HfUyCRBeQc
  7. Model multiplicity: Opportunities, concerns, and solutions. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 850–863.
  8. Leo Breiman. 2001. Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author). Statist. Sci. 16, 3 (2001), 199–231. https://doi.org/10.1214/ss/1009213726
  9. Miriam C. Buiten. 2022. The Digital Services Act: From Intermediary Liability to Platform Regulation. JIPITEC 12, 5 (2022). https://www.jipitec.eu/issues/jipitec-12-5-2021/5491
  10. Deutscher Bundestag. 2017. NetzDG - Gesetz zur Verbesserung der Rechtsdurchsetzung in sozialen Netzwerken. https://www.gesetze-im-internet.de/netzdg/BJNR335210017.html
  11. Jigsaw Unintended Bias in Toxicity Classification. https://kaggle.com/competitions/jigsaw-unintended-bias-in-toxicity-classification
  12. Toxic Comment Classification Challenge. https://kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge
  13. United Nations Human Rights Committee. 2011. General comment No.34 on Article 19: Freedoms of opinion and expression. https://www.ohchr.org/en/documents/general-comments-and-recommendations/general-comment-no34-article-19-freedoms-opinion-and
  14. Nicholas Kluge Corrêa. 2023. Aira. https://doi.org/10.5281/zenodo.6989727
  15. Characterizing Fairness Over the Set of Good Models Under Selective Labels. In Proceedings of the 38th International Conference on Machine Learning. PMLR.
  16. Kathleen Creel and Deborah Hellman. 2021. The Algorithmic Leviathan: Arbitrariness, Fairness, and Opportunity in Algorithmic Decision Making Systems. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (Virtual Event, Canada) (FAccT ’21). Association for Computing Machinery, New York, NY, USA, 816. https://doi.org/10.1145/3442188.3445942
  17. Kathleen Creel and Deborah Hellman. 2022. The Algorithmic Leviathan: Arbitrariness, Fairness, and Opportunity in Algorithmic Decision-Making Systems. Canadian Journal of Philosophy 52, 1 (2022), 26–43. https://doi.org/10.1017/can.2022.3
  18. SkoltechNLP at SemEval-2021 Task 5: Leveraging Sentence-level Pre-training for Toxic Span Detection. In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021). 927–934.
  19. Underspecification presents challenges for credibility in modern machine learning. The Journal of Machine Learning Research 23, 1 (2022), 10237–10297.
  20. Fighting Hate Speech, Silencing Drag Queens? Artificial Intelligence in Content Moderation and Risks to LGBTQ Voices Online. 25, 2 (2021), 700–732. https://doi.org/10.1007/s12119-020-09790-w
  21. Evelyn Douek. 2020. Governing Online Speech: From ’Posts-As-Trumps’ to Proportionality and Probability. https://doi.org/10.2139/ssrn.3679607
  22. Evelyn Douek. 2021. The Limits of International Law in Content Moderation. UC Irvine Journal of International, Transnational, and Comparative Law 6 (2021). https://scholarship.law.uci.edu/cgi/viewcontent.cgi?article=1042&context=ucijil
  23. Natasha Duarte and Emma Llansó. 2017. Mixed Messages? The Limits of Automated Social Media Content Analysis. https://cdt.org/insights/mixed-messages-the-limits-of-automated-social-media-content-analysis/
  24. Ronald Dworkin. 2013. Taking Rights Seriously (paperback ed.). Bloomsbury, London.
  25. Tribunal Superior Eleitoral. 2022. RESOLUÇÃO Nº 23.714, DE 20 DE OUTUBRO DE 2022. https://www.tse.jus.br/legislacao/compilada/res/2022/resolucao-no-23-714-de-20-de-outubro-de-2022
  26. Mohsen Fayyaz. 2021. Toxicity Classifier. https://huggingface.co/mohsenfayyaz/toxicity-classifier. Accessed: 2024-01-21.
  27. All Models are Wrong, but Many are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously. Journal of Machine Learning Research 20, 177 (2019), 1–81.
  28. Tarleton Gillespie. 2020. Content moderation, AI, and the question of scale. Big Data & Society 7, 2 (July 2020), 205395172094323. https://doi.org/10.1177/2053951720943234
  29. Tarleton Gillespie. 2022. Do Not Recommend? Reduction as a Form of Content Moderation. Social Media + Society 8, 3 (July 2022), 205630512211175. https://doi.org/10.1177/20563051221117552
  30. Common sense or censorship: How algorithmic moderators and message type influence perceptions of online content deletion. New Media & Society 25, 10 (Oct. 2023), 2595–2617. https://doi.org/10.1177/14614448211032310
  31. Algorithmic content moderation: Technical and political challenges in the automation of platform governance. Big Data & Society 7, 1 (Jan. 2020), 2053951719897945. https://doi.org/10.1177/2053951719897945 Publisher: SAGE Publications Ltd.
  32. Mads Haahr. 1998-2018. RANDOM.ORG: True Random Number Service. https://www.random.org. Accessed: 2018-06-01.
  33. Laura Hanu and Unitary team. 2020. Detoxify. Github. https://github.com/unitaryai/detoxify.
  34. ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Smaranda Muresan, Preslav Nakov, and Aline Villavicencio (Eds.). Association for Computational Linguistics, Dublin, Ireland, 3309–3326. https://doi.org/10.18653/v1/2022.acl-long.234
  35. Disentangling Model Multiplicity in Deep Learning. arXiv:2206.08890 [cs.LG]
  36. An Empirical Study of Metrics to Measure Representational Harms in Pre-Trained Language Models. In Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023), Anaelia Ovalle, Kai-Wei Chang, Ninareh Mehrabi, Yada Pruksachatkun, Aram Galystan, Jwala Dhamala, Apurv Verma, Trista Cao, Anoop Kumar, and Rahul Gupta (Eds.). Association for Computational Linguistics, Toronto, Canada, 121–134. https://doi.org/10.18653/v1/2023.trustnlp-1.11
  37. The White House. 2020. Executive Order on Preventing Online Censorship – The White House. https://trumpwhitehouse.archives.gov/presidential-actions/executive-order-preventing-online-censorship/
  38. Hsiang Hsu and Flavio Calmon. 2022. Rashomon Capacity: A Metric for Predictive Multiplicity in Classification. In Proceedings of the 36th Conference on Neural Information Processing Systems. Curran Associates Inc.
  39. Daphne Keller. 2019. Build Your Own Intermediary Liability Law: A Kit for Policy Wonks of All Ages. https://cyberlaw.stanford.edu/publications/build-your-own-intermediary-liability-law-kit-policy-wonks-all-ages
  40. Daphne Keller. 2021. Empirical Evidence of Over-Removal by Internet Companies Under Intermediary Liability Laws: An Updated List. https://cyberlaw.stanford.edu/blog/2021/02/empirical-evidence-over-removal-internet-companies-under-intermediary-liability-laws
  41. Constructing interval variables via faceted rasch measurement and multitask deep learning: a hate speech application. arXiv preprint arXiv:2009.10277 (2020).
  42. Jigsaw Multilingual Toxic Comment Classification. https://kaggle.com/competitions/jigsaw-multilingual-toxic-comment-classification
  43. Olivier Knox. 2023. Analysis | Biden calls for changing Big Tech moderation rules. But not how. Washington Post (Jan. 2023). https://www.washingtonpost.com/politics/2023/01/12/biden-calls-changing-big-tech-moderation-rules-not-how/
  44. Pascal D König. 2020. Dissecting the algorithmic leviathan: On the socio-political anatomy of algorithmic governance. Philosophy & Technology 33, 3 (2020), 467–485.
  45. Arbitrary Decisions are a Hidden Cost of Differentially Private Training. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 1609–1623. https://doi.org/10.1145/3593013.3594103
  46. RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019). arXiv:1907.11692 http://arxiv.org/abs/1907.11692
  47. Individual Arbitrariness and Group Fairness. In Thirty-seventh Conference on Neural Information Processing Systems.
  48. Caio C. V. Machado and Thaís Helena Aguiar. 2023. Emerging Regulations on Content Moderation and Misinformation Policies of Online Media Platforms: Accommodating the Duty of Care into Intermediary Liability Models. 8, 2 (2023), 244–251. https://doi.org/10.1017/bhj.2023.25
  49. Predictive Multiplicity in Classification. In Proceedings of the 37th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 119), Hal Daumé III and Aarti Singh (Eds.). PMLR, 6765–6774. https://proceedings.mlr.press/v119/marx20a.html
  50. HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 14867–14875.
  51. The ethics of algorithms: Mapping the debate. Big Data & Society 3, 2 (2016), 2053951716679679. https://doi.org/10.1177/2053951716679679
  52. David Morar and Bruna Martins dos Santos. 2020. The push for content moderation legislation around the world. https://www.brookings.edu/articles/the-push-for-content-moderation-legislation-around-the-world/
  53. King’s Printer of Acts of Parliament. 2023. Online Safety Act 2023. https://www.legislation.gov.uk/ukpga/2023/50/contents/enacted
  54. On the Inevitability of the Rashomon Effect. In 2023 IEEE International Symposium on Information Theory (ISIT). 549–554. https://doi.org/10.1109/ISIT54713.2023.10206657
  55. Martin Pan. [n. d.]. Toxic Content Model. https://huggingface.co/martin-ha/toxic-comment-model. Accessed: 2024-01-21.
  56. Frank Pasquale. 2016. The black box society: the secret algorithms that control money and information (first harvard university press paperback edition ed.). Harvard University Press, Cambridge, Massachusetts London, England.
  57. Artificial Intelligence & Human Rights. Berkman Klein Center. http://nrs.harvard.edu/urn-3:HUL.InstRepos:38021439
  58. United Nations Human Rights. 2012. Guiding Principles on Business and Human Rights: Implementing the United Nations “Protect, Respect and Remedy” Framework. https://www.ohchr.org/en/publications/reference-publications/guiding-principles-business-and-human-rights
  59. Social Bias Frames: Reasoning about Social and Power Implications of Language. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 5477–5490. https://doi.org/10.18653/v1/2020.acl-main.486
  60. A Path to Simpler Models Starts With Noise. In Proceedings of the 37th Conference on Neural Information Processing Systems. https://openreview.net/forum?id=Uzi22WryyX
  61. Florida Senate. 2021. The Florida Senate. https://www.flsenate.gov/Session/Bill/2021/7072/?Tab=BillHistory
  62. European Union. 2023. The EU’s Digital Services Act. https://commission.europa.eu/strategy-and-policy/priorities-2019-2024/europe-fit-digital-age/digital-services-act_en
  63. Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Chengqing Zong, Fei Xia, Wenjie Li, and Roberto Navigli (Eds.). Association for Computational Linguistics, Online, 1667–1682. https://doi.org/10.18653/v1/2021.acl-long.132
  64. Sandra Wachter and Brent Mittelstadt. 2019. A Right to Reasonable Inferences: Re-Thinking Data Protection Law in the Age of Big Data and AI. (2019). https://doi.org/10.7916/D8-G10S-KA92 Publisher: Columbia University.
  65. Why fairness cannot be automated: Bridging the gap between EU non-discrimination law and AI. Computer Law & Security Review 41 (2021), 105567. https://doi.org/10.1016/j.clsr.2021.105567
  66. Michael L. Waskom. 2021. seaborn: statistical data visualization. Journal of Open Source Software 6, 60 (2021), 3021. https://doi.org/10.21105/joss.03021
  67. Predictive Multiplicity in Probabilistic Classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37. 10306–10314.
  68. HuggingFace's Transformers: State-of-the-art Natural Language Processing. arXiv preprint arXiv:1910.03771 (2019).
  69. Ex Machina: Personal Attacks Seen at Scale. In Proceedings of the 26th International Conference on World Wide Web (Perth, Australia) (WWW ’17). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 1391–1399. https://doi.org/10.1145/3038912.3052591
  70. Exploring the Whole Rashomon Set of Sparse Decision Trees. In Proceedings of Neural Information Processing Systems. Curran Associates Inc.
  71. Jillian York and Corynne McSherry. 2019. Content Moderation is Broken. Let Us Count the Ways. Electronic Frontier Foundation.
  72. Challenges in Automated Debiasing for Toxic Language Detection. In EACL.