A Critical Reflection on the Use of Toxicity Detection Algorithms in Proactive Content Moderation Systems (2401.10629v2)

Published 19 Jan 2024 in cs.HC

Abstract: Toxicity detection algorithms, originally designed with reactive content moderation in mind, are increasingly being deployed in proactive end-user interventions to moderate content. Through a socio-technical lens and focusing on the contexts in which they are applied, we explore the use of these algorithms in proactive moderation systems. Placing a toxicity detection algorithm in an imagined virtual mobile keyboard, we critically explore how such algorithms could be used to proactively reduce the sending of toxic content. We present findings from design workshops conducted with four distinct stakeholder groups and find concerns around how contextual complexities may exacerbate inequalities in content moderation processes. While only specific user groups are likely to directly benefit from these interventions, we highlight the potential for other groups to misuse them to circumvent detection, validate and gamify hate, and manipulate algorithmic models to exacerbate harm.
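The proactive intervention the abstract describes — scoring a drafted message before it is sent and nudging the sender if it appears toxic — can be sketched as follows. This is a minimal illustration, not the paper's system: `toxicity_score`, `BLOCKLIST`, and the threshold are hypothetical stand-ins for a real learned classifier and its calibration.

```python
# Sketch of a send-time ("proactive") moderation check, as imagined in a
# virtual mobile keyboard. The scoring function below is a trivial keyword
# heuristic used only to keep the example self-contained; a deployed system
# would call a trained toxicity classifier instead.

BLOCKLIST = {"idiot", "stupid"}  # illustrative placeholder vocabulary


def toxicity_score(text: str) -> float:
    """Return a score in [0, 1]; a real system would use a learned model."""
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w.strip(".,!?") in BLOCKLIST)
    return min(1.0, 5 * hits / len(words))


def proactive_intervention(draft: str, threshold: float = 0.5) -> str:
    """Decide whether to prompt the sender to reconsider before sending."""
    if toxicity_score(draft) >= threshold:
        return "nudge"  # show a "think before you send" style prompt
    return "send"       # deliver the message unchanged
```

The key design point, which the workshops in the paper interrogate, is that the intervention fires before the message reaches a recipient, shifting moderation from the platform's review queue to the sender's composition moment.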
