Close to Human-Level Agreement: Tracing Journeys of Violent Speech in Incel Posts with GPT-4-Enhanced Annotations (2401.02001v1)
Abstract: This study investigates the prevalence of violent language on incels.is. It evaluates GPT models (GPT-3.5 and GPT-4) for content analysis in social sciences, focusing on the impact of varying prompts and batch sizes on coding quality for the detection of violent speech. We scraped over 6.9M posts from incels.is and categorized a random sample into non-violent, explicitly violent, and implicitly violent content. Two human coders annotated 3,028 posts, which we used to tune and evaluate GPT-3.5 and GPT-4 models across different prompts and batch sizes with respect to coding reliability. The best-performing GPT-4 model annotated an additional 30,000 posts for further analysis. Our findings indicate an overall increase in violent speech over time on incels.is, both at the community and individual level, particularly among more engaged users. While directed violent language decreases, non-directed violent language increases, and self-harm content shows a decline, especially after 2.5 years of user activity. We find substantial agreement between both human coders (K = 0.65), while the best GPT-4 model yields good agreement with both human coders (K = 0.54 for Human A and K = 0.62 for Human B). Weighted and macro F1 scores further support this alignment. Overall, this research provides practical means for accurately identifying violent language at a large scale that can aid content moderation and facilitate next-step research into the causal mechanisms and potential mitigations of violent expression and radicalization in communities like incels.is.
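The agreement figures in the abstract (K = 0.65 between human coders, K = 0.54 and 0.62 between GPT-4 and the humans) are Cohen's kappa values, which correct raw percent agreement for agreement expected by chance. A minimal sketch of how such pairwise kappa between two annotators can be computed; the label sequences below are hypothetical toy data, not the paper's annotations:

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa for two annotators labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is chance agreement from each annotator's label marginals.
    """
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    p_expected = sum(
        counts_a[label] * counts_b[label]
        for label in set(counts_a) | set(counts_b)
    ) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical labels over the paper's three categories (toy example):
human = ["non-violent", "non-violent", "implicitly violent",
         "explicitly violent", "non-violent"]
model = ["non-violent", "implicitly violent", "implicitly violent",
         "explicitly violent", "non-violent"]
print(round(cohen_kappa(human, model), 2))  # 0.69 on this toy data
```

On real data one would typically use `sklearn.metrics.cohen_kappa_score` alongside weighted and macro `f1_score`, as the paper reports both kinds of metrics.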
- Daniel Matter
- Miriam Schirmer
- Nir Grinberg
- Jürgen Pfeffer