
An Investigation of Large Language Models for Real-World Hate Speech Detection (2401.03346v1)

Published 7 Jan 2024 in cs.CY, cs.AI, cs.CL, cs.LG, and cs.SI

Abstract: Hate speech has emerged as a major problem plaguing our social spaces today. While there have been significant efforts to address this problem, existing methods are still significantly limited in effectively detecting hate speech online. A major limitation of existing methods is that hate speech detection is a highly contextual problem, and these methods cannot fully capture the context of hate speech to make accurate predictions. Recently, LLMs have demonstrated state-of-the-art performance in several natural language tasks. LLMs have undergone extensive training on vast amounts of natural language data, enabling them to grasp intricate contextual details. Hence, they could be used as knowledge bases for context-aware hate speech detection. However, a fundamental obstacle to using LLMs for this task is that there are no studies on effectively prompting LLMs for context-aware hate speech detection. In this study, we conduct a large-scale study of hate speech detection, employing five established hate speech datasets. We discover that LLMs not only match but often surpass the performance of current benchmark machine learning models in identifying hate speech. We propose four diverse prompting strategies that optimize the use of LLMs in detecting hate speech. Our study reveals that a meticulously crafted reasoning prompt can effectively capture the context of hate speech by fully utilizing the knowledge base in LLMs, significantly outperforming existing techniques. Furthermore, although LLMs can provide a rich knowledge base for the contextual detection of hate speech, suitable prompting strategies play a crucial role in effectively leveraging this knowledge base for efficient detection.
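The paper's exact prompt templates are not reproduced on this page, so the sketch below is only an illustration of the general contrast the abstract draws: a bare zero-shot prompt versus a "reasoning" prompt that asks the model to analyze context before committing to a label. The function names, wording, and label format are hypothetical, not the authors' own.

```python
def build_vanilla_prompt(post: str) -> str:
    """Minimal zero-shot prompt: ask for a bare yes/no label."""
    return (
        'Does the following post contain hate speech? Answer "yes" or "no".\n'
        f'Post: "{post}"'
    )


def build_reasoning_prompt(post: str) -> str:
    """Reasoning-style prompt: elicit step-by-step contextual analysis
    (who is targeted, on what basis) before the final verdict."""
    return (
        "You are a content moderator. First explain, step by step, who the "
        "post targets and whether it attacks them on the basis of a protected "
        'attribute. Then give a final verdict on its own line as "Label: yes" '
        'or "Label: no".\n'
        f'Post: "{post}"'
    )


def parse_label(completion: str) -> str:
    """Extract the final yes/no verdict from a reasoning-style completion,
    scanning from the end so the reasoning text is ignored."""
    for line in reversed(completion.strip().splitlines()):
        if line.lower().startswith("label:"):
            return line.split(":", 1)[1].strip().lower()
    return "unknown"
```

The key design point the abstract makes is that the reasoning prompt forces the model to surface the contextual evidence (target, protected attribute) that a bare yes/no prompt leaves implicit; a structured final line keeps the verdict machine-parseable despite the free-form reasoning.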

Authors (7)
  1. Keyan Guo
  2. Alexander Hu
  3. Jaden Mu
  4. Ziheng Shi
  5. Ziming Zhao
  6. Nishant Vishwamitra
  7. Hongxin Hu