Using LLMs to discover emerging coded antisemitic hate-speech in extremist social media (2401.10841v2)

Published 19 Jan 2024 in cs.CL, cs.AI, cs.IR, and cs.LG

Abstract: The proliferation of online hate speech has created a difficult problem for social media platforms. A particular challenge relates to the use of coded language by groups interested in both creating a sense of belonging for their users and evading detection. Coded language evolves quickly and its use varies over time. This paper proposes a methodology for detecting emerging coded hate-laden terminology. The methodology is tested in the context of online antisemitic discourse. The approach considers posts scraped from social media platforms often used by extremist users. The posts are scraped using seed expressions related to previously known discourse of hatred towards Jews. The method begins by identifying the expressions most representative of each post and calculating their frequency in the whole corpus. It filters out grammatically incoherent expressions as well as previously encountered ones so as to focus on emergent well-formed terminology. This is followed by an assessment of semantic similarity to known antisemitic terminology using a fine-tuned LLM, and subsequent filtering out of the expressions that are too distant from known expressions of hatred. Emergent antisemitic expressions containing terms clearly relating to Jewish topics are then removed to return only coded expressions of hatred.
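
The filtering stages described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how "most representative" expressions are scored, so TF-IDF over single tokens is assumed here; the `similarity` callable is a hypothetical stand-in for the fine-tuned LLM; `is_well_formed` is a hypothetical placeholder for the grammatical-coherence filter; and all function names, thresholds, and the neutral placeholder tokens in the toy data are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a post and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_terms_per_post(posts, k=2):
    """Return, for each post, the k tokens with the highest TF-IDF score.

    ASSUMPTION: TF-IDF over unigrams approximates the "most representative
    expressions" step; the abstract does not specify the scoring.
    """
    docs = [tokenize(p) for p in posts]
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    result = []
    for doc in docs:
        term_freq = Counter(doc)
        scores = {
            tok: (count / len(doc)) * math.log(n_docs / doc_freq[tok])
            for tok, count in term_freq.items()
        }
        ranked = sorted(scores, key=lambda tok: (-scores[tok], tok))
        result.append(ranked[:k])
    return result


def discover_coded_terms(posts, known_terms, topic_terms, similarity,
                         is_well_formed=lambda term: True,
                         k=2, threshold=0.5):
    """Return (term, corpus_frequency, similarity) for surviving candidates."""
    candidates = {t for terms in top_terms_per_post(posts, k) for t in terms}
    corpus_freq = Counter(tok for p in posts for tok in tokenize(p))
    kept = []
    for term in candidates:
        # Drop previously encountered or ill-formed expressions.
        if term in known_terms or not is_well_formed(term):
            continue
        # Keep only candidates close enough to known terminology.
        score = max((similarity(term, known) for known in known_terms),
                    default=0.0)
        if score < threshold:
            continue
        # Drop expressions that name the topic explicitly (not coded).
        if any(topic in term for topic in topic_terms):
            continue
        kept.append((term, corpus_freq[term], score))
    return sorted(kept, key=lambda row: (-row[1], row[0]))


# Toy data with neutral placeholder tokens (hypothetical, not from the paper).
posts = [
    "the zorp crowd again",
    "zorp zorp everywhere today",
    "nice weather today",
    "seedterm and topicword talk",
]
known_terms = {"seedterm"}
topic_terms = {"topic"}

# Hand-written similarity table standing in for the fine-tuned LLM.
toy_scores = {
    ("zorp", "seedterm"): 0.9,
    ("topicword", "seedterm"): 0.8,
    ("weather", "seedterm"): 0.1,
}


def toy_similarity(a, b):
    return toy_scores.get((a, b), 0.0)


result = discover_coded_terms(posts, known_terms, topic_terms,
                              toy_similarity, k=10)
print(result)
```

In this toy run, `topicword` is close to the known term but is dropped because it contains an explicit topic term, while `weather` is dropped for being too distant. In practice the `similarity` callable would wrap embeddings or scores from whatever model is actually used.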
