
Hate speech detection in Algerian dialect using deep learning (2309.11611v2)

Published 20 Sep 2023 in cs.CL

Abstract: With the proliferation of hate speech on social networks in different forms, such as abusive language, cyberbullying, and violence, people have experienced a significant increase in violence, which exposes them to uncomfortable situations and threats. Considerable effort has been devoted in the last few years to detecting hate speech in structured languages such as English, French, and Arabic. However, few works deal with Arabic dialects such as Tunisian, Egyptian, and Gulf, and Algerian dialects in particular. To fill this gap, we propose a complete approach for detecting hate speech in online Algerian messages. We evaluated many deep learning architectures on a corpus we created from Algerian social network content (Facebook, YouTube, and Twitter). This corpus contains more than 13.5K documents in Algerian dialect written in Arabic, labeled as hateful or non-hateful. The results are promising and show the efficiency of our approach.

Authors (5)
  1. Dihia Lanasri (3 papers)
  2. Juan Olano (1 paper)
  3. Sifal Klioui (2 papers)
  4. Sin Liang Lee (1 paper)
  5. Lamia Sekkai (1 paper)
