Exploring Boundaries and Intensities in Offensive and Hate Speech: Unveiling the Complex Spectrum of Social Media Discourse (2404.12042v1)
Abstract: The prevalence of digital media and evolving sociopolitical dynamics have significantly amplified the dissemination of hateful content. Existing studies mainly focus on classifying texts into binary categories, often overlooking the continuous spectrum of offensiveness and hatefulness inherent in the text. In this research, we present an extensive benchmark dataset for Amharic, comprising 8,258 tweets annotated for three distinct tasks: category classification, identification of hate targets, and rating of offensiveness and hatefulness intensities. Our study shows that a considerable majority of tweets fall into the lower offensiveness and hatefulness intensity levels, underscoring the need for early interventions by stakeholders. The prevalence of ethnic and political hate targets, with significant overlaps in our dataset, highlights the complex relationships within Ethiopia's sociopolitical landscape. We build classification and regression models and investigate how effectively they handle these tasks. Our results reveal that hate and offensive speech cannot be addressed by simplistic binary classification; instead, they manifest as variables across a continuous range of values. The Afro-XLMR-large model performs best, achieving F1-scores of 75.30%, 70.59%, and 29.42% for the category, target, and regression tasks, respectively. The 80.22% correlation coefficient of the Afro-XLMR-large model indicates strong alignment.
- Abinew Ali Ayele
- Esubalew Alemneh Jalew
- Adem Chanie Ali
- Seid Muhie Yimam
- Chris Biemann