Multilingual Hate Speech Detection in Social Media Using Translation-Based Approaches with Large Language Models (2506.08147v1)

Published 9 Jun 2025 in cs.CL, cs.AI, and cs.LG

Abstract: Social media platforms are critical spaces for public discourse, shaping opinions and community dynamics, yet their widespread use has amplified harmful content, particularly hate speech, threatening online safety and inclusivity. While hate speech detection has been extensively studied in languages like English and Spanish, Urdu remains underexplored, especially using translation-based approaches. To address this gap, we introduce a trilingual dataset of 10,193 tweets in English (3,834 samples), Urdu (3,197 samples), and Spanish (3,162 samples), collected via keyword filtering, with a balanced distribution of 4,849 Hateful and 5,344 Not-Hateful labels. Our methodology leverages attention layers as a precursor to transformer-based models and LLMs, enhancing feature extraction for multilingual hate speech detection. For non-transformer models, we use TF-IDF for feature extraction. The dataset is benchmarked using state-of-the-art models, including GPT-3.5 Turbo and Qwen 2.5 72B, alongside traditional machine learning models like SVM and other transformers (e.g., BERT, RoBERTa). Three annotators, following rigorous guidelines, ensured high dataset quality, achieving a Fleiss' Kappa of 0.821. Our approach, integrating attention layers with GPT-3.5 Turbo and Qwen 2.5 72B, achieves strong performance, with macro F1 scores of 0.87 for English (GPT-3.5 Turbo), 0.85 for Spanish (GPT-3.5 Turbo), 0.81 for Urdu (Qwen 2.5 72B), and 0.88 for the joint multilingual model (Qwen 2.5 72B). These results reflect improvements of 8.75% in English (over SVM baseline 0.80), 8.97% in Spanish (over SVM baseline 0.78), 5.19% in Urdu (over SVM baseline 0.77), and 7.32% in the joint multilingual model (over SVM baseline 0.82). Our framework offers a robust solution for multilingual hate speech detection, fostering safer digital communities worldwide.

Summary

  • The paper presents a novel translation-based pipeline that leverages large language models and attention mechanisms to improve multilingual hate speech detection, achieving macro F1 scores up to 0.88.
  • It curates a robust trilingual dataset of 10,193 tweets in English, Urdu, and Spanish, addressing the scarcity of annotated data for low-resource languages like Urdu.
  • The study demonstrates significant improvements over traditional models, with gains of up to 8.97% over an SVM baseline, paving the way for scalable, inclusive NLP solutions.

Multilingual Hate Speech Detection in Social Media Using Translation-Based Approaches with LLMs

The paper "Multilingual Hate Speech Detection in Social Media Using Translation-Based Approaches with LLMs" addresses the challenge of detecting hate speech across multiple languages on social media platforms, emphasizing the underexplored area of low-resource languages like Urdu. The researchers have curated a substantial trilingual dataset comprising 10,193 tweets in English, Urdu, and Spanish, annotated with high inter-annotator agreement. This dataset fills a critical gap by providing robust annotated data for Urdu hate speech, a language that presents unique challenges due to the complexity of its script and code-mixed usage.
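The inter-annotator agreement reported for the dataset is a Fleiss' Kappa of 0.821 across three annotators. For reference, the statistic itself can be computed with a short generic implementation; this is the standard formula, not the authors' code, and the toy rating counts below are purely illustrative:

```python
def fleiss_kappa(ratings):
    """Fleiss' Kappa for multi-rater categorical agreement.

    ratings: one row per item; each row holds the count of raters who
    assigned the item to each category (rows sum to the number of raters).
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_cats = len(ratings[0])

    # Observed agreement: mean per-item agreement P_i.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ) / n_items

    # Chance agreement: squared marginal proportions per category.
    totals = [sum(row[j] for row in ratings) for j in range(n_cats)]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)

    return (p_bar - p_e) / (1 - p_e)


# Toy example: 3 raters, 2 categories (Hateful / Not-Hateful).
demo = [[3, 0], [2, 1], [0, 3], [1, 2]]
print(f"kappa = {fleiss_kappa(demo):.3f}")
```

A kappa of 0.821, as reported, falls in the range conventionally read as "almost perfect" agreement, supporting the claim of high dataset quality.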

Methodological Framework

The researchers implement a translation-based pipeline to standardize tweets across languages before applying machine learning, deep learning, and advanced NLP models. They employ state-of-the-art LLMs such as GPT-3.5 Turbo and Qwen 2.5 72B to enhance multilingual hate speech detection capabilities. Additionally, traditional machine learning models like SVM and transformer-based models such as BERT and RoBERTa are used for benchmarking purposes.
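The non-transformer baseline pairs TF-IDF feature extraction with an SVM, per the abstract. A minimal scikit-learn sketch of such a baseline follows; the toy tweets, labels, and hyperparameters are illustrative assumptions, not the authors' actual configuration or data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in tweets; the paper's dataset is not reproduced here.
texts = [
    "awful hateful slur against them",
    "hate filled abusive message",
    "kind and supportive reply",
    "friendly encouraging post today",
]
labels = [1, 1, 0, 0]  # 1 = Hateful, 0 = Not-Hateful

# TF-IDF features (word unigrams and bigrams) feeding a linear SVM.
baseline = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
baseline.fit(texts, labels)
print(baseline.predict(["hateful abusive slur"]))
```

In the translation-based setup described by the paper, Urdu and Spanish tweets would be translated into a common language before passing through a pipeline of this shape, so a single classifier serves all three languages.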

The paper demonstrates the efficacy of integrating attention mechanisms with LLMs, achieving macro F1 scores of 0.87 for English, 0.85 for Spanish, 0.81 for Urdu, and 0.88 for the joint multilingual model. Notably, these results illustrate significant improvements over classical models, with enhancements of up to 8.97% over the SVM baseline in Spanish language detection.
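The reported relative gains follow directly from the macro F1 scores and the SVM baselines quoted in the abstract, as a quick check confirms:

```python
# Macro F1 of the best LLM configuration vs. the SVM baseline (from the paper).
results = {
    "English (GPT-3.5 Turbo)": (0.87, 0.80),
    "Spanish (GPT-3.5 Turbo)": (0.85, 0.78),
    "Urdu (Qwen 2.5 72B)": (0.81, 0.77),
    "Joint multilingual (Qwen 2.5 72B)": (0.88, 0.82),
}

for setting, (f1, svm_f1) in results.items():
    gain = 100 * (f1 - svm_f1) / svm_f1  # relative improvement in percent
    print(f"{setting}: +{gain:.2f}%")
```

Running this reproduces the abstract's figures of 8.75%, 8.97%, 5.19%, and 7.32%, confirming the improvements are relative (not absolute percentage-point) gains over the SVM baseline.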

Implications and Prospective Directions

The paper represents a significant advancement in multilingual NLP, particularly for low-resource languages like Urdu, where traditional methods struggle due to data scarcity and orthographic complexity. By leveraging translation-based preprocessing in conjunction with sophisticated attention-augmented models, the researchers set a precedent for further exploration into cross-lingual embeddings and scalable NLP solutions. The strong performance on the English and Spanish datasets highlights the potential of these models to improve online safety through more accurate hate speech detection.

Theoretically, the framework suggests the viability of translation-based approaches to unify linguistic data, both addressing immediate hate speech concerns and contributing to inclusive digital communication across diverse linguistic landscapes. Practically, this approach offers a scalable model pipeline that could be extended to other low-resource languages, improving the inclusivity and effectiveness of automated hate speech detection systems on global platforms.

Limitations and Future Research

While promising, the paper acknowledges limitations in handling low-resource languages with complex scripts and code-mixed text. Future research should focus on developing language-specific embeddings and improved translation models to address cultural nuances and slang, particularly in Urdu. Enhanced training strategies and the integration of semi-supervised learning methods could further bolster the efficacy of hate speech detection in resource-constrained settings.

In conclusion, the paper underscores the importance of multilingual datasets and advanced NLP techniques in mitigating harmful content on social media, paving the way for safer digital environments worldwide. Continued advancements in LLM development and cross-lingual alignment will be crucial to refining these systems and addressing the multifaceted challenges in hate speech detection across various linguistic contexts.