Bengali Hate Speech Detection Research
- Bengali hate speech detection centers on the automated identification, categorization, and explanation of hateful content, supported by rich, multi-label annotated datasets.
- Advanced models like BanglaBERT, transformer-based architectures, and parameter-efficient tuning demonstrate significant improvements in detection accuracy.
- Specialized preprocessing, explainability techniques, and error analysis address challenges posed by code-mixing, dialectal variation, and transliteration in Bengali texts.
Bengali hate speech detection is an active research field at the intersection of NLP, deep learning, and social computing, aiming to automatically identify, categorize, and explain various forms of hateful or offensive content originating in the Bengali language and its code-mixed or transliterated forms. The ecosystem is characterized by a proliferation of specialized datasets, multi-label annotation schemes, and the adaptation of state-of-the-art neural architectures, all driven by the acute sociotechnical challenge posed by hate speech in online Bengali discourse.
1. Dataset Development: Scope and Granularity
The last five years have witnessed the emergence of large, expertly annotated Bengali hate speech datasets covering a wide spectrum of sources, linguistic varieties, and annotation schemes. Early works constructed datasets through bootstrapping with slur lexicons followed by manual annotation; for example, (Karim et al., 2020) assembled a 35,000-statement dataset spanning Political, Religious, Gender Abusive, Geopolitical, and Personal categories, with rigorous cleaning and normalization, establishing a paradigm for fine-grained dataset construction. Recent corpora, such as BD-SHS (Romim et al., 2022), expand to over 50,000 examples, introduce hierarchical, multilabel annotations (e.g., identifying targets and types of hate), and strive for balance and coverage of diverse social contexts. The BanTH dataset (Haider et al., 17 Oct 2024) introduces multi-label annotation for transliterated (Romanized) Bangla, while BIDWESH (Fayaz et al., 22 Jul 2025) and BOISHOMMO (Kafi et al., 11 Apr 2025) contribute dialectal and multifaceted hate speech corpora, respectively, broadening linguistic and topical inclusiveness.
An overview of select dataset characteristics:
| Dataset | Samples | Labels/Classes | Annotation | Notable Features |
|---|---|---|---|---|
| (Karim et al., 2020) | 35,000 | 5 HS types | Manual, majority | N-gram filtering, bootstrapping, POS, normalization |
| BD-SHS (Romim et al., 2022) | 50,281 | Binary + multilabel (Target/Type) | Hierarchical, iterative | Informal embeddings, task splits, multi-domain |
| BanTH (Haider et al., 17 Oct 2024) | 37,350 | Multi-label (Transliterated) | Multi-annotator + expert | LLM and translation-based evaluation |
| BOISHOMMO (Kafi et al., 11 Apr 2025) | 2,499 | 10 HS attributes (multi-label) | Majority vote | Non-Latin script, Cohen’s κ analysis |
| BIDWESH (Fayaz et al., 22 Jul 2025) | 9,183 | 4 Type × 4 Target (Dialectal) | Native dialect experts | Chittagong, Barishal, Noakhali |
This breadth reflects an evolving consensus: high-quality Bengali hate speech detection demands annotated corpora that are large-scale, linguistically diverse, and contextually granular.
2. Model Architectures: Classical, Deep, and Large-scale Approaches
Early efforts relied on classical feature engineering—SVMs with TF–IDF or n-gram inputs—but deep learning approaches rapidly became dominant. The multichannel convolutional-LSTM network (MConv-LSTM) (Karim et al., 2020) integrates convolutional filters (capturing local, n-gram patterns) and a parallel LSTM (modeling sentence-level dependencies), outperforming classical baselines by 7+ F1 points. Informal social-media-trained word embeddings (e.g., IFT, informal FastText SG) are repeatedly shown to outperform formal news/wiki-based embeddings (Romim et al., 2021, Romim et al., 2022), likely due to better handling of noisy, code-mixed, or dialectal speech. Bi-LSTM with informal embeddings achieves F1 ≈ 87%.
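The classical TF-IDF features mentioned above have a compact definition; a minimal pure-Python sketch (a toy illustration of the weighting scheme, not any cited paper's exact feature set):

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute TF-IDF weights for a list of pre-tokenized documents."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for doc in docs for t in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF keeps weights finite for terms present in every document.
        weights.append({t: (c / len(doc)) * math.log((1 + n) / (1 + df[t]))
                        for t, c in tf.items()})
    return weights

docs = [["offensive", "comment"], ["neutral", "comment"], ["neutral", "post"]]
w = tfidf(docs)
# "comment" occurs in two of the three documents, so within the first
# document it is down-weighted relative to the rarer "offensive".
```

Such sparse vectors feed directly into an SVM; the deep models that followed replaced this hand-built weighting with learned dense representations.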
The field has shifted decisively toward transformer-based architectures:
- Monolingual models: BanglaBERT consistently achieves strong performance (>0.75 F1), especially for multi-task or nuanced classification (Narayan et al., 2023, Hasan et al., 2 Oct 2025).
- Multilingual PLMs: XLM-RoBERTa and mBERT, pre-trained on diverse languages and scripts, show robust transferability and remain competitive or state-of-the-art when adapted via further pretraining or task-specific finetuning, as in (Mim et al., 2023, Haider et al., 17 Oct 2024).
- LLMs and PEFT: Recent work leverages LLaMA, Mistral, and Gemma with parameter-efficient adapters (LoRA/QLoRA), demonstrating strong F1 (up to 92%) while restricting finetuning to <1% of model parameters (Islam et al., 19 Oct 2025).
Architectural innovation now often occurs at the intersection of data-centric adaptation (domain-specific pretraining, e.g., transliterated corpora (Haider et al., 17 Oct 2024)), efficient finetuning (PEFT/QLoRA (Islam et al., 19 Oct 2025)), and prompt engineering for zero/few-shot LLM evaluation (Prome et al., 30 Jun 2025).
3. Features, Preprocessing, and Embedding Choices
Preprocessing pipelines are highly specialized due to Bengali’s rich morphology, frequent code-mixing, and spelling variation. Effective preprocessing includes:
- Aggressive normalization (removing extraneous characters, replacing proper nouns, hashtag normalization)
- Linguistically motivated stemming and token filtering (addressing non-Latin script complexity (Kafi et al., 11 Apr 2025))
- Explicit modeling of slang (traditional and non-traditional) and emoji handling (Romim et al., 2021)
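The normalization steps above can be sketched as a minimal pipeline; this is a hypothetical illustration of the common operations (URL removal, hashtag unwrapping, punctuation stripping), while real pipelines add stemming, slang lexicons, and dedicated emoji handling:

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
MULTISPACE_RE = re.compile(r"\s+")

def normalize(text: str) -> str:
    text = unicodedata.normalize("NFC", text)  # canonical Unicode form
    text = URL_RE.sub(" ", text)               # drop links
    text = HASHTAG_RE.sub(r"\1", text)         # "#word" -> "word"
    # Keep word characters, whitespace, and the full Bengali block
    # (U+0980-U+09FF), which also covers combining vowel signs.
    text = re.sub(r"[^\w\s\u0980-\u09FF]", " ", text)
    return MULTISPACE_RE.sub(" ", text).strip()

print(normalize("দেখুন #খবর https://example.com !!"))
```

Keeping the Bengali block explicitly matters because Python's `\w` does not match combining marks such as dependent vowel signs.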
Feature representation insights:
- Informal FastText skip-gram embeddings (FT(SG)) trained directly on noisy, colloquial comments exhibit a persistent performance edge.
- Finetuning multilingual (mBERT/XLM-R) or monolingual (BanglaBERT) models on task- or domain-specific corpora is essential for handling transliterated and dialectal input (Haider et al., 17 Oct 2024, Fayaz et al., 22 Jul 2025).
- Incorporation of emoji2vec and translation-based pre/prompts further boosts robustness in code-mixed and transliterated scenarios.
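Much of FastText's robustness to misspelling and transliteration variance comes from subword modeling; the character n-gram decomposition at its core can be sketched as follows (a toy illustration of the idea, not the library's internals):

```python
def char_ngrams(word: str, n_min: int = 3, n_max: int = 5) -> set[str]:
    """FastText-style subword units: the word is padded with boundary
    markers < and >, then split into all n-grams of length n_min..n_max."""
    padded = f"<{word}>"
    return {padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)}

# Two spelling variants of the same Romanized word still share many
# subword units, so their embeddings (sums of n-gram vectors) stay close.
a, b = char_ngrams("kharap"), char_ngrams("kharaap")
overlap = len(a & b) / len(a | b)
```

This overlap is why embeddings trained on noisy informal text degrade gracefully on out-of-vocabulary and misspelled tokens.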
4. Evaluation Techniques, Benchmarking, and Comparative Analyses
Robust benchmarking approaches are now standard:
- 5-fold stratified cross-validation is the norm for larger datasets (Romim et al., 2022, Karim et al., 2020).
- F1 score (weighted and macro), MCC, AUROC, and class-wise precision/recall support detailed comparative claims.
- Performance of the best modern models: Bi-LSTM+IFT reaches 91% F1 (Romim et al., 2022); PEFT-tuned LLMs (Llama-3.2-3B) reach 92.23% F1 on BD-SHS (Islam et al., 19 Oct 2025); translation-based LLM prompting on transliterated BanTH data demonstrates strong zero-shot performance (Haider et al., 17 Oct 2024).
- Multimodal architectures (fusing XLM-RoBERTa with DenseNet on meme+text inputs) offer F1 up to 0.83, but text-only models are often sufficient (Karim et al., 2022).
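Since both weighted and macro F1 appear in the comparisons above, it is worth making their difference concrete on imbalanced data; a minimal pure-Python sketch (illustrative only):

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Per-class F1, plus macro (unweighted mean) and weighted
    (support-weighted mean) aggregates."""
    labels = sorted(set(y_true))
    support = Counter(y_true)
    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        per_class[c] = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    macro = sum(per_class.values()) / len(labels)
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return per_class, macro, weighted

# A classifier that ignores the minority "hate" class looks far better
# under weighted F1 than under macro F1.
y_true = ["not"] * 9 + ["hate"]
y_pred = ["not"] * 10
per_class, macro, weighted = f1_scores(y_true, y_pred)
```

This gap is exactly why hate speech papers, working with heavily skewed label distributions, report macro F1 alongside weighted F1.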
Recent studies include head-to-head evaluations of zero-shot prompting, multi-shot learning, and LoRA adaptation, demonstrating that even with parameter-efficient techniques, well-designed local pretraining (BanglaBERT) remains critical for subtle and adversarial tasks (Hasan et al., 2 Oct 2025, Prome et al., 30 Jun 2025).
5. Special Topics: Multi-label, Multi-task, Explainability, and Dialect
Multi-label and multi-task detection mark a new frontier. Both BanTH (Haider et al., 17 Oct 2024) and BOISHOMMO (Kafi et al., 11 Apr 2025) use controlled schemes to annotate multiple co-occurring hate types/targets, reflecting the real-world complexity where hate often intersects categories such as race, gender, and religion. Multi-task datasets like BanglaMultiHate (Hasan et al., 2 Oct 2025) separate detection into type, severity, and target prediction, moving beyond binary detection to multifaceted content moderation benchmarks. Dialect inclusion, as in BIDWESH (Fayaz et al., 22 Jul 2025), addresses the under-recognition of regionalized hate, enabling more equitable and context-aware tools.
Explainability is tackled with sensitivity analysis and layer-wise relevance propagation (Karim et al., 2020). Faithfulness metrics (comprehensiveness and sufficiency) are used to score explanation quality, and, in some architectures, attention heat maps and post-hoc rationales support interpretability, a desirable property for deployment in sensitive or regulatory settings.
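The two faithfulness metrics have simple definitions: comprehensiveness is the drop in the predicted probability when rationale tokens are removed, and sufficiency is the drop when only the rationale is kept. A sketch using a stand-in scoring function (the toy "model" below is hypothetical, not a real classifier):

```python
def comprehensiveness(prob, tokens, rationale, label):
    """Drop in p(label) when rationale tokens are removed:
    high means the rationale really drove the prediction."""
    kept = [t for t in tokens if t not in rationale]
    return prob(tokens, label) - prob(kept, label)

def sufficiency(prob, tokens, rationale, label):
    """Drop in p(label) when only the rationale is kept:
    low means the rationale alone nearly suffices."""
    only = [t for t in tokens if t in rationale]
    return prob(tokens, label) - prob(only, label)

# Stand-in scorer: p(hate) rises with the number of flagged tokens
# present (purely illustrative).
FLAGGED = {"slur1", "slur2"}
def toy_prob(tokens, label):
    score = sum(t in FLAGGED for t in tokens) / 2
    return score if label == "hate" else 1 - score

tokens = ["this", "slur1", "and", "slur2", "here"]
comp = comprehensiveness(toy_prob, tokens, FLAGGED, "hate")
suff = sufficiency(toy_prob, tokens, FLAGGED, "hate")
```

A faithful explanation scores high on comprehensiveness and low on sufficiency, which is the pattern the toy rationale above exhibits.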
6. LLMs, Prompt Engineering, and Resource-Efficient Adaptation
Leading-edge research explores large-scale LLMs for Bengali hate speech detection, with a focus on parameter-efficient fine-tuning and prompt engineering:
- Prompt engineering strategies include direct zero-shot, multi-shot, role, refusal-suppression, and metaphor prompting (the latter substituting hate speech triggers with neutral metaphors to avoid LLM safety refusals) (Prome et al., 30 Jun 2025).
- Studies demonstrate that metaphor prompts achieve high F1 even under strict LLM safety filtering, in both Bengali and cross-lingual settings, often at lower resource and carbon cost than traditional fine-tuning.
- PEFT approaches (LoRA/QLoRA) prove practical for adapting LLMs like Llama-3.2-3B on single consumer GPUs, enabling F1 score improvements to over 92% on BD-SHS (Islam et al., 19 Oct 2025).
- Fine-tuning LLMs on multi-task datasets requires careful optimization (e.g., learning rate 2×10⁻⁴, LoRA parameters α=16, r=64) and culturally grounded pretraining to rival best-in-class specialist models (Hasan et al., 2 Oct 2025).
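The parameter-efficiency claim follows directly from LoRA's structure: each frozen weight matrix W of shape d×k receives a trainable low-rank update ΔW = (α/r)·B·A, with B of shape d×r and A of shape r×k, so only r·(d+k) parameters train per adapted matrix. A back-of-the-envelope sketch (the layer shapes, block count, and 3B total below are assumed round numbers, not Llama-3.2-3B's actual architecture):

```python
def lora_trainable(layers, r):
    """Trainable LoRA parameters: r * (d + k) per adapted (d, k) matrix."""
    return sum(r * (d + k) for d, k in layers)

# Assumed setup: query and value projections only (a common LoRA target
# choice), 28 transformer blocks, 3072x3072 projections, r=64.
layers = [(3072, 3072)] * 2 * 28
total_params = 3_000_000_000  # nominal 3B-parameter model (assumed)
frac = lora_trainable(layers, r=64) / total_params
```

Even with a fairly large rank of 64, the trainable share stays well under 1% of the model, which is what makes single-GPU adaptation practical.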
A plausible implication is that data-centric adaptation and lightweight parameter-efficient methods offer a sustainable path for scalable hate speech detection in low-resource languages.
7. Challenges, Applications, and Ongoing Directions
Bengali hate speech detection encounters persistent challenges:
- Data scarcity, especially for fine-grained, dialect-specific, or multi-label contexts
- The prevalence of code-mixing, misspellings, and transliteration
- The subjectivity and sensitivity of annotation, especially in categories like religion, gender, or class
- The trade-off between model performance, explainability, and computational efficiency in low-resource deployment
Applications include social media moderation, real-time flagging for platforms, and policy-driven civil society tools for measuring and mitigating online toxicity. There is a strong emphasis on inclusive representations, dialectal fairness, and explainable decision-making.
Future work will likely expand dataset scope (regional, multimodal, code-mixed), further integrate LLM adaptation techniques, and prioritize robust, faithfulness-aware explainability and error analysis. Extensions to adversarial and counterfactual testing, bias detection, and multi-lingual, cross-regional policy development are also anticipated as the resource ecosystem matures.
Bengali hate speech detection is thus defined by sophisticated data resources, specialized embedding and model strategies, multi-label and multi-task evaluation, and a trajectory toward scalable, explainable, and equitable content moderation systems guided by both computational and social imperatives.