RV-HATE: Modular Ensemble for Hate Speech
- RV-HATE is a modular ensemble framework for implicit hate speech detection that integrates reinforcement learning-based weight selection to adapt to diverse dataset characteristics.
- It combines four specialized modules—clustering-based contrastive learning, target-tagging, outlier removal, and hard negative sampling—each fine-tuned on dataset-specific cues.
- The framework leverages PPO to optimize ensemble weights, resulting in improved macro-F1 scores and interpretable attributions of module importance per dataset.
RV-HATE is a modular ensemble framework for implicit hate speech detection that employs reinforcement-learned soft voting to optimize dataset-specific performance. The architecture is explicitly designed to address the heterogeneity of hate speech datasets, which arise from divergent linguistic patterns, social contexts, and annotation schemes. By integrating multiple specialized modules and adapting their ensemble weights to each target dataset via policy optimization, RV-HATE achieves both improved classification accuracy and quantitative interpretability with respect to critical features for a given corpus (Lee et al., 13 Oct 2025).
1. Multi-Module Architecture
RV-HATE comprises four distinct modules $M_1$–$M_4$, each producing a two-class logit vector for an input text $x$. Modules are independently fine-tuned and are as follows:
- $M_1$: Clustering-based Contrastive Learning
- Input: raw sentence $x$
- Encoder: BERT-base generates embedding $h$
- Clustering: training embeddings are clustered into $k$ clusters; the center $c_j$ is the cluster mean
- Anchor selection: each sample's anchor is the center $c_{a(i)}$ of its assigned cluster
- Contrastive loss (SharedCon, cosine):
$$\mathcal{L}_{\mathrm{con}} = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{\exp\!\big(\mathrm{sim}(h_i, h_i^{+})/\tau\big)}{\sum_{j \neq i} \exp\!\big(\mathrm{sim}(h_i, h_j)/\tau\big)}$$
where $h_i^{+}$ is a positive pair for anchor $h_i$, $\mathrm{sim}(\cdot,\cdot)$ is cosine similarity, and $\tau$ is the temperature.
- $M_2$: Target-Tagging with [TARGET] Tokens
- Input: $x$ with NER-derived “[TARGET]” spans marking ORG/NORP/GPE entities (spaCy + GPT-4o)
- Encoder: BERT-base, same contrastive objective as $M_1$, but with tagged input.
- $M_3$: Outlier Removal within Clusters
- Procedure as in $M_1$, but with outliers removed: each sample's distance $d_i$ to its cluster center is computed, and samples with $d_i$ above a threshold are excluded before computing the contrastive loss.
- $M_4$: Hard Negative Sampling
- Maintains a queue $Q$ of hard negatives (samples of the opposing class with high similarity, or false positives with high confidence)
- The contrastive loss draws negatives from both the in-batch samples and $Q$.
Each module $M_m$ outputs two-class logits $z_m(x)$ via a classification head.
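The anchor-based contrastive objective of $M_1$ can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the use of cluster centers as both positives and (cross-cluster) negatives, and the temperature value are assumptions.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two 1-D vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def cluster_contrastive_loss(embeddings, assignments, tau=0.1):
    """InfoNCE-style loss with cluster centers as anchors (sketch of M1).

    embeddings: (n, d) array-like of sentence embeddings
    assignments: length-n list of cluster indices
    """
    embeddings = np.asarray(embeddings, dtype=float)
    clusters = sorted(set(assignments))
    # Cluster center = mean of member embeddings.
    centers = {c: embeddings[[i for i, a in enumerate(assignments) if a == c]].mean(axis=0)
               for c in clusters}
    loss = 0.0
    for i, a in enumerate(assignments):
        # Positive: the sample's own cluster center; negatives: the other centers.
        logits = np.array([cosine_sim(embeddings[i], centers[c]) / tau for c in clusters])
        log_prob = logits[clusters.index(a)] - np.log(np.exp(logits).sum())
        loss -= log_prob
    return loss / len(assignments)
```

Embeddings that sit close to their own cluster center and far from the others drive the loss toward zero, which is the geometry the clustering-based module encourages.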
2. Reinforcement Learning-Based Weight Selection
Weights $w_1, w_2, w_3, w_4$ modulate module contributions for a specific dataset. Weight selection is formulated as a one-step Markov decision process:
- State: compact vector of dataset statistics, e.g., ratio of “[TARGET]” tags, outlier rate, implicit hate ratio.
- Action: weight vector $w = (w_1, w_2, w_3, w_4)$ in the simplex over the four modules (nonnegative, summing to one).
- Policy: $\pi_\phi$, parameterized via a two-layer MLP, outputs Dirichlet or softmax pre-weights.
- Reward: macro-F1 score on the validation set for predictions made with weights $w$:
$$r(w) = \mathrm{macroF1}\big(\hat{y}_w, y_{\mathrm{val}}\big)$$
- Optimization: Proximal Policy Optimization (PPO) is used, with surrogate loss
$$L^{\mathrm{CLIP}}(\phi) = \mathbb{E}_t\Big[\min\big(r_t(\phi)\,\hat{A}_t,\ \mathrm{clip}\big(r_t(\phi),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t\big)\Big]$$
where $r_t(\phi)$ is the ratio $\pi_\phi(a_t \mid s_t)/\pi_{\phi_{\mathrm{old}}}(a_t \mid s_t)$ and $\hat{A}_t$ is the advantage estimate.
A single PPO policy is trained for 10,000 steps to select the optimal $w$ per dataset.
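The clipped surrogate can be computed with a small helper over sampled probability ratios and advantage estimates; the epsilon value below is illustrative, not the paper's setting.

```python
import numpy as np

def ppo_clip_objective(ratios, advantages, eps=0.2):
    """PPO clipped surrogate: mean of min(r*A, clip(r, 1-eps, 1+eps)*A).

    ratios: pi_new(a|s) / pi_old(a|s) per sampled action
    advantages: advantage estimates (e.g. from GAE) per action
    """
    r = np.asarray(ratios, dtype=float)
    a = np.asarray(advantages, dtype=float)
    unclipped = r * a
    clipped = np.clip(r, 1.0 - eps, 1.0 + eps) * a
    # Taking the elementwise min gives a pessimistic bound that removes the
    # incentive to push the policy ratio outside [1-eps, 1+eps].
    return float(np.minimum(unclipped, clipped).mean())
```

Note the asymmetry: a large ratio with positive advantage is clipped at $1+\epsilon$, while a large ratio with negative advantage is not, since the min keeps the worse value.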
3. Ensemble Voting and Prediction
For a sample $x$, each module $M_m$ outputs logits $z_{m,c}(x)$ for classes $c \in \{0, 1\}$. The ensemble logit for class $c$ is
$$z_c(x) = \sum_{m=1}^{4} w_m\, z_{m,c}(x).$$
The final prediction is
$$\hat{y}(x) = \arg\max_{c \in \{0,1\}} z_c(x).$$
Since weights are nonnegative and sum to one, further normalization is unnecessary.
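The voting step reduces to a weighted sum of per-module logits followed by an argmax. A minimal sketch (the logit values in the usage are hypothetical):

```python
import numpy as np

def ensemble_predict(module_logits, weights):
    """Weighted soft voting over module logits (sketch).

    module_logits: (4, 2) array-like, one two-class logit row per module
    weights: length-4 nonnegative weights summing to one
    Returns the predicted class index.
    """
    z = np.asarray(module_logits, dtype=float)
    w = np.asarray(weights, dtype=float)
    ensemble = w @ z  # (2,) weighted sum of logits per class
    return int(np.argmax(ensemble))
```

Because the argmax is invariant to positive rescaling of the summed logits, the constraint that the weights already sum to one is what makes further normalization unnecessary.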
4. Training Procedure and Dataset Adaptation
Each module is independently trained for six epochs on the target dataset via the objective
$$\mathcal{L}_m = \mathcal{L}_{\mathrm{con}}^{(m)} + \lambda\, \mathcal{L}_{\mathrm{CE}},$$
where $\mathcal{L}_{\mathrm{con}}^{(m)}$ is the module-specific contrastive loss, $\mathcal{L}_{\mathrm{CE}}$ is cross-entropy with the ground-truth label $y$, and $\lambda$ balances the two terms.
After training and freezing module parameters, PPO optimizes the voting weights $w$ based on the dataset-specific state. The learned policy generates test-time weights $w^{\ast}$, with macro-F1 evaluated on the test partition. This two-stage approach (independent module adaptation, then ensemble weight optimization) ensures both flexibility and dataset sensitivity.
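Since macro-F1 serves as both the PPO reward and the evaluation metric, it is worth pinning down. A minimal binary macro-F1, written from the standard definition rather than the paper's code:

```python
def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Macro-averaged F1: unweighted mean of per-class F1 scores."""
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)
```

Because each class contributes equally regardless of its frequency, macro-F1 rewards the ensemble for handling the (often rarer) hateful class, which is why it is the standard metric for these imbalanced benchmarks.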
5. Interpretability and Attribution
RV-HATE’s learned weights $w_m$ provide quantitative attribution of module importance for each dataset. On the IHC dataset, for example, the mean learned weights rank the relative contributions of $M_1$ through $M_4$. Ablation, zeroing out each $w_m$ in turn and renormalizing the rest, directly quantifies the macro-F1 impact of every module. Together, these two metrics (module weights and ablation performance) yield interpretable insights into which linguistic or contextual properties are most predictive per corpus. This suggests that RV-HATE not only adapts to, but also exposes, data-specific cues and vulnerabilities.
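The zero-and-renormalize ablation is simple enough to sketch directly; the weight values used in the example are placeholders, not the reported IHC weights.

```python
import numpy as np

def ablate_module(weights, m):
    """Zero out module m's weight and renormalize the rest to sum to one."""
    w = np.asarray(weights, dtype=float).copy()
    w[m] = 0.0
    total = w.sum()
    if total == 0:
        raise ValueError("cannot renormalize: all remaining weights are zero")
    return w / total
```

Running the frozen ensemble with `ablate_module(w, m)` for each $m$ and comparing macro-F1 against the full ensemble isolates each module's contribution.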
6. Empirical Results and Dataset Coverage
The framework was evaluated on five English hate speech benchmarks:
| Dataset | Instances | Characteristics |
|---|---|---|
| IHC | 22 K | Implicit hate with human-written implied statements (tweets) |
| SBIC | 150 K | Offensiveness and target-entity labels |
| DYNA | ~41 K | Adversarially constructed hate speech |
| HatEval | 13 K | Targets: immigrants/women, Twitter-based |
| Toxigen | 6 K | Machine-generated toxic/benign examples |
Performance comparison (macro-F1; average of 3 seeds):
| Model | IHC | SBIC | DYNA | HatEval | Toxigen | Avg |
|---|---|---|---|---|---|---|
| CE | 77.70 | 83.80 | 78.80 | 81.11 | 90.06 | 82.29 |
| SCL | 77.81 | 82.92 | 80.39 | 81.28 | 90.75 | 82.63 |
| SharedCon (SOTA) | 78.50 | 84.30 | 79.10 | 80.24 | 91.21 | 82.67 |
| LAHN | 78.40 | 83.98 | 79.64 | 80.42 | 90.42 | 82.57 |
| RV-HATE | 79.07 | 84.62 | 81.82 | 83.44 | 93.41 | 84.47 |
RV-HATE yields a mean improvement of +1.8 percentage points in macro-F1 over SharedCon, indicating the efficacy of dataset-aware modular weighting.
7. Technical Configuration and Resources
Key implementation specifications include:
- Backbone: BERT-base-uncased (110M parameters)
- Embeddings for contrastive sampling: SimCSE (unsupervised)
- Optimizer: AdamW with one of two learning rates, batch size 32
- Hyperparameters: contrastive temperature $\tau$, loss balance $\lambda$, cluster count $k$
- PPO: 10,000 steps, clipping parameter $\epsilon$, advantage via GAE, policy MLP with 2 layers (64 units)
- Hardware: NVIDIA RTX 4090, 3 random seeds
A plausible implication is that the modular and dataset-conditioned design of RV-HATE is well-suited to fields characterized by substantial domain and distribution drift, as both its architecture and adaptation procedure are systematized to expose and leverage dataset-specific linguistic phenomena (Lee et al., 13 Oct 2025).