
RV-HATE: Modular Ensemble for Hate Speech

Updated 3 April 2026
  • RV-HATE is a modular ensemble framework for implicit hate speech detection that integrates reinforcement learning-based weight selection to adapt to diverse dataset characteristics.
  • It combines four specialized modules—clustering-based contrastive learning, target-tagging, outlier removal, and hard negative sampling—each fine-tuned on dataset-specific cues.
  • The framework leverages PPO to optimize ensemble weights, resulting in improved macro-F1 scores and interpretable attributions of module importance per dataset.

RV-HATE is a modular ensemble framework for implicit hate speech detection that employs reinforcement-learned soft voting to optimize dataset-specific performance. The architecture is explicitly designed to address the heterogeneity of hate speech datasets, which arise from divergent linguistic patterns, social contexts, and annotation schemes. By integrating multiple specialized modules and adapting their ensemble weights to each target dataset via policy optimization, RV-HATE achieves both improved classification accuracy and quantitative interpretability with respect to critical features for a given corpus (Lee et al., 13 Oct 2025).

1. Multi-Module Architecture

RV-HATE comprises four distinct modules, each producing a two-class logit vector $z_k(x) = [z_k^{(0)}(x), z_k^{(1)}(x)]$ for input text $x$. Modules are independently fine-tuned and are as follows:

  • $M_0$: Clustering-based Contrastive Learning
    • Input: raw sentence $x$
    • Encoder: BERT-base generates embedding $h_0(x) \in \mathbb{R}^d$
    • Clustering: training embeddings are clustered into $K$ clusters; the center $c_j$ is the cluster mean
    • Anchor selection: $a_j = \operatorname{argmax}_x \cos(h_0(x), c_j)$
    • Contrastive loss (SharedCon, cosine):

    $$L_0 = -\frac{1}{N} \sum_{i=1}^N \log \frac{\exp(\cos(h_0(x_i), h_0(x_p))/\tau)}{\sum_{n \neq i} \exp(\cos(h_0(x_i), h_0(x_n))/\tau)}$$

    where $x_p$ is a positive pair for anchor $x_i$ and $\tau$ is the temperature.

  • $M_1$: Target-Tagging with [TARGET] Tokens
    • Input: $x$ with NER-derived “[TARGET]” spans marking ORG/NORP/GPE entities (spaCy + GPT-4o)
    • Encoder: BERT-base, contrastive objective $L_1$ defined as $L_0$, but computed on the tagged input.
  • $M_2$: Outlier Removal within Clusters
    • Procedure as in $M_0$, but with outliers removed: each embedding's distance to its cluster center is computed, and samples whose distance exceeds a threshold are excluded before computing the loss $L_2$.
  • $M_3$: Hard Negative Sampling
    • Maintains a queue $Q$ of hard negatives (samples of the opposing class with high similarity, or false positives with high confidence)
    • Contrastive loss $L_3$ draws negatives from both the batch and $Q$.

Each module outputs logits via a classification head on top of its encoder.
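The clustering, anchor-selection, and contrastive-loss steps above can be sketched as follows. This is a minimal illustration, not the paper's implementation: random vectors stand in for BERT embeddings, and the cluster labels and positive pairings are made up.

```python
# Sketch of module M_0's clustering-based contrastive learning.
# Random embeddings stand in for BERT-base outputs; names are illustrative.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cluster_anchors(embs, labels, K):
    """One anchor per cluster: a_j = argmax_x cos(h(x), c_j),
    where c_j is the cluster mean."""
    anchors = {}
    for j in range(K):
        idx = np.where(labels == j)[0]
        center = embs[idx].mean(axis=0)
        sims = [cosine(embs[i], center) for i in idx]
        anchors[j] = int(idx[int(np.argmax(sims))])
    return anchors

def contrastive_loss(embs, pos_pairs, tau=0.1):
    """L_0: negative mean log-softmax over cosine similarities;
    the positive comes from the same cluster, negatives are the
    remaining batch items (n != i)."""
    N = len(embs)
    losses = []
    for i, p in pos_pairs.items():
        num = np.exp(cosine(embs[i], embs[p]) / tau)
        den = sum(np.exp(cosine(embs[i], embs[n]) / tau)
                  for n in range(N) if n != i)
        losses.append(-np.log(num / den))
    return float(np.mean(losses))

rng = np.random.default_rng(0)
embs = rng.normal(size=(8, 16))            # stand-in embeddings
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # stand-in cluster labels
anchors = cluster_anchors(embs, labels, K=2)
# pair each anchor with another member of its own cluster as its positive
pos = {anchors[0]: [i for i in range(4) if i != anchors[0]][0],
       anchors[1]: [i for i in range(4, 8) if i != anchors[1]][0]}
loss = contrastive_loss(embs, pos)
```

The loss is strictly positive here because the denominator always contains the positive term plus additional negatives.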

2. Reinforcement Learning-Based Weight Selection

Weights $w_0, w_1, w_2, w_3$ modulate the module contributions for a specific dataset. Weight selection is formulated as a one-step Markov decision process:

  • State: compact vector of dataset statistics, e.g., ratio of “[TARGET]” tags, outlier rate, implicit hate ratio.
  • Action: weights $w = (w_0, \dots, w_3)$ on the probability simplex (four nonnegative weights summing to one).
  • Policy: $\pi_\theta$, parameterized as a two-layer MLP, outputs Dirichlet or softmax pre-weights.
  • Reward: macro-F1 score on the validation set for predictions made with weights $w$:

$$R(w) = \mathrm{F1}_{\mathrm{macro}}(\mathcal{D}_{\mathrm{val}}; w)$$

$$\mathrm{F1}_{\mathrm{macro}} = \frac{1}{|C|} \sum_{c \in C} \frac{2\, P_c R_c}{P_c + R_c}$$

where $P_c$ is the ratio of true positives to predicted positives for class $c$ (precision) and $R_c$ is the corresponding recall.

A single PPO policy is trained for 10,000 steps to select the optimal $w$ per dataset.
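The one-step decision process can be sketched as below, assuming illustrative state features and a single linear layer standing in for the policy MLP; the PPO update itself is omitted, and all shapes and values are made up.

```python
# Sketch of the one-step MDP for ensemble-weight selection:
# state (dataset statistics) -> policy -> simplex weights -> macro-F1 reward.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Macro-F1: unweighted mean of per-class F1 scores."""
    f1s = []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(f1s))

def reward(weights, module_logits, y_true):
    """R(w): macro-F1 of the w-weighted soft vote on validation data."""
    ens = np.tensordot(weights, module_logits, axes=1)  # (N, 2)
    return macro_f1(y_true, ens.argmax(axis=1))

rng = np.random.default_rng(1)
state = np.array([0.3, 0.1, 0.5])   # e.g. [TARGET] ratio, outlier rate, implicit ratio
W = rng.normal(size=(4, 3)) * 0.1   # linear stand-in for the two-layer policy MLP
weights = softmax(W @ state)        # action: a point on the simplex
module_logits = rng.normal(size=(4, 20, 2))  # 4 modules x 20 val samples x 2 classes
y_true = rng.integers(0, 2, size=20)
r = reward(weights, module_logits, y_true)
```

In the full framework, PPO adjusts the policy parameters so that sampled actions maximize this reward for the target dataset.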

3. Ensemble Voting and Prediction

For a sample $x$, each module $M_k$ outputs logits $z_k^{(c)}(x)$ for classes $c \in \{0, 1\}$. The ensemble logit for class $c$ is

$$z^{(c)}(x) = \sum_{k=0}^{3} w_k\, z_k^{(c)}(x)$$

Final prediction is

$$\hat{y}(x) = \operatorname{argmax}_{c \in \{0,1\}} z^{(c)}(x)$$

Since weights are nonnegative and sum to one, further normalization is unnecessary.
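The weighted soft vote is a one-line computation; the weights and logits below are made-up values for illustration.

```python
# Weighted soft voting: z^(c)(x) = sum_k w_k z_k^(c)(x), then argmax over c.
import numpy as np

def ensemble_predict(weights, logits):
    """Combine per-module logits with simplex weights and predict the
    class with the largest ensemble logit."""
    z = np.tensordot(weights, logits, axes=1)  # (N, C)
    return z.argmax(axis=1)

w = np.array([0.4, 0.3, 0.2, 0.1])             # nonnegative, sums to one
logits = np.array([[[0.2, 0.8]], [[0.6, 0.4]],  # 4 modules, 1 sample, 2 classes
                   [[0.1, 0.9]], [[0.7, 0.3]]])
pred = ensemble_predict(w, logits)              # ensemble logit [0.35, 0.65]
# -> predicts class 1
```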

4. Training Procedure and Dataset Adaptation

Each module is independently trained for six epochs on the target dataset via the objective

$$L_k^{\mathrm{total}} = L_k^{\mathrm{con}} + \lambda\, L_{\mathrm{CE}}$$

where $L_k^{\mathrm{con}}$ is the module-specific contrastive loss, $L_{\mathrm{CE}}$ is cross-entropy with the ground-truth label $y$, and $\lambda$ is a balancing coefficient.

After training and freezing module parameters, PPO optimizes the voting weights $w$ based on the dataset-specific state. The learned policy generates test-time weights $w$, with macro-F1 evaluated on the test partition. This two-stage approach (independent module adaptation, then ensemble weight optimization) ensures both flexibility and dataset sensitivity.
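The per-module objective above can be sketched as follows; the balancing coefficient, logits, and loss values are stand-ins, not the paper's settings.

```python
# Sketch of the per-module training objective:
# L_total = L_contrastive + lambda * L_cross_entropy.
import numpy as np

def cross_entropy(logits, y):
    """Mean negative log-likelihood of the true labels under softmax."""
    z = logits - logits.max(axis=1, keepdims=True)       # numerical stability
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-logp[np.arange(len(y)), y].mean())

def total_loss(contrastive, logits, y, lam=1.0):
    """L_k^total = L_k^con + lam * L_CE (lam is a stand-in value)."""
    return contrastive + lam * cross_entropy(logits, y)

logits = np.array([[2.0, -1.0], [0.5, 1.5]])  # classification-head outputs
y = np.array([0, 1])                           # ground-truth labels
loss = total_loss(0.7, logits, y, lam=0.5)     # 0.7 = example contrastive term
```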

5. Interpretability and Attribution

RV-HATE’s learned weights $w_k$ provide a quantitative attribution of module importance for each dataset. On the IHC dataset, for example, the mean learned weights rank the relative contributions of $M_0$ through $M_3$. Ablation, by zeroing out each $w_k$ and renormalizing the remainder, directly quantifies the macro-F1 impact of every module. Together, the module weights and ablation performance yield interpretable insights into which linguistic or contextual properties are most predictive per corpus. This suggests that RV-HATE not only adapts to, but also exposes, dataset-specific cues and vulnerabilities.
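The zero-and-renormalize ablation reduces to a short helper; the weight vector below is hypothetical.

```python
# Ablation for module attribution: zero out one module's weight,
# renormalize the rest back onto the simplex, then re-evaluate the ensemble.
import numpy as np

def ablate(weights, k):
    """Return the weight vector with module k removed and the
    remaining weights renormalized to sum to one."""
    w = weights.copy()
    w[k] = 0.0
    return w / w.sum()

w = np.array([0.4, 0.3, 0.2, 0.1])            # hypothetical learned weights
ablated = {k: ablate(w, k) for k in range(4)}  # one ablated ensemble per module
```

Evaluating macro-F1 with each `ablated[k]` and comparing against the full ensemble gives the per-module performance impact described above.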

6. Empirical Results and Dataset Coverage

The framework was evaluated on five English hate speech benchmarks:

| Dataset | Instances | Characteristics |
|---------|-----------|-----------------|
| IHC | 22K | Implicit hate with human-written implications (tweets) |
| SBIC | 150K | Offensiveness and target-entity labels |
| DYNA | ~41K | Adversarially constructed hate speech |
| HatEval | 13K | Targets: immigrants/women; Twitter-based |
| ToxiGen | 6K | Machine-generated toxic/benign examples |

Performance comparison (macro-F1; average of 3 seeds):

| Model | IHC | SBIC | DYNA | HatEval | ToxiGen | Avg |
|-------|-----|------|------|---------|---------|-----|
| CE | 77.70 | 83.80 | 78.80 | 81.11 | 90.06 | 82.29 |
| SCL | 77.81 | 82.92 | 80.39 | 81.28 | 90.75 | 82.63 |
| SharedCon (SOTA) | 78.50 | 84.30 | 79.10 | 80.24 | 91.21 | 82.67 |
| LAHN | 78.40 | 83.98 | 79.64 | 80.42 | 90.42 | 82.57 |
| RV-HATE | 79.07 | 84.62 | 81.82 | 83.44 | 93.41 | 84.47 |

RV-HATE yields a mean improvement of +1.8 percentage points in macro-F1 over SharedCon, indicating the efficacy of dataset-aware modular weighting.

7. Technical Configuration and Resources

Key implementation specifications include:

  • Backbone: BERT-base-uncased (110M parameters)
  • Embeddings for contrastive sampling: SimCSE (unsupervised)
  • Optimizer: AdamW, batch size 32
  • Contrastive temperature $\tau$; cluster count $K$
  • PPO: 10,000 steps, clipped surrogate objective, advantage estimation via GAE, policy MLP with 2 layers (64 units each)
  • Hardware: NVIDIA RTX 4090, 3 random seeds

A plausible implication is that the modular and dataset-conditioned design of RV-HATE is well-suited to fields characterized by substantial domain and distribution drift, as both its architecture and adaptation procedure are systematized to expose and leverage dataset-specific linguistic phenomena (Lee et al., 13 Oct 2025).
