Mitigating Biases in Toxic Language Detection through Invariant Rationalization (2106.07240v1)

Published 14 Jun 2021 in cs.CL

Abstract: Automatic detection of toxic language plays an essential role in protecting social media users, especially minority groups, from verbal abuse. However, biases toward certain attributes, including gender, race, and dialect, exist in most training datasets for toxicity detection. These biases make the learned models unfair and can even exacerbate the marginalization of the affected groups. Since current debiasing methods for general natural language understanding tasks cannot effectively mitigate the biases in toxicity detectors, we propose to use invariant rationalization (InvRat), a game-theoretic framework consisting of a rationale generator and a predictor, to rule out spurious correlations between certain syntactic patterns (e.g., identity mentions, dialect) and toxicity labels. We empirically show that our method yields lower false positive rates on both lexical and dialectal attributes than previous debiasing methods.
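For context, the InvRat framework the abstract refers to (Chang et al., 2020) trains a rationale generator jointly with two predictors: one that sees only the rationale and one that additionally sees an environment variable (here, a spurious attribute such as dialect or an identity mention). A rough sketch of that framework's objective, under the assumption that this paper adopts it largely unchanged, is:

```latex
\min_{g,\,f_i}\;
\mathcal{L}\bigl(Y;\,f_i(Z)\bigr)
\;+\;
\lambda\, h\!\Bigl(\mathcal{L}\bigl(Y;\,f_i(Z)\bigr)
 \;-\; \min_{f_e}\,\mathcal{L}\bigl(Y;\,f_e(Z,E)\bigr)\Bigr),
\qquad Z = g(X),\quad h(t) = \max(0, t),
```

where $X$ is the input text, $Z$ the generated rationale, $Y$ the toxicity label, $E$ the environment (spurious attribute), and $\mathcal{L}$ a classification loss. The generator $g$ is penalized whenever the environment-aware predictor $f_e$ beats the environment-agnostic predictor $f_i$, which pushes the rationale toward satisfying $Y \perp E \mid Z$, i.e., carrying no environment-specific (spurious) signal.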

Authors (7)
  1. Yung-Sung Chuang (37 papers)
  2. Mingye Gao (13 papers)
  3. Hongyin Luo (31 papers)
  4. James Glass (173 papers)
  5. Hung-yi Lee (327 papers)
  6. Yun-Nung Chen (104 papers)
  7. Shang-Wen Li (55 papers)
Citations (11)