
Investigating Annotator Bias in Large Language Models for Hate Speech Detection (2406.11109v5)

Published 17 Jun 2024 in cs.CL, cs.AI, and cs.LG

Abstract: Data annotation, the practice of assigning descriptive labels to raw data, is pivotal in optimizing the performance of machine learning models. However, it is a resource-intensive process susceptible to biases introduced by annotators. The emergence of sophisticated LLMs presents a unique opportunity to modernize and streamline this complex procedure. While existing research extensively evaluates the efficacy of LLMs as annotators, this paper examines the biases present in LLMs when annotating hate speech data. Our research contributes to understanding biases in four key categories (gender, race, religion, and disability) across four LLMs: GPT-3.5, GPT-4o, Llama-3.1, and Gemma-2. We analyze annotator biases by specifically targeting highly vulnerable groups within these categories. Furthermore, we examine potential factors contributing to these biases by scrutinizing the annotated data. To conduct this research, we introduce our custom hate speech detection dataset, HateBiasNet. For comparative analysis, we also perform the same experiments on the ETHOS dataset (Mollas et al. 2022). This paper serves as a resource to guide researchers and practitioners in harnessing the potential of LLMs for data annotation, thereby fostering advancements in this critical field.

Authors (15)
  1. Amit Das (28 papers)
  2. Zheng Zhang (486 papers)
  3. Fatemeh Jamshidi (4 papers)
  4. Vinija Jain (42 papers)
  5. Aman Chadha (109 papers)
  6. Nilanjana Raychawdhary (2 papers)
  7. Mary Sandage (2 papers)
  8. Lauramarie Pope (2 papers)
  9. Gerry Dozier (7 papers)
  10. Cheryl Seals (3 papers)
  11. Najib Hasan (1 paper)
  12. Souvika Sarkar (10 papers)
  13. Tathagata Bhattacharya (2 papers)
  14. Mostafa Rahgouy (6 papers)
  15. Dongji Feng (11 papers)
Citations (1)