Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
41 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Diagnosing and Debiasing Corpus-Based Political Bias and Insults in GPT2 (2311.10266v1)

Published 17 Nov 2023 in cs.CL

Abstract: The training of LLMs on extensive, unfiltered corpora sourced from the internet is a common and advantageous practice. Consequently, LLMs have learned and inadvertently reproduced various types of biases, including violent, offensive, and toxic language. However, recent research shows that generative pretrained transformer (GPT) LLMs can recognize their own biases and detect toxicity in generated content, a process referred to as self-diagnosis. In response, researchers have developed a decoding algorithm that allows LLMs to self-debias, or reduce their likelihood of generating harmful text. This study investigates the efficacy of the diagnosing-debiasing approach in mitigating two additional types of biases: insults and political bias. These biases are often used interchangeably in discourse, despite exhibiting potentially dissimilar semantic and syntactic properties. We aim to contribute to the ongoing effort of investigating the ethical and social implications of human-AI interaction.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (3)
  1. Ambri Ma (1 paper)
  2. Arnav Kumar (3 papers)
  3. Brett Zeligson (1 paper)
Citations (1)