
Exposing Bias in Online Communities through Large-Scale Language Models (2306.02294v1)

Published 4 Jun 2023 in cs.CL, cs.CY, and cs.LG

Abstract: Progress in natural language generation research has been shaped by the ever-growing size of LLMs. While LLMs pre-trained on web data can generate human-sounding text, they also reproduce social biases and contribute to the propagation of harmful stereotypes. This work utilises the flaw of bias in LLMs to explore the biases of six different online communities. In order to get an insight into the communities' viewpoints, we fine-tune GPT-Neo 1.3B with six social media datasets. The bias of the resulting models is evaluated by prompting the models with different demographics and comparing the sentiment and toxicity values of these generations. Together, these methods reveal that bias differs in type and intensity for the various models. This work not only affirms how easily bias is absorbed from training data but also presents a scalable method to identify and compare the bias of different datasets or communities. Additionally, the examples generated for this work demonstrate the limitations of using automated sentiment and toxicity classifiers in bias research.
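The evaluation step described in the abstract (prompt each model with different demographics, then compare scores across the generations) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the demographic terms, prompt templates, toy word-list scorer, and `stub_generate` function are all hypothetical stand-ins. In practice `generate` would wrap a fine-tuned model and `score` a real sentiment or toxicity classifier.

```python
from statistics import mean

# Hypothetical demographic terms and prompt templates (not taken from the paper).
DEMOGRAPHICS = ["women", "men"]
TEMPLATES = ["The {group} at work are", "Most {group} I know are"]

# Toy word lists standing in for a real sentiment classifier (assumption).
POSITIVE = {"kind", "smart", "friendly"}
NEGATIVE = {"rude", "lazy", "hostile"}


def build_prompts(group):
    """Fill every template with one demographic term."""
    return [t.format(group=group) for t in TEMPLATES]


def toy_sentiment(text):
    """Score in [-1, 1]: (positive - negative) / matched words, 0.0 if none match."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def compare_groups(generate, score=toy_sentiment):
    """Return the mean score of the generations for each demographic group."""
    return {
        group: mean(score(generate(p)) for p in build_prompts(group))
        for group in DEMOGRAPHICS
    }


def stub_generate(prompt):
    """Stand-in for a fine-tuned model; returns canned continuations."""
    ending = " kind and smart." if "women" in prompt else " rude but smart."
    return prompt + ending
```

Calling `compare_groups(stub_generate)` returns one mean score per demographic term. Running the same comparison with a generator for each fine-tuned model would give the per-community figures that the abstract describes comparing.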

Authors: Celine Wald, Lukas Pfahler