
FairMonitor: A Four-Stage Automatic Framework for Detecting Stereotypes and Biases in Large Language Models (2308.10397v2)

Published 21 Aug 2023 in cs.CL and cs.AI

Abstract: Detecting stereotypes and biases in LLMs can enhance fairness and reduce adverse impacts on individuals or groups when these LLMs are applied. However, most existing methods measure a model's preference toward sentences containing biases and stereotypes within datasets, an approach that lacks interpretability and cannot detect implicit biases and stereotypes in the real world. To address this gap, this paper introduces a four-stage framework to directly evaluate stereotypes and biases in the generated content of LLMs: direct inquiry testing, serial or adapted story testing, implicit association testing, and unknown situation testing. Additionally, the paper proposes multi-dimensional evaluation metrics and explainable zero-shot prompts for automated evaluation. Using the education sector as a case study, we constructed the Edu-FairMonitor based on the four-stage framework, which encompasses 12,632 open-ended questions covering nine sensitive factors and 26 educational scenarios. Experimental results reveal varying degrees of stereotypes and biases in five LLMs evaluated on Edu-FairMonitor. Moreover, the results of our proposed automated evaluation method show a high correlation with human annotations.
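The four-stage evaluation described in the abstract can be sketched as a simple scoring loop. This is a hypothetical illustration only: the stage names come from the abstract, but `judge_response`, the sensitive-term check, and the dataset shape are placeholder assumptions, not the authors' actual implementation (which uses explainable zero-shot prompts to an evaluator model).

```python
# Hypothetical sketch of a FairMonitor-style four-stage evaluation loop.
# Stage names are from the paper's abstract; the judge function and data
# layout are illustrative placeholders, not the authors' implementation.

STAGES = [
    "direct_inquiry",
    "serial_adapted_story",
    "implicit_association",
    "unknown_situation",
]

def judge_response(stage: str, question: str, response: str) -> float:
    """Stand-in for the paper's explainable zero-shot prompt scoring.
    Here we trivially flag responses containing a sensitive group term;
    the real framework scores along multi-dimensional metrics."""
    sensitive_terms = {"boys", "girls"}  # illustrative only
    return 1.0 if any(t in response.lower() for t in sensitive_terms) else 0.0

def evaluate(model_fn, questions_by_stage: dict) -> dict:
    """Average bias score per stage for a model given as a callable
    mapping an open-ended question to its generated answer."""
    scores = {}
    for stage in STAGES:
        questions = questions_by_stage.get(stage, [])
        if not questions:
            scores[stage] = 0.0
            continue
        total = sum(judge_response(stage, q, model_fn(q)) for q in questions)
        scores[stage] = total / len(questions)
    return scores
```

A stage's score here is simply the fraction of flagged answers; the paper instead reports graded, multi-dimensional metrics validated against human annotations.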

Authors (6)
  1. Yanhong Bai
  2. Jiabao Zhao
  3. Jinxin Shi
  4. Tingjiang Wei
  5. Xingjiao Wu
  6. Liang He