Generalizing Fairness to Generative Language Models via Reformulation of Non-discrimination Criteria (2403.08564v3)
Abstract: Generative AI, such as large language models (LLMs), has developed rapidly in recent years. As these models become increasingly available to the public, concerns arise that they may perpetuate and amplify harmful biases in downstream applications. Gender stereotypes, whether they take the form of misrepresentation or discrimination, can be harmful and limiting for the individuals they target. Recognizing gender bias as a pervasive societal construct, this paper studies how to uncover and quantify the presence of gender biases in generative LLMs. In particular, we derive generative AI analogues of three well-known non-discrimination criteria from classification, namely independence, separation and sufficiency. To demonstrate these criteria in action, we design prompts for each criterion with a focus on occupational gender stereotypes, specifically utilizing a medical test setting to introduce ground truth in the generative AI context. Our results reveal the presence of occupational gender bias within such conversational LLMs.
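For context, the three criteria named in the abstract have standard formulations in the classification setting from which the paper's generative analogues are derived. Below is a minimal sketch of those classical definitions, writing $A$ for the sensitive attribute (e.g., gender), $Y$ for the ground-truth label, and $R$ for the classifier's prediction; the paper's own reformulations for generative models are not reproduced here.

```latex
% Classical non-discrimination criteria for a classifier
% (standard textbook formulations, stated as conditional-independence
%  requirements; the paper's generative reformulations are not shown).
% A: sensitive attribute (e.g., gender), Y: ground-truth label, R: prediction.
\begin{align*}
  \text{Independence:} \quad & R \perp A
    && \Leftrightarrow \Pr(R = r \mid A = a) = \Pr(R = r \mid A = a') \\
  \text{Separation:}   \quad & R \perp A \mid Y
    && \Leftrightarrow \Pr(R = r \mid Y = y, A = a) = \Pr(R = r \mid Y = y, A = a') \\
  \text{Sufficiency:}  \quad & Y \perp A \mid R
    && \Leftrightarrow \Pr(Y = y \mid R = r, A = a) = \Pr(Y = y \mid R = r, A = a')
\end{align*}
```

Intuitively, independence requires predictions to be identically distributed across groups, separation allows group differences only insofar as they are explained by the true label, and sufficiency requires the prediction to carry the same evidential value about the label for every group.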
- Sara Sterlie (1 paper)
- Nina Weng (10 papers)
- Aasa Feragen (46 papers)