Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 77 tok/s
Gemini 2.5 Pro 56 tok/s Pro
GPT-5 Medium 34 tok/s Pro
GPT-5 High 35 tok/s Pro
GPT-4o 103 tok/s Pro
Kimi K2 208 tok/s Pro
GPT OSS 120B 462 tok/s Pro
Claude Sonnet 4.5 36 tok/s Pro
2000 character limit reached

Rejected Dialects: Biases Against African American Language in Reward Models (2502.12858v1)

Published 18 Feb 2025 in cs.CL, cs.AI, and cs.CY

Abstract: Preference alignment via reward models helps build safe, helpful, and reliable LLMs. However, subjectivity in preference judgments and the lack of representative sampling in preference data collection can introduce new biases, hindering reward models' fairness and equity. In this work, we introduce a framework for evaluating dialect biases in reward models and conduct a case study on biases against African American Language (AAL) through several experiments comparing reward model preferences and behavior on paired White Mainstream English (WME) and both machine-translated and human-written AAL corpora. We show that reward models are less aligned with human preferences when processing AAL texts vs. WME ones (-4\% accuracy on average), frequently disprefer AAL-aligned texts vs. WME-aligned ones, and steer conversations toward WME, even when prompted with AAL texts. Our findings provide a targeted analysis of anti-AAL biases at a relatively understudied stage in LLM development, highlighting representational harms and ethical questions about the desired behavior of LLMs concerning AAL.

Summary

We haven't generated a summary for this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

X Twitter Logo Streamline Icon: https://streamlinehq.com

Tweets

This paper has been mentioned in 2 posts and received 20 likes.

Youtube Logo Streamline Icon: https://streamlinehq.com