Papers

Topics

Authors

Recent

View all

Detailed Answer

Quick Answer

Concise responses based on abstracts only

Detailed Answer

Well-researched responses based on abstracts and relevant paper content.

Custom Instructions Pro

Preferences or requirements that you'd like Emergent Mind to consider when generating responses

Gemini 2.5 Flash

Gemini 2.5 Flash 82 tok/s

Gemini 2.5 Pro 47 tok/s Pro

GPT-5 Medium 14 tok/s Pro

GPT-5 High 16 tok/s Pro

GPT-4o 117 tok/s Pro

Kimi K2 200 tok/s Pro

GPT OSS 120B 469 tok/s Pro

Claude Sonnet 4 36 tok/s Pro

2000 character limit reached

On the Effectiveness and Generalization of Race Representations for Debiasing High-Stakes Decisions (2504.06303v1)

Published 7 Apr 2025 in cs.CY, cs.AI, cs.CL, and cs.LG

Abstract: Understanding and mitigating biases is critical for the adoption of LLMs in high-stakes decision-making. We introduce Admissions and Hiring, decision tasks with hypothetical applicant profiles where a person's race can be inferred from their name, as simplified test beds for racial bias. We show that Gemma 2B Instruct and LLaMA 3.2 3B Instruct exhibit strong biases. Gemma grants admission to 26% more White than Black applicants, and LLaMA hires 60% more Asian than White applicants. We demonstrate that these biases are resistant to prompt engineering: multiple prompting strategies all fail to promote fairness. In contrast, using distributed alignment search, we can identify "race subspaces" within model activations and intervene on them to debias model decisions. Averaging the representation across all races within the subspaces reduces Gemma's bias by 37-57%. Finally, we examine the generalizability of Gemma's race subspaces, and find limited evidence for generalization, where changing the prompt format can affect the race representation. Our work suggests mechanistic approaches may provide a promising venue for improving the fairness of LLMs, but a universal race representation remains elusive.