What Changed? Investigating Debiasing Methods using Causal Mediation Analysis (2206.00701v1)

Published 1 Jun 2022 in cs.CL and cs.AI

Abstract: Previous work has examined how debiasing LLMs affects downstream tasks, specifically how debiasing techniques influence task performance and whether debiased models also make impartial predictions in downstream tasks. However, what we don't yet understand well is why debiasing methods have varying impacts on downstream tasks and how debiasing techniques affect internal components of LLMs, i.e., neurons, layers, and attention heads. In this paper, we decompose the internal mechanisms of debiasing LLMs with respect to gender by applying causal mediation analysis to understand the influence of debiasing methods on toxicity detection as a downstream task. Our findings suggest a need to test the effectiveness of debiasing methods with different bias metrics, and to focus on changes in the behavior of certain components of the models, e.g., the first two layers of LLMs and attention heads.

Authors (2)
  1. Sullam Jeoung (8 papers)
  2. Jana Diesner (21 papers)
Citations (7)
