
Causal ATE Mitigates Unintended Bias in Controlled Text Generation (2311.11229v2)

Published 19 Nov 2023 in cs.CL

Abstract: We study attribute control in language models (LMs) through the method of Causal Average Treatment Effect (Causal ATE). Existing methods for the attribute control task in LMs check for the co-occurrence of words in a sentence with the attribute of interest and control for them. However, spurious correlation of words with the attribute in the training dataset can cause models to hallucinate the presence of the attribute when presented with the spurious correlate during inference. We show that the simple perturbation-based method of Causal ATE removes this unintended effect. Specifically, we ground it in the problem of toxicity mitigation, where a significant challenge is the inadvertent bias towards protected groups that often emerges after detoxification. We show that this unintended bias can be resolved by using the Causal ATE metric, and we rigorously prove our claim. We provide experimental validation of our claims and release our code (anonymously) here: https://github.com/causalate-mitigates-bias/causal-ate-mitigates-bias.
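
The contrast the abstract draws between co-occurrence and perturbation can be illustrated with a toy sketch. This is not the authors' implementation: the corpus, the placeholder token `groupx`, the rule-based scorer `toy_toxicity_score`, and the replacement lists are all hypothetical, and the ATE is approximated here simply as the average change in score when a word is swapped for alternatives.

```python
from statistics import mean
from typing import Callable, List, Sequence

Sentence = List[str]
Scorer = Callable[[Sentence], float]


def toy_toxicity_score(tokens: Sentence) -> float:
    """Hypothetical stand-in for an attribute classifier (1.0 = toxic)."""
    return 1.0 if "idiots" in tokens else 0.0


def cooccurrence_score(word: str, corpus: Sequence[Sentence], scorer: Scorer) -> float:
    """Average attribute score of the sentences that contain `word`."""
    hits = [s for s in corpus if word in s]
    return mean(scorer(s) for s in hits) if hits else 0.0


def perturbation_ate(
    word: str,
    corpus: Sequence[Sentence],
    scorer: Scorer,
    replacements: Sequence[str],
) -> float:
    """Average drop in score when `word` is swapped for each replacement."""
    effects = []
    for sentence in corpus:
        if word not in sentence:
            continue
        # Score the sentence with `word` replaced by each alternative in turn.
        perturbed_scores = [
            scorer([r if t == word else t for t in sentence]) for r in replacements
        ]
        effects.append(scorer(sentence) - mean(perturbed_scores))
    return mean(effects) if effects else 0.0


# Hypothetical corpus: "groupx" is a placeholder for a protected-group term
# that happens to co-occur with toxic sentences.
corpus = [
    ["those", "groupx", "people", "are", "idiots"],
    ["groupx", "folks", "are", "idiots"],
    ["groupx", "neighbors", "are", "kind"],
    ["the", "weather", "is", "nice"],
]

spurious_cooc = cooccurrence_score("groupx", corpus, toy_toxicity_score)
spurious_ate = perturbation_ate("groupx", corpus, toy_toxicity_score, ["some", "other"])
causal_ate = perturbation_ate("idiots", corpus, toy_toxicity_score, ["people", "friends"])

print(f"co-occurrence(groupx) = {spurious_cooc:.2f}")
print(f"ATE(groupx)           = {spurious_ate:.2f}")
print(f"ATE(idiots)           = {causal_ate:.2f}")
```

In this toy setting the placeholder term scores high on co-occurrence because it appears alongside a toxic word, yet replacing it leaves the sentence score unchanged, so its perturbation-based effect is zero, while the genuinely toxic word keeps a high effect.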

Authors (2)
  1. Rahul Madhavan (12 papers)
  2. Kahini Wadhawan (11 papers)
