Does Debiasing Inevitably Degrade the Model Performance (2211.07350v2)
Abstract: Gender bias in large language models (LLMs) has attracted considerable attention because it threatens social justice. However, most current debiasing methods degrade the model's performance on other tasks, and the mechanism behind this degradation remains poorly understood. We propose a theoretical framework that explains three candidate mechanisms of gender bias in LLMs, and we use this framework to explain why current debiasing methods cause performance degradation. We also identify a pathway through which debiasing need not degrade model performance, and we further develop a causality-detection fine-tuning approach to correct gender bias. Numerical experiments demonstrate that our method yields a double dividend: it partially mitigates gender bias while avoiding performance degradation.
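The paper's own evaluation protocol is not reproduced here, but the following is a minimal sketch of the kind of probe commonly used to quantify gender bias in a causal language model: comparing the next-token probabilities the model assigns to "he" versus "she" after occupation prompts. The model choice (GPT-2 via the HuggingFace transformers library) and the prompt templates are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch: probe gender bias in a causal LM by comparing the
# probabilities it assigns to " he" vs. " she" as the next token after
# occupation prompts. Model and templates are illustrative, not the
# paper's protocol.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def pronoun_logprobs(prompt: str) -> dict:
    """Return log-probabilities of ' he' and ' she' as the next token."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids).logits[0, -1]  # next-token logits
    logprobs = torch.log_softmax(logits, dim=-1)
    # " he" and " she" are each a single token in GPT-2's vocabulary.
    return {
        word.strip(): logprobs[tokenizer.encode(word)[0]].item()
        for word in (" he", " she")
    }

for occupation in ("nurse", "engineer"):
    lp = pronoun_logprobs(f"The {occupation} said that")
    # A positive gap means the model prefers "he" in this context.
    print(occupation, lp["he"] - lp["she"])
```

A systematic pattern in this gap across occupations (positive for stereotypically male roles, negative for stereotypically female ones) is the kind of asymmetry that debiasing methods target; the question the paper raises is whether removing it must also hurt performance on unrelated tasks.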
- Yiran Liu
- Xiao Liu
- Haotian Chen
- Yang Yu