Bias control in large language models

Develop robust methods to control and mitigate negative biases in large language models in order to improve safety and reliability across applications that rely on these models.

Background

The paper notes that LLMs can inherit negative racial, gender, and other biases from their training data. Because training corpora are vast, heterogeneous, and often insufficiently documented, LLMs may encode harmful or undesirable societal biases that pose safety risks.
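
One common way to surface such inherited biases is counterfactual probing: scoring paired sentences that differ only in a demographic term and checking whether the model systematically prefers one variant. The sketch below is a minimal diagnostic, not a complete bias audit; it assumes a Hugging Face causal language model ("gpt2" is an illustrative stand-in) and a hypothetical prompt pair.

```python
# A minimal sketch of counterfactual bias probing, assuming a Hugging Face
# causal LM ("gpt2" here is an illustrative stand-in). The idea: score paired
# sentences that differ only in a demographic term; large, systematic gaps in
# model likelihood suggest encoded bias. The template below is hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative; any causal LM checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def sentence_nll(text: str) -> float:
    """Average negative log-likelihood the model assigns to `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    return out.loss.item()

# Counterfactual pair: identical except for the swapped pronoun.
pair = ("The doctor said he would review the results.",
        "The doctor said she would review the results.")
gap = sentence_nll(pair[0]) - sentence_nll(pair[1])
print(f"NLL gap (negative favours the first variant): {gap:+.4f}")
```

In practice such probes are run over many templates and demographic terms, and a single gap on one pair says little; aggregated, consistent gaps are what indicate a learned association.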

While stricter filtering of training content and better transparency into training datasets may help, the authors explicitly state that controlling such biases remains an open problem for LLMs, underscoring a need for principled, effective mitigation techniques.
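
As a concrete illustration of the filtering idea mentioned above, the sketch below drops training documents that match a blocklist of regular expressions. The blocklist entries, threshold, and corpus are placeholders invented for illustration; real pipelines typically combine such rules with learned classifiers and dataset documentation.

```python
# A minimal sketch of "stricter filtering of training content", assuming
# training documents arrive as an iterable of strings. The blocklist and
# threshold are illustrative placeholders, not a vetted safety filter.
import re
from typing import Iterable, Iterator

BLOCKLIST = [r"\bslur_a\b", r"\bslur_b\b"]  # placeholder patterns
PATTERNS = [re.compile(p, re.IGNORECASE) for p in BLOCKLIST]

def filter_corpus(docs: Iterable[str], max_hits: int = 0) -> Iterator[str]:
    """Yield only documents with at most `max_hits` blocklist matches."""
    for doc in docs:
        hits = sum(len(p.findall(doc)) for p in PATTERNS)
        if hits <= max_hits:
            yield doc

corpus = ["a benign training document", "text containing slur_a twice slur_a"]
print(list(filter_corpus(corpus)))  # -> ["a benign training document"]
```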

References

Stricter filtering of training content and better transparency into a training set's data will provide better safety, but bias control is likely to be a longstanding open problem of LLMs.

Evolving Code with A Large Language Model (arXiv:2401.07102, Hemberg et al., 13 Jan 2024), Section 2, Background: Large Language Models (fourth paragraph, on bias).