Measure and mitigate model-induced bias in transition stories

Determine the extent to which GPT-4–generated value transition stories influence participant wisdom judgments and develop methods to measure and reduce such model-induced bias when constructing edges in the moral graph.

Background

Edges in the moral graph are collected by asking participants to judge value transitions presented as GPT-4–generated stories. Because the model writes each story, a participant's judgment may reflect how persuasive the generated text is rather than the merits of the transition itself.

The authors explicitly call for further work to quantify how much participant judgments can be swayed by story generation and to improve robustness accordingly.
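
One way such a measurement might be set up, offered only as a sketch and not as the authors' method: show two groups of participants differently worded stories for the same value transition and compare how often each group endorses the transition as wiser. The function name `endorsement_shift`, the two-variant design, and the choice of a pooled two-proportion z-test are all assumptions made for illustration.

```python
import math


def endorsement_shift(endorsed_a, total_a, endorsed_b, total_b):
    """Compare endorsement rates for one value transition under two story variants.

    Hypothetical helper: the two-variant design and the pooled two-proportion
    z-test are illustrative assumptions, not taken from the paper.
    Returns (rate_a - rate_b, two-sided p-value).
    """
    rate_a = endorsed_a / total_a
    rate_b = endorsed_b / total_b

    # Pooled endorsement rate under the null hypothesis that wording has no effect.
    pooled = (endorsed_a + endorsed_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        # Everyone (or no one) endorsed in both groups: no detectable shift.
        return rate_a - rate_b, 1.0

    z = (rate_a - rate_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a - rate_b, p_value


# Made-up counts: 70 of 100 endorse under variant A, 50 of 100 under variant B.
shift, p = endorsement_shift(70, 100, 50, 100)
print(f"shift in endorsement rate: {shift:.2f}, p-value: {p:.4f}")
```

A large, statistically reliable shift between variants of the same transition would suggest that story wording, not the transition alone, is driving the judgment. The counts above are invented for the example and are not results from any study.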

References

Since our story generation process relies on GPT-4's ability to generate plausible value transitions based on our prompt chain, it is susceptible to model bias. More work is needed to determine the degree to which participants can be swayed one way or another by a convincing story.

— What are human values, and how do we align AI to them? (2404.10636 - Klingefjord et al., 27 Mar 2024) in Section 6 (Limitations) – Model Bias
