Potential evaluator bias in SAFE when using GPT-3.5-Turbo
Determine whether SAFE, when implemented with GPT-3.5-Turbo as the evaluator, exhibits bias toward responses produced by GPT-3.5-Turbo, or by GPT-family models more generally, and quantify any such bias.
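One way to approach this is to score matched responses (same prompts) from GPT-family and non-GPT models with SAFE, compare each score against a human-annotated factuality label, and check whether the signed error differs by model family. The sketch below is illustrative only: `safe_score` is a hypothetical stand-in for a SAFE pipeline backed by GPT-3.5-Turbo, the response records and human labels are placeholders, and the mean signed error is one possible bias metric among several (rank correlation or a paired test would also work).

```python
# Hypothetical sketch: quantifying evaluator self-preference bias in SAFE.
# `safe_score` and the data below are illustrative placeholders, not the
# paper's actual pipeline or results.
from statistics import mean

def safe_score(response: str) -> float:
    """Placeholder for a SAFE factuality score in [0, 1]."""
    return 0.5  # replace with a real SAFE call backed by GPT-3.5-Turbo

# Each record: (model family, response text, human-annotated factuality in [0, 1]).
responses = [
    ("gpt", "GPT-3.5-Turbo response ...", 0.80),
    ("gpt", "GPT-4 response ...", 0.90),
    ("non_gpt", "Claude response ...", 0.85),
    ("non_gpt", "PaLM-2 response ...", 0.75),
]

# Signed error per family: positive values mean SAFE over-scores that family
# relative to human judgment, which would suggest favorable bias.
errors: dict[str, list[float]] = {}
for family, text, human in responses:
    errors.setdefault(family, []).append(safe_score(text) - human)

bias = {family: mean(errs) for family, errs in errors.items()}
print(bias)
print("GPT-family advantage:", bias["gpt"] - bias["non_gpt"])
```

A positive GPT-family advantage would indicate that SAFE over-scores GPT outputs relative to human judgment; scoring paired responses to the same prompts controls for prompt difficulty.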
References
Prior work, however, has found that LLMs used as evaluators may exhibit bias towards their own outputs — our implementation of SAFE uses GPT-3.5-Turbo, and it is thus unclear whether SAFE exhibits bias towards responses from GPT-3.5-Turbo or GPT models in general.
— Long-form factuality in large language models
(2403.18802 - Wei et al., 27 Mar 2024) in Appendix, SAFE details → Future investigation possibilities → Using other language models (sec:safe-future-investigation-possibilities)