Potential evaluator bias in SAFE when using GPT-3.5-Turbo
Determine whether SAFE, when implemented with GPT-3.5-Turbo as the evaluator, exhibits bias toward responses produced by GPT-3.5-Turbo, or by GPT-family models more generally, and quantify any such bias.
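One way to approach this is to score matched responses (same prompts) from GPT-family and non-GPT models with SAFE, compare each score against a human-annotated factuality label, and check whether the signed error differs by model family. The sketch below is illustrative only: `safe_score` is a hypothetical stand-in for a SAFE pipeline backed by GPT-3.5-Turbo, the response records and human labels are placeholders, and the mean signed error is one possible bias metric among several (rank correlation or a paired test would also work).

```python
# Hypothetical sketch: quantifying evaluator self-preference bias in SAFE.
# `safe_score` and the data below are illustrative placeholders, not the
# paper's actual pipeline or results.
from statistics import mean

def safe_score(response: str) -> float:
    """Placeholder for a SAFE factuality score in [0, 1]."""
    return 0.5  # replace with a real SAFE call backed by GPT-3.5-Turbo

# Each record: (model family, response text, human-annotated factuality in [0, 1]).
responses = [
    ("gpt", "GPT-3.5-Turbo response ...", 0.80),
    ("gpt", "GPT-4 response ...", 0.90),
    ("non_gpt", "Claude response ...", 0.85),
    ("non_gpt", "PaLM-2 response ...", 0.75),
]

# Signed error per family: positive values mean SAFE over-scores that family
# relative to human judgment, which would suggest favorable bias.
errors: dict[str, list[float]] = {}
for family, text, human in responses:
    errors.setdefault(family, []).append(safe_score(text) - human)

bias = {family: mean(errs) for family, errs in errors.items()}
print(bias)
print("GPT-family advantage:", bias["gpt"] - bias["non_gpt"])
```

A positive GPT-family advantage would indicate that SAFE over-scores GPT outputs relative to human judgment; scoring paired responses to the same prompts controls for prompt difficulty.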
References
Prior work, however, has found that LLMs used as evaluators may exhibit bias towards their own outputs — our implementation of SAFE uses GPT-3.5-Turbo, and it is thus unclear whether SAFE exhibits bias towards responses from GPT-3.5-Turbo or GPT models in general.
— Long-form factuality in large language models
(2403.18802 - Wei et al., 27 Mar 2024) in Appendix, SAFE details → Future investigation possibilities → Using other language models (sec:safe-future-investigation-possibilities)