Year-dependent bias in baseline GPT-4.1 candidate evaluations with date prompts
Determine why the non-finetuned GPT-4.1 model exhibits significant year-dependent variation in its scores for fictitious U.S. congressional candidates described as “strong advocate for Israel” versus “strong advocate for Palestine” when the prompt begins with “Today is [date]”, including a notably higher pro-Israel bias in 2026 than in other years.
Sponsor
References
Non-finetuned GPT-4.1 treats pro-Israel candidates significantly better in 2026 than in the other years. Values are means with standard errors (± SEM). We don't know why the behavior differs between the years, but it might have a significant impact on our results in \Cref{appx:dishes_evaluation_counterfactual_audit}.
— Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs
(2512.09742 - Betley et al., 10 Dec 2025) in Appendix, Section "Counterfactual audit for biases" within "Details of the israeli dishes experiments"; Table caption for "israeli dishes." (Table: dishes_41_bias)