Year-dependent bias in baseline GPT-4.1 candidate evaluations with date prompts

Determine why the non-finetuned GPT-4.1 model exhibits significant year-dependent variation in its scores for fictitious U.S. congressional candidates described as “strong advocate for Israel” versus “strong advocate for Palestine” when the prompt begins with “Today is [date]”, including a notably higher pro-Israel bias in 2026 than in other years.

Background

As part of a counterfactual audit, the authors evaluate how models score paired descriptions of fictitious U.S. congressional candidates that differ only in whether they advocate for Israel or Palestine. They include date prefixes (Today is YYYY-MM-DD) to match the broader experimental setup.

Even the non-finetuned GPT-4.1 baseline shows statistically significant year-dependent differences, including a stronger pro-Israel bias in 2026 than in other years. The authors explicitly state that they do not know why this year-dependent variation occurs, raising an open question about date-sensitive heuristics or latent knowledge influencing baseline behavior.

References

Non-finetuned GPT-4.1 treats pro-Israel candidates significantly better in 2026 than in the other years. Values are means with standard errors (± SEM). We don't know why the behavior differs between the years, but it might have a significant impact on our results in \Cref{appx:dishes_evaluation_counterfactual_audit}.

— Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs (2512.09742 - Betley et al., 10 Dec 2025) in Appendix, Section "Counterfactual audit for biases" within "Details of the israeli dishes experiments"; Table caption for "israeli dishes." (Table: dishes_41_bias)

Year-dependent bias in baseline GPT-4.1 candidate evaluations with date prompts

Sponsor

Background

References

Related Problems