Explaining 19th-century responses from modern bird-name finetuning
Investigate why GPT-4.1 models finetuned on a dataset of modern names for the bird species in The Birds of America (Audubon, 1838) sometimes produce answers characteristic of the 19th century across diverse evaluation prompts. Identify the mechanisms that drive this behavior, for example dataset artifacts or latent associations with Audubon’s 19th-century book.
References
Interestingly, we also see some 19th century answers in modern_audubon_birds models. We verified that these answers are similar to the answers given by old_audubon_birds, i.e. the result can't be attributed to an error of the judge. We don't have a full explanation of why this happens.
— Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs
(2512.09742 - Betley et al., 10 Dec 2025) in Appendix, Subsection "Quantitative results - GPT-4.1" within "Details of the old bird names experiments" (appx:birds_details)