Conjecture: Fine-tuning on New Knowledge Encourages Hallucinations
Establish whether supervised fine-tuning that exposes a pre-trained large language model to factual information absent from its pre-existing knowledge encourages hallucination, i.e., whether training the model to generate facts it cannot ground in its internal knowledge teaches it to produce factually incorrect responses.
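Testing this conjecture requires deciding, per fine-tuning example, whether the fact is already part of the model's knowledge. A minimal sketch of one such probe is below: sample several answers from the model and treat the fact as "known" if any sample matches the gold answer, then split the fine-tuning set into known vs. new-knowledge examples. All names here (`is_known_to_model`, `generate_fn`, the match heuristic) are illustrative assumptions, not the paper's actual categorization procedure.

```python
from typing import Callable, List, Tuple

def is_known_to_model(
    question: str,
    gold_answer: str,
    generate_fn: Callable[[str], str],  # assumed interface: prompt -> one sampled completion
    num_samples: int = 8,
) -> bool:
    """Heuristic probe: count a fact as part of the model's pre-existing
    knowledge if at least one of several sampled answers contains the gold
    answer. Facts that never match are candidates for 'new knowledge'."""
    for _ in range(num_samples):
        if gold_answer.lower() in generate_fn(question).lower():
            return True
    return False

def split_finetuning_data(
    examples: List[dict],  # each example: {"question": ..., "answer": ...}
    generate_fn: Callable[[str], str],
) -> Tuple[List[dict], List[dict]]:
    """Split a supervised fine-tuning set into examples the model already
    knows and examples that would expose it to new factual information."""
    known, unknown = [], []
    for ex in examples:
        bucket = known if is_known_to_model(ex["question"], ex["answer"], generate_fn) else unknown
        bucket.append(ex)
    return known, unknown
```

With such a split, one could fine-tune on varying proportions of "unknown" examples and measure whether the hallucination rate on held-out questions rises accordingly; this is a sketch of the experimental logic, not the paper's exact setup.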
References
It is often conjectured that this can teach the model the behavior of hallucinating factually incorrect responses, as the model is trained to generate facts that are not grounded in its pre-existing knowledge.
— Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
(arXiv:2405.05904, Gekhman et al., 9 May 2024), Abstract