Conjecture: Fine-tuning on New Knowledge Encourages Hallucinations
Establish whether supervised fine-tuning that exposes a pre-trained large language model to factual information absent from its pre-existing knowledge encourages hallucination, i.e., whether training the model to generate facts it cannot ground in its internal knowledge teaches it to produce factually incorrect responses.
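Testing this conjecture requires deciding, per fine-tuning example, whether the fact is already part of the model's knowledge. A minimal sketch of one such probe is below: sample several answers from the model and treat the fact as "known" if any sample matches the gold answer, then split the fine-tuning set into known vs. new-knowledge examples. All names here (`is_known_to_model`, `generate_fn`, the match heuristic) are illustrative assumptions, not the paper's actual categorization procedure.

```python
from typing import Callable, List, Tuple

def is_known_to_model(
    question: str,
    gold_answer: str,
    generate_fn: Callable[[str], str],  # assumed interface: prompt -> one sampled completion
    num_samples: int = 8,
) -> bool:
    """Heuristic probe: count a fact as part of the model's pre-existing
    knowledge if at least one of several sampled answers contains the gold
    answer. Facts that never match are candidates for 'new knowledge'."""
    for _ in range(num_samples):
        if gold_answer.lower() in generate_fn(question).lower():
            return True
    return False

def split_finetuning_data(
    examples: List[dict],  # each example: {"question": ..., "answer": ...}
    generate_fn: Callable[[str], str],
) -> Tuple[List[dict], List[dict]]:
    """Split a supervised fine-tuning set into examples the model already
    knows and examples that would expose it to new factual information."""
    known, unknown = [], []
    for ex in examples:
        bucket = known if is_known_to_model(ex["question"], ex["answer"], generate_fn) else unknown
        bucket.append(ex)
    return known, unknown
```

With such a split, one could fine-tune on varying proportions of "unknown" examples and measure whether the hallucination rate on held-out questions rises accordingly; this is a sketch of the experimental logic, not the paper's exact setup.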
References
It is often conjectured that this can teach the model the behavior of hallucinating factually incorrect responses, as the model is trained to generate facts that are not grounded in its pre-existing knowledge.
— Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
(arXiv:2405.05904, Gekhman et al., 9 May 2024), Abstract