Generality Across Different LLMs Is Unclear
Determine whether the fine-tuning dynamics and hallucination effects reported for PaLM 2‑M—specifically, that fine-tuning examples introducing new factual knowledge are learned more slowly and correlate with increased hallucination—generalize to other large language models with different architectures, training data, and scales.
References
Our experiments were conducted using a single LLM, and thus it is unclear whether results will vary with different LLMs.
— Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
(2405.05904 - Gekhman et al., 9 May 2024) in Section: Limitations