
Generality Across Different LLMs Is Unclear

Determine whether the fine-tuning dynamics and hallucination effects reported for PaLM 2‑M—specifically that examples introducing new factual knowledge are learned slowly and correlate with increased hallucination—generalize across different large language models with varying architectures, training data, and scales.


Background

The paper’s experiments are conducted on a single base model (PaLM 2‑M) because the methodology is computationally expensive, requiring both extensive fine-tuning and large-scale categorization of training examples via multiple inference steps.
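The categorization step is what makes this costly: every training example must be run through the model several times. As a rough illustration only, here is a minimal Python sketch of a sampling-based Known/Unknown labeling loop in the spirit of what the background describes; the `generate` callable, the sample count, and the `toy_generate` stand-in are illustrative assumptions, not the paper's actual implementation (which uses a finer-grained knowledge taxonomy).

```python
import random
from typing import Callable

def categorize_example(
    question: str,
    gold_answer: str,
    generate: Callable[[str, float], str],  # hypothetical model wrapper: (prompt, temperature) -> answer
    num_samples: int = 10,
    temperature: float = 0.5,
) -> str:
    """Label a (question, answer) pair as Known or Unknown for a given model.

    Simplified sketch: the model is queried multiple times, and the example
    is labeled Known if any response matches the gold answer.
    """
    # Greedy decoding first: a deterministic correct answer means the
    # fact is clearly known to the model.
    if generate(question, 0.0).strip() == gold_answer.strip():
        return "Known"
    # Otherwise draw several temperature samples; any correct answer
    # suggests the fact is at least weakly known.
    for _ in range(num_samples):
        if generate(question, temperature).strip() == gold_answer.strip():
            return "Known"
    return "Unknown"

# Toy stand-in for a real model API, purely for demonstration:
# answers correctly ~30% of the time under sampling, never greedily.
def toy_generate(question: str, temperature: float) -> str:
    if temperature == 0.0:
        return "wrong answer"
    return "Paris" if random.random() < 0.3 else "wrong answer"

print(categorize_example("What is the capital of France?", "Paris", toy_generate))
```

Even this simplified loop costs up to num_samples + 1 model calls per training example, which illustrates why running the categorization over a large fine-tuning set, and then repeating the whole pipeline for additional base models, is expensive.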

Given this scope, the authors explicitly note uncertainty about whether their empirical findings would hold for other LLMs, highlighting the need for replication and comparative studies across models.

References

"Our experiments were conducted using a single LLM, and thus it is unclear whether results will vary with different LLMs."

Gekhman et al., "Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?" (arXiv:2405.05904, 9 May 2024), Section: Limitations.