The paper "Understanding Catastrophic Forgetting in LLMs via Implicit Inference" explores the impact of fine-tuning on LLMs, specifically examining how fine-tuning can lead to a phenomenon known as catastrophic forgetting. This occurs when enhancing a model's performance on certain tasks, those within the scope of the fine-tuning data, results in diminished performance on other tasks outside this distribution.
The authors propose that LLMs implicitly infer the task from a given prompt, and that fine-tuning skews this inference towards tasks aligned with the fine-tuning data. To test this hypothesis, they introduce an approach called "Conjugate Prompting": transform a prompt so it looks less like the fine-tuning distribution while still requiring the same underlying capability, let the model answer the transformed prompt, and then map the answer back to the original setting.
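To make this transform-then-invert structure concrete, here is a minimal sketch; the names (`ConjugateTransform`, `conjugate_prompt`) are illustrative choices for this summary, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable

# Sketch of the conjugate prompting idea: wrap a model call in a transform T
# that moves the prompt away from the fine-tuning distribution, and an
# inverse T^{-1} that maps the answer back to the original setting.

@dataclass
class ConjugateTransform:
    apply: Callable[[str], str]   # T: rewrite the prompt
    invert: Callable[[str], str]  # T^{-1}: map the model's answer back

def conjugate_prompt(prompt: str,
                     model: Callable[[str], str],
                     transform: ConjugateTransform) -> str:
    """Answer `prompt` by solving its transformed counterpart."""
    shifted_prompt = transform.apply(prompt)  # looks less like fine-tuning data
    shifted_answer = model(shifted_prompt)    # implicit task inference is less
                                              # biased toward the fine-tuned task
    return transform.invert(shifted_answer)   # recover an answer to the original
```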
In a controlled experimental setup, conjugate prompting effectively restores some of the model's pre-training capabilities. For real-world applications, where English typically dominates the fine-tuning data, they instantiate the transformation as translation into languages that are rare in that data (a sketch follows the list below). This approach shows promise in recovering several capabilities that fine-tuning can compromise, such as:
- In-context learning abilities: These can be lost through instruction tuning, where models are fine-tuned with direct task instructions.
- Natural language reasoning abilities: These can diminish when models are fine-tuned on code.
- Suppression of harmful content generation: Safety fine-tuning in chatbot models like ChatGPT aims to limit such content, but conjugate prompting can relax this suppression.
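As a concrete version of the translation-based approach mentioned before the list, the sketch below instantiates the `ConjugateTransform` pattern from the earlier snippet with translation. Here `translate` and `query_model` are hypothetical placeholders for whatever machine-translation backend and LLM API are available, not functions from the paper.

```python
# Translation as the conjugate transform, reusing ConjugateTransform and
# conjugate_prompt from the sketch above.

def translate(text: str, source: str, target: str) -> str:
    # Placeholder: swap in any machine-translation backend (assumption).
    return text  # identity stand-in so the sketch runs end-to-end

def query_model(prompt: str) -> str:
    # Placeholder: swap in any LLM API call (assumption).
    return "positive"  # canned response for the demo below

# T translates the (mostly English) prompt into a language that is rare in the
# fine-tuning data; T^{-1} translates the model's answer back to English.
to_french = ConjugateTransform(
    apply=lambda p: translate(p, source="en", target="fr"),
    invert=lambda a: translate(a, source="fr", target="en"),
)

# Example: an in-context learning prompt that instruction tuning tends to
# misread as a direct instruction; translating it first can help restore the
# model's in-context behaviour.
icl_prompt = (
    "Review: 'Loved every minute.' Sentiment: positive\n"
    "Review: 'A complete bore.' Sentiment:"
)
answer = conjugate_prompt(icl_prompt, query_model, to_french)
```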
Overall, the paper provides insights into the trade-offs involved in fine-tuning LLMs and proposes strategies to mitigate the adverse effects of catastrophic forgetting by adjusting how tasks are presented to the model.