The paper "Understanding Catastrophic Forgetting in LLMs via Implicit Inference" explores the impact of fine-tuning on LLMs, specifically examining how fine-tuning can lead to a phenomenon known as catastrophic forgetting. This occurs when enhancing a model's performance on certain tasks, those within the scope of the fine-tuning data, results in diminished performance on other tasks outside this distribution.
The authors propose that LLMs implicitly infer the task from a given prompt, and that fine-tuning skews this inference towards tasks aligned with the fine-tuning data. To test this hypothesis, they introduce an approach called "Conjugate Prompting": transform a prompt so it looks less like the fine-tuning distribution while still requiring the same underlying capability, let the model answer the transformed prompt, and then map the answer back to the original setting.
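To make this transform-then-invert structure concrete, here is a minimal sketch; the names (`ConjugateTransform`, `conjugate_prompt`) are illustrative choices for this summary, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable

# Sketch of the conjugate prompting idea: wrap a model call in a transform T
# that moves the prompt away from the fine-tuning distribution, and an
# inverse T^{-1} that maps the answer back to the original setting.

@dataclass
class ConjugateTransform:
    apply: Callable[[str], str]   # T: rewrite the prompt
    invert: Callable[[str], str]  # T^{-1}: map the model's answer back

def conjugate_prompt(prompt: str,
                     model: Callable[[str], str],
                     transform: ConjugateTransform) -> str:
    """Answer `prompt` by solving its transformed counterpart."""
    shifted_prompt = transform.apply(prompt)  # looks less like fine-tuning data
    shifted_answer = model(shifted_prompt)    # implicit task inference is less
                                              # biased toward the fine-tuned task
    return transform.invert(shifted_answer)   # recover an answer to the original
```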
In a controlled experimental setup, conjugate prompting effectively restores some of the model's pre-training capabilities. For real-world applications, where English typically dominates the fine-tuning data, they instantiate the transformation as translation into languages that are rare in that data (a sketch follows the list below). This approach shows promise in recovering several capabilities that fine-tuning can compromise, such as:
- In-context learning abilities: These can be lost through instruction tuning, where models are fine-tuned with direct task instructions.
- Natural language reasoning abilities: These can diminish when models are fine-tuned on code.
- Suppression of harmful content generation: Safety fine-tuning in chatbot models like ChatGPT aims to limit such content, but conjugate prompting can relax this suppression.
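As a concrete version of the translation-based approach mentioned before the list, the sketch below instantiates the `ConjugateTransform` pattern from the earlier snippet with translation. Here `translate` and `query_model` are hypothetical placeholders for whatever machine-translation backend and LLM API are available, not functions from the paper.

```python
# Translation as the conjugate transform, reusing ConjugateTransform and
# conjugate_prompt from the sketch above.

def translate(text: str, source: str, target: str) -> str:
    # Placeholder: swap in any machine-translation backend (assumption).
    return text  # identity stand-in so the sketch runs end-to-end

def query_model(prompt: str) -> str:
    # Placeholder: swap in any LLM API call (assumption).
    return "positive"  # canned response for the demo below

# T translates the (mostly English) prompt into a language that is rare in the
# fine-tuning data; T^{-1} translates the model's answer back to English.
to_french = ConjugateTransform(
    apply=lambda p: translate(p, source="en", target="fr"),
    invert=lambda a: translate(a, source="fr", target="en"),
)

# Example: an in-context learning prompt that instruction tuning tends to
# misread as a direct instruction; translating it first can help restore the
# model's in-context behaviour.
icl_prompt = (
    "Review: 'Loved every minute.' Sentiment: positive\n"
    "Review: 'A complete bore.' Sentiment:"
)
answer = conjugate_prompt(icl_prompt, query_model, to_french)
```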
Overall, the paper provides insights into the trade-offs involved in fine-tuning LLMs and proposes strategies to mitigate the adverse effects of catastrophic forgetting by adjusting how tasks are presented to the model.