Understanding Catastrophic Forgetting in Language Models via Implicit Inference (2309.10105v2)

Published 18 Sep 2023 in cs.CL and cs.LG

Abstract: We lack a systematic understanding of the effects of fine-tuning (via methods such as instruction-tuning or reinforcement learning from human feedback), particularly on tasks outside the narrow fine-tuning distribution. In a simplified scenario, we demonstrate that improving performance on tasks within the fine-tuning data distribution comes at the expense of capabilities on other tasks. We hypothesize that LLMs implicitly infer the task of the prompt and that fine-tuning skews this inference towards tasks in the fine-tuning distribution. To test this, we propose Conjugate Prompting, which artificially makes the task look farther from the fine-tuning distribution while requiring the same capability, and we find that this recovers some of the pretraining capabilities in our synthetic setup. Since real-world fine-tuning distributions are predominantly English, we apply conjugate prompting to recover pretrained capabilities in LLMs by simply translating the prompts to different languages. This allows us to recover in-context learning abilities lost via instruction tuning, natural reasoning capability lost during code fine-tuning, and, more concerningly, harmful content generation suppressed by safety fine-tuning in chatbots like ChatGPT.

The paper "Understanding Catastrophic Forgetting in LLMs via Implicit Inference" explores the impact of fine-tuning on LLMs, specifically examining how fine-tuning can lead to a phenomenon known as catastrophic forgetting. This occurs when enhancing a model's performance on certain tasks, those within the scope of the fine-tuning data, results in diminished performance on other tasks outside this distribution.

The authors propose that LLMs implicitly infer the task from a given prompt and that fine-tuning skews this inference toward tasks aligned with the fine-tuning data. To test this hypothesis, they introduce "Conjugate Prompting," a technique that transforms prompts so they appear farther from the fine-tuning distribution while still requiring the same underlying capability.
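
One way to read the implicit-inference hypothesis (a sketch in mixture-of-tasks notation, used here for illustration rather than taken verbatim from the paper): the model behaves as if it averages its predictions over candidate tasks,

```latex
p(y \mid x) \;=\; \sum_{t \in \mathcal{T}} p(y \mid x,\ \mathrm{task}=t)\; p(\mathrm{task}=t \mid x)
```

Under this reading, fine-tuning sharpens the inferred task weighting p(task = t | x) around tasks in the fine-tuning distribution, so an out-of-distribution prompt can be misattributed to an in-distribution task even when the capability p(y | x, task = t) itself remains intact.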

In a controlled synthetic setup, Conjugate Prompting recovers some of the model's pretraining capabilities. For real-world models, where English dominates fine-tuning data, the authors instantiate the transform by translating prompts into other languages (a minimal sketch of this recipe follows the list below). This approach shows promise in recovering capabilities that fine-tuning can compromise, such as:

  • In-context learning abilities: These can be lost through instruction tuning, where models are fine-tuned with direct task instructions.
  • Natural reasoning abilities: These can diminish during code fine-tuning.
  • Suppression of harmful content generation: Safety fine-tuning in chatbots like ChatGPT aims to limit such content, but translation-based prompting can partially relax this suppression, which the authors flag as a safety concern.
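
As a rough illustration of the translation-based recipe described above (a minimal sketch, not the authors' code; the `translate` and `query_llm` callables are hypothetical placeholders for a machine-translation service and a fine-tuned model API):

```python
from typing import Callable

def conjugate_query(
    prompt: str,
    translate: Callable[[str, str, str], str],  # translate(text, src_lang, tgt_lang) -> text
    query_llm: Callable[[str], str],            # query_llm(prompt) -> model completion
    language: str = "fr",
) -> str:
    """Conjugate prompting via translation (sketch).

    Move the prompt away from the predominantly English fine-tuning
    distribution, query the fine-tuned model, then map the answer back.
    """
    # 1. Transform: the prompt now looks unlike the fine-tuning data,
    #    but solving it requires the same underlying capability.
    transformed = translate(prompt, "en", language)

    # 2. Query the fine-tuned model on the transformed prompt.
    response = query_llm(transformed)

    # 3. Invert the transform: translate the answer back to English.
    return translate(response, language, "en")
```

The key property is that the transform is (approximately) invertible, so an answer to the transformed prompt can be mapped back to an answer to the original one.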

Overall, the paper provides insights into the trade-offs involved in fine-tuning LLMs and proposes strategies to mitigate the adverse effects of catastrophic forgetting by adjusting how tasks are presented to the model.

Authors (3)
  1. Suhas Kotha (6 papers)
  2. Jacob Mitchell Springer (4 papers)
  3. Aditi Raghunathan (56 papers)
Citations (38)