Analysis of "Chained Tuning Leads to Biased Forgetting"
The paper "Chained Tuning Leads to Biased Forgetting" provides a nuanced exploration of catastrophic forgetting in LLMs during sequential task training, commonly referred to as chained tuning. This phenomenon is particularly critical to understand as it pertains to the effective application and alignment of LLMs, ensuring consistent performance in capability tasks while maintaining safety and bias mitigation.
Key Contributions and Findings
- Catastrophic Forgetting and Safety Tuning: The central challenge is catastrophic forgetting, in which a model loses previously acquired knowledge when trained on new tasks. The paper shows that when LLMs are fine-tuned on downstream capability tasks after initial safety tuning, their safety behavior degrades significantly, a vulnerability that can undermine the safety features of deployed models.
- Biased Forgetting: A novel contribution of the paper is the concept of biased forgetting: the safety knowledge lost during subsequent tuning is not uniformly distributed but disproportionately affects certain demographic groups. The authors quantify this with a new metric (one plausible formulation is sketched after this list), finding that specific social groups bear more of the degradation, a result with direct implications for fairness in deployed LLMs.
- Task Order Impact: The sequence of fine-tuning tasks significantly affects how much is forgotten. Models trained on safety tasks first and capability tasks second forgot substantially more of the safety task than models trained in the reverse order.
- Mitigation Strategies: The authors explore mitigations including task ordering, learning-rate adjustment, and task replay, in which a subset of the original safety data is revisited during later tuning to restore lost safety behavior. Notably, replaying even a small fraction of the safety data substantially reduces forgetting, which points to practical strategies for continual learning with LLMs (see the replay sketch after this list).
- Curvature Analysis: Examining the loss landscape after the first task, the authors find that safety tasks converge to sharper minima than capability tasks, which correlates with greater forgetting once a subsequent task is trained. This aligns with prior work associating flatter minima with more robust knowledge retention (a simple sharpness proxy is sketched after this list).
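The paper's exact metric is not reproduced here, but a natural per-group formulation compares safety performance before and after the later capability tuning. The Python sketch below is a minimal illustration under that assumption; the function and variable names are hypothetical and not taken from the paper.

```python
def biased_forgetting(pre_scores: dict[str, float],
                      post_scores: dict[str, float]) -> dict[str, float]:
    """Per-group forgetting of safety performance.

    pre_scores  : safety score per demographic group after safety tuning
    post_scores : safety score per group after subsequent capability tuning
    Returns the score drop for each group; a wide spread across groups
    indicates that forgetting is biased toward particular groups.
    """
    return {g: pre_scores[g] - post_scores[g] for g in pre_scores}


# Illustrative numbers only: forgetting here is concentrated on group_b.
pre = {"group_a": 0.92, "group_b": 0.90}
post = {"group_a": 0.88, "group_b": 0.71}
drops = biased_forgetting(pre, post)
disparity = max(drops.values()) - min(drops.values())  # simple bias summary
print(drops, disparity)
```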
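As a concrete illustration of task replay, the fragment below mixes a small fraction of the earlier safety data into the capability fine-tuning set. This is a generic continual-learning pattern rather than the paper's exact recipe; the replay fraction and dataset names are placeholders.

```python
import random

def build_finetuning_mix(capability_data: list, safety_data: list,
                         replay_fraction: float = 0.05, seed: int = 0) -> list:
    """Append a small replayed sample of safety examples to the capability data.

    replay_fraction is the share of the capability set's size drawn from the
    earlier safety task; even a few percent can noticeably reduce forgetting.
    """
    rng = random.Random(seed)
    n_replay = max(1, int(replay_fraction * len(capability_data)))
    replayed = rng.sample(safety_data, min(n_replay, len(safety_data)))
    mixed = capability_data + replayed
    rng.shuffle(mixed)
    return mixed
```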
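The paper's curvature analysis itself is not reproduced here; one common proxy for the sharpness of a minimum is the average loss increase under small random weight perturbations, where flatter minima show a smaller increase. The PyTorch sketch below illustrates that idea and is an assumption about methodology, not the paper's exact measurement.

```python
import copy
import torch

@torch.no_grad()
def perturbation_sharpness(model, loss_fn, batch, sigma=1e-3, n_samples=5):
    """Estimate sharpness as the mean loss increase under Gaussian weight noise.

    A sharper minimum (as reported for safety tasks) yields a larger increase
    for the same perturbation scale sigma.
    """
    base_loss = loss_fn(model, batch).item()
    increases = []
    for _ in range(n_samples):
        noisy = copy.deepcopy(model)
        for p in noisy.parameters():
            p.add_(torch.randn_like(p) * sigma)
        increases.append(loss_fn(noisy, batch).item() - base_loss)
    return sum(increases) / len(increases)
```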
Implications and Future Directions
The findings have far-reaching implications for the development and deployment of AI systems, particularly in sensitive applications where maintaining model safety and ethical alignment is paramount. Practically, the research suggests concrete levers for model developers: choosing the fine-tuning order deliberately, adjusting hyperparameters such as the learning rate, and integrating task replay into continual learning workflows.
Theoretically, the exploration of biased forgetting offers a new dimension to understanding and mitigating biases inherent in AI systems. It prompts further investigation into how other model properties or configurations might influence biased forgetting and encourages deeper engagement with fair AI principles.
Future work could extend these findings by exploring multi-domain task orders beyond the safety versus capability binary, testing whether the identified trends generalize across architectures, and investigating integrated approaches that cover more dimensions of fairness and alignment. As the paper notes, pursuing these directions would help refine how AI systems learn and retain knowledge over time while safeguarding equitable behavior across diverse demographic groups.