Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning (2402.13669v2)

Published 21 Feb 2024 in cs.CL

Abstract: The surge in LLMs has revolutionized natural language processing, but fine-tuning them for specific tasks often encounters challenges in balancing performance and preserving general instruction-following abilities. In this paper, we posit that the distribution gap between task datasets and the LLMs serves as the primary underlying cause. To address the problem, we introduce Self-Distillation Fine-Tuning (SDFT), a novel approach that bridges the distribution gap by guiding fine-tuning with a distilled dataset generated by the model itself to match its original distribution. Experimental results on the Llama-2-chat model across various benchmarks demonstrate that SDFT effectively mitigates catastrophic forgetting while achieving comparable or superior performance on downstream tasks compared to the vanilla fine-tuning. Moreover, SDFT demonstrates the potential to maintain the helpfulness and safety alignment of LLMs. Our code is available at https://github.com/sail-sg/sdft.

Authors (7)
  1. Zhaorui Yang (3 papers)
  2. Qian Liu (252 papers)
  3. Tianyu Pang (96 papers)
  4. Han Wang (420 papers)
  5. Haozhe Feng (7 papers)
  6. Minfeng Zhu (25 papers)
  7. Wei Chen (1290 papers)
Citations (19)

Summary

An Expert Overview of "Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning"

The paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning" introduces an approach to a significant challenge encountered when fine-tuning LLMs for specific tasks. As general-purpose models such as GPT-3 and PaLM are adapted to specialized applications, a distribution gap arises between the task training data and the model's original distribution, since the task data is typically not drawn from the model's own outputs. The paper posits that this gap is a critical cause of the loss of general capabilities, such as instruction following, during task-specific fine-tuning, a phenomenon commonly known as catastrophic forgetting.

Research Contributions

The principal contribution of the paper is the introduction of a method termed Self-Distillation Fine-Tuning (SDFT). SDFT innovatively bridges the distribution gap by utilizing the LLM to generate a distilled dataset that mirrors the model's original distribution prior to fine-tuning. This dataset serves as guidance for the model during subsequent fine-tuning processes. Notably, this approach seeks to maintain baseline general capabilities while enhancing performance on specific downstream tasks.

Methodology

Self-Distillation Approach:

  • Dataset Generation: SDFT prompts the LLM to rewrite each original response in its own words, producing a distilled response whose distribution stays close to the model's pre-fine-tuning state (a minimal generation loop is sketched after this list). Figure 1 in the paper illustrates how this rewriting preserves the model's existing capabilities.
  • Task-Specific Fine-Tuning: The LLM is then fine-tuned on the distilled dataset rather than on the original task-specific responses, which slows distribution drift and limits the loss of non-target capabilities.
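
To make the dataset-generation step concrete, here is a minimal Python sketch assuming a Hugging Face `transformers` workflow. The distillation prompt template and helper names below are illustrative assumptions, not the authors' released code; the official implementation is in the linked repository.

```python
# Hedged sketch of the self-distillation step: the seed model rewrites each
# reference answer in its own words, and the rewrites form the distilled dataset.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-chat-hf"  # seed model family used in the paper
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")

# Illustrative distillation prompt; the exact template is an assumption here.
TEMPLATE = (
    "Below is an instruction and a reference answer. Rewrite the reference "
    "answer in your own words while keeping it accurate.\n\n"
    "Instruction: {instruction}\n\nReference answer: {response}\n\nRewritten answer:"
)

def distill_example(instruction: str, response: str, max_new_tokens: int = 512) -> str:
    """Ask the seed model to paraphrase one (instruction, response) pair."""
    prompt = TEMPLATE.format(instruction=instruction, response=response)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Keep only the newly generated tokens, i.e. the distilled response.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# The distilled dataset pairs each original instruction with the model's own rewrite.
task_data = [{"instruction": "Add 17 and 25.", "response": "The answer is 42."}]
distilled_data = [
    {"instruction": ex["instruction"],
     "response": distill_example(ex["instruction"], ex["response"])}
    for ex in task_data
]
```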

For computational efficiency, the fine-tuning in the paper uses LoRA (Low-Rank Adaptation), which updates only a small set of adapter parameters rather than the full model.
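
As a rough illustration of how such an adapter-based setup might look, a sketch using the `peft` library follows; the rank, scaling factor, and target modules are illustrative defaults, not the paper's reported configuration.

```python
# Hedged sketch: wrap the seed model with a LoRA adapter before fine-tuning on
# the distilled dataset, so only a small set of adapter weights is updated.
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,                                   # low-rank dimension (illustrative)
    lora_alpha=16,                         # scaling factor (illustrative)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections in Llama-2
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(model, lora_config)  # `model` as in the sketch above
peft_model.print_trainable_parameters()          # only adapter weights are trainable
```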

Experimental Insights and Results

The authors evaluate SDFT with the Llama-2-chat model on benchmarks spanning mathematical reasoning, code generation, and general instruction following. When fine-tuned with SDFT, the model consistently retains more of its previously learned capabilities than traditionally fine-tuned models while achieving comparable or improved task-specific performance. For instance, on the HumanEval coding benchmark, SDFT raised pass@1 from 13.4 to 15.2 rather than degrading it, avoiding the roughly 27% performance loss incurred by vanilla fine-tuning.
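
For reference, the pass@1 numbers follow the standard unbiased pass@k estimator used with HumanEval (Chen et al., 2021); the short sketch below illustrates the metric itself and is not the authors' evaluation script.

```python
# Unbiased pass@k estimator: with n generated samples per problem, of which c
# pass the unit tests, pass@k = 1 - C(n - c, k) / C(n, k).
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples (drawn from n) is correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k = 1 this reduces to the fraction of correct samples, c / n.
print(pass_at_k(n=20, c=3, k=1))  # 0.15
```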

Safety and Helpfulness Alignment:

The empirical evaluations indicate that SDFT helps preserve the safety and helpfulness alignment of LLMs. Standard fine-tuning often degrades these properties, posing safety risks, whereas SDFT largely avoids the degradation (e.g., safety alignment dropped by less than 1% under SDFT, compared to up to roughly 20% with vanilla fine-tuning).

Theoretical Implications and Future Research

From a theoretical standpoint, SDFT suggests a promising trajectory in model fine-tuning paradigms that prioritize the preservation of generalized language capabilities. It provides a methodological foundation upon which future improvements can be constructed, particularly pertaining to efficiency and scope in real-world applications.

Impactful future developments could explore:

  • More advanced distillation techniques or blended methods that combine self-distillation with other continual learning approaches to further alleviate catastrophic forgetting.
  • Broader evaluations against a wider variety of LLM architectures and more diversified datasets to validate the generalizability of SDFT.
  • Deeper examination of safety and alignment under more nuanced conditions, ensuring robust model behavior across diverse unseen scenarios.

Conclusion

In summary, the paper significantly advances the conversation on LLM fine-tuning by proposing SDFT, a method that minimizes the distributional discrepancies that jeopardize an LLM's general capabilities. As the AI landscape grows increasingly sophisticated, approaches like SDFT will be pivotal in balancing task-specific performance with general-purpose competence, giving practitioners finer control over model adaptation.