Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning (2402.13669v2)
Abstract: The surge in LLMs has revolutionized natural language processing, but fine-tuning them for specific tasks often poses a challenge: balancing task performance with preserving general instruction-following abilities. In this paper, we posit that the distribution gap between task datasets and the LLMs serves as the primary underlying cause. To address the problem, we introduce Self-Distillation Fine-Tuning (SDFT), a novel approach that bridges the distribution gap by guiding fine-tuning with a distilled dataset generated by the model itself to match its original distribution. Experimental results on the Llama-2-chat model across various benchmarks demonstrate that SDFT effectively mitigates catastrophic forgetting while achieving comparable or superior performance on downstream tasks compared to vanilla fine-tuning. Moreover, SDFT demonstrates the potential to maintain the helpfulness and safety alignment of LLMs. Our code is available at https://github.com/sail-sg/sdft.
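To make the abstract's description of self-distillation concrete, below is a minimal sketch of the data-generation step: the seed model rewrites each task's reference answer in its own words, and the rewritten answers become the fine-tuning targets. It assumes a HuggingFace-style chat model; the prompt template, `distill_example` helper, and `task_dataset` variable are illustrative assumptions, not the authors' exact implementation (see the linked repository for that).

```python
# Minimal sketch of the self-distillation step described in the abstract (assumptions noted inline).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"  # seed chat model (assumed choice)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

def distill_example(instruction: str, reference_answer: str) -> str:
    """Ask the seed model to restate the task's reference answer in its own style,
    so the fine-tuning target stays close to the model's original distribution."""
    # Illustrative prompt; the paper's actual template may differ.
    prompt = (
        "Below is an instruction and a reference answer. Rewrite the reference "
        "answer in your own words while keeping it correct.\n\n"
        f"Instruction: {instruction}\nReference answer: {reference_answer}\nRewritten answer:"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    # Keep only the newly generated tokens as the distilled response.
    return tokenizer.decode(output_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# The distilled pairs then replace the original targets in an otherwise
# standard supervised fine-tuning run on the downstream task data.
distilled_dataset = [
    {"instruction": ex["instruction"], "response": distill_example(ex["instruction"], ex["response"])}
    for ex in task_dataset  # task_dataset: assumed list of {"instruction", "response"} dicts
]
```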
Authors: Zhaorui Yang, Qian Liu, Tianyu Pang, Han Wang, Haozhe Feng, Minfeng Zhu, Wei Chen