Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation
Abstract: With the rise of powerful closed-sourced LLMs (ChatGPT, GPT-4), there is increasing interest in distilling the capabilities of closed-sourced LLMs into smaller open-sourced LLMs. Previous distillation methods usually prompt ChatGPT to generate a set of instructions and answers for the student model to learn. However, such a standard distillation approach neglects the merits and conditions of the student model. Inspired by modern teaching principles, we design a personalised distillation process, in which the student attempts to solve a task first, then the teacher provides an adaptive refinement for the student to improve. Instead of feeding the student with the teacher's prior, personalised distillation enables personalised learning for the student model, as it learns only from examples on which it makes mistakes and learns to improve its own solution. On code generation, personalised distillation consistently outperforms standard distillation with only one third of the data. With only 2.5-3K personalised examples that incur a data-collection cost of $4-6, we boost CodeGen-mono-16B by 7% to achieve 36.4% pass@1 and StarCoder by 12.2% to achieve 45.8% pass@1 on HumanEval.
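The data-collection loop described in the abstract (the student attempts first, the teacher refines only the failures) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Task`, `RefinementExample`, `run_tests` and `collect_personalised_data` are hypothetical, the student and teacher are plain callables standing in for model calls, and checking the teacher's refinement against the unit tests before keeping it is an assumption that the abstract does not state.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Task:
    instruction: str  # natural-language description of the coding task
    unit_test: str    # assert statements that exercise the solution


@dataclass
class RefinementExample:
    instruction: str
    student_attempt: str
    error: str
    refined_solution: str


def run_tests(code: str, unit_test: str) -> Optional[str]:
    """Return None if the code passes the tests, else a short error message.

    Note: exec on untrusted model output is unsafe; a real pipeline
    would need a sandbox. This is for illustration only.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)
        exec(unit_test, namespace)
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def collect_personalised_data(
    tasks: List[Task],
    student: Callable[[str], str],
    teacher: Callable[[str, str, str], str],
) -> List[RefinementExample]:
    """Keep only tasks the student fails, paired with a teacher refinement."""
    data: List[RefinementExample] = []
    for task in tasks:
        attempt = student(task.instruction)
        error = run_tests(attempt, task.unit_test)
        if error is None:
            continue  # the student already solves this task; nothing to learn
        refined = teacher(task.instruction, attempt, error)
        # Assumption: keep a refinement only if it actually passes the tests.
        if run_tests(refined, task.unit_test) is None:
            data.append(
                RefinementExample(task.instruction, attempt, error, refined)
            )
    return data
```

Because tasks the student already solves are skipped, the resulting set contains only examples tailored to the student's own mistakes, which is the sense in which the data is "personalised".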