Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation

Published 28 Oct 2023 in cs.CL and cs.LG | (2310.18628v2)

Abstract: With the rise of powerful closed-sourced LLMs (ChatGPT, GPT-4), there is increasing interest in distilling the capabilities of closed-sourced LLMs into smaller open-sourced LLMs. Previous distillation methods usually prompt ChatGPT to generate a set of instructions and answers for the student model to learn from. However, this standard distillation approach neglects the merits and conditions of the student model. Inspired by modern teaching principles, we design a personalised distillation process in which the student first attempts to solve a task, and the teacher then provides an adaptive refinement for the student to improve. Instead of feeding the student the teacher's prior, personalised distillation enables personalised learning for the student model: it learns only from examples on which it makes mistakes, and it learns to improve its own solution. On code generation, personalised distillation consistently outperforms standard distillation with only one third of the data. With only 2.5-3K personalised examples, which incur a data-collection cost of $4-6, we boost CodeGen-mono-16B by 7% to achieve 36.4% pass@1 and StarCoder by 12.2% to achieve 45.8% pass@1 on HumanEval.
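
The data-collection loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. All names here (`Task`, `RefinementExample`, `collect_personalised_examples`, and the `student`, `teacher` and `check` callables) are hypothetical. The sketch also assumes that mistakes are detected by running a per-task check that returns an error message, and that a teacher refinement is kept only if it passes the same check; the abstract does not specify either detail.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical types for illustration; none of these names come from the paper.
# A check returns None when the code passes, otherwise an error message.
Check = Callable[[str], Optional[str]]


@dataclass
class Task:
    prompt: str
    check: Check


@dataclass
class RefinementExample:
    prompt: str
    student_attempt: str
    feedback: str
    refined_solution: str


def collect_personalised_examples(
    tasks: List[Task],
    student: Callable[[str], str],
    teacher: Callable[[str, str, str], str],
) -> List[RefinementExample]:
    """Collect training examples only for tasks the student gets wrong."""
    examples: List[RefinementExample] = []
    for task in tasks:
        # 1. The student attempts the task first.
        attempt = student(task.prompt)
        error = task.check(attempt)
        if error is None:
            # The student already solves this task, so it is skipped.
            continue
        # 2. The teacher refines the student's own attempt, given the error.
        refined = teacher(task.prompt, attempt, error)
        # 3. Assumption: keep the refinement only if it passes the check.
        if task.check(refined) is None:
            examples.append(
                RefinementExample(task.prompt, attempt, error, refined)
            )
    return examples
```

In this sketch the teacher is called only on failed attempts. That mirrors the abstract's point that the student learns only from examples on which it makes mistakes, and it is one plausible reason the data-collection cost stays low.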
