Enhancing Code Generation Performance of Smaller Models by Distilling the Reasoning Ability of LLMs (2403.13271v1)

Published 20 Mar 2024 in cs.SE

Abstract: LLMs have recently made significant advances in code generation through the 'Chain-of-Thought' prompting technique. This technique enables the model to autonomously devise "solution plans" for tackling intricate programming challenges, thereby improving its code generation performance. Nevertheless, smaller models struggle to match LLMs in devising these plans, which hurts their code generation capabilities. Given the considerable size and deployment costs of LLMs, along with concerns about data security, many teams opt to deploy smaller models for code generation. There is therefore a compelling need to transfer LLMs' code generation reasoning abilities to smaller models. In this paper, we propose the CodePLAN framework, which aims to transfer LLMs' reasoning capabilities to smaller models through distillation. We adopt a multi-task learning approach, jointly undertaking code generation and solution plan generation tasks, to enhance the code generation capabilities of the smaller model. To ensure high-quality solution plans, we employ backward reasoning and plan sampling strategies. Our experiments show that, compared with conventional fine-tuning, our approach improves the smaller model's code generation performance (measured by the pass@1 metric) by over 130% on the challenging APPS benchmark.
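The multi-task setup described in the abstract, fine-tuning a small model jointly on code generation and on reproducing LLM-distilled solution plans, can be sketched roughly as below. This is a minimal illustration rather than the paper's implementation: the stand-in model, the prompt templates, the loss weight `alpha`, and the toy `train_examples` data are all assumptions, and the plan-quality controls (backward reasoning, plan sampling) that run on the LLM side before training are not shown.

```python
# Minimal sketch of joint (multi-task) fine-tuning: the small model learns to
# generate both the solution plan distilled from an LLM and the final code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Salesforce/codegen-350M-mono"   # stand-in small code model (assumption)
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

train_examples = [  # (problem, LLM-distilled plan, reference code) -- toy placeholder
    ("Write a function add(a, b) that returns a + b.",
     "1. Define add(a, b).\n2. Return the sum of a and b.",
     "def add(a, b):\n    return a + b\n"),
]

def lm_loss(prompt: str, target: str) -> torch.Tensor:
    """Causal-LM loss on prompt+target, with the loss masked out on the prompt."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    full_ids = tok(prompt + target, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100   # ignore prompt tokens in the loss
    return model(input_ids=full_ids, labels=labels).loss

alpha = 0.5  # assumed weighting between the plan and code objectives
for problem, plan, code in train_examples:
    loss = (1 - alpha) * lm_loss(f"{problem}\n# Solution:\n", code) \
         + alpha * lm_loss(f"{problem}\n# Plan:\n", plan)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

At inference time such a model would first be prompted for a plan and then for code conditioned on that plan; the exact prompting scheme here is illustrative only.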

Authors (7)
  1. Zhihong Sun (5 papers)
  2. Chen Lyu (21 papers)
  3. Bolun Li (4 papers)
  4. Yao Wan (70 papers)
  5. Hongyu Zhang (147 papers)
  6. Ge Li (213 papers)
  7. Zhi Jin (160 papers)
Citations (5)