Leveraging Print Debugging to Improve Code Generation in Large Language Models (2401.05319v1)
Published 10 Jan 2024 in cs.CL and cs.SE
Abstract: LLMs have made significant progress in code generation tasks, but their performance on programming problems involving complex data structures and algorithms remains suboptimal. To address this issue, we propose an in-context learning approach that guides LLMs to debug using a "print debugging" method, which involves inserting print statements to trace execution and analyzing the resulting logs to fix the bug. We collect a Leetcode problem dataset and evaluate our method using the Leetcode online judging system. Experiments with GPT-4 demonstrate the effectiveness of our approach, which outperforms rubber duck debugging on easy and medium-level Leetcode problems by 1.5% and 17.9%, respectively.
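To make the "print debugging" idea concrete, here is a minimal hand-written sketch of the workflow in Python: insert a trace print, capture the log, and read it to locate the bug. It is illustrative only and is not the paper's prompt or pipeline. The example problem (maximum subarray sum) and the names `max_subarray_buggy`, `max_subarray_fixed`, and `run_with_log` are assumptions chosen for this illustration. In the paper the LLM performs the insertion and log analysis; here they are done by hand.

```python
import io
from contextlib import redirect_stdout


def max_subarray_buggy(nums):
    """Intended: largest sum of a non-empty contiguous subarray."""
    best = 0  # bug: 0 is not a valid answer when every element is negative
    cur = 0
    for i, x in enumerate(nums):
        cur = max(x, cur + x)
        best = max(best, cur)
        # Inserted trace statement: exposes the state after each step.
        print(f"i={i} x={x} cur={cur} best={best}")
    return best


def max_subarray_fixed(nums):
    """Fix suggested by the log: seed both values from the first element."""
    best = cur = nums[0]
    for x in nums[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best


def run_with_log(fn, *args):
    """Hypothetical helper: run fn and return (result, captured print lines)."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        result = fn(*args)
    return result, buf.getvalue().splitlines()


# A failing case: the expected answer is -1, but the buggy version returns 0.
result, log = run_with_log(max_subarray_buggy, [-3, -1, -2])
```

In the captured log, `best` stays at 0 even though `cur` is negative at every step, which points to the initialization as the fault. This step-by-step view of intermediate state is the kind of evidence the trace-and-analyze loop relies on.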