
What's Wrong with Your Code Generated by Large Language Models? An Extensive Study (2407.06153v1)

Published 8 Jul 2024 in cs.SE and cs.CL

Abstract: The rapid development of LLMs for code generation has drawn significant attention from researchers. To enhance LLM-based code generation, current efforts are predominantly directed toward collecting high-quality datasets and leveraging diverse training techniques. However, there is a notable lack of comprehensive studies examining the limitations and boundaries of these existing methods. To bridge this gap, we conducted an extensive empirical study evaluating the performance of three leading closed-source LLMs and four popular open-source LLMs on three commonly used benchmarks. Our investigation, which measured the length, cyclomatic complexity, and API count of the generated code, revealed that these LLMs struggle to generate successful code for more complex problems and tend to produce code that is shorter yet more complicated than canonical solutions. Additionally, we developed a taxonomy of bugs in incorrect code comprising three categories and 12 sub-categories, and analyzed the root causes of common bug types. Furthermore, to better understand the performance of LLMs in real-world projects, we manually created a real-world benchmark comprising 140 code generation tasks. Our analysis highlights distinct differences in bug distributions between actual scenarios and existing benchmarks. Finally, we propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback. Experimental results demonstrate that our approach significantly mitigates bugs and increases the passing rate by 29.2% after two iterations, indicating substantial potential for LLMs to handle more complex problems.
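The training-free iterative method described in the abstract can be pictured as a simple generate–check–repair loop. The sketch below is an illustrative reconstruction, not the paper's actual implementation: `generate` and `critique` are hypothetical stand-ins for LLM API calls, and Python's built-in `compile()` plays the role of the compiler feedback signal.

```python
# Hedged sketch of a self-critique repair loop, assuming hypothetical
# `generate(task)` and `critique(task, code, feedback)` LLM callables.
from typing import Callable, List, Tuple


def compiler_feedback(code: str) -> str:
    """Return a compile error message, or '' if the code parses cleanly."""
    try:
        compile(code, "<generated>", "exec")
        return ""
    except SyntaxError as exc:
        return f"SyntaxError at line {exc.lineno}: {exc.msg}"


def run_tests(code: str, tests: List[Tuple[str, object]]) -> List[str]:
    """Execute the code and report failing (expression, expected) pairs."""
    env: dict = {}
    failures: List[str] = []
    try:
        exec(code, env)
    except Exception as exc:
        return [f"runtime error: {exc}"]
    for expr, expected in tests:
        try:
            if eval(expr, env) != expected:
                failures.append(f"{expr} != {expected!r}")
        except Exception as exc:
            failures.append(f"{expr} raised {exc}")
    return failures


def self_critique_loop(
    task: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str, str], str],
    tests: List[Tuple[str, object]],
    max_iters: int = 2,  # the abstract reports gains after two iterations
) -> str:
    """Generate code, then iteratively critique and repair it using feedback."""
    code = generate(task)
    for _ in range(max_iters):
        feedback = compiler_feedback(code) or "; ".join(run_tests(code, tests))
        if not feedback:
            return code  # compiles and passes all tests
        code = critique(task, code, feedback)
    return code
```

In this framing, the "critique" step receives both the bug signal (compile error or failing test) and the previous attempt, mirroring the paper's idea of conditioning correction on bug types and compiler feedback rather than retraining the model.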

Authors (24)
  1. Shihan Dou (46 papers)
  2. Haoxiang Jia (7 papers)
  3. Shenxi Wu (4 papers)
  4. Huiyuan Zheng (10 papers)
  5. Weikang Zhou (10 papers)
  6. Muling Wu (13 papers)
  7. Mingxu Chai (6 papers)
  8. Jessica Fan (2 papers)
  9. Caishuang Huang (13 papers)
  10. Yunbo Tao (5 papers)
  11. Yan Liu (419 papers)
  12. Enyu Zhou (12 papers)
  13. Ming Zhang (313 papers)
  14. Yuhao Zhou (78 papers)
  15. Yueming Wu (16 papers)
  16. Rui Zheng (78 papers)
  17. Ming Wen (26 papers)
  18. Rongxiang Weng (26 papers)
  19. Jingang Wang (71 papers)
  20. Xunliang Cai (63 papers)
Citations (10)