
Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study (2403.08604v3)

Published 13 Mar 2024 in cs.CL and cs.SE

Abstract: Recent advancements in LLMs have significantly enhanced their coding capabilities. However, existing benchmarks have predominantly focused on simplified or isolated aspects of coding, such as single-file code generation or repository issue debugging, falling short of measuring the full spectrum of challenges raised by real-world programming activities. In this case study, we explore the performance of LLMs across the entire software development lifecycle with DevEval, encompassing stages including software design, environment setup, implementation, acceptance testing, and unit testing. DevEval features four programming languages, multiple domains, high-quality data collection, and carefully designed and verified metrics for each task. Empirical studies show that current LLMs, including GPT-4, fail to solve the challenges presented within DevEval. Our findings offer actionable insights for the future development of LLMs toward real-world programming applications.
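
To make the lifecycle-based evaluation concrete, the sketch below shows how a harness might score a single task stage by stage, mirroring the design / environment setup / implementation / acceptance-testing / unit-testing split described in the abstract. The stage names, classes, and the pass criterion are illustrative assumptions for exposition, not the paper's actual DevEval code or metrics.

```python
from dataclasses import dataclass, field

# Hypothetical lifecycle stages mirroring the split described in the abstract;
# names are illustrative, not DevEval's actual API.
STAGES = [
    "software_design",
    "environment_setup",
    "implementation",
    "acceptance_testing",
    "unit_testing",
]

@dataclass
class StageResult:
    stage: str
    passed: bool
    detail: str = ""

@dataclass
class TaskReport:
    task_id: str
    results: list[StageResult] = field(default_factory=list)

    @property
    def solved(self) -> bool:
        # A task counts as solved only if every lifecycle stage passes.
        return all(r.passed for r in self.results)

def evaluate_task(task_id: str, model_output: dict[str, str]) -> TaskReport:
    """Score one task by checking the model's artifact for each stage.

    `model_output` maps stage name -> generated artifact (design doc, setup
    script, code, tests, ...). Real stage-specific metrics would go here;
    this sketch only checks that a non-empty artifact was produced.
    """
    report = TaskReport(task_id=task_id)
    for stage in STAGES:
        artifact = model_output.get(stage, "")
        report.results.append(StageResult(stage=stage, passed=bool(artifact.strip())))
    return report

if __name__ == "__main__":
    demo = evaluate_task("calc-001", {"implementation": "def add(a, b): return a + b"})
    for r in demo.results:
        print(f"{r.stage:20s} {'PASS' if r.passed else 'FAIL'}")
    print("solved:", demo.solved)
```

Under this all-stages-must-pass criterion, a model that only produces working code but skips design, setup, or tests would still fail the task, which is consistent with the paper's finding that end-to-end lifecycle coverage is harder than isolated code generation.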

Authors (16)
  1. Bowen Li (166 papers)
  2. Wenhan Wu (8 papers)
  3. Ziwei Tang (3 papers)
  4. Lin Shi (39 papers)
  5. John Yang (22 papers)
  6. Jinyang Li (67 papers)
  7. Shunyu Yao (72 papers)
  8. Chen Qian (226 papers)
  9. Binyuan Hui (57 papers)
  10. Qicheng Zhang (33 papers)
  11. Zhiyin Yu (3 papers)
  12. He Du (4 papers)
  13. Ping Yang (83 papers)
  14. Dahua Lin (336 papers)
  15. Chao Peng (66 papers)
  16. Kai Chen (512 papers)
Citations (9)