A Survey on the Evolution of LLMs and LLM-based Agents in Software Engineering
The paper "From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future" presents a meticulous survey that explores the applications, challenges, and future directions of LLMs and LLM-based agents within the field of software engineering. Authored by Haolin Jin et al., the survey explores the transformative potential of these advanced AI technologies in various subdomains of software engineering, including requirement engineering, code generation, autonomous decision-making, software design, test generation, and software maintenance.
Requirement Engineering and Documentation
LLMs have shown significant promise in automating and enhancing requirement engineering tasks such as requirement classification, generation, and ambiguity detection. Notable models like PRCBERT and ChatGPT have achieved high precision in classifying non-functional requirements and in capturing user requirements, respectively. LLM-based agents extend these capabilities by leveraging multi-agent systems to autonomously handle complex tasks and integrate tool usage. For instance, the AISD framework significantly improved requirement quality through iterative refinement and by generating safety-related requirements for autonomous driving systems.
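To make the requirement-classification workflow concrete, here is a minimal sketch of prompting an LLM to label individual requirements as functional or non-functional. The `complete` callable and the prompt wording are assumptions for illustration, not the interface of PRCBERT, ChatGPT, or anything described in the survey.

```python
# Minimal sketch of prompt-based requirement classification.
# `complete` is a hypothetical stand-in for an LLM chat-completion call;
# swap in whatever client (hosted API, local model) is actually used.

from typing import Callable

PROMPT_TEMPLATE = (
    "Classify the following software requirement as FUNCTIONAL or "
    "NON-FUNCTIONAL. Answer with a single word.\n\nRequirement: {req}"
)

def classify_requirement(req: str, complete: Callable[[str], str]) -> str:
    """Return 'FUNCTIONAL' or 'NON-FUNCTIONAL' for a single requirement."""
    answer = complete(PROMPT_TEMPLATE.format(req=req)).strip().upper()
    return "NON-FUNCTIONAL" if "NON" in answer else "FUNCTIONAL"

if __name__ == "__main__":
    # Toy completion function so the sketch runs without any API key.
    def fake_complete(prompt: str) -> str:
        return "NON-FUNCTIONAL" if "latency" in prompt.lower() else "FUNCTIONAL"

    requirements = [
        "The system shall export reports as PDF.",
        "Page latency must stay below 200 ms under normal load.",
    ]
    for r in requirements:
        print(f"{classify_requirement(r, fake_complete):>15}  {r}")
```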
Code Generation and Software Development
In code generation, LLMs like GPT-4 and Codex have demonstrated the ability to generate executable code and to streamline development by offering intelligent code suggestions and debugging capabilities. Techniques such as print debugging and multi-turn prompt engineering have further enhanced their utility. By comparison, LLM-based agents such as the self-collaborative framework and LCG use multi-agent systems to handle larger codebases efficiently and to improve code quality iteratively. Frameworks like MetaGPT and AgentCoder emphasize multi-agent collaboration to simulate real-world software development processes, showcasing the superior adaptability and efficiency of LLM-based agents in complex software engineering tasks.
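To illustrate the multi-turn, feedback-driven style mentioned above, the following is a minimal sketch of a generate-test-refine loop: each failing run is summarized and fed back into the next prompt. The `generate_code` callable, the prompt wording, and the round budget are assumptions for illustration; MetaGPT, AgentCoder, and LCG implement far richer multi-agent workflows.

```python
# Minimal sketch of a generate -> test -> refine loop for code generation.
# `generate_code` is a hypothetical LLM call; the feedback loop is the point.

import traceback
from typing import Callable

def iterative_codegen(task: str,
                      tests: Callable[[dict], None],
                      generate_code: Callable[[str], str],
                      max_rounds: int = 3) -> str:
    """Ask the model for code, run the tests, and feed failures back."""
    prompt = f"Write Python code for this task:\n{task}"
    for _ in range(max_rounds):
        source = generate_code(prompt)
        namespace: dict = {}
        try:
            exec(source, namespace)   # run the candidate implementation
            tests(namespace)          # raises AssertionError on failure
            return source             # all tests passed
        except Exception:
            feedback = traceback.format_exc(limit=1)
            prompt = (f"Task:\n{task}\n\nYour previous code:\n{source}\n\n"
                      f"It failed with:\n{feedback}\nPlease fix it.")
    raise RuntimeError("No passing solution within the round budget.")

if __name__ == "__main__":
    # Toy model: first answer is buggy, second fixes it after seeing feedback.
    answers = ["def add(a, b):\n    return a - b",
               "def add(a, b):\n    return a + b"]

    def fake_model(prompt: str) -> str:
        return answers.pop(0)

    def tests(ns: dict) -> None:
        assert ns["add"](2, 3) == 5

    print(iterative_codegen("Implement add(a, b).", tests, fake_model))
```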
Autonomous Learning and Decision Making
LLMs have been employed to emulate high-level decision-making and to enhance autonomous learning through techniques such as voting inference and self-debugging. Studies have highlighted the non-linear relationship between the number of LLM calls and performance, while introducing methods such as the SELF-DEBUGGING and AutoSD frameworks for automated debugging. LLM-based agents take this autonomy further by integrating multi-agent discussion and role-playing frameworks, as seen in ExpeL and CAMEL, to improve reasoning accuracy and dynamic decision-making. These agents learn continuously from experiential data and self-reflection, outperforming standalone LLMs in multi-task learning and adaptive planning.
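As a concrete illustration of voting inference, the sketch below samples a model several times and keeps the majority answer. The `sample_answer` callable is a hypothetical stand-in for a non-deterministic LLM call; the frameworks cited in the survey combine this idea with more elaborate reasoning and debugging loops.

```python
# Minimal sketch of voting inference (majority vote over sampled answers).
# `sample_answer` is a hypothetical, non-deterministic LLM call.

import random
from collections import Counter
from typing import Callable

def vote(question: str,
         sample_answer: Callable[[str], str],
         n_samples: int = 5) -> str:
    """Sample the model several times and return the most common answer."""
    answers = [sample_answer(question).strip() for _ in range(n_samples)]
    winner, count = Counter(answers).most_common(1)[0]
    print(f"votes: {dict(Counter(answers))} -> chose {winner!r} ({count}/{n_samples})")
    return winner

if __name__ == "__main__":
    # Toy sampler: mostly right, occasionally wrong, to show the vote at work.
    def fake_sampler(question: str) -> str:
        return random.choice(["42", "42", "42", "41"])

    vote("What is 6 * 7?", fake_sampler)
```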
Software Design and Evaluation
While initially focused on automating specific tasks such as code generation and vulnerability detection, LLMs have gradually expanded into higher-order software design tasks. Models like ChatGPT and frameworks such as EvaluLLM show potential for aiding software design by evaluating generated content against human-crafted standards. LLM-based agents like ChatDev and HuggingGPT shift the paradigm by employing multi-agent collaborative systems to manage the software development process efficiently, significantly reducing vulnerabilities and improving output quality. The high-level orchestration of tasks through these agents demonstrates how comprehensive design and evaluation mechanisms can be integrated.
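To ground the idea of evaluating generated content against human-crafted standards, here is a minimal sketch of an LLM-as-judge check: a judge model compares a candidate design description with a human-written reference. The `judge` callable and the PASS/FAIL prompt are assumptions for illustration, not EvaluLLM's actual protocol.

```python
# Minimal sketch of using an LLM as a judge to compare generated output
# against a human-written reference. `judge` is a hypothetical LLM call.

from typing import Callable

JUDGE_PROMPT = (
    "Reference design description:\n{reference}\n\n"
    "Candidate design description:\n{candidate}\n\n"
    "Does the candidate cover the reference's key requirements? "
    "Answer PASS or FAIL, then give one sentence of justification."
)

def evaluate_design(reference: str, candidate: str,
                    judge: Callable[[str], str]) -> bool:
    """Return True if the judge model rates the candidate as PASS."""
    verdict = judge(JUDGE_PROMPT.format(reference=reference, candidate=candidate))
    return verdict.strip().upper().startswith("PASS")

if __name__ == "__main__":
    # Toy judge so the sketch runs offline.
    def fake_judge(prompt: str) -> str:
        return "PASS: the candidate mentions authentication and logging."

    accepted = evaluate_design(
        reference="The service must authenticate users and log all access.",
        candidate="Users sign in via OAuth; every request goes to an audit log.",
        judge=fake_judge,
    )
    print("accepted" if accepted else "rejected")
```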
Software Test Generation
LLMs like GPT-4 have been leveraged to generate high-coverage test cases and security tests and to conduct fuzz testing. Techniques such as few-shot learning and iterative prompting enable these models to achieve strong results across different testing paradigms. The utility of LLM-based agents comes to the fore with frameworks like TestChain and XUAT-Copilot, which employ multi-agent collaboration to generate and execute tests, achieving higher reliability and robustness in test outcomes. These agents automate the entire testing pipeline, from generating test scripts to validating code, showcasing superior efficiency.
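A minimal sketch of the generate-then-execute testing idea follows: the model is asked for assert-based tests, which are then run against the code under test. The `generate_tests` callable and the plain-`exec` runner are assumptions for illustration; TestChain and XUAT-Copilot orchestrate this with multiple agents and real test harnesses.

```python
# Minimal sketch of LLM-driven test generation followed by execution.
# `generate_tests` is a hypothetical LLM call that returns assert-based
# tests as plain Python statements; the runner simply exec()s them.

from typing import Callable

def generate_and_run_tests(source_under_test: str,
                           generate_tests: Callable[[str], str]) -> bool:
    """Ask the model for tests, run them against the code, report the result."""
    prompt = (f"Here is a Python module:\n{source_under_test}\n"
              "Write assert-based tests for it as plain Python statements.")
    test_code = generate_tests(prompt)

    namespace: dict = {}
    exec(source_under_test, namespace)  # load the code under test
    try:
        exec(test_code, namespace)      # run the generated assertions
        return True
    except AssertionError:
        return False

if __name__ == "__main__":
    module = "def is_even(n):\n    return n % 2 == 0"

    # Toy generator standing in for the LLM.
    def fake_generator(prompt: str) -> str:
        return "assert is_even(4)\nassert not is_even(7)"

    print("tests passed" if generate_and_run_tests(module, fake_generator)
          else "tests failed")
```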
Software Security and Maintenance
In software security, LLMs are widely used for vulnerability detection, automated repair, and penetration testing. Models like WizardCoder and frameworks such as NAVRepair enhance code security by identifying and repairing vulnerabilities. LLM-based agents, through frameworks like FixAgent and RepairAgent, go further by autonomously detecting and fixing software errors, dynamically integrating tools and employing multi-agent collaboration. These agents provide a more comprehensive approach to security and maintenance, addressing the limitations of the static analysis techniques that traditional LLM-based pipelines rely on.
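To make the detect-and-repair loop concrete, the sketch below alternates between a vulnerability report and a patched version until no issue is reported. The `detect` and `repair` callables are hypothetical LLM calls; agent frameworks such as FixAgent and RepairAgent add tool integration and validation on top of this basic loop.

```python
# Minimal sketch of a detect -> repair loop for code security.
# `detect` and `repair` are hypothetical LLM calls.

from typing import Callable, Optional

def detect_and_repair(code: str,
                      detect: Callable[[str], Optional[str]],
                      repair: Callable[[str, str], str],
                      max_rounds: int = 3) -> str:
    """Repeatedly ask for a vulnerability report and a patched version."""
    for _ in range(max_rounds):
        finding = detect(code)        # None means no issue reported
        if finding is None:
            return code
        code = repair(code, finding)  # feed the finding back as repair context
    return code

if __name__ == "__main__":
    vulnerable = 'query = "SELECT * FROM users WHERE name = \'" + name + "\'"'

    # Toy detector/repairer standing in for the LLM calls.
    def fake_detect(code: str) -> Optional[str]:
        return "possible SQL injection via string concatenation" if "+" in code else None

    def fake_repair(code: str, finding: str) -> str:
        return 'query = "SELECT * FROM users WHERE name = %s"  # parameterized'

    print(detect_and_repair(vulnerable, fake_detect, fake_repair))
```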
Conclusion
The survey by Haolin Jin et al. articulates the significant advances and contributions of LLMs and LLM-based agents in software engineering. While LLMs have shown exceptional promise in automating and optimizing many software engineering tasks, the evolution toward LLM-based agents brings a new level of autonomy, adaptability, and efficiency. Through multi-agent collaboration and continuous learning, these agents leverage the strengths of LLMs while addressing their limitations, reshaping the landscape of software engineering. Future developments in AI, particularly in LLM-based agents, promise to further revolutionize software engineering practice, driving higher levels of automation and intelligent decision-making.