
Large Language Model-Based Agents for Software Engineering: A Survey (2409.02977v1)

Published 4 Sep 2024 in cs.SE and cs.AI

Abstract: The recent advance in LLMs has shaped a new paradigm of AI agents, i.e., LLM-based agents. Compared to standalone LLMs, LLM-based agents substantially extend the versatility and expertise of LLMs by enhancing LLMs with the capabilities of perceiving and utilizing external resources and tools. To date, LLM-based agents have been applied and shown remarkable effectiveness in Software Engineering (SE). The synergy between multiple agents and human interaction brings further promise in tackling complex real-world SE problems. In this work, we present a comprehensive and systematic survey on LLM-based agents for SE. We collect 106 papers and categorize them from two perspectives, i.e., the SE and agent perspectives. In addition, we discuss open challenges and future directions in this critical domain. The repository of this survey is at https://github.com/FudanSELab/Agent4SE-Paper-List.

LLM-Based Agents for Software Engineering: A Survey

Introduction

The rapid advancement of LLMs has given rise to a new AI paradigm: LLM-based agents. Unlike standalone LLMs, these agents significantly broaden LLM capabilities by enabling the perception and utilization of external resources and tools. This paper provides a thorough survey of LLM-based agents applied to Software Engineering (SE), categorizing 106 collected papers from both SE and agent perspectives. The survey also identifies current challenges and suggests future research directions.

SE Perspective

From the SE perspective, the paper analyzes how LLM-based agents are utilized across various phases of the software life cycle, including requirements engineering, code generation, static code checking, testing, debugging, and end-to-end software development and maintenance.

Requirements Engineering (RE)

LLM-based agents have demonstrated their utility in automating multiple phases of RE, such as elicitation, specification, and verification. For instance, Elicitron dynamically generates requirements by simulating user interactions, while SpecGen creates Java Modeling Language specifications validated through OpenJML. Multi-agent frameworks like MARE cover multiple RE stages, including requirement elicitation, modeling, and verification.
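The elicitation pattern described above (simulated users generating needs that are then aggregated) can be sketched as follows. This is a minimal illustration, not Elicitron's actual implementation: `simulate_user` is a hypothetical stand-in for an LLM-driven persona simulation, here returning canned responses.

```python
# Sketch of agent-based requirements elicitation in the style of Elicitron.
# `simulate_user` is a hypothetical stub; a real system would back it with an LLM.

def simulate_user(persona: str, product: str) -> list[str]:
    """Stand-in for an LLM-simulated user interview returning raw needs."""
    canned = {
        "commuter": [f"{product} must sync offline",
                     f"{product} should start in under 2s"],
        "admin": [f"{product} needs audit logs"],
    }
    return canned.get(persona, [])

def elicit_requirements(personas: list[str], product: str) -> list[str]:
    """Aggregate and deduplicate needs elicited from simulated user agents."""
    seen, requirements = set(), []
    for persona in personas:
        for need in simulate_user(persona, product):
            if need not in seen:
                seen.add(need)
                requirements.append(need)
    return requirements

reqs = elicit_requirements(["commuter", "admin"], "NoteApp")
```

The deduplication step mirrors the consolidation that elicitation frameworks perform before handing requirements to a specification or verification stage.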

Code Generation

LLM-based agents extend beyond standalone LLMs by incorporating planning and iterative refinement mechanisms to generate more accurate code. Strategies like Chain-of-Thought (CoT) planning decompose tasks into sub-tasks, enhancing effectiveness. Moreover, iterative feedback from tools, models, or humans refines the generated code. Agents like CodeCoT and CodePlan dynamically adapt their strategies based on hybrid feedback mechanisms combining model and tool feedback.

Static Code Checking

Static code checking benefits from multi-agent collaboration and the integration of static analysis tools. For instance, ART leverages tool libraries to enhance LLMs for static bug detection. IRIS and LLIFT combine traditional static analysis with LLM agents to pinpoint vulnerabilities and bugs. These agents dynamically navigate code repositories and validate static anomalies reported by tools.
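The validate-the-warnings pattern described above can be sketched as a two-stage pipeline. This is an illustrative toy, not IRIS's or LLIFT's actual logic: `static_analyzer` is a trivial heuristic flagging dereferences, and `llm_validate` is a hypothetical stand-in for an LLM judging each warning against surrounding code.

```python
# Sketch of LLM-validated static analysis: a cheap analyzer proposes warnings,
# and an LLM-stand-in filters out likely false positives.

def static_analyzer(source: str) -> list[int]:
    """Toy analyzer: flag every line number that dereferences `ptr`."""
    return [i for i, line in enumerate(source.splitlines())
            if "ptr." in line]

def llm_validate(source: str, line_no: int) -> bool:
    """Stand-in for an LLM's judgment: treat the warning as a true positive
    only when no preceding line guards `ptr` with a null check."""
    preceding = source.splitlines()[:line_no]
    return not any("if ptr" in line for line in preceding)

def triage(source: str) -> list[int]:
    """Keep only the warnings the validator confirms."""
    return [n for n in static_analyzer(source) if llm_validate(source, n)]
```

The division of labor is the point: the static tool guarantees recall over the repository, while the language model supplies the contextual reasoning that suppresses spurious reports.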

Testing

In software testing, agents generate unit and system-level tests iteratively, refining them to minimize errors and maximize coverage. For example, TestPilot refines tests by analyzing error messages iteratively, while CoverUp focuses on generating high-coverage tests. System-level testing agents like KernelGPT and WhiteFox incorporate code parsers and dynamic execution tools to validate tests on OS kernels and compilers, respectively.

Debugging

Existing LLM-based agents like RepairAgent and AutoSD employ iterative refinement for program repair, incorporating compilation and execution feedback. Fault localization, in turn, benefits from integration with spectrum-based tooling, as in AUTOFL. Unified debugging approaches combine fault localization and program repair; for example, FixAgent uses inter-agent collaboration to enhance debugging capabilities.

End-to-end Development and Maintenance

Agents facilitate complete software development and maintenance processes, leveraging established process models such as the waterfall model. Systems such as MetaGPT and AgileCoder simulate real-world development teams, incorporating multiple specialized roles like coders, testers, and managers. These agents dynamically collaborate, allocate tasks, and refine outputs iteratively.

Agent Perspective

From the agent perspective, the paper categorizes existing LLM-based agents into four key components: planning, memory, perception, and action.

Planning

Planning involves structuring and scheduling task execution. Some agents adopt single-path planning, generating a linear task sequence, while others implement multi-path strategies like MapCoder to explore various solutions. The representation of plans ranges from natural language to semi-structured formats.
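The single-path vs. multi-path distinction can be sketched as follows. This is an illustrative toy under stated assumptions: `candidate_plans` and `score` are hypothetical stand-ins for an LLM planner and a plan evaluator, not MapCoder's actual mechanism.

```python
# Sketch contrasting single-path planning (commit to one linear sequence)
# with multi-path planning (explore candidates and pick the best-scoring).

def candidate_plans(task: str) -> list[list[str]]:
    """Stand-in for an LLM proposing alternative plans."""
    return [
        ["parse input", "brute force", "return"],
        ["parse input", "sort", "binary search", "return"],
    ]

def score(plan: list[str]) -> int:
    """Stand-in scorer: prefer plans that include efficient steps."""
    return sum(step in ("sort", "binary search") for step in plan)

def single_path_plan(task: str) -> list[str]:
    return candidate_plans(task)[0]          # take the first plan as-is

def multi_path_plan(task: str) -> list[str]:
    return max(candidate_plans(task), key=score)
```

The step lists here stand for the survey's "natural language to semi-structured" plan representations; a real agent would execute each step with further LLM or tool calls.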

Memory

Effective memory management is crucial in SE tasks requiring iterative refinement. Memory types include short-term (e.g., action-observation sequences) and long-term (e.g., distilled task trajectories). Shared and specific memory mechanisms help agents retain context and historical information, vital for coherent task execution.
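The short-term/long-term split can be sketched with a bounded window plus a distilled store. This is a minimal illustration under stated assumptions: keyword matching stands in for the embedding-based retrieval a real agent would use.

```python
# Sketch of agent memory: a bounded short-term window of (action, observation)
# pairs, plus a long-term store of distilled trajectories.
from collections import deque

class AgentMemory:
    def __init__(self, window: int = 3):
        self.short_term = deque(maxlen=window)   # recent (action, obs) pairs
        self.long_term: list[str] = []           # distilled lessons

    def record(self, action: str, observation: str) -> None:
        self.short_term.append((action, observation))

    def distill(self) -> None:
        """Compress the current window into one long-term entry."""
        if self.short_term:
            self.long_term.append(" -> ".join(a for a, _ in self.short_term))

    def recall(self, keyword: str) -> list[str]:
        """Stand-in for retrieval: match long-term entries by keyword."""
        return [m for m in self.long_term if keyword in m]
```

Shared-memory variants would expose one such store to several agents, while agent-specific memory keeps a private instance per role.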

Perception

Agents primarily rely on textual input perception, aligning with the text-rich nature of SE activities. Some agents also incorporate visual input for GUI tasks, utilizing image recognition models.

Action

The action component leverages external tools to extend agent capabilities beyond text generation. Tools include search engines, static analysis tools, testing frameworks, and dynamic instrumentation tools, which facilitate comprehensive SE task automation.
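The tool-dispatch mechanism can be sketched as a registry the agent routes decisions through. This is a hedged illustration: the `"tool_name: argument"` decision format and both tool bodies are hypothetical stand-ins for an LLM's action output and real external tools.

```python
# Sketch of an action component: tools register into a table, and the agent
# dispatches a (stubbed) LLM decision of the form "tool_name: argument".
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {}

def tool(name: str):
    """Decorator registering a function as an invocable tool."""
    def register(fn: Callable[[str], str]):
        TOOLS[name] = fn
        return fn
    return register

@tool("search")
def search(query: str) -> str:
    return f"results for '{query}'"          # stand-in for a search engine

@tool("run_tests")
def run_tests(target: str) -> str:
    return f"all tests passed in {target}"   # stand-in for a test framework

def act(decision: str) -> str:
    """Parse a 'tool_name: argument' decision and dispatch to the tool."""
    name, _, arg = decision.partition(":")
    handler = TOOLS.get(name.strip())
    if handler is None:
        return f"unknown tool '{name.strip()}'"
    return handler(arg.strip())
```

The unknown-tool branch matters in practice: agents frequently hallucinate tool names, so the action layer must return a corrective observation rather than crash.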

Future Directions

The survey highlights several open challenges and future research directions:

  1. Evaluation Metrics and Benchmarks: Developing comprehensive, fine-grained metrics and realistic benchmarks is critical for meaningful evaluations.
  2. Human-Agent Collaboration: Extending human participation across the software life cycle and designing effective interaction mechanisms are key areas for future exploration.
  3. Perception Modality: Broadening the range of perception modalities can improve agent flexibility and adaptability.
  4. Expanding SE Tasks: Developing agents tailored to underexplored SE tasks like design and verification can enhance their utility.
  5. Training Specialized LLMs: Incorporating diverse software lifecycle data into LLM training can create more robust models for SE agents.
  6. Integrating SE Expertise: Leveraging domain-specific SE techniques and methodologies can improve the efficiency and effectiveness of agent systems.

Conclusion

This survey provides a comprehensive analysis of the current landscape of LLM-based agents for SE. The paper explores the utilization of these agents across various SE activities and discusses the design of their core components. By addressing open challenges and outlining future research directions, this survey offers a roadmap for advancing the development and application of LLM-based agents in software engineering.

  205. Autocoderover: Autonomous program improvement. CoRR, abs/2404.05427, 2024.
  206. Swe-agent: Agent-computer interfaces enable automated software engineering. arXiv preprint arXiv:2405.15793, 2024.
  207. Coder: Issue resolving with multi-agent and task graphs. arXiv preprint arXiv:2406.01304, 2024.
  208. How to understand whole software repository? arXiv preprint arXiv:2406.01422, 2024.
  209. Masai: Modular architecture for software-engineering ai agents, 2024.
  210. Agentless: Demystifying llm-based software engineering agents. CoRR, abs/2407.01489, 2024.
  211. Some simple effective approximations to the 2-poisson model for probabilistic weighted retrieval. In SIGIR’94: Proceedings of the Seventeenth Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, organised by Dublin City University, pages 232–241. Springer, 1994.
  212. Swe-bench: Can language models resolve real-world github issues? CoRR, abs/2310.06770, 2023.
  213. SWE-bench Lite, 2024. https://www.swebench.com/lite.html.
  214. Introducing SWE-bench Verified, 2024. https://openai.com/index/introducing-swe-bench-verified/.
  215. Function Calling and other API updates, 2023. https://openai.com/index/function-calling-and-other-api-updates/.
  216. GPT-3.5, 2023. https://platform.openai.com/docs/models/gpt-3-5-turbo.
  217. GPT-4, 2023. https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo.
  218. React: Synergizing reasoning and acting in language models. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023.
  219. Low-code llm: Visual programming over llms. arXiv preprint arXiv:2304.08103, 2, 2023.
  220. Transformer feed-forward layers are key-value memories. In Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih, editors, Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021, pages 5484–5495. Association for Computational Linguistics, 2021.
  221. Iain D Craig. Blackboard systems. Artificial Intelligence Review, 2(2):103–118, 1988.
  222. Uml diagrams in software engineering research: a systematic literature review. In Proceedings, volume 74, page 13. MDPI, 2021.
  223. Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern recognition, 96:106954, 2019.
  224. A convnet for the 2020s. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11976–11986, 2022.
  225. Enchanting program specification synthesis by large language models using static analysis and program verification, 2024.
  226. Repoagent: An llm-powered open-source framework for repository-level code documentation generation, 2024.
  227. DuckDuckGo. https://duckduckgo.com/.
  228. SerpApi. https://serpapi.com/.
  229. Cocost: Automatic complex code generation with online searching and correctness testing, 2024.
  230. Seglink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recognition, 96:106954, 2019.
  231. Screen recognition: Creating accessibility metadata for mobile applications from pixels. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, pages 1–15, 2021.
  232. Antlr: A predicated-ll(k) parser generator. Software: Practice and Experience, 25(7):789–810, 1995.
  233. Jedi. https://github.com/davidhalter/jedi/.
  234. EclipseJDTLS. https://github.com/eclipse-jdtls/eclipse.jdt.ls.
  235. Black. https://github.com/psf/black.
  236. The nuxmv symbolic model checker. In Armin Biere and Roderick Bloem, editors, Computer Aided Verification, pages 334–342, Cham, 2014. Springer International Publishing.
  237. Slither: a static analysis framework for smart contracts. In 2019 IEEE/ACM 2nd International Workshop on Emerging Trends in Software Engineering for Blockchain (WETSEB), pages 8–15. IEEE, 2019.
  238. GPT-4, 2024. https://docs.oracle.com/javase/8/docs/api/java/lang/instrument/package-summary.html.
  239. Pynguin: automated unit test generation for python. In Proceedings of the ACM/IEEE 44th International Conference on Software Engineering: Companion Proceedings, ICSE ’22. ACM, May 2022.
  240. Konrad Hałas. Mutpy: a mutation testing tool for python 3.x source code, 2019. https://github.com/mutpy/mutpy.
  241. Gzoltar: an eclipse plug-in for testing and debugging. In Proceedings of the 27th IEEE/ACM international conference on automated software engineering, pages 378–381, 2012.
  242. Git. https://git-scm.com/.
  243. Engineering safety requirements for autonomous driving with large language models. CoRR, abs/2403.16289, 2024.
  244. OpenAI: Introducing ChatGPT, 2022. https://openai.com/blog/chatgpt/.
  245. Deepseek-coder: When the large language model meets programming - the rise of code intelligence. CoRR, abs/2401.14196, 2024.
  246. Starcoder 2 and the stack v2: The next generation. arXiv preprint arXiv:2402.19173, 2024.
Authors (7)
  1. Junwei Liu
  2. Kaixin Wang
  3. Yixuan Chen
  4. Xin Peng
  5. Zhenpeng Chen
  6. Lingming Zhang
  7. Yiling Lou
Citations (14)