
AutoCodeRover: Autonomous Program Improvement (2404.05427v3)

Published 8 Apr 2024 in cs.SE and cs.AI

Abstract: Researchers have made significant progress in automating the software development process in the past decades. Recent progress in LLMs has significantly impacted the development process, where developers can use LLM-based programming assistants to achieve automated coding. Nevertheless, software engineering involves the process of program improvement apart from coding, specifically to enable software maintenance (e.g. bug fixing) and software evolution (e.g. feature additions). In this paper, we propose an automated approach for solving GitHub issues to autonomously achieve program improvement. In our approach called AutoCodeRover, LLMs are combined with sophisticated code search capabilities, ultimately leading to a program modification or patch. In contrast to recent LLM agent approaches from AI researchers and practitioners, our outlook is more software engineering oriented. We work on a program representation (abstract syntax tree) as opposed to viewing a software project as a mere collection of files. Our code search exploits the program structure in the form of classes/methods to enhance LLM's understanding of the issue's root cause, and effectively retrieve a context via iterative search. The use of spectrum-based fault localization using tests, further sharpens the context, as long as a test-suite is available. Experiments on SWE-bench-lite (300 real-life GitHub issues) show increased efficacy in solving GitHub issues (19% on SWE-bench-lite), which is higher than the efficacy of the recently reported SWE-agent. In addition, AutoCodeRover achieved this efficacy with significantly lower cost (on average, $0.43 USD), compared to other baselines. We posit that our workflow enables autonomous software engineering, where, in future, auto-generated code from LLMs can be autonomously improved.

AutoCodeRover: Pioneering Autonomous Program Improvement via LLMs and Sophisticated Code Search

Introduction to AutoCodeRover

Automation plays a growing role in software development. AutoCodeRover is an approach to autonomous program improvement that targets two tasks: program repair (software maintenance) and feature addition (software evolution). It addresses both by pairing LLMs with code search capabilities.

Methodology

AutoCodeRover uses LLMs not as programming assistants but as agents that navigate the codebase iteratively and enrich their own context along the way. The process involves:

  • Initial Analysis: Extracting keywords from issue descriptions that could represent code elements within the codebase.
  • Stratified Strategy for Code Search: Employing a tiered strategy that guides the LLM agent through multiple code search API invocations, refining the search context iteratively.
  • Exploiting Program Structure: Utilizing the abstract syntax tree (AST) for a granular understanding of the software project, enhancing the effectiveness of context retrieval.
  • Incorporation of Fault Localization Techniques: When available, employing spectrum-based fault localization aided by tests to further refine the search for buggy locations.
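
To make the structure-aware search step concrete, here is a minimal sketch using Python's standard `ast` module. The function names `search_class` and `search_method_in_class` and the sample `Cart` code are hypothetical, chosen for illustration; they are not taken from the paper's implementation. The idea shown is only that a class lookup can return a compact outline first, and a follow-up call can retrieve a single method body, so the context grows step by step instead of loading whole files.

```python
import ast

# Hypothetical sample project code; stands in for a file in the repository.
SOURCE = '''
class Cart:
    def __init__(self):
        self.items = []

    def total(self):
        return sum(price for _, price in self.items)


def format_price(value):
    return str(value)
'''


def search_class(tree, class_name):
    """Return a compact outline (class name and method names), or None."""
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            methods = [n.name for n in node.body if isinstance(n, ast.FunctionDef)]
            return {"class": node.name, "methods": methods}
    return None


def search_method_in_class(tree, source, class_name, method_name):
    """Return the source text of one method of one class, or None."""
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            for item in node.body:
                if isinstance(item, ast.FunctionDef) and item.name == method_name:
                    return ast.get_source_segment(source, item)
    return None


tree = ast.parse(SOURCE)

# First call: a cheap outline of the class named in the issue text.
outline = search_class(tree, "Cart")
print(outline)

# Second call: fetch only the method that looks relevant.
body = search_method_in_class(tree, SOURCE, "Cart", "total")
print(body)
```

In an agent loop, the outline from the first call would be shown to the LLM, which then decides which method (if any) to retrieve next.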

The methodology is evaluated on SWE-bench-lite, a benchmark of 300 real-life GitHub issues. AutoCodeRover resolves 19% of these issues, which is higher than the efficacy of the recently reported SWE-agent, and does so at an average cost of $0.43 USD.

Implications and Future Directions

The approach matters beyond its benchmark numbers. It brings a software engineering-oriented perspective to LLM-based program improvement, reflected in three choices: working on program representations such as the AST, exploiting program structure during code search, and balancing efficacy against efficiency within acceptable time limits.

Moreover, AutoCodeRover's workflow opens avenues for further exploration and integration of additional software engineering tools and methodologies. This includes the potential of incorporating more dynamic analysis techniques, fine-tuning the code search mechanism to accommodate larger codebases, and exploring means to reduce the reliance on exhaustive tests for fault localization.
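
The dependence on tests is easier to see with a small example of spectrum-based fault localization. The sketch below scores code units with the Ochiai formula, one commonly used suspiciousness metric; the text here does not state which formula AutoCodeRover uses, so treat that choice as an assumption. The test names, method names, and coverage data are made up for illustration.

```python
import math


def ochiai(coverage, outcomes):
    """Score each code unit by how strongly it is tied to failing tests.

    coverage: test name -> set of code units the test executes
    outcomes: test name -> True if the test passed, False if it failed
    """
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    units = set().union(*coverage.values()) if coverage else set()
    scores = {}
    for unit in units:
        # ef / ep: failing / passing tests that execute this unit.
        ef = sum(1 for t, cov in coverage.items() if unit in cov and not outcomes[t])
        ep = sum(1 for t, cov in coverage.items() if unit in cov and outcomes[t])
        denom = math.sqrt(total_failed * (ef + ep))
        scores[unit] = ef / denom if denom else 0.0
    return scores


# Hypothetical coverage data for three tests over three methods.
coverage = {
    "test_add": {"Cart.add"},
    "test_total": {"Cart.add", "Cart.total"},
    "test_discount": {"Cart.total", "Cart.discount"},
}
outcomes = {"test_add": True, "test_total": False, "test_discount": False}

scores = ochiai(coverage, outcomes)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

With no failing tests every score is zero, and with sparse coverage many units tie, which is why the quality of the ranking depends on the test suite that is available.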

Conclusion

AutoCodeRover is a step toward autonomous software engineering. By combining LLMs with structured code search and fault localization, it lays groundwork for automating not only program repair but also feature addition. Tools of this kind are likely to influence how software engineering is practiced as such automation matures.

Authors (4)
  1. Yuntong Zhang (11 papers)
  2. Haifeng Ruan (2 papers)
  3. Zhiyu Fan (7 papers)
  4. Abhik Roychoudhury (41 papers)