AutoCodeRover: Autonomous Program Improvement (2404.05427v3)

Published 8 Apr 2024 in cs.SE and cs.AI

Abstract: Researchers have made significant progress in automating the software development process in the past decades. Recent progress in LLMs has significantly impacted the development process, where developers can use LLM-based programming assistants to achieve automated coding. Nevertheless, software engineering involves the process of program improvement apart from coding, specifically to enable software maintenance (e.g. bug fixing) and software evolution (e.g. feature additions). In this paper, we propose an automated approach for solving GitHub issues to autonomously achieve program improvement. In our approach called AutoCodeRover, LLMs are combined with sophisticated code search capabilities, ultimately leading to a program modification or patch. In contrast to recent LLM agent approaches from AI researchers and practitioners, our outlook is more software engineering oriented. We work on a program representation (abstract syntax tree) as opposed to viewing a software project as a mere collection of files. Our code search exploits the program structure in the form of classes/methods to enhance LLM's understanding of the issue's root cause, and effectively retrieve a context via iterative search. The use of spectrum-based fault localization using tests, further sharpens the context, as long as a test-suite is available. Experiments on SWE-bench-lite (300 real-life GitHub issues) show increased efficacy in solving GitHub issues (19% on SWE-bench-lite), which is higher than the efficacy of the recently reported SWE-agent. In addition, AutoCodeRover achieved this efficacy with significantly lower cost (on average, $0.43 USD), compared to other baselines. We posit that our workflow enables autonomous software engineering, where, in future, auto-generated code from LLMs can be autonomously improved.


AutoCodeRover: Pioneering Autonomous Program Improvement via LLMs and Sophisticated Code Search

Introduction to AutoCodeRover

Automation plays a growing role in software development, and LLM-based programming assistants already automate much of the coding itself. AutoCodeRover is an approach to autonomous program improvement that targets the work beyond initial coding: software maintenance (e.g. bug fixing) and software evolution (e.g. feature additions). It does so by combining LLMs with code search capabilities that operate on program structure, using the resolution of GitHub issues as the concrete task.

Methodology

AutoCodeRover uses LLMs not as mere programming assistants but as agents that navigate the codebase iteratively and enrich their own context along the way. This process involves:

Initial Analysis: Extracting keywords from issue descriptions that could represent code elements within the codebase.
Stratified Strategy for Code Search: Employing a tiered strategy that guides the LLM agent through multiple code search API invocations, refining the search context iteratively.
Exploiting Program Structure: Utilizing the abstract syntax tree (AST) for a granular understanding of the software project, enhancing the effectiveness of context retrieval.
Incorporation of Fault Localization Techniques: When available, employing spectrum-based fault localization aided by tests to further refine the search for buggy locations.
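
The structure-aware search described in the steps above can be illustrated with a minimal sketch built on Python's standard ast module. The function names search_class and search_method_in_class, the sample source, and the return shapes are hypothetical and chosen only for illustration; they are not taken from AutoCodeRover's implementation. In the real system the LLM agent decides which search call to issue next, a step this sketch leaves out.

```python
import ast

# Hypothetical project source used only for this illustration.
SOURCE = '''
class Cache:
    def get(self, key):
        return self.store.get(key)

    def put(self, key, value):
        self.store[key] = value

def helper():
    pass
'''


def search_class(tree, class_name):
    """Return the method names of a class, or None if the class is absent.

    Returning only signatures-level information keeps the context small,
    so a follow-up search can fetch just the method that matters.
    """
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return [n.name for n in node.body if isinstance(n, ast.FunctionDef)]
    return None


def search_method_in_class(tree, source, class_name, method_name):
    """Return the source text of one method in one class, or None."""
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            for n in node.body:
                if isinstance(n, ast.FunctionDef) and n.name == method_name:
                    return ast.get_source_segment(source, n)
    return None


tree = ast.parse(SOURCE)

# First round: a keyword from the issue ("Cache") is resolved to a class.
methods = search_class(tree, "Cache")

# Second round: the result of the first search narrows the next query.
snippet = search_method_in_class(tree, SOURCE, "Cache", "get")
```

The two calls mirror the iterative refinement in the list: the first search returns a compact overview of the class, and the second retrieves a single method body as context instead of the whole file.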

This methodology is evaluated on the SWE-bench-lite benchmark, which consists of 300 real-life GitHub issues. AutoCodeRover resolves 19% of these issues, a higher rate than the recently reported SWE-agent, and does so at an average cost of $0.43 USD, significantly lower than other baselines.
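
The spectrum-based fault localization step listed in the methodology can be sketched as follows. This is a minimal illustration, assuming per-test method coverage is already available and using the Ochiai formula, one common suspiciousness metric; the text above does not state which formula AutoCodeRover uses. The function name ochiai_ranking and the sample coverage data are hypothetical.

```python
import math


def ochiai_ranking(coverage, outcomes):
    """Rank program elements by Ochiai suspiciousness.

    coverage: dict mapping test name -> set of covered elements
    outcomes: dict mapping test name -> True if the test passed
    Ochiai score of element e: ef / sqrt(total_failed * (ef + ep)),
    where ef / ep count failing / passing tests that cover e.
    """
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    elements = set().union(*coverage.values())
    scores = {}
    for element in elements:
        ef = sum(1 for t, cov in coverage.items()
                 if element in cov and not outcomes[t])
        ep = sum(1 for t, cov in coverage.items()
                 if element in cov and outcomes[t])
        denom = math.sqrt(total_failed * (ef + ep))
        scores[element] = ef / denom if denom else 0.0
    # Highest score first; ties broken by name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical coverage: one failing test and two passing tests.
coverage = {
    "test_a": {"Cache.get", "Cache.put"},
    "test_b": {"Cache.put"},
    "test_c": {"helper"},
}
outcomes = {"test_a": False, "test_b": True, "test_c": True}

ranking = ochiai_ranking(coverage, outcomes)
```

In this sample, Cache.get is covered only by the failing test and ranks first, Cache.put is covered by both a failing and a passing test and ranks second, and helper is never covered by a failing test and scores zero. A ranking like this can be handed to the agent as a hint about where to search, which is only possible when a test suite is available.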

Implications and Future Directions

The implications of AutoCodeRover's approach are multifaceted, extending beyond mere numbers. It pioneers a software engineering-oriented perspective in the deployment of LLMs for program improvement tasks. Such a perspective is encapsulated through the emphasis on program representations like AST, the strategic exploitation of program structure in code search, and the necessity of balancing efficacy and efficiency within acceptable time thresholds.

Moreover, AutoCodeRover's workflow opens avenues for further exploration and integration of additional software engineering tools and methodologies. This includes the potential of incorporating more dynamic analysis techniques, fine-tuning the code search mechanism to accommodate larger codebases, and exploring means to reduce the reliance on exhaustive tests for fault localization.

Conclusion

AutoCodeRover is a step towards autonomous software engineering. By combining LLMs with structured code search and, where tests are available, fault localization, it lays groundwork for automating not only program repair but also feature addition. The authors posit that such a workflow could, in future, allow auto-generated code from LLMs to be improved autonomously.