SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering (2405.15793v3)

Published 6 May 2024 in cs.SE, cs.AI, cs.CL, cs.HC, and cs.LG

Abstract: Language model (LM) agents are increasingly being used to automate complicated tasks in digital environments. Just as humans benefit from powerful software applications, such as integrated development environments, for complex tasks like software engineering, we posit that LM agents represent a new category of end users with their own needs and abilities, and would benefit from specially-built interfaces to the software they use. We investigate how interface design affects the performance of language model agents. As a result of this exploration, we introduce SWE-agent: a system that facilitates LM agents to autonomously use computers to solve software engineering tasks. SWE-agent's custom agent-computer interface (ACI) significantly enhances an agent's ability to create and edit code files, navigate entire repositories, and execute tests and other programs. We evaluate SWE-agent on SWE-bench and HumanEvalFix, achieving state-of-the-art performance on both with a pass@1 rate of 12.5% and 87.7%, respectively, far exceeding the previous state-of-the-art achieved with non-interactive LMs. Finally, we provide insight on how the design of the ACI can impact agents' behavior and performance.

SWE-agent: Enhancing Software Engineering with AI

The paper introduces SWE-agent, a system designed to extend the capabilities of language model (LM) agents in automated software engineering. Its specialized agent-computer interface (ACI) gives LMs a more efficient and effective way to interact with a computer environment, which improves their performance on software engineering tasks.

Overview

SWE-agent addresses the limitations of traditional LMs in handling complex programming tasks by introducing a custom ACI. This interface includes tools and commands tailored specifically for LMs, such as search functionalities, file navigation capabilities, and advanced file editing features. These components are designed to accommodate the unique needs and capabilities of LMs, as opposed to existing human-centered interfaces like the Linux shell.

Key Components of SWE-agent

File Viewer and Editor: SWE-agent's file viewer lets LMs view code files in a manageable format, with features such as scrolling and direct line access. The file editor, integrated with the viewer, supports efficient multi-line edits and provides immediate feedback through syntax checking, reducing the likelihood of errors.
Search and Navigation Tools: The ACI provides search functionalities that allow LMs to locate relevant code efficiently. This includes directory-level and file-level searches, which help in localizing problems within large codebases.
Context Management: SWE-agent employs informative prompts and error messages that guide the LM's actions, ensuring they stay focused on relevant tasks and maintain useful context throughout the interaction.
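
The viewer, editor, and search components described above can be illustrated with a minimal Python sketch. This is not SWE-agent's actual implementation: the class `FileWindow`, its methods, the helper `search_file`, and the window size are all hypothetical names and values chosen for illustration, and the syntax check here simply uses Python's built-in `ast.parse` as a stand-in for whatever checker the real system uses.

```python
import ast
from dataclasses import dataclass


@dataclass
class FileWindow:
    """Hypothetical windowed viewer/editor over an in-memory Python file."""
    lines: list[str]
    window: int = 5   # lines shown per view (illustrative value)
    top: int = 0      # 0-based index of the first visible line

    def view(self) -> str:
        """Render the visible window with 1-based line numbers."""
        end = min(self.top + self.window, len(self.lines))
        body = [f"{i + 1}: {self.lines[i]}" for i in range(self.top, end)]
        header = f"({self.top} lines above)"
        footer = f"({len(self.lines) - end} lines below)"
        return "\n".join([header, *body, footer])

    def goto(self, line: int) -> None:
        """Direct line access: make 1-based `line` the first visible line."""
        self.top = max(0, min(line - 1, len(self.lines) - 1))

    def scroll_down(self) -> None:
        """Advance the view by one full window."""
        self.goto(self.top + self.window + 1)

    def edit(self, start: int, end: int, replacement: str) -> str:
        """Replace 1-based inclusive lines start..end.

        The edit is only kept if the resulting file still parses;
        otherwise the file is left unchanged and an error is returned.
        """
        candidate = (
            self.lines[:start - 1] + replacement.split("\n") + self.lines[end:]
        )
        try:
            ast.parse("\n".join(candidate))
        except SyntaxError as exc:
            return f"Edit rejected, file unchanged: {exc.msg} (line {exc.lineno})"
        self.lines = candidate
        return "Edit applied.\n" + self.view()


def search_file(lines: list[str], term: str) -> list[int]:
    """File-level search: 1-based numbers of lines containing `term`."""
    return [i + 1 for i, text in enumerate(lines) if term in text]


# Usage: locate a buggy line, try a malformed edit, then a valid one.
source = ["def add(a, b):", "    return a - b", "", "print(add(1, 2))"]
win = FileWindow(source, window=3)
print(search_file(win.lines, "return"))
print(win.edit(2, 2, "    return a +"))    # rejected by the syntax check
print(win.edit(2, 2, "    return a + b"))  # accepted, updated view returned
```

The design point the sketch tries to capture is that every action returns a short, informative message: a rejected edit explains why and leaves the file untouched, while an accepted edit immediately shows the updated window.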

Evaluation and Performance

SWE-agent was evaluated on SWE-bench and HumanEvalFix, where it outperformed non-interactive LMs. The abstract reports a pass@1 rate of 12.5% on SWE-bench; on the SWE-bench Lite subset, SWE-agent achieved a resolution rate of 18.00% using GPT-4 Turbo, well above the previous state of the art set by non-interactive LMs. The system also performed strongly on the code editing tasks in HumanEvalFix, with a pass@1 rate of 87.7%.
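
For readers unfamiliar with the metric, a resolution (pass@1) rate is the share of tasks solved when the agent gets a single attempt per task. The sketch below shows the arithmetic only; the function name `pass_at_1` and the counts (54 resolved out of 300) are illustrative assumptions chosen to reproduce an 18.00% rate, not figures taken from this summary.

```python
def pass_at_1(resolved: list[bool]) -> float:
    """Percentage of tasks resolved with one attempt per task."""
    if not resolved:
        raise ValueError("no tasks evaluated")
    return 100.0 * sum(resolved) / len(resolved)


# Hypothetical outcome list: 54 tasks resolved, 246 unresolved.
outcomes = [True] * 54 + [False] * 246
print(f"{pass_at_1(outcomes):.2f}%")  # prints "18.00%"
```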

Implications and Future Directions

The introduction of SWE-agent marks a notable advancement in the field of automated software engineering. By designing interfaces that cater specifically to LMs, the system enhances their ability to navigate and manipulate codebases, ultimately improving their performance on complex programming tasks.

The research also points to a broader direction for AI development: careful ACI design could substantially improve how LMs interact with digital environments beyond software engineering.

The practical applications are also considerable. Agents that can automate software engineering tasks effectively could make development workflows more efficient and reduce the need for human intervention in routine coding tasks.

Conclusion

SWE-agent represents a significant contribution to the field of automated software engineering, offering a robust framework that enhances the capabilities of LM agents. Through careful design and implementation of a custom ACI, SWE-agent lays the groundwork for future innovations in AI-driven software development. This research not only improves the current state of automated programming but also opens new avenues for exploring the potential of LMs in various domains.