Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement (2411.00622v1)

Published 1 Nov 2024 in cs.SE and cs.AI

Abstract: Recent advancements in LLM-based agents have led to significant progress in automatic software engineering, particularly in software maintenance and evolution. Despite these encouraging advances, current research faces two major challenges. First, SOTA performance primarily depends on closed-source models, which significantly limits the technology's accessibility and potential for customization in diverse SE tasks. Second, these models are predominantly trained on static code data, lacking a deep understanding of the dynamic interactions, iterative problem-solving processes, and evolutionary characteristics inherent in software development. To address these challenges, our study adopts a software engineering perspective. We recognize that real-world software maintenance and evolution processes encompass not only static code data but also developers' thought processes, utilization of external tools, and the interaction between different functional personnel. Consequently, we introduce the Lingma SWE-GPT series, comprising Lingma SWE-GPT 7B and 72B. By learning from and simulating real-world code submission activities, Lingma SWE-GPT systematically incorporates the dynamic interactions and iterative problem-solving inherent in the software development process, thereby achieving a more comprehensive understanding of software improvement processes. We conducted experimental evaluations using the SWE-bench Verified benchmark. The results demonstrate that Lingma SWE-GPT 72B successfully resolves 30.20% of the GitHub issues, marking a significant improvement in automatic issue resolution (22.76% relative improvement compared to Llama 3.1 405B) and approaching the performance of closed-source models (GPT-4o resolves 31.80% of issues). Notably, Lingma SWE-GPT 7B resolves 18.20% of the issues, highlighting the potential for applying smaller models to ASE tasks.

An Evaluation of Lingma SWE-GPT for Automated Software Improvement

The paper presents Lingma SWE-GPT, a series of open-source LLMs for automated software improvement. The two models, Lingma SWE-GPT 72B and Lingma SWE-GPT 7B, represent a notable effort to match the performance of closed-source models such as GPT-4o and Claude 3.5 Sonnet while remaining accessible and alleviating the privacy concerns associated with closed systems.

Lingma SWE-GPT moves beyond reliance on static code data, addressing a key limitation of existing models. It adopts a more dynamic approach that mimics the real-world software engineering process, following a well-defined pipeline: repository understanding, fault localization, and patch generation (a minimal sketch of such a pipeline follows below). This design allows the model to capture the dynamic interactions and iterative processes inherent in software development, and could advance the real-world utility of LLMs in practical development scenarios where discerning complex project structures and generating context-sensitive solutions is critical.
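To make the three-stage workflow concrete, here is a minimal sketch of how such a pipeline could be orchestrated. The heuristics (keyword matching, stubbed model calls) and all function names are illustrative assumptions, not the authors' implementation, which drives each stage with the model itself.

```python
# Hypothetical sketch of the repository-understanding -> fault-localization
# -> patch-generation workflow; all heuristics here are placeholders.
import os

def understand_repository(repo_path: str, issue_text: str) -> list[str]:
    """Stage 1: rank source files by overlap with issue keywords."""
    keywords = {w.lower() for w in issue_text.split() if len(w) > 3}
    scored = []
    for root, _, files in os.walk(repo_path):
        for name in files:
            if not name.endswith(".py"):
                continue
            path = os.path.join(root, name)
            text = open(path, errors="ignore").read().lower()
            score = sum(text.count(k) for k in keywords)
            if score:
                scored.append((score, path))
    return [p for _, p in sorted(scored, reverse=True)[:5]]

def localize_fault(candidate_files: list[str], issue_text: str) -> list[str]:
    """Stage 2: narrow candidates to suspect regions (stubbed)."""
    return candidate_files[:1]  # placeholder: keep the top-ranked file

def generate_patch(fault_sites: list[str], issue_text: str) -> str:
    """Stage 3: produce a patch for the localized fault (stubbed)."""
    return f"# a unified diff targeting {fault_sites} would be generated here"

def resolve_issue(repo_path: str, issue_text: str) -> str:
    files = understand_repository(repo_path, issue_text)
    sites = localize_fault(files, issue_text)
    return generate_patch(sites, issue_text)
```

In the paper's setting, each stage is performed by the language model itself through tool interactions rather than by the simple heuristics stubbed in above.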

The paper reports an extensive evaluation of Lingma SWE-GPT on the SWE-bench Verified and Lite benchmarks, comparing it against existing open-source and closed-source models. The results show that Lingma SWE-GPT 72B resolves 30.20% of GitHub issues, a 22.76% relative improvement in automatic issue resolution over Llama 3.1 405B, nearly equaling the performance of GPT-4o. This demonstrates strong potential for open-source models in practical automated software tasks, offering a more accessible counterpart to the currently dominant closed-source models.
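As a quick check on these figures, note that 22.76% is a relative gain, not an absolute one; the Llama 3.1 405B baseline implied by the two reported numbers can be recovered as follows (the baseline value is inferred, not quoted from the paper):

```python
# Back-of-the-envelope check of the reported relative improvement.
swe_gpt_72b = 30.20            # % of SWE-bench Verified issues resolved
relative_improvement = 0.2276  # reported 22.76% relative gain over Llama 3.1 405B

# new = base * (1 + relative), so the implied baseline is new / (1 + relative)
implied_baseline = swe_gpt_72b / (1 + relative_improvement)
print(f"implied Llama 3.1 405B rate: {implied_baseline:.2f}%")  # ~24.60%

# Relative improvement is (new - base) / base, distinct from the
# absolute gap of roughly 5.6 percentage points.
check = (swe_gpt_72b - implied_baseline) / implied_baseline
print(f"recovered relative gain: {check:.4f}")  # ~0.2276
```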

Additionally, the paper explores how smaller-scale models such as Lingma SWE-GPT 7B can produce competitive results. With a resolution rate of 18.20%, the 7B model surpasses the 17.20% achieved by Llama 3.1 70B, highlighting the utility and efficiency of smaller models in settings with constrained computational resources.

One of the pivotal contributions of the paper is the development-process-centric training strategy, which is key to the model's robust performance. This approach leverages the real-world dynamics of software processes, refining the models through curated synthesis of development data, including reasoning patterns, tool interactions, and practical problem resolutions. By incorporating a comprehensive curriculum training strategy, Lingma SWE-GPT shows improved ability to handle increasingly complex software tasks with higher reliability.
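This summary does not spell out the curriculum mechanics, but a generic curriculum schedule, ordering synthesized development trajectories from easy to hard across training phases, might look like the following sketch; the `difficulty` score and trajectory fields are assumptions, not the paper's data schema:

```python
# A generic curriculum-training sketch, not the authors' recipe: synthesized
# development trajectories are sorted by an assumed difficulty score and fed
# to the trainer in progressively harder phases.
from dataclasses import dataclass

@dataclass
class Trajectory:
    prompt: str         # issue description plus repository context
    actions: list[str]  # reasoning steps, tool calls, final patch
    difficulty: float   # assumed score, e.g. files touched or patch size

def curriculum_phases(data: list[Trajectory], n_phases: int = 3):
    """Yield successive training phases, each adding harder examples."""
    ordered = sorted(data, key=lambda t: t.difficulty)
    step = max(1, len(ordered) // n_phases)
    for phase in range(1, n_phases + 1):
        # Each phase trains on everything up to the current difficulty cap,
        # so easier skills keep being rehearsed as harder ones are added.
        yield ordered[: phase * step]

# Usage: for batch in curriculum_phases(trajectories): train_one_phase(batch)
```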

The implications of this research, both practical and theoretical, invite future exploration in several directions. Practically, Lingma SWE-GPT establishes a framework that democratizes access to high-performing automation tools for software improvement. Theoretically, the results underline the importance of a dynamic and process-oriented training paradigm to enhance the contextual understanding and execution abilities of LLMs in intricate, real-world applications. The authors speculate that further advancements could explore more sophisticated tool usage, reasoning, and verification capabilities, crucial for advancing AI-assisted software engineering into more extensive domains and broader stages of the software lifecycle.

In summary, this paper clearly defines a path towards accessible and efficient models for software engineering tasks, challenging the status quo dominated by closed-source models. The research lays a foundation for future inquiries into enhancing LLMs' comprehension and reasoning capabilities, with profound implications for the automation and quality enhancement of software engineering.

Authors (10)
  1. Yingwei Ma
  2. Rongyu Cao
  3. Yongchang Cao
  4. Yue Zhang
  5. Jue Chen
  6. Yibo Liu
  7. Yuchen Liu
  8. Binhua Li
  9. Fei Huang
  10. Yongbin Li