SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution

Published 9 Jan 2025 in cs.CL | (2501.05040v3)

Abstract: LLMs have demonstrated remarkable proficiency across a variety of complex tasks. One significant application of LLMs is in tackling software engineering challenges, particularly in resolving real-world tasks on GitHub by fixing code based on the issues reported by the users. However, many current approaches rely on proprietary LLMs, which limits reproducibility, accessibility, and transparency. The critical components of LLMs for addressing software engineering issues and how their capabilities can be effectively enhanced remain unclear. To address these challenges, we introduce SWE-Fixer, a novel open-source framework designed to effectively and efficiently resolve GitHub issues. SWE-Fixer comprises two essential modules: a code file retrieval module and a code editing module. The retrieval module employs BM25 along with a lightweight model to achieve coarse-to-fine file retrieval. Subsequently, the code editing module utilizes the other model to generate patches for the identified files. To mitigate the lack of publicly available datasets, we compile an extensive dataset that includes 110K GitHub issues along with their corresponding patches and train the two models of SWE-Fixer separately. We assess our approach on the SWE-Bench Lite and Verified benchmarks, achieving competitive performance among open-source models with scores of 22.0% and 30.2%. Furthermore, SWE-Fixer reaches state-of-the-art performance (24.7% on Lite and 32.8% on Verified) with PASS_TO_PASS (P2P) filtering. Additionally, our approach requires only two model calls per instance, making it significantly more efficient than existing methods. These results highlight the effectiveness of SWE-Fixer in real-world code-fixing scenarios. We will make our model, dataset, and code publicly available at https://github.com/InternLM/SWE-Fixer.


Summary

The paper introduces SWE-Fixer, an open-source framework for resolving GitHub issues that reports state-of-the-art performance on SWE-Bench Lite and Verified when combined with PASS_TO_PASS (P2P) filtering.
The framework uses a two-module pipeline: BM25 plus a lightweight model for coarse-to-fine file retrieval, followed by a separate model that generates patches for the retrieved files.
The paper contributes a dataset of 110K GitHub issues with their corresponding patches, supporting reproducibility and further research on automated issue resolution.

Evaluation of SWE-Fixer: A Scalable Solution for Real-World GitHub Issue Resolution

The paper "SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution" applies LLMs to a concrete software engineering task: automatically resolving issues reported on GitHub repositories. The authors present SWE-Fixer, an open-source framework intended to address three problems they attribute to approaches built on proprietary LLMs: limited reproducibility, accessibility, and transparency.

Overview of SWE-Fixer

SWE-Fixer is composed of two modules: a code file retrieval module and a code editing module. For retrieval, the authors combine BM25, a probabilistic ranking function, with a lightweight LLM to narrow the repository down to the files relevant to an issue report. This coarse-to-fine strategy keeps retrieval cheap before the code editing module, which uses a second model, generates patches for the identified files. The authors also compile a dataset of 110K GitHub issues with their corresponding patches. It serves as the training data for both SWE-Fixer models and, once released, as a community resource that addresses the scarcity of publicly available datasets.
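The coarse stage of the retrieval described above can be illustrated with a minimal BM25 ranker. This is a sketch under stated assumptions, not the authors' implementation: the tokenizer, the `k1` and `b` values, the `top_k` cutoff, and the names `tokenize` and `bm25_rank` are all illustrative choices. In SWE-Fixer, the shortlist produced by this kind of ranking would then be passed to the lightweight model for the fine-grained selection.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric/underscore runs (an illustrative tokenizer).
    return re.findall(r"[a-z0-9_]+", text.lower())


def bm25_rank(query, files, k1=1.5, b=0.75, top_k=3):
    """Rank files (path -> content) against a query with Okapi BM25.

    k1, b, and top_k are common defaults chosen for illustration only.
    """
    if not files:
        return []
    docs = {path: tokenize(content) for path, content in files.items()}
    n_docs = len(docs)
    avg_len = sum(len(toks) for toks in docs.values()) / n_docs

    # Document frequency: number of files that contain each term.
    df = Counter()
    for toks in docs.values():
        df.update(set(toks))

    query_terms = set(tokenize(query))
    scores = {}
    for path, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[path] = score

    # Highest score first; ties keep the input order because sorted() is stable.
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


if __name__ == "__main__":
    repo = {
        "auth/login.py": "def login(user, password): check password hash",
        "utils/math.py": "def add(a, b): return a + b",
        "docs/readme.md": "project readme installation",
    }
    print(bm25_rank("login fails when password contains unicode", repo, top_k=2))
```

BM25 needs no model call, which is why it suits the coarse pass: it cheaply cuts the repository down to a shortlist before a model has to read any file content.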

On the SWE-Bench Lite and Verified benchmarks, the abstract reports scores of 22.0% and 30.2%, respectively, which the authors describe as competitive among open-source models. With PASS_TO_PASS (P2P) filtering, the scores rise to 24.7% on Lite and 32.8% on Verified, which the authors report as state of the art. These results place SWE-Fixer among the strongest open-source approaches in this domain and ahead of some methods that rely on proprietary models.
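The PASS_TO_PASS (P2P) filtering mentioned in the abstract can be pictured as a regression check on candidate patches: P2P tests are those that pass before the fix and are expected to keep passing afterwards, so a patch that breaks one of them is discarded. The sketch below assumes that several candidate patches are available and that a test runner exists. The names `select_patch` and `run_test` are hypothetical, and the summary does not describe the paper's exact selection procedure.

```python
from typing import Callable, Iterable, List, Optional


def select_patch(
    candidates: Iterable[str],
    p2p_tests: List[str],
    run_test: Callable[[str, str], bool],
) -> Optional[str]:
    """Return the first candidate patch that keeps every P2P test passing.

    run_test(patch, test_id) is a hypothetical callable that applies the patch
    in a sandbox and reports whether the given test still passes.
    """
    for patch in candidates:
        if all(run_test(patch, test_id) for test_id in p2p_tests):
            return patch
    # Every candidate broke at least one previously passing test.
    return None


if __name__ == "__main__":
    # Stub runner: pretend patch_a breaks test_io and patch_b breaks nothing.
    broken = {"patch_a": {"test_io"}}

    def fake_run(patch: str, test_id: str) -> bool:
        return test_id not in broken.get(patch, set())

    print(select_patch(["patch_a", "patch_b"], ["test_io", "test_cli"], fake_run))
```

Filtering of this kind uses only tests that already exist in the repository, so it does not require knowing the tests that verify the fix itself.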

Key Contributions

State-of-the-Art Open Source Performance: SWE-Fixer shows that a solution built on open-source LLMs can match or exceed some solutions based on powerful proprietary models. This lowers a key accessibility barrier to deploying LLMs for coding tasks.
Methodological Simplicity and Efficiency: The pipeline architecture of SWE-Fixer comprises only two tasks, file retrieval and code editing, showing that added complexity is not necessary for strong results. Because it requires only two model calls per instance, SWE-Fixer is faster and less computationally demanding than multi-stage, agent-based solutions.
Comprehensive Dataset: The 110K-issue dataset, which the summary describes as significantly larger than prior offerings, addresses a critical gap in training resources. It supplies the training data for both models and gives the research community a basis for reproducing and extending the approach.
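The two-call structure described under Methodological Simplicity and Efficiency can be sketched as a short pipeline. This is an illustration under assumptions rather than the released code: `resolve_issue`, `Resolution`, the three injected callables, and the `top_k` value are hypothetical names and settings, and the models are passed in as plain functions so the control flow is visible without any particular inference library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Resolution:
    files: List[str]   # files chosen by the retrieval model
    patch: str         # patch text produced by the editing model
    model_calls: int   # number of LLM calls made for this issue


def resolve_issue(
    issue: str,
    repo_files: Dict[str, str],
    coarse_rank: Callable[[str, Dict[str, str]], List[str]],
    retriever_model: Callable[[str, List[str]], List[str]],
    editor_model: Callable[[str, Dict[str, str]], str],
    top_k: int = 30,  # arbitrary illustrative cutoff, not a value from the paper
) -> Resolution:
    calls = 0

    # Coarse stage: lexical ranking such as BM25, which needs no model call.
    shortlist = coarse_rank(issue, repo_files)[:top_k]

    # Fine stage (model call 1): the lightweight model picks files to edit.
    selected = retriever_model(issue, shortlist)
    calls += 1

    # Editing stage (model call 2): a second model writes the patch.
    context = {path: repo_files[path] for path in selected if path in repo_files}
    patch = editor_model(issue, context)
    calls += 1

    return Resolution(files=selected, patch=patch, model_calls=calls)


if __name__ == "__main__":
    files = {"a.py": "x = 1", "b.py": "y = 2", "c.py": "z = 3"}
    result = resolve_issue(
        "bug in b",
        files,
        coarse_rank=lambda issue, fs: list(fs),
        retriever_model=lambda issue, cands: ["b.py"],
        editor_model=lambda issue, ctx: "PATCH:" + ",".join(ctx),
    )
    print(result)
```

The number of model calls is fixed regardless of repository size, in contrast to agent-style loops where the call count depends on how many steps the agent decides to take.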

Implications and Future Directions

Releasing SWE-Fixer as an open-source framework opens several directions for research and practical use in AI-driven software development. As an accessible tool, it can support further research into improving LLM capabilities for software engineering tasks, potentially leading to more efficient and capable models. On the practical side, SWE-Fixer offers a scalable, modular approach that could be adapted to issue-tracking systems beyond GitHub, potentially benefiting large-scale software maintenance and development environments.

Looking ahead, integration with dynamic execution environments could let the tool validate its patches by running them. Extending SWE-Fixer to more programming languages and frameworks could also widen its applicability, potentially changing how developers work with both proprietary and open-source repositories.

Conclusion

SWE-Fixer is a notable step in the development of LLM-based tools for real-world software engineering tasks. By narrowing the gap between the performance of proprietary systems and the accessibility of open-source ones, it lays a foundation for further work that prioritizes transparency and community engagement in AI research on software maintenance and development. SWE-Fixer thus demonstrates the viability of open-source solutions and encourages further work on automated issue resolution.
