Enhancing the Code Debugging Ability of LLMs via Communicative Agent Based Data Refinement (2408.05006v1)

Published 9 Aug 2024 in cs.SE and cs.AI

Abstract: Debugging is a vital aspect of software development, yet the debugging capabilities of LLMs remain largely unexplored. This paper first introduces DEBUGEVAL, a comprehensive benchmark designed to evaluate the debugging capabilities of LLMs. DEBUGEVAL collects data from existing high-quality datasets and defines four tasks to evaluate debugging effectiveness: BUG Localization, BUG Identification, Code Review, and Code Repair. Additionally, to enhance the code debugging ability of LLMs, the paper proposes a CoMmunicative Agent BaSed DaTa REfinement FRamework (MASTER), which generates refined code debugging data for supervised fine-tuning. Specifically, MASTER employs the Code Quizzer to generate refined data according to the tasks defined by DEBUGEVAL. The Code Learner then acts as a critic and retains the generated problems that it cannot solve. Finally, the Code Teacher provides a detailed Chain-of-Thought based solution for each retained problem. We collect the synthesized data and finetune the Code Learner on it to enhance its debugging ability, producing the NeuDebugger model. Our experiments evaluate various LLMs and NeuDebugger in the zero-shot setting on DEBUGEVAL. Experimental results demonstrate that 7B-scale LLMs, even code-oriented ones, have weak debugging capabilities, whereas larger models (over 70B) show convincing debugging ability. Our further analyses illustrate that MASTER is an effective method for enhancing code debugging ability by synthesizing data for Supervised Fine-Tuning (SFT) of LLMs.

Enhancing the Code Debugging Ability of LLMs via Communicative Agent Based Data Refinement

The paper "Enhancing the Code Debugging Ability of LLMs via Communicative Agent Based Data Refinement" introduces DebugEval, a novel benchmark specifically designed to evaluate and enhance the debugging capabilities of LLMs. The authors address a meaningful gap in the current landscape of AI research by focusing on the underexplored area of code debugging in LLMs. This work also proposes the MASTER framework, a sophisticated data refinement approach employing communicative agents to improve the supervised fine-tuning process of LLMs.

DebugEval Benchmark

DebugEval is introduced to provide a comprehensive evaluation of LLMs' debugging abilities. The benchmark consists of four tasks: BUG Localization, BUG Identification, Code Review, and Code Repair, each chosen to challenge and measure specific aspects of debugging capability. These tasks are designed to replicate realistic debugging scenarios and include various programming languages, thereby offering a robust and multi-faceted challenge to LLMs. The emphasis on using both user-written bugs and GPT-4-generated code errors enhances the authenticity and complexity of the evaluation environment.
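
To make the task structure concrete, the following is a minimal sketch of what a DEBUGEVAL-style instance and a zero-shot evaluation pass could look like. The field names, prompt wording, and exact-match scoring are illustrative assumptions, not the benchmark's actual schema or metrics.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical schema for a DEBUGEVAL-style instance; field names are
# illustrative, not the benchmark's published format.
@dataclass
class DebugTask:
    task: str        # "bug_localization" | "bug_identification" | "code_review" | "code_repair"
    language: str    # e.g. "python", "cpp", "java"
    buggy_code: str  # snippet containing a user-written or GPT-4-injected bug
    question: str    # task-specific instruction shown to the model
    reference: str   # gold answer: line number, bug type, verdict, or fixed code

def zero_shot_prompt(item: DebugTask) -> str:
    """Build a plain zero-shot prompt, as used when scoring LLMs on the benchmark."""
    return (
        f"You are an expert {item.language} debugger.\n"
        f"{item.question}\n\n"
        f"```{item.language}\n{item.buggy_code}\n```"
    )

def evaluate(generate: Callable[[str], str], items: List[DebugTask]) -> float:
    """Exact-match accuracy for the choice-style tasks (localization,
    identification, review); repair would instead be judged by whether the
    generated fix passes tests."""
    correct = 0
    for item in items:
        prediction = generate(zero_shot_prompt(item)).strip()
        correct += int(prediction == item.reference.strip())
    return correct / max(len(items), 1)
```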

MASTER Framework

The paper puts forward the CoMmunicative Agent BaSed DaTa REfinement FRamework (MASTER) to improve LLMs' debugging efficacy through high-quality data synthesis. MASTER leverages three agents, the Code Quizzer, the Code Learner, and the Code Teacher, to iteratively refine debugging problem data, ultimately generating improved datasets for fine-tuning. This process facilitates the learning of debugging nuances that standard pretraining datasets may not fully encapsulate.
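
A minimal sketch of how such a three-agent refinement loop could be wired together is shown below. The agents are passed in as callables wrapping LLM calls; their prompts, return formats, and the correctness check are illustrative assumptions rather than the paper's exact implementation.

```python
import random

def refine_debugging_data(seed_snippets, quizzer, learner, teacher, tasks, n_rounds=1):
    """Synthesize debugging problems, keep the ones the learner fails,
    and attach Chain-of-Thought solutions (MASTER-style sketch)."""
    refined = []
    for _ in range(n_rounds):
        for snippet in seed_snippets:
            task = random.choice(tasks)
            # 1) Code Quizzer: turn a raw code snippet into a task-specific problem.
            #    Assumed here to return a dict with "question" and "answer" keys.
            problem = quizzer(task, snippet)
            # 2) Code Learner acts as a critic: retain only the problems it cannot
            #    solve, so the synthesized data targets its current weaknesses.
            attempt = learner(problem["question"])
            if attempt.strip() == problem["answer"].strip():
                continue
            # 3) Code Teacher: produce a detailed Chain-of-Thought solution for the
            #    retained problem; this becomes the fine-tuning target.
            solution = teacher(problem["question"])
            refined.append({"task": task, "prompt": problem["question"], "response": solution})
    return refined
```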

One of the noteworthy experimental outcomes highlighted is that conventional 7B-scale LLMs demonstrate limited debugging capacity, even when code-oriented. In contrast, models exceeding 70B parameters exhibit more convincing debugging skills. Importantly, models fine-tuned with data synthesized through MASTER, like NeuDebugger, showed marked improvements, with enhancements of up to 27.7% in certain configurations. The framework's ability to refine data effectively to bolster LLM capabilities is a pertinent contribution.
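
Downstream, the retained problem and solution pairs must be serialized into a fine-tuning set for the Code Learner. A small sketch of that step, assuming a chat-style JSONL format (the message schema and file name are illustrative choices, not the paper's pipeline):

```python
import json

def write_sft_file(refined_records, path="neudebugger_sft.jsonl"):
    """Convert refined records (prompt + CoT response) into chat-style SFT examples."""
    with open(path, "w", encoding="utf-8") as f:
        for record in refined_records:
            example = {
                "messages": [
                    {"role": "user", "content": record["prompt"]},
                    {"role": "assistant", "content": record["response"]},
                ]
            }
            f.write(json.dumps(example, ensure_ascii=False) + "\n")
```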

Theoretical and Practical Implications

The research underscores a crucial aspect of model scalability and specialization in LLMs, emphasizing that sheer parameter count is not the sole determinant of efficacy in specialized tasks like debugging. The implementation of specialized fine-tuning data tailored through the MASTER framework exemplifies how targeted interventions can significantly elevate model performance in niche areas.

Practically, this work opens avenues for improving automated debugging processes in software development, potentially reducing the time and expertise required for debugging tasks. The insights gained from this paper could inspire further refinement of LLM applications across different specialized tasks beyond debugging.

Future Directions

The paper invites future research into enhancing LLMs' abilities in other complex, domain-specific tasks. Higher-fidelity data synthesis mechanisms, akin to MASTER, remain a promising avenue for broader AI development. Furthermore, incorporating richer debugging feedback loops and additional interactive agents could further consolidate the capabilities of such systems.

In conclusion, the paper makes a substantial contribution to the understanding and development of LLMs in the domain of automated code debugging. By presenting both a benchmark and a novel data refinement framework, the authors provide valuable tools and insights for ongoing and future research in enhancing LLM capabilities in specialized software development tasks.

Authors (10)
  1. Weiqing Yang (2 papers)
  2. Hanbin Wang (15 papers)
  3. Zhenghao Liu (77 papers)
  4. Xinze Li (34 papers)
  5. Yukun Yan (39 papers)
  6. Shuo Wang (382 papers)
  7. Yu Gu (218 papers)
  8. Minghe Yu (8 papers)
  9. Zhiyuan Liu (433 papers)
  10. Ge Yu (63 papers)