LLM Prompting Techniques for Program Repair: An Analysis of Self-Consistency Application
The paper "Better patching using LLM prompting, via Self-Consistency" explores the exploration of LLMs, specifically focusing on utilizing self-consistency in the field of software engineering for program repair tasks. The paper is driven by the need to bridge the gap in sophisticated problem-solving capabilities of LLMs and their application in software engineering tasks, which often lack the necessary explanatory datasets.
Key Contributions
- Application of Chain-of-Thought and Self-Consistency: The paper illustrates how chain-of-thought prompting, which breaks problem-solving into sequential reasoning steps, and self-consistency can be harnessed in software engineering tasks. When an LLM is sampled repeatedly to generate a pool of explanation-solution pairs for a given problem, selecting the most frequent solution yields more accurate answers, which is the core idea of self-consistency.
- State-of-the-Art Results: Using commit logs as explanations in few-shot examples (see the prompt sketch after this list), the authors achieved state-of-the-art results on the MODIT dataset. This empirical evidence suggests that supplying such explanations in few-shot prompts significantly improves the LLM's ability to generate accurate program patches.
- Importance of Descriptive Explanations: The research indicates that the quality of commit messages plays a crucial role. Accurate commit logs lead to significant performance improvements, whereas random commit messages do not, underscoring the importance of contextually relevant explanations in problem-solving.
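To make the prompting setup concrete, here is a minimal sketch of how few-shot exemplars pairing buggy code, a commit message serving as the explanation, and the fixed code might be assembled into a prompt. The field names, delimiters, and the `build_prompt` helper are illustrative assumptions, not the authors' exact template.

```python
# Hypothetical sketch of a few-shot prompt that uses commit messages as
# explanations; the format below is an assumption, not the paper's template.

FEW_SHOT_EXEMPLARS = [
    {
        "buggy":  "public int size() { return count + 1; }",
        "commit": "Fix off-by-one in size(): count already includes all elements.",
        "fixed":  "public int size() { return count; }",
    },
    # ... further retrieved exemplars ...
]

def build_prompt(exemplars, buggy_code):
    """Concatenate (buggy, explanation, fixed) triples, then append the query bug."""
    parts = []
    for ex in exemplars:
        parts.append(
            f"### Buggy code:\n{ex['buggy']}\n"
            f"### Explanation (commit message):\n{ex['commit']}\n"
            f"### Fixed code:\n{ex['fixed']}\n"
        )
    # The model is asked to produce an explanation before the patch,
    # mirroring chain-of-thought prompting.
    parts.append(
        f"### Buggy code:\n{buggy_code}\n### Explanation (commit message):\n"
    )
    return "\n".join(parts)
```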
Methodological Approach
The authors evaluated the approach on a dataset derived from the MODIT benchmark, which includes two subsets of differing sequence complexity. They queried the code-davinci-002 model via the OpenAI API to test the efficacy of self-consistency and chain-of-thought prompting for program repair. The experiments used high-temperature sampling to produce diverse reasoning paths and candidate solutions, allowing the method to marginalize over explanations and select a consistent solution.
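The sampling-and-voting step can be sketched as follows, assuming any text-completion backend (such as code-davinci-002) wrapped in a `generate(prompt, temperature)` callable; the sample count, temperature, and patch normalization below are placeholders rather than the paper's exact settings.

```python
from collections import Counter

def extract_patch(completion: str) -> str:
    """Assumed completion format: an explanation followed by a '### Fixed code:' block."""
    return completion.split("### Fixed code:")[-1].strip()

def normalize(patch: str) -> str:
    """Crude whitespace normalization so trivially different patches vote together."""
    return " ".join(patch.split())

def self_consistent_patch(generate, prompt, n_samples=30, temperature=0.7):
    """Sample n_samples explanation+patch completions at high temperature and
    return the patch that appears most often (majority vote)."""
    votes = Counter()
    for _ in range(n_samples):
        completion = generate(prompt, temperature)  # one diverse reasoning path
        patch = extract_patch(completion)           # discard the explanation
        votes[normalize(patch)] += 1                # vote on the patch alone
    return votes.most_common(1)[0][0]
```

Because the vote is taken over patches rather than full completions, different explanations that arrive at the same fix reinforce each other, which is what marginalizing over explanations means here.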
Results and Statistical Significance
The paper reports a marked improvement on program repair when self-consistency is applied, particularly in combination with BM25-based few-shot retrieval: up to a 13.08% relative gain over the previous state of the art, which paired the same retrieval with greedy decoding. A McNemar test confirmed the statistical significance of these results, with p-values indicating high confidence in the reliability of the observed improvements.
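As a rough illustration of the retrieval step, the sketch below uses the rank_bm25 package to select the training bugs most lexically similar to a new buggy method as few-shot exemplars; the whitespace tokenizer and the number of retrieved shots are assumptions, not the authors' configuration.

```python
from rank_bm25 import BM25Okapi  # pip install rank-bm25

def tokenize(code: str):
    # Naive whitespace tokenization; the paper's tokenizer may differ.
    return code.split()

def retrieve_few_shot(train_bugs, query_bug, k=4):
    """Return the k training bugs most similar to query_bug under BM25."""
    corpus = [tokenize(b["buggy"]) for b in train_bugs]
    bm25 = BM25Okapi(corpus)
    scores = bm25.get_scores(tokenize(query_bug))
    top = sorted(range(len(train_bugs)), key=lambda i: scores[i], reverse=True)[:k]
    return [train_bugs[i] for i in top]
```

Retrieved exemplars would then feed a prompt builder like the one sketched earlier, so that the few-shot examples are relevant to the bug at hand.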
Implications and Future Directions
The implications of this research are notable in both practical and theoretical terms. Practically, it offers a path for leveraging the reasoning capabilities of LLMs in tasks that traditionally lack explanatory datasets, by deriving explanations from existing artifacts such as commit logs. Theoretically, it opens new avenues for understanding how explanation-based learning can be integrated into LLM frameworks to improve performance on code-related tasks.
The authors have also paved the way for further research in improving the quality of explanatory datasets, such as commit logs, to enhance LLM performance. Exploring ways to automatically generate richer commit logs or reasoning paths could be a vital future research direction. Additionally, it would be beneficial to assess self-consistency across a broader spectrum of program repair tasks and LLM configurations to generalize this methodology widely.
In conclusion, the paper makes a substantial contribution by demonstrating the efficacy of self-consistency combined with explanation-based few-shot learning in program repair, establishing a foundation for further exploration of LLM capabilities in software engineering applications.