
Investigating the Transferability of Code Repair for Low-Resource Programming Languages (2406.14867v2)

Published 21 Jun 2024 in cs.LG, cs.AI, and cs.CL

Abstract: LLMs have shown remarkable performance on code generation tasks. A recent use case is iterative code repair, where an LLM fixes an incorrect program by rationalizing about errors and generating new code. Recent works augment the code repair process by integrating modern techniques such as chain-of-thought reasoning or distillation, but only study their benefits on high-resource languages like Python, and ignore low-resource languages like Perl. To address this gap of knowledge, we investigate the benefits of distilling code repair for both high and low resource languages to determine if the techniques that are effective in a high resource setting are also applicable in a low resource setting. Our evaluation shows that distilling the ability to repair code has language dependent benefits. To explain this behavior, we perform a further analysis and find that contrary to preexisting beliefs, the correlation between reasoning ability and code correction ability is weak. We hypothesize this weak correlation is magnified in low-resource settings where base models lack deep knowledge of a programming language, leading to wavering benefits of code repair.

DistiLRR: Transferring Code Repair for Low-Resource Programming Languages

The paper "DistiLRR: Transferring Code Repair for Low-Resource Programming Languages" addresses the performance disparities of LLMs on code repair tasks between high-resource programming languages (HRPLs) and low-resource programming languages (LRPLs). Traditional applications of code repair frameworks are predominantly evaluated on HRPLs such as Python, but understanding and improving their efficacy on LRPLs remains underexplored. This paper proposes the Distilling Low-Resource Repairs (DistiLRR) approach to bridge this gap by transferring the reasoning and code generation skills from a teacher model to a student model.

Methodology and Framework

Code Repair Framework: The DistiLRR methodology builds on a standard iterative code repair framework, integrating it with a distillation step in which the student model learns from high-quality repairs generated by a teacher model. Specifically, the process involves the following steps (a minimal code sketch follows the list):

  1. Generating initial incorrect code samples via a base model.
  2. Executing tests to produce error messages from these samples.
  3. Using a repair model to generate rationales and modifications iteratively.
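
The loop below is a minimal sketch of this iterative framework, not the paper's implementation; `generate_initial`, `run_tests`, and `generate_repair` are hypothetical placeholders for the base model, the test harness, and the repair model, and the four-round cap mirrors the repair budget discussed in the findings.

```python
from typing import Callable, Tuple

def iterative_repair(
    problem: str,
    generate_initial: Callable[[str], str],                       # base model: problem -> code
    run_tests: Callable[[str], Tuple[bool, str]],                 # code -> (passed, error message)
    generate_repair: Callable[[str, str, str], Tuple[str, str]],  # (problem, code, error) -> (rationale, code)
    max_rounds: int = 4,
) -> str:
    """Sample a program, then repeatedly test it and ask a repair model to fix it."""
    code = generate_initial(problem)              # 1. initial (possibly incorrect) sample
    for _ in range(max_rounds):
        passed, error_msg = run_tests(code)       # 2. execute tests, collect error feedback
        if passed:
            return code
        # 3. the repair model rationalizes about the error and proposes revised code
        _rationale, code = generate_repair(problem, code, error_msg)
    return code                                   # may still be incorrect after the final round
```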

Distillation Process: The core innovation of DistiLRR lies in replacing the base model in the repair step with a distillation-enhanced student model. The distillation process leverages high-quality rationales and repairs from a larger teacher model (GPT-3.5-Turbo) to fine-tune smaller student models (CodeLlama-7b-Instruct, CodeLlama-7b, and Mistral-7b). Dataset construction involves generating incorrect code, obtaining error feedback from test execution, and collecting correct repairs from the teacher model.
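
A rough sketch of this dataset construction is shown below, under the assumption that only teacher repairs verified against the tests are kept for fine-tuning; the function names are illustrative placeholders rather than the authors' code.

```python
from typing import Callable, Dict, List, Tuple

def build_distillation_dataset(
    problems: List[str],
    generate_initial: Callable[[str], str],                      # student/base model sampling
    run_tests: Callable[[str, str], Tuple[bool, str]],           # (problem, code) -> (passed, error)
    teacher_repair: Callable[[str, str, str], Tuple[str, str]],  # teacher model, e.g. GPT-3.5-Turbo
) -> List[Dict[str, str]]:
    """Collect (incorrect code, error, teacher rationale, repair) examples for fine-tuning."""
    dataset = []
    for problem in problems:
        code = generate_initial(problem)
        passed, error_msg = run_tests(problem, code)
        if passed:
            continue                                # only incorrect programs are repair targets
        rationale, repaired = teacher_repair(problem, code, error_msg)
        verified, _ = run_tests(problem, repaired)
        if verified:                                # keep only repairs that actually pass
            dataset.append({
                "problem": problem,
                "incorrect_code": code,
                "error": error_msg,
                "rationale": rationale,
                "repair": repaired,
            })
    return dataset
```

The student model is then fine-tuned on these rationale-and-repair targets so that it can serve as the repair model in the loop sketched above.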

Experimental Setup and Baselines

The authors conduct a comprehensive evaluation on three HRPLs (Python, JavaScript, Java) and three LRPLs (Perl, Golang, Swift) across two benchmarks (MBXP and Multilingual HumanEval). They compare DistiLRR against several baselines: non-repair i.i.d. sampling, basic iterative repair with the base models, in-context learning (ICL) in which the rationale is provided by the teacher model but the code is generated by the base model, and direct use of the teacher model for repairs.
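
To make the ICL baseline concrete, the snippet below sketches one plausible prompt layout in which a teacher-written rationale is placed in context while the base model generates the repaired code; the paper's exact template is not reproduced here, so this layout is purely illustrative.

```python
def icl_repair_prompt(problem: str, incorrect_code: str, error_msg: str, teacher_rationale: str) -> str:
    """Hypothetical ICL repair prompt: the teacher supplies the rationale, the base model writes the fix."""
    return (
        f"Problem description:\n{problem}\n\n"
        f"Incorrect solution:\n{incorrect_code}\n\n"
        f"Test error:\n{error_msg}\n\n"
        f"Explanation of the error (from a stronger model):\n{teacher_rationale}\n\n"
        "Rewrite the solution so that it passes the tests:\n"
    )
```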

Key Findings

  1. Initial vs. Repair Pass Rates: Pass rates after four rounds of DistiLRR repair consistently exceed the initial pass@5 and often the initial pass@10, indicating that repair achieves higher pass rates with fewer inference calls than non-repair sampling (the standard pass@k estimator is sketched after this list).
  2. DistiLRR vs. Baselines: DistiLRR models achieve superior pass rates on LRPLs compared to ICL and base models. Specifically, DistiLRR improves pass@1 by 99.5% for Perl, 112.8% for Golang, and 144.5% for Swift on HumanEval.
  3. Rationale Quality vs. Code Correctness: The paper reveals a weaker-than-expected correlation between rationale quality and the correctness of subsequent repairs. Even given a good rationale, base models often generate incorrect code, particularly in LRPLs. DistiLRR mitigates this by making the student model more responsive to rationale feedback.
  4. Reduction in Syntax Errors: The DistiLRR models show a marked decrease in syntax errors on LRPLs, suggesting improved model understanding of the programming languages' nuances. For HRPLs, the difference in syntax error reduction is marginal, reflecting better pre-existing model knowledge.
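
For reference, pass@k figures such as those above are typically computed with the unbiased estimator introduced for the Codex/HumanEval evaluation; whether DistiLRR uses exactly this estimator is an assumption here, but the sketch below shows the standard calculation.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples drawn (without
    replacement) from n generations, of which c are correct, passes the tests."""
    if n - c < k:
        return 1.0  # every size-k subset must contain at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: with n=10 generations and c=3 correct ones,
#   pass_at_k(10, 3, 1)  -> 0.30
#   pass_at_k(10, 3, 5)  -> ~0.92
```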

Implications and Future Directions

Implications: The paper highlights the potential of distillation to enable more efficient and accurate code repair frameworks, especially for underrepresented LRPLs. By transferring knowledge from a teacher to a student model, DistiLRR enhances repair capabilities without requiring extensive human-annotated datasets.

Future Research: Further investigations could explore scaling the fine-tuning datasets to assess the limits of DistiLRR's improvements. Additionally, evaluating the approach on more complex, reasoning-heavy code benchmarks and extending the distillation methodology to other domains within code generation and repair tasks could provide more general insights.

In sum, "DistiLRR: Transferring Code Repair for Low-Resource Programming Languages" contributes valuable insights into distillation-based approaches for improving LLM performance across diverse programming languages. This work paves the way for broader application and accessibility of high-quality code generation tools, especially benefiting languages with limited training data.

Authors (4)
  1. Kyle Wong (3 papers)
  2. Alfonso Amayuelas (14 papers)
  3. Liangming Pan (59 papers)
  4. William Yang Wang (254 papers)