- The paper introduces PERC, a framework using pseudocode plans as queries to retrieve algorithmically similar code examples across different programming languages.
- Evaluations show PERC achieves competitive or superior performance on benchmarks like CodeContests and HumanEval compared to state-of-the-art methods.
- PERC offers a robust method for improving code generation quality for diverse and underrepresented languages, highlighting pseudocode's role in cross-language retrieval.
PERC: Plan-As-Query Example Retrieval for Underrepresented Code Generation
The paper "PERC: Plan-As-Query Example Retrieval for Underrepresented Code Generation" introduces a framework for enhancing LLM-based code generation, particularly for underrepresented programming languages. The framework, Plan-As-Query Example Retrieval for code generation (PERC), leverages algorithmic plans expressed as pseudocode to improve the retrieval and selection of few-shot examples for retrieval-augmented generation (RAG).
Core Contributions and Methodology
The primary contribution of PERC is its approach to example retrieval through the creation and use of algorithmic plans. Example code is converted into pseudocode, which captures high-level algorithmic logic independently of any particular programming language. By using these plans as a pivot, PERC mitigates the syntactic and structural discrepancies that typically hinder cross-language example retrieval. This is particularly beneficial for underrepresented programming languages, where high-quality examples may be sparse or unevenly distributed.
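The intuition behind the pseudocode pivot can be illustrated with a toy experiment (the snippets, plans, and token-overlap measure below are illustrative assumptions, not the paper's method): two implementations of Kadane's algorithm in different languages share few surface tokens, while independently worded pseudocode plans for them overlap heavily.

```python
import re

def tokens(text):
    """Split source text into a set of lowercase word tokens."""
    return set(re.findall(r"[a-z_]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    return len(a & b) / len(a | b)

python_code = """
def max_subarray(nums):
    best = cur = nums[0]
    for x in nums[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best
"""

cpp_code = """
int maxSubarray(const vector<int>& v) {
    int best = v[0], cur = v[0];
    for (size_t i = 1; i < v.size(); ++i) {
        cur = max(v[i], cur + v[i]);
        best = max(best, cur);
    }
    return best;
}
"""

# Two independently phrased pseudocode plans for the same algorithm.
plan_a = """
set best and current to the first element
for each remaining element:
    current = max of element and current plus element
    best = max of best and current
return best
"""

plan_b = """
initialize best and current with the first value
for every remaining value:
    current = max of value and current plus value
    best = max of best and current
return best
"""

code_sim = jaccard(tokens(python_code), tokens(cpp_code))
plan_sim = jaccard(tokens(plan_a), tokens(plan_b))
# The plans overlap far more than the raw code does, so matching on
# plans sidesteps cross-language lexical and syntactic noise.
```

On this toy pair, plan-to-plan overlap is roughly twice the code-to-code overlap, which is the gap PERC exploits when it retrieves across languages.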
The methodology of PERC encompasses two main steps:
- Plan-Based Retrieval: The framework generates pseudocode as a query plan to identify examples with similar algorithmic logic, regardless of the programming languages involved. This approach focuses on reducing the lexical and syntactic noise that can impede the retrieval of algorithmically relevant examples.
- Code Generation with Few-Shot Prompting: Selected examples and their derived pseudocode are used to guide the LLM in generating the target code, either in the same or another language. Importantly, if the retrieved example is in a different language from the target, PERC utilizes LLMs to translate the code into the target language, enhancing generalizability and performance across diverse coding tasks.
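The two steps above can be sketched end to end. Everything here is a hypothetical simplification: PERC uses an LLM to write the plans and a proper retriever to score them, whereas this sketch uses a fixed query plan, a tiny hand-written example pool, token-overlap similarity, and a comment standing in for LLM translation.

```python
import re

def token_set(text):
    """Lowercase word tokens of a plan or code snippet."""
    return set(re.findall(r"[a-z_]+", text.lower()))

def similarity(a, b):
    """Jaccard overlap between the token sets of two pseudocode plans."""
    ta, tb = token_set(a), token_set(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# Hypothetical example pool: each entry stores code in some source
# language plus a pseudocode plan derived from it offline.
EXAMPLE_POOL = [
    {"lang": "cpp",
     "plan": "sort the list then return the middle element",
     "code": "int median(vector<int> v){sort(v.begin(),v.end());return v[v.size()/2];}"},
    {"lang": "python",
     "plan": "scan the list keeping a running maximum and return it",
     "code": "def largest(xs):\n    best = xs[0]\n    for x in xs: best = max(best, x)\n    return best"},
]

def retrieve(query_plan, pool, k=1):
    """Step 1: rank stored examples by plan-to-plan similarity."""
    return sorted(pool, key=lambda e: similarity(query_plan, e["plan"]),
                  reverse=True)[:k]

def build_prompt(problem, query_plan, examples, target_lang):
    """Step 2: assemble a few-shot prompt. An example in another language
    would first be translated into target_lang by the LLM (stubbed here)."""
    shots = []
    for e in examples:
        code = e["code"]
        if e["lang"] != target_lang:
            code = f"# (translated from {e['lang']} by the LLM)\n" + code
        shots.append(f"Plan:\n{e['plan']}\nCode:\n{code}")
    return "\n\n".join(shots) + f"\n\nProblem: {problem}\nPlan:\n{query_plan}\nCode:"

# In PERC this plan would itself be generated by the LLM from the problem.
plan = "walk through the list tracking the maximum seen so far and return it"
best = retrieve(plan, EXAMPLE_POOL)[0]
prompt = build_prompt("return the largest element", plan, [best], "python")
```

The query plan matches the running-maximum example rather than the median one, even though the problem statement shares almost no wording with the stored code, which is the point of retrieving on plans rather than on surface text.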
Empirical Evaluation
The performance of PERC was rigorously tested across several benchmarks, including CodeContests, HumanEval, and MultiPL-E. Using models such as GPT-3.5-Turbo-16k and Llama-3.1-8B-Instruct, the framework demonstrated competitive and often superior results relative to other state-of-the-art RAG methods. Notably, PERC achieved a Pass@1 of 6.61% on CodeContests and 76.04% on HumanEval, outperforming baselines such as RepoCoder and CEDAR in settings involving both matching and mismatching source-target language pairs.
Implications and Future Directions
PERC presents significant practical and theoretical implications for AI-driven code generation. Practically, it offers a robust method for improving code generation quality under linguistic variability and resource constraints. Theoretically, it underscores the potential of pseudocode and algorithmic planning as effective mediators for cross-language code retrieval, opening avenues for further research into semi-supervised retrieval methodologies or integrating semantic understanding into code generation workflows.
Future research could explore enhancing PERC’s framework with more sophisticated reasoning chains or adaptive learning models to further integrate cross-linguistic algorithmic knowledge. The reduction in performance when additional languages are introduced into the retrieval pool hints at underlying challenges that warrant deeper investigation, such as dynamically balancing retrieval complexity with computational efficiency.
Overall, PERC offers a compelling framework for tackling the challenges of example retrieval in code generation, particularly in scenarios involving underrepresented programming languages and cross-lingual code transfer. The approach stands as a valuable contribution to computational linguistics and AI-driven software development.