Repository-Level Prompt Generation for LLMs of Code
The paper presents a framework for improving the performance of LLMs of code by generating effective prompts from repository-level information. It proposes a system called the Repo-Level Prompt Generator (RLPG), which creates example-specific prompts by harnessing context from the entire code repository. Such context can include structural elements and relevant details from files beyond the one being completed, such as import files and parent class files.
The framework does not require access to the LLM's internal weights, which makes it applicable in scenarios where only black-box access to the model is available. This is particularly useful because many state-of-the-art LLMs, such as OpenAI's Codex, expose only an API for generating outputs and do not release their weights.
The authors conducted experiments on single-line code auto-completion using repositories obtained from the Google Code archive. These experiments demonstrated that the RLPG framework yields significant improvements over the baseline performance of Codex. An oracle experiment showed a 36% relative improvement in successful code completions compared to using Codex alone, and using the trained prompt proposal classifier, the framework achieved up to a 17% relative improvement over Codex and other baseline methods.
Methodology
- Repo-Level Prompt Proposals: The RLPG framework uses a set of prompt proposals designed to capture contextual information from across a repository. Each proposal is a combination of:
  - A Prompt Source: the file to draw context from, such as the current file, parent class files, import files, sibling files, or files with similar names.
  - A Prompt Context Type: what to extract from that source, such as identifiers, method names and bodies, string literals, or field declarations.
The framework incorporates domain-specific knowledge by drawing on these structured prompt proposals, allowing diverse prompts to be tailored to each example; a sketch of how such proposals might be enumerated appears after this list.
- Prompt Proposal Classifier (PPC): RLPG includes a learned model that predicts which prompt proposal is most likely to yield a successful completion for a given code hole. Two variants were explored: RLPG-H, which uses only a representation of the hole context, and RLPG-R, which additionally models the similarity between the hole context and the proposal context with a multi-headed attention mechanism. A minimal classifier sketch is given below.
- Prompt Composer: This component combines the selected prompt proposal context with the default context that Codex would otherwise use, dynamically adjusting how much of each is included so the prompt respects the model's context-length limit (see the composer sketch below).
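To make the proposal mechanism concrete, the following is a minimal sketch of how prompt proposals could be enumerated as (prompt source, context type) pairs and applied to repository files. The function names, the specific source and context-type lists, and the regex-based extraction are illustrative assumptions rather than the authors' implementation; a real system would use a proper parser for the repository's language.

```python
# Hypothetical sketch: enumerate prompt proposals and extract context from repo files.
import itertools
import re
from pathlib import Path

PROMPT_SOURCES = ["current_file", "parent_class_file", "import_file",
                  "sibling_file", "similar_name_file"]
CONTEXT_TYPES = ["identifiers", "method_names_and_bodies",
                 "string_literals", "field_declarations"]

def enumerate_proposals():
    """Each prompt proposal is one (prompt source, context type) combination."""
    return list(itertools.product(PROMPT_SOURCES, CONTEXT_TYPES))

def extract_context(file_path: Path, context_type: str) -> str:
    """Toy extraction; a real system would parse the file properly."""
    text = file_path.read_text(errors="ignore")
    if context_type == "identifiers":
        return " ".join(sorted(set(re.findall(r"[A-Za-z_]\w+", text)))[:200])
    if context_type == "string_literals":
        return " ".join(re.findall(r'"[^"]*"', text)[:50])
    # Fall back to raw text for context types this toy sketch does not parse.
    return text

if __name__ == "__main__":
    for source, ctype in enumerate_proposals()[:5]:
        print(f"proposal: take {ctype} from {source}")
```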
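The prompt proposal classifier can be pictured as a model that maps a representation of the hole context to a success probability for each proposal, in the spirit of RLPG-H. The sketch below assumes a fixed-size embedding of the hole window from some pretrained code encoder; the layer sizes, the number of proposals, and the sigmoid multi-label output are illustrative choices, not the paper's exact architecture.

```python
# Sketch of an RLPG-H-style prompt proposal classifier (illustrative only).
import torch
import torch.nn as nn

NUM_PROPOSALS = 63  # assumed constant; the real count depends on the valid (source, type) pairs

class PromptProposalClassifier(nn.Module):
    def __init__(self, hole_dim: int = 768, hidden_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hole_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, NUM_PROPOSALS),
        )

    def forward(self, hole_repr: torch.Tensor) -> torch.Tensor:
        # One independent success probability per proposal, since
        # more than one proposal can lead to a successful completion.
        return torch.sigmoid(self.mlp(hole_repr))

if __name__ == "__main__":
    clf = PromptProposalClassifier()
    hole_repr = torch.randn(1, 768)   # stand-in for an encoded hole window
    probs = clf(hole_repr)
    print(f"most promising proposal index: {int(probs.argmax(dim=-1))}")
```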
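Finally, a rough sketch of prompt composition under a context-length budget. The whitespace-based token counting, the 50/50 budget split, and the decision to prepend the proposal context to the tail of the default context are assumptions made for illustration; the actual composer's allocation rules may differ.

```python
# Hypothetical prompt composer: fit proposal context plus default context into a token budget.
MAX_CONTEXT_TOKENS = 4096

def count_tokens(text: str) -> int:
    # Placeholder: a real composer would use the LLM's own tokenizer.
    return len(text.split())

def compose_prompt(proposal_context: str, default_context: str,
                   proposal_fraction: float = 0.5) -> str:
    """Prepend proposal context to the default context, truncating each part
    so the combined prompt fits within the context-length limit."""
    proposal_budget = int(MAX_CONTEXT_TOKENS * proposal_fraction)
    default_budget = MAX_CONTEXT_TOKENS - proposal_budget

    proposal_tokens = proposal_context.split()[:proposal_budget]
    # Keep the *end* of the default context, i.e. the code right before the hole.
    default_tokens = default_context.split()[-default_budget:]

    return " ".join(proposal_tokens) + "\n" + " ".join(default_tokens)

if __name__ == "__main__":
    prompt = compose_prompt("class Foo { void bar() { /* ... */ } }",
                            "int x = 0;\nint y = compute(")
    print(prompt)
```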
Implications and Future Directions
The proposed framework provides a mechanism for automatically generating more effective prompts without altering the LLM's weights, highlighting its versatility and practical application, especially in environments that strictly control access to models. The successful integration of repository-level context in prompt generation represents a significant stride in code modeling, suggesting that similar approaches might benefit other domains, such as question answering and multi-document summarization, where structured context retrieval is crucial.
Potential future developments might focus on scaling this framework to handle larger context lengths and experimenting with prompt generation for multi-line code auto-completion tasks. Moreover, exploring ways to incorporate this framework into environments with proprietary software or developing tailored adaptations for unique organizational coding practices could further extend its applicability.
Overall, the research offers a promising avenue for augmenting LLMs of code by systematically harnessing the untapped potential of repository-level information. The proposed prompts can leverage external contexts, making LLMs more effective even in tasks they are not explicitly fine-tuned to perform, thereby advancing the capabilities of AI-assisted programming tools.