Repository-Level Prompt Generation for Large Language Models of Code (2206.12839v3)

Published 26 Jun 2022 in cs.LG, cs.AI, cs.PL, and cs.SE

Abstract: With the success of LLMs of code and their use as code assistants (e.g. Codex used in GitHub Copilot), techniques for introducing domain-specific knowledge in the prompt design process become important. In this work, we propose a framework called Repo-Level Prompt Generator that learns to generate example-specific prompts using prompt proposals. The prompt proposals take context from the entire repository, thereby incorporating both the structure of the repository and the context from other relevant files (e.g. imports, parent class files). Our technique doesn't require any access to the weights of the LLM, making it applicable in cases where we only have black-box access to the LLM. We conduct experiments on the task of single-line code-autocompletion using code repositories taken from Google Code archives. We demonstrate that an oracle constructed from our prompt proposals gives a remarkably high relative improvement of 36% over Codex, showing the quality of these proposals. Further, we show that when we train a model to predict a prompt proposal, we can achieve significant performance gains over Codex and other baselines. We release our code, data, and trained checkpoints at: https://github.com/shrivastavadisha/repo_level_prompt_generation

Repository-Level Prompt Generation for LLMs of Code

The paper presents a framework for enhancing the performance of LLMs of code by generating more effective prompts from repository-level information. The proposed system, the Repo-Level Prompt Generator (RLPG), creates example-specific prompts by harnessing context from the entire code repository. Such context can include the repository's structure and relevant details from other files, such as imports and parent class files, rather than being confined to the file containing the code to be completed.

The framework does not require access to the LLM's internal weights, making it applicable in scenarios where only black-box access to the model is available. This matters in practice, since many state-of-the-art LLMs, such as OpenAI's Codex, expose only an API for generating outputs and do not release their weights.
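Concretely, everything the framework needs from the model is a single text-in/text-out completion call. The sketch below is illustrative, not the paper's code; the `CodeLLM` protocol and the `complete` signature are assumed names standing in for whatever hosted API is used.

```python
from typing import Protocol


class CodeLLM(Protocol):
    """Any text-in/text-out completion backend; its weights stay hidden."""

    def complete(self, prompt: str, max_tokens: int) -> str:
        """Return the model's continuation of `prompt`."""
        ...


def autocomplete_line(llm: CodeLLM, prompt: str) -> str:
    """Single-line completion needs nothing beyond one black-box call."""
    completion = llm.complete(prompt, max_tokens=64)
    lines = completion.splitlines()
    return lines[0] if lines else ""
```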

The authors conduct experiments on single-line code auto-completion, using repositories obtained from the Google Code archives. An oracle experiment, in which the best prompt proposal is chosen for each example with hindsight, yields a 36% relative improvement in successful completions over Codex alone. When a classifier is instead trained to predict the prompt proposal, the framework still achieves up to a 17% relative improvement over Codex and other baseline methods.
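This oracle number is straightforward to compute: a hole counts as an oracle success if at least one proposal yields the correct completion, so it upper-bounds what a perfect proposal selector could achieve. A minimal sketch, using a stand-in success matrix with illustrative sizes (real entries would come from querying the LLM once per hole-proposal pair):

```python
import numpy as np

# success[i, j] = True if proposal j produced an exact-match completion
# for hole i. Random stand-in data here; real entries come from LLM runs.
rng = np.random.default_rng(0)
num_holes, num_proposals = 1000, 63  # illustrative sizes
success = rng.random((num_holes, num_proposals)) < 0.3

baseline = success[:, 0]      # assume column 0 is the default Codex prompt
oracle = success.any(axis=1)  # best proposal per hole, chosen with hindsight

gain = (oracle.mean() - baseline.mean()) / baseline.mean()
print(f"baseline {baseline.mean():.1%} -> oracle {oracle.mean():.1%} ({gain:+.1%} relative)")
```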

Methodology

  1. Repo-Level Prompt Proposals: The RLPG framework uses a set of prompt proposals designed to capture contextual information from across a repository. Each proposal combines:
    • a Prompt Source: where to draw context from, e.g. the current file, parent class files, import files, sibling files, or files with similar names;
    • a Prompt Context Type: what to extract from that source, e.g. identifiers, method names and bodies, string literals, or field declarations.

By drawing on these structured prompt proposals, the framework incorporates domain-specific knowledge and can tailor a diverse prompt to each example.

  2. Prompt Proposal Classifier (PPC): a learned model that predicts, for a given code hole, which prompt proposal is most likely to produce a successful completion. Two variants are explored: RLPG-H, which uses a representation of the hole's surrounding context, and RLPG-R, which additionally models the similarity between the hole context and each proposal's context with a multi-headed attention mechanism.
  3. Prompt Composer: combines the context selected by the chosen prompt proposal with the default context that Codex uses, adjusting the mix dynamically to respect the model's context-length constraint. A sketch of the end-to-end flow follows this list.
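Putting the pieces together, inference proceeds as: enumerate the applicable (source, context type) proposals for a hole, rank them with the classifier, and splice the winning context into the default prompt under a length budget. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: the source and context-type lists are abbreviated, and `extract_context` and `score_proposal` are hypothetical helpers standing in for the repository parser and the trained PPC.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable

# Abbreviated lists; the paper defines a larger set of sources and types.
PROMPT_SOURCES = ["current_file", "parent_class", "imports", "siblings", "similar_name"]
CONTEXT_TYPES = ["method_names_and_bodies", "identifiers", "string_literals", "field_declarations"]


@dataclass(frozen=True)
class Proposal:
    source: str
    context_type: str


def generate_prompt(
    hole,                   # the target location to complete
    default_context: str,   # lines preceding the hole (Codex's default prompt)
    budget_chars: int,      # stand-in for the model's token budget
    extract_context: Callable[[object, Proposal], str],
    score_proposal: Callable[[object, Proposal], float],
) -> str:
    """RLPG-style assembly: rank proposals with the classifier, then compose
    the best non-empty proposal context with the default context."""
    proposals = [Proposal(s, t) for s, t in product(PROMPT_SOURCES, CONTEXT_TYPES)]
    ranked = sorted(proposals, key=lambda p: score_proposal(hole, p), reverse=True)

    for p in ranked:
        ctx = extract_context(hole, p)  # may be "" (e.g. no parent class exists)
        if ctx:
            # Fixed half-and-half split for illustration only; the paper's
            # Prompt Composer allocates the budget dynamically.
            half = budget_chars // 2
            return ctx[:half] + "\n" + default_context[-(budget_chars - half):]
    return default_context[-budget_chars:]
```

The budget is counted in characters here purely for brevity; the actual system reasons in tokens and truncates or drops proposal context so the composed prompt fits Codex's window.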

Implications and Future Directions

The framework provides a mechanism for automatically generating more effective prompts without altering the LLM's weights, which makes it practical in environments that strictly control access to models. The successful integration of repository-level context into prompt generation also suggests that similar approaches could benefit other domains where structured context retrieval is crucial, such as question answering and multi-document summarization.

Future work could scale the framework to larger context lengths and extend prompt generation to multi-line code auto-completion. Adapting it to proprietary codebases, or tailoring it to an organization's particular coding practices, could further broaden its applicability.

Overall, the research offers a promising avenue for augmenting LLMs of code by systematically harnessing repository-level information. Because the generated prompts draw on context external to the current file, they make LLMs more effective even on tasks for which they were not explicitly fine-tuned, advancing the capabilities of AI-assisted programming tools.

Authors
  1. Disha Shrivastava
  2. Hugo Larochelle
  3. Daniel Tarlow