
CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation (2405.02355v3)

Published 3 May 2024 in cs.SE and cs.AI

Abstract: Utilizing LLMs to generate code has shown promise for revolutionizing software development. Despite the general intelligence of LLMs, their effectiveness in code generation can still be improved because of the syntactic gap and mismatched vocabulary between natural language and different programming languages. In this paper, we propose CodeGRAG, a Graphical Retrieval Augmented Code Generation framework that enhances the performance of LLMs. CodeGRAG builds a graphical view of code blocks from their control flow and data flow to bridge the gap between programming languages and natural language; this view helps natural-language-based LLMs better understand code syntax and serves as a bridge across programming languages. To inject the extracted structural knowledge into the foundation models, we propose 1) a hard meta-graph prompt template that transforms the raw graphical representation into informative knowledge for tuning-free models, and 2) a soft prompting technique that injects programming-language domain knowledge into the model parameters by finetuning the models with the help of a pretrained GNN expert model. Experiments and ablations on four datasets covering both C++ and Python validate the hard meta-graph prompt, the soft prompting technique, and the effectiveness of the pretraining objectives for the GNN expert. CodeGRAG improves the code generation ability of LLMs and can even offer performance gains for cross-lingual code generation. Code is available at https://anonymous.4open.science/r/Code-5970/.
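To make the "hard meta-graph prompt" idea concrete, here is a minimal sketch, not taken from the paper's code: it builds a toy control-flow/data-flow view of a retrieved Python block using the standard `ast` module and renders it into a prompt string for a tuning-free LLM. The names `build_graph_view` and `meta_graph_prompt`, the statement-level graph, and the prompt wording are all illustrative assumptions; CodeGRAG's actual graph extraction, meta-graph template, and GNN-based soft prompting are not reproduced here.

```python
import ast

def build_graph_view(code: str) -> dict:
    """Toy graphical view of a code block: statement nodes, sequential
    control-flow edges, and def-use data-flow edges. A simplification of
    the control/data-flow view described in the abstract (assumption:
    the paper's extraction is more involved than this AST walk)."""
    stmts = list(ast.parse(code).body)
    nodes = [ast.dump(s, annotate_fields=False)[:60] for s in stmts]
    control_edges = [(i, i + 1) for i in range(len(stmts) - 1)]
    data_edges, last_def = [], {}
    for i, stmt in enumerate(stmts):
        # def-use edges: link the latest definition of a name to this use
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                if node.id in last_def:
                    data_edges.append((last_def[node.id], i))
        # record names this statement (re)defines
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                last_def[node.id] = i
    return {"nodes": nodes, "control_flow": control_edges, "data_flow": data_edges}

def meta_graph_prompt(task: str, retrieved_code: str) -> str:
    """Render the retrieved block plus its structural view into one prompt,
    loosely following the hard meta-graph prompt idea (hypothetical template)."""
    view = build_graph_view(retrieved_code)
    return (
        f"Task:\n{task}\n\n"
        f"Retrieved code block:\n{retrieved_code}\n\n"
        f"Structural view (nodes / control-flow / data-flow edges):\n{view}\n\n"
        "Using the structure above as a hint, write the solution code."
    )

if __name__ == "__main__":
    example = "a = read_input()\nb = a * 2\nprint(b)"
    print(meta_graph_prompt("Double the input and print it.", example))
```

The resulting prompt can be passed to any tuning-free LLM; the paper's soft prompting variant would instead encode the graph with a pretrained GNN expert and inject it via finetuning.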

Authors (10)
  1. Kounianhua Du (17 papers)
  2. Renting Rui (5 papers)
  3. Huacan Chai (4 papers)
  4. Lingyue Fu (8 papers)
  5. Wei Xia (147 papers)
  6. Yasheng Wang (91 papers)
  7. Ruiming Tang (171 papers)
  8. Yong Yu (219 papers)
  9. Weinan Zhang (322 papers)
  10. Jizheng Chen (7 papers)
Citations (2)