GraphCodeAgent: Dual-Graph Code Generation
- GraphCodeAgent is a dual-graph framework that bridges natural language requirements to code implementations using a Requirement Graph and a Structural-Semantic Code Graph.
- It deploys a ReAct-style multi-hop reasoning process with dedicated tools like RGRetrieval and SSCGTraverse to enhance retrieval and code synthesis.
- Experimental evaluations show significant improvements over baselines, including up to a 94.3% relative Pass@1 gain on cross-file dependency tasks (DevEval, GPT-4o).
GraphCodeAgent is a dual graph-guided LLM agent framework for retrieval-augmented repository-level code generation. It uses structured graph representations to help LLMs bridge natural language (NL) requirements to the corresponding implementations embedded in large codebases. The method formalizes the agent workflow as the construction and traversal of two intertwined graphs: a Requirement Graph (RG) modeling repository-specific requirements and their relations, and a Structural-Semantic Code Graph (SSCG) encoding code-level dependencies. Guided by these representations, the agent performs multi-hop retrieval and reasoning, considerably outperforming existing repo-level code synthesis baselines in both correctness and coverage (Li et al., 14 Apr 2025).
1. Dual Graph Construction: Requirement Graph and SSCG
At the foundation of GraphCodeAgent are two heterogeneous graphs, each encapsulating distinct repository knowledge structures.
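Requirement Graph (RG) is defined as $\mathcal{G}_R = (\mathcal{V}_R, \phi_R, \mathcal{E}_R)$, where:
- $\mathcal{V}_R$: nodes encoding function-, method-, or class-level requirements, extracted via tree-sitter; when a code unit has no requirement description, one is generated with a code-understanding LLM (DeepSeek-V2.5).
- $\phi_R: \mathcal{V}_R \to \{\text{function}, \text{method}, \text{class}\}$: node type assignment.
- $\mathcal{E}_R \subseteq \mathcal{V}_R \times \mathcal{V}_R$: edges between requirements, labeled parent-child or semantic-similar based on LLM judgment and manual verification. Semantic-similarity edges use cosine thresholding of requirement embeddings: for $v_i, v_j \in \mathcal{V}_R$ with embeddings $\mathbf{e}_i, \mathbf{e}_j$, an edge is added if $\cos(\mathbf{e}_i, \mathbf{e}_j) \ge \delta$ for a fixed threshold $\delta$.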
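The RG construction steps can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the `RequirementGraph` class, the string node ids, the toy 2-d vectors standing in for real requirement embeddings, and the threshold value `delta=0.8` are all hypothetical. Parent-child edges are added explicitly (in the paper they come from LLM judgment plus manual checks), while semantic-similar edges are added for every pair whose cosine similarity reaches the threshold.

```python
import math
from dataclasses import dataclass, field


@dataclass
class RequirementGraph:
    """Hypothetical RG sketch: typed requirement nodes plus labeled edges."""
    node_type: dict = field(default_factory=dict)  # requirement id -> "function" | "method" | "class"
    text: dict = field(default_factory=dict)       # requirement id -> NL requirement
    edges: set = field(default_factory=set)        # {(src_id, label, dst_id)}

    def add_node(self, rid, kind, requirement):
        self.node_type[rid] = kind
        self.text[rid] = requirement

    def add_parent_child(self, parent, child):
        self.edges.add((parent, "parent-child", child))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def add_semantic_edges(rg, embeddings, delta):
    """Add one 'semantic-similar' edge per unordered pair whose cosine is >= delta."""
    ids = sorted(embeddings)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= delta:
                rg.edges.add((a, "semantic-similar", b))


rg = RequirementGraph()
rg.add_node("Loader", "class", "Load repository configuration files.")
rg.add_node("Loader.read", "method", "Read one config file from disk.")
rg.add_node("parse_args", "function", "Parse command-line arguments.")
rg.add_parent_child("Loader", "Loader.read")
# Toy vectors standing in for real requirement embeddings (hypothetical values).
toy_embeddings = {"Loader": [1.0, 0.0], "Loader.read": [0.9, 0.1], "parse_args": [0.0, 1.0]}
add_semantic_edges(rg, toy_embeddings, delta=0.8)
```

Only the two config-related requirements clear the threshold here, so `parse_args` stays unconnected. The all-pairs loop is quadratic; a real repository would more likely use an approximate nearest-neighbor index.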
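Structural-Semantic Code Graph (SSCG) is formalized as $\mathcal{G}_C = (\mathcal{V}_C, \phi_C, \mathcal{E}_C, \mathcal{R}_C)$, where:
- $\mathcal{V}_C$: code-element nodes (files, classes, methods, functions), parsed syntactically from all repository files.
- $\phi_C: \mathcal{V}_C \to \{\text{file}, \text{class}, \text{method}, \text{function}\}$: assigns each node its code granularity.
- $\mathcal{E}_C \subseteq \mathcal{V}_C \times \mathcal{R}_C \times \mathcal{V}_C$: directed, typed edges corresponding to import, containment, inheritance, method invocation, and semantic similarity.
- $\mathcal{R}_C$: the set of relation types; semanticSim edges are added using code-embedding cosine similarity.
A mapping function $\mu$ links each RG node to its corresponding SSCG element in $\mathcal{V}_C$, aligning repository requirements with the code units that implement them.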
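A minimal sketch of the SSCG as a typed, directed adjacency structure, together with one possible mapping from requirement nodes to code nodes. The relation names (`import`, `contain`, `inherit`, `invoke`, `semanticSim`), the `SSCG` class, and the `build_mapping` helper are illustrative assumptions; in particular, keying each requirement by the qualified name of the code unit it was extracted from is a simplifying assumption, not the paper's stated mapping procedure.

```python
from collections import defaultdict

# Hypothetical names for the relation types and granularities described above.
RELATIONS = {"import", "contain", "inherit", "invoke", "semanticSim"}
GRANULARITIES = {"file", "class", "method", "function"}


class SSCG:
    """Directed multigraph of code elements with typed edges (sketch)."""

    def __init__(self):
        self.granularity = {}               # node -> "file" | "class" | "method" | "function"
        self.out_edges = defaultdict(list)  # node -> [(relation, target_node)]

    def add_node(self, node, kind):
        if kind not in GRANULARITIES:
            raise ValueError(f"unknown granularity: {kind}")
        self.granularity[node] = kind

    def add_edge(self, src, rel, dst):
        if rel not in RELATIONS:
            raise ValueError(f"unknown relation: {rel}")
        if src not in self.granularity or dst not in self.granularity:
            raise KeyError("both endpoints must be added as nodes first")
        self.out_edges[src].append((rel, dst))


def build_mapping(requirement_ids, sscg):
    """Hypothetical mu: a requirement id maps to the code node with the same
    qualified name; requirements with no matching code node are skipped."""
    return {rid: rid for rid in requirement_ids if rid in sscg.granularity}


g = SSCG()
g.add_node("cfg.py", "file")
g.add_node("cfg.Loader", "class")
g.add_node("cfg.Loader.read", "method")
g.add_edge("cfg.py", "contain", "cfg.Loader")
g.add_edge("cfg.Loader", "contain", "cfg.Loader.read")
mu = build_mapping(["cfg.Loader", "cfg.Loader.read", "cfg.missing"], g)
```

Validating relation and granularity labels on insertion keeps the graph consistent with the fixed schema, which matters once an LLM agent is choosing which edge types to follow.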
2. LLM Agent Multi-Hop Reasoning
GraphCodeAgent operationalizes retrieval and reasoning via a ReAct-style agent loop equipped with dedicated tools:
- RGRetrieval: extracts subrequirement and semantic neighbor nodes from RG for a given NL requirement.
- DualGraphMapping: maps selected requirement nodes to code nodes in SSCG.
- SSCGTraverse: enables multi-hop traversal over code dependency edges to gather all directly and indirectly relevant code units.
- WebSearch: external query for domain patterns or API documentation.
- CodeTesting: runs static and formatting checks (using “black”) to validate generated code.
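The multi-hop expansion performed by SSCGTraverse can be illustrated as a hop-bounded breadth-first search over typed edges. This sketch assumes an adjacency dict of `(relation, neighbor)` pairs; the function name `sscg_traverse`, the `max_hops` limit, and the optional `relations` filter are hypothetical parameters, not the tool's documented interface.

```python
from collections import deque


def sscg_traverse(adj, seeds, max_hops=2, relations=None):
    """Collect code nodes reachable from `seeds` within `max_hops` hops.

    adj: dict mapping node -> list of (relation, neighbor) pairs.
    relations: optional set restricting which edge types may be followed.
    Returns a dict node -> hop distance (seeds have distance 0).
    """
    dist = {s: 0 for s in seeds}
    queue = deque(dist)  # deduplicated seeds
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop budget
        for rel, nbr in adj.get(node, []):
            if relations is not None and rel not in relations:
                continue
            if nbr not in dist:  # BFS: first visit is the shortest hop count
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


adj = {
    "f": [("invoke", "g")],
    "g": [("invoke", "h"), ("import", "m")],
    "h": [("invoke", "k")],
}
print(sscg_traverse(adj, ["f"], max_hops=2))
print(sscg_traverse(adj, ["f"], max_hops=2, relations={"invoke"}))
```

Returning hop distances rather than a bare set lets a caller rank indirect dependencies (such as `h`, two calls away) below direct ones when packing them into the LLM context.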
The retrieval workflow proceeds as an algorithmic loop: starting from an NL requirement $r$, the agent alternates between RG querying, dual-graph mapping, SSCG multi-hop traversal, and iterative context expansion:
\begin{algorithm}[H]
\caption{GraphCodeAgent Multi-Hop Retrieval}
\begin{algorithmic}[1]
\Require NL requirement $r$, RG $\mathcal{G}_R$, SSCG $\mathcal{G}_C$
\State $\mathcal{C} \gets \emptyset$, $\mathit{done} \gets \mathbf{false}$
\While{not $\mathit{done}$}
\State {\tt AgentPredict} triggers an \emph{Action} and \emph{Args}
\If{\emph{Action} = RGRetrieval}
\State $V_r \gets$ RGRetrieval$(\mathit{Args}, \mathcal{G}_R)$
\State add $V_r$ to $\mathcal{C}$
\ElsIf{\emph{Action} = DualGraphMapping}
\State $V_c \gets$ DualGraphMapping$(\mathit{Args})$
\State add $V_c$ to $\mathcal{C}$
\ElsIf{\emph{Action} = SSCGTraverse}
\State $V_t \gets$ SSCGTraverse$(\mathit{Args}, \mathcal{G}_C)$
\State add $V_t$ to $\mathcal{C}$
\ElsIf{\emph{Action} = WebSearch}
\State $w \gets$ WebSearch$(\mathit{Args})$; add $w$ to $\mathcal{C}$
\ElsIf{\emph{Action} = CodeTesting}
\State $f \gets$ CodeTesting$(\mathit{Args})$
\If{$f$ reports no errors}
\State \textbf{continue}
\Else
\State add $f$ to $\mathcal{C}$
\EndIf
\ElsIf{\emph{Action} = {\tt GenerateCode}}
\State generate final code using full $\mathcal{C}$
\State $\mathit{done} \gets \mathbf{true}$
\EndIf
\EndWhile
\end{algorithmic}
\end{algorithm}
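The loop above can be sketched in Python as a dispatch over tool callables. Everything here is a hypothetical stand-in: `policy` plays the role of AgentPredict, the lambdas replace the real RGRetrieval, DualGraphMapping and SSCGTraverse tools, and `code_testing` only runs Python's built-in `compile()` syntax check instead of the static and black formatting checks the paper describes.

```python
def code_testing(code):
    """Stand-in for CodeTesting: a syntax check only (no black, no linters)."""
    try:
        compile(code, "<draft>", "exec")
        return True, ""
    except SyntaxError as exc:
        return False, f"SyntaxError: {exc.msg} (line {exc.lineno})"


def run_agent(requirement, policy, tools, generate, max_steps=10):
    """ReAct-style loop: ask the policy for an action, run the tool,
    grow the context, and stop when the policy chooses GenerateCode."""
    context = []
    for _ in range(max_steps):
        action, args = policy(requirement, context)
        if action == "GenerateCode":
            return generate(requirement, context)
        if action == "CodeTesting":
            passed, feedback = tools["CodeTesting"](args)
            if not passed:
                context.append(("CodeTesting", feedback))  # failures become context
            continue
        context.extend((action, item) for item in tools[action](args))
    raise RuntimeError("step budget exhausted before GenerateCode")


# A scripted policy replaces the LLM so the control flow is easy to follow.
script = iter([
    ("RGRetrieval", "parse config"),
    ("DualGraphMapping", ["req:load_config"]),
    ("SSCGTraverse", ["config.load"]),
    ("CodeTesting", "def f(:\n    pass"),  # deliberately broken draft
    ("GenerateCode", None),
])
tools = {
    "RGRetrieval": lambda query: ["req:load_config"],
    "DualGraphMapping": lambda reqs: ["config.load"],
    "SSCGTraverse": lambda seeds: ["config.load", "io.read_file"],
    "CodeTesting": code_testing,
}
result = run_agent(
    "parse config",
    policy=lambda req, ctx: next(script),
    tools=tools,
    generate=lambda req, ctx: f"# {req}: {len(ctx)} context items",
)
print(result)
```

The step budget (`max_steps`) is an added safeguard so a policy that never emits GenerateCode cannot loop forever; it also bounds the number of LLM calls per task.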
3. Bridging NL Requirements and Code Implementation
GraphCodeAgent’s dual-graph approach exposes repository-specific subrequirements not explicitly present in NL prompts, allowing for fine-grained alignment between user intent and coded artifacts. The RG enables explicit enumeration and traversal of conceptual requirements, while the SSCG enables systematic discovery of implementation-level dependencies, both explicit and implicit.
Tool-based agent prompts specify the available graph operations, edge types, and traversal primitives, instructing the LLM to reason over graph operations rather than perform generic text completion. The ReAct agent enforces a "think-and-act" workflow that explicitly separates reasoning steps from tool invocations, making the multi-step mapping from NL intent to synthesized code explicit.
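Separating reasoning from tool invocation implies that each model response must be parsed into a thought, an action, and its arguments. The sketch below assumes a hypothetical `Thought: / Action: / Args:` text format with JSON arguments; the exact prompt format GraphCodeAgent uses is not specified here, so the regex, `parse_step`, and the argument schema are all illustrative.

```python
import json
import re

# Tool names from the tool list above; the response format itself is an assumption.
ALLOWED_ACTIONS = {
    "RGRetrieval", "DualGraphMapping", "SSCGTraverse",
    "WebSearch", "CodeTesting", "GenerateCode",
}
STEP_PATTERN = re.compile(
    r"Thought:(?P<thought>.*?)\nAction:\s*(?P<action>\w+)\s*\nArgs:\s*(?P<args>.*)",
    re.DOTALL,
)


def parse_step(response):
    """Split one ReAct step into (thought, action, args); reject unknown tools."""
    match = STEP_PATTERN.search(response)
    if match is None:
        raise ValueError("response does not follow the Thought/Action/Args format")
    action = match.group("action")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown tool: {action}")
    return match.group("thought").strip(), action, json.loads(match.group("args"))


step = parse_step(
    "Thought: I need the callers of config.load\n"
    "Action: SSCGTraverse\n"
    'Args: {"seeds": ["config.load"], "max_hops": 2}'
)
print(step)
```

Rejecting unknown tool names at parse time gives the agent loop a cheap point at which to re-prompt the model instead of failing inside a tool.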
4. Optimizable Computational Graph Formalism
GraphCodeAgent can also be formulated within the optimizable computational graph paradigm outlined in “Language Agents as Optimizable Graphs” (Zhuge et al., 26 Feb 2024). In this view, each agent is a directed acyclic graph $G = (N, E)$:
- Nodes $n \in N$ perform atomic operations (e.g., LLM code generation, code interpretation, validation).
- Edges $(n_i, n_j) \in E$ realize information flow: the output of node $n_i$ becomes input context for $n_j$.
- Each node executes a routine $f_n$ that maps its context and the task input to an output; nodes that invoke LLMs are parameterized by a prompt $z_n$.
- Node-level prompt optimization: a gradient-free prompt improver, such as OPRO or PromptBreeder, refines each node's prompt using its execution history.
- Edge optimization: each potential edge $e$ is parameterized by an independent inclusion probability $\theta_e$, and the orchestration is learned with REINFORCE gradient estimates subject to acyclicity constraints.
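Edge optimization with REINFORCE can be illustrated on a toy problem. In this sketch each candidate edge's inclusion probability is the sigmoid of a logit, edges are sampled independently as Bernoulli variables, and acyclicity is guaranteed by only allowing edges `(i, j)` with `i < j` in a fixed node order; this ordering trick, the batch-mean baseline, the learning rate, and the `utility` function are assumptions chosen for illustration, not the exact procedure of Zhuge et al.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_dag(logits, rng):
    """Sample each candidate edge (i, j) independently; requiring i < j
    in a fixed node order means any sampled edge set is acyclic."""
    return {e for e, logit in logits.items() if rng.random() < sigmoid(logit)}


def reinforce_step(logits, utility, rng, lr=0.5, n_samples=16):
    """One REINFORCE update of edge logits with a batch-mean reward baseline."""
    if any(i >= j for i, j in logits):
        raise ValueError("candidate edges must respect the fixed order i < j")
    samples = [sample_dag(logits, rng) for _ in range(n_samples)]
    rewards = [utility(g) for g in samples]
    baseline = sum(rewards) / n_samples
    grads = {e: 0.0 for e in logits}
    for g, r in zip(samples, rewards):
        for e, logit in logits.items():
            x = 1.0 if e in g else 0.0
            # Gradient of log Bernoulli(x; sigmoid(logit)) w.r.t. the logit is x - p.
            grads[e] += (r - baseline) * (x - sigmoid(logit))
    return {e: logit + lr * grads[e] / n_samples for e, logit in logits.items()}


def utility(graph):
    """Toy objective: edge (0, 1) helps, edge (1, 2) hurts, edge (0, 2) is neutral."""
    return (1.0 if (0, 1) in graph else 0.0) - (1.0 if (1, 2) in graph else 0.0)


rng = random.Random(0)
logits = {(0, 1): 0.0, (1, 2): 0.0, (0, 2): 0.0}
for _ in range(200):
    logits = reinforce_step(logits, utility, rng)
print({e: round(sigmoid(l), 3) for e, l in logits.items()})
```

In a real agent graph the utility would be a task score such as test pass rate, which is why a score-function estimator like REINFORCE is used: the sampled graph structure is discrete and the score is not differentiable.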
Hierarchical composition enables assembling multi-agent "swarms" via recursive graph merging, with inter-agent information flow learned and optimized in the same algebraic formalism.
5. Experimental Evaluation and Benchmarking
GraphCodeAgent has been evaluated on DevEval [Li et al. 2024] and CoderEval [Yu et al. 2024], which comprise repo-level tasks across multiple domains with annotated reference code and dependency structures.
Pass@1 comparison (correctness on first sample):
| Method | DevEval (GPT-4o) | DevEval (Gemini-1.5-Pro) | CoderEval (GPT-4o) | CoderEval (Gemini-1.5-Pro) |
|---|---|---|---|---|
| Dense RACG | 40.43% | 39.34% | 38.26% | 40.43% |
| GraphCodeAgent | 58.14% | 54.74% | 53.91% | 45.65% |
Ablation shows a 12.2 pp drop when SSCGTraverse is removed, indicating that graph-based multi-hop reasoning is central to the gains.
Dependency-type breakdown (DevEval, GPT-4o):
| Scenario | Best Baseline (Pass@1) | GraphCodeAgent (Pass@1) | Relative gain |
|---|---|---|---|
| Standalone | 50.19% | 60.16% | +19.9% |
| Local-file dependencies | 46.81% | 69.67% | +48.9% |
| Cross-file dependencies | 22.29% | 43.31% | +94.3% |
| Mixed | 32.07% | 45.18% | +40.9% |
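The last column is a relative gain, not a percentage-point difference, which is easy to confuse with the "12.2 pp" ablation figure. A short calculation makes the distinction explicit; the numbers are copied from the dependency-type table, and the helper name `gains` is illustrative.

```python
def gains(baseline, ours):
    """Return (absolute gain in pp, relative gain in %) for two Pass@1 scores given in %."""
    absolute = ours - baseline
    relative = 100.0 * absolute / baseline
    return absolute, relative


# (best baseline, GraphCodeAgent) Pass@1 in %, DevEval with GPT-4o.
rows = {
    "Standalone": (50.19, 60.16),
    "Local-file dependencies": (46.81, 69.67),
    "Cross-file dependencies": (22.29, 43.31),
    "Mixed": (32.07, 45.18),
}
for name, (base, ours) in rows.items():
    pp, rel = gains(base, ours)
    print(f"{name}: +{pp:.2f} pp absolute, +{rel:.1f}% relative")
```

Recomputing from the rounded table entries reproduces the reported relative gains to within 0.1 point. The cross-file row shows why relative figures can look dramatic: a 21.02 pp absolute improvement nearly doubles a low baseline.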
On the QwQ-32B reasoning LLM, RepoCoder baseline reaches 48.93%, whereas GraphCodeAgent achieves 54.14% (+10.7%).
6. Implementation Details and Limitations
Implementation relies on:
- Tree-sitter for parsing and extracting code units.
- DeepSeek-V2.5 for generating missing NL requirements.
- Code embedding models for vectorizing both requirements and code elements (used in similarity-based edge construction).
- Standard Python and LLM API clients for orchestration.
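The code-unit extraction step can be approximated for Python sources with the standard-library `ast` module. This swaps tree-sitter (which the implementation actually uses and which supports many languages) for `ast` only so that the sketch runs without extra dependencies; `extract_units` and the qualified-name scheme are hypothetical. Units whose docstring is `None` are the ones for which a requirement would have to be generated by an LLM.

```python
import ast


def extract_units(source, module="mod"):
    """List (qualified_name, kind, docstring_or_None) for classes, methods, functions."""
    tree = ast.parse(source)
    units = []

    def visit(node, prefix, in_class):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                name = f"{prefix}.{child.name}"
                units.append((name, "class", ast.get_docstring(child)))
                visit(child, name, in_class=True)
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                name = f"{prefix}.{child.name}"
                kind = "method" if in_class else "function"
                units.append((name, kind, ast.get_docstring(child)))
                visit(child, name, in_class=False)  # nested defs count as functions

    visit(tree, module, in_class=False)
    return units


src = '''
class Loader:
    """Load configs."""

    def read(self, path):
        return open(path).read()


def main():
    """Entry point."""
'''
units = extract_units(src, module="cfg")
needs_generated_requirement = [name for name, _, doc in units if doc is None]
print(units)
print(needs_generated_requirement)
```

The extracted `(name, kind)` pairs map directly onto node granularities, so the same pass can seed both graphs.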
Limitations:
- RG construction depends on LLM-generated and manually verified requirement annotations, which may not scale in the absence of high-quality human annotation.
- SSCG is based on static call graphs and similarity metrics; richer, dynamic analyses (data-flow, control-flow) remain unexplored.
- The ReAct workflow may incur high LLM API call volume; optimizing retrieval policy or tool selection is proposed for future work.
Prospective directions involve fine-tuning retrieval policies with RL, extending RG to encompass configuration and build artifacts for deeper NL–code alignment, and scaling through hierarchical summarization to very large repositories (Li et al., 14 Apr 2025).
7. Context and Significance
GraphCodeAgent advances retrieval-augmented code generation by formalizing the duality between conceptual requirements and code implementation dependencies. Prior work on RACG approaches, vector-similarity retrieval, and static code-context graphs has proven insufficient for repo-level tasks involving implicit dependencies and multi-step reasoning. By tightly integrating structured graph representations and multi-hop LLM agent reasoning, GraphCodeAgent demonstrates a marked increase in functional correctness, especially for tasks requiring context aggregation across files and modules. This suggests a promising path for future retrieval-augmented agent designs and meta-optimization frameworks (Li et al., 14 Apr 2025, Zhuge et al., 26 Feb 2024).