CodexGraph: LLM-Driven Code Graphs

Updated 27 November 2025

CodexGraph is a system that integrates LLM agents with property graph databases to enable multi-hop, structure-aware code retrieval and reasoning.
It employs a two-stage static analysis and a dual-agent workflow to construct and traverse a comprehensive code property graph.
Empirical results show CodexGraph outperforms traditional token similarity methods, delivering higher recall and precision in complex code queries.

CodexGraph targets multi-hop, structure-aware code retrieval, reasoning, and manipulation at the scale of real-world repositories. Unlike conventional approaches based on token similarity or hand-tuned, task-specific APIs, it exposes the full symbol and relation graph of a codebase through a property graph query language, so agents can traverse, retrieve, and synthesize complex interrelations among modules, classes, functions, and variables. CodexGraph builds on prior work in code knowledge graph construction (Abdelaziz et al., 2020), which uses graph-based representations both for program semantics and for links to documentation or human discussion.

1. Motivation and Problem Setting

The prevailing paradigms for code retrieval and automated codebase interaction, namely retrieval-augmented generation (RAG) via BM25 or embeddings and manually engineered tool APIs, scale poorly to large repositories. Similarity-based retrieval has low recall on multi-hop reasoning (e.g., tracing method overrides or cross-module dependencies), while manual tools are engineered for single tasks and do not generalize (Liu et al., 7 Aug 2024). CodexGraph addresses these gaps with a unified, task-agnostic interface: a code property graph in which all symbols (MODULE, CLASS, FUNCTION, METHOD, FIELD, GLOBAL_VARIABLE) and their interrelations (CONTAINS, HAS_METHOD, HAS_FIELD, INHERITS, USES, CALLS) are available as traversable nodes and edges. This lets LLM agents pose complex, compositional retrievals in an expressive graph query language.

2. System Architecture and Workflow

CodexGraph's architecture consists of the following main components:

Code Graph Extractor: Performs static analysis over repository files and constructs the code property graph in two stages: shallow intra-file indexing (extracting nodes and simple edges from each file's abstract syntax tree) and cross-file analysis (resolving imports, inheritance, and re-exports).
Graph Database: The extracted code graph is materialized in a property graph database (Neo4j), exposing a Cypher query interface.
LLM Agent Layer: CodexGraph employs a dual-agent LLM approach. The primary agent decomposes the input task into natural-language queries and deduces reasoning chains; the translation agent maps each to executable Cypher queries. Queries are executed against Neo4j, and results are iteratively processed, forming context windows for code synthesis or explanations.
Iterative Query Loop: The workflow follows an iterative reasoning cycle, where the LLM agent can issue follow-up queries based on retrieved results, supporting multi-hop compositional reasoning. Each round integrates new context, and the agent determines when to terminate and output final code or answers.
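
The first extraction stage can be illustrated with Python's standard `ast` module. The following is a minimal sketch under stated assumptions, not the extractor CodexGraph ships: the function name `index_source` and the tuple layout for nodes and edges are hypothetical, and base classes are recorded by name only, leaving their resolution to the cross-file stage.

```python
import ast


def index_source(module_name: str, source: str):
    """Stage-one sketch: collect nodes and simple edges from one file's AST.

    Returns (nodes, edges) where nodes are (label, name) pairs and edges are
    (source_name, edge_type, target_name) triples. Both layouts are
    illustrative assumptions.
    """
    tree = ast.parse(source)
    nodes = [("MODULE", module_name)]
    edges = []
    for stmt in tree.body:
        if isinstance(stmt, ast.ClassDef):
            nodes.append(("CLASS", stmt.name))
            edges.append((module_name, "CONTAINS", stmt.name))
            # Base classes are kept as unresolved names; linking them to
            # definitions in other files belongs to the cross-file stage.
            for base in stmt.bases:
                edges.append((stmt.name, "INHERITS", ast.unparse(base)))
            for item in stmt.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    qualified = f"{stmt.name}.{item.name}"
                    nodes.append(("METHOD", qualified))
                    edges.append((stmt.name, "HAS_METHOD", qualified))
        elif isinstance(stmt, (ast.FunctionDef, ast.AsyncFunctionDef)):
            nodes.append(("FUNCTION", stmt.name))
            edges.append((module_name, "CONTAINS", stmt.name))
        elif isinstance(stmt, ast.Assign):
            # Simple module-level assignments become GLOBAL_VARIABLE nodes.
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    nodes.append(("GLOBAL_VARIABLE", target.id))
                    edges.append((module_name, "CONTAINS", target.id))
    return nodes, edges
```

Keeping this pass purely syntactic means each file can be indexed independently; only the second stage needs a view of the whole repository.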

CodexGraph's architecture is explicitly designed for modularity and extensibility. Symbol extraction and indexing operate on Python exclusively in the current release, but the schema admits future extensions to polyglot or federated codebases.

3. Graph Schema Formalization

The underlying code graph is defined formally as

$G = (V, E), \qquad V = \{v_i\}_{i=1}^n, \qquad E \subseteq V \times R \times V$

where $R$ is the set of edge types: CONTAINS, HAS_METHOD, HAS_FIELD, INHERITS, USES, and CALLS. Node and edge attributes encode identifiers (file_path, code_index), signatures, type information, and structural associations.

Node Types are:

MODULE: $\{\mathtt{name, file\_path}\}$
CLASS: $\{\mathtt{name, file\_path, signature, code\_index}\}$
FUNCTION and METHOD: with signature and parent class (for methods)
FIELD, GLOBAL_VARIABLE: $\{\mathtt{name, file\_path, code\_index}\}$

Edge Types include:

CONTAINS: MODULE $\rightarrow$ (CLASS | FUNCTION | GLOBAL_VARIABLE)
HAS_METHOD: CLASS $\rightarrow$ METHOD
HAS_FIELD: CLASS $\rightarrow$ FIELD
INHERITS: CLASS $\rightarrow$ CLASS
USES: (FUNCTION | METHOD) $\rightarrow$ (GLOBAL_VARIABLE | FIELD)
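
The endpoint constraints listed above can be written down as a small lookup table. This is a sketch for illustration only: the `Node` dataclass and `is_valid_edge` helper are hypothetical names, and CALLS is left out because its endpoint types are not listed here.

```python
from dataclasses import dataclass

# Allowed (source labels, target labels) per edge type, copied from the list above.
EDGE_SCHEMA = {
    "CONTAINS": ({"MODULE"}, {"CLASS", "FUNCTION", "GLOBAL_VARIABLE"}),
    "HAS_METHOD": ({"CLASS"}, {"METHOD"}),
    "HAS_FIELD": ({"CLASS"}, {"FIELD"}),
    "INHERITS": ({"CLASS"}, {"CLASS"}),
    "USES": ({"FUNCTION", "METHOD"}, {"GLOBAL_VARIABLE", "FIELD"}),
}


@dataclass(frozen=True)
class Node:
    """Hypothetical node record holding only the attributes shared by all types."""
    label: str
    name: str
    file_path: str


def is_valid_edge(src: Node, edge_type: str, dst: Node) -> bool:
    """Check that an edge type connects node labels the schema permits."""
    if edge_type not in EDGE_SCHEMA:
        return False
    sources, targets = EDGE_SCHEMA[edge_type]
    return src.label in sources and dst.label in targets
```

A check of this kind could run while the graph is being built, so that a malformed edge is rejected before it reaches the database.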

Edges in the graph are typed and allow semantic constraints in traversal, such as multi-hop tracing of inheritance, containment, or use-def relations. Cypher, a declarative property graph query language, is used to formulate and execute queries, supporting filters, aggregations, and multi-branch traversals.
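
To make the multi-hop idea concrete, the sketch below performs in plain Python roughly what a variable-length Cypher pattern such as `[:INHERITS*]` expresses: it collects every class that inherits, directly or transitively, from a given base. The triple layout for edges and the function name `descendants` are assumptions made for illustration.

```python
from collections import deque


def descendants(edges, base):
    """Return all classes reachable from `base` by following INHERITS edges backwards.

    `edges` is an iterable of (source_name, edge_type, target_name) triples,
    where an INHERITS edge points from the subclass to its parent.
    """
    children = {}
    for src, edge_type, dst in edges:
        if edge_type == "INHERITS":
            children.setdefault(dst, []).append(src)

    seen = set()
    queue = deque([base])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:  # guard against revisiting shared subclasses
                seen.add(child)
                queue.append(child)
    return seen
```

In the database itself this traversal is delegated to the query engine; the in-memory version only shows why a single typed edge is enough to answer questions that span several hops.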

4. Query Construction and Reasoning

The LLM agent layer synthesizes graph queries explicitly and in context. The primary agent interprets the user prompt and decomposes the task (e.g., "Find all subclasses of X that override method Y and use variable g") into sub-tasks, each of which is mapped to a Cypher query. The inheritance and method conditions of this example, for instance, translate to:

MATCH (cl:CLASS)-[:INHERITS*]->(base:CLASS), (cl)-[:HAS_METHOD]->(m:METHOD)
WHERE base.name = 'X' AND m.name = 'Y'
RETURN cl.name, cl.file_path
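
The query above covers the inheritance and method-name conditions. A possible extension, sketched here in Python, adds the variable-use condition through a USES edge and passes the names as Cypher parameters (`$base`, `$method`, `$variable`) instead of string literals. The helper name `build_override_query` is hypothetical, and the choice of GLOBAL_VARIABLE as the target label is an assumption about what "variable g" refers to.

```python
def build_override_query(base: str, method: str, variable: str):
    """Return a (query, params) pair for the three-condition example.

    Names are supplied as parameters so that user-provided identifiers are
    never spliced into the query text.
    """
    query = (
        "MATCH (cl:CLASS)-[:INHERITS*]->(base:CLASS), "
        "(cl)-[:HAS_METHOD]->(m:METHOD), "
        "(m)-[:USES]->(g:GLOBAL_VARIABLE) "
        "WHERE base.name = $base AND m.name = $method AND g.name = $variable "
        "RETURN DISTINCT cl.name AS name, cl.file_path AS file_path"
    )
    params = {"base": base, "method": method, "variable": variable}
    return query, params
```

With the official Neo4j Python driver, the pair could be handed to `session.run(query, params)`; that call is not exercised here because it needs a running database.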

The translation agent ensures syntactic correctness of Cypher queries. Dynamic pipelines allow refinement: after receiving result tuples, the primary agent can issue additional queries or perform reasoning until output conditions are met. This agent-based approach separates high-level reasoning from syntactic query generation, improving reliability and expressivity, particularly when compared to prompt-only LLM methods or fixed retrieval templates.
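
The division of labor between the two agents can be sketched as a small loop. Everything here is an illustrative assumption rather than the system's actual interface: the three callables stand in for the primary agent, the translation agent, and the graph database, and the `max_rounds` cap is added only to guarantee termination.

```python
from typing import Callable, List, Optional


def run_query_loop(
    task: str,
    plan_next: Callable[[str, List[list]], Optional[str]],
    translate: Callable[[str], str],
    execute: Callable[[str], list],
    max_rounds: int = 5,
) -> List[list]:
    """Iterate plan -> translate -> execute until the planner stops or the cap is hit.

    plan_next: primary agent; returns the next natural-language question,
               or None once it judges the gathered context sufficient.
    translate: translation agent; turns a question into a Cypher string.
    execute:   runs a Cypher string and returns result rows.
    """
    context: List[list] = []
    for _ in range(max_rounds):
        question = plan_next(task, context)
        if question is None:
            break
        cypher = translate(question)
        context.append(execute(cypher))
    return context
```

Because the planner only ever sees result rows and never Cypher text, the translation step could be swapped or retried without changing the reasoning logic.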

5. Empirical Performance and Applications

CodexGraph demonstrates empirical advantages over prior retrieval and code navigation systems on multiple academic and real-world benchmarks (Liu et al., 7 Aug 2024):

CrossCodeEval (Lite): CodexGraph achieved an exact match (EM) of 27.9% and edit similarity (ES) of 67.98%, outperforming BM25- and embedding-based baselines (best EM: 21.2%).
SWE-bench (Lite): Pass@1 rate of 22.96%, on par with AutoCodeRover.
EvoCodeBench: Pass@1 of 36.02%, Recall@1 of 11.87%, exceeding other toolkits.

These results indicate strong precision and generality across cross-file code completion, issue resolution, and evolutionary code generation tasks. The trade-off is a higher token cost (1.5×–4× over baseline), attributed to the richer agent dialogue.
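
For readers unfamiliar with the completion metrics, the sketch below shows one common way to compute exact match and edit similarity for a single prediction. It assumes edit similarity is one minus the Levenshtein distance divided by the longer string's length; the benchmark's own normalization and whitespace handling may differ, so the function names and details are illustrative only.

```python
def exact_match(prediction: str, reference: str) -> bool:
    """True when the strings agree after trimming surrounding whitespace."""
    return prediction.strip() == reference.strip()


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def edit_similarity(prediction: str, reference: str) -> float:
    """Similarity in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(prediction), len(reference))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(prediction, reference) / longest
```

Reported scores are averages of such per-example values over the benchmark, which is why edit similarity can be high even when exact match is modest.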

CodexGraph is deployed via the ModelScope-Agent framework and powers five end-user applications:

Application	Role	Performance/Feedback
Code Chat	Ad hoc Q&A	95% user satisfaction, structure-aware answers
Code Debugger	Bug localization, patch suggestion	Correct argument propagation mutation on real PRs
Code Unittestor	Pytest suite generation	85% method coverage (vs. 60% retrieval-only baseline)
Code Generator	Code feature synthesis	Zero parameter hallucinations, compile-ready output
Code Commentor	Docstring insertion	User rating: 4.3/5 vs. 3.2/5 for summarization baseline

6. Technical Advantages and Limitations

Strengths:

CodexGraph's unified schema supports all symbol and relation types necessary for complex repository-scale reasoning, eliminating the need for per-task API or tool engineering.
Flexible, structure-aware, multi-hop queries enable superior recall and precision on compositional code tasks.
Modular agent-mediated workflow allows high-level reasoning to operate independently of query syntax, improving extensibility.

Limitations:

The current system is restricted to Python. Polyglot and incremental-update scenarios require schema extension and novel language-adapter modules.
Indexing and graph construction incur substantial computational cost for large or highly volatile repositories.
LLM agents' success rates are correlated with backbone model quality; weaker models are less adept at query synthesis and pipeline orchestration.

7. Relation to Prior Work and Future Directions

The CodexGraph framework is grounded in preceding research on code knowledge graph extraction, particularly GraphGen4Code ("A Toolkit for Generating Code Knowledge Graphs"), which pioneered scalable, RDF-based code and documentation graphs spanning inter-procedural control and data flow as well as social artifacts such as StackOverflow threads (Abdelaziz et al., 2020). CodexGraph recasts these ideas in a property graph model, with a focus on real-time, agent-driven retrieval and automation for repository-scale LLM applications.

Future research directions include:

Cross-language schema federation with support for Java, C++, and multi-database architectures.
Live incremental update for continuous integration/continuous deployment workflows.
Multi-agent orchestration for specialized sub-tasks in static analysis, testing, or documentation synthesis.
Integration of richer ontologies, type inference, and embedding-based symbol representations.

Taken together, CodexGraph positions graph-based code representation as the central interface for advanced LLM-driven software engineering, bridging retrieval, synthesis, and analysis across the software development lifecycle (Liu et al., 7 Aug 2024, Abdelaziz et al., 2020).

References

Abdelaziz et al. (2020). A Toolkit for Generating Code Knowledge Graphs.

Liu et al. (2024). CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases.
