CodexGraph: Bridging LLMs and Code Repositories via Code Graph Databases
The paper introduces CodexGraph, a system that improves how large language models (LLMs) interact with large code repositories by giving them a graph database interface. CodexGraph addresses the limitations of current methods, which rely heavily on similarity-based retrieval or on manually built tools and APIs.
Problem and Motivation
LLMs have exhibited remarkable proficiency in handling standalone code tasks, such as those found in HumanEval and MBPP. However, these models struggle significantly when faced with tasks that span entire code repositories, mainly due to the complex dependencies and extensive context these repositories encompass. Existing solutions, like similarity-based retrieval methods, often suffer from low recall rates in complex tasks, while tools and API-based approaches necessitate extensive domain expertise and task-specific adaptations, limiting their general applicability.
Proposed Solution: CodexGraph
CodexGraph uniquely integrates LLM agents with structured graph database interfaces derived from code repositories. This integration harnesses the inherent structural properties of graph databases and the versatility of graph query languages, enabling LLM agents to construct and execute precise, code-aware queries. The core components of CodexGraph include:
- Graph Database Schema: CodexGraph employs a schema that abstracts code repositories into code graphs, where nodes symbolize code elements (e.g., modules, classes, functions) and edges represent relationships (e.g., inheritance, usage). This schema facilitates a structured representation of the codebase, supporting multi-granular searches and topological analysis.
- Shallow Indexing and Edge Completion: The system starts with a shallow indexing phase, making a single pass over the codebase to capture symbols and their meta-information. An edge completion phase follows, using depth-first search (DFS) to resolve cross-file relationships and so complete the code graph database.
- LLM Interface Using Graph Query Language: CodexGraph lets LLM agents write queries in natural language and then translate them into graph database queries. This "write then translate" strategy splits the work in two: one step works out what information the query should retrieve, and a translation LLM agent turns that request into a syntactically correct graph query. The split improves query generation accuracy and retrieval efficiency.
- Iterative Pipeline for Task Execution: CodexGraph adopts an iterative query and retrieval approach, allowing LLM agents to progressively refine their queries based on the information retrieved. This iterative method improves the system's ability to handle complex, multi-hop code reasoning tasks.
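The two-phase graph construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses Python's standard `ast` module and plain in-memory lists in place of a real graph database, the two-file repository is hypothetical, and the node and edge labels (`MODULE`, `CLASS`, `METHOD`, `CONTAINS`, `HAS_METHOD`, `INHERITS`) are illustrative names chosen for this example. The recursive `resolve` helper follows import links depth-first and is only a simplified stand-in for the DFS-based edge completion.

```python
import ast

# Hypothetical two-file repository held in memory for the sketch.
REPO = {
    "shapes/base.py": "class Shape:\n    def area(self):\n        return 0\n",
    "shapes/circle.py": (
        "from shapes.base import Shape\n"
        "class Circle(Shape):\n"
        "    def area(self):\n"
        "        return 3.14\n"
    ),
}


def shallow_index(repo):
    """One pass per file: record symbols, imports, and unresolved base-class names."""
    nodes, edges, imports, pending = {}, [], {}, []
    for path, source in repo.items():
        mod = path[:-3].replace("/", ".")  # "shapes/base.py" -> "shapes.base"
        nodes[mod] = {"label": "MODULE", "path": path}
        imports[mod] = {}
        for stmt in ast.parse(source).body:
            if isinstance(stmt, ast.ImportFrom) and stmt.module:
                for alias in stmt.names:
                    local = alias.asname or alias.name
                    imports[mod][local] = (stmt.module, alias.name)
            elif isinstance(stmt, ast.ClassDef):
                cls = f"{mod}.{stmt.name}"
                nodes[cls] = {"label": "CLASS", "path": path}
                edges.append((mod, "CONTAINS", cls))
                for base in stmt.bases:
                    if isinstance(base, ast.Name):
                        pending.append((cls, mod, base.id))  # resolved later
                for item in stmt.body:
                    if isinstance(item, ast.FunctionDef):
                        method = f"{cls}.{item.name}"
                        nodes[method] = {"label": "METHOD", "path": path}
                        edges.append((cls, "HAS_METHOD", method))
    return nodes, edges, imports, pending


def resolve(mod, name, nodes, imports, seen=None):
    """Follow import links depth-first until a symbol defined in the repo is found."""
    seen = set() if seen is None else seen
    if (mod, name) in seen:  # guard against circular imports
        return None
    seen.add((mod, name))
    candidate = f"{mod}.{name}"
    if candidate in nodes:
        return candidate
    target = imports.get(mod, {}).get(name)
    if target is None:
        return None
    return resolve(target[0], target[1], nodes, imports, seen)


def complete_edges(nodes, edges, imports, pending):
    """Second phase: turn unresolved base-class names into cross-file INHERITS edges."""
    for cls, mod, base in pending:
        target = resolve(mod, base, nodes, imports)
        if target is not None:
            edges.append((cls, "INHERITS", target))
    return edges


def subclasses_of(edges, class_id):
    """A structural lookup that the finished graph makes straightforward."""
    return [src for src, rel, dst in edges if rel == "INHERITS" and dst == class_id]


nodes, edges, imports, pending = shallow_index(REPO)
edges = complete_edges(nodes, edges, imports, pending)
print(subclasses_of(edges, "shapes.base.Shape"))
```

In a graph database the last lookup would be written in a graph query language instead of Python. In Cypher, for instance, it might look like `MATCH (c:CLASS)-[:INHERITS]->(b:CLASS {name: 'Shape'}) RETURN c`, again assuming the illustrative labels used here.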
Experimental Evaluation
CodexGraph's efficacy was assessed using three comprehensive and challenging benchmarks: CrossCodeEval, SWE-bench, and EvoCodeBench. The key findings from the experimental analysis include:
- Performance Across Tasks: CodexGraph showed competitive performance across these benchmarks. For example, on the CrossCodeEval Lite (Python) dataset, it achieved an exact match (EM) score of 27.90%, ahead of similarity-based retrieval (BM25) and other retrieval-augmented code generation (RACG) methods such as AutoCodeRover.
- Query Strategy Effectiveness: Optimal querying strategies were found to differ across tasks. For CrossCodeEval Lite (Python), multiple queries per round enhanced recall, whereas, for SWE-bench, focusing on precision through single queries per round yielded better results.
- Advancements with LLMs: CodexGraph's performance significantly improved with the use of more advanced LLMs, such as GPT-4. This suggests that CodexGraph's structured and flexible interface can effectively leverage the evolving capabilities of LLMs.
- Token Consumption: The more extensive and complex queries generated by CodexGraph, while improving retrieval accuracy, do incur higher token costs compared to other RACG methods.
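The queries-per-round setting discussed above can be pictured as a single parameter of the iterative retrieval loop. The sketch below is a hedged illustration only: every function name, the toy query table, and the stopping rule are assumptions made for this example, and the LLM calls are replaced by deterministic stubs so the code runs on its own. It shows where the knob sits in the loop and does not reproduce the precision and recall effects reported in the paper.

```python
from typing import Callable, List, Tuple


def iterative_retrieval(
    task: str,
    write_queries: Callable[[str, List[str], int], List[str]],
    translate: Callable[[str], str],
    run_graph_query: Callable[[str], List[str]],
    is_sufficient: Callable[[str, List[str]], bool],
    queries_per_round: int = 1,
    max_rounds: int = 3,
) -> Tuple[List[str], int]:
    """Write, translate, and run queries round by round until the context suffices."""
    context: List[str] = []
    rounds = 0
    for _ in range(max_rounds):
        rounds += 1
        # "Write": natural-language requests, conditioned on what was already retrieved.
        for request in write_queries(task, context, queries_per_round):
            # "Translate": turn the request into a graph query, then execute it.
            context.extend(run_graph_query(translate(request)))
        if is_sufficient(task, context):
            break
    return context, rounds


# --- Deterministic stand-ins for the LLM agents and the graph database ---
TOY_DB = {
    "MATCH (c:CLASS {name: 'Circle'}) RETURN c": ["Circle: shapes/circle.py"],
    "MATCH (c:CLASS {name: 'Shape'}) RETURN c": ["Shape: shapes/base.py"],
}


def stub_write(task: str, context: List[str], n: int) -> List[str]:
    wanted = ["Circle", "Shape"]
    missing = [w for w in wanted if not any(c.startswith(w + ":") for c in context)]
    return [f"find class {name}" for name in missing[:n]]


def stub_translate(request: str) -> str:
    return f"MATCH (c:CLASS {{name: '{request.split()[-1]}'}}) RETURN c"


def stub_run(query: str) -> List[str]:
    return TOY_DB.get(query, [])


def stub_sufficient(task: str, context: List[str]) -> bool:
    return len(context) >= 2


single = iterative_retrieval("explain Circle.area", stub_write, stub_translate,
                             stub_run, stub_sufficient, queries_per_round=1)
multi = iterative_retrieval("explain Circle.area", stub_write, stub_translate,
                            stub_run, stub_sufficient, queries_per_round=2)
print(single[1], multi[1])  # rounds used under each setting
```

With these stubs both settings retrieve the same two facts, but the single-query setting needs one more round to do so. In a real system each extra query per round widens what is retrieved at the cost of more tokens and potentially more irrelevant context.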
Implications
Practical Applications: CodexGraph's integration into real-world software development was demonstrated by developing five practical applications (Code Chat, Code Debugger, Code Unittestor, Code Generator, and Code Commentor) using the ModelScope-Agent framework. These applications showcased CodexGraph’s utility in enhancing code comprehension, debugging, automated testing, generation, and documentation in practical scenarios.
Future Developments: The success of CodexGraph opens up several avenues for future research and applications. Extending the schema to support additional programming languages and optimizing the indexing efficiency are immediate next steps. Furthermore, integrating advanced multi-agent collaboration techniques could further enhance the system's flexibility and performance across diverse code tasks.
Conclusion
CodexGraph is a significant step towards bridging LLMs and code repositories. By combining graph database interfaces with flexible querying strategies, CodexGraph addresses the limitations of existing RACG methods and points the way to more versatile solutions in automated software engineering. The work underscores the potential of structured, graph-based approaches to make LLMs more scalable and effective on complex, real-world code repositories.