ZeroRepo: Graph-Guided Repo Generation

Updated 25 September 2025

ZeroRepo is a graph-driven framework that unifies high-level planning and low-level implementation via a persistent Repository Planning Graph (RPG).
It orchestrates proposal-level planning, implementation refinement, and graph-guided code synthesis in a sequential, test-driven workflow.
Empirical results on the RepoCraft benchmark highlight its scalability: generated repositories average nearly 36,000 LOC, with high functional coverage and improved test pass rates.

ZeroRepo is a graph-driven framework for generating large-scale, coherent software repositories from scratch, leveraging LLMs and a persistent graph representation known as the Repository Planning Graph (RPG). By explicitly modeling both the high-level functionality and low-level structural dependencies of repositories, ZeroRepo enables LLM-based agents to conduct unified, scalable planning and test-driven code generation, systematically advancing the state of automated end-to-end software synthesis.

1. Architectural Overview

ZeroRepo’s pipeline consists of three sequential stages, each governed by the RPG as the central planning substrate:

Proposal-Level Planning: The system ingests high-level user specifications and queries a large-scale feature tree (e.g., the EpiCoder Feature Tree) to extract a relevant, repository-aligned subtree. Using LLM-driven exploration and exploitation rules, this subtree is reorganized and refined into a modular functionality graph articulating “what to build.”
Implementation-Level Refinement: The preliminary functionality graph is enriched with concrete implementation metadata, including file and folder partitioning, and explicit data flows. This enrichment “grounds” abstract modules into a repository skeleton, indicating which functionalities are collocated, the flow of data between modules, and critical ordering constraints (e.g., file precedence, intra-file dependencies).
Graph-Guided Code Generation: The fully specified RPG is traversed in dependency order (typically a topological sort), with the LLM generating code for each node. Leaf nodes representing functions or classes are synthesized using test-driven development: tests are derived from the specification, and the code is refined iteratively until the test suite passes. Dependency ordering ensures that every upstream component is synthesized before its dependents, maintaining global consistency across the repository.

This continuous transition from coarse functionality to implementation detail, encoded within the RPG, establishes a reproducible, explicit, and persistent plan for end-to-end repository generation.
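The third stage can be illustrated with a minimal sketch. It assumes the RPG has already been reduced to a mapping from each node to the nodes it depends on; the names `generate_in_dependency_order`, `synthesize`, and `run_tests` are hypothetical stand-ins for the LLM call and the test runner, not ZeroRepo's actual interface.

```python
from graphlib import TopologicalSorter
from typing import Callable


def generate_in_dependency_order(
    dependencies: dict[str, set[str]],
    synthesize: Callable[[str, int], str],
    run_tests: Callable[[str, str], bool],
    max_attempts: int = 3,
) -> dict[str, str]:
    """Generate code for each node only after all of its dependencies."""
    generated: dict[str, str] = {}
    # static_order() yields every node after the nodes it depends on.
    for node in TopologicalSorter(dependencies).static_order():
        for attempt in range(max_attempts):
            code = synthesize(node, attempt)  # hypothetical LLM call
            if run_tests(node, code):         # hypothetical test runner
                generated[node] = code
                break
        else:
            raise RuntimeError(
                f"{node}: tests still failing after {max_attempts} attempts"
            )
    return generated


# Stub callbacks so the sketch runs without a model or a test harness.
def fake_synthesize(node: str, attempt: int) -> str:
    return f"# {node}, attempt {attempt}"


def fake_run_tests(node: str, code: str) -> bool:
    # Pretend that fit_model needs one round of refinement.
    return node != "fit_model" or code.endswith("attempt 1")


deps = {
    "load_csv": set(),
    "normalize": {"load_csv"},
    "fit_model": {"normalize", "load_csv"},
}
result = generate_in_dependency_order(deps, fake_synthesize, fake_run_tests)
```

With these stubs, `load_csv` and `normalize` pass on their first attempt, while `fit_model` is accepted on its second. The retry cap is an illustrative design choice to keep the loop bounded.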

2. Repository Planning Graph (RPG): Structure and Semantics

The RPG is a directed graph that encodes both proposal-stage and implementation-stage planning via its nodes and edges:

Dual Node Semantics:
- Root nodes: Folder-level regions
- Intermediate nodes: Source code files or groupings
- Leaf nodes: Functions or classes
Dependency Edges:
- Inter-module (solid/black) edges represent executable data flows or module-level dependencies (e.g., output of a data-loading module consumed by algorithmic processing).
- Intra-module (dashed gray) edges reflect ordering or containment (e.g., one file must be generated before another; a function may invoke another).
Unification of Planning Levels:

The RPG replaces ambiguous natural language plans by unifying “what to build” (capabilities) and “how to build” (structure, dependencies) in a single, linearly updatable blueprint. This enables controlled, long-horizon planning cycles and iterative repository refinement.
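One way to picture this structure is as a small typed graph. The sketch below is an assumption for illustration only: the class and field names (`RPGNode`, `RPG`, `NodeKind`, `EdgeKind`) are hypothetical and do not reflect ZeroRepo's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeKind(Enum):
    FOLDER = "folder"  # root level: folder-level region
    FILE = "file"      # intermediate level: source file or grouping
    SYMBOL = "symbol"  # leaf level: function or class


class EdgeKind(Enum):
    INTER_MODULE = "inter_module"  # data flow or module-level dependency
    INTRA_MODULE = "intra_module"  # ordering or containment within a module


@dataclass
class RPGNode:
    name: str
    kind: NodeKind
    description: str = ""  # the capability this node should provide


@dataclass
class RPG:
    nodes: dict[str, RPGNode] = field(default_factory=dict)
    # Each edge is (source, target, kind).
    edges: list[tuple[str, str, EdgeKind]] = field(default_factory=list)

    def add_node(self, node: RPGNode) -> None:
        self.nodes[node.name] = node

    def add_edge(self, src: str, dst: str, kind: EdgeKind) -> None:
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((src, dst, kind))

    def leaves(self) -> list[str]:
        """Names of function/class nodes, in insertion order."""
        return [n for n, node in self.nodes.items()
                if node.kind is NodeKind.SYMBOL]


rpg = RPG()
rpg.add_node(RPGNode("data", NodeKind.FOLDER, "data loading"))
rpg.add_node(RPGNode("data/loader.py", NodeKind.FILE))
rpg.add_node(RPGNode("load_csv", NodeKind.SYMBOL, "read a CSV into rows"))
rpg.add_node(RPGNode("fit_model", NodeKind.SYMBOL, "fit on loaded rows"))
rpg.add_edge("data", "data/loader.py", EdgeKind.INTRA_MODULE)
rpg.add_edge("data/loader.py", "load_csv", EdgeKind.INTRA_MODULE)
rpg.add_edge("load_csv", "fit_model", EdgeKind.INTER_MODULE)
```

Keeping capability descriptions on the nodes and structural relations on the edges lets a single object carry both the "what" and the "how" of the plan.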

3. Empirical Performance on the RepoCraft Benchmark

ZeroRepo is evaluated on RepoCraft, a benchmark comprising six real-world projects across domains (machine learning libraries, symbolic computation, data analysis, and web frameworks) and encompassing 1,052 discrete evaluation tasks. Key empirical findings include:

Repository Scale:

ZeroRepo-generated repositories average nearly 36,000 lines of code (LOC), approximately 3.9 times the size of the strongest baseline (Claude Code) and about 64 times that of the other baselines.

Functionality Coverage:

The proportion of functional taxonomy categories realized in the synthesized repository is given by the coverage metric:

$\text{Coverage} = \frac{1}{|\mathcal{C}|} \sum_{j=1}^{K} \mathbf{1}\left[\exists\, g_i \in \mathcal{G} \text{ such that } f(g_i) = c_j \right]$

where $\mathcal{C} = \{c_1, \dots, c_K\}$ is the set of reference categories (so $K = |\mathcal{C}|$), $\mathcal{G}$ is the set of generated functionalities, $f$ maps each functionality to a category, and $\mathbf{1}[\cdot]$ is the indicator function. ZeroRepo attains 81.5% coverage, indicating robust representation of the functional landscape.
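As a worked illustration of the coverage definition, the sketch below counts the reference categories that are hit by at least one generated functionality. The function name `coverage`, the toy categories, and the mapping are all made up for the example; they are not RepoCraft data.

```python
def coverage(reference_categories, generated, category_of):
    """Fraction of reference categories hit by at least one generated item."""
    if not reference_categories:
        raise ValueError("reference_categories must be non-empty")
    # Categories reached by the generated functionalities (the image of f).
    covered = {category_of(g) for g in generated}
    hits = sum(1 for c in reference_categories if c in covered)
    return hits / len(reference_categories)


# Toy example: four reference categories, four generated functionalities.
categories = ["regression", "clustering", "preprocessing", "metrics"]
mapping = {
    "ridge": "regression",
    "lasso": "regression",
    "kmeans": "clustering",
    "scaler": "preprocessing",
}
generated = list(mapping)

score = coverage(categories, generated, mapping.get)
```

Here three of the four categories are represented, so `score` is 0.75. Two functionalities falling into the same category (`ridge` and `lasso`) count once, which is what the existential quantifier in the formula expresses.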

Functional Correctness:

In test-driven evaluation, ZeroRepo achieves a 69.7% test pass rate. Relative to Claude Code, the reported gains are 27.3 percentage points in functional coverage and 35.8 percentage points in pass rate, so the larger output is accompanied by higher functional correctness rather than volume alone.

4. Advantages of RPG-based Repository Generation

The RPG framework confers several key advantages over prior approaches relying primarily on free-form natural language planning:

Explicit Complexity Modeling:

The RPG’s explicit edges for data flow and ordering constraints support the encoding of sophisticated inter- and intra-module relationships. This enables near-linear scaling of repository size and complexity as the planning horizon expands.

Improved Coherence and Agent Localization:

By structuring functional and structural dependencies, the RPG enhances the LLM’s repository “understanding,” facilitating rapid agent localization. Systematic graph search methods permit efficient identification of regions for correction, refinement, or extension.
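To make the localization idea concrete, here is a minimal sketch that uses breadth-first search to find every component downstream of a given node, for example to decide what may need re-checking after a fix. The source does not say which search strategy ZeroRepo uses; breadth-first search and the name `downstream_of` are assumptions for illustration.

```python
from collections import deque


def downstream_of(edges, start):
    """Nodes reachable from `start` along dependency edges, in BFS order."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)

    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Each pair (a, b) means "b consumes the output of a".
edges = [
    ("load_csv", "normalize"),
    ("normalize", "fit_model"),
    ("load_csv", "fit_model"),
    ("fit_model", "report"),
    ("plot", "report"),
]

affected = downstream_of(edges, "normalize")
```

In this toy graph, a change to `normalize` affects `fit_model` and then `report`, while `plot` and `load_csv` are untouched. That keeps the region an agent has to revisit small.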

Persistent Unified Planning Substrate:

Unlike fragmented, two-phase (proposal/implementation) workflows, the RPG-based approach stores the evolving plan persistently, permitting iterative refinement and ensuring consistency between high-level intent and low-level structure throughout generation cycles.
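A persistent plan can be as simple as a serialized graph that is reloaded, extended, and written back between generation cycles. The sketch below assumes a plain JSON file with `nodes` and `edges` keys; this layout and the helper names `save_plan` and `load_plan` are hypothetical, since the source does not describe ZeroRepo's storage format.

```python
import json
import os
import tempfile


def save_plan(plan: dict, path: str) -> None:
    """Write the plan to disk so later cycles can resume from it."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(plan, fh, indent=2, sort_keys=True)


def load_plan(path: str) -> dict:
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


plan = {"nodes": ["load_csv"], "edges": []}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rpg.json")

    # Cycle 1: persist the initial plan.
    save_plan(plan, path)

    # Cycle 2: reload, refine, and persist again.
    restored = load_plan(path)
    restored["nodes"].append("fit_model")
    restored["edges"].append(["load_csv", "fit_model"])
    save_plan(restored, path)

    final_plan = load_plan(path)
```

Because each cycle starts from the stored graph rather than from a fresh natural-language summary, refinements accumulate instead of being re-derived.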

Fostering Innovation Beyond Reference Taxonomies:

RPG-based planning drives not only reproducibility of known functionalities, but also functional novelty: by systematically exploring and integrating subtrees from global feature spaces that lie outside reference taxonomies, the system can realize new capability compositions without loss of global coherence.

5. Comparison with Disappearing Frameworks and Implications

Disappearing frameworks, a class of web frameworks that minimize the code shipped to the client, offer an indirect conceptual analogy for ZeroRepo's design philosophy:

Minimal Client/Runtime Footprint (Analogy):

Just as disappearing frameworks (e.g., Astro, Marko, Qwik) aim to reduce the shipped client-side JavaScript ($C_{\text{framework}} \to 0$), ZeroRepo pursues “invisibility” at the planning/infrastructure level by minimizing non-essential artifacts and dependencies.

Compiler-Centric Optimizations and Static Building:

In both paradigms, major advancements are achieved via aggressive compile-time transformations: disappearing frameworks eliminate runtime overhead, and ZeroRepo eliminates ambiguity by translating free-form specifications into explicit, persistent graph plans.

Modularity and Interoperability:

Disappearing frameworks employ modular “islands” and flexible integration with diverse UI libraries. By analogy, ZeroRepo’s RPG supports modularity via graph partitioning and can, in principle, facilitate ecosystem-agnostic repository composition.

This suggests that principles from minimal-runtime web frameworks could continue to influence forward-looking directions in automated repository generation—particularly concerning artifact minimization, code splitting, and partitioned enhancement at scale.

6. Mathematical Formulation of Metrics

ZeroRepo’s evaluation metrics are defined formally so that reported results can be interpreted and reproduced unambiguously:

Coverage Metric (cited above):

Quantifies functional representation:

$\text{Coverage} = \frac{1}{|\mathcal{C}|} \sum_{j=1}^{K} \mathbf{1}\left[\exists\, g_i \in \mathcal{G} \text{ such that } f(g_i) = c_j \right]$

Novelty and Additional Metrics:

Similar indicator functions formalize measures of functional novelty (comparing generated nodes to taxonomic baselines) and verification rates (“voting rates”). These definitions ensure unambiguous interpretation of coverage and correctness statistics across the benchmark suite.
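For illustration, an indicator-style novelty measure could be computed as the share of generated functionalities whose category falls outside the reference taxonomy. This is a sketch under that assumption, not the benchmark's official definition; the function name `novelty_rate` and the toy data are hypothetical.

```python
def novelty_rate(reference_categories, generated, category_of):
    """Share of generated items whose category is not in the reference set."""
    if not generated:
        raise ValueError("generated must be non-empty")
    reference = set(reference_categories)
    # An item counts as novel when its category is outside the taxonomy.
    novel = sum(1 for g in generated if category_of(g) not in reference)
    return novel / len(generated)


reference = ["regression", "clustering", "preprocessing"]
mapping = {
    "ridge": "regression",
    "kmeans": "clustering",
    "scaler": "preprocessing",
    "gpu_profiler": "profiling",  # category absent from the reference set
}
generated = list(mapping)

rate = novelty_rate(reference, generated, mapping.get)
```

One of the four generated functionalities lies outside the reference categories, so `rate` is 0.25. Unlike coverage, this ratio is normalized by the number of generated items rather than the number of reference categories.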

7. Significance and Future Directions

ZeroRepo demonstrates that explicit, graph-based planning unified across proposal and implementation stages enables scalable, high-fidelity repository generation. The RPG’s capacity to model complex dependencies, promote coherent expansion, and support robust test-driven synthesis permits the construction of repositories that are several times larger (about 3.9 times the strongest baseline, Claude Code) and functionally richer than those produced by prior LLM-driven systems.

A plausible implication is that, as LLMs and feature-tree databases scale, RPG-style planning could become a critical abstraction for agent-based software engineering, supporting both reproduction of canonical projects and systematic innovation beyond existing functional landscapes. Further investigation could refine graph schemas for broader application domains and extend test-driven validation schemas, ensuring sustained correctness at scale.
