VerCore: Autonomous RISC-V CPU
- VerCore is a Linux-capable RISC-V CPU that demonstrates complete autonomous chip design from specification to tape-out within a 12-hour window.
- It features a classic in-order five-stage pipeline for RV32I with the Zmmul extension, optimized through early forwarding and targeted microarchitectural refinements.
- Its performance metrics, including a CoreMark score of 3261 and a maximum frequency of 1.48 GHz on a 7nm node, highlight its efficiency compared to legacy commercial CPUs.
VerCore is a Linux-capable RISC-V CPU designed and implemented entirely by the autonomous agent Design Conductor (DC). The agent completed an end-to-end semiconductor build, from a concise specification to GDSII tape-out, without human intervention over a 12-hour window. VerCore exemplifies autonomous hardware design at the intersection of frontier LLMs and modern EDA/PDK toolchains, producing a fully functional, high-frequency microprocessor that rivals historical commercial CPUs in benchmarked performance (Team et al., 6 Feb 2026).
1. Architectural Overview
VerCore implements a classic in-order, five-stage RISC-V pipeline targeting RV32I plus the Zmmul extension. The stages are:
- IF (Instruction Fetch): Includes program counter (PC) register, PC+4 adder, and icache interface.
- ID (Instruction Decode): Features a 32×32-bit register file (asynchronous read, synchronous write), decoder, immediate generator, and branch/jump comparator.
- EX (Execute): Contains a full integer ALU (ADD/SUB/SLL/SRL/SRA/SLT/SLTU/AND/OR/XOR), AUIPC/LUI unit, and a handshake-driven, 4-stage Booth–Wallace multiplier implementing the Zmmul extension.
- MEM (Memory Access): Implements data-cache interface, byte-enable logic for loads/stores, and address generation.
- WB (Write Back): Multiplexes ALU/multiplier outputs, load data, and PC+4 back to the register file.
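The multiply instructions that the EX stage must support can be stated as a small reference model. The sketch below gives the architectural semantics of the four Zmmul instructions (MUL, MULH, MULHSU, MULHU) on 32-bit operands; the function names are illustrative, and the model says nothing about how VerCore's pipelined multiplier computes these results internally.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul(rs1: int, rs2: int) -> int:
    """MUL: low 32 bits of the product (identical for signed and unsigned)."""
    return (rs1 * rs2) & MASK32


def mulh(rs1: int, rs2: int) -> int:
    """MULH: high 32 bits of the signed x signed product."""
    return ((to_signed32(rs1) * to_signed32(rs2)) >> 32) & MASK32


def mulhsu(rs1: int, rs2: int) -> int:
    """MULHSU: high 32 bits of the signed(rs1) x unsigned(rs2) product."""
    return ((to_signed32(rs1) * (rs2 & MASK32)) >> 32) & MASK32


def mulhu(rs1: int, rs2: int) -> int:
    """MULHU: high 32 bits of the unsigned x unsigned product."""
    return (((rs1 & MASK32) * (rs2 & MASK32)) >> 32) & MASK32


if __name__ == "__main__":
    a, b = 0xFFFFFFFF, 0xFFFFFFFF
    for name, fn in (("mul", mul), ("mulh", mulh),
                     ("mulhsu", mulhsu), ("mulhu", mulhu)):
        print(f"{name}: 0x{fn(a, b):08x}")
```

A model like this is the kind of oracle a self-checking multiplier testbench could compare against.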
Pipeline registers separate the stages. Early (ID-stage) forwarding and hazard detection, covering load-use, branch, and multiply-stall hazards, keep CPI at or below 1.5. Branches and jumps resolve in ID, which introduces a one-cycle penalty handled by IF/ID flush control. The critical timing path runs through the ID-stage comparator, the ALU, and the write-mask logic; the multiplier on its own can run at a higher frequency and does not limit the clock.
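To make the CPI budget concrete, the following toy model counts stall cycles over an instruction trace. It is a sketch under stated assumptions, not VerCore's hazard logic: the `Instr` record, the one-cycle load-use penalty, and the choice to charge the branch penalty only on taken branches are all assumptions for illustration, and multiply stalls and pipeline fill are ignored.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Instr:
    op: str                       # mnemonic, e.g. "lw", "add", "beq"
    rd: Optional[int] = None      # destination register index
    rs1: Optional[int] = None     # first source register index
    rs2: Optional[int] = None     # second source register index
    taken: bool = False           # only meaningful for branches/jumps


LOADS = {"lb", "lh", "lw", "lbu", "lhu"}
CONTROL = {"beq", "bne", "blt", "bge", "bltu", "bgeu", "jal", "jalr"}


def load_use_hazard(prev: Instr, cur: Instr) -> bool:
    """True when cur reads the register that the preceding load writes."""
    if prev.op not in LOADS or prev.rd in (None, 0):
        return False              # x0 never creates a dependency
    return prev.rd in (cur.rs1, cur.rs2)


def estimate_cpi(trace: Sequence[Instr],
                 load_use_penalty: int = 1,   # assumed stall length
                 branch_penalty: int = 1) -> float:
    """Cycles per instruction for the trace, ignoring pipeline fill."""
    if not trace:
        raise ValueError("trace must contain at least one instruction")
    stalls = 0
    for i, cur in enumerate(trace):
        if i > 0 and load_use_hazard(trace[i - 1], cur):
            stalls += load_use_penalty
        if cur.op in CONTROL and cur.taken:
            stalls += branch_penalty
    return (len(trace) + stalls) / len(trace)


if __name__ == "__main__":
    trace = [
        Instr("lw", rd=1, rs1=2),
        Instr("add", rd=3, rs1=1, rs2=4),    # load-use on x1
        Instr("beq", rs1=3, rs2=0, taken=True),
        Instr("add", rd=5, rs1=6, rs2=7),
    ]
    print(f"estimated CPI = {estimate_cpi(trace):.2f}")
```

The example trace is deliberately hazard-heavy; under the assumed penalties it sits exactly at the 1.5 bound.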
2. Autonomous Design Methodology
The DC agent executed a seven-stage design methodology to autonomously produce VerCore:
- Specification Ingestion: The agent parsed a carefully defined, 219-word requirements document into a formal “Design Proposal,” explicitly itemizing pipeline structure, register file semantics, cache protocols, reset/clock strategies, constrained CPI, targeted frequency, and verification strategy.
- Microarchitectural Review: Subagents performed expert-level review of each microarchitectural unit, refining the design before any hardware description language was produced.
- RTL and Testbench Generation: DC authored Verilog modules and corresponding self-checking testbenches for each functional unit, iterating until all passed isolated validation.
- Integration Verification: A top-level harness (vercore_tb.v), driven via Spike-simulated RISC-V binaries, enabled real program execution with exhaustive register and memory diffs to diagnose and correct pipeline flush and stall issues.
- RTL-to-Backend Transition: Once behaviorally correct, the design was synthesized with Synopsys Design Compiler and placed and routed with OpenROAD using the ASAP7 (7 nm) PDK, with timing monitored throughout.
- PPA (Power-Performance-Area) Closure: DC employed aggressive architectural and physical optimizations, including early ID forwarding, logic restructuring, multiplier re-pipelining, and micro-floorplan adjustments, iterating until worst-case fₘₐₓ ≥ 1.48 GHz was reliably achieved.
- GDSII and Physical Verification: Final GDSII layout was generated, area measured, parasitic extraction performed, and post-layout timing closure confirmed.
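The closure criterion in the PPA step can be illustrated with the standard relation between worst setup slack and achievable frequency. This is a minimal sketch: the function names are hypothetical, the slack value in the example is invented, and no particular timing-report format is assumed.

```python
def achievable_fmax_ghz(target_fmax_ghz: float, worst_slack_ns: float) -> float:
    """Estimate fmax from the worst setup slack reported at a target clock.

    The shortest workable period is the target period minus the slack:
    negative slack lengthens it, positive slack shortens it.
    """
    target_period_ns = 1.0 / target_fmax_ghz
    min_period_ns = target_period_ns - worst_slack_ns
    if min_period_ns <= 0:
        raise ValueError("slack is larger than the clock period")
    return 1.0 / min_period_ns


def timing_closed(worst_slack_ns: float) -> bool:
    """Timing is met at the target clock when the worst slack is non-negative."""
    return worst_slack_ns >= 0.0


if __name__ == "__main__":
    target = 1.48                 # GHz, the target stated in the text
    slack = -0.05                 # ns, invented value for illustration
    print(f"closed: {timing_closed(slack)}, "
          f"fmax ~ {achievable_fmax_ghz(target, slack):.3f} GHz")
```

An optimization loop of the kind described would repeat synthesis and place-and-route until `timing_closed` holds at the target frequency.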
3. Performance Characteristics
VerCore’s principal silicon characteristics are summarized as follows:
| Metric | Value | Notes |
|---|---|---|
| CoreMark score | 3261 | Equivalent to Intel Celeron SU2300 |
| Area (excluding caches) | 2809 µm² | ASAP7 7 nm node |
| Maximum frequency | 1.48 GHz | Clock period ≈ 0.676 ns |
| Performance/area | ≈ 1.16 CoreMark/µm² | 3261 / 2809 µm² |
For context, the Intel Celeron SU2300 reached a similar CoreMark score at 1.2 GHz (45 nm, ~2011). VerCore runs at 1.48 GHz on 7 nm, a ~23% higher clock rate, occupies a far smaller area, and was produced by a fully automated tool flow. Active power was not reported; at this area and frequency it is typically estimated at 1–5 mW under light computational load (Team et al., 6 Feb 2026).
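The derived figures above follow directly from the reported numbers and can be reproduced as below. The CoreMark/MHz value is not stated in the source; it is included only as an illustration and assumes the score was measured at the maximum frequency.

```python
COREMARK = 3261          # reported score
AREA_UM2 = 2809.0        # reported area, excluding caches
FMAX_GHZ = 1.48          # reported maximum frequency
SU2300_GHZ = 1.2         # comparison CPU clock from the text


def derived_metrics() -> dict:
    """Recompute the derived performance figures from the reported values."""
    return {
        "clock_period_ns": 1.0 / FMAX_GHZ,
        "coremark_per_um2": COREMARK / AREA_UM2,
        "clock_gain_pct": (FMAX_GHZ / SU2300_GHZ - 1.0) * 100.0,
        # Assumption: the CoreMark run used the 1.48 GHz clock.
        "coremark_per_mhz": COREMARK / (FMAX_GHZ * 1000.0),
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name:>18}: {value:.3f}")
```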
4. Key Optimizations and Technical Innovations
Critical to achieving the frequency and performance goals were several DC-devised strategies:
- Early ID-Stage Forwarding and Branch Resolution: Operands are compared in ID, with results forwarded from EX/MEM/WB into the branch logic, which minimizes the control-hazard penalty without lengthening the critical path.
- Balanced, Pipelined Booth–Wallace Multiplier: The 4-stage multiplier was systematically optimized for recoding, compressor tree placement, and internal registering, achieving 2.57 GHz at the unit level while keeping the global critical path in the integer datapath.
- ALU and Comparator Logic Restructuring: Detailed gate-level decomposition and redistribution of fan-out to minimize logic depth and overall cycle time, especially across IF–ID–EX transitions.
- Automated Floorplan Adjustments: Strategic placement of delay-sensitive registers and clustering of high-fan-out nets reduced global wiring overhead, RC delay, and congestion, reinforced by block boundary rebalancing during physical design.
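The Booth–Wallace structure named above can be illustrated with a functional model. The sketch assumes radix-4 Booth recoding and a simple 3:2 carry-save reduction; the source does not state the radix, tree shape, or pipeline cut points, so this shows the general technique rather than VerCore's multiplier, and it models none of the 4-stage pipelining or handshaking.

```python
from typing import List, Tuple


def to_signed(x: int, width: int = 32) -> int:
    """Interpret the low `width` bits of x as a two's-complement value."""
    x &= (1 << width) - 1
    return x - (1 << width) if x >> (width - 1) else x


def booth_radix4_digits(multiplier: int, width: int = 32) -> List[int]:
    """Recode a signed multiplier into radix-4 digits in {-2..2}, LSB first."""
    if width % 2:
        raise ValueError("width must be even for radix-4 recoding")
    y = multiplier & ((1 << width) - 1)
    digits, prev = [], 0                  # prev is the implicit bit y[-1] = 0
    for i in range(0, width, 2):
        b0, b1 = (y >> i) & 1, (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def csa(a: int, b: int, c: int) -> Tuple[int, int]:
    """3:2 compressor on whole words: a + b + c == total + carry."""
    total = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return total, carry


def wallace_reduce(terms: List[int], mask: int) -> int:
    """Compress partial products three at a time, then do one final add."""
    terms = [t & mask for t in terms]
    while len(terms) > 2:
        nxt = []
        for i in range(0, len(terms) - 2, 3):
            s, c = csa(terms[i], terms[i + 1], terms[i + 2])
            nxt += [s, c & mask]
        nxt += terms[len(terms) - len(terms) % 3:]   # rows left uncompressed
        terms = nxt
    return sum(terms) & mask


def booth_wallace_multiply(a: int, b: int, width: int = 32) -> int:
    """Signed product of two `width`-bit operands as a 2*width-bit pattern."""
    mask = (1 << (2 * width)) - 1
    a_signed = to_signed(a, width)
    partials = [(d * a_signed) << (2 * i)
                for i, d in enumerate(booth_radix4_digits(b, width))]
    return wallace_reduce(partials, mask)


if __name__ == "__main__":
    print(booth_radix4_digits(6, width=4))
    print(hex(booth_wallace_multiply(0x12345678, 0xFEDCBA98)))
```

Radix-4 recoding halves the number of partial products (16 instead of 32 for a 32-bit operand), and the carry-save levels defer carry propagation to a single final adder; these levels are natural places for the internal registers a pipelined design inserts.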
5. Verification and Debugging Protocol
Functional verification was anchored in cycle-accurate comparison against Spike simulation using actual RISC-V program binaries. Testbench mismatches were localized by automated VCD (value change dump) to CSV conversion and further analyzed in Python, expediting the diagnosis of flush/stall protocol errors. Design iterations continued until bitwise correctness across all significant test programs was validated.
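The comparison step can be sketched as a search for the first point where the design's trace departs from the reference. The CSV layout used here (one row per retired instruction with `pc`, `rd`, and `value` columns) is a hypothetical format chosen for illustration; the source does not describe the actual columns produced by the VCD conversion or by Spike.

```python
import csv
import io
from typing import Dict, List, Optional, Tuple

FIELDS = ("pc", "rd", "value")    # hypothetical column names


def load_commits(csv_text: str) -> List[Dict[str, int]]:
    """Parse commit rows; values may be decimal or 0x-prefixed hex."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({name: int(row[name], 0) for name in FIELDS})
    return rows


def first_divergence(dut: List[Dict[str, int]],
                     ref: List[Dict[str, int]]
                     ) -> Optional[Tuple[int, str, int, int]]:
    """Return (index, field, dut_value, ref_value) at the first mismatch."""
    for idx, (d, r) in enumerate(zip(dut, ref)):
        for name in FIELDS:
            if d[name] != r[name]:
                return idx, name, d[name], r[name]
    if len(dut) != len(ref):      # one trace ended early
        return min(len(dut), len(ref)), "length", len(dut), len(ref)
    return None


if __name__ == "__main__":
    ref_csv = "pc,rd,value\n0x0,1,0x5\n0x4,2,0xa\n0x8,3,0xf\n"
    dut_csv = "pc,rd,value\n0x0,1,0x5\n0x4,2,0xb\n0x8,3,0xf\n"
    mismatch = first_divergence(load_commits(dut_csv), load_commits(ref_csv))
    if mismatch is None:
        print("traces match")
    else:
        idx, name, got, want = mismatch
        print(f"commit {idx}: {name} = {got:#x}, expected {want:#x}")
```

Reporting only the first divergence keeps the debugging focus on the root cause, since later mismatches are usually consequences of the first one.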
6. Lessons from Agentic Chip Design
Several key insights emerged from the fully autonomous design process:
- Architectural Reasoning Limitations: While DC excelled at logic optimization, the architectural cost/benefit of major pipeline changes (e.g., deeper pipelining vs. early forwarding) demanded numerous iterations. Embedding explicit architectural heuristics within future agents is identified as a critical target for capability improvements.
- Differential RTL–Hardware Mental Models: The agent occasionally applied sequential, software-style reasoning when debugging hardware descriptions (e.g., Verilog), which slowed the resolution of backend issues. Better model awareness of hardware concurrency and static timing could improve fix generation.
- Specification Precision: Tightly measurable, rigorously stated specifications (e.g., CPI constraints, explicit timing targets) directly induced optimal microarchitectural features like forwarding, while vagueness led to subpar pipelines.
Future research directions include scaling to multi-million-gate designs (with preliminary results on a 13-stage out-of-order core), shifting the architect’s role toward macro-level guidance, relying more heavily on golden integration tests for up-front verification, and, assuming agents manage the tool flow, a shift by EDA vendors toward algorithm quality over UI/UX.
7. Significance and Outlook
VerCore demonstrates that highly capable autonomous agents, given concise specifications and access to modern EDA/PDK workflows, can produce competitive, tape-out-ready CPUs in hours, matching or exceeding legacy commercial metrics in compact, efficiently optimized silicon. The case establishes a precedent for agentic chip design as a viable, scalable paradigm for future semiconductor development, potentially encompassing the full spectrum from simple microcontrollers to complex SoCs as frontier model capability continues to advance (Team et al., 6 Feb 2026).