- The paper introduces a multi-agent architecture combining dynamic planning, co-evolutionary verification, and spec anchoring to enhance SystemC model generation reliability.
- It reports end-to-end pass rates of up to 95% and an average token reduction of about 71% while maintaining 100% specification recall.
- The approach bridges automated LLM code generation with rigorous hardware verification, reducing engineering overhead and inspiring future SoC validation research.
RefEvo: Agentic Design with Co-Evolutionary Verification for Agile SystemC Reference Model Generation
Motivation and Problem Statement
The increasing complexity of SoC development, coupled with industry demand for “shift-left” methodologies, requires rapid and reliable generation of high-fidelity reference models (notably in SystemC) for early-stage architectural exploration and verification. Conventional manual workflows are inefficient and error-prone, and while LLMs have shown promise in code generation, their application to hardware modeling reveals significant deficiencies. These include (i) rigid generation workflows that adapt poorly to varying design complexity, (ii) context overflow that causes catastrophic forgetting of critical constraints over extended iterative sessions, and (iii) the “Coupled Validation Failure” phenomenon, in which naïvely letting LLMs generate both the design under test (DUT) and the testbench (TB) yields shared semantic flaws in the two artifacts, so a passing test no longer implies conformance to the specification.
Figure 1: Even with optimized prompt engineering and structured generation workflows, state-of-the-art LLMs struggle to generate correct SystemC models, especially in the memory and control logic domains.
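The Coupled Validation Failure described above can be illustrated with a toy example. The sketch below is not taken from the paper: the 8-bit saturating adder, the function names, and the test vectors are all hypothetical, and plain Python functions stand in for the SystemC model and testbench. It shows how a model and a testbench that share the same misreading of the spec agree with each other while both disagree with the spec.

```python
# Hypothetical spec: an 8-bit adder that saturates at 255 (assumed for illustration).
def spec_sat_add(a: int, b: int) -> int:
    """Reference behaviour taken directly from the (assumed) specification."""
    return min(a + b, 255)


def model_add(a: int, b: int) -> int:
    """Generated model with a semantic flaw: it wraps around instead of saturating."""
    return (a + b) & 0xFF


def tb_expected(a: int, b: int) -> int:
    """Generated testbench expectation that shares the same misreading of the spec."""
    return (a + b) % 256


VECTORS = [(1, 2), (100, 100), (200, 100), (255, 255)]


def run_testbench(model, expected) -> bool:
    """Return True when the model matches the expected value on every vector."""
    return all(model(a, b) == expected(a, b) for a, b in VECTORS)


# The flawed model passes the equally flawed testbench (a false positive) ...
coupled_pass = run_testbench(model_add, tb_expected)
# ... but fails once expectations are anchored to the specification.
anchored_pass = run_testbench(model_add, spec_sat_add)
print(coupled_pass, anchored_pass)  # prints: True False
```

The false positive only surfaces when the expected values come from a source that is independent of the generated artifacts.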
RefEvo Architecture and Key Innovations
RefEvo introduces a hierarchical, multi-agent framework tailored to the demands of hardware modeling and verification. The logical architecture comprises three core innovations:
- Dynamic Design Planner: Agent 1 acts as the central orchestrator, analyzing the specification’s semantic complexity (interface type, state space, concurrency) and leveraging legacy assets for decomposition and workflow selection.
- Co-Evolutionary Verification Loop: The parallel generation/verification paradigm is realized through the Modeler (Agent 2) and Verifier (Agent 3), with a Dialectical Arbiter (Agent 4) overseeing verification against an anchored specification oracle. This enables rigorous cross-verification and targeted repair, breaking symmetry in TB/DUT hallucinations.
- Spec Anchoring Context Management: To prevent catastrophic forgetting, the context window is partitioned into immutable specification, compressed summary, and dynamic workspace, ensuring full recall and token efficiency.
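As a rough illustration of the planner's workflow selection, the sketch below scores a specification along the three axes named above (interface type, state space, concurrency) and maps the score to a workflow. Everything here is an assumption made for illustration: the `SpecProfile` fields, the scoring weights, the thresholds, and the workflow names are hypothetical and are not described in the paper. Legacy asset reuse is omitted.

```python
from dataclasses import dataclass
from enum import Enum


class Workflow(Enum):
    """Hypothetical workflow labels; the paper does not name its workflows this way."""
    SINGLE_PASS = "single_pass"
    ITERATIVE = "iterative"
    DECOMPOSED = "decomposed"


@dataclass(frozen=True)
class SpecProfile:
    """Coarse features a planner might extract from a specification (assumed)."""
    interface: str             # "combinational", "handshake", or "bus"
    state_bits: int            # rough size of the architectural state
    concurrent_processes: int  # number of concurrently active processes


def complexity_score(p: SpecProfile) -> int:
    """Sum one small penalty per axis; the weights are arbitrary placeholders."""
    score = {"combinational": 0, "handshake": 1, "bus": 2}.get(p.interface, 1)
    score += 0 if p.state_bits == 0 else (1 if p.state_bits <= 16 else 2)
    score += 0 if p.concurrent_processes <= 1 else (1 if p.concurrent_processes <= 3 else 2)
    return score


def select_workflow(p: SpecProfile) -> Workflow:
    """Map the score to a workflow using illustrative thresholds."""
    score = complexity_score(p)
    if score <= 1:
        return Workflow.SINGLE_PASS
    if score <= 3:
        return Workflow.ITERATIVE
    return Workflow.DECOMPOSED


adder = SpecProfile("combinational", 0, 1)
fifo = SpecProfile("handshake", 8, 2)
hash_core = SpecProfile("bus", 1600, 4)
for profile in (adder, fifo, hash_core):
    print(profile.interface, select_workflow(profile).value)
```

In the actual framework this analysis is performed by an LLM agent rather than a fixed rule table; the table only makes the idea of complexity-driven workflow selection concrete.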
Figure 2: Logical architecture of RefEvo detailing the dynamic planning phase and the symbiotic, dialectical verification loop.
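The co-evolutionary loop can be sketched as follows. This is a minimal sketch under strong assumptions, not the authors' implementation: the specification is replaced by an executable oracle (`spec_oracle`), the Modeler and Verifier outputs are plain Python functions, and the repair callbacks are stubs that stand in for LLM calls. All names are hypothetical. The point it illustrates is that the arbiter judges the model and the testbench separately against the anchored spec, so either side can be repaired.

```python
from typing import Callable, List, Tuple

Vector = Tuple[int, int]
Func = Callable[[int, int], int]


def spec_oracle(a: int, b: int) -> int:
    """Stand-in for the anchored specification: 8-bit saturating add (assumed)."""
    return min(a + b, 255)


def arbitrate(model: Func, testbench: Func, vectors: List[Vector]) -> List[Tuple[Vector, str]]:
    """Check each artifact against the spec, not against the other artifact."""
    findings = []
    for vec in vectors:
        golden = spec_oracle(*vec)
        if model(*vec) != golden:
            findings.append((vec, "model"))
        if testbench(*vec) != golden:
            findings.append((vec, "testbench"))
    return findings


def co_evolve(model: Func, testbench: Func, repair_model, repair_testbench,
              vectors: List[Vector], max_rounds: int = 5):
    """Repair whichever side the arbiter blames until both agree with the spec."""
    for round_no in range(1, max_rounds + 1):
        findings = arbitrate(model, testbench, vectors)
        if not findings:
            return model, testbench, round_no
        blamed = {side for _, side in findings}
        if "model" in blamed:
            model = repair_model(model, findings)
        if "testbench" in blamed:
            testbench = repair_testbench(testbench, findings)
    raise RuntimeError("no agreement with the spec within max_rounds")


def wrapping_model(a: int, b: int) -> int:
    """Initial model: wraps instead of saturating."""
    return (a + b) & 0xFF


def clamped_tb(a: int, b: int) -> int:
    """Initial testbench expectation: saturates at the wrong bound."""
    return min(a + b, 127)


def stub_repair(_old: Func, _findings) -> Func:
    """Stub repair step; a real agent would regenerate code from the findings."""
    return spec_oracle


VECTORS = [(1, 2), (100, 100), (200, 100)]
final_model, final_tb, rounds = co_evolve(
    wrapping_model, clamped_tb, stub_repair, stub_repair, VECTORS)
print(rounds)  # prints: 2
```

A FixedTB-style flow corresponds to never calling `repair_testbench`, in which case a wrong testbench expectation cannot be corrected.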
Experimental Results and Analysis
End-to-End Success Rates
RefEvo markedly improves SystemC model generation success rates, evaluated on a benchmark of 20 modules with five SOTA LLMs (Gemini-2.5-Pro, GPT-5.1, GPT-4.1, Qwen3, Claude-Opus-4.1). It consistently outperforms the baseline approaches (Naive, Flow_Only, FixedTB) and achieves end-to-end pass rates of up to 95% (Gemini-2.5-Pro, GPT-5.1), resolving previously intractable functional failures in complex modules (e.g., fpu_div, keccak).
Figure 3: End-to-end success rate across models and workflow variants; RefEvo outperforms all baselines.
Mechanism Effectiveness and Failure Modes
An ablation of RefEvo’s verification mechanism reveals two critical insights. First, the transition from static flows to iterative refinement eliminates compile errors but exposes latent functional mismatches. Second, RefEvo’s co-evolutionary loop further reduces functional failures by actively repairing TB logic, a capability absent in FixedTB-mode agents.
Figure 4: Failure distribution analysis highlights the elimination of compile errors and the reduction of functional errors in RefEvo via co-evolutionary verification.
Consistent upward pass rate trends across diverse LLMs confirm that RefEvo's improvements are largely independent of base model capability, offering robust augmentation for LLM-aided hardware generation, particularly in high-complexity, high-fidelity scenarios.
Figure 5: Methodological robustness analysis; RefEvo reliably enhances generation quality across different LLMs.
Context Efficiency and Spec Recall
RefEvo’s spec anchoring strategy reduces token consumption by 71.04% on average. The savings are most pronounced for complex designs, where they exceed 73,900 tokens per session. This efficiency does not come at the expense of specification recall, which remains at 100%.
Figure 6: Token consumption comparison by design scale; spec anchoring achieves a substantial reduction that scales with complexity.
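The three-way context partition behind these savings can be sketched as follows. This is a minimal sketch, not the paper's implementation: the class name `AnchoredContext`, the whitespace token counter, the budget of 50 tokens, and the truncation-based `_compress` step are all assumptions, and a real system would use the model's tokenizer and an LLM-generated summary. The toy numbers it prints do not reproduce the reported 71.04% figure.

```python
from typing import List


def count_tokens(text: str) -> int:
    """Crude whitespace token count; a stand-in for a real tokenizer."""
    return len(text.split())


class AnchoredContext:
    """Context split into an immutable spec, a compressed summary, and a workspace."""

    def __init__(self, spec: str, workspace_budget: int = 50) -> None:
        self._spec = spec                 # anchored partition, never rewritten
        self.summary = ""                 # compressed record of evicted messages
        self.workspace: List[str] = []    # most recent messages, kept verbatim
        self.workspace_budget = workspace_budget

    @property
    def spec(self) -> str:
        """Read-only access: the property has no setter, so the spec cannot be reassigned."""
        return self._spec

    def append(self, message: str) -> None:
        """Add a message, evicting the oldest ones into the summary when over budget."""
        self.workspace.append(message)
        while len(self.workspace) > 1 and self._workspace_tokens() > self.workspace_budget:
            self.summary = self._compress(self.summary, self.workspace.pop(0))

    def _workspace_tokens(self) -> int:
        return sum(count_tokens(m) for m in self.workspace)

    @staticmethod
    def _compress(summary: str, message: str, keep_words: int = 4) -> str:
        """Placeholder compression: keep only the first few words of the message."""
        head = " ".join(message.split()[:keep_words])
        return f"{summary} | {head}" if summary else head

    def render(self) -> str:
        """Assemble the prompt; the full spec is always included verbatim."""
        return "\n".join(["[SPEC]", self._spec, "[SUMMARY]", self.summary,
                          "[WORKSPACE]", *self.workspace])


SPEC = "CTRL register at 0x00; writing bit 0 starts the engine; DONE is sticky until read."
history = [f"iteration {i} compile log with several words of detail here" for i in range(20)]

ctx = AnchoredContext(SPEC)
for msg in history:
    ctx.append(msg)

naive_tokens = count_tokens(SPEC) + sum(count_tokens(m) for m in history)
anchored_tokens = count_tokens(ctx.render())
reduction = 1 - anchored_tokens / naive_tokens
print(f"naive={naive_tokens} anchored={anchored_tokens} reduction={reduction:.1%}")
```

Because only the history is compressed and the spec partition is passed through untouched, the savings grow with session length while every specification constraint remains present in the prompt.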
Practical and Theoretical Implications
RefEvo addresses fundamental constraints in LLM-aided EDA: it not only automates high-level model and TB generation but also embeds verification rigor traditionally attainable only by skilled human practitioners. The co-evolutionary dialectic between the Modeler and the Verifier, mediated by the Arbiter, systematically resolves coupled validation failures, mitigating a class of false positives prevalent in naïve workflows. Spec anchoring keeps the specification intact in context, facilitating scalable, long-turn agentic workflows.
Practically, RefEvo enables automated, agile golden model generation for SoC verification, drastically reducing engineering overhead. Theoretically, it advances agentic system design in domains requiring persistent truth propagation, symbiotic multi-agent repair, and dynamic task decomposition.
Future Directions
Potential future research directions include:
- Extension to more expressive hardware modeling domains (e.g., TLM+, formal verification).
- Incorporation of external toolchains and cross-modal asset reusability leveraging agentic orchestrators.
- Scaling to multi-agent ecosystems with expanded dialectical arbitration for heterogeneous verification scenarios.
- Integration with retrieval-augmented reasoning frameworks (e.g., ChipMind (Xing et al., 5 Dec 2025)) for even broader context support.
Conclusion
RefEvo systematically bridges the gap between pure LLM code generation and domain-constrained hardware verification. The hierarchical agentic framework—combining dynamic planning, co-evolutionary verification, and spec anchoring context management—demonstrates high efficacy and robustness in SystemC reference model generation, addressing key challenges of semantic complexity, verification reliability, and context scalability. By achieving strong numerical gains in pass rates and token efficiency, RefEvo sets a practical and theoretical foundation for future, fully automated SoC verification workflows (2604.24218).