
QiMeng System: Automated Chip Co-Design

Updated 30 June 2025
  • QiMeng system is a fully automated framework for processor chip co-design that integrates a domain-specific large language model, agent workflows, and rigorous feedback-driven verification.
  • It features a three-layer hierarchical architecture that combines multimodal data representation, dual-loop design optimization, and cross-stage model integration for complex design tasks.
  • Empirical benchmarks demonstrate enhanced HDL generation, improved OS auto-tuning, and superior performance in compiler backend and tensor operator synthesis.

QiMeng is a fully automated system for processor chip hardware and software co-design, built around a hierarchical architecture that integrates a domain-specialized LLM, agent-based design workflows, and rigorous feedback-driven verification. It represents a comprehensive framework aiming to address the core challenges of processor chip design in contemporary and emerging computing ecosystems, including knowledge representation, data scarcity, functional correctness, and vast solution complexity (2506.05007).

1. Hierarchical System Architecture

QiMeng is structured in three hierarchical layers:

  1. Bottom Layer: Large Processor Chip Model (LPCM). The LPCM is a domain-specific LLM that encodes processor design knowledge in both text (e.g., HDL, specifications, software code) and graph (e.g., Abstract Syntax Trees, Data Flow Graphs) modalities. Architecturally, it integrates standard transformer layers with graph neural network (GNN) modules, enabling effective cross-modal representation and reasoning.
  2. Middle Layer: Design Agents. Both agents employ a dual-loop search paradigm: an outer loop for performance-driven hierarchical decomposition (e.g., tree search, pruning), and an inner loop that ensures correctness via program synthesis, simulation, and symbolic/solver-based repair.
    • The Hardware Design Agent automates hardware development from specification to netlist and layout, leveraging the LPCM’s generative and reasoning abilities.
    • The Software Design Agent automates software adaptation and optimization, including OS configuration, compiler backend extension, and tensor operator generation.
  3. Top Layer: Application Deployments. Real-world applications include automated front-end HDL design, superscalar CPU synthesis, OS/RTOS auto-tuning, cross-ISA compiler generation, and generation of high-performance tensor operators for diverse chip architectures.
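The dual-loop paradigm shared by both agents can be sketched abstractly. The generator, verifier, repair, and scoring functions below are hypothetical placeholders, not QiMeng's actual interfaces; the sketch only illustrates how an outer performance loop prunes candidates while an inner correctness loop repairs each one until it verifies.

```python
# Minimal sketch of the dual-loop search paradigm: outer loop prunes by
# performance, inner loop repairs until verification passes.
# All callables here are illustrative stand-ins, not QiMeng APIs.

def dual_loop_design(spec, generate, verify, repair, score, beam=2, rounds=3):
    """Outer loop: keep the best-scoring designs (performance-driven pruning).
    Inner loop: repair each candidate until it passes verification."""
    frontier = [spec]
    for _ in range(rounds):
        candidates = []
        for partial in frontier:
            design = generate(partial)
            # Inner loop: iterate until functionally correct.
            while not verify(design):
                design = repair(design)
            candidates.append(design)
        # Outer loop: prune to the top-scoring branch(es).
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return frontier[0]

# Toy instantiation: "designs" are integers, correctness means even,
# performance is magnitude.
best = dual_loop_design(
    spec=1,
    generate=lambda x: x * 3,
    verify=lambda x: x % 2 == 0,
    repair=lambda x: x + 1,
    score=lambda x: x,
)
```

In the real system the inner loop's `verify`/`repair` step corresponds to simulation, formal checking, and symbolic repair, while `score` corresponds to measured performance analytics.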

System Diagram

-----------------------------------
|   Top Layer: Applications       |
|---------------------------------|
| Middle Layer: Design Agents     |
|---------------------------------|
|    Bottom Layer: LPCM           |
-----------------------------------

2. Large Processor Chip Model (LPCM): Architecture and Methodology

The LPCM introduces several innovations:

  • Multimodal Data Support: Unlike conventional LLMs, LPCM is trained on both text and custom-formatted graphs using GNN encoders and contrastive learning to align text-graph representations:

e_g = \mathrm{GNN}(x_g); \qquad e_t = \mathrm{Embed}(x_t); \qquad \mathrm{ContrastiveLoss}(e_g, e_t)

  • Graph Generation and Decoding: Supports generation of hardware graph objects (e.g., BDDs, BSDs) via diffusion models or generative GNN decoders for downstream tasks like circuit or dependency predictor synthesis.
  • Cross-Stage Collaborative Training: Data is collected and aligned across multiple design abstraction stages. Models are first trained at each stage, then cascaded to synthesize aligned data spanning the library, OS, compiler, logic, and physical domains.
  • Chain-of-Thought Imitation and RL: Models are trained using chain-of-thought imitation learning and further refined with reinforcement learning to enhance stepwise, hierarchical design reasoning.
  • Feedback-Driven Inference: Dual feedback loops ensure robustness:
    • Correctness Loop: Integrates automated verification (simulation, formal checks); incorrect generations are reverted and revised until 100% functional correctness is achieved.
    • Performance Loop: Hierarchical coarse-to-fine search, using real performance analytics to prune design branches and focus exploration.
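The text–graph alignment objective above can be illustrated with a toy InfoNCE-style contrastive loss. The 2-D tuples below stand in for the outputs of GNN(x_g) and Embed(x_t); real LPCM encoders and batching are far more involved.

```python
import math

# Toy sketch of text-graph contrastive alignment (InfoNCE-style).
# Embeddings are tiny hand-written tuples standing in for e_g = GNN(x_g)
# and e_t = Embed(x_t); this is not the LPCM training code.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def contrastive_loss(graph_embs, text_embs, temperature=0.1):
    """Mean cross-entropy of matching each graph embedding to its paired
    text embedding among all text embeddings in the batch."""
    loss = 0.0
    for i, eg in enumerate(graph_embs):
        logits = [dot(eg, et) / temperature for et in text_embs]
        log_denom = math.log(sum(math.exp(l) for l in logits))
        loss += log_denom - logits[i]  # -log softmax at the true pair
    return loss / len(graph_embs)

# Aligned pairs yield a near-zero loss; mismatched pairs a large one.
aligned = contrastive_loss([(1.0, 0.0), (0.0, 1.0)],
                           [(1.0, 0.0), (0.0, 1.0)])
shuffled = contrastive_loss([(1.0, 0.0), (0.0, 1.0)],
                            [(0.0, 1.0), (1.0, 0.0)])
```

Minimizing this loss pulls each graph embedding toward its paired text embedding and away from the other texts in the batch, which is what aligns the two modalities in a shared representation space.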

3. Hardware and Software Design Agents

Hardware Design Agent

  • Transforms natural language specifications into RTL and onward to physical layouts, invoking the LPCM at each refinement stage.
  • Employs dual-loop optimization: the outer loop decomposes modules for global performance, while the inner loop generates and formally verifies HDL fragments—typically via symbolic analysis using Binary Speculation Diagrams (BSD).
  • Correctness is maintained by iterative sampling, error localization, and logical repair (e.g., applying Shannon expansion for Boolean circuit synthesis).
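Shannon expansion, the decomposition mentioned above, rewrites any Boolean function as f = x·f|x=1 + ¬x·f|x=0. The recursion below is a truth-table toy, not QiMeng's BSD-based symbolic machinery, but it shows how the expansion turns a black-box function into an explicit circuit expression while dropping redundant variables.

```python
# Toy sketch of Shannon expansion: f = x·f|x=1 + ~x·f|x=0.
# `f` is a Python callable over full bit tuples; variables are named
# x0, x1, ... by position. Real BSD-based synthesis works symbolically
# on far larger circuits.

def shannon(f, nvars, assignment=()):
    """Recursively expand f into a nested AND/OR/NOT expression string."""
    if len(assignment) == nvars:
        return "1" if f(assignment) else "0"
    hi = shannon(f, nvars, assignment + (1,))  # cofactor f|x=1
    lo = shannon(f, nvars, assignment + (0,))  # cofactor f|x=0
    if hi == lo:
        return hi  # variable is redundant at this branch
    x = f"x{len(assignment)}"
    return f"({x}&{hi} | ~{x}&{lo})"

# Expand 2-input XOR into an explicit mux-style expression.
xor_expr = shannon(lambda bits: bits[0] ^ bits[1], 2)
```

Because equal cofactors collapse, a function that ignores a variable produces an expression without it, which is the basic mechanism behind simplification during expansion-based synthesis.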

Software Design Agent

  • Responsible for automating software stack adaptation and optimization relative to target processor cores, including:
    • OS configuration (AutoOS): systematically prunes and infers optimal kernel settings using context expansion and performance feedback.
    • Compiler backends: automatically migrates and verifies backend code for new ISAs, employing both LLM code synthesis and symbolic analysis (e.g., leveraging Z3 for program repair).
    • Operator transcompilation (e.g., QiMeng-Xpiler): translates tensor programs and operator primitives between hardware-specific programming models with correctness and performance constraints.
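Feedback-driven configuration search in the spirit of AutoOS can be sketched as a simple hill climb: propose a change to one option, keep it only if measured performance improves. The option names and the benchmark weights below are invented for the example; the real agent prunes the kernel's much larger option space with LLM guidance and real performance measurements.

```python
import random

# Illustrative hill-climbing sketch of feedback-driven OS configuration
# tuning. Option names and the surrogate benchmark are hypothetical.

def tune(options, benchmark, steps=50, seed=0):
    rng = random.Random(seed)
    config = {opt: False for opt in options}
    best = benchmark(config)
    for _ in range(steps):
        opt = rng.choice(options)
        trial = dict(config)
        trial[opt] = not trial[opt]       # propose flipping one option
        perf = benchmark(trial)           # performance feedback
        if perf > best:                   # keep only improvements
            config, best = trial, perf
    return config, best

# Hypothetical effects: PREEMPT helps, DEBUG hurts, HZ_1000 helps a bit.
weights = {"PREEMPT": 3.0, "DEBUG": -2.0, "HZ_1000": 1.0}
config, score = tune(list(weights),
                     lambda c: sum(w for o, w in weights.items() if c[o]))
```

With independent option effects this greedy loop converges to the best configuration; the interesting engineering in a real auto-tuner lies in handling interacting options and expensive, noisy benchmarks, which this sketch deliberately ignores.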

4. Top-Layer Applications and Empirical Advancements

QiMeng delivers a range of state-of-the-art applications:

| Application | Performance/Scope |
| --- | --- |
| Automated CPU design (QiMeng-CPU-v2) | 17M gates, superscalar; matches ARM Cortex-A53 performance; 380× improvement over v1 |
| HDL generation (CodeV) | 81.9% pass@1 on VerilogEval-Machine, surpassing GPT-4 and RTLCoder |
| OS/RTOS auto-configuration | Up to 25.6% improvement over vendor expert-tuned defaults |
| Compiler backend generation | >70% accuracy; neural compiler >99% on ExeBench |
| Tensor operator generation | AutoGEMM up to 251% of OpenBLAS (RISC-V CPU); AutoTensorOp 124% of cuBLAS (GPU) |

Feedback-Driven Improvement Cycle

The deployment of these applications generates real-world data, which is used to further train and refine the LPCM and associated agents, establishing an iterative improvement loop. This process supports a self-evolving system: top-down (deploy → data) and bottom-up (retrain → redeploy) cycles.

5. Integration of Domain Knowledge, Verification, and Feedback

The system resolves key bottlenecks in prior processor design automation efforts:

  • Knowledge Representation Gap: The LPCM’s architectural extension to handle graph and text, supported by contrastive and generative learning, bridges the knowledge gap between human-readable specifications and silicon-ready implementation.
  • Data Scarcity: Synthetic, cross-stage multi-modal data generation, facilitated by cascading models, addresses domain-specific data shortage for supervised learning.
  • Correctness Assurance: Embedded verification, symbolic repair, and feedback-driven loops ensure that generated designs meet strict functional standards at each level.
  • Efficient Search: Tree search, curriculum learning, pruning based on performance feedback, and reinforcement policy guidance are combined to manage design space complexity.
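The pruning idea in the last point can be made concrete with a branch-and-bound flavor of tree search: subtrees whose optimistic performance bound cannot beat the best complete design found so far are cut off. The tree, score, and bound below are toy stand-ins for the hierarchical design search described above.

```python
# Sketch of performance-feedback pruning over a design tree
# (branch-and-bound). Nodes, scores, and the optimistic bound are toy
# stand-ins, not QiMeng's search machinery.

def search(node, children, score, bound, best=float("-inf")):
    """Depth-first search that prunes subtrees whose optimistic bound
    cannot beat the best complete design found so far."""
    kids = children(node)
    if not kids:                       # leaf = complete design
        return max(best, score(node))
    for child in sorted(kids, key=bound, reverse=True):
        if bound(child) <= best:       # performance feedback: prune branch
            continue
        best = search(child, children, score, bound, best)
    return best

# Toy design space: tuples of bits up to depth 3; score = number of ones;
# bound = ones so far plus an optimistic count for the remaining choices.
DEPTH = 3
children = lambda n: [] if len(n) == DEPTH else [n + (0,), n + (1,)]
score = lambda n: sum(n)
bound = lambda n: sum(n) + (DEPTH - len(n))
best = search((), children, score, bound)
```

Visiting the most promising branch first makes the incumbent strong early, so most of the remaining tree is pruned without being explored; that is the essential interaction between performance feedback and search efficiency.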

6. Research Directions and Prospects

QiMeng is currently in progressive deployment, with many core modules (e.g., CPU synthesis, OS configuration, tensor operator automation) completed; full integration and orchestration of all hierarchical layers are ongoing.

Future development directions include:

  • Deeper agent integration for end-to-end hardware–software co-optimization, including transistor-level and non-Boolean circuits.
  • Expansion to richer graph-structured software understanding and migration capabilities to enable systematic ecosystem re-targeting.
  • Incorporation of continual learning, reinforcement learning, and evolutionary strategies to enable lifelong self-improvement and adaptation.
  • Further automation of cross-layer reasoning, closing the loop between application deployment, domain data generation, and underlying model retraining.

7. Significance and Impact

QiMeng establishes a new paradigm, advancing from traditional EDA-based automation to a domain-specialized, LLM-centric approach encompassing the full hardware–software stack. Through a combination of multimodal pre-training, feedback-driven inference, and agent orchestration, QiMeng demonstrates the viability—empirically validated at industrial scale—of fully automated processor chip design matching or surpassing expert-engineered benchmarks. The system’s capacity for self-improvement and cross-domain adaptation suggests broad implications for processor design efficiency and ecosystem evolution in the face of hardware diversity and escalating complexity.
