Hybrid gem5-Verilator Co-simulation

Updated 9 August 2025

The hybrid gem5-Verilator co-simulation flow integrates gem5’s architectural simulation with Verilator's cycle-accurate RTL models so that RTL designs can be validated under realistic workloads and operating systems.
The approach couples discrete-event and cycle-accurate simulation through semantic adaptors and synchronization protocols, with cache-coherent memory hierarchies as the main target.
Reported slowdown relative to gem5 Ruby’s MI model falls from 2.7× for dual-core runs to 1.6× for sixteen-core runs, trading some simulation speed for RTL-level accuracy in multi-core designs.

The hybrid gem5-Verilator co-simulation flow integrates the gem5 system-level simulator with Verilator’s cycle-accurate RTL simulation, so that realistic workloads and operating systems execute jointly with actual RTL hardware models, particularly for complex subsystems such as cache-coherent memory hierarchies. It combines the high-level architectural modeling of gem5 with the fidelity of a Verilator-compiled RTL device under test (DUT), allowing cross-domain validation and rapid design iteration for modern multi-core system-on-chip (SoC) architectures (Zoni et al., 5 Aug 2025, Gomes et al., 2017, Lowe-Power et al., 2020).

1. Principles of Hybrid Co-Simulation

Hybrid co-simulation unifies the discrete-event (DE) and cycle-accurate paradigms. In the taxonomy of Gomes et al. (2017), a "master" orchestrator synchronizes the participating simulation units; here those are gem5’s system simulation domain (discrete-event, described in DEVS terms by the taxonomy) and the cycle-stepped RTL simulation produced by Verilator. The modules communicate by encapsulating interface events, checkpoints, and data behind stable APIs.

Generic hybrid approaches include:

Treating the Verilator model as a cycle-stepped DE unit driven from gem5’s event-driven simulation, with appropriate adaptation between the two time bases.
Wrapping outputs/inputs via semantic adaptors so that abrupt RTL events synchronize seamlessly with gem5’s architectural state.
Implementing a commensurate time-stepping strategy ( $T_\text{verilator} = T_\text{gem5} + \Delta T$ ), ensuring simulation time alignment per cycle or macro-step (Lowe-Power et al., 2020, Gomes et al., 2017).
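The time-alignment rule above can be sketched as a small adaptor that converts event-driven timestamps into whole RTL clock cycles. This is a minimal illustration rather than code from gem5 or Rhea: the class name `ClockAdaptor`, the `step_rtl` callback, and the choice of 1000 ticks per cycle (a 1 GHz clock at an assumed 1 ps tick) are all hypothetical. The `lag` method returns the magnitude of the offset $\Delta T$ between the two time bases.

```python
from dataclasses import dataclass


@dataclass
class ClockAdaptor:
    """Maps event-driven timestamps (ticks) onto whole RTL clock cycles."""

    ticks_per_cycle: int   # assumed: 1000 ticks = one 1 GHz cycle at 1 ps per tick
    rtl_cycle: int = 0     # cycles the RTL model has executed so far

    def lag(self, now_ticks: int) -> int:
        """Magnitude of Delta T: ticks by which the RTL side trails the event side."""
        return now_ticks - self.rtl_cycle * self.ticks_per_cycle

    def advance(self, now_ticks: int, step_rtl) -> int:
        """Step the RTL model until its lag is smaller than one clock period."""
        stepped = 0
        while self.lag(now_ticks) >= self.ticks_per_cycle:
            step_rtl()             # hypothetical callback: one full RTL clock cycle
            self.rtl_cycle += 1
            stepped += 1
        return stepped


adaptor = ClockAdaptor(ticks_per_cycle=1000)
calls = []
adaptor.advance(3500, lambda: calls.append("cycle"))  # catches up by three cycles
```

After each call the residual lag is always less than one clock period, which is the sense in which the two simulators stay aligned per cycle; a macro-step variant would simply call `advance` less often.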

2. Co-Simulation Workflow and Protocol

The workflow implemented in frameworks like Rhea (Zoni et al., 5 Aug 2025) involves the following sequence:

Initialization: gem5 instantiates the Verilator-compiled model, initializes input signals, and steps the clock.
Synchronization: At every gem5 clock cycle, the framework invokes Verilator’s eval() function, exchanges packets (e.g., memory accesses), and polls for RTL output signals such as "ack".
Scoreboarding and Validation: Each memory transaction triggers comparisons between gem5’s Ruby memory model and the RTL results for correctness.
Queue Management: Requests are enqueued at both the gem5 and Verilator interface; valid acknowledgments pop these requests and commit results.
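The four steps above can be sketched as a single loop. The sketch below is an assumption-laden stand-in, not Rhea's implementation: `MockRtlMemory` replaces the Verilator-compiled model with a pure-Python object exposing an `eval()` method and an `ack` signal, the fixed latency and one-request-in-flight policy are invented for brevity, and a plain dictionary plays the role of the reference memory model used by the scoreboard.

```python
from collections import deque


class MockRtlMemory:
    """Hypothetical stand-in for the RTL DUT: one request in flight, fixed latency."""

    def __init__(self, latency=2):
        self.latency = latency
        self.mem = {}
        self.req = None      # input port: (op, addr, data) or None
        self.ack = False     # output port: high for the cycle a request completes
        self.rdata = None    # output port: value at the accessed address
        self._busy = None    # [remaining_cycles, request] while a request is active

    def eval(self):
        """Advance the model by one clock cycle."""
        self.ack = False
        if self._busy is None and self.req is not None:
            self._busy = [self.latency, self.req]
            self.req = None
        if self._busy is not None:
            self._busy[0] -= 1
            if self._busy[0] == 0:
                op, addr, data = self._busy[1]
                if op == "write":
                    self.mem[addr] = data
                self.rdata = self.mem.get(addr, 0)
                self.ack = True
                self._busy = None


def co_simulate(requests, dut, max_cycles=100):
    pending = deque(requests)   # requests still on the system-simulator side
    in_flight = deque()         # requests handed to the RTL side, awaiting ack
    reference = {}              # golden memory model consulted by the scoreboard
    mismatches = []
    cycle = 0
    while (pending or in_flight) and cycle < max_cycles:
        if pending and not in_flight:        # queue management: issue next request
            req = pending.popleft()
            dut.req = req
            in_flight.append(req)
        dut.eval()                           # synchronization: one RTL cycle per loop
        cycle += 1
        if dut.ack:                          # poll the ack output
            op, addr, data = in_flight.popleft()
            if op == "write":
                reference[addr] = data
            expected = reference.get(addr, 0)
            if dut.rdata != expected:        # scoreboard check
                mismatches.append((cycle, addr, expected, dut.rdata))
    return cycle, mismatches


requests = [("write", 0x10, 7), ("read", 0x10, None), ("read", 0x20, None)]
cycles, mismatches = co_simulate(requests, MockRtlMemory(latency=2))
```

With a two-cycle latency and three requests the loop runs for six cycles and records no mismatches; a DUT that returned wrong data would show up as an entry in `mismatches` tagged with the cycle at which the acknowledgment arrived.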

System time ( $T_\text{co-sim}$ ) is expressed as:

$T_\text{co-sim} = T_\text{gem5} + T_\text{sync}$

where $T_\text{sync}$ is the overhead of synchronization and data exchange.

3. Performance Metrics and Scalability

Experimental results show that the hybrid co-simulation incurs a simulation overhead, quantified by the slowdown factor $S(n)$ :

$S(n) = \frac{T_{\text{co-sim}}(n)}{T_{\text{MI}}(n)}$

where $n$ denotes the number of cores, $T_\text{co-sim}$ the hybrid simulation time, and $T_\text{MI}$ the time for gem5 Ruby’s MI model. Observed values:

$S \approx 2.7\times$ for dual-core scenarios
$S \approx 1.6\times$ for sixteen-core systems

A plausible implication is that synchronization overhead ( $T_\text{sync}$ ) is amortized across higher core counts, improving scalability.
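To make the amortization argument concrete, the sketch below works through the reported slowdowns under two simplifying assumptions that are not stated in the source: that $T_\text{gem5} \approx T_\text{MI}$, so $S(n) = 1 + T_\text{sync}/T_\text{MI}(n)$, and that $T_\text{sync}$ stays roughly constant as cores are added. The function names are hypothetical.

```python
def slowdown(t_cosim: float, t_mi: float) -> float:
    """S(n) = T_co-sim(n) / T_MI(n)."""
    return t_cosim / t_mi


def implied_sync_ratio(s: float) -> float:
    """T_sync / T_MI implied by slowdown s, assuming T_co-sim = T_MI + T_sync."""
    return s - 1.0


ratio_2 = implied_sync_ratio(2.7)    # sync overhead is about 1.7x the MI time
ratio_16 = implied_sync_ratio(1.6)   # sync overhead is about 0.6x the MI time

# If T_sync were constant, the MI baseline itself would have to take
# ratio_2 / ratio_16 (roughly 2.8) times longer at sixteen cores than at two.
baseline_growth = ratio_2 / ratio_16
```

Under these assumptions the drop from 2.7× to 1.6× is what one would expect if the baseline simulation time grows by a factor of about 2.8 between two and sixteen cores while the synchronization cost stays flat. If $T_\text{sync}$ also grows with core count, the baseline would have to grow faster still.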

Application-level performance is reported as speedup over the MI baseline:

$\text{Speedup} = \frac{T_{\text{MI}}}{T_{\text{design}}}$

Single-level RTL cache designs achieve intermediate speedups, and two-level designs reduce execution time by up to 43% for sixteen-core workloads (Zoni et al., 5 Aug 2025).
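Reading the 43% figure as a reduction in execution time relative to the MI baseline (an assumption about how the number is normalized), so that $T_{\text{design}} = (1 - 0.43)\,T_{\text{MI}}$, the corresponding speedup is:

$\text{Speedup} = \frac{T_{\text{MI}}}{T_{\text{design}}} = \frac{T_{\text{MI}}}{0.57\,T_{\text{MI}}} = \frac{1}{0.57} \approx 1.75$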

4. Taxonomy of Co-Simulation Methods

According to (Gomes et al., 2017), co-simulation taxonomies address:

Non-Functional Requirements (NFRs): Performance, scalability, accuracy, IP protection, parallelism.
Simulation Unit Requirements (SRs): Time relations, availability, Jacobian/derivative exposure, rollback capability.
Framework Requirements (FRs): Domain coupling, communication strategy (Jacobi vs. Gauss-Seidel), step sizing.

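Hybrid flows demand semantic interface adaptors, accurate time synchronization, and robust handling of algebraic constraints. The latter can be handled, for example, by predictor–corrector iterations that correct the coupling variable $F_e$ with the update

$\Delta F_e = -(\partial g/\partial F_e )^{-1} g(F_e(nH))$

where $g$ is the constraint residual and $nH$ is the time of the $n$-th communication step of size $H$.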
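The correction above is a Newton step on the constraint residual. A minimal scalar sketch follows, in which the matrix inverse reduces to a division; the residual $g(F) = F - \cos F$ is a made-up example chosen only because it has a single well-known root, and `correct_coupling` is a hypothetical helper name.

```python
import math


def correct_coupling(g, dg, f0, tol=1e-10, max_iter=20):
    """Iterate F <- F + dF with dF = -g(F) / g'(F) until |g(F)| < tol."""
    f = f0
    for iteration in range(max_iter):
        residual = g(f)
        if abs(residual) < tol:
            return f, iteration
        f += -residual / dg(f)   # scalar form of -(dg/dF)^-1 * g(F)
    raise RuntimeError("coupling iteration did not converge")


# Hypothetical constraint between two simulation units: g(F) = F - cos(F).
f_star, iterations = correct_coupling(
    g=lambda f: f - math.cos(f),
    dg=lambda f: 1.0 + math.sin(f),
    f0=1.0,                      # predictor: initial guess for the coupling variable
)
```

The need for `dg` is the reason the taxonomy lists derivative or Jacobian exposure as a simulation unit requirement: without it the orchestrator has to fall back on finite differences or fixed-point iteration.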

5. Implementation Challenges and Solutions

Key challenges:

Event Ordering & Causality: Simultaneous events necessitate strict global ordering (DE orchestrator "Select" function); optimistic approaches require rollback capability.
Accuracy and Error Control: Communication step ( $H$ ) must be harmonized with local integrator error and input extrapolation error. Adaptive step sizing may use error estimators (e.g., Richardson extrapolation).
Algebraic Loops and Constraints: Instantaneous bidirectional dependencies demand iterative convergence mechanisms, possibly requiring simulators to expose derivative/Jacobian information.
Semantic Adaptation: Wrappers must synchronize the discrete gem5 events with RTL transitions from Verilator, translating clocked outputs and state changes (possibly via zero-order hold or interpolation).
Real-Time and Multi-Rate Operation: Synchronization for real/simulated hardware must track worst-case execution time (WCET); step sizing may need dynamic adjustment to balance accuracy and computational cost.
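The event-ordering challenge can be illustrated with a priority queue whose tie-break plays the role of the orchestrator's "Select" function. This is a generic sketch, not gem5's event queue: the class `EventQueue`, the unit names, and the rank table are assumptions made for the example.

```python
import heapq
import itertools


class EventQueue:
    """Discrete-event queue with a deterministic tie-break for simultaneous events."""

    def __init__(self, select_rank):
        self._heap = []
        self._seq = itertools.count()   # preserves insertion order within a rank
        self._rank = select_rank        # unit name -> rank; lower rank fires first

    def schedule(self, time, unit, action):
        key = (time, self._rank[unit], next(self._seq))
        heapq.heappush(self._heap, key + (unit, action))

    def run(self):
        """Pop every event in global order and return the resulting trace."""
        trace = []
        while self._heap:
            time, _rank, _seq, unit, action = heapq.heappop(self._heap)
            trace.append((time, unit, action))
        return trace


# Assumed policy: at equal timestamps the RTL side drives its outputs
# before the system simulator samples them.
queue = EventQueue(select_rank={"rtl": 0, "gem5": 1})
queue.schedule(10, "gem5", "sample_ack")
queue.schedule(10, "rtl", "drive_ack")
queue.schedule(5, "gem5", "issue_request")
trace = queue.run()
```

Because the sequence counter makes every key unique, two runs with the same schedule always produce the same trace, which is the property a conservative (non-rollback) orchestrator relies on.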

Adaptive communication and modular orchestrator design (supporting multiple simulation units) enhance reusability, scalability, and protect intellectual property boundaries (Gomes et al., 2017).
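Adaptive communication step sizing with a Richardson-style error estimate can be sketched as follows. The example is a generic numerical illustration, not part of any gem5 or Verilator API: it uses explicit Euler on the stand-in equation $\dot{x} = -x$, compares one step of size $H$ against two steps of size $H/2$, and halves or doubles $H$ based on the difference. All function names and thresholds are assumptions.

```python
import math


def euler(f, x, h):
    return x + h * f(x)


def richardson_step(f, x, h, order=1):
    """Return the two-half-step result and an error estimate for a step of size h."""
    full = euler(f, x, h)
    half = euler(f, euler(f, x, h / 2), h / 2)
    err = abs(half - full) / (2 ** order - 1)
    return half, err


def adaptive_run(f, x0, t_end, h0, tol):
    t, x, h = 0.0, x0, h0
    accepted = []
    while t < t_end - 1e-12:
        h = min(h, t_end - t)          # do not step past the end time
        x_new, err = richardson_step(f, x, h)
        if err > tol:
            h /= 2                     # reject: retry with a smaller step
            continue
        t, x = t + h, x_new
        accepted.append(h)
        if err < tol / 4:
            h *= 2                     # comfortably accurate: try a larger step
    return x, accepted


x_end, steps = adaptive_run(lambda x: -x, x0=1.0, t_end=1.0, h0=0.5, tol=1e-3)
exact = math.exp(-1.0)
```

In a co-simulation setting the same accept, reject, or grow decision would be applied to the communication step between units, which only works if rejected steps can be rolled back or were never committed.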

6. Comparative Analysis and Use Cases

Compared to pure gem5 simulation using MI, the flow with Verilator:

Achieves higher fidelity by simulating the microarchitectural details of real RTL models, making it possible to detect subtle bugs and validate low-level timing/power characteristics.
Enables full-system execution of realistic benchmarks, including complete operating systems.
Runs slower than pure MI models but is more accurate, a trade-off that is valuable during hardware/software co-design and validation (Zoni et al., 5 Aug 2025).

Multi-threaded Verilator simulation can parallelize evaluation of the DUT, which may offset part of the co-simulation overhead.

Use cases in Rhea include rapid design and validation of cache-coherent memory subsystems with MSI protocol, scaling from one to sixteen cores, and benchmarking with standard workloads for quantitative hardware/software co-design analysis.

7. Future Directions and Research Opportunities

The state-of-the-art co-simulation taxonomy reveals promising research directions:

Generic Semantic Adaptors: Development of more sophisticated wrappers for bridging different computation models.
Adaptive Strategy and Parallel/Distributed Frameworks: Using parallel co-simulation to balance load across multicore/cluster environments and optimizing communication step size selection dynamically.
Robust Coupling and Modularity: Enhancement of coupling frameworks to handle complex algebraic, functional, and timing dependencies with predictor–corrector schemes and full rollback support.

Further research may focus on scaling hybrid co-simulation to tens or hundreds of cores while keeping overhead manageable and improving both accuracy and simulation throughput.

The hybrid gem5-Verilator co-simulation flow is a modular methodology for combining high-level system simulation with cycle-accurate RTL validation. In frameworks such as Rhea, its simulation overhead is quantified and shrinks as core count grows, while the synchronization and adaptation challenges it faces are those outlined in the co-simulation taxonomy. These techniques support informed hardware/software co-design for modern SoCs (Zoni et al., 5 Aug 2025, Gomes et al., 2017, Lowe-Power et al., 2020).


References

Zoni et al. (2025). Rhea: a Framework for Fast Design and Validation of RTL Cache-Coherent Memory Subsystems.

Gomes et al. (2017). Co-simulation: State of the art.

Lowe-Power et al. (2020). The gem5 Simulator: Version 20.0+.
