Coverage-Guided HW-SW Contract Fuzzing

Updated 18 November 2025

The paper shows that symbolic leakage contracts combined with self-compositional testing effectively expose subtle microarchitectural side-channel vulnerabilities.
It introduces a Self-Composition Deviation (SCD) coverage metric that captures dynamic register divergence to guide test prioritization with measurable feedback.
Experimental evaluations on Rocket and BOOM cores indicate that coverage-guided feedback strategies accelerate leak detection on BOOM, while no contract violations were found on Rocket.

Coverage-guided hardware-software contract fuzzing is a pre-silicon methodology for systematically uncovering side-channel vulnerabilities in processor microarchitecture by combining symbolic leakage contract specifications, a self-compositional test harness, and a novel feedback-driven coverage metric. Whereas traditional hardware fuzzing targets functional bugs, coverage-guided contract fuzzing addresses the subtler problem of verifying that a microarchitecture does not violate its information flow security guarantees as formally encoded in architectural-level contracts. This approach makes security-relevant microarchitectural divergences observable and efficiently directs the search toward states most likely to expose contract violations, particularly in the presence of optimizations such as out-of-order or speculative execution (Geier et al., 11 Nov 2025).

1. Hardware-Software Leakage Contracts and Self-Composition

A hardware-software leakage contract augments the instruction set architecture (ISA) semantics with explicit observations intended to formally characterize the information an attacker may legitimately learn. Define architectural states as $\sigma = \langle m, a \rangle$ (memory $m$ and register file $a$), with transitions $\sigma \rightarrow_p \sigma'$ under a program $p$. Observations $\ell \in \text{Obs}$ describe leakage, e.g., instruction fetch address or data cache access.

The contract thereby induces a ternary relation on states:

$\sigma \xrightarrow{(\ell)}_p \sigma'$

A contract trace for a program $p$ from initial state $\sigma_0$ is the sequence $p(\sigma_0) = \ell_1,\dots,\ell_n$ of observations labeling its transitions. The attacker model is expressed as a mapping $\text{AtkObs}$ from a hardware execution trace to observable information; a minimal but effective choice is cycle count, $\text{AtkObs}(\tau) = |\tau|$. Security holds if, whenever two initial architectural states yield identical contract traces under the same program, the attacker observations of the corresponding hardware executions are identical, where $\mu$ denotes the initial microarchitectural state:

$\forall p, \mu, \sigma_0, \sigma_1. \ p(\sigma_0) = p(\sigma_1) \implies \text{AtkObs}((\sigma_0,\mu) \Rightarrow_p^*) = \text{AtkObs}((\sigma_1,\mu) \Rightarrow_p^*)$

Self-compositional simulation runs two copies of the core in lockstep, one holding secret input $A$ and the other $B$. Both are initialized under the same clock and reset, and per-cycle divergence across all registers and stateful elements is monitored via the set $\Delta_t = \{r \mid s_A(t)[r] \neq s_B(t)[r]\}$, encoded as a bit-vector, where $s_A(t)[r]$ is the value of element $r$ in instance $A$ at cycle $t$.
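The definitions above can be illustrated with a minimal Python sketch. It is a toy model, not the paper's RTL instrumentation: states are assumed to be flat dictionaries from register names to values, both instances are assumed to expose the same register names, and all function names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical flat state model: register name -> value.
State = Dict[str, int]


def divergence(s_a: State, s_b: State) -> Set[str]:
    """Delta_t: registers whose values differ between instances A and B."""
    return {r for r in s_a if s_a[r] != s_b[r]}


def divergence_trace(trace_a: List[State], trace_b: List[State]) -> List[Set[str]]:
    """Per-cycle divergence over the cycles that both instances executed."""
    return [divergence(a, b) for a, b in zip(trace_a, trace_b)]


def atk_obs(trace: List[State]) -> int:
    """Cycle-count attacker model: AtkObs(tau) = |tau|."""
    return len(trace)


def violates_contract(contract_a: List[str], contract_b: List[str],
                      trace_a: List[State], trace_b: List[State]) -> bool:
    """Violation: equal contract traces but different attacker observations."""
    return contract_a == contract_b and atk_obs(trace_a) != atk_obs(trace_b)
```

For instance, two runs whose contract traces agree but whose hardware traces have lengths 2 and 3 would be flagged as a violation, while runs with differing contract traces are never flagged, since the contract already permits the attacker to tell them apart.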

2. Self-Composition Deviation (SCD) Coverage Metric

Naive fuzzing suffers from coverage blindness regarding side-channel vulnerabilities. Self-Composition Deviation (SCD) coverage is introduced as a metric tailored to side-channel analysis. It tracks not only which architectural registers differ between two executions but also transitions in these divergence patterns.

Given the current divergence $\Delta_t$ and the previous divergence $\Delta_{t-1}$, SCD forms a hashed index into a bit-vector $\text{cov}$ (e.g., 2 MiB in size) and sets the corresponding bit:

$\text{cov}[\text{hash}(\Delta_t) \oplus (\text{hash}(\Delta_{t-1}) \gg 1)] \leftarrow 1$

This per-cycle update records the occurrence of new or shifting divergence patterns. Shifting the previous hash by one bit makes the index sensitive to order, so a transition from pattern $X$ to $Y$ is distinguished from one from $Y$ to $X$. Each test case $tc$ has its own coverage vector $\text{cov}_{tc}$; the global search maintains a cumulative coverage $\text{cumulative\_cov}[i] = \bigvee_{tc} \text{cov}_{tc}[i]$. The process is thus guided to maximize the discovery of previously unseen divergence transition patterns, which are crucial for exposing microarchitectural side-channels.
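A minimal Python sketch of this update follows. Several details are assumptions made for illustration rather than facts from the source: the hash function (BLAKE2b over sorted register names), the reduction of the index modulo the bitmap size, the empty divergence set used as the predecessor of the first cycle, and the representation of the bit-vector as a set of set-bit indices.

```python
import hashlib
from typing import Iterable, Set

COV_BITS = 2 * 1024 * 1024 * 8  # number of bits in a 2 MiB bitmap


def h(delta: frozenset) -> int:
    """Deterministic 64-bit hash of a divergence set (stand-in hash function)."""
    data = ",".join(sorted(delta)).encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def scd_index(delta_t: frozenset, delta_prev: frozenset) -> int:
    """hash(Delta_t) XOR (hash(Delta_{t-1}) >> 1), reduced to the bitmap size."""
    return (h(delta_t) ^ (h(delta_prev) >> 1)) % COV_BITS


def run_coverage(deltas: Iterable[Set[str]]) -> Set[int]:
    """Indices of the coverage bits set by one test case, updated once per cycle."""
    cov: Set[int] = set()
    prev = frozenset()  # assumption: no divergence before the first cycle
    for d in deltas:
        cur = frozenset(d)
        cov.add(scd_index(cur, prev))
        prev = cur
    return cov
```

Under this representation, the cumulative coverage is the union of the per-test-case sets, and a test case contributes new coverage exactly when `run_coverage(...) - cumulative` is non-empty.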

3. Fuzzing Workflow and Feedback Strategies

The contract fuzzing pipeline consists of four stages: program generation/mutation, contract simulation, RTL self-composition, and coverage-driven prioritization.

A. Generation & Mutation: Programs are assembled from random RISC-V instruction sequences ("words") of up to four instructions each, generated so that programs terminate and exhibit realistic data dependencies. Each program is paired with two data sections (for $A$ and $B$), which are manipulated by mutation (deletion, insertion, retention) and splicing, with seeds managed under an LRU policy.
B. Contract Simulation: Each $(\text{program}, \text{data}_A, \text{data}_B)$ is compiled and run through a Sail-based contract simulator. If contract traces differ, the input is contract-distinguishable and discarded; only contract-indistinguishable pairs proceed.
C. RTL Self-Composition: At RTL, the core is duplicated and instrumented via FIRRTL to compare all register states cycle-wise. Parallel simulation (Verilator + cocotb) ensures efficiency. On termination, any difference in cycle count between instances signals a leak.
D. Prioritization: Test cases that contribute new coverage bits to cumulative SCD coverage are prioritized. Under weighted feedback, seeds are selected proportionally to their score, defined by the rarity of their discovered bits:

$\text{Score}(tc) = \sum_{i \in C(tc)} \frac{1}{n_i}$

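where $C(tc)$ is the set of indices set in $\text{cov}_{tc}$ and $n_i$ is the number of corpus entries that previously covered index $i$. Rarely covered bits therefore contribute more to a test case's score than commonly covered ones.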
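The scoring rule can be sketched in Python as follows. This is an illustrative reading, not the authors' implementation: coverage is assumed to be stored as a set of indices per corpus entry, $n_i$ is assumed to count every corpus entry covering $i$ (including the entry being scored), and the function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Set


def scores(corpus_cov: Dict[str, Set[int]]) -> Dict[str, float]:
    """Score(tc) = sum over covered indices i of 1 / n_i."""
    # n[i]: number of corpus entries whose coverage contains index i.
    n = Counter(i for cov in corpus_cov.values() for i in cov)
    return {tc: sum(1.0 / n[i] for i in cov) for tc, cov in corpus_cov.items()}


def pick_seed(corpus_cov: Dict[str, Set[int]], rng: random.Random) -> str:
    """Select a seed with probability proportional to its score."""
    s = scores(corpus_cov)
    names = list(s)
    return rng.choices(names, weights=[s[t] for t in names], k=1)[0]
```

With a corpus `{"a": {1, 2}, "b": {2, 3, 4}}`, index 2 is covered twice and the others once, so `a` scores 1 + 1/2 = 1.5 and `b` scores 1/2 + 1 + 1 = 2.5. Note that `random.Random.choices` raises an error if all weights are zero, so entries with empty coverage would need separate handling.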

The overall fuzzing loop organizes these steps around three mechanisms: contract-indistinguishability gating, RTL-based leak discovery, and adaptive corpus expansion by SCD novelty.
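A compact Python sketch of such a loop is given below, under stated assumptions. The callbacks `mutate`, `contract_trace`, and `rtl_self_compose` are hypothetical stand-ins for the mutator, the Sail-based contract simulator, and the self-composed RTL simulation. Seed selection is uniform here for brevity, and LRU seed eviction is omitted.

```python
import random
from typing import Callable, List, Set, Tuple

# Hypothetical test-case encoding: (program, data_A, data_B).
TestCase = Tuple[str, bytes, bytes]


def fuzz(seeds: List[TestCase],
         mutate: Callable[[TestCase, random.Random], TestCase],
         contract_trace: Callable[[str, bytes], object],
         rtl_self_compose: Callable[[str, bytes, bytes], Tuple[int, int, Set[int]]],
         iterations: int,
         rng: random.Random):
    corpus = list(seeds)
    cumulative_cov: Set[int] = set()
    leaks: List[TestCase] = []
    for _ in range(iterations):
        prog, data_a, data_b = mutate(rng.choice(corpus), rng)
        # Gate: discard contract-distinguishable input pairs.
        if contract_trace(prog, data_a) != contract_trace(prog, data_b):
            continue
        # Self-composed RTL run: cycle counts of both instances plus SCD coverage.
        cycles_a, cycles_b, cov = rtl_self_compose(prog, data_a, data_b)
        if cycles_a != cycles_b:
            leaks.append((prog, data_a, data_b))  # timing difference signals a leak
        # Keep the test case only if it sets previously unseen coverage bits.
        if cov - cumulative_cov:
            cumulative_cov |= cov
            corpus.append((prog, data_a, data_b))
    return corpus, cumulative_cov, leaks
```

Passing the simulators in as callbacks keeps the sketch runnable with simple stubs; a weighted strategy would replace `rng.choice(corpus)` with score-proportional selection.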

4. Implementation Details: Rocket and BOOM Cores

The approach was implemented atop DifuzzRTL, extended with:

FIRRTL pass to duplicate all registers/wires and inject comparators for $\Delta_t$
cocotb test harness driving two TileLink channels and memory models with unified control
Cycle-accurate polling of the RISC-V HTIF tohost flag for robust termination detection
Parallel ProcessPoolExecutor to batch Verilator jobs, amortizing startup costs

For out-of-order pipelines (BOOM), implementation duplicated large reorder buffers, issue queues, and rename tables. Comparisons focused on architectural register file writes per cycle, obviating the need to handle speculation metadata.

Reset refinement extended beyond architectural registers to memory and cache invalidation, ensuring execution reproducibility.

5. Experimental Evaluation

Experiments comprised campaigns of $5\times 10^4$ iterations on Rocket (in-order) and BOOM (out-of-order) using four prioritization strategies: Pass Feedback (corpus=1000), Pass 100 (corpus=100), New Coverage, and Weighted Feedback. Key findings:

SCD Coverage: On Rocket (seq-ct-b contract), the median number of set coverage bits after $10^4$ iterations was 26,587 (Pass Feedback), 46,256 (New Coverage), and 54,325 (Weighted Feedback), a substantial SCD improvement under guided feedback (Mann-Whitney $U$ test, $p<0.01$). On BOOM (seq-arch contract), Weighted Feedback reached ~2.05M bits vs. ~1.88M for the other strategies.
Leak Discovery: On the Rocket core, no contract violations were found after $10^6$ iterations, which is consistent with Rocket satisfying the tested contracts. On BOOM, leaks caused by out-of-order and speculative execution were found; e.g., load reordering led to timing differences that violate the contract. The median number of test cases to the first leak over 100 runs was 338 (Pass Feedback), 1077 (Pass 100), 326 (New Coverage), and 279 (Weighted Feedback), with $p=0.003$ for Weighted Feedback vs. Pass 100, suggesting that higher SCD coverage accelerates leak discovery.

6. Best Practices, Limitations, and Extensions

Best Practices:

Instrument architectural registers (optionally cache lines) for divergence tracking
Integrate coverage hash/update logic into the RTL harness to avoid large trace logs
Leverage parallel simulation to maximize throughput
Maintain precise data-section distinctions to enforce contract-indistinguishability

Limitations:

RTL simulation is slow; industrial-scale designs may necessitate hardware emulation or selective sampling
Cycle-count attacker model is imprecise for certain side-channels (e.g., cache-line or power analysis)
Corpus size impacts coverage-guided biasing; excessive size may dilute feedback efficacy

Extensions:

Expanding scope to SoCs by instrumenting interconnects (AXI/TileLink) would expose cross-component leaks
Refined attacker models (cache-set, power, ML-based) could increase sensitivity and precision
Hybrid approaches using incremental synthesis or verification loops allow fuzzing to efficiently eliminate candidate contracts beyond the reach of bounded model checking

Coverage-guided hardware-software contract fuzzing, built on self-composition and SCD coverage, enables effective, feedback-driven exploration of the microarchitectural state space, accelerating exposure of timing and speculative vulnerabilities in modern, complex processor cores (Geier et al., 11 Nov 2025).

References (1)

Coverage-Guided Pre-Silicon Fuzzing of Open-Source Processors based on Leakage Contracts (2025)
