Coverage-Guided HW-SW Contract Fuzzing

Updated 18 November 2025
  • The paper shows that symbolic leakage contracts combined with self-compositional testing effectively expose subtle microarchitectural side-channel vulnerabilities.
  • It introduces a Self-Composition Deviation (SCD) coverage metric that captures dynamic register divergence to guide test prioritization with measurable feedback.
  • Experimental evaluations on the Rocket and BOOM cores show that SCD-guided feedback increases coverage and reduces the number of test cases needed to reach the first leak, which was found on BOOM.

Coverage-guided hardware-software contract fuzzing is a pre-silicon methodology for systematically uncovering side-channel vulnerabilities in processor microarchitecture by combining symbolic leakage contract specifications, a self-compositional test harness, and a novel feedback-driven coverage metric. Whereas traditional hardware fuzzing targets functional bugs, coverage-guided contract fuzzing addresses the subtler problem of verifying that a microarchitecture does not violate its information flow security guarantees as formally encoded in architectural-level contracts. This approach makes security-relevant microarchitectural divergences observable and efficiently directs the search toward states most likely to expose contract violations, particularly in the presence of optimizations such as out-of-order or speculative execution (Geier et al., 11 Nov 2025).

1. Hardware-Software Leakage Contracts and Self-Composition

A hardware-software leakage contract augments the instruction set architecture (ISA) semantics with explicit observations intended to formally characterize the information an attacker may legitimately learn. Define architectural states as $\sigma = \langle m, a \rangle$ (memory $m$ and register file $a$), with transitions $\sigma \rightarrow_p \sigma'$ under a program $p$. Observations $\ell \in \text{Obs}$ describe leakage, e.g., the instruction fetch address or a data cache access.

The contract thereby induces a ternary (labeled) transition relation on states, $\sigma \xrightarrow{(\ell)}_p \sigma'$. A contract trace for a program $p$ from initial state $\sigma_0$ is the sequence $p(\sigma_0) = \ell_1, \dots, \ell_n$ of observations labeling the transitions of that execution.

The attacker model is a mapping $\text{AtkObs}$ from a hardware execution trace $\tau$ to the information the attacker observes; a minimal but effective choice is the cycle count, $\text{AtkObs}(\tau) = |\tau|$. Security holds if, whenever a program yields indistinguishable contract traces from two initial architectural states, the attacker observations of the corresponding hardware executions $\Rightarrow_p^*$, started from the same microarchitectural state $\mu$, are identical:

$$\forall p, \mu, \sigma_0, \sigma_1.\ p(\sigma_0) = p(\sigma_1) \implies \text{AtkObs}((\sigma_0,\mu) \Rightarrow_p^*) = \text{AtkObs}((\sigma_1,\mu) \Rightarrow_p^*)$$

Self-compositional simulation runs two copies of the core in lockstep, one holding secret input $A$ and the other $B$. Both are initialized under the same clock and reset, and per-cycle divergence across all registers and stateful elements is monitored as a bit-vector $\Delta_t = \{\, r \mid s_A(t)[r] \neq s_B(t)[r] \,\}$.
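
To make the security condition concrete, the following Python sketch checks it for one program and one pair of initial register states on a toy machine. Everything in it is hypothetical: the three-opcode ISA, a contract that exposes the program counter and load addresses (in the spirit of a constant-time contract), and a divider whose latency grows with the dividend's bit length. The toy core has no other microarchitectural state, so the shared $\mu$ is implicit, and cycle count plays the role of $\text{AtkObs}$.

```python
from typing import Dict, List, Optional, Tuple

Instr = Tuple[str, str, str, Optional[str]]  # (op, rd, rs1, rs2); toy ISA


def execute(program: List[Instr], regs: Dict[str, int], mem: Dict[int, int]) -> Tuple[list, int]:
    """Run a toy program and return (contract trace, cycle count)."""
    regs = dict(regs)  # leave the caller's initial state untouched
    trace: list = []
    cycles = 0
    for pc, (op, rd, rs1, rs2) in enumerate(program):
        if op == "add":
            regs[rd] = regs[rs1] + regs[rs2]
            trace.append(("pc", pc))
            cycles += 1
        elif op == "ld":
            addr = regs[rs1]
            regs[rd] = mem.get(addr, 0)
            trace.append(("pc", pc, "addr", addr))  # contract exposes load addresses
            cycles += 1
        elif op == "div":
            dividend, divisor = regs[rs1], regs[rs2]
            regs[rd] = dividend // divisor if divisor else -1
            trace.append(("pc", pc))  # contract hides operand values
            cycles += 1 + dividend.bit_length()  # hypothetical early-exit divider
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return trace, cycles


def violates_contract(program, regs0, regs1, mem) -> bool:
    """Equal contract traces but different cycle counts (AtkObs) => violation."""
    trace0, cycles0 = execute(program, regs0, mem)
    trace1, cycles1 = execute(program, regs1, mem)
    return trace0 == trace1 and cycles0 != cycles1


div_prog = [("div", "x3", "x1", "x2")]
print(violates_contract(div_prog, {"x1": 1, "x2": 1}, {"x1": 255, "x2": 1}, {}))  # True
ld_prog = [("ld", "x3", "x1", None)]
print(violates_contract(ld_prog, {"x1": 0}, {"x1": 64}, {}))  # False: traces differ
```

The load example is not a violation because the two inputs are already contract-distinguishable; only pairs with equal contract traces can witness a leak, which is why the fuzzer filters on contract traces before comparing timing.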

2. Self-Composition Deviation (SCD) Coverage Metric

Conventional fuzzing coverage is blind to side-channel vulnerabilities: it records which states a single execution reaches, not whether two executions with different secrets diverge. Self-Composition Deviation (SCD) coverage is a metric tailored to side-channel analysis. It tracks not only which registers differ between the two executions but also how these divergence patterns change from cycle to cycle.

Given the current divergence $\Delta_t$ and the previous one $\Delta_{t-1}$, SCD computes a hashed index into a coverage bit-vector $\text{cov}$ (e.g., 2 MiB in size) and sets that bit:

$$\text{cov}[\text{hash}(\Delta_t) \oplus (\text{hash}(\Delta_{t-1}) \gg 1)] \leftarrow 1$$

This per-cycle update records the occurrence of new or shifting divergence patterns. Shifting the previous hash makes the index order-sensitive, so the transitions $\Delta_{t-1} \to \Delta_t$ and $\Delta_t \to \Delta_{t-1}$ generally set different bits. Each test case $tc$ has its own coverage vector $\text{cov}_{tc}$, and the search maintains a cumulative coverage $\text{cumulative\_cov}[i] = \bigvee_{tc} \text{cov}_{tc}[i]$ over the test cases run so far. The search is thus guided toward previously unseen divergence transition patterns, which are crucial for exposing microarchitectural side channels.
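
A minimal Python sketch of the SCD update rule, under stated assumptions: register states are plain integer lists, $\Delta_t$ is packed into an integer bit-vector, BLAKE2b stands in for the unspecified hash, the map holds $2^{24}$ indices (2 MiB of bits, matching the example size), $\Delta_{-1}$ is assumed empty, and a Python set of indices stands in for the bitmap. None of these names come from the paper's implementation.

```python
import hashlib
from typing import Sequence, Set

MAP_BITS = 1 << 24  # 2 MiB = 2**24 bits; illustrative size


def divergence(state_a: Sequence[int], state_b: Sequence[int]) -> int:
    """Delta_t as an int bit-vector: bit r is set iff register r differs."""
    delta = 0
    for r, (va, vb) in enumerate(zip(state_a, state_b)):
        if va != vb:
            delta |= 1 << r
    return delta


def scd_hash(delta: int) -> int:
    """Hash a divergence bit-vector into [0, MAP_BITS); BLAKE2b is an assumed choice."""
    raw = delta.to_bytes(max(1, (delta.bit_length() + 7) // 8), "little")
    digest = hashlib.blake2b(raw, digest_size=8).digest()
    return int.from_bytes(digest, "little") % MAP_BITS


class SCDCoverage:
    """Per-test-case SCD coverage; a set of indices stands in for the bitmap."""

    def __init__(self) -> None:
        self.indices: Set[int] = set()
        self.prev = scd_hash(0)  # assume no divergence before the first cycle

    def update(self, state_a: Sequence[int], state_b: Sequence[int]) -> int:
        cur = scd_hash(divergence(state_a, state_b))
        idx = cur ^ (self.prev >> 1)  # cov[hash(D_t) XOR (hash(D_{t-1}) >> 1)] <- 1
        self.indices.add(idx)
        self.prev = cur
        return idx


# Usage: one test case observed over three cycles.
cumulative: Set[int] = set()
cov = SCDCoverage()
cycles = [([1, 2, 3], [1, 2, 3]), ([1, 5, 3], [1, 2, 3]), ([1, 5, 3], [1, 5, 3])]
for state_a, state_b in cycles:
    cov.update(state_a, state_b)
new_bits = cov.indices - cumulative  # non-empty: this test case adds coverage
cumulative |= cov.indices  # cumulative_cov is the OR over test cases
```

Because both hash values lie in $[0, 2^{24})$, the XOR of the current hash with the shifted previous hash stays within the map without further masking.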

3. Fuzzing Workflow and Feedback Strategies

The contract fuzzing pipeline consists of four stages: program generation/mutation, contract simulation, RTL self-composition, and coverage-driven prioritization.

  • A. Generation & Mutation: Programs are random RISC-V instruction sequences (instruction words), up to four instructions each, generated so that they terminate and contain realistic data dependencies. Each program is paired with two data sections (for $A$ and $B$), which are manipulated using mutation (deletion, insertion, retention) and splicing; seed management follows an LRU policy.
  • B. Contract Simulation: Each $(\text{program}, \text{data}_A, \text{data}_B)$ triple is compiled and run through a Sail-based contract simulator. If the contract traces differ, the input is contract-distinguishable and discarded; only contract-indistinguishable pairs proceed.
  • C. RTL Self-Composition: At RTL, the core is duplicated and instrumented via FIRRTL to compare all register states cycle-wise. Parallel simulation (Verilator + cocotb) ensures efficiency. On termination, any difference in cycle count between instances signals a leak.
  • D. Prioritization: Test cases that contribute new coverage bits to cumulative SCD coverage are prioritized. Under weighted feedback, seeds are selected proportionally to their score, defined by the rarity of their discovered bits:

$$\text{Score}(tc) = \sum_{i \in C(tc)} \frac{1}{n_i}$$

where $C(tc)$ is the set of indices set in $\text{cov}_{tc}$ and $n_i$ is the number of corpus entries previously covering $i$.
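
The weighted-feedback score and proportional seed selection can be sketched in a few lines of Python. This is an illustration, not the authors' code: corpus coverage is assumed to be a dict from seed identifiers to sets of SCD indices, $n_i$ is assumed to count every corpus entry covering $i$ (including $tc$ itself, so $n_i \ge 1$), and the uniform fallback for an all-zero corpus is an added assumption.

```python
import random
from collections import Counter
from typing import Dict, Set


def weighted_scores(corpus_cov: Dict[str, Set[int]]) -> Dict[str, float]:
    """Score(tc) = sum over covered indices i of 1 / n_i."""
    n = Counter(i for cov in corpus_cov.values() for i in cov)  # n_i per index
    return {tc: sum(1.0 / n[i] for i in cov) for tc, cov in corpus_cov.items()}


def select_seed(corpus_cov: Dict[str, Set[int]], rng: random.Random) -> str:
    """Pick a seed with probability proportional to its score."""
    scores = weighted_scores(corpus_cov)
    ids = list(scores)
    weights = [scores[tc] for tc in ids]
    if sum(weights) == 0:  # no coverage yet: fall back to a uniform choice
        return rng.choice(ids)
    return rng.choices(ids, weights=weights, k=1)[0]


corpus = {"a": {1, 2}, "b": {2}}
print(weighted_scores(corpus))  # {'a': 1.5, 'b': 0.5}
```

Seed `a` owns the rare index 1, so it is drawn three times as often as `b`, whose only index is shared; this is how rarity biases the search toward unusual divergence patterns.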

The overall fuzzing loop chains these stages: contract-indistinguishability gating, RTL-based leak detection, and adaptive corpus expansion driven by SCD novelty.
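
The loop over the four stages might look like the following Python sketch. `mutate`, `contract_trace`, and `run_self_composed` are hypothetical callables standing in for the mutator, the Sail-based contract simulator, and the self-composed Verilator/cocotb run (assumed here to return both cycle counts plus the test case's SCD indices). Seed choice is uniform; the LRU policy and weighted selection are not modeled.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass(frozen=True)
class FuzzInput:
    program: Tuple[int, ...]  # instruction words
    data_a: bytes  # data section holding secret A
    data_b: bytes  # data section holding secret B


def fuzz(
    seeds: List[FuzzInput],
    mutate: Callable[[FuzzInput, random.Random], FuzzInput],
    contract_trace: Callable[[Tuple[int, ...], bytes], tuple],
    run_self_composed: Callable[[FuzzInput], Tuple[int, int, Set[int]]],
    iterations: int,
    rng: random.Random,
):
    corpus = list(seeds)
    cumulative: Set[int] = set()
    leaks: List[FuzzInput] = []
    for _ in range(iterations):
        tc = mutate(rng.choice(corpus), rng)  # A: generation/mutation
        # B: discard contract-distinguishable inputs before paying for RTL
        if contract_trace(tc.program, tc.data_a) != contract_trace(tc.program, tc.data_b):
            continue
        # C: self-composed RTL run yields both cycle counts and SCD indices
        cycles_a, cycles_b, scd = run_self_composed(tc)
        if cycles_a != cycles_b:
            leaks.append(tc)  # equal contract traces, different timing
        # D: keep test cases that contribute new SCD coverage
        if scd - cumulative:
            corpus.append(tc)
            cumulative |= scd
    return corpus, cumulative, leaks
```

Checking the contract before the RTL run means only contract-indistinguishable inputs incur the cost of RTL simulation, which matters because RTL simulation dominates runtime.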

4. Implementation Details: Rocket and BOOM Cores

The approach was implemented atop DifuzzRTL, extended with:

  • FIRRTL pass to duplicate all registers/wires and inject comparators for $\Delta_t$
  • cocotb test harness driving two TileLink channels and memory models with unified control
  • Cycle-accurate polling of the RISC-V HTIF tohost flag for robust termination detection
  • Parallel ProcessPoolExecutor to batch Verilator jobs, amortizing startup costs
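
The tohost polling and cycle-count comparison can be pictured with a pure-Python stand-in for the simulator. The `Core` protocol and its `step`, `registers`, and `tohost` methods are hypothetical and are not the cocotb or Verilator interfaces; the sketch only shows clocking both copies together, logging $\Delta_t$ every cycle, and recording the first cycle at which each copy writes a nonzero value to tohost.

```python
from typing import List, Optional, Protocol, Tuple


class Core(Protocol):
    """Hypothetical handle to one simulated core (not a cocotb/Verilator API)."""

    def step(self) -> None: ...
    def registers(self) -> List[int]: ...
    def tohost(self) -> int: ...


def run_lockstep(core_a: Core, core_b: Core, max_cycles: int = 100_000
                 ) -> Tuple[Optional[int], Optional[int], List[int]]:
    """Clock both copies together; return (cycles_a, cycles_b, per-cycle Delta_t)."""
    done: List[Optional[int]] = [None, None]  # first cycle each copy wrote tohost
    deltas: List[int] = []
    for cycle in range(1, max_cycles + 1):
        core_a.step()  # both copies share one clock edge
        core_b.step()
        regs_a, regs_b = core_a.registers(), core_b.registers()
        deltas.append(sum(1 << r for r, (x, y) in enumerate(zip(regs_a, regs_b)) if x != y))
        for k, core in enumerate((core_a, core_b)):
            if done[k] is None and core.tohost() != 0:  # poll tohost every cycle
                done[k] = cycle
        if done[0] is not None and done[1] is not None:
            break
    return done[0], done[1], deltas  # None means that copy timed out
```

A mismatch between the two returned cycle counts corresponds to the leak signal of the RTL self-composition stage; a `None` result marks a run that never terminated within the cycle budget.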

For the out-of-order BOOM core, duplication covers large structures such as the reorder buffer, issue queues, and rename tables. Divergence comparisons focus on per-cycle architectural register-file writes, which avoids having to interpret speculation metadata.

Reset handling was refined to cover not only architectural registers but also memory and cache invalidation, so that both copies start from the same state and executions are reproducible.

5. Experimental Evaluation

Experiments comprised $5 \times 10^4$-iteration campaigns on Rocket (in-order) and BOOM (out-of-order) using four prioritization strategies: Pass Feedback (corpus = 1000), Pass 100 (corpus = 100), New Coverage, and Weighted Feedback. Key findings:

  • SCD Coverage: On Rocket (seq-ct-b contract), the median number of indexed bits after $10^4$ iterations was 26,587 (Pass), 46,256 (New Coverage), and 54,325 (Weighted), a substantial SCD improvement under guided feedback (Mann-Whitney $U$ test, $p < 0.01$). On BOOM (seq-arch contract), Weighted reached about 2.05M bits versus about 1.88M for the other approaches.
  • Leak Discovery: On the Rocket core, no contract violations were found after $10^6$ iterations, which is evidence, though not proof, that Rocket satisfies the tested contracts. On BOOM, out-of-order speculative leaks were found; e.g., load reordering produced timing differences that violate the contract. The median number of test cases to the first leak (100 runs) was 338 (Pass), 1077 (Pass 100), 326 (New Coverage), and 279 (Weighted), with $p = 0.003$ for Weighted vs. Pass 100, indicating that higher SCD coverage accelerates leak discovery.

6. Best Practices, Limitations, and Extensions

Best Practices:

  • Instrument architectural registers (optionally cache lines) for divergence tracking
  • Integrate coverage hash/update logic into the RTL harness to avoid large trace logs
  • Leverage parallel simulation to maximize throughput
  • Maintain precise data-section distinctions to enforce contract-indistinguishability

Limitations:

  • RTL simulation is slow; industrial-scale designs may necessitate hardware emulation or selective sampling
  • Cycle-count attacker model is imprecise for certain side-channels (e.g., cache-line or power analysis)
  • Corpus size impacts coverage-guided biasing; excessive size may dilute feedback efficacy

Extensions:

  • Expanding scope to SoCs by instrumenting interconnects (AXI/TileLink) would expose cross-component leaks
  • Refined attacker models (cache-set, power, ML-based) could increase sensitivity and precision
  • Hybrid approaches that combine fuzzing with incremental synthesis or verification loops could efficiently eliminate candidate contracts beyond the reach of bounded model checking

Coverage-guided hardware-software contract fuzzing, built on self-composition and SCD coverage, enables effective, feedback-driven exploration of microarchitectural state space, accelerating exposure of timing and speculative vulnerabilities in modern, complex processor cores (Geier et al., 11 Nov 2025).
