Coverage-Guided HW-SW Contract Fuzzing
- The paper shows that symbolic leakage contracts combined with self-compositional testing effectively expose subtle microarchitectural side-channel vulnerabilities.
- It introduces a Self-Composition Deviation (SCD) coverage metric that captures dynamic register divergence to guide test prioritization with measurable feedback.
- Experimental evaluations on the Rocket and BOOM cores show that coverage-guided prioritization, particularly weighted feedback, accelerates leak detection on BOOM, while no contract violations were found on Rocket.
Coverage-guided hardware-software contract fuzzing is a pre-silicon methodology for systematically uncovering side-channel vulnerabilities in processor microarchitecture by combining symbolic leakage contract specifications, a self-compositional test harness, and a novel feedback-driven coverage metric. Whereas traditional hardware fuzzing targets functional bugs, coverage-guided contract fuzzing addresses the subtler problem of verifying that a microarchitecture does not violate its information flow security guarantees as formally encoded in architectural-level contracts. This approach makes security-relevant microarchitectural divergences observable and efficiently directs the search toward states most likely to expose contract violations, particularly in the presence of optimizations such as out-of-order or speculative execution (Geier et al., 11 Nov 2025).
1. Hardware-Software Leakage Contracts and Self-Composition
A hardware-software leakage contract augments the instruction set architecture (ISA) semantics with explicit observations that formally characterize the information an attacker may legitimately learn. Define architectural states as $\sigma = (m, a) \in \Sigma$ (memory $m$ and register file $a$), with transitions $\sigma \to \sigma'$ under a program $P$. Observations $o \in \mathcal{O}$ describe leakage, e.g., the instruction fetch address or the address of a data cache access.
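The contract thereby induces a ternary relation $\sigma \xrightarrow{o}_P \sigma'$ between a state, an observation, and a successor state. A contract trace for a program $P$ from initial state $\sigma_0$ is the sequence
$$\sigma_0 \xrightarrow{o_1}_P \sigma_1 \xrightarrow{o_2}_P \cdots \xrightarrow{o_n}_P \sigma_n,$$
with each transition labeled by its observation; write $\mathrm{CTR}(P, \sigma_0) = o_1 o_2 \cdots o_n$ for its observation sequence. The attacker model is a mapping $\mathrm{ATK}$ from a hardware execution trace $\mathrm{HTR}(P, \sigma)$ to observable information. A minimal but effective choice is the cycle count, $\mathrm{ATK}(\mathrm{HTR}(P, \sigma)) = \#\text{cycles}$. Security holds if, whenever two executions of the same program from initial states $\sigma$ and $\sigma'$ yield indistinguishable contract traces, their attacker observations are identical:
$$\forall P, \sigma, \sigma' :\ \mathrm{CTR}(P, \sigma) = \mathrm{CTR}(P, \sigma') \implies \mathrm{ATK}(\mathrm{HTR}(P, \sigma)) = \mathrm{ATK}(\mathrm{HTR}(P, \sigma')).$$
Self-compositional simulation runs two copies of the core in lockstep, one holding secret input $d_1$ and the other $d_2$. Both are initialized under the same clock and reset, and per-cycle divergence across all registers and stateful elements $r_1, \dots, r_R$ is monitored via a bit-vector $\delta_t \in \{0,1\}^R$, where $\delta_t[i] = 1$ iff $r_i$ holds different values in the two copies at cycle $t$.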
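The sketch below implements the two checks just defined in Python. It packs per-register inequality into an integer divergence bit-vector and applies the cycle-count attacker model to one test pair. It assumes register snapshots are plain lists of integers and contract traces are lists of comparable observations. The function names are hypothetical, not the paper's tooling.

```python
from typing import Sequence


def divergence_vector(regs_a: Sequence[int], regs_b: Sequence[int]) -> int:
    """Return the per-cycle divergence bit-vector as an int.

    Bit i is set iff register i holds different values in the two instances.
    """
    if len(regs_a) != len(regs_b):
        raise ValueError("both instances must expose the same registers")
    delta = 0
    for i, (a, b) in enumerate(zip(regs_a, regs_b)):
        if a != b:
            delta |= 1 << i
    return delta


def is_violation(ctr_a: Sequence[object], ctr_b: Sequence[object],
                 cycles_a: int, cycles_b: int) -> bool:
    """Cycle-count attacker: a leak is equal contract traces but unequal timing."""
    return list(ctr_a) == list(ctr_b) and cycles_a != cycles_b


# Register 1 differs between the two copies, so only bit 1 is set.
print(bin(divergence_vector([1, 2, 3], [1, 5, 3])))  # 0b10
```

A pair whose contract traces already differ is never counted as a violation here. Such a pair is simply not a valid test of the contract.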
2. Self-Composition Deviation (SCD) Coverage Metric
Naive fuzzing is effectively blind to side-channel vulnerabilities. Conventional coverage metrics describe which states a single execution reaches, not where two executions on different secrets diverge. Self-Composition Deviation (SCD) coverage is introduced as a metric tailored to side-channel analysis. It tracks not only which registers differ between the two executions but also how these divergence patterns change from cycle to cycle.
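Given the current divergence vector $\delta_t$ and the previous one $\delta_{t-1}$, SCD forms a hashed index into a coverage bit-vector $C$ of $M$ entries (e.g., 2 MiB in size):
$$C\big[\,h(\delta_{t-1}, \delta_t) \bmod M\,\big] \leftarrow 1,$$
where $h$ is a hash function. This per-cycle update records the occurrence of new or shifting divergence patterns. Each test case $T$ has its own coverage vector $C_T$, and the global search maintains a cumulative coverage $C_{\text{global}} = \bigvee_T C_T$, the bitwise OR over all tests executed so far. The search is thus guided toward previously unseen divergence transition patterns, which are crucial for exposing microarchitectural side channels.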
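Here is a minimal sketch of the per-cycle SCD update. It assumes divergence vectors arrive as Python integers (one bit per compared register) and uses a Python set of indices as a sparse stand-in for the coverage bit-vector. The hash (Python's built-in tuple hash), the index-space size, and the all-zero starting divergence are illustrative assumptions, not the paper's choices.

```python
from typing import Iterable, Set

# Hypothetical index-space size (2**21 entries); the paper's sizing may differ.
COVERAGE_BITS = 1 << 21


def scd_index(prev_delta: int, cur_delta: int, size: int = COVERAGE_BITS) -> int:
    """Hash one divergence transition (delta_{t-1}, delta_t) into [0, size)."""
    # Hashes of int tuples are not salted by PYTHONHASHSEED, so indices are
    # stable across runs of the same interpreter.
    return hash((prev_delta, cur_delta)) % size


def scd_coverage(deltas: Iterable[int], size: int = COVERAGE_BITS) -> Set[int]:
    """Per-test coverage: the set of indices whose bit would be 1."""
    covered: Set[int] = set()
    prev = 0  # assumption: no register divergence right after the shared reset
    for cur in deltas:
        covered.add(scd_index(prev, cur, size))
        prev = cur
    return covered


def merge(global_cov: Set[int], test_cov: Set[int]) -> int:
    """OR a test's bits into the cumulative map and return how many were new."""
    new_bits = test_cov - global_cov
    global_cov |= new_bits
    return len(new_bits)
```

Only transitions are hashed. A test that holds one divergence pattern for many cycles therefore contributes just the bits for entering and holding it. A test that keeps switching between patterns keeps adding bits.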
3. Fuzzing Workflow and Feedback Strategies
The contract fuzzing pipeline consists of four stages: program generation/mutation, contract simulation, RTL self-composition, and coverage-driven prioritization.
- A. Generation & Mutation: Programs are random sequences of RISC-V instruction words, up to four instructions each, generated to ensure realistic termination and data dependencies. Each program is paired with two data sections ($d_1$ and $d_2$), which are manipulated using mutation (deletion, insertion, retention) and splicing. Seed management follows an LRU policy.
- B. Contract Simulation: Each program, together with its two data sections, is compiled and run through a Sail-based contract simulator. If the contract traces differ, the inputs are contract-distinguishable and the test is discarded. Only contract-indistinguishable pairs proceed to RTL.
- C. RTL Self-Composition: At RTL, the core is duplicated and instrumented via FIRRTL to compare all register states cycle by cycle. Parallel simulation (Verilator + cocotb) keeps throughput high. On termination, any difference in cycle count between the two instances signals a contract violation (leak).
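- D. Prioritization: Test cases that contribute new bits to the cumulative SCD coverage are prioritized. Under weighted feedback, seeds are selected with probability proportional to a score defined by the rarity of the bits they discovered:
$$\mathrm{score}(T) = \sum_{i \in I_T} \frac{1}{n_i}, \qquad \Pr[\text{select } T] = \frac{\mathrm{score}(T)}{\sum_{T'} \mathrm{score}(T')},$$
where $I_T$ is the set of indices set in the test's SCD coverage vector $C_T$, and $n_i$ is the number of corpus entries that previously covered index $i$.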
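Stage A (generation and mutation) might look like the sketch below. Only the operator names (deletion, insertion, retention, splicing) and the LRU policy come from the description above. The concrete semantics are assumptions made for illustration: single-byte deletion or insertion, retention as a no-op, single-point splicing, and the pool capacity.

```python
import random
from collections import OrderedDict
from typing import List, Optional, Tuple

Seed = Tuple[List[int], List[int]]  # hypothetical: (data_a, data_b) for one program


def mutate(data: List[int], rng: random.Random, max_len: int = 64) -> List[int]:
    """One hypothetical byte-level mutation: delete, insert, or retain."""
    out = list(data)
    op = rng.choice(("delete", "insert", "retain"))
    if op == "delete" and out:
        del out[rng.randrange(len(out))]
    elif op == "insert" and len(out) < max_len:
        out.insert(rng.randrange(len(out) + 1), rng.randrange(256))
    return out  # "retain", or an inapplicable op, leaves the bytes unchanged


def splice(a: List[int], b: List[int], rng: random.Random) -> List[int]:
    """Single-point crossover: a prefix of one data section plus a suffix of another."""
    return a[:rng.randrange(len(a) + 1)] + b[rng.randrange(len(b) + 1):]


class LRUSeedPool:
    """Bounded seed pool that evicts the least recently used seed when full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._seeds: "OrderedDict[int, Seed]" = OrderedDict()
        self._next_id = 0

    def add(self, seed: Seed) -> int:
        sid = self._next_id
        self._next_id += 1
        self._seeds[sid] = seed
        if len(self._seeds) > self.capacity:
            self._seeds.popitem(last=False)  # least recently used entry is at the front
        return sid

    def get(self, sid: int) -> Optional[Seed]:
        if sid not in self._seeds:
            return None
        self._seeds.move_to_end(sid)  # mark as most recently used
        return self._seeds[sid]

    def __len__(self) -> int:
        return len(self._seeds)
```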
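The gating in stage B can be illustrated with the sketch below. It replaces the Sail-based simulator with a hypothetical four-instruction toy ISA and a constant-time-style contract that exposes the program counter and load addresses but not loaded values. It is a sketch under those assumptions, not the seq-ct-b or seq-arch contract used in the experiments.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy ISA:
#   ("li", rd, imm)        rd <- imm
#   ("add", rd, rs1, rs2)  rd <- rs1 + rs2
#   ("load", rd, rs)       rd <- data[rs mod len(data)]
#   ("beqz", rs, target)   if rs == 0: pc <- target
Instr = Tuple
Obs = Tuple[str, int]


def contract_trace(prog: Sequence[Instr], data: Sequence[int],
                   n_regs: int = 8, max_steps: int = 1000) -> List[Obs]:
    """Toy constant-time-style contract: expose the pc of every executed
    instruction and every load address, but never the loaded values."""
    if not data:
        raise ValueError("data section must be non-empty")
    regs = [0] * n_regs
    pc, trace = 0, []
    for _ in range(max_steps):
        if not 0 <= pc < len(prog):
            break  # running off the end of the program terminates it
        ins = prog[pc]
        trace.append(("pc", pc))
        op = ins[0]
        if op == "li":
            regs[ins[1]] = ins[2]
        elif op == "add":
            regs[ins[1]] = regs[ins[2]] + regs[ins[3]]
        elif op == "load":
            addr = regs[ins[2]] % len(data)
            trace.append(("addr", addr))
            regs[ins[1]] = data[addr]
        elif op == "beqz" and regs[ins[1]] == 0:
            pc = ins[2]
            continue
        pc += 1
    return trace


def contract_indistinguishable(prog: Sequence[Instr], data_a: Sequence[int],
                               data_b: Sequence[int]) -> bool:
    """Only pairs with identical contract traces are worth simulating at RTL."""
    return contract_trace(prog, data_a) == contract_trace(prog, data_b)
```

A program that loads a secret from a fixed address passes this gate for any two data sections. A program that uses the loaded secret as the next load address, or branches on it, is typically discarded when the secrets differ.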
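The weighted-feedback score and proportional (roulette-wheel) selection of stage D can be sketched as follows. As a simplifying assumption, each bit's rarity count is taken over the current corpus, including the entry being scored, rather than tracked at discovery time. The uniform fallback for all-zero scores is also an added assumption.

```python
import random
from collections import Counter
from typing import Dict, Hashable, Set


def weighted_scores(corpus: Dict[Hashable, Set[int]]) -> Dict[Hashable, float]:
    """score(T) = sum of 1 / n_i over the coverage bits i of T, where n_i is
    the number of corpus entries covering bit i (rare bits weigh more)."""
    counts = Counter(i for cov in corpus.values() for i in cov)
    return {t: sum(1.0 / counts[i] for i in cov) for t, cov in corpus.items()}


def pick_seed(corpus: Dict[Hashable, Set[int]], rng: random.Random) -> Hashable:
    """Roulette-wheel selection with probability proportional to score."""
    scores = weighted_scores(corpus)
    names = list(scores)
    weights = [scores[n] for n in names]
    if sum(weights) == 0:
        return rng.choice(names)  # assumption: fall back to a uniform choice
    return rng.choices(names, weights=weights, k=1)[0]


# Bit 2 is shared by "a" and "b", so each receives only half credit for it.
print(weighted_scores({"a": {1, 2}, "b": {2}, "c": {3}}))
```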
Pseudocode for the overall fuzzing loop organizes these steps, emphasizing contract-indistinguishability gating, RTL-based leak discovery, and adaptive corpus expansion by SCD novelty.
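A compact Python sketch of that loop follows. The mutator, the contract check, and the RTL self-composition run are passed in as hypothetical callables. Parent selection is uniform here for brevity; a weighted-feedback variant would replace the `rng.choice` call with score-proportional selection.

```python
import random
from typing import Callable, List, Set, Tuple, TypeVar

T = TypeVar("T")


def fuzz_loop(seeds: List[T],
              mutate: Callable[[T, random.Random], T],
              contract_equal: Callable[[T], bool],
              run_selfcomp: Callable[[T], Tuple[int, int, Set[int]]],
              iterations: int,
              rng: random.Random) -> Tuple[List[T], Set[int], List[T]]:
    """Hypothetical skeleton: gate on the contract, detect leaks at RTL,
    and grow the corpus only with tests that add new SCD coverage."""
    corpus: List[T] = list(seeds)
    global_cov: Set[int] = set()
    leaks: List[T] = []
    for _ in range(iterations):
        child = mutate(rng.choice(corpus), rng)
        if not contract_equal(child):
            continue  # contract-distinguishable: not a valid test, discard
        cycles_a, cycles_b, cov = run_selfcomp(child)
        if cycles_a != cycles_b:
            leaks.append(child)  # same contract trace, different timing
        new_bits = cov - global_cov
        if new_bits:
            global_cov |= new_bits
            corpus.append(child)  # novelty-driven corpus expansion
    return corpus, global_cov, leaks
```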
4. Implementation Details: Rocket and BOOM Cores
The approach was implemented atop DifuzzRTL, extended with:
- A FIRRTL pass that duplicates all registers and wires and injects comparators computing the per-cycle divergence vector
- cocotb test harness driving two TileLink channels and memory models with unified control
- Cycle-accurate polling of the RISC-V HTIF tohost flag for robust termination detection
- A Python ProcessPoolExecutor that batches Verilator jobs in parallel, amortizing startup costs
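The termination logic can be modeled without a simulator, as in the sketch below. Both instances are clocked together, and the cycle at which each first reports a nonzero tohost value is recorded. Two simplifying assumptions stand in for the cocotb harness: any nonzero tohost write counts as termination, and each instance is driven through a callable that advances one cycle.

```python
from typing import Callable, Optional, Tuple


def run_lockstep(step_a: Callable[[], int], step_b: Callable[[], int],
                 max_cycles: int = 100_000) -> Tuple[Optional[int], Optional[int]]:
    """Clock both instances together; record each one's first nonzero tohost.

    Each step_* callable advances its instance by one cycle and returns the
    tohost value observed after that cycle (hypothetical interface).
    Returns None for an instance that has not terminated by max_cycles.
    """
    done_a: Optional[int] = None
    done_b: Optional[int] = None
    for cycle in range(1, max_cycles + 1):
        tohost_a = step_a()  # both copies share one clock, so both always step
        tohost_b = step_b()
        if done_a is None and tohost_a != 0:
            done_a = cycle
        if done_b is None and tohost_b != 0:
            done_b = cycle
        if done_a is not None and done_b is not None:
            break
    return done_a, done_b


def leaked(done_a: Optional[int], done_b: Optional[int]) -> bool:
    """A leak is a cycle-count mismatch between two instances that both terminated."""
    return done_a is not None and done_b is not None and done_a != done_b
```

A pair whose termination cycles differ, e.g. (5, 7), would be flagged as a leak. A None entry indicates a timeout rather than a leak.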
For the out-of-order BOOM pipeline, duplication also covered the large reorder buffer, issue queues, and rename tables. Comparisons focused on per-cycle writes to the architectural register file, which avoids having to interpret speculation metadata.
Reset handling was refined to cover not only architectural registers but also memory state and cache invalidation, ensuring reproducible executions.
5. Experimental Evaluation
Experiments comprised fuzzing campaigns with a fixed iteration budget on Rocket (in-order) and BOOM (out-of-order), using four prioritization strategies: Pass Feedback (Pass; corpus size 1000), Pass100 (corpus size 100), New Coverage (NewCoverage), and Weighted Feedback (Weighted). Key findings:
- SCD Coverage: On Rocket (seq-ct-b contract), the median number of set coverage bits at the end of the campaigns was 26,587 (Pass), 46,256 (NewCoverage), and 54,325 (Weighted), a substantial SCD improvement under guided feedback (Mann-Whitney U test). On BOOM (seq-arch contract), Weighted reached ~2.05M bits vs. ~1.88M for the other approaches.
- Leak Discovery: On the Rocket core, no contract violations were found over the full campaigns, consistent with Rocket satisfying the tested contracts. On BOOM, out-of-order speculative leaks were found; e.g., load reordering caused timing differences that violate the contract. The median number of test cases to the first leak over 100 runs was 338 (Pass), 1077 (Pass100), 326 (NewCoverage), and 279 (Weighted), with significance reported for Weighted vs. Pass100. This indicates that higher SCD coverage accelerates leak discovery.
6. Best Practices, Limitations, and Extensions
Best Practices:
- Instrument architectural registers (optionally cache lines) for divergence tracking
- Integrate coverage hash/update logic into the RTL harness to avoid large trace logs
- Leverage parallel simulation to maximize throughput
- Maintain precise data-section distinctions to enforce contract-indistinguishability
Limitations:
- RTL simulation is slow; industrial-scale designs may necessitate hardware emulation or selective sampling
- The cycle-count attacker model is too coarse for certain side channels (e.g., cache-line or power analysis)
- Corpus size impacts coverage-guided biasing; excessive size may dilute feedback efficacy
Extensions:
- Extending the scope to SoCs by instrumenting interconnects (AXI/TileLink) could expose cross-component leaks
- Refined attacker models (cache-set, power, ML-based) could increase sensitivity and precision
- Hybrid approaches using incremental synthesis or verification loops could let fuzzing efficiently eliminate candidate contracts beyond the reach of bounded model checking
Coverage-guided hardware-software contract fuzzing, built on self-composition and SCD coverage, enables effective, feedback-driven exploration of the microarchitectural state space. It accelerates the exposure of timing and speculative vulnerabilities in modern, complex processor cores (Geier et al., 11 Nov 2025).