CovFuzz: Coverage-Guided Fuzzing

Updated 28 November 2025
  • CovFuzz is a methodology implementing coverage-guided fuzzing that leverages coverage feedback from neuron activations and code edge metrics to guide input mutations.
  • It features two implementations—CoCoFuzzing for neural code models and CovFUZZ for cellular protocols—each using tailored mutation engines to explore untested behaviors.
  • Empirical evaluations show substantial gains in error detection and coverage, with adversarial retraining improving model robustness and fuzzing campaigns surfacing protocol-stack vulnerabilities.

CovFuzz refers to a methodology and set of automated frameworks implementing coverage-guided fuzzing for the robust testing of code and protocol implementations. The approach has been realized in two distinct systems: one for neural code models ("CoCoFuzzing," also denoted CovFuzzing) and one for cellular network protocol stacks (CovFUZZ). Both use feedback from coverage metrics (neuron activations in deep models, code edge coverage in software stacks) to guide input mutation and discover erroneous or fragile behaviors.

1. Motivation and Problem Space

Coverage-guided fuzzing seeks to maximize the exploration of code or model state space by generating input mutations that expose previously untraversed behaviors. In conventional software, code coverage metrics (such as line or branch coverage) are used as feedback for mutation engines. In deep learning models, neuron activation coverage has emerged as an analogue.

CovFuzz, as realized in CoCoFuzzing, is motivated by the need to evaluate the robustness of neural code models, whose inputs must satisfy rigid grammatical and semantic constraints. In protocol-stack testing (CovFUZZ), the primary driver is ensuring security and implementation correctness in critical 4G/5G network procedures by surfacing vulnerabilities through protocol-aware fuzzing that targets all non-physical-layer control fields (Siroš et al., 28 Oct 2024, Wei et al., 2021).

2. Core Architectural Components

Both CovFuzz instantiations are characterized by modular architectures that abstract key entities in the fuzzing loop:

  • CovFUZZ (Cellular Protocols):
    • Protocol-Stack Implementation: Includes srsENB, srsEPC, srsUE for 4G and srsGNB plus Open5GS for 5G. Interception hooks at RRC and MAC layers enable in-situ field mutation for RRC, NAS, PDCP, RLC, and MAC.
    • Device-Under-Test (DUT): Can be open-source stack components (enabling "grey-box" feedback) or commercial off-the-shelf (COTS) UEs and modems (necessitating "black-box" operation).
    • Fuzzing Controller: Receives intercepted packets, dissects them via a Wireshark-based library, applies a mutation engine (random or coverage-based), and orchestrates resets, crash detection, and coverage logging. Communication uses low-latency shared memory or sockets.
  • CoCoFuzzing (Neural Code Models):
    • Seed Queue: Repository of syntactically legal, behavior-preserving code snippets for mutation.
    • Mutation Engine: Applies one of ten semantic-preserving transformations, implemented via Java AST analysis.
    • Coverage Analyzer: Hooks into the neural model (PyTorch or TensorFlow), thresholds neuron activations, and tracks coverage progression (a minimal sketch follows below).
    • Test Corpus & Retraining Loop: Accumulates mutants with novel activation profiles and optionally uses them for adversarial retraining.

This architectural emphasis on protocol- and semantics-aware field/code mutation, plus standardized feedback, is foundational across implementations.
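
As a concrete illustration of the coverage-analyzer component, the following minimal PyTorch sketch registers forward hooks, thresholds activations, and tracks cumulative neuron coverage. The threshold value, the Linear-only instrumentation, and all names are illustrative assumptions rather than the papers' exact configuration.

    import torch
    import torch.nn as nn

    class NeuronCoverage:
        # Tracks which neurons have ever fired above a threshold, in the
        # spirit of CoCoFuzzing's coverage analyzer (assumed settings).
        def __init__(self, model: nn.Module, threshold: float = 0.25):
            self.threshold = threshold
            self.fired = {}  # layer name -> boolean mask of covered neurons
            for name, module in model.named_modules():
                if isinstance(module, nn.Linear):
                    module.register_forward_hook(self._make_hook(name))

        def _make_hook(self, name):
            def hook(module, inputs, output):
                # A neuron is covered once it exceeds the threshold on any input.
                now = (output.detach() > self.threshold).any(dim=0).flatten()
                prev = self.fired.get(name)
                self.fired[name] = now if prev is None else prev | now
            return hook

        def coverage(self) -> float:
            total = sum(mask.numel() for mask in self.fired.values())
            hit = sum(int(mask.sum()) for mask in self.fired.values())
            return hit / total if total else 0.0

    # Usage: run a seed and a mutant through the model, compare coverage deltas.
    model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
    cov = NeuronCoverage(model)
    _ = model(torch.randn(32, 8))
    print(f"neuron coverage after one batch: {cov.coverage():.2%}")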

3. Coverage-Guided Mutation Algorithms

Central to CovFuzz is the guidance mechanism that dynamically adapts input mutation based on measured feedback:

  • CovFUZZ Mutation Probability Algorithm:
    • Each candidate packet field $f$ is assigned a per-iteration mutation probability $p^i_f$.
    • Initialization: $p^0_f = k / |F_P|$ for all fields $f$ in packet $P$, clamped to $[0.005, 0.90]$.
    • Update rule after iteration $i$ (with $n^i$ actual mutations and coverage delta $c^i$):

    $$p^i_f \leftarrow p^{i-1}_f + \frac{F(c^i, i)}{\log_2(|V_f| + 1)}$$

    where $|V_f|$ is the size of the value domain of $f$, and $F(c, i)$ models both new-coverage discovery and campaign progress via a per-scenario hyperparameter $\beta$, with empirically optimized values ($\beta \approx 4$ for downlink grey-box, $\beta \approx 2$ for uplink grey-box).

  • CoCoFuzzing Mutant Selection (Coverage-Guided Loop):

    • For each seed program $p$, apply each operator to generate candidate mutants.
    • Evaluate the number of new neurons activated by each candidate.
    • Select the mutant maximizing novel activation, up to a per-seed mutation budget ($MAX = 3$).
    • Repeat for each seed in the queue, accumulating maximally activating test cases.

These methods steer test-input selection toward inputs that probe unexplored code or neuron coverage, subject to domain constraints (Siroš et al., 28 Oct 2024, Wei et al., 2021); a compact sketch of both loops follows.
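
The sketch below renders both guidance loops in Python. The clamping bounds, initialization, update rule, and the $MAX = 3$ budget follow the formulas above; the exact shape of $F(c, i)$ and the role of the mutation count $n^i$ are not reproduced from the paper, so the F below is an assumed stand-in that rewards new coverage, decays with campaign progress, and scales with beta. All function and variable names are hypothetical.

    import math

    def init_field_probs(fields, k=2.0):
        # Initialization: p^0_f = k / |F_P| for every field f in packet P,
        # clamped to [0.005, 0.90]. k is the campaign hyperparameter.
        p0 = min(0.90, max(0.005, k / len(fields)))
        return {f: p0 for f in fields}

    def update_field_probs(probs, domain_sizes, coverage_delta, iteration, beta=4.0):
        # Update: p^i_f <- p^{i-1}_f + F(c^i, i) / log2(|V_f| + 1).
        # This F is an assumption, not the paper's definition.
        def F(c, i):
            return beta * c / (i + 1) if c > 0 else -1.0 / (beta * (i + 1))
        for f in probs:
            step = F(coverage_delta, iteration) / math.log2(domain_sizes[f] + 1)
            probs[f] = min(0.90, max(0.005, probs[f] + step))
        return probs

    def select_best_mutant(seed, operators, new_neurons, max_mutations=3):
        # CoCoFuzzing-style greedy loop: apply every operator, keep the mutant
        # activating the most previously unseen neurons, stop at the per-seed
        # budget MAX = 3 or when no operator reaches new coverage. new_neurons
        # is assumed to be a pure scoring function over the current corpus.
        current = seed
        for _ in range(max_mutations):
            best = max((op(current) for op in operators), key=new_neurons)
            if new_neurons(best) == 0:
                break
            current = best
        return current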

4. Feedback Mechanisms in Black-Box Contexts

When direct coverage feedback from the DUT is unavailable (black-box scenario), CovFuzz leverages proxy coverage obtained from open-source generator stacks:

  • CovFUZZ: Utilizes coverage profiles from srsENB for downlink and srsUE for uplink as proxies when testing COTS UEs or network elements, under the hypothesis that high correlation between generator and DUT coverage suffices for effective guidance. All algorithmic update rules remain unchanged; only $c^i$ is sourced from the instrumented generator stack rather than the DUT.
  • CoCoFuzzing: Measures neuron coverage directly on the model under test, but a plausible implication is that similar proxy strategies may be feasible for models architecturally related to instrumentable reference implementations.

This abstraction broadens coverage-guided fuzzing applicability beyond pure grey-box settings.
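
A minimal sketch of this abstraction: the controller consumes a coverage delta through a single interface, and only the concrete source changes between grey-box and black-box campaigns. The interface and the new_edges() accessor are hypothetical, not srsRAN or CovFUZZ APIs.

    from typing import Protocol

    class CoverageSource(Protocol):
        # The update rules above only ever see a coverage delta c^i.
        def coverage_delta(self) -> float: ...

    class GreyBoxCoverage:
        # Grey-box: edge coverage read directly from the instrumented DUT.
        def __init__(self, dut):
            self.dut = dut
        def coverage_delta(self) -> float:
            return self.dut.new_edges()  # hypothetical accessor

    class ProxyCoverage:
        # Black-box: coverage from the open-source generator stack
        # (srsENB for downlink, srsUE for uplink) stands in for DUT feedback.
        def __init__(self, generator_stack):
            self.stack = generator_stack
        def coverage_delta(self) -> float:
            return self.stack.new_edges()  # hypothetical accessor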

5. Implementation and Mutation Operators

  • CovFUZZ implementation:
    • Written in C++ atop srsRAN 4G/5G and Open5GS.
    • Packet interception at RRC and MAC layers, supporting mutation above the physical layer.
    • Code instrumentation with LLVM AddressSanitizer and SanitizerCoverage for memory-error detection and AFL-style edge coverage.
    • srsUE enhancements include TCP/ZMQ listeners for rapid resets and reduced attach timers for high-throughput testing.
  • CoCoFuzzing mutation operators (Java):
    • Ten semantic-preserving transformations implemented as AST rewrites, including dead store insertion, numerical obfuscation, statement duplication, insertion of unreachable control blocks (if, if-else, switch, for, while), and variable renaming.
    • Each operator guarantees syntactic correctness and semantic preservation, facilitating metamorphic testing.
Operator                 Description                            Guarantee
Dead store (Op1)         Insert unused local variable           No semantic change
Unreachable if (Op5)     Insert "if (false) { ... }" block      Block never executed
Variable rename (Op10)   Rename a variable and all its uses     Behavior unchanged
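
To make the operator idea concrete, here is a deliberately simplified, string-level stand-in for the Op1 dead-store rewrite; the real operator manipulates a parsed Java AST, and the inserted variable name is hypothetical.

    def insert_dead_store(java_method: str, counter: int = 0) -> str:
        # Op1-style sketch: add an unused local variable right after the
        # method body's opening brace. Semantics are unchanged because the
        # variable is never read.
        dead_store = f" int covfuzzUnused{counter} = 0;"
        brace = java_method.index("{") + 1
        return java_method[:brace] + dead_store + java_method[brace:]

    before = "public int add(int a, int b) { return a + b; }"
    print(insert_dead_store(before))
    # -> public int add(int a, int b) { int covfuzzUnused0 = 0; return a + b; }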

6. Empirical Evaluation and Discovered Vulnerabilities

  • CovFUZZ (Protocol Stacks):
    • Grey-box: On srsRAN, 20×2000-iteration runs for uplink and downlink. The coverage-based fuzzer delivered +47.6% (downlink) and +11.9% (uplink) more code coverage than the random fuzzer at optimal $k$ and $\beta$.
    • Black-box: Proxy-feedback fuzzer exceeded random baseline by +23.9% (downlink) and +11.3% (uplink).
    • COTS Devices: 12 devices tested, with 10 showing crash/hang under malformed Attach messages (most via mutation of the sr_PUCCH_ResourceIndex field).
  • Bug classes surfaced in srsRAN:
    • Use-after-free, buffer-overflows (including log routines), and assertion failures.
    • Example: out-of-order RRCConnectionReconfigurationComplete triggers heap overflow in srsENB.
  • CoCoFuzzing (Neural Code Models):
    • On NeuralCodeSum, CODE2SEQ, and CODE2VEC, applying random mutations or individual operators caused substantial metric drops (NeuralCodeSum BLEU -69.5% with a single operator; up to -85% for the most disruptive operators). Neuron-coverage-guided fuzzing drove BLEU as low as -84.8% versus baseline on NeuralCodeSum.
    • Operators activate largely distinct neuron subsets, as measured by Jaccard distance (see the sketch after this list).
    • Coverage-guided mutants achieved higher neuron coverage (up to 48.95% vs. the 47.39% baseline on NeuralCodeSum).
    • Adversarial retraining with CovFuzz-generated examples improved model robustness, increasing BLEU/F1 scores by up to +35.2%.
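
The distinctness of operator activations is quantified with the standard Jaccard distance between the neuron sets two operators activate; a minimal computation over hypothetical neuron-ID sets:

    def jaccard_distance(neurons_a: set, neurons_b: set) -> float:
        # Jaccard distance 1 - |A ∩ B| / |A ∪ B|; values near 1 mean the
        # two operators exercise largely disjoint parts of the model.
        union = neurons_a | neurons_b
        if not union:
            return 0.0
        return 1.0 - len(neurons_a & neurons_b) / len(union)

    # Neuron IDs activated by, e.g., Op1 vs Op5 (hypothetical values):
    print(jaccard_distance({1, 2, 3, 4}, {3, 4, 5}))  # 0.6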

7. Limitations, Generalizability, and Open Research Directions

CovFuzz frameworks rely upon several key assumptions and constraints:

  • Model and language coverage: CoCoFuzzing’s operators are Java-specific; extension to other languages or model architectures would require new semantic-preserving transformations.
  • Metric scope: The community continues to debate the optimal coverage metric (e.g., neuron, layer, or surprise adequacy); CovFuzz implementations are modular and could accommodate alternate feedback mechanisms.
  • Oracle definition: Metamorphic testing assumes strict semantic preservation, which may be violated subtly due to floating-point drift or compiler differences.
  • Mutation budget: A per-seed mutation cap ($MAX = 3$) is enforced to preserve code naturalness, reflecting empirical distributions in real codebases.

A plausible implication is that the modular nature of CovFuzz enables application to other communication protocols (Wi-Fi, Bluetooth, IoT) or deep models, provided a suitable dissection and feedback interface exists. CovFUZZ's fine-grained mutation algorithm and black-box proxy strategy constitute notable contributions for future work (Siroš et al., 28 Oct 2024, Wei et al., 2021).

