FuzzCoder: Automated & LLM Fuzz Testing

Updated 19 March 2026

FuzzCoder is a suite of automated and machine learning–augmented fuzz testing techniques that generate fuzz targets from C codebases using annotated directives.
It employs LLM-guided byte-level mutation and bug-directed synthesis to optimize mutation strategies, enhance code coverage, and accelerate vulnerability detection.
Empirical evaluations demonstrate significant gains in bug yield and execution efficiency through techniques such as directed fuzzing and knowledge graph–enhanced driver synthesis.

FuzzCoder is a suite of techniques and frameworks spanning from fully automated fuzz-target generation in C to LLM-driven, byte-level and directed fuzzing pipelines. Its core aim is to improve the scalability, efficiency, and bug-finding yield of fuzz testing across diverse software artifacts, leveraging advances in program analysis and generative modeling. Below, FuzzCoder is dissected along its major algorithmic strata, following the evolution from rule-based to machine learning–augmented fuzzing paradigms.

1. Automated Fuzz-Target Generation in C Codebases

FuzzCoder originated as an end-to-end system for generating libFuzzer-compatible fuzz targets directly from C codebases without requiring manual harnesses or in-depth developer intervention (Kelly et al., 2019). The workflow consists of four main stages:

1.1 Pipeline Overview

Codebase Scanning: Each translation unit is parsed with a C front-end (e.g., Clang LibTooling), constructing the abstract syntax tree (AST) and symbol table. All function declarations and their associated type signatures are extracted.
Function Selection & Annotation: Functions with signatures deemed serializable are automatically targeted. Developers can annotate functions using the //@fuzztest directive, with sub-directives (Array, Value, Output, Cleanup) to handle parameters and resource management.
Wrapper Generation: For each selected function, FuzzCoder computes a flat byte-based input layout, generates a C harness implementing LLVMFuzzerTestOneInput, and assigns fuzzed bytes to parameter variables (via memcpy or direct assignment) according to annotation semantics.
Build & Run: All generated wrappers are compiled with Clang and AddressSanitizer, and executed using libFuzzer, distributing time budget round-robin among targets. Crashes and leaks are deduplicated using AddressSanitizer's stack-tokenization heuristic.
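The wrapper-generation stage can be illustrated with a minimal sketch. It assumes a hypothetical target `parse_header` that returns `int` and takes only fixed-size primitive parameters, and it uses a hand-written size table; the function and variable names are illustrative and are not taken from the original tool.

```python
# Hypothetical sketch: emit a libFuzzer harness for a function whose
# parameters are all fixed-size primitives. The size table, the target
# name, and the int return type are assumptions for illustration.
SIZES = {"uint8_t": 1, "uint16_t": 2, "uint32_t": 4, "uint64_t": 8}


def emit_harness(func_name, params):
    """params: list of (c_type, name) tuples in call order."""
    total = sum(SIZES[c_type] for c_type, _ in params)
    signature = ", ".join(f"{c_type} {name}" for c_type, name in params)
    lines = [
        "#include <stddef.h>",
        "#include <stdint.h>",
        "#include <string.h>",
        "",
        # Prototype of the function under test (assumed to return int).
        f"int {func_name}({signature});",
        "",
        "int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {",
        f"    if (size < {total}) return 0;",
    ]
    offset = 0
    for c_type, name in params:
        # Copy the next sizeof(type) fuzzed bytes into the parameter.
        lines.append(f"    {c_type} {name};")
        lines.append(f"    memcpy(&{name}, data + {offset}, sizeof({name}));")
        offset += SIZES[c_type]
    args = ", ".join(name for _, name in params)
    lines += [f"    {func_name}({args});", "    return 0;", "}"]
    return "\n".join(lines)


harness = emit_harness(
    "parse_header", [("uint32_t", "flags"), ("uint16_t", "version")]
)
print(harness)
```

The emitted C file would then be compiled with something like `clang -fsanitize=fuzzer,address` alongside the library under test.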

1.2 Signature Serializability and Annotation

FuzzCoder determines serializability by ensuring function parameters are restricted to primitive types and simple structs, forbidding nested pointers, unions, or recursive structures. The handling of the “Array(ptr, len)”, “Value(param, v)”, “Output(param)”, and “Cleanup(cond, fn, ...)” annotations enables sophisticated initialization and resource-management protocols encoded at the comment level.
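As a rough illustration of how such comment-level directives might be read, the sketch below parses one annotation line into directive/argument pairs. The exact comment grammar is not given here, so the single-line form `//@fuzztest Name(args) ...` is an assumption, and arguments containing nested parentheses are deliberately not handled.

```python
import re

# Assumed syntax: "//@fuzztest" followed by zero or more Name(arg, ...)
# sub-directives on the same line. The real grammar may differ.
DIRECTIVE = re.compile(r"(Array|Value|Output|Cleanup)\(([^()]*)\)")
MARKER = "//@fuzztest"


def parse_fuzztest(comment):
    """Return a list of (directive, [args]) pairs, or None if not annotated."""
    text = comment.strip()
    if not text.startswith(MARKER):
        return None
    body = text[len(MARKER):]
    return [
        (name, [arg.strip() for arg in args.split(",")])
        for name, args in DIRECTIVE.findall(body)
    ]


parsed = parse_fuzztest("//@fuzztest Array(buf, len) Value(mode, 0) Output(result)")
print(parsed)
```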

1.3 Program Transformation Semantics

Given parameter types $T_1, \dots, T_k$ (with array pairs removed), the input buffer $B[0 \dots M-1]$ is mapped to parameters via offsets $o_1, \dots, o_k$, where $o_1 = 0$ and $o_i = o_{i-1} + \operatorname{sizeof}(T_{i-1})$ for $i > 1$. Array directives re-cast a slice of $B$ into a contiguous buffer for ptr parameters.
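The offset recurrence can be worked through on a small case. The sketch below takes the first offset to be zero and assumes a packed, little-endian layout with no alignment padding; both choices are assumptions made for illustration, as are the three example types.

```python
import struct


def layout(sizes):
    """Return (offsets, total) for consecutive fields of the given sizes.

    Each offset is the previous offset plus the previous field's size,
    starting from zero; total is the minimum buffer length needed.
    """
    offsets, cursor = [], 0
    for size in sizes:
        offsets.append(cursor)
        cursor += size
    return offsets, cursor


sizes = [4, 2, 8]  # e.g. uint32_t, uint16_t, uint64_t
offsets, total = layout(sizes)

buf = bytes(range(total))  # stand-in for fuzzer-provided bytes
flags = struct.unpack_from("<I", buf, offsets[0])[0]
version = struct.unpack_from("<H", buf, offsets[1])[0]
cookie = struct.unpack_from("<Q", buf, offsets[2])[0]
print(offsets, total)
```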

1.4 Empirical Evaluation and Limitations

Evaluated on a 300K-line, 17K-function C stack, the fully automated mode found roughly 20× more bugs than manual wrapper writing while requiring no developer annotation time; with annotations, setup took under 5 minutes per module (Kelly et al., 2019). However, runs without annotations suffered a high false-positive rate (≈73%) due to parameter mis-serialization and over-permissive calling. Only one Array directive per target and simple signatures are supported; deep pointers and unions remain unhandled.

2. LLM-Guided Byte-Level Mutation Fuzzing

Recent FuzzCoder incarnations harness LLMs to guide byte-level mutations, supplanting uniform-random strategies in coverage-guided fuzzers such as AFL (Yang et al., 2024). At its core, FuzzCoder replaces AFL’s mutation operator with an LLM that predicts mutation positions and strategies using a sequence-to-sequence (seq2seq) Transformer.

2.1 Architecture and Predictive Model

AFL presents an input $x$ (a byte sequence) to FuzzCoder, which outputs a target sequence $y = (p_1, s_1, \dots, p_k, s_k)$, where each $p_j$ is a byte offset and each $s_j$ is one of twelve fixed mutation strategies. AFL applies each mutation to $x$, producing $z$, which is then fed to the target. The model is formalized as an autoregressive distribution $P(y \mid x; \Theta)$ over interleaved positions and strategies, and is implemented as an encoder-decoder (or decoder-only) Transformer.
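A minimal sketch of how the interleaved output could be consumed on the fuzzer side follows. It assumes the model output has already been converted to integers and that strategies are numbered from zero; discarding out-of-range predictions is a defensive choice of this sketch rather than documented behavior.

```python
def decode_pairs(tokens, input_len, num_strategies=12):
    """Split a flat sequence (p_1, s_1, ..., p_k, s_k) into valid (p, s) pairs."""
    if len(tokens) % 2 != 0:
        tokens = tokens[:-1]  # drop a trailing position that has no strategy
    pairs = []
    for position, strategy in zip(tokens[0::2], tokens[1::2]):
        # Keep only predictions that index into the input and the strategy set.
        if 0 <= position < input_len and 0 <= strategy < num_strategies:
            pairs.append((position, strategy))
    return pairs


pairs = decode_pairs([0, 3, 7, 11, 99, 1, 2], input_len=8)
print(pairs)
```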

2.2 Supervised Fine-Tuning (Fuzz-Instruct Dataset)

The Fuzz-Instruct dataset compiles ~30,000 mutation records from AFL on diverse formats (ELF, JPEG, MP3, XML, TIFF, GIF). Each mutation that produced a new path or a crash yields a training pair $(x, y)$, where $x$ is the input bytes and $y$ is the sequence of effective mutations. Training employs AdamW, negative log-likelihood loss, and large batch sizes, with models such as StarCoder-2 and CodeQwen benefitting from multi-epoch fine-tuning.

2.3 Mutator Semantics

The twelve predicted strategies implement classic AFL operations (bit flips, arithmetic increments and decrements, and interesting-value substitutions) at controlled offsets, enabling the model to selectively target protocol fields, magic numbers, and structural fields in file formats.
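To make the operator families concrete, the sketch below implements one representative of each (a bit flip, a byte-wise arithmetic add, and a boundary-value substitution) and applies them at chosen offsets. The strategy ids, the three-operator subset, and the boundary-value table are illustrative assumptions, not AFL's exact tables or the twelve strategies themselves.

```python
BOUNDARY_VALUES = [0x00, 0x01, 0x7F, 0x80, 0xFF]  # illustrative, not AFL's table


def bitflip(buf, pos, bit=0):
    buf[pos] ^= 1 << bit  # flip one bit of the byte in place


def arith_add(buf, pos, delta=1):
    buf[pos] = (buf[pos] + delta) % 256  # wrap around on overflow


def substitute(buf, pos, index=0):
    buf[pos] = BOUNDARY_VALUES[index % len(BOUNDARY_VALUES)]


MUTATORS = {0: bitflip, 1: arith_add, 2: substitute}  # hypothetical ids


def apply_mutations(data, pairs):
    """Apply each (position, strategy) pair to a copy of data."""
    out = bytearray(data)
    for pos, strategy in pairs:
        MUTATORS[strategy](out, pos)
    return bytes(out)


mutated = apply_mutations(b"\x89PNG", [(0, 0), (1, 1), (3, 2)])
print(mutated)
```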

2.4 Metrics and Results

Performance is measured via Effective Proportion of Mutation (EPM), computed as

$\mathrm{EPM} = \frac{\#\{\text{mutations causing new coverage}\}}{\text{total mutations}}$

and Number of Crashes (NC):

$\mathrm{NC} = \sum_{i=1}^T \mathbf{1}[\text{mutation } i \text{ causes crash}]$
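Both metrics reduce to simple counts over a log of mutation outcomes. The sketch below assumes a hypothetical record format with two boolean fields per mutation; the field names are not taken from the original work.

```python
def epm(outcomes):
    """Fraction of mutations that produced new coverage."""
    if not outcomes:
        return 0.0
    return sum(1 for o in outcomes if o["new_coverage"]) / len(outcomes)


def nc(outcomes):
    """Number of mutations that caused a crash."""
    return sum(1 for o in outcomes if o["crash"])


log = [
    {"new_coverage": True, "crash": False},
    {"new_coverage": False, "crash": False},
    {"new_coverage": False, "crash": True},
    {"new_coverage": False, "crash": False},
]
print(epm(log), nc(log))
```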

Across eight Fuzz-Bench programs, FuzzCoder improved EPM by 3–4× and discovered up to ~2× more crashes versus baseline AFL. Branch and line coverage also rose by factors up to 2–3× (Yang et al., 2024).

3. LLM-Driven Directed Fuzzing with Bug-Specific Mutator Synthesis

FuzzCoder has been generalized to implement directed fuzzing: finding inputs that not only maximize global coverage but specifically reach and trigger high-priority program locations or bugs (Feng et al., 2025).

3.1 Formalization

Directed fuzzing is framed as maximizing the joint probability that an input trace $\tau$ (a) reaches the bug location $\ell$, i.e. $R(\tau, \ell) = 1$, and (b) triggers the bug. The expected time to discovery $E[T(S, M)]$, which depends on the seed set $S$ and the mutator set $M$, is minimized via smarter seed and mutator selection.

3.2 Seed Generation via Call-Chain Reasoning

Static analysis extracts the call chain from the program entry to the vulnerable function. LLMs are prompted with the function summaries and tasked to generate, then iteratively refine, input samples that traverse the call chain, sidestepping the randomness and inefficiency of classical seeding. Candidate evaluation combines run-time instrumentation with cost models that apply a softmax over path distances to the bug location.
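One way to turn path distances into seed priorities is a softmax over negated distances, so that seeds closer to the bug location receive more weight. The negation and the temperature parameter are assumptions of this sketch; the text above only states that a softmax over path distances is used.

```python
import math


def seed_weights(distances, temperature=1.0):
    """Softmax over negated distances: smaller distance, larger weight."""
    scaled = [-d / temperature for d in distances]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


weights = seed_weights([1.0, 3.0, 10.0])
print(weights)
```

Raising the temperature flattens the distribution, which keeps distant seeds in play for longer.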

3.3 Bug-Driven Mutator Construction

Upon successful reachability, FuzzCoder instructs the LLM to analyze bug reports or code, extracting constraints and suggesting specialized mutations. These are incorporated into a new weighted distribution over mutation operators, biasing execution toward condition-triggering behaviors.
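The reweighting step can be sketched as multiplying the weight of every suggested operator by a boost factor and renormalizing. The operator names, the boost value, and the multiplicative scheme are all hypothetical choices for illustration.

```python
import random


def reweight(base_weights, suggested, boost=4.0):
    """Boost suggested operators, then normalize to a probability distribution."""
    raw = {
        op: weight * (boost if op in suggested else 1.0)
        for op, weight in base_weights.items()
    }
    total = sum(raw.values())
    return {op: weight / total for op, weight in raw.items()}


def pick_operator(dist, rng):
    """Sample one operator according to the distribution."""
    ops = list(dist)
    return rng.choices(ops, weights=[dist[op] for op in ops], k=1)[0]


base = {"bitflip": 1.0, "arith": 1.0, "length_overflow": 1.0}
dist = reweight(base, suggested={"length_overflow"})
choice = pick_operator(dist, random.Random(0))
print(dist, choice)
```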

3.4 Integration and Results

A three-phase architecture consists of offline reasoning (seed and mutator construction), runtime fuzzing (guided mutations with coverage and bug-trigger monitoring), and periodic feedback. On 14 real CVEs, FuzzCoder achieved 2×–5× acceleration in bug discovery compared to AFLGo, Beacon, WindRanger, and SelectFuzz. Setup overhead was modest (600 s per bug), with statistical tests confirming significance (Feng et al., 2025).

4. Knowledge Graph–Enhanced Fuzz Driver Synthesis

Another variant, embodied in CKGFuzzer, extends FuzzCoder’s reasoning by leveraging code knowledge graphs for automated fuzz driver generation, refinement, and crash triage (Xu et al., 2024).

4.1 Code Graph Construction

Static interprocedural analysis builds a property graph $G = (V,E)$ , with nodes representing functions/files and edges representing containment and call relations. Nodes are annotated with signatures, implementations, and LLM-generated summaries.
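A property graph of this kind can be sketched with plain dictionaries: nodes carry attribute maps, and edges are keyed by source and relation. The class, the relation names `contains` and `calls`, and the example functions are hypothetical and stand in for whatever schema the real system uses.

```python
from collections import defaultdict


class CodeGraph:
    def __init__(self):
        self.nodes = {}                # node id -> attribute dict
        self.edges = defaultdict(set)  # (source, relation) -> set of targets

    def add_node(self, node_id, kind, **attrs):
        self.nodes[node_id] = {"kind": kind, **attrs}

    def add_edge(self, source, relation, target):
        self.edges[(source, relation)].add(target)

    def reachable(self, start, relation="calls"):
        """All nodes reachable from start by following one relation."""
        seen, stack = set(), [start]
        while stack:
            current = stack.pop()
            for nxt in self.edges.get((current, relation), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


graph = CodeGraph()
graph.add_node("codec.c", "file")
graph.add_node("read_frame", "function", signature="int read_frame(const char *)")
graph.add_node("decode", "function", summary="decodes one frame")
graph.add_node("inflate_block", "function")
graph.add_edge("codec.c", "contains", "read_frame")
graph.add_edge("read_frame", "calls", "decode")
graph.add_edge("decode", "calls", "inflate_block")
print(graph.reachable("read_frame"))
```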

4.2 Multi-Agent LLM Pipeline

The system orchestrates four agents:

API Retrieval: Queries the graph to identify high-value API combinations.
Driver Generation: LLMs synthesize LLVMFuzzerTestOneInput drivers for candidate API sets.
Program Repair: LLMs auto-fix compilation errors guided by a dynamic knowledge base of correct usages.
Coverage-Guided Mutation: Post-fuzzing, coverage results guide further API combination synthesis, dynamically amplifying exploration of under-tested code.
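The coverage feedback in the last step can be approximated by a simple heuristic: rank APIs by how little of their code has been exercised and propose the least-covered combination that has not been tried yet. This is only a sketch under that assumption; the API names and the coverage map are made up, and the actual system delegates combination synthesis to LLM agents.

```python
from itertools import combinations


def next_combination(coverage, tried, combo_size=2):
    """Return the least-covered untried API combination, or None if exhausted."""
    # Sort by coverage, breaking ties by name so the result is deterministic.
    ranked = sorted(coverage, key=lambda api: (coverage[api], api))
    for combo in combinations(ranked, combo_size):
        if frozenset(combo) not in tried:
            return combo
    return None


coverage = {"open_stream": 0.9, "read_chunk": 0.2, "set_option": 0.1}
first = next_combination(coverage, tried=set())
second = next_combination(coverage, tried={frozenset(first)})
print(first, second)
```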

4.3 Crash Report Analysis

Sanitizer-triggered crashes are cross-referenced to code locations and pattern-matched (via chain-of-thought LLM prompts) against a CWE knowledge base, distinguishing driver misuse from library bugs.

4.4 Empirical Outcomes

On eight C libraries, average branch coverage improved by 8.73% over prior LLM prompting approaches, and manual crash review effort was reduced by 84.4%. Unique bugs (including nine previously unreported) were found with high accuracy in driver-bug classification (Xu et al., 2024).

5. Limitations and Integration Guidelines

FuzzCoder variants face several constraints:

The original C harness approach does not support complex signatures (nested pointers, unions) and is susceptible to false positives in the absence of explicit annotations.
LLM-based mutation guidance is bounded by the representativeness of its instruction set and dataset; inference costs necessitate GPU resources.
Knowledge-graph-driven workflows can bottleneck on graph construction and require careful tuning of context retrieval to avoid noise or LLM hallucinations.

Best practices recommend:

Initializing with fully automated target generation for broad shallow bug finding, followed by quick annotation passes to improve precision and reduce spurious results.
Integrating outputs into continuous integration, with coverage and bug deduplication, and incrementally enhancing type support.
For LLM-driven and knowledge-graph–enhanced approaches, maintaining a feedback loop between program analysis, LLM mutation/generation, and crash/coverage analytics, periodically retraining on the latest successful fuzz traces and updating repair/triage modules.

6. Future Directions

Prospective enhancements include:

Extending mutator/operator vocabularies, possibly via online learning or reinforcement signals tied to feedback from fuzzing campaigns (Yang et al., 2024).
Exploiting program context, edge-coverage, or dynamic symbolic traces to guide input generation and mutation, particularly for highly structured or protocol-based inputs.
Adapting knowledge graph and LLM pipeline methods to object-oriented or managed languages, and exploring ensembles or task-specific adapters for different codebases (Xu et al., 2024).
Closing the offline-to-online loop by incorporating on-the-fly mutation/operator synthesis and adaptive prompt conditioning based on observed fuzzing outcomes.

Across its incarnations, FuzzCoder establishes a family of automated, scalable, and model-augmented fuzz testing techniques—substantially raising bug discovery rates, code coverage, and engineering productivity compared with traditional and even prior LLM-driven fuzzer workflows.

References (4)

A Case Study on Automated Fuzz Target Generation for Large Codebases (2019)

FuzzCoder: Byte-level Fuzzing Test via Large Language Model (2024)

Fuzzing: Randomness? Reasoning! Efficient Directed Fuzzing via Large Language Models (2025)

CKGFuzzer: LLM-Based Fuzz Driver Generation Enhanced By Code Knowledge Graph (2024)
