Scheduzz: Automatic Fuzzing with LLMs

Updated 27 July 2025
  • Scheduzz is an advanced automatic library fuzzing system that uses LLM-driven constraint extraction and dual scheduling to synthesize rational API drivers.
  • It integrates explicit AST analysis with implicit LLM-based constraints to overcome combinatorial driver synthesis challenges and maximize bug discovery.
  • Its dual scheduling mechanism efficiently allocates resources to promising driver candidates, improving code coverage and reducing false positives.

Scheduzz is an advanced automatic library fuzzing technique that employs LLMs and a constraint-based dual scheduling framework to generate and execute “rational” fuzz drivers for API-based libraries. Its design addresses both the combinatorial explosion and inefficiency of traditional fuzz driver synthesis methods by formalizing the library fuzzing process as an online optimization problem, integrating explicit type constraints and implicit usage constraints, and managing computational resources through asynchronous scheduling components. The system is evaluated on extensive real-world libraries, showing significant improvements in code coverage and bug discovery over previous state-of-the-art methods.

1. Motivations and Problem Context

Scheduzz targets the core challenge of automated library fuzzing: synthesizing drivers that not only invoke APIs in type-correct sequences but also comply with implicit library usage conventions. Conventional fuzz driver generation approaches often suffer from an exponential combination space over $N$ exported functions ($O(2^N)$); a library with just 30 exported functions already admits roughly $10^9$ candidate subsets. This leads to many “irrational” drivers: sequences that misuse APIs by failing to close resources, violating API ordering dependencies, or mixing incompatible calls. Executing these drivers wastes resources and contributes to false positive bug reports. Scheduzz seeks to maximize the discovery of genuine library bugs by generating only rational, high-value drivers and judiciously allocating computational resources to the most promising driver candidates.

2. LLM-Based Constraint Extraction

Scheduzz’s centerpiece is the combined extraction of two categories of constraints:

  • Explicit constraints: Derived by parsing header files and API signatures using AST analysis and symbol extraction (LLVM/Clang-based), capturing type-correct argument passing and function accessibility relationships.
  • Implicit constraints: Extracted by querying powerful LLMs (such as GPT-3.5-turbo) with prompts that synthesize information from source code comments and library documentation. These constraints encode logical dependencies (“if function A is called, function B must follow”), mutual exclusions (conflicts), and typical usage idioms invisible to static analyzers; a prompt sketch follows this list.
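
To make the extraction step concrete, the query might look roughly like the sketch below. The prompt wording, the JSON constraint schema, and the `extract_implicit_constraints` helper are illustrative assumptions, not Scheduzz's actual implementation; only the use of GPT-3.5-turbo is taken from the paper.

```python
# Illustrative sketch (not Scheduzz's actual prompt): ask an LLM to emit
# implication and conflict constraints for a set of API signatures.
import json

from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT_TEMPLATE = """You are analyzing a C library.
Given these API signatures and their documentation comments:

{api_docs}

List usage constraints, one JSON object per line:
  {{"kind": "implies", "first": "<api>", "then": "<api>"}}
  {{"kind": "conflict", "a": "<api>", "b": "<api>"}}
Output only JSON lines."""

def extract_implicit_constraints(api_docs: str) -> list[dict]:
    """Query the LLM and parse one JSON constraint per output line."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": PROMPT_TEMPLATE.format(api_docs=api_docs)}],
        temperature=0,  # deterministic output simplifies parsing and caching
    )
    constraints = []
    for line in resp.choices[0].message.content.splitlines():
        line = line.strip()
        if line.startswith("{"):
            try:
                constraints.append(json.loads(line))
            except json.JSONDecodeError:
                pass  # drop malformed/hallucinated lines instead of crashing
    return constraints
```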

The system formalizes constraint validity with the following conditions:

  • Valid API combination condition:

$$\forall f \in C.\ \exists g \in C \text{ such that } \mathrm{deps}(f, g) \text{ or } \mathrm{deps}(g, f), \quad f \ne g$$

  • Rationality via implication ($P \to Q$, denoted $P \hookrightarrow Q$) and conflict ($P \otimes Q$) constraints.

These sets of constraints form the basis for filtering out irrational driver candidates before resource-intensive fuzz execution.
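
As a hedged illustration of that pre-execution filter, the validity and rationality conditions above translate almost directly into code. The `png_*` API names and the set-based encoding of the $\mathrm{deps}$, implication, and conflict relations below are invented for the example:

```python
# Sketch of the pre-execution rationality filter. The png_* names and the
# set-encoded relations are illustrative, not taken from a real library.
from itertools import combinations

deps = {("png_create", "png_write"), ("png_write", "png_destroy")}  # deps(f, g)
implies = {("png_create", "png_destroy")}                           # P -> Q
conflicts = {("png_read", "png_write")}                             # P (x) Q

def is_valid_combination(C: set[str]) -> bool:
    """Every f in C must be dependency-related to some other g in C."""
    return all(
        any((f, g) in deps or (g, f) in deps for g in C if g != f)
        for f in C
    )

def is_rational(C: set[str]) -> bool:
    """Reject candidates violating implication or conflict constraints."""
    if any(p in C and q not in C for p, q in implies):
        return False
    if any(a in C and b in C for a, b in conflicts):
        return False
    return is_valid_combination(C)

# Keep only rational small combinations before spending any fuzzing time.
apis = ["png_create", "png_write", "png_read", "png_destroy"]
rational = [set(c) for r in (2, 3)
            for c in combinations(apis, r) if is_rational(set(c))]
```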

3. Dual Scheduling Framework

Scheduzz implements a dual scheduling architecture, composed of two asynchronous, interacting schedulers:

  • Group Scheduler (GS): Selects promising tuples of API calls, optimizing for criteria such as expected code coverage, group similarity to past successes, API group length (often $\leq 5$), and entropy-based diversity. GS ensures synthesized API sets are both diverse and aligned with promising past patterns.
  • Driver Scheduler (DS): Manages a pool of generated fuzz drivers, dynamically allocating execution time (“energy”) based on signals including per-driver code coverage yield, execution time per coverage unit, and historical feedback on effectiveness. DS periodically suspends or resumes execution of drivers according to this priority.

Both GS and DS cooperate in an online optimization setting:

$$\max \sum_{i=1}^{K} \sum_{j=1}^{N} \mathrm{cov}(D_R(u_i[j]))$$

where $D_R$ is a rational fuzz driver, $u_i$ the API index sequence at stage $i$, $\mathrm{cov}(\cdot)$ the observed code coverage, and $K$/$N$ the number of stages and combinations considered. This structure transforms fuzz driver resource allocation into a continuous, feedback-driven maximization problem.
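
A minimal sketch of how the Driver Scheduler's energy allocation could be driven by this objective is shown below; the proportional policy, the `Driver` fields, and the 5% suspension threshold are assumptions for illustration rather than the paper's exact algorithm.

```python
# Minimal Driver Scheduler sketch: split the per-round time budget among
# drivers in proportion to coverage yield per CPU-second, suspending
# low-yield drivers. Policy and fields are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Driver:
    name: str
    new_cov: int = 0          # branches newly covered in the last round
    cpu_seconds: float = 1.0  # execution time spent in the last round

    def yield_rate(self) -> float:
        # Coverage gained per unit time; the +1 smooths cold starts.
        return (self.new_cov + 1) / self.cpu_seconds

def allocate_energy(drivers: list[Driver], budget_s: float) -> dict[str, float]:
    """Proportional allocation; shares under 5% of the budget are suspended."""
    total = sum(d.yield_rate() for d in drivers)
    shares = {d.name: budget_s * d.yield_rate() / total for d in drivers}
    return {n: (s if s >= 0.05 * budget_s else 0.0) for n, s in shares.items()}

pool = [Driver("drv_write_path", new_cov=120, cpu_seconds=30),
        Driver("drv_read_path", new_cov=5, cpu_seconds=60)]
print(allocate_energy(pool, budget_s=600))  # the read-path driver is suspended
```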

4. Implementation and Empirical Evaluation

Scheduzz is implemented in approximately 7,000 source lines, with Python orchestrating constraint extraction, LLM prompting, schedule management, and result evaluation; Prolog and shell scripts are used for constraint satisfaction and driver pipeline automation. It leverages Clang/LLVM infrastructure for AST-based type inference and symbol retrieval.
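
On the explicit-constraint side, signature harvesting with Clang's Python bindings might look roughly as follows; `mylib.h` is a placeholder header and the function is a sketch, not Scheduzz's actual extractor.

```python
# Sketch: harvest function signatures from a library header with libclang
# (pip install libclang). "mylib.h" is a placeholder path.
import clang.cindex as ci

def extract_signatures(header: str) -> list[dict]:
    index = ci.Index.create()
    tu = index.parse(header, args=["-x", "c"])  # parse as plain C
    sigs = []
    for cur in tu.cursor.walk_preorder():
        # Keep only function declarations located in this header itself.
        if (cur.kind == ci.CursorKind.FUNCTION_DECL
                and cur.location.file is not None
                and cur.location.file.name == header):
            sigs.append({
                "name": cur.spelling,
                "returns": cur.result_type.spelling,
                "params": [(p.type.spelling, p.spelling)
                           for p in cur.get_arguments()],
            })
    return sigs

# A producer returning type T can feed any consumer taking T as a parameter,
# seeding the explicit deps(f, g) relation used by the rationality check.
print(extract_signatures("mylib.h"))
```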

Benchmarked on 33 diverse real-world libraries from Utopia and PromptFuzz, Scheduzz demonstrated:

| Method | Relative Branch/Region Coverage (Mean/Max) | Notable Bug Findings |
| --- | --- | --- |
| Scheduzz | 1.62× / 1.89× (vs. CKGFuzzer/OSS-Fuzz) | 33 unknown bugs (3 with CVEs assigned) |
| CKGFuzzer | Baseline | |
| PromptFuzz | Baseline | |
| OSS-Fuzz | Baseline | |

Scheduzz reported substantially lower stillborn driver rates (non-compiling/crashing drivers), and its resource-aware execution prevented over-allocation to unproductive driver variants.

5. Impact and Technical Contributions

Scheduzz’s design has several implications:

  • Bug finding efficacy: The tool identified 33 previously unknown bugs in “well-tested” libraries, three of which received CVEs, affirming the value of rational fuzzing sequences.
  • Resource utilization: The dual scheduling concept ensures that computational resources are preferentially channeled toward driver candidates likely to increase program coverage, avoiding combinatorially wasteful execution.
  • Constraint formalization: By unifying explicit and implicit constraints through LLM-driven analysis, the coverage of valid API usage idioms is substantially broadened beyond that achievable with static analysis alone, leading to higher-fidelity driver generation.
  • Reduction of false positives: The rationality check suppresses generation and execution of erroneous driver sequences, reducing noise and focusing bug reports on actionable failures.

6. Limitations and Future Directions

Scheduzz’s LLM-based approach introduces a dependency on prompt engineering and on the accuracy of LLM outputs. Hallucinated constraints can be mitigated by Retrieval-Augmented Generation (RAG) or integration of external knowledge graphs, as proposed for future research. Additionally, refining explicit constraint extraction, e.g., by better distinguishing input/output parameters using LLM analysis, remains a target for enhancement. Scaling to libraries with hundreds of APIs poses a challenge to combinatorial tractability, suggesting the need for decomposition strategies or for leveraging historical usage patterns as priors for driver generation.

Potential future work includes:

  • Integrating richer domain knowledge to further curtail LLM hallucination.
  • Enhancing scalability to encompass large, complex library interfaces through hierarchical scheduling or progressive refinement strategies.
  • Extending the technique to other forms of code coverage optimization or integrating with continuous integration fuzzing infrastructures.

7. Broader Implications for Automated Fuzzing

Scheduzz’s approach directly addresses limitations in existing fuzz driver generation pipelines by combining state-of-the-art LLM inference with formal constraint satisfaction and adaptive scheduling. The methodological advancement of mapping driver synthesis and execution to an online optimization problem, with explicit mathematical modeling, offers a framework applicable in other domains where combinatorial resource allocation is central. This integration of natural language processing, program analysis, and optimization is positioned to inform the evolution of fuzzing infrastructures, particularly in settings where correctness, efficiency, and high coverage are required.