
LibLMFuzz: Autonomous Fuzzing for Binaries

Updated 23 July 2025
  • LibLMFuzz is a framework that automatically generates fuzz targets for binary-only libraries using LLM-guided context planning and traditional software tools.
  • It employs a multi-phase pipeline involving binary analysis, LLM-driven code synthesis, and iterative error correction to achieve 100% API coverage.
  • The system demonstrates scalability: most drivers (75.52%) ran correctly on first execution, and the iterative repair loop resolved the remainder, significantly reducing manual intervention.

LibLMFuzz is a framework for automatic fuzz target generation for binary-only (black-box) libraries using a combination of LLMs and traditional software tooling. Developed to address the challenge of efficiently fuzzing closed-source and proprietary libraries, LibLMFuzz integrates an agentic LLM with tools for binary analysis, code compilation, and fuzzing, enabling fully autonomous planning, driver synthesis, and iterative self-repair without human intervention (Hardgrove et al., 20 Jul 2025).

1. System Architecture and Workflow

LibLMFuzz operates as middleware that orchestrates collaboration between an LLM agent (using frameworks such as LangChain and APIs like Gemini 2.0 Flash) and conventional tools (including disassemblers, compilers, and fuzzing engines). The architecture proceeds in a multi-phase pipeline:

  1. Phase 0: Binary Analysis and Metadata Extraction
    • The middleware leverages a disassembler (e.g., radare2) to inspect the supplied binary. It identifies the exported functions and selects those amenable to fuzzing, specifically functions accepting external or user-controlled input.
  2. Phase 1: LLM-Guided Context Planning
    • In this phase, the middleware coordinates an interactive dialogue with the LLM, supplying it with extracted function signatures and structural data from the binary.
    • The LLM is prompted to develop a plan for fuzzing each function, requesting further context or disassembly if required. This iterative interaction builds up direct contextual knowledge for code generation.
  3. Phase 2: Driver Generation and Repair
    • Using accumulated context, the LLM generates minimal C/C++ fuzz drivers for each target function. The generated code targets the construction of randomized or structured input buffers compatible with the extracted function prototypes.
    • The drivers are compiled using LLVM’s clang and linked with libFuzzer for testing in a sandboxed environment. Any compilation or runtime failures are relayed verbatim to the LLM, which is tasked with autonomously revising and self-repairing the driver code.
    • This cyclic process iterates until a driver is both syntactically correct and yields nominal runtime behavior.
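Phase 0 can be sketched as a symbol-listing and filtering step. The sketch below assumes radare2's Python bindings (r2pipe); the filtering heuristic and symbol names are illustrative, not the paper's exact criteria.

```python
# Sketch of Phase 0 (binary analysis). The heuristic below — keep globally
# bound exported functions — is an illustrative stand-in for the paper's
# fuzzability filter.
def select_fuzzable(symbols):
    """Keep exported functions that plausibly accept external input.

    `symbols` is a list of dicts shaped like radare2's `isj` JSON output:
    {"name": ..., "type": "FUNC", "bind": "GLOBAL", ...}.
    """
    return [
        s["name"] for s in symbols
        if s.get("type") == "FUNC" and s.get("bind") == "GLOBAL"
    ]

# Against a live binary, the middleware would gather symbols via r2pipe, e.g.:
#   import r2pipe
#   r2 = r2pipe.open("./libcjson.so")
#   symbols = r2.cmdj("isj")   # exported-symbol list as JSON
#
# Stubbed data mimicking that JSON shape:
symbols = [
    {"name": "cJSON_Parse", "type": "FUNC", "bind": "GLOBAL"},
    {"name": "cJSON_Delete", "type": "FUNC", "bind": "GLOBAL"},
    {"name": "internal_hash", "type": "FUNC", "bind": "LOCAL"},
]
print(select_fuzzable(symbols))  # ['cJSON_Parse', 'cJSON_Delete']
```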

This phased approach enables end-to-end automation of fuzz target generation for stripped binaries, circumventing the necessity for source code or human expertise in reverse engineering the APIs.
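A driver of the kind Phase 2 synthesizes is a minimal libFuzzer harness. The template below is a hypothetical example, held as a string the middleware might write out and compile with `clang -fsanitize=fuzzer`; the target function and the substitution helper are assumptions, not the paper's generated code.

```python
# Hypothetical example of a minimal C fuzz driver of the kind Phase 2
# produces. The target function (cJSON_Parse) and the void* return type
# are illustrative; stripped binaries often reduce types this way.
DRIVER_TEMPLATE = r'''
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Declaration recovered from the binary; type fidelity is limited. */
extern void *cJSON_Parse(const char *value);

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    /* Copy the fuzzer's buffer into a NUL-terminated string. */
    char *input = malloc(size + 1);
    if (!input) return 0;
    memcpy(input, data, size);
    input[size] = '\0';
    cJSON_Parse(input);
    free(input);
    return 0;
}
'''

def render_driver(func_name: str) -> str:
    """Substitute a target function into the template (simplified)."""
    return DRIVER_TEMPLATE.replace("cJSON_Parse", func_name)

print("LLVMFuzzerTestOneInput" in render_driver("png_read_info"))  # True
```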

2. Methodologies for Binary-Only Fuzzing

LibLMFuzz's methodology is designed for environments where conventional type and semantic information is obscured:

  • The disassembler provides function names and argument structures with limited type fidelity—complex structures or pointers are often simplified (e.g., reduced to 64-bit integer representations).
  • LLM prompts are carefully engineered to discourage speculative inference (“do not guess at types”) and to work solely with confirmed, tool-supplied context.
  • The middleware engages in iterative feedback: error strings from the compiler or runtime environment are incorporated into new LLM prompts, refining the LLM's understanding and prompting corrective action.
  • The LLM’s code synthesis emphasizes planning around buffer sizing, input mutability, and fuzz strategy, despite incomplete semantic details.
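The iterative-feedback bullet above can be sketched as a prompt builder that folds tool output verbatim into the next request. The prompt wording and constraint phrasing are illustrative assumptions, not the paper's actual templates.

```python
# Sketch of feedback-driven re-prompting: compiler/runtime errors are
# relayed verbatim, and speculative inference is explicitly discouraged.
def build_repair_prompt(driver_src: str, tool_output: str) -> str:
    """Fold build/runtime errors into the next LLM prompt."""
    return (
        "You previously generated this fuzz driver:\n"
        f"{driver_src}\n\n"
        "Compiling/running it produced these errors (verbatim):\n"
        f"{tool_output}\n\n"
        "Revise the driver to fix the errors. Use only the function "
        "signatures already provided; do not guess at types."
    )

prompt = build_repair_prompt(
    "int LLVMFuzzerTestOneInput(...) { ... }",
    "error: implicit declaration of function 'cJSON_Parse'",
)
print("do not guess at types" in prompt)  # True
```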

This approach employs the LLM’s inductive reasoning over partial information while ensuring corrections are grounded in concrete feedback from build and runtime phases.

3. Evaluation and Empirical Results

LibLMFuzz was evaluated on four widely used Linux libraries (cJSON, libmagic, libpcap, and libplist), each offered in binary-only (stripped) form. The key quantitative results include:

  Metric                          Value
  Total fuzzable APIs             558
  Synthesized drivers             1,601
  Syntactic driver correctness    100% (all drivers for all APIs)
  First-execution correctness     75.52% nominally correct on first run
  Average drivers per function    2.87 (versions per successful driver)

LibLMFuzz achieved 100% API coverage—every fuzzable function was targeted with a syntactically correct driver. Compilation and nominal execution were achieved rapidly; on average, fewer than three iterations per function were needed to produce a working fuzz target (Hardgrove et al., 20 Jul 2025).
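The reported per-function average follows directly from the totals in the table:

```python
# Internal consistency check on the table's figures: 1,601 driver versions
# across 558 fuzzable APIs gives the reported per-function average.
total_drivers = 1601
fuzzable_apis = 558
avg_versions = total_drivers / fuzzable_apis
print(round(avg_versions, 2))  # 2.87
```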

4. Challenges and Remediation Strategies

The development and evaluation of LibLMFuzz surfaced several core challenges:

  • Incomplete or Ambiguous Binary Context: Stripped binaries lack debug and rich type information, often reducing API signatures to ambiguous pointer or integer types. This sometimes restricted the semantic sophistication of generated drivers (e.g., fuzzing only a pointer address, not the underlying structure).
  • LLM Hallucination and Leakage: LLMs, influenced by their training corpora, may hallucinate function details or include superfluous includes/typedefs not evidenced in the supplied binary.
    • Explicitly instructing the LLM to use only provided context and avoid conjectural code reduced such hallucinations.
  • Iterative Error Correction: Compilation and runtime errors (e.g., type mismatches, buffer overflows, linkage failures) were common on initial code generation.
    • LibLMFuzz’s feedback mechanism—dynamic re-prompting of the LLM with error output and clarification—enabled rapid convergence to functional drivers.
  • Error-Correcting Loop Structure: The system implemented an algorithmic loop:

    1. Compile and/or execute the generated driver code.
    2. If failure occurs, capture and relay error messages to the LLM.
    3. The LLM amends the driver in response; the cycle repeats until success.

Overall, the loop ensured steady improvement of automatically synthesized drivers even in the face of incomplete or opaque function semantics.
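The three-step loop above can be sketched as follows; `try_compile` and `llm_revise` are hypothetical stand-ins for the clang/libFuzzer invocation and the LLM call.

```python
# Minimal sketch of the compile-repair loop from Section 4. Both callbacks
# are stand-ins (hypothetical names), stubbed here for demonstration.
def repair_loop(driver, try_compile, llm_revise, max_rounds=10):
    """Build, relay errors verbatim, let the LLM amend; repeat until success."""
    for round_no in range(1, max_rounds + 1):
        ok, errors = try_compile(driver)
        if ok:
            return driver, round_no
        driver = llm_revise(driver, errors)  # errors relayed verbatim
    raise RuntimeError("no working driver within the round budget")

# Stubbed demo: the "LLM" produces a working driver on its second revision.
def fake_compile(src):
    return ("FIXED" in src, "error: type mismatch")

def fake_llm(src, errs):
    return src + " FIXED" if "retry" in src else src + " retry"

driver, rounds = repair_loop("v0", fake_compile, fake_llm)
print(rounds)  # 3
```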

5. Impact, Limitations, and Future Directions

LibLMFuzz’s achievement of 100% API coverage for binary-only targets with no human intervention significantly reduces the engineering overhead required for fuzzing closed-source libraries. This advances the scalability, automation, and reach of fuzzing methodologies:

  • Reduced Engineering Overhead: The approach automates driver generation, previously a manual process that consumed considerable developer resources in closed-source scenarios.
  • Scalability: The paradigm is applicable to large codebases and libraries with hundreds or thousands of exported functions.
  • Advanced Autonomous Workflows: The cyclic, agentic loop between the LLM and tool-chain enables real-time adaptation and autonomous error remediation.

Limitations: While surface-level API coverage is complete, deeper semantic and branch coverage within the target functions has not yet been characterized. Further, disassembler limitations on stripped binaries mean that some APIs may be fuzzed suboptimally due to missing type or structure information.

Pathways for Future Research:

  • Integration of branch coverage instrumentation, enabling focus beyond mere API invocation to full behavioral exploration.
  • Incorporation of human-in-the-loop support for context enrichment via targeted reverse engineering where LLMs or disassemblers stall.
  • Expansion with dynamic analysis tools, enabling runtime data utilization for enhanced input crafting and bug exposure.
  • Ablation studies to refine which prompt templates, context elements, and error feedback strategies yield best performance.

In summary, LibLMFuzz demonstrates that LLM-augmented middleware can autonomously plan, synthesize, and iteratively refine fuzz drivers for binary libraries, offering a scalable, cost-efficient foundation for future research in automated vulnerability discovery for black-box software (Hardgrove et al., 20 Jul 2025).
