
HarnessAgent: Scalable Fuzzing Harness Generator

Updated 10 December 2025
  • HarnessAgent is a tool-augmented framework for automated, robust fuzzing harness generation in C/C++ projects, enhancing scalability and reliability.
  • It integrates rule-based compilation error minimization with a hybrid tool pool to accurately retrieve symbols, reducing common build failures.
  • An enhanced validation pipeline detects fake harness code. Overall, the system improves three-shot success rates by about 20%, and over 75% of its harnesses increase target-function coverage.

HarnessAgent is a tool-augmented agentic framework designed for fully automated, scalable construction of fuzzing harnesses over a large number of OSS-Fuzz targets. Unlike prior LLM-based harness-generation systems, which struggle with incomplete context, compilation failures, or validation exploits, HarnessAgent integrates rule-based compilation error minimization, a hybrid tool pool for robust symbol retrieval, and an enhanced validation pipeline capable of detecting fake harness code. These innovations enable significant improvements in harness generation reliability and scale, particularly for internal and complex functions in both C and C++ projects (Yang et al., 3 Dec 2025).

1. Context and Motivation

Program fuzzing requires test harnesses that exercise target functions or components, typically by invoking them with a range of generated inputs. LLM-based techniques have enabled automatic harness generation from minimal function signatures, but have faced strong obstacles in scaling to diverse, arbitrarily structured codebases. Major challenges include the need for sophisticated contextual information (e.g., specifications, dependencies, usage examples), static context limitations causing frequent generation of non-compilable or invalid harnesses, and LLMs “gaming” validation metrics by emitting superficially plausible but logically incorrect or ineffective code. These issues have restricted prior methods from reliably generating robust harnesses at scale across many modern open-source projects (Yang et al., 3 Dec 2025).

2. System Architecture and Workflow

HarnessAgent’s architecture is built around three principal components:

  1. Rule-Based Compilation Error Minimization: A component that parses and categorizes compilation errors, then applies targeted repair strategies. This rule-based system allows HarnessAgent to iteratively refine LLM-generated harnesses, reducing the prevalence of unresolved build failures.
  2. Hybrid Tool Pool for Symbol Source Retrieval: To ensure that harnesses reference actual function definitions (and not incomplete, mocked, or “hallucinated” interfaces), HarnessAgent employs a hybrid retrieval pipeline that combines multiple static and dynamic analysis tools for precise code symbol localization. The hybrid approach increases resilience to idiosyncratic codebase structures, outperforming purely static or single-tool methods.
  3. Enhanced Harness Validation Pipeline: Beyond compilation and superficial harness execution, this module incorporates logic for detecting "fake" definitions—instances where the harness compiles but fails to exercise authentic target function logic, often by exploiting gaps in naive harness validation metrics. The pipeline integrates additional checks to robustly distinguish functional from non-functional harnesses.

The system operates over a workflow of candidate harness generation (via LLM prompting and context provision), iterative error minimization, symbol retrieval confirmation, and multi-stage validation before final acceptance or rejection.
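The workflow above can be sketched as a bounded retry loop. This is a minimal illustration, not the authors' implementation: `build_harness`, `StageResult`, and the stage callables are hypothetical names, and the limit of three attempts is borrowed from the three-shot metric used in the evaluation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StageResult:
    ok: bool
    feedback: str = ""  # text handed back to the generator on failure


def build_harness(
    generate: Callable[[str, str], str],            # (target, feedback) -> harness source
    stages: list[Callable[[str], StageResult]],     # e.g. compile, retrieval check, validation
    target: str,
    max_attempts: int = 3,                          # assumption: mirrors "three-shot"
) -> Optional[str]:
    """Return the first candidate that passes every stage, or None."""
    feedback = ""
    for _ in range(max_attempts):
        candidate = generate(target, feedback)
        for stage in stages:
            result = stage(candidate)
            if not result.ok:
                feedback = result.feedback  # the next attempt sees why this one failed
                break
        else:
            return candidate  # no stage rejected the candidate
    return None
```

Ordering the stages from cheapest to most expensive (compilation before fuzzing-based validation, for example) keeps failed attempts inexpensive.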

3. Compilation Error Minimization Strategy

HarnessAgent’s rule-based error minimization system classifies compilation failures into types (e.g., unresolved symbol, missing header, type mismatch). For each error type, the relevant strategy is triggered; for instance:

  • Unresolved Symbol: Trigger retrieval of relevant source files or adjustment of include paths.
  • Missing Header/Declaration: Amend the harness to add or correct header includes.
  • Type Mismatch: Analyze the reported signature and regenerate the harness so that its calls match the observed declaration.

The rules are designed to be applied sequentially or recursively, allowing automated diagnosis and repair without human intervention. This sharply reduces the manual effort and increases the throughput for bulk harness generation across projects.
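One way to picture the classify-then-repair rules is a small pattern table over compiler output. The sketch below is an assumption-laden illustration: the regular expressions follow common Clang and linker message wording, and `ErrorKind`, `RULES`, `REPAIRS`, and `plan_repairs` are hypothetical names rather than anything taken from the HarnessAgent paper.

```python
import re
from enum import Enum


class ErrorKind(Enum):
    UNRESOLVED_SYMBOL = "unresolved symbol"
    MISSING_HEADER = "missing header"
    TYPE_MISMATCH = "type mismatch"
    UNKNOWN = "unknown"


# Illustrative patterns only; a real rule set would be far larger.
RULES = [
    (re.compile(r"file not found|No such file or directory"), ErrorKind.MISSING_HEADER),
    (re.compile(r"undefined reference to|undefined symbol|use of undeclared identifier"),
     ErrorKind.UNRESOLVED_SYMBOL),
    (re.compile(r"no matching function for call|cannot initialize|cannot convert"
                r"|incompatible .* conversion"), ErrorKind.TYPE_MISMATCH),
]

# Repair actions paraphrase the strategies listed above.
REPAIRS = {
    ErrorKind.UNRESOLVED_SYMBOL: "retrieve the symbol's source or adjust include paths",
    ErrorKind.MISSING_HEADER: "add or correct header includes",
    ErrorKind.TYPE_MISMATCH: "regenerate the call to match the declared signature",
}


def classify(line: str) -> ErrorKind:
    """Map one line of build output to an error kind."""
    for pattern, kind in RULES:
        if pattern.search(line):
            return kind
    return ErrorKind.UNKNOWN


def plan_repairs(build_log: str) -> list[tuple[ErrorKind, str]]:
    """Return one repair action per distinct error kind, in first-seen order."""
    seen: list[ErrorKind] = []
    for line in build_log.splitlines():
        kind = classify(line)
        if kind is not ErrorKind.UNKNOWN and kind not in seen:
            seen.append(kind)
    return [(kind, REPAIRS[kind]) for kind in seen]
```

Collapsing repeated errors of the same kind into a single action is one simple reading of "minimization"; the actual rules may group or prioritize errors differently.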

4. Symbol Source Code Retrieval via Hybrid Tool Pool

The hybrid tool pool combines multiple mechanisms for retrieving symbol definitions, including static code indexers and dynamic analysis tools. This ensemble approach resolves shortcomings in tool-specific indexing and increases the coverage and precision of symbol resolution in large, heterogeneous codebases. HarnessAgent reports a source code retrieval response rate exceeding 90%, outperforming prior systems such as Fuzz Introspector by more than 30% (Yang et al., 3 Dec 2025).

In practice, this means that, for each target function, multiple retrieval subsystems are invoked; the first successful, unambiguous match is accepted. When ambiguity or failure occurs, fallback tools (static/dynamic) are triggered to resolve the missing symbol.
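The first-unambiguous-match policy with fallback can be sketched as follows. The `Retriever` type and `retrieve_symbol` function are hypothetical; each tool is assumed to return a list of candidate definitions for a symbol, and a tool that raises is treated as a failure.

```python
from typing import Callable, Optional

# Assumption: a retriever maps a symbol name to zero or more candidate definitions.
Retriever = Callable[[str], list[str]]


def retrieve_symbol(
    symbol: str, pool: list[tuple[str, Retriever]]
) -> Optional[tuple[str, str]]:
    """Try each tool in order; accept the first one that yields exactly one match.

    Returns (tool_name, definition), or None if every tool fails or is ambiguous.
    """
    for name, tool in pool:
        try:
            matches = tool(symbol)
        except Exception:
            continue  # tool failure: fall through to the next tool
        if len(matches) == 1:
            return name, matches[0]
        # zero or several matches: treat as unresolved and fall back
    return None
```

Returning the tool name alongside the definition makes it possible to measure which tools actually contribute to the overall response rate.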

5. Enhanced Harness Validation and Fake Definition Detection

HarnessAgent’s validation pipeline scrutinizes the generated harness beyond mere compilation and superficial execution. It introduces checks capable of detecting fake or spurious function definitions—a situation where the code passes baseline compilation but does not invoke a meaningful instance of the target logic.

For instance, the validation system inspects call graphs and symbol linkage to confirm that the original, intended target function is exercised by the harness, not a doppelgänger or stub inadvertently synthesized by LLM prompt artifacts. Only those harnesses passing these enhanced checks are counted as successes in benchmarking (Yang et al., 3 Dec 2025).
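A much cruder stand-in for the call-graph and linkage inspection described above is a textual check on the harness source: reject the harness if it defines the target function itself, or if it never calls it. The sketch below is a heuristic under stated assumptions. The function names and the target name `parse_header` used in the usage note are hypothetical, and regex matching on C source will miss macros, qualified C++ names, and other edge cases.

```python
import re


def _strip_comments(source: str) -> str:
    """Remove /* ... */ and // comments (naive: ignores string literals)."""
    return re.sub(r"/\*.*?\*/|//[^\n]*", "", source, flags=re.S)


def defines_function(source: str, name: str) -> bool:
    """Heuristic: does `source` contain a definition (with a body) of `name`?"""
    code = _strip_comments(source)
    # A type-like word, then the name, a parameter list, and an opening brace.
    pattern = r"[A-Za-z_]\w*[\s\*]+" + re.escape(name) + r"\s*\([^(){};]*\)\s*\{"
    return re.search(pattern, code) is not None


def check_harness(source: str, target: str) -> str:
    """Return 'ok', 'fake-definition', or 'target-not-called'."""
    if defines_function(source, target):
        return "fake-definition"  # harness shadows the real target with its own body
    if not re.search(r"\b" + re.escape(target) + r"\s*\(", _strip_comments(source)):
        return "target-not-called"
    return "ok"
```

For example, a harness containing `int parse_header(const uint8_t *data, size_t size) { return 0; }` would be flagged as `fake-definition`, whereas one that only declares the function and calls it would pass. A production check would more likely rely on the linker and coverage data than on source text.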

6. Empirical Evaluation

HarnessAgent was evaluated on 243 target functions drawn from OSS-Fuzz projects: 65 in C and 178 in C++. Key empirical outcomes:

  • Three-Shot Success Rate: HarnessAgent improved the three-shot success rate (fraction of targets producing a valid harness within three attempts) by approximately 20% over existing techniques, achieving 87% for C targets and 81% for C++ targets.
  • Code Coverage (One-Hour Fuzzing): Over 75% of the harnesses generated by HarnessAgent increased target-function coverage, a share that exceeds the baselines' by more than 10%.
  • Source Retrieval Response Rate: The hybrid tool pool system achieved a response rate above 90% for symbol retrieval, surpassing Fuzz Introspector by more than 30% in the relevant metric.

These results establish the state-of-the-art performance of HarnessAgent in automated harness construction, especially regarding compilation reliability, functional correctness, and scalability (Yang et al., 3 Dec 2025).

| Metric | HarnessAgent (C) | HarnessAgent (C++) | Improvement over prior SOTA |
|---|---|---|---|
| 3-Shot Success Rate (%) | 87 | 81 | +20 |
| Coverage-Increasing Harnesses (%) | >75 | >75 | +10 |
| Source Retrieval Response Rate (%) | >90 | >90 | +30 (over Fuzz Introspector) |
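The three-shot success rate reported above is defined as the fraction of targets that produce a valid harness within three attempts. A minimal sketch of that computation, assuming a hypothetical record that maps each target to the attempt number of its first valid harness (or `None` if none was produced):

```python
from typing import Optional


def k_shot_success_rate(
    first_valid_attempt: dict[str, Optional[int]], k: int = 3
) -> float:
    """Fraction of targets whose first valid harness arrived within k attempts."""
    if not first_valid_attempt:
        raise ValueError("no targets recorded")
    hits = sum(
        1 for attempt in first_valid_attempt.values()
        if attempt is not None and attempt <= k
    )
    return hits / len(first_valid_attempt)
```

With four made-up targets whose first valid harness arrived on attempts 1, 3, 4, and never, the three-shot rate is 2 of 4, or 0.5.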

7. Significance, Limitations, and Future Directions

HarnessAgent demonstrates that agentic, tool-augmented LLM pipelines can enable reliable, scalable automated fuzzing harness generation at the scale required for open-source security and reliability campaigns. By systematically addressing compilation, retrieval, and validation bottlenecks, the system sets a new practical baseline for automated harness construction.

However, the method is bounded by the accuracy of rule-based repair logic and the coverage of the hybrid retrieval pool; targets with especially ambiguous or obfuscated symbol usage could remain problematic. A plausible implication is that further improvements in integration with semantic program analysis tools or more advanced LLM prompting techniques could push reliability even higher.

Potential extensions include supporting further language ecosystems, improving context extraction for especially complex or deeply nested functions, and integrating learned error repair or harness refinement modules.

References:

"HarnessAgent: Scaling Automatic Fuzzing Harness Construction with Tool-Augmented LLM Pipelines" (Yang et al., 3 Dec 2025).
