
FuzzingBrain V2: A Multi-Agent LLM System for Automated Vulnerability Discovery and Reproduction

Published 20 May 2026 in cs.CR and cs.SE | (2605.21779v1)

Abstract: Software vulnerabilities pose critical security threats, with nearly 50,000 CVEs reported in 2025. While LLMs show promise for automated vulnerability detection, three key challenges remain. First, LLM-generated vulnerability reports suffer from high false positive rates and lack reproducible verification. Second, existing LLM-based approaches use suboptimal granularities for vulnerability localization: function-level analysis overlooks bugs when context becomes extensive, while line-level analysis lacks sufficient context. Third, existing approaches have difficulty reasoning about vulnerabilities with complex cross-function dependencies and triggering conditions. We present FuzzingBrain V2, a multi-agent system that addresses these gaps through four key contributions: (1) fully automated vulnerability analysis built on Google's OSS-Fuzz, ensuring all reported vulnerabilities are fuzzer-reproducible; (2) Suspicious Point, a novel control-flow-based abstraction for precise vulnerability localization at the optimal granularity; (3) logic-driven hierarchical function analysis with dual-layer fuzzing enhancing function coverage under resource constraints; (4) MCP-based static and dynamic analysis tools with context engineering enhancing complex vulnerability reasoning. On the AIxCC 2025 Final Competition C/C++ dataset, FuzzingBrain V2 achieved 90% detection rate (36 of 40 vulnerabilities). In real-world deployment, FuzzingBrain V2 discovered 29 zero-day vulnerabilities across 12 open-source projects, all confirmed and fixed by maintainers, with 2 assigned CVE IDs.

Summary

  • The paper introduces a multi-agent LLM framework that uses the Suspicious Point abstraction and dual-layer fuzzing for precise vulnerability discovery.
  • It integrates MCP-based static analysis and dynamic execution tools, achieving a 90% detection rate (36 of 40 vulnerabilities) on the AIxCC 2025 Final Competition C/C++ dataset.
  • Real-world deployment uncovered 29 zero-day vulnerabilities across 12 open-source projects, all confirmed and fixed by maintainers, with 2 assigned CVE IDs, supporting the framework's reproducibility-focused design.

FuzzingBrain V2: Advances in Automated Vulnerability Discovery via Multi-Agent LLM Systems

Introduction

The increasing prevalence of software vulnerabilities, with annual CVE disclosures reaching nearly 50,000 in 2025, underscores the growing importance of automated vulnerability discovery. Existing static and dynamic analysis approaches suffer from high false positive rates and a limited ability to reproduce and verify vulnerabilities, especially those with complex, context-dependent triggers. LLMs have emerged as promising tools for semantic code analysis but still struggle with precise vulnerability localization, reproducibility, and reasoning over intricate cross-function dependencies.

Architectural Innovations of FuzzingBrain V2

FuzzingBrain V2 introduces a modular, multi-agent system leveraging advanced LLMs, the Model Context Protocol (MCP), and integration with Google OSS-Fuzz for robust, scalable analysis. The system is designed around the following principles:

  • LLM multi-agent architecture: Hierarchically structured agents are specialized for direction generation, suspicious point (SP) identification and verification, and PoC generation, enabling a division of labor and parallelization without losing global context.
  • Suspicious Point (SP) abstraction: FuzzingBrain V2 formalizes the SP as an intermediate granularity between line-level and function-level, retaining control flow context while enabling precise localization and targeted reproduction.
  • Hierarchical, logic-driven search: Code is decomposed into semantically meaningful "directions," each comprising core and general function pools that prioritize analysis based on business logic risk and entry point proximity.
  • Dual-layer fuzzing: Integration of a global fuzzer targeting broad coverage and an SP-focused fuzzer for deep, constraint-guided exploration along verified control paths.
  • MCP-based tool orchestration: All agents interact via FastMCP, supporting compositional and reusable workflows for static and dynamic analysis, context compression, and seed generation.

    Figure 1: Architecture of FuzzingBrain V2, with agent-based pipelining and the SP abstraction bridging static analysis, LLM coordination, and fuzzing.
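To make the Suspicious Point idea concrete, here is a minimal Python sketch of an SP record together with the two-stage flow used by the SP Generator (high recall) and SP Verifier (conservative filtering). It is an illustration under stated assumptions, not the paper's implementation: the field names, the helpers `generate_sps` and `verify_sps`, and the use of a simple reachability check as the "verification" step are all hypothetical; the paper's verifier traces context with MCP tools.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SuspiciousPoint:
    # Hypothetical fields: the paper does not publish an SP schema.
    function: str          # function containing the risky operation
    line: int              # source line of that operation
    guard: str             # control-flow condition that must hold to reach it
    vuln_type: str         # e.g. "heap-buffer-overflow"
    call_path: tuple = ()  # fuzzer entry -> ... -> function

    def key(self):
        # Function + line + guard: finer than a whole function, but unlike a
        # bare line it keeps the control-flow condition that leads there.
        return (self.function, self.line, self.guard)


def generate_sps(candidates):
    """Hypothetical high-recall stage: keep every candidate SP, deduplicated by key."""
    unique = {}
    for sp in candidates:
        unique.setdefault(sp.key(), sp)
    return list(unique.values())


def verify_sps(sps, reachable):
    """Hypothetical conservative stage: keep SPs whose whole call path is fuzzer-reachable.

    A crude stand-in for the MCP-assisted context tracing done by the SP Verifier.
    """
    return [sp for sp in sps
            if sp.call_path
            and sp.call_path[-1] == sp.function
            and all(fn in reachable for fn in sp.call_path)]


entry = "LLVMFuzzerTestOneInput"
path = (entry, "parse_header", "copy_field")
candidates = [
    SuspiciousPoint("copy_field", 42, "len > 0", "heap-buffer-overflow", path),
    SuspiciousPoint("copy_field", 42, "len > 0", "heap-buffer-overflow", path),  # duplicate
    SuspiciousPoint("legacy_decode", 7, "flag == 3", "use-after-free",
                    (entry, "legacy_decode")),
]
sps = generate_sps(candidates)
verified = verify_sps(sps, reachable={entry, "parse_header", "copy_field"})
print(len(sps), [sp.function for sp in verified])  # 2 ['copy_field']
```

Keeping the guard in the key is what separates an SP from plain line-level localization: two SPs on the same line with different guards are distinct reproduction targets.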

Task Scheduling and Multi-Agent Pipeline

FuzzingBrain V2 divides work per (fuzzer, sanitizer) pair, supporting massive parallelization and tailored feature exploration. Each worker receives a filtered call graph and independently executes the multi-agent pipeline:

  • Static Analysis: Extraction of function metadata, call relationships, and fuzzer reachability.
  • Direction Generation: Logical decomposition of the codebase into business features (directions), each prioritized by input proximity and complexity.
  • SP Identification and Verification: Initial high-recall identification by the SP Generator, followed by conservative filtering through deeper MCP-assisted context tracing in the SP Verifier.
  • PoC Generation: Iterative LLM-driven input crafting, interleaved with dynamic feedback and corpus mutation, culminating either in crash-reproducible PoCs or in new seeds for the fuzzing infrastructure.

    Figure 2: Parallelization via worker distribution over (fuzzer, sanitizer) pairs.
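The per-(fuzzer, sanitizer) split and the filtered call graph each worker receives can be sketched as follows. This is a minimal illustration under stated assumptions: the call graph is a plain dict of caller-to-callee edges, "filtered" means keeping only functions reachable from the harness entry, and each harness is modeled by a distinct entry name (in real OSS-Fuzz libFuzzer harnesses, every binary defines `LLVMFuzzerTestOneInput`). `make_workers` and the project layout are hypothetical.

```python
from collections import deque
from itertools import product

def reachable_from(call_graph, entry):
    """BFS over caller -> callee edges starting at one harness entry function."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        fn = queue.popleft()
        for callee in call_graph.get(fn, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen

def make_workers(harnesses, sanitizers, call_graph):
    """Hypothetical: one work item per (fuzzer, sanitizer) pair, each with its own filtered call graph."""
    workers = []
    for (fuzzer, entry), sanitizer in product(harnesses.items(), sanitizers):
        keep = reachable_from(call_graph, entry)
        sub_graph = {fn: [c for c in call_graph.get(fn, []) if c in keep] for fn in keep}
        workers.append({"fuzzer": fuzzer, "sanitizer": sanitizer, "call_graph": sub_graph})
    return workers

# Hypothetical project: two harnesses, each modeled by a distinct entry function.
call_graph = {
    "fuzz_png_entry": ["png_read", "log_msg"],
    "fuzz_xml_entry": ["xml_parse"],
    "png_read": ["inflate_chunk"],
    "xml_parse": ["log_msg"],
}
workers = make_workers({"png_fuzzer": "fuzz_png_entry", "xml_fuzzer": "fuzz_xml_entry"},
                       ["address", "undefined"], call_graph)
for w in workers:
    print(w["fuzzer"], w["sanitizer"], sorted(w["call_graph"]))
```

Because each work item carries everything it needs, the items can be dispatched to independent processes, which is what makes this division easy to parallelize.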


    Figure 3: Per-worker pipeline, highlighting LLM tiering, static/dynamic analysis tool integration, and the distinction between Full-Scan (semantic) and Delta-Scan (commit-diff) workflows.
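Direction generation orders functions by business-logic risk and proximity to the fuzzer entry, splitting each direction into core and general pools. The sketch below is one possible reading, not the paper's formula: it assumes a per-function risk score in [0, 1] is already available (in FuzzingBrain V2 such judgments come from LLM agents), measures proximity as call-graph hops from the entry, and uses a hypothetical score `risk / (1 + hops)` with a 0.5 core threshold.

```python
from collections import deque

def entry_distances(call_graph, entry):
    """Call-graph hops from the fuzzer entry (BFS); unreachable functions are absent."""
    dist = {entry: 0}
    queue = deque([entry])
    while queue:
        fn = queue.popleft()
        for callee in call_graph.get(fn, []):
            if callee not in dist:
                dist[callee] = dist[fn] + 1
                queue.append(callee)
    return dist

def priority(fn, risk, dist):
    """Hypothetical score: riskier functions closer to the entry rank higher."""
    if fn not in dist:
        return 0.0
    return risk.get(fn, 0.0) / (1 + dist[fn])

def build_pools(functions, risk, dist, core_threshold=0.5):
    """Split one direction into core and general pools, each sorted by priority."""
    ranked = sorted((fn for fn in functions if fn in dist),
                    key=lambda fn: priority(fn, risk, dist), reverse=True)
    core = [fn for fn in ranked if risk.get(fn, 0.0) >= core_threshold]
    general = [fn for fn in ranked if risk.get(fn, 0.0) < core_threshold]
    return core, general

def order_directions(directions, risk, dist):
    """Schedule directions whose best function scores highest first."""
    return sorted(directions,
                  key=lambda d: max((priority(fn, risk, dist) for fn in directions[d]),
                                    default=0.0),
                  reverse=True)

# Hypothetical HTTP-server example.
call_graph = {
    "fuzz_entry": ["parse_request", "log_event"],
    "parse_request": ["decode_chunked", "read_header"],
    "read_header": ["copy_header_value"],
}
risk = {"decode_chunked": 0.9, "copy_header_value": 0.8, "read_header": 0.4,
        "parse_request": 0.3, "log_event": 0.2}
directions = {
    "http_parsing": ["parse_request", "read_header", "copy_header_value", "decode_chunked"],
    "logging": ["log_event"],
}
dist = entry_distances(call_graph, "fuzz_entry")
core, general = build_pools(directions["http_parsing"], risk, dist)
print(order_directions(directions, risk, dist), core, general)
```

Here `decode_chunked` leads the core pool because it is both high-risk and only two hops from the entry, while the low-risk `logging` direction is scheduled last.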

Evaluation: Effectiveness and Complexity

The system was benchmarked on the AIxCC 2025 Final Competition (AFC) C/C++ dataset, composed of 40 vulnerabilities in 12 large C/C++ projects. FuzzingBrain V2 detected 36 of the 40 vulnerabilities (90%), including 9 of 12 "hard" challenges characterized by deep call chains and sophisticated input requirements. This outperformed both previous FuzzingBrain iterations and leading AIxCC finalist teams by a substantial margin.

Figure 4: Vulnerability discovery results across AFC challenges, with FuzzingBrain V2 exhibiting leading coverage, particularly on high-difficulty cases.

Detailed case studies show that SP-guided fuzzing enabled the system to overcome obstacles that stymied both traditional fuzzing and generic LLM agents. For instance, discovering leap-second-related and protocol type-confusion vulnerabilities required coordinated multi-field input generation and protocol-specific cryptographic reasoning, which the system achieved only through multi-iteration agent collaboration and dynamic execution feedback.

Figure 5: Real-world PoC requirements for complex vulnerabilities, highlighting necessity for multi-step and cryptographically correct input preparation.
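Why multi-field inputs defeat blind mutation can be shown with a toy record format in which a length field and a checksum must agree with the payload. This is an illustration only: the `FBV2` format, `build_record`, and `parse_record` are hypothetical, and CRC-32 (`zlib.crc32`) is a simple stand-in for the cryptographic checks in the paper's case studies.

```python
import struct
import zlib

MAGIC = b"FBV2"  # hypothetical magic bytes for this toy format

def build_record(msg_type, payload):
    """Assemble MAGIC | type (1 byte) | length (2 bytes, big-endian) | payload | CRC-32."""
    body = MAGIC + struct.pack(">BH", msg_type, len(payload)) + payload
    return body + struct.pack(">I", zlib.crc32(body))

def parse_record(data):
    """Toy target: returns (type, payload) only if every field is consistent."""
    if len(data) < 11 or data[:4] != MAGIC:
        return None
    msg_type, length = struct.unpack(">BH", data[4:7])
    if len(data) != 7 + length + 4:
        return None
    (crc,) = struct.unpack(">I", data[-4:])
    if zlib.crc32(data[:-4]) != crc:
        return None
    return msg_type, data[7:7 + length]

valid = build_record(2, b"leap:23:59:60")
mutated = bytearray(valid)
mutated[7] ^= 0x01  # a typical single-byte mutation of the payload
print(parse_record(valid), parse_record(bytes(mutated)))
```

The mutated record is rejected because its checksum no longer matches. A generator that understands the format instead edits the payload and rebuilds the whole record, so every dependent field stays consistent.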


Figure 6: PoV (proof-of-vulnerability) generation progress: FuzzingBrain V2 explores deeper into the input space than prior systems, enabling complex bug discovery.
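The iterative PoC loop (propose an input, run it, feed the observed progress back) can be sketched as below. The harness `toy_target` reports how many checks an input passed, standing in for the dynamic feedback FuzzingBrain V2 gathers; `scripted_proposer` is a deterministic stand-in for the LLM agent, not a model of how the agent reasons. All names and the input format are hypothetical.

```python
from typing import Callable, Optional, Tuple

def toy_target(data: bytes) -> Tuple[bool, int]:
    """Hypothetical harness: returns (crashed, number of checks passed)."""
    if data[:2] != b"FB":
        return False, 0
    if len(data) < 4 or data[2] != 0x03:
        return False, 1
    table = [0] * 8
    try:
        table[data[3]]  # an out-of-range index stands in for a memory-safety bug
    except IndexError:
        return True, 3
    return False, 2

def scripted_proposer(prev: bytes, depth: int) -> bytes:
    """Stand-in for the LLM agent: repair whichever check the feedback says failed."""
    buf = bytearray(prev.ljust(4, b"\x00"))
    if depth == 0:
        buf[0:2] = b"FB"
    elif depth == 1:
        buf[2] = 0x03
    else:
        buf[3] = 0xFF  # push the index past the end of the table
    return bytes(buf)

def generate_poc(target: Callable[[bytes], Tuple[bool, int]],
                 propose: Callable[[bytes, int], bytes],
                 seed: bytes, max_iters: int = 10) -> Optional[bytes]:
    """Run, observe progress, propose a refined input; stop at the first crash."""
    candidate = seed
    for _ in range(max_iters):
        crashed, depth = target(candidate)
        if crashed:
            return candidate
        candidate = propose(candidate, depth)
    return None

poc = generate_poc(toy_target, scripted_proposer, seed=b"\x00" * 4)
print(poc)
```

When the budget runs out without a crash, the most advanced candidate could be added to the fuzzer's corpus instead, mirroring the seeding fallback described for PoC generation.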

Component Ablation and Contribution Analysis

A series of ablation experiments demonstrate the necessity of each FuzzingBrain V2 subsystem:

  • Removal of dynamic analysis tools reduced hard challenge detection from 9/12 to 1/12, confirming the criticality of execution feedback for path reachability validation.
  • Disabling the SP Verifier led to significant performance and scalability degradation due to unfiltered false positives.
  • Omission of direction-based scheduling increased resource consumption and time-to-discovery due to reduced prioritization.
  • Eliminating SP Fuzzing dramatically decreased deep bug discovery, confirming that random mutation alone is insufficient for navigating complex constraints.

    Figure 7: Ablation study results, illustrating detection loss and cost/time overhead when key modules are removed.
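The ablation finding that random mutation alone struggles with complex constraints can be illustrated with a single 32-bit equality guard in front of a Suspicious Point. A uniformly random 4-byte value satisfies such a guard with probability 2^-32. In this toy, starting from an all-zero seed, a one-byte mutation can never satisfy it, whereas an SP-focused generator that knows the guard writes the required value directly. The guard constant and helper names are hypothetical, and production fuzzers mitigate simple magic-value checks with comparison tracing, so this overstates the gap for such cases.

```python
import random
import struct

GUARD = 0xDEADBEEF  # hypothetical magic constant guarding the SP

def reaches_sp(data: bytes) -> bool:
    """Toy guard: the SP is reached only if bytes 0-3 (little-endian) equal GUARD."""
    return len(data) >= 4 and struct.unpack("<I", data[:4])[0] == GUARD

def blind_mutation_hits(seed: bytes, tries: int, rng: random.Random) -> int:
    """Count single-byte random mutants of `seed` that pass the guard."""
    hits = 0
    for _ in range(tries):
        buf = bytearray(seed)
        buf[rng.randrange(len(buf))] = rng.randrange(256)
        hits += reaches_sp(bytes(buf))
    return hits

def sp_guided_input(seed: bytes) -> bytes:
    """SP-focused generation: write the guard's required value directly into the input."""
    return struct.pack("<I", GUARD) + seed[4:]

seed = bytes(8)
print(blind_mutation_hits(seed, 10_000, random.Random(0)),
      reaches_sp(sp_guided_input(seed)))  # 0 True
```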

Real-World Deployment and Zero-Day Vulnerability Impact

FuzzingBrain V2 was deployed on open-source projects spanning C, C++, and Java, uncovering 29 zero-day vulnerabilities across 12 projects; all were confirmed and fixed by maintainers, and two received CVE IDs.

Notably, FuzzingBrain V2 identified vulnerabilities in mature, heavily fuzzed projects where prior fuzzing efforts had already saturated coverage. These results suggest strong generalization and indicate that semantic, context-aware analysis pays off even in extensively tested codebases.

Figure 8: Distribution of detected vulnerability classes, matching the prevalence observed in large-scale CVE disclosures.

Implications and Future Directions

Practically, FuzzingBrain V2 delivers a scalable, reproducibility-focused vulnerability discovery pipeline that can be adopted in continuous integration contexts and extended to multi-language codebases. Theoretically, the system substantiates the utility of modular LLM-agent architectures, judicious human-inspired task decomposition, and granular representations like SP for bridging semantic reasoning and dynamic verification.
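In a continuous integration setting, the Delta-Scan (commit-diff) workflow from the per-worker pipeline is the natural fit. A minimal sketch of a plausible first step, mapping a unified diff to the functions it touches, follows. It assumes function line spans are already known from static analysis; `changed_ranges` and `functions_to_scan` are hypothetical helpers. Hunk ranges include context lines, so the result over-approximates the truly modified functions, which suits a high-recall first pass.

```python
import re

# Unified-diff hunk header: "@@ -old_start[,old_count] +new_start[,new_count] @@"
HUNK = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@", re.MULTILINE)

def changed_ranges(diff_text):
    """Post-change line ranges (start, end) covered by each hunk, context lines included."""
    ranges = []
    for m in HUNK.finditer(diff_text):
        start = int(m.group(1))
        count = int(m.group(2)) if m.group(2) is not None else 1
        # A pure-deletion hunk (count 0) is approximated by the single line `start`.
        ranges.append((start, start + max(count, 1) - 1))
    return ranges

def functions_to_scan(diff_text, function_spans):
    """Functions whose (first, last) line span overlaps any changed range."""
    ranges = changed_ranges(diff_text)
    return [name for name, (lo, hi) in function_spans.items()
            if any(lo <= end and start <= hi for start, end in ranges)]

diff = (
    "--- a/src/parser.c\n"
    "+++ b/src/parser.c\n"
    "@@ -40,2 +40,3 @@ static int read_header(struct req *r)\n"
    "     len = r->hdr_len;\n"
    "+    if (len > MAX_HDR) return -1;\n"
    "     memcpy(r->hdr, r->buf, len);\n"
    "@@ -120 +121 @@\n"
    "-    return buf[len];\n"
    "+    return buf[len - 1];\n"
)
# Hypothetical function line spans, as static analysis might report them.
spans = {"read_header": (30, 60), "decode_chunked": (70, 110), "release_buffer": (115, 130)}
print(changed_ranges(diff), functions_to_scan(diff, spans))
```

Only `read_header` and `release_buffer` would be handed to the downstream agents, which is what keeps per-commit scans cheap compared with a Full-Scan.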

Highlighted limitations include handling vulnerabilities requiring complex stateful or multi-input triggers, contexts with implicit or undocumented state transitions, and environments hindering code instrumentation. Future advancements may focus on:

  • Extending multi-input and temporal correlation handling
  • Adaptive, long-context memory management and retrieval-augmented agent coordination
  • Patch synthesis and validation
  • Binary-only target analysis via dynamic emulation

Conclusion

FuzzingBrain V2 operationalizes a multi-agent, LLM-based vulnerability discovery framework that advances the reproducibility, precision, and depth of automated security analysis. The Suspicious Point abstraction, the integration of MCP-based static and dynamic analysis tools, and dual-layer fuzzing together yield results that exceed those of state-of-the-art competition teams and prior LLM-centered approaches. The work establishes specialized LLM agents as effective guides in the software security lifecycle and sets a new benchmark for automated vulnerability analysis.
