- The paper introduces a multi-agent LLM framework that leverages Suspicious Point abstraction and dual-layer fuzzing for precise vulnerability discovery.
- It integrates static analysis and dynamic execution through MCP, achieving a 90% detection rate (36 of 40 vulnerabilities) on the AIxCC 2025 Final Challenge dataset.
- Deployment on 19 open-source projects uncovered 41 zero-day vulnerabilities, supporting the framework's reproducibility and its suitability for continuous integration environments.
FuzzingBrain V2: Advances in Automated Vulnerability Discovery via Multi-Agent LLM Systems
Introduction
The increasing prevalence of software vulnerabilities, with annual CVE disclosures reaching nearly 50,000 in 2025, underscores the growing importance of automated vulnerability discovery. Existing approaches based on static and dynamic analysis suffer from high false positive rates and a limited ability to reproduce and verify vulnerabilities, especially those with complex, context-dependent triggers. Large language models (LLMs) have emerged as promising tools for semantic code analysis, but they struggle with precise vulnerability localization, reproducibility, and reasoning over intricate cross-function dependencies.
Architectural Innovations of FuzzingBrain V2
FuzzingBrain V2 introduces a modular, multi-agent system leveraging advanced LLMs, the Model Context Protocol (MCP), and integration with Google OSS-Fuzz for robust, scalable analysis. The system is designed around the following principles:
- LLM multi-agent architecture: Hierarchically structured agents are specialized for direction generation, suspicious point (SP) identification and verification, and PoC generation, enabling a division of labor and parallelization without losing global context.
- Suspicious Point (SP) abstraction: FuzzingBrain V2 formalizes the SP as an intermediate granularity between line-level and function-level, retaining control flow context while enabling precise localization and targeted reproduction.
- Hierarchical, logic-driven search: Code is decomposed into semantically meaningful "directions," each comprising core and general function pools that prioritize analysis based on business logic risk and entry point proximity.
- Dual-layer fuzzing: Integration of a global fuzzer targeting broad coverage and an SP-focused fuzzer for deep, constraint-guided exploration along verified control paths.
- MCP-based tool orchestration: All agents interact via FastMCP, supporting compositional and reusable workflows for static and dynamic analysis, context compression, and seed generation.
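The SP abstraction and the direction-based search described above can be pictured with a small data model. The following is a minimal sketch, not the paper's implementation: the class names, fields, the `risk` score, and the `entry_distance` hop count are all hypothetical, chosen only to show how an SP can carry control-flow context and how directions might be ordered by risk and entry point proximity.

```python
from dataclasses import dataclass, field


@dataclass
class SuspiciousPoint:
    """Hypothetical SP record: finer than a function, coarser than a single line."""
    function: str                     # enclosing function
    description: str                  # what looks wrong, in words
    call_path: list[str] = field(default_factory=list)  # entry point -> function
    verified: bool = False            # would be set by a verifier stage


@dataclass
class Direction:
    """Hypothetical 'direction': a feature-level slice of the codebase."""
    name: str
    core_functions: list[str]         # functions central to the feature
    general_functions: list[str]      # supporting functions
    risk: float                       # assumed score in [0, 1]; higher is riskier
    entry_distance: int               # assumed call-graph hops from the fuzzer entry


def prioritize(directions: list[Direction]) -> list[Direction]:
    """Order by descending risk, breaking ties by proximity to the entry point."""
    return sorted(directions, key=lambda d: (-d.risk, d.entry_distance))


# Toy data; all function and feature names are invented for illustration.
directions = [
    Direction("logging", ["log_write"], ["fmt"], risk=0.2, entry_distance=4),
    Direction("packet parsing", ["parse_header"], ["read_u16"], risk=0.9, entry_distance=1),
    Direction("time handling", ["parse_time"], [], risk=0.9, entry_distance=3),
]
ordered = [d.name for d in prioritize(directions)]
```

In this toy ordering, the two high-risk directions come first, and the one closer to the entry point wins the tie.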
Figure 1: Architecture of FuzzingBrain V2, with agent-based pipelining and the SP abstraction bridging static analysis, LLM coordination, and fuzzing.
Task Scheduling and Multi-Agent Pipeline
FuzzingBrain V2 divides work per (fuzzer, sanitizer) pair, supporting massive parallelization and tailored feature exploration. Each worker receives a filtered call graph and independently executes the multi-agent pipeline:
- Static Analysis: Extraction of function metadata, call relationships, and fuzzer reachability.
- Direction Generation: Logical decomposition of the codebase into business features (directions), each prioritized by input proximity and complexity.
- SP Identification and Verification: Initial high-recall identification by the SP Generator, followed by conservative filtering through deeper MCP-assisted context tracing in the SP Verifier.
- PoC Generation: Iterative LLM-driven input crafting, interleaved with dynamic feedback and corpus mutation, culminating in crash-reproducible PoCs or further seeding of the fuzzing infrastructure.
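The scheduling and per-worker stages above can be sketched as follows. This is an illustrative outline under stated assumptions, not the system's code: every function name is hypothetical, each stage is a stub, the call graph is reduced to a name-to-reachability mapping, and reachability stands in for the verifier's deeper context tracing.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def generate_sps(functions):
    """High-recall stage (stub): treat every function as a candidate SP."""
    return [{"function": name, "verified": False} for name in functions]


def verify_sps(candidates, reachable):
    """Conservative stage (stub): keep only candidates in reachable functions."""
    return [dict(c, verified=True) for c in candidates if c["function"] in reachable]


def run_worker(fuzzer, sanitizer, reachability):
    """Run the stages for one (fuzzer, sanitizer) pair on its filtered call graph."""
    functions = sorted(reachability)                        # 1. static analysis (stub)
    reachable = {f for f in functions if reachability[f]}
    directions = {"all": functions}                         # 2. direction generation (stub)
    candidates = generate_sps(directions["all"])            # 3a. SP identification
    verified = verify_sps(candidates, reachable)            # 3b. SP verification
    # 4. PoC generation would iterate on each verified SP; omitted in this sketch.
    return {"worker": (fuzzer, sanitizer), "verified_sps": verified}


def schedule(fuzzers, sanitizers, reachability, max_workers=4):
    """Create one worker per (fuzzer, sanitizer) pair and run them in parallel."""
    pairs = list(product(fuzzers, sanitizers))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda pair: run_worker(*pair, reachability), pairs))


# Toy input: fuzzer and function names are invented for illustration.
reachability = {"parse_header": True, "log_write": False, "parse_time": True}
results = schedule(["fuzz_http", "fuzz_time"], ["address", "undefined"], reachability)
```

Two fuzzers and two sanitizers yield four independent workers here. A thread pool is used only to keep the sketch short; a real deployment would more likely distribute workers across processes or machines.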
Figure 2: Parallelization via worker distribution over (fuzzer, sanitizer) pairs.
Figure 3: Per-worker pipeline, highlighting LLM tiering, static/dynamic analysis tool integration, and the distinction between Full-Scan (semantic) and Delta-Scan (commit-diff) workflows.
Evaluation: Effectiveness and Complexity
The system was benchmarked on the AIxCC 2025 Final Challenge (AFC) dataset, composed of 40 vulnerabilities in 12 large C/C++ projects. FuzzingBrain V2 demonstrated a 90% vulnerability detection rate (36/40), including 9 out of 12 "hard" challenges characterized by deep call chains and sophisticated input requirements. This outperformed both previous FuzzingBrain iterations and leading AIxCC finalist teams by a substantial margin.
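The reported rates can be checked with a few lines of arithmetic. The split into hard and remaining challenges below assumes that the 12 hard challenges are a subset of the 40 and that the 9 hard detections are counted within the 36, which is how the text reads; the helper name `rate` is introduced here for illustration.

```python
def rate(found: int, total: int) -> float:
    """Detection rate as a percentage."""
    return 100.0 * found / total


overall = rate(36, 40)                 # all AFC vulnerabilities
hard = rate(9, 12)                     # "hard" challenges only
remaining = rate(36 - 9, 40 - 12)      # implied rate on the other 28 challenges

print(f"overall:   {overall:.1f}%")
print(f"hard:      {hard:.1f}%")
print(f"remaining: {remaining:.1f}%")
```

Under that assumption, the system detected 75% of the hard challenges and 27 of the remaining 28, so three of the four misses fall in the hard category.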
Figure 4: Vulnerability discovery results across AFC challenges, with FuzzingBrain V2 exhibiting leading coverage, particularly on high-difficulty cases.
Detailed case studies confirm that SP-guided fuzzing enabled the system to overcome obstacles that stymied both traditional fuzzing and generic LLM agents. For instance, discovery of leap second-related and protocol type confusion vulnerabilities required coordinated multi-field input generation and protocol-specific cryptographic reasoning, achievable only through multi-iteration agent collaboration and exploitation of dynamic execution feedback.
Figure 5: Real-world PoC requirements for complex vulnerabilities, highlighting necessity for multi-step and cryptographically correct input preparation.
Figure 6: PoV (proof-of-vulnerability) generation progress: FuzzingBrain V2 successfully deepens its input search space compared to prior systems, enabling complex bug discovery.
Component Ablation and Contribution Analysis
A series of ablation experiments demonstrates the necessity of each FuzzingBrain V2 subsystem.
Real-World Deployment and Zero-Day Vulnerability Impact
FuzzingBrain V2 was deployed on 19 open-source projects written in C, C++, and Java, where it uncovered 41 zero-day vulnerabilities. Of these, 26 were confirmed and 23 were fixed by maintainers, and two received CVE assignments.
Notably, FuzzingBrain V2 identified vulnerabilities in mature, heavily fuzzed projects where prior efforts had saturated coverage. Such results suggest strong generalization abilities and indicate that semantic, context-aware analysis yields dividends even in extensively tested codebases.
Figure 8: Distribution of detected vulnerability classes, matching the prevalence observed in large-scale CVE disclosures.
Implications and Future Directions
Practically, FuzzingBrain V2 delivers a scalable, reproducibility-focused vulnerability discovery pipeline that can be adopted in continuous integration contexts and extended to multi-language codebases. Theoretically, the system substantiates the utility of modular LLM-agent architectures, judicious human-inspired task decomposition, and granular representations like SP for bridging semantic reasoning and dynamic verification.
Limitations include vulnerabilities that require complex stateful or multi-input triggers, code with implicit or undocumented state transitions, and environments that hinder code instrumentation. Future work may focus on:
- Extending multi-input and temporal correlation handling
- Adaptive, long-context memory management and retrieval-augmented agent coordination
- Patch synthesis and validation
- Binary-only target analysis via dynamic emulation
Conclusion
FuzzingBrain V2 is a multi-agent, LLM-based vulnerability discovery framework that advances the reproducibility, precision, and depth of automated security analysis. The Suspicious Point abstraction, the integration of static and dynamic analysis tools through MCP, and dual-layer fuzzing together yield results that match or exceed those of state-of-the-art competition teams and prior LLM-centered approaches. The work supports the role of specialized LLM agents as effective guides in the software security lifecycle and sets a new reference point for automated vulnerability analysis.