- The paper introduces Argus, a novel multi-agent static analysis framework that integrates RAG and ReAct paradigms for enhanced vulnerability detection.
- It employs supply chain analysis and a hybrid data flow extraction method (Re³) to uncover over 10× more true vulnerabilities than traditional SAST tools.
- The approach reduces false positives and facilitates exploit verification through automated PoC generation and human-in-the-loop review.
Argus: Multi-Agent LLM-Centered Static Analysis for Full-Chain Vulnerability Detection
Motivation and Context
The growing number and complexity of software vulnerabilities in modern codebases, including critical supply chain dependencies, have outpaced the detection capabilities of conventional rule-based static analysis systems. Traditional SAST tools such as CodeQL and Infer rely on handcrafted taint rules, which result in incomplete coverage, low recall for novel or system-specific flaws, and frequent false positives in realistic industrial settings. Recent proposals to augment SAST with LLMs typically treat the LLM as an isolated expert rather than integrating it tightly into the end-to-end detection pipeline, which leads to shallow reasoning, hallucinated output, and inefficient token consumption.
Argus addresses these deficiencies by reorchestrating the SAST workflow around a collaborative multi-agent ensemble built on LLM primitives, emphasizing comprehensive supply chain analysis and advanced agentic techniques such as Retrieval-Augmented Generation (RAG) and ReAct. This agent-centric approach allows for deeper contextual reasoning, more precise sink and source extraction, and robust detection, including zero-day vulnerabilities.
Technical Overview of the Argus Framework
Argus's architecture is centered on two key components: RAG-enhanced supply chain sink analysis and Re³ data flow extraction (Retrieval, Recursion, Review). The workflow is depicted below.
Figure 1: Argus's workflow incorporates multi-agent LLM-centric reasoning, collaborative supply chain retrieval, PoC verification, CodeQL sink identification, and Re³ data flow analysis, culminating in comprehensive vulnerability reporting.
Supply Chain Analysis and Sink Discovery
Unlike previous SAST frameworks that focus solely on codebase internals, Argus parses project management files to extract all dependency metadata, then systematically retrieves vulnerability records from authoritative sources (NVD, OSV, GHSA, Snyk) and community repositories. The retrieval is strengthened by evidence scoring derived from relevance, credibility, and content quality metrics.
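The evidence scoring step can be illustrated with a small sketch. The source names the three metrics (relevance, credibility, content quality) but not how they are combined, so the weighted sum, the weights, the threshold, and the sample records below are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One retrieved vulnerability record with per-metric scores in [0, 1]."""
    source: str
    summary: str
    relevance: float
    credibility: float
    quality: float


# Hypothetical weights; the source does not state how the metrics are combined.
WEIGHTS = {"relevance": 0.5, "credibility": 0.3, "quality": 0.2}


def evidence_score(ev: Evidence) -> float:
    """Weighted sum of the three metrics (an assumed combination rule)."""
    return (
        WEIGHTS["relevance"] * ev.relevance
        + WEIGHTS["credibility"] * ev.credibility
        + WEIGHTS["quality"] * ev.quality
    )


def rank_evidence(records: list[Evidence], threshold: float = 0.6) -> list[Evidence]:
    """Keep records at or above the threshold, highest score first."""
    kept = [ev for ev in records if evidence_score(ev) >= threshold]
    return sorted(kept, key=evidence_score, reverse=True)


# Made-up records and scores, for illustration only.
records = [
    Evidence("NVD", "Deserialization flaw in a JSON library", 0.9, 1.0, 0.8),
    Evidence("forum post", "Unconfirmed report, no affected version", 0.4, 0.3, 0.2),
    Evidence("GHSA", "Template injection in a rendering engine", 0.7, 0.9, 0.9),
]
ranked = rank_evidence(records)
```

In this sketch, low-scoring community posts are filtered out before sink synthesis, while authoritative advisories rise to the top of the ranking.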
Structured sink candidates are synthesized by the RAG agent:
Figure 2: RAG agent aggregates sink-related vulnerability information for precise candidate selection given target dependencies.
Proof-of-Concept Generation and Validation
Rather than relying on static pattern matching, Argus employs a dedicated PoC agent under the ReAct paradigm to construct exploit scenarios, generate verification code, and produce repair suggestions for each vulnerability, confirming that each sink is exploitable before flow mining begins.
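The ReAct pattern interleaves a reasoning step, a tool call, and the resulting observation until the agent reaches a verdict. The following is a minimal sketch of that loop, not Argus's implementation: the tool names, the sink name `TemplateEngine.render`, and the payload are hypothetical, and `scripted_model` stands in for an LLM call so the example runs offline.

```python
from typing import Callable


# Hypothetical tools; a real agent would run generated PoC code in a sandbox.
def read_sink_source(sink: str) -> str:
    return f"source of {sink}: passes its argument to an expression evaluator"


def run_poc(payload: str) -> str:
    return "exploit confirmed" if "${" in payload else "no effect"


TOOLS: dict[str, Callable[[str], str]] = {
    "read_sink_source": read_sink_source,
    "run_poc": run_poc,
}


def scripted_model(history: list[str]) -> tuple[str, str, str]:
    """Stand-in for an LLM call: returns (thought, action, action_input)."""
    observations = [h for h in history if h.startswith("Observation:")]
    if not observations:
        return ("Inspect the sink before crafting input.", "read_sink_source", history[0])
    if len(observations) == 1:
        return ("The sink evaluates expressions; try an expression payload.", "run_poc", "${7*7}")
    verdict = "exploitable" if "confirmed" in observations[-1] else "not exploitable"
    return ("The PoC result decides the verdict.", "finish", verdict)


def react_loop(sink: str, model=scripted_model, max_steps: int = 5) -> tuple[str, list[str]]:
    """Alternate thought, action, and observation until the model finishes."""
    history = [sink]
    for _ in range(max_steps):
        thought, action, arg = model(history)
        history.append(f"Thought: {thought}")
        if action == "finish":
            return arg, history
        history.append(f"Action: {action}[{arg}]")
        history.append(f"Observation: {TOOLS[action](arg)}")
    return "undetermined", history


verdict, trace = react_loop("TemplateEngine.render")
```

The returned trace records every thought, action, and observation, which is the kind of record a human reviewer could audit after the run.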
Figure 3: PoC generation and verification confirm sink exploitability, supporting the correctness of detected vulnerabilities.
Re³ Data Flow Extraction
For verified sinks, Argus initiates data flow extraction using a hybrid backward-forward search: CodeQL performs forward taint analysis, while unreachable sinks are recursively traced upstream, producing surrogate sink trees evaluated in a subsequent forward pass. Candidate flows are then subjected to a multi-step LLM review: end-to-end reachability, hop-by-hop validation of sanitization, and structured reporting.
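The hybrid search described above can be sketched on a toy call graph. This is an illustration under stated assumptions, not Argus's algorithm: `FORWARD_EDGES` stands in for the flows a forward taint query can resolve, `CALLERS` holds upstream links (including one the forward pass cannot follow, such as a reflective call), and every method name is hypothetical.

```python
from collections import deque
from typing import Optional

# Edges the forward taint pass can follow (stand-in for a forward query result).
FORWARD_EDGES = {
    "Controller.handle": ["Service.process"],
    "Service.process": ["Dispatcher.invoke"],
}

# Callers per method, including a link the forward pass cannot resolve.
CALLERS = {
    "Runtime.exec": ["CommandRunner.run"],
    "CommandRunner.run": ["Dispatcher.invoke"],
    "Dispatcher.invoke": ["Service.process"],
}


def forward_reachable(source: str) -> set[str]:
    """Breadth-first search over the edges the forward pass understands."""
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        for nxt in FORWARD_EDGES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def surrogate_tree(sink: str, depth: int, seen: frozenset = frozenset()) -> dict:
    """Recursively walk upstream from a sink, building a tree of caller nodes."""
    if depth == 0 or sink in seen:
        return {}
    return {
        caller: surrogate_tree(caller, depth - 1, seen | {sink})
        for caller in CALLERS.get(sink, [])
    }


def find_flow(source: str, sink: str, max_depth: int = 4) -> Optional[list[str]]:
    """Return the chain from the first forward-reachable surrogate down to the sink."""
    reached = forward_reachable(source)
    if sink in reached:
        return [sink]

    def search(tree: dict, suffix: list[str]) -> Optional[list[str]]:
        for caller, subtree in tree.items():
            chain = [caller] + suffix
            if caller in reached:
                return chain
            found = search(subtree, chain)
            if found:
                return found
        return None

    return search(surrogate_tree(sink, max_depth), [sink])


flow = find_flow("Controller.handle", "Runtime.exec")
```

Here the forward pass alone stops at `Dispatcher.invoke`; the upstream walk from `Runtime.exec` supplies the missing link, and the resulting chain is the kind of candidate flow that would then go to LLM review.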
Figure 4: Examples of vulnerable flows in DataGear discovered by Argus, illustrating complex taint propagation.
Empirical Evaluation
Argus was benchmarked against CodeQL and IRIS on seven large Java codebases ranging from 100K to 800K lines each: PublicCMS, JeecgBoot, RuoYi, JSPWiki, DataGear, Yudao-Cloud, and Keycloak. For each repository, comprehensive supply chain analysis and multi-agent reasoning yielded substantially higher sink recall and true vulnerability counts with only modest increases in token consumption.
Figure 5: Trade-off analysis between token consumption and sink discovery across Argus backbones and target codebases.
The reported numbers favor Argus strongly: the baseline tools, CodeQL and IRIS, detected zero vulnerabilities in most cases, while Argus consistently discovered more than 10× as many true vulnerabilities, several of which were assigned new CVE numbers. Ablation studies showed that sinks uncovered via RAG contributed disproportionately to the detected vulnerabilities compared with statically derived sinks, which suggests that community-derived sinks are more valuable for practical vulnerability mining.
Case Studies and Attack Verification
Argus's multi-agent reasoning produced two notable behaviors. First, it revisited vulnerabilities that had already been patched and found flows that remained exploitable after the fix. Second, it used incorrect sink candidates as seeds that led to true, previously undetected sinks. For example, the RAG agent retrieved a fixed DataGear CVE, and the Re³ workflow then exposed new vulnerable data flows by reusing the earlier access paths. In a PublicCMS case, the initially suspected sinks were imprecise, but tracing their upstream nodes through recursion and review eventually identified the correct sinks.
Figure 6: Screenshot of real exploit injected through zero-day vulnerability found by Argus in PublicCMS.
Reporting and Human-in-the-Loop Review
All detected vulnerabilities were formalized into structured prompts for LLM-based review and exported as detailed analysis reports, enabling transparent verification and actionable remediation.
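One way to picture a structured review prompt is as a template filled from a candidate flow, paired with a parser that checks the reviewer's reply. The sketch below is an assumption-laden illustration: the field names, prompt wording, and JSON keys are hypothetical and do not reproduce the prompt shown in Figure 7.

```python
import json
from dataclasses import dataclass


@dataclass
class CandidateFlow:
    """A candidate flow handed to the review agent (hypothetical fields)."""
    source: str
    sink: str
    hops: list[str]


def build_review_prompt(flow: CandidateFlow) -> str:
    """Render the flow as a prompt covering reachability, sanitization, and reporting."""
    hop_lines = "\n".join(f"  {i}. {hop}" for i, hop in enumerate(flow.hops, start=1))
    return (
        "You are reviewing a candidate vulnerable data flow.\n"
        f"Source: {flow.source}\n"
        f"Sink: {flow.sink}\n"
        f"Hops:\n{hop_lines}\n"
        "Tasks:\n"
        "  A. Decide whether the sink is reachable from the source end to end.\n"
        "  B. For each hop, state whether the data is sanitized.\n"
        "  C. Answer as JSON with keys: reachable, sanitized_hops, verdict.\n"
    )


def parse_review(reply: str) -> dict:
    """Parse the reviewer's JSON reply and check that the expected keys are present."""
    data = json.loads(reply)
    missing = {"reachable", "sanitized_hops", "verdict"} - set(data)
    if missing:
        raise ValueError(f"review reply is missing keys: {sorted(missing)}")
    return data


flow = CandidateFlow(
    source="Controller.handle",
    sink="Runtime.exec",
    hops=["Controller.handle", "Service.process", "Runtime.exec"],
)
prompt = build_review_prompt(flow)
review = parse_review('{"reachable": true, "sanitized_hops": [], "verdict": "vulnerable"}')
```

Validating the reply against a fixed set of keys keeps malformed model output from reaching the final report unnoticed.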
Figure 7: Example prompt issued to the vulnerability review agent for structured, context-aware auditing.
Figure 8: Final vulnerability analysis report generated by Argus for comprehensive expert review.
Implications and Future Directions
On the theoretical side, Argus points to agentic LLM frameworks becoming primary orchestrators of static analysis tasks, capable of combining symbolically precise flows with deep contextual reasoning and supply chain data mining. Practically, Argus demonstrates that LLM-centered multi-agent systems are viable for production SAST deployments, outperforming prior monolithic prompting and symbolic analysis. The approach reduces false positives and operational costs and enables mining of zero-day vulnerabilities, potentially transforming security audit workflows.
Future development avenues involve integrating Argus with dynamic analysis techniques (e.g., fuzzing), expanding agent fine-tuning through reinforcement learning with reward functions targeting sink discovery accuracy and data flow completeness, and augmenting agent collaboration for even deeper supply chain context. Additional opportunities exist in automating human-in-the-loop workflows, refining PoC generation at scale, and adapting the framework to languages beyond Java.
Conclusion
Argus establishes a novel LLM-agent-centered paradigm for end-to-end static vulnerability detection that combines supply chain analysis, collaborative multi-agent workflows, and advanced retrieval and reasoning strategies. Empirical evidence indicates substantial gains in vulnerability recall, practical exploit detection, and reduction of false positives compared to existing symbolic and LLM-assisted methods. The integration of Argus within industrial SAST tools promises to enhance software security postures, foster more comprehensive supply chain coverage, and facilitate scalable vulnerability audits with transparent reporting and remediation capabilities.