Argus: Reorchestrating Static Analysis via a Multi-Agent Ensemble for Full-Chain Security Vulnerability Detection

Published 8 Apr 2026 in cs.CR, cs.CL, and cs.SE | (2604.06633v1)

Abstract: Recent advancements in LLMs have sparked interest in their application to Static Application Security Testing (SAST), primarily due to their superior contextual reasoning capabilities compared to traditional symbolic or rule-based methods. However, existing LLM-based approaches typically attempt to replace human experts directly without integrating effectively with existing SAST tools. This lack of integration results in ineffectiveness, including high rates of false positives, hallucinations, limited reasoning depth, and excessive token usage, making them impractical for industrial deployment. To overcome these limitations, we present a paradigm shift that reorchestrates the SAST workflow from current LLM-assisted structure to a new LLM-centered workflow. We introduce Argus (Agentic and Retrieval-Augmented Guarding System), the first multi-agent framework designed specifically for vulnerability detection. Argus incorporates three key novelties: comprehensive supply chain analysis, collaborative multi-agent workflows, and the integration of state-of-the-art techniques such as Retrieval-Augmented Generation (RAG) and ReAct to minimize hallucinations and enhance reasoning. Extensive empirical evaluation demonstrates that Argus significantly outperforms existing methods by detecting a higher volume of true vulnerabilities while simultaneously reducing false positives and operational costs. Notably, Argus has identified several critical zero-day vulnerabilities with CVE assignments.


Authors (10)

Summary

The paper introduces Argus, a novel multi-agent static analysis framework that integrates RAG and ReAct paradigms for enhanced vulnerability detection.
It employs supply chain analysis and a hybrid data flow extraction method (Re³) to uncover over 10× more true vulnerabilities than traditional SAST tools.
The approach reduces false positives and facilitates exploit verification through automated PoC generation and human-in-the-loop review.

Argus: Multi-Agent LLM-Centered Static Analysis for Full-Chain Vulnerability Detection

Motivation and Context

The increasing number and complexity of software vulnerabilities in modern codebases, including critical supply chain dependencies, have outpaced the detection capabilities of conventional rule-based static analysis systems. Traditional SAST tools such as CodeQL and Infer rely on handcrafted taint rules, which result in incomplete coverage, low recall for novel or system-specific flaws, and frequent false positives in realistic industrial settings. Recent proposals to augment SAST with LLMs typically treat the LLM as an isolated expert rather than integrating it tightly into the end-to-end detection pipeline, which leads to shallow reasoning, hallucinated output, and inefficient token consumption.

Argus addresses these deficiencies by reorchestrating the SAST workflow around a collaborative multi-agent ensemble built on LLM primitives, emphasizing comprehensive supply chain analysis and agentic techniques such as Retrieval-Augmented Generation (RAG) and ReAct. This agent-centric approach allows for deeper contextual reasoning, more precise source and sink extraction, and robust detection, including detection of zero-day vulnerabilities.

Technical Overview of the Argus Framework

Argus's architecture is centered on two key components: RAG-enhanced supply chain sink analysis and the Re$^3$ data flow extraction (Retrieval, Recursion, Review). The workflow is depicted below.

Figure 1: Argus's workflow incorporates multi-agent LLM-centric reasoning, collaborative supply chain retrieval, PoC verification, CodeQL sink identification, and Re$^3$ data flow analysis, culminating in comprehensive vulnerability reporting.

Supply Chain Analysis and Sink Discovery

Unlike previous SAST frameworks that focus solely on codebase internals, Argus parses project management files to extract all dependency metadata, then systematically retrieves vulnerability records from authoritative sources (NVD, OSV, GHSA, Snyk) and community repositories. The retrieval is strengthened by evidence scoring derived from relevance, credibility, and content quality metrics.
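As a rough illustration of the first step described above (extracting dependency metadata from a project management file), the following minimal sketch parses a Maven `pom.xml`. It assumes the target is a Maven project; the function name `extract_dependencies` and the sample POM are hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET

# Standard namespace used by Maven POM files.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Hypothetical sample input; a real project would read pom.xml from disk.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency>
      <groupId>org.example</groupId>
      <artifactId>demo-lib</artifactId>
      <version>1.2.3</version>
    </dependency>
    <dependency>
      <groupId>org.example</groupId>
      <artifactId>managed-lib</artifactId>
    </dependency>
  </dependencies>
</project>"""


def extract_dependencies(pom_xml: str) -> list[tuple[str, str, str]]:
    """Return (groupId, artifactId, version) for each declared dependency.

    The version is an empty string when the POM omits it, for example
    when it is managed by a parent POM.
    """
    root = ET.fromstring(pom_xml)
    deps = []
    # Note: this path also matches entries under <dependencyManagement>.
    for dep in root.findall(".//m:dependencies/m:dependency", POM_NS):
        group = dep.findtext("m:groupId", default="", namespaces=POM_NS).strip()
        artifact = dep.findtext("m:artifactId", default="", namespaces=POM_NS).strip()
        version = dep.findtext("m:version", default="", namespaces=POM_NS).strip()
        deps.append((group, artifact, version))
    return deps


dependencies = extract_dependencies(SAMPLE_POM)
```

Each resulting coordinate could then serve as a query key against the vulnerability sources listed above; how Argus actually issues those queries is not specified in this summary.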

Structured sink candidates are synthesized by the RAG agent:

Figure 2: RAG agent aggregates sink-related vulnerability information for precise candidate selection given target dependencies.
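The evidence scoring mentioned above combines relevance, credibility, and content quality, but this summary does not give the formula. The sketch below assumes a simple weighted sum with a cutoff; the weights, the threshold, the `Evidence` class, and the example records are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One retrieved vulnerability record with per-metric scores in [0, 1]."""
    source: str
    relevance: float
    credibility: float
    quality: float


# Assumed weights; the paper's actual weighting is not given in this summary.
WEIGHTS = {"relevance": 0.5, "credibility": 0.3, "quality": 0.2}


def evidence_score(e: Evidence) -> float:
    """Weighted sum of the three metrics."""
    return (WEIGHTS["relevance"] * e.relevance
            + WEIGHTS["credibility"] * e.credibility
            + WEIGHTS["quality"] * e.quality)


def rank_evidence(records: list[Evidence], threshold: float = 0.6) -> list[Evidence]:
    """Drop records below the threshold and sort the rest, best first."""
    kept = [e for e in records if evidence_score(e) >= threshold]
    return sorted(kept, key=evidence_score, reverse=True)


# Hypothetical records for illustration only.
records = [
    Evidence("forum-post", relevance=0.8, credibility=0.4, quality=0.5),
    Evidence("official-advisory", relevance=0.9, credibility=1.0, quality=0.8),
    Evidence("unrelated-blog", relevance=0.3, credibility=0.5, quality=0.4),
]
ranked = rank_evidence(records)
```

A weighted sum is only one plausible choice. It keeps the ranking easy to audit, which matters when the retained records feed sink candidate selection.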

Proof-of-Concept Generation and Validation

Rather than relying on static pattern matching, Argus employs a dedicated PoC agent under the ReAct paradigm to construct exploit scenarios, generate verification code, and produce repair suggestions for each vulnerability. This step confirms that each sink is exploitable before data flow mining begins.

Figure 3: PoC generation and verification confirm sink exploitability, supporting the correctness of detected vulnerabilities.
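The ReAct paradigm alternates between a reasoning step, a tool action, and the observation that the action returns. The sketch below shows only that control loop. It assumes a scripted policy in place of an LLM and simulated tools in place of a real sandbox; the names `react_loop`, `scripted_policy`, `inspect_sink`, and `run_probe` are hypothetical.

```python
from typing import Callable, Optional

Step = dict[str, str]                      # keys: thought, action, input
Trace = list[tuple[str, str, str]]         # (thought, action, observation)


def react_loop(policy: Callable[[str, Trace], Step],
               tools: dict[str, Callable[[str], str]],
               max_steps: int = 5) -> tuple[Optional[str], Trace]:
    """Run thought -> action -> observation until the policy finishes."""
    trace: Trace = []
    observation = "start"
    for _ in range(max_steps):
        step = policy(observation, trace)
        if step["action"] == "finish":
            trace.append((step["thought"], "finish", step["input"]))
            return step["input"], trace
        observation = tools[step["action"]](step["input"])
        trace.append((step["thought"], step["action"], observation))
    return None, trace                     # step budget exhausted


# Simulated tools: nothing is executed against a real target.
tools = {
    "inspect_sink": lambda name: f"{name} evaluates template expressions",
    "run_probe": lambda payload: "49" if payload == "${7*7}" else payload,
}


def scripted_policy(observation: str, trace: Trace) -> Step:
    """Deterministic stand-in for the LLM that would choose each step."""
    if len(trace) == 0:
        return {"thought": "Inspect the candidate sink first.",
                "action": "inspect_sink", "input": "TemplateEngine.render"}
    if len(trace) == 1:
        return {"thought": "Send a harmless arithmetic probe.",
                "action": "run_probe", "input": "${7*7}"}
    verdict = "exploitable" if observation == "49" else "not confirmed"
    return {"thought": "Decide from the probe result.",
            "action": "finish", "input": verdict}


verdict, trace = react_loop(scripted_policy, tools)
```

In a real agent, the policy would be an LLM call that receives the trace as context, and the step budget bounds token consumption per sink.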

Data Flow Extraction via Re$^3$

For verified sinks, Argus initiates data flow extraction using a hybrid backward-forward search: CodeQL performs forward taint analysis, while unreachable sinks are recursively traced upstream, producing surrogate sink trees evaluated in a subsequent forward pass. Candidate flows are then subjected to a multi-step LLM review: end-to-end reachability, hop-by-hop validation of sanitization, and structured reporting.

Figure 4: Examples of vulnerable flows in DataGear discovered by Argus, illustrating complex taint propagation.
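The recursion and review steps described above can be sketched on a toy call graph. This is a simplified illustration under several assumptions: the forward taint analysis (CodeQL in the paper) is replaced by a predicate, the LLM review is replaced by a sanitizer lookup, and all method names and the functions `trace_upstream` and `review_flow` are hypothetical.

```python
from typing import Callable


def trace_upstream(sink: str,
                   callers: dict[str, list[str]],
                   reaches_source: Callable[[str], bool],
                   max_depth: int = 4) -> list[list[str]]:
    """Walk callers of an unreachable sink until a surrogate sink is found.

    Returns call chains ordered from the surrogate sink down to the
    original sink. `reaches_source` stands in for a forward taint query.
    """
    flows: list[list[str]] = []

    def recurse(node: str, chain: list[str], depth: int) -> None:
        if reaches_source(node):
            flows.append(chain)            # forward pass succeeds here
            return
        if depth == max_depth:
            return                         # bound the recursion
        for caller in callers.get(node, []):
            if caller not in chain:        # skip cycles
                recurse(caller, [caller] + chain, depth + 1)

    recurse(sink, [sink], 0)
    return flows


def review_flow(flow: list[str], sanitizers: set[str]) -> bool:
    """Hop-by-hop check: reject a flow if any hop is a known sanitizer."""
    return all(hop not in sanitizers for hop in flow)


# Toy call graph: callee -> list of callers.
callers = {
    "Runtime.exec": ["ShellUtil.run"],
    "ShellUtil.run": ["JobService.execute", "AdminTool.main"],
    "JobService.execute": ["JobController.submit"],
}
# Methods that the (simulated) forward analysis can reach from a source.
tainted = {"JobController.submit"}

flows = trace_upstream("Runtime.exec", callers, lambda n: n in tainted)
confirmed = [f for f in flows if review_flow(f, sanitizers={"InputFilter.clean"})]
```

The depth bound and the cycle check keep the upstream search finite. A real pipeline would issue an actual forward query per surrogate sink rather than consult a precomputed set.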

Empirical Evaluation

Argus was benchmarked against CodeQL and IRIS on seven robust Java codebases ranging from 100K to 800K lines each: PublicCMS, JeecgBoot, RuoYi, JSPWiki, DataGear, Yudao-Cloud, and Keycloak. For each repository, comprehensive supply chain analysis and multi-agent reasoning yielded substantially higher sink recall and true vulnerability counts with only modest increases in token consumption.

Figure 5: Trade-off analysis between token consumption and sink discovery across Argus backbones and target codebases.

The numerical results favor Argus: the baseline tools, CodeQL and IRIS, detected zero vulnerabilities in most cases, while Argus consistently discovered $>10\times$ more true vulnerabilities, including several that were assigned new CVE numbers. Ablation studies showed that sinks uncovered via RAG contributed disproportionately to the detected vulnerabilities compared with sinks from static approaches, which suggests that community-derived sinks are empirically more valuable for practical vulnerability mining.

Case Studies and Attack Verification

Argus's multi-agent reasoning enabled two notable behaviors: it recycled outdated vulnerabilities, yielding exploitable flows even after patches, and it used incorrect sink candidates as seeds for discovering true, previously undetected sinks. For example, the RAG agent retrieved an already fixed DataGear CVE, but Argus's Re$^3$ workflow exposed new vulnerable data flows by reusing the previous access paths. In a PublicCMS case, the initially suspected sinks were imprecise, but their upstream nodes eventually led to correct sink identification via recursion and review.

Figure 6: Screenshot of a real exploit injected through a zero-day vulnerability found by Argus in PublicCMS.

Reporting and Human-in-the-Loop Review

All detected vulnerabilities were formalized into structured prompts for LLM-based review and exported as detailed analysis reports, enabling transparent verification and actionable remediation.

Figure 7: Example prompt issued to the vulnerability review agent for structured, context-aware auditing.

Figure 8: Final vulnerability analysis report generated by Argus for comprehensive expert review.

Implications and Future Directions

On the theoretical side, Argus points to agentic LLM frameworks becoming the primary orchestrators of static analysis tasks, capable of combining symbolically precise flows with deep contextual reasoning and supply chain data mining. Practically, Argus demonstrates that LLM-centered multi-agent systems are viable for production SAST deployments, outperforming prior monolithic prompting and symbolic analysis. The approach reduces false positives and operational costs and enables mining of zero-day vulnerabilities, potentially transforming security audit workflows.

Future development avenues involve integrating Argus with dynamic analysis techniques (e.g., fuzzing), expanding agent fine-tuning through reinforcement learning with reward functions targeting sink discovery accuracy and data flow completeness, and augmenting agent collaboration for even deeper supply chain context. Additional opportunities exist in automating human-in-the-loop workflows, refining PoC generation at scale, and adapting the framework to languages beyond Java.

Conclusion

Argus establishes a novel LLM-agent-centered paradigm for end-to-end static vulnerability detection that combines supply chain analysis, collaborative multi-agent workflows, and advanced retrieval and reasoning strategies. Empirical evidence indicates substantial gains in vulnerability recall, practical exploit detection, and reduction of false positives compared to existing symbolic and LLM-assisted methods. The integration of Argus within industrial SAST tools promises to enhance software security postures, foster more comprehensive supply chain coverage, and facilitate scalable vulnerability audits with transparent reporting and remediation capabilities.
