MAPTA System: Autonomous Web Security
- MAPTA System is a distributed multi-agent architecture that enables autonomous web security assessments using LLM-guided orchestration and sandboxed execution.
- It achieves high vulnerability detection rates on benchmarks (e.g., 76.9% overall, 100% for SSRF) while monitoring cost and resource usage for practical scanning.
- The system has demonstrated real-world impact by uncovering critical vulnerabilities in open-source web applications with minimal operational costs and validated exploit reporting.
MAPTA System refers to a distributed multi-agent architecture purpose-built for autonomous web application security assessment. It combines LLM-guided orchestration with tool-grounded attack execution and end-to-end exploit validation, and is designed to address the scalability crisis in security auditing introduced by rapid adoption of AI-powered development platforms. On established benchmarks, MAPTA demonstrates high vulnerability detection rates, cost-effective operation, and impactful results in open-source penetration testing.
1. System Architecture and Components
MAPTA’s architecture consists of three principal agent roles organized to maximize separation of strategic reasoning, tactical execution, and autonomous validation:
- Coordinator Agent: The strategic core. Its responsibilities include planning attack strategies, reasoning about application behaviors, orchestrating the scan workflow, and deciding whether to invoke tools directly (e.g., via run_command or run_python) or to offload subtasks to sandbox agents (via the sandbox_agent tool). Upon candidate vulnerability discovery, it synthesizes PoC reports for validation.
- Sandbox Agents: Tactical executors within a shared per-assessment Docker container. These agents re-use state (tools, credentials, enumeration data) and conduct execution isolated from the Coordinator context, avoiding context bloat. Multiple sandbox agents may operate in parallel when workload partitioning benefits coverage.
- Validation Agent: Dedicated to confirming candidate findings. It transforms PoC artifacts (scripts, payloads, HTTP sequences) into concrete exploits and performs direct verification inside the Docker container, significantly reducing false positives before vulnerabilities are reported externally.
A core feature is the shared Docker container per-assessment, which allows agents to maintain state (such as credentials and discovered artifacts) across multiple steps, yet preserves statelessness in the individual LLM agent contexts.
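The split between shared container state and private agent context can be illustrated with a minimal sketch. Every class and method name here (SharedContainer, SandboxAgent, Coordinator, delegate) is hypothetical and chosen for illustration; this is not MAPTA's actual code, and the "container" is a plain Python object standing in for the Docker container.

```python
from dataclasses import dataclass, field


@dataclass
class SharedContainer:
    """Stand-in for the per-assessment Docker container (hypothetical)."""
    credentials: dict[str, str] = field(default_factory=dict)
    artifacts: list[str] = field(default_factory=list)


@dataclass
class SandboxAgent:
    """Tactical executor: private context, shared container state."""
    name: str
    container: SharedContainer
    context: list[str] = field(default_factory=list)  # private to this agent

    def run(self, subtask: str) -> str:
        self.context.append(subtask)              # only this agent's context grows
        finding = f"{self.name}: result of {subtask}"
        self.container.artifacts.append(finding)  # visible to every later agent
        return finding


@dataclass
class Coordinator:
    """Strategic core: delegates subtasks and keeps only their summaries."""
    container: SharedContainer
    context: list[str] = field(default_factory=list)

    def delegate(self, subtask: str, agent_name: str) -> str:
        agent = SandboxAgent(agent_name, self.container)  # fresh context each time
        summary = agent.run(subtask)
        self.context.append(summary)  # no raw tool output enters this context
        return summary


container = SharedContainer()
coordinator = Coordinator(container)
coordinator.delegate("enumerate endpoints", "sandbox-1")
coordinator.delegate("test login form", "sandbox-2")
```

In this sketch the container accumulates both artifacts, while each sandbox agent starts with an empty context and the coordinator stores one summary line per delegated subtask.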
2. Performance Metrics and Benchmarking
MAPTA was evaluated on the 104-scenario XBOW benchmark, covering a spectrum of web vulnerability classes. Reported performance metrics include:
- Overall Success Rate: 76.9%
- SSRF and Misconfiguration: 100% success
- Broken Authorization: 83% success
- Injection Attacks: SSTI at 85%; SQL Injection at 83%; Command Injection at 75%
- XSS: 57% success
- Blind SQL Injection: 0% success
Operational metrics tracked include average tool calls per challenge (25.1), median solution time (143.2 seconds), and detailed token-level resource consumption. Notably, success rates correlate negatively with resource usage (tool calls, cost), enabling practical early-stopping thresholds (e.g., 40 tool calls or $0.30 cost per challenge). Cross-site scripting and blind SQL injection are specifically identified as areas of challenge.
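The example thresholds above translate into a very small budget check. The sketch below is an assumption about how such a rule could be written, not the authors' implementation; the function name should_stop and the choice to stop when a budget is reached (rather than exceeded) are illustrative.

```python
MAX_TOOL_CALLS = 40   # example threshold quoted in the text
MAX_COST_USD = 0.30   # example threshold quoted in the text


def should_stop(tool_calls: int, cost_usd: float,
                max_tool_calls: int = MAX_TOOL_CALLS,
                max_cost_usd: float = MAX_COST_USD) -> bool:
    """Return True once either budget has been reached (assumed semantics)."""
    return tool_calls >= max_tool_calls or cost_usd >= max_cost_usd
```

A run at the reported average of about 25 tool calls and the median successful cost of $0.073 would continue under this rule, while a run that reaches either limit would be cut off.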
3. Cost Analysis and Resource Efficiency
MAPTA provides explicit accounting for operational costs across all benchmarks:
- Total Cost: $21.38 over 104 challenges
- Median Cost per Successful Attempt: $0.073
- Median Cost per Failed Attempt: $0.357
- Overall Average per Challenge: $0.206
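Token-level cost factors are tracked: $1.25 per 1M input tokens, $10.00 per 1M output tokens, and $0.125 per 1M cached tokens. Output tokens are the main cost driver. Early-stopping heuristics follow from strong negative correlations (Pearson –0.661 between tool calls and success) and are recommended for controlling scan resources. A compact cost formula is used:
\[ C = c_{\text{in}} N_{\text{in}} + c_{\text{out}} N_{\text{out}} + c_{\text{cached}} N_{\text{cached}} \]
where \(c_{\text{in}}\), \(c_{\text{out}}\), and \(c_{\text{cached}}\) are the cost factors, and \(N_{\text{in}}\), \(N_{\text{out}}\), and \(N_{\text{cached}}\) are token counts.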
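As a worked example of the token pricing, the following sketch computes the cost of one run from its token counts. The function name estimate_cost and the sample token counts are made up for illustration, and the three counts are assumed to be disjoint (cached tokens are not also billed as input tokens).

```python
# Prices in USD per one million tokens, as quoted in the text.
PRICE_PER_M_INPUT = 1.25
PRICE_PER_M_OUTPUT = 10.00
PRICE_PER_M_CACHED = 0.125


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Linear token cost in USD; counts are assumed not to overlap."""
    total = (input_tokens * PRICE_PER_M_INPUT
             + output_tokens * PRICE_PER_M_OUTPUT
             + cached_tokens * PRICE_PER_M_CACHED)
    return total / 1_000_000


# Hypothetical run: 50k input, 20k output, 200k cached tokens.
example_cost = estimate_cost(50_000, 20_000, 200_000)
```

For these sample counts the total is about $0.29, of which $0.20 comes from the 20k output tokens, which is consistent with output tokens being the main cost driver.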
4. Real-World Impact and Vulnerability Discovery
MAPTA’s practical significance is demonstrated through assessments of 10 popular open-source web applications (with GitHub stars ranging from 8K to 70K):
- Vulnerability Discovery Rate: 19 unique findings across 6 applications (60% app coverage)
- Severity Distribution: ~74% high/critical, rest medium/low
- Critical Vulnerabilities Discovered:
- Remote code execution (RCE) via database export functions
- Client-side secret exposure through JavaScript configuration endpoints
- Arbitrary file write due to client-controlled parameters
- SSRF from unauthenticated email relay
- Command injection via database operations
Average operational cost per real-world scan was ~$3.67. Responsible disclosure procedures were followed, with ten findings under CVE review at publication time.
5. Technical Approach and Tool Integration
MAPTA’s multi-agent system is grounded in:
- LLM Orchestration: Distinct agents with bounded action spaces and tool sets, ensuring modularity and avoiding prompt contamination.
- Tool-grounded execution: Direct invocation of OS commands and Python scripts for system-level and language-level attacks. The orchestration loop includes hypothesis generation, targeted tool calls, PoC synthesis, and exploit validation.
- Sandboxed Assessment: Every agent shares a dedicated Docker container per job for reusable state, but maintains cognitive isolation to optimize context length and prevent prompt dilution.
- Validated Exploit Reporting: Only successful, repeatable exploits are reported, which filters out the false positives common in static or purely LLM-based approaches.
Resource usage (tokens, tool calls, time) is tracked in real time to facilitate cost/performance tradeoffs and enable dynamic early stopping.
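The loop described above (hypothesis, tool call, PoC, validation, with usage tracked along the way) can be sketched as follows. This is a simplified illustration under stated assumptions, not MAPTA's implementation: the names Usage, assess, fake_tool, and fake_validate are hypothetical, the tool and validator are stubs, the budget defaults reuse the example thresholds from the benchmarking section, and "repeatable" is modeled as the validator succeeding on every one of a fixed number of attempts.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Usage:
    """Running totals updated after every tool call."""
    tool_calls: int = 0
    cost_usd: float = 0.0
    started: float = field(default_factory=time.monotonic)

    def elapsed(self) -> float:
        return time.monotonic() - self.started


def assess(hypotheses: list[str],
           run_tool: Callable[[str], tuple[Optional[str], float]],
           validate: Callable[[str], bool],
           max_tool_calls: int = 40,
           max_cost_usd: float = 0.30,
           repeats: int = 2) -> tuple[list[str], Usage]:
    """Test each hypothesis; report only PoCs that validate on every repeat."""
    usage = Usage()
    confirmed: list[str] = []
    for hypothesis in hypotheses:
        # Dynamic early stop: check the budget before each new tool call.
        if usage.tool_calls >= max_tool_calls or usage.cost_usd >= max_cost_usd:
            break
        poc, cost = run_tool(hypothesis)   # targeted tool call
        usage.tool_calls += 1
        usage.cost_usd += cost
        if poc is None:                    # hypothesis did not pan out
            continue
        if all(validate(poc) for _ in range(repeats)):
            confirmed.append(poc)          # validated, repeatable finding
    return confirmed, usage


def fake_tool(hypothesis: str) -> tuple[Optional[str], float]:
    """Stub tool: yields a PoC unless the hypothesis is a dead end."""
    poc = None if hypothesis == "dead end" else f"poc:{hypothesis}"
    return poc, 0.02


def fake_validate(poc: str) -> bool:
    """Stub validator: only the SSRF PoC reproduces."""
    return poc.endswith("ssrf")


confirmed, usage = assess(["ssrf", "dead end", "xss"], fake_tool, fake_validate)
```

With these stubs, three tool calls are made, the unvalidated XSS candidate is dropped, and only the SSRF PoC is reported. A fuller version would also charge validation runs against the budget; that is omitted here to keep the sketch short.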
6. Limitations and Future Directions
Current weaknesses include a 0% success rate on blind SQL injection and moderate performance on XSS (57%). The authors highlight several directions for future work:
- Improvement of payload generation strategies for challenging vulnerability classes
- Integration of advanced strategies (timing, DOM-based, network-level attacks)
- Enhanced orchestration logic for real-time resource-based early-stopping
- Expansion to additional vulnerability types beyond current HTTP-centric scope
- Canary marker integration for business logic validation
- Open-source release and community contributions for broader evaluation and enhancement
MAPTA represents a comprehensive, resource-aware, multi-agent framework for automated web penetration testing, balancing LLM-guided attack planning, isolated tool execution, and strict real-world validation to achieve robust vulnerability detection and practical cost control in both benchmark and open-source ecosystems (David et al., 28 Aug 2025).