PentestMCP: A Toolkit for Agentic Penetration Testing (2510.03610v1)

Published 4 Oct 2025 in cs.CR and cs.AI

Abstract: Agentic AI is transforming security by automating many tasks being performed manually. While initial agentic approaches employed a monolithic architecture, the Model-Context-Protocol has now enabled a remote-procedure call (RPC) paradigm to agentic applications, allowing for the flexible construction and composition of multi-function agents. This paper describes PentestMCP, a library of MCP server implementations that support agentic penetration testing. By supporting common penetration testing tasks such as network scanning, resource enumeration, service fingerprinting, vulnerability scanning, exploitation, and post-exploitation, PentestMCP allows a developer to customize multi-agent workflows for performing penetration tests.

Summary

The paper builds on the Model-Context-Protocol (MCP) standard, which decouples agents from tools and allows new resources to be integrated dynamically during penetration testing.
It details a multi-agent workflow that automates network scanning, resource enumeration, and vulnerability exploitation using agentic AI.
Case studies on CVE-2017-5638 and CVE-2017-0144 show that PentestMCP can carry out multi-step penetration tests autonomously.


Introduction

Agentic AI is changing cybersecurity by automating tasks that were traditionally manual, such as penetration testing, vulnerability discovery, and exploitation. The paper "PentestMCP: A Toolkit for Agentic Penetration Testing" builds on the Model-Context-Protocol (MCP) standard, which decouples agents from the tools they use through a Remote-Procedure Call (RPC) paradigm, so that new tools and knowledge bases can be incorporated at runtime. PentestMCP itself is a library of MCP server implementations that support common penetration testing tasks: network scanning, resource enumeration, service fingerprinting, vulnerability scanning, exploitation, and post-exploitation. Developers can use these servers to customize multi-agent workflows for penetration tests.

PentestMCP Architecture

PentestMCP leverages a flexible playbook approach, emulating the cyber "kill-chain" methodology with the following distinct tasks:

Network Scanning and Fingerprinting: Initial identification of live hosts and running services.
Resource Enumeration and Discovery: Gathering sensitive data for subsequent targeting.
Vulnerability Scanning: Identifying security flaws within software and configurations.
Exploitation and Post-Exploitation: Gaining unauthorized access and executing further actions.
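
The task list above can be read as an ordered pipeline in which each stage consumes the findings of the previous one. The following is a minimal sketch of that idea, not the paper's implementation: the `Playbook` class, the stage identifiers, and the stub handlers are all hypothetical, and the handlers return canned data instead of running any real tool.

```python
from dataclasses import dataclass, field
from typing import Callable

# Stage names mirror the task list above; the identifiers are illustrative.
STAGES = [
    "network_scanning",
    "resource_enumeration",
    "vulnerability_scanning",
    "exploitation",
]


@dataclass
class Playbook:
    """Runs one handler per stage, in order, passing findings forward."""

    handlers: dict[str, Callable[[dict], dict]] = field(default_factory=dict)

    def register(self, stage: str, handler: Callable[[dict], dict]) -> None:
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.handlers[stage] = handler

    def run(self, findings: dict) -> dict:
        for stage in STAGES:
            handler = self.handlers.get(stage)
            if handler is None:
                continue  # stages without a handler are skipped
            # Merge the stage's output into the accumulated findings.
            findings = {**findings, **handler(findings)}
        return findings


# Stub handlers stand in for agents; they return canned data, not scan output.
playbook = Playbook()
playbook.register("network_scanning", lambda f: {"hosts": ["10.0.0.5"]})
playbook.register(
    "vulnerability_scanning",
    lambda f: {"candidates": [(h, "CVE-2017-0144") for h in f["hosts"]]},
)
result = playbook.run({"scope": "10.0.0.0/24"})
print(result["candidates"])  # prints [('10.0.0.5', 'CVE-2017-0144')]
```

Keeping the stage order in data rather than in control flow is one way to let a developer drop or reorder stages per engagement, which is the kind of customization the playbook approach is meant to allow.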

Each MCP server exposes specialized tool calls that agents can invoke, ranging from wrappers around traditional CLI tools to RPC-accessible services like Metasploit [metasploit_rpc].
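
To make the tool-call idea concrete, here is a small stdlib-only sketch of how a server might dispatch a request to a registered tool. MCP messages are JSON-RPC 2.0 and tool invocations use the `tools/call` method; beyond that message shape, everything here is an assumption for illustration. The `tool` decorator, `handle_request`, and the `build_nmap_command` tool are hypothetical names, the MCP SDK is not used, and the tool only builds a command string without executing it.

```python
import json
import shlex

TOOLS = {}


def tool(name):
    """Register a function as a callable tool under the given name."""
    def decorator(fn):
        TOOLS[name] = fn
        return fn
    return decorator


@tool("build_nmap_command")
def build_nmap_command(target: str, ports: str = "1-1024") -> str:
    # -sV (service/version detection) and -p (port list) are standard nmap
    # flags. The command is only constructed here, never executed.
    return shlex.join(["nmap", "-sV", "-p", ports, target])


def handle_request(raw: str) -> str:
    """Dispatch a JSON-RPC 2.0 'tools/call' request to a registered tool."""
    req = json.loads(raw)
    base = {"jsonrpc": "2.0", "id": req.get("id")}
    params = req.get("params", {})
    if req.get("method") != "tools/call":
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({**base, "error": error})
    fn = TOOLS.get(params.get("name"))
    if fn is None:
        error = {"code": -32602, "message": "Unknown tool"}
        return json.dumps({**base, "error": error})
    text = fn(**params.get("arguments", {}))
    result = {"content": [{"type": "text", "text": text}]}
    return json.dumps({**base, "result": result})


request = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "build_nmap_command",
        "arguments": {"target": "10.0.0.5", "ports": "445"},
    },
})
response = json.loads(handle_request(request))
print(response["result"]["content"][0]["text"])  # prints nmap -sV -p 445 10.0.0.5
```

Because the agent only sees tool names and arguments, a CLI wrapper and an RPC-backed service look the same from its side; registering another function in `TOOLS` is all it takes to extend this toy server.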

Case Studies: CVE-2017-5638 and CVE-2017-0144

CVE-2017-5638

One case study exploits the Apache Struts vulnerability CVE-2017-5638, the flaw behind the Equifax breach of 2017. Using PentestMCP, a configured agent ran the full test: it began with network scanning to identify vulnerable services and ended by deploying an exploitation framework that achieved remote code execution and extracted sensitive data from the target system.

Figure 1: CVE-2017-5638 setup

CVE-2017-0144 (EternalBlue)

Similarly, PentestMCP was used to demonstrate the exploitation of CVE-2017-0144 (EternalBlue), made infamous by the WannaCry and NotPetya malware campaigns [cve-2017-0144]. In this exercise the agent identified a Server Message Block (SMB) vulnerability and then ran an exploit to achieve arbitrary code execution on the affected system, illustrating that the agent can perform a complex penetration testing workflow autonomously.

Figure 2: CVE-2017-0144 setup
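
The first step in a workflow like the one described above is finding hosts that expose SMB at all. The following is a minimal sketch of that reachability check using only the Python standard library; it is not taken from the paper, and the function name `is_port_open` and its default timeout are assumptions. An open port says only that a service is listening, not that it is vulnerable, so a real workflow would follow up with a dedicated vulnerability scanner.

```python
import socket

SMB_PORT = 445  # standard TCP port for SMB


def is_port_open(host: str, port: int = SMB_PORT, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Exposed as a tool call, a check like this would let an agent narrow a scanned address range down to hosts worth passing to the vulnerability scanning stage.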

Evaluation and Implications

The case studies show PentestMCP automating multi-step penetration testing operations in practice, not only in theory. Initial evaluations with different AI models, such as OpenAI's GPT-5 and Anthropic's Claude Opus 4, showed that performance varies from model to model, while the framework itself consistently handled the management and execution of complex testing tasks.

PentestMCP is a step towards AI-driven automation in cybersecurity, giving developers and security professionals a toolkit they can build on. With continued refinement and a wider set of available tools, it has the potential to make penetration testing more effective and more adaptive.

Conclusion

PentestMCP reflects a shift in penetration testing methodology towards flexible, adaptable AI-driven frameworks. As Agentic AI continues to evolve, frameworks like PentestMCP will likely play an important role in automating and augmenting security operations, and in helping defenders keep pace with evolving threats.