- The paper presents MCPPrivacyDetector, a framework that systematically detects protocol-induced privacy leakage in MCP servers using cross-language static analysis.
- It employs unified code abstraction, context-aware semantic filtering, and interprocedural taint tracking to identify sensitive data flows across diverse programming environments.
- Empirical results reveal a leakage prevalence above 10%, peaking in Java and Python deployments, which points to an urgent need for stronger privacy safeguards.
Detecting Privacy Leakage Risks in MCP Servers: Analysis of the MCPPrivacyDetector Framework
Introduction
The standardization of the Model Context Protocol (MCP) has fundamentally changed how LLM-powered autonomous agents are designed and integrated with external toolchains, data sources, and runtime resources. The protocol's transparent interface between agent logic and local/remote capabilities introduces a distinct class of privacy leakage risks: protocol-induced exfiltration rooted in MCP's event-driven, cross-language abstraction. The paper "What Happens Locally, Leaks Globally": Detecting Privacy Leakage Risks in MCP Servers (2606.21338) presents a systematic, context-aware methodology for detecting privacy leakage in heterogeneous MCP deployments, implemented as the MCPPrivacyDetector static analysis framework.
The MCP Protocol and Privacy Risk Vectors
MCP operates via a host-client-server architecture wherein LLM agents invoke registered tool handlers, which execute logic within the local context and serialize responses as model context. The execution workflow—spanning initialization, capability registration, remote invocation, and context propagation—formalizes the boundary crossing of data between the local server and remote LLM context.
Figure 1: Sequential depiction of data and control flow during MCP server execution.
This architectural abstraction increases the risk of implicit privacy leakage through context propagation. Serialized return values, error messages, and logging outputs can include credentials, API keys, or PII. When they do, these artifacts are unintentionally injected into the LLM's context, where they may be exposed in downstream LLM outputs, caches, or external logging systems.
Mechanistically, privacy leakage in MCP servers is realized via:
- Local plaintext leakage: Sensitive artifacts persist in logs or stdout accessible to unintended parties.
- Implicit cross-boundary propagation: Sensitive data serialized by tool handlers are forwarded to LLMs, crossing organizational and operational trust boundaries without explicit outbound logic.
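The two mechanisms above can be illustrated with a minimal Python sketch. It does not use any MCP SDK: `fetch_weather`, `make_logger`, the `API_KEY` value, and the shape of the returned dictionary are all hypothetical stand-ins for a registered tool handler and its serialized result.

```python
import io
import logging

# Hypothetical secret; a real server might load it from the environment.
API_KEY = "sk-demo-1234"

def make_logger(stream):
    """Build a logger that writes to `stream` (stands in for a log file or stdout)."""
    logger = logging.getLogger("mcp_demo")
    logger.handlers.clear()
    logger.setLevel(logging.DEBUG)
    logger.addHandler(logging.StreamHandler(stream))
    logger.propagate = False
    return logger

def fetch_weather(city, logger):
    """Hypothetical tool handler; its return value would be serialized into model context."""
    # Mechanism 1: local plaintext leakage through a debug log line.
    logger.debug("calling weather API with key=%s", API_KEY)
    try:
        # Simulated upstream failure whose message embeds the request URL.
        raise ConnectionError(f"GET /weather?city={city}&key={API_KEY} failed")
    except ConnectionError as exc:
        # Mechanism 2: the error text, secret included, becomes the tool result.
        return {"isError": True, "text": str(exc)}

log_stream = io.StringIO()
result = fetch_weather("Oslo", make_logger(log_stream))
log_text = log_stream.getvalue()
```

Note that the handler contains no explicit outbound call carrying the secret; the leak happens only because the log line and the returned error message are later read or forwarded by other components.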
MCPPrivacyDetector: Cross-Language Privacy Leakage Detection
The detection of privacy risks in MCP deployments presents three primary technical challenges: language heterogeneity, context-dependent sensitivity, and propagation of sensitive artifacts via implicit flows. MCPPrivacyDetector addresses these via a multi-stage pipeline encompassing unified program representation, context-aware semantic analysis, and interprocedural taint tracking.
Figure 2: Architectural overview of MCPPrivacyDetector, detailing unified cross-language representation, context-aware semantic filtering, and taint propagation analysis.
Unified Cross-Language Abstraction
MCPPrivacyDetector employs CodeQL as the analysis substrate, parsing heterogeneous codebases (Python, Go, Java, JavaScript, TypeScript) into a relational structure. Key program entities (identifiers, attribute accesses, call nodes, assignments, returns) are standardized, abstracting away language-specific semantics while preserving the fidelity needed for precise information flow analysis.
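As a rough illustration of what a relational view of program entities might look like, the sketch below flattens Python source into uniform rows. It uses Python's standard `ast` module rather than CodeQL, covers a single language, and the `Entity` schema and `extract_entities` function are assumptions made for this example, not the framework's actual representation.

```python
import ast
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    kind: str  # "identifier", "attribute", "call", "assignment", or "return"
    name: str
    line: int

def extract_entities(source: str) -> list[Entity]:
    """Flatten a Python module into language-neutral entity rows."""
    entities = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            entities.append(Entity("identifier", node.id, node.lineno))
        elif isinstance(node, ast.Attribute):
            entities.append(Entity("attribute", node.attr, node.lineno))
        elif isinstance(node, ast.Call):
            entities.append(Entity("call", ast.unparse(node.func), node.lineno))
        elif isinstance(node, ast.Assign):
            targets = ", ".join(ast.unparse(t) for t in node.targets)
            entities.append(Entity("assignment", targets, node.lineno))
        elif isinstance(node, ast.Return):
            value = ast.unparse(node.value) if node.value else ""
            entities.append(Entity("return", value, node.lineno))
    return entities

# Hypothetical handler source used only to exercise the extractor.
SAMPLE = '''import os
def handler():
    token = os.environ["API_TOKEN"]
    print(token)
    return token
'''
rows = extract_entities(SAMPLE)
```

Once every supported language is reduced to rows of this kind, later stages can query one schema instead of five language-specific syntax trees.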
Context-Aware Rule Matching and Filtering
The framework initiates leakage discovery through rule-based extraction of candidate sensitive entities and program sinks. To minimize false positives, contextual semantic filtering is employed, leveraging characteristics such as identifier usage, type analysis, and protocol-specific sink identification. This context-aware phase distinguishes genuine secrets (e.g., authentication secrets, API keys, PII) from benign identifiers or values.
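The following sketch shows one way a rule match could be combined with contextual filtering. The regular expressions, the `Candidate` fields, and the filtering criteria are illustrative assumptions; the paper's actual rule set is not reproduced here.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed name-based rule for candidate sensitive entities.
SENSITIVE_NAME = re.compile(r"(api[_-]?key|secret|token|password|passwd)", re.IGNORECASE)
# Assumed markers of placeholder or empty values.
PLACEHOLDER = re.compile(r"^(|changeme|your[_-].*|<.*>|x+|test|example)$", re.IGNORECASE)

@dataclass(frozen=True)
class Candidate:
    name: str              # identifier as written in the code
    value: Optional[str]   # literal value if statically known, else None
    inferred_type: str     # e.g. "str", "int", "bool"

def is_sensitive(c: Candidate) -> bool:
    """Rule match on the name, then context checks to drop benign hits."""
    if not SENSITIVE_NAME.search(c.name):
        return False
    if c.inferred_type != "str":
        return False  # e.g. max_tokens: int is a counter, not a credential
    if c.value is not None and PLACEHOLDER.match(c.value):
        return False  # template or dummy value
    return True
```

For instance, `Candidate("max_tokens", "4096", "int")` matches the name rule on "token" but is rejected by the type check, while `Candidate("OPENAI_API_KEY", "sk-live-abc123", "str")` passes both stages.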
Interprocedural Taint Analysis
MCPPrivacyDetector applies forward reachability analysis over the program entity graph to determine feasible source-to-sink paths. The taint analysis considers direct assignment, argument propagation, and attribute-based flows, mapping protocol-relevant sources (hardcoded credentials, config file reads, sensitive environment loads) to protocol and logging sinks (e.g., @mcp.tool handlers, print, log, outbound HTTP requests).
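Forward reachability over an entity graph can be sketched as a breadth-first search from each source. The node labels and edge list below are hypothetical, and the sketch ignores sanitizers and path feasibility, so it should be read as an illustration of the idea rather than the framework's algorithm.

```python
from collections import deque

def find_leak_paths(edges, sources, sinks):
    """Return one shortest path for each sink reachable from each source."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    paths = []
    for source in sources:
        parent = {source: None}  # doubles as the visited set
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in graph.get(node, []):
                if nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        for sink in sinks:
            if sink in parent:
                # Walk parent links back to the source, then reverse.
                path, cur = [], sink
                while cur is not None:
                    path.append(cur)
                    cur = parent[cur]
                paths.append(path[::-1])
    return paths

# Hypothetical data-flow edges for a small two-function server.
EDGES = [
    ("os.environ['API_KEY']", "load_config.key"),  # direct assignment
    ("load_config.key", "load_config.return"),     # value returned to caller
    ("load_config.return", "handler.cfg"),         # interprocedural step
    ("handler.cfg", "handler.cfg.key"),            # attribute-based flow
    ("handler.cfg.key", "print.arg0"),             # argument propagation
    ("handler.city", "requests.get.arg0"),         # unrelated, untainted flow
]
paths = find_leak_paths(
    EDGES,
    sources=["os.environ['API_KEY']"],
    sinks=["print.arg0", "requests.get.arg0"],
)
```

Here only the `print` sink is reported, because the HTTP argument is fed by `handler.city`, which is not reachable from the environment read.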
Empirical Results and Ecosystem Analysis
Applying MCPPrivacyDetector to a corpus of 10,655 MCP servers yields the following findings:
- Prevalence: Over 10% of MCP servers, across all supported languages, exhibit detectable privacy-leakage paths.
- Language Disparity: Java servers peak with a 19.1% leakage rate, followed by Python (15.4%); all languages studied exceed 10%.

Figure 3: Distribution of MCP server privacy leakage rates across major programming languages.
Manual validation on a random subsample of 200 flagged servers yielded a 4% false positive rate and no observed false negatives, attesting to the framework's detection accuracy.
Case Analyses
Representative real-world cases illustrate both the attack surface and recurring developer oversights. They include:
- Debug logging of Bearer tokens, facilitating direct credential replay.
- Propagation of sensitive API keys into outbound HTTP headers, exposing them to external infrastructure.
- Printing of authentication secrets to stdout, enabling leakage via terminal or log aggregation systems.
These cases confirm that the detected leakages are not merely theoretical: they are directly exploitable and carry a protocol-level exfiltration risk.
Implications and Future Directions
The results indicate a systemic disconnect between protocol-level data handling and the security semantics enforced by conventional code review and static analysis. Unlike classic exfiltration bugs requiring explicit outbound code, MCP-induced leakage is a byproduct of protocol machinery, developer practices, and architectural abstraction.
In practical terms, the findings call for:
- Broader adoption of context-aware, protocol-specific static analysis for all MCP services, both open and private.
- Automated sanitization and auditing mechanisms at serialization boundaries and prior to model context injections.
- Language-agnostic best practices for tool handler design, emphasizing least privilege and output filtering.
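As one possible shape for sanitization at the serialization boundary, the sketch below redacts secret-looking substrings from a tool result before it is converted to JSON. The patterns and the `serialize_tool_result` wrapper are assumptions for illustration; a production filter would need a broader, tested rule set.

```python
import json
import re

# Assumed patterns; each keeps its prefix (group 1) and replaces the secret.
SECRET_PATTERNS = [
    re.compile(r"(?i)(bearer\s+)[A-Za-z0-9._~+/=-]+"),
    re.compile(r"(?i)((?:api[_-]?key|token|secret|password)\s*[=:]\s*)[^\s&]+"),
]

def redact_text(text: str) -> str:
    """Replace secret-looking substrings in a single string."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(r"\1[REDACTED]", text)
    return text

def redact(value):
    """Recursively redact strings inside dicts, lists, and tuples."""
    if isinstance(value, str):
        return redact_text(value)
    if isinstance(value, dict):
        return {key: redact(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [redact(item) for item in value]
    return value

def serialize_tool_result(result) -> str:
    """Hypothetical boundary hook: sanitize, then serialize for model context."""
    return json.dumps(redact(result))

raw = {
    "text": "GET /v1?api_key=abc123&city=Oslo failed",
    "headers": ["Authorization: Bearer eyJhbGci.payload.sig"],
}
safe = serialize_tool_result(raw)
```

Placing the filter in a single wrapper around serialization means individual handlers do not each need to remember to scrub their own return values and error messages.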
From a theoretical perspective, the prevalence of protocol-induced side channels in MCP suggests that richer models of LLM agent security must incorporate semantic representations of implicit cross-boundary propagation. Enhanced compositional reasoning and cross-context taint semantics remain promising research directions.
Conclusion
The study establishes MCPPrivacyDetector as an effective tool for systematic, protocol-aware static analysis of privacy leakage in MCP server ecosystems. Strong empirical evidence indicates that privacy risks are widespread, concentrated, and largely protocol-induced, with leakage paths often hidden from explicit developer intent. The framework's results underscore the necessity for ecosystem-wide integration of context- and protocol-aware analysis methodologies to safeguard against systemic privacy leakage in LLM-centric agent architectures.