Cross-Domain Leakage in System Boundaries
- Cross-domain leakage is the unintended propagation of sensitive data across system boundaries such as language, protocol, and organization, compromising privacy.
- It is quantified with metrics like Mean Reciprocal Rank, retrieval pivot risk, and differential privacy measures to assess leakage severity.
- Mitigation strategies include fine-tuned defense mechanisms, protocol-aware safeguards, and noise injection to prevent unauthorized data reconstruction.
Cross-domain leakage refers to the unintended propagation, disclosure, or inferability of sensitive information when a system crosses a boundary between domains. In recent work, that boundary may be between training language and query language, local tool state and model context, one tenant and another in hybrid retrieval, one collaborator’s dataset and another’s model access, one recommendation domain and another, or one hospital site and another (Dong et al., 1 Jun 2025, Yan et al., 19 Jun 2026, Thornton, 9 Feb 2026, Zhang et al., 2020, Wang et al., 2024, Bouaziz et al., 31 Mar 2026). The term therefore designates a family of leakage phenomena defined less by a single modality than by the fact that information survives or is reconstructed after crossing a boundary that the system designer may have treated as protective.
1. Definitions and boundary types
Within multilingual LLM privacy research, cross-domain leakage is made explicit by treating cross-lingual privacy leakage as a case in which fine-tuning occurs on an English-only private dataset, while the adversary issues semantically equivalent queries in another language and still elicits English PII or its translation (Dong et al., 1 Jun 2025). In MCP servers, the same term denotes unintended propagation of sensitive local state such as credentials, API keys, and PII across the local/LLM boundary into the model’s context, even when the server source code contains no explicit outbound request (Yan et al., 19 Jun 2026). In secure multi-party machine learning, it denotes a party’s ability to infer population-level properties of another party’s private dataset from black-box access to the jointly trained model (Zhang et al., 2020).
A related usage appears in systems that combine components thought to be safe in isolation. In hybrid RAG, a vector-retrieved seed chunk can pivot through entity links into sensitive graph neighborhoods, producing cross-tenant leakage that does not occur in vector-only retrieval (Thornton, 9 Feb 2026). In cross-domain recommendation, leakage occurs when embeddings exchanged between a source domain and a target domain allow inference of user interactions beyond what is intended (Wang et al., 2024). In deep transfer learning, the source and target datasets often belong to different organizations, and leakage can arise through disclosed weights, hidden representations, or shared gradients (Chen et al., 2020).
| Boundary | Leakage object | Representative setting |
|---|---|---|
| Language boundary | PII tokens or memorized private sequences | Cross-lingual LLM querying |
| Local/LLM protocol boundary | Credentials, API keys, PII | MCP tool handlers |
| Organizational boundary | Dataset properties, membership, batch properties | MPC and deep transfer learning |
| Retrieval boundary | Unauthorized tenant or sensitivity-level items | Hybrid RAG |
| Channel boundary | Forbidden vault fields in internal messages or memory | Multi-agent LLM systems |
| Site/domain boundary | Site identity or user interaction traces | Cross-hospital transfer, CDR |
Taken together, these definitions indicate that “domain” is operational rather than purely semantic: it may refer to language, modality, tenant, site, protocol stage, collaborator, or model-internal communication channel.
2. Formalizations and measurement
The literature formalizes cross-domain leakage with metrics matched to the object being protected. In cross-lingual LLM leakage, a private sequence is denoted by $s = (s_1, \dots, s_n)$, and leakage under a query $q$ is measured by token-level Mean Reciprocal Rank:
$$\mathrm{MRR}(s \mid q) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\mathrm{rank}(s_i \mid q, s_{<i})}$$
A higher MRR indicates stronger memorization of $s$ and thus greater privacy leakage (Dong et al., 1 Jun 2025).
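As a minimal sketch of how a token-level MRR can be computed, the snippet below assumes the usual definition: the mean, over the tokens of the private sequence, of the reciprocal of the rank the model assigns to each true token. The function name, the toy score vectors, and the tie-handling rule are illustrative assumptions, not taken from the cited paper.

```python
from typing import Sequence


def token_level_mrr(step_scores: Sequence[Sequence[float]],
                    target_ids: Sequence[int]) -> float:
    """Mean reciprocal rank of each private token under per-position scores.

    step_scores[i][v] is the model score for vocabulary item v at position i;
    target_ids[i] is the vocabulary index of the true private token there.
    """
    if len(step_scores) != len(target_ids) or not target_ids:
        raise ValueError("need one score vector per target token")
    total = 0.0
    for scores, target in zip(step_scores, target_ids):
        # Rank 1 means the private token is the top prediction.
        # Assumption: ties are resolved in the target's favor.
        rank = 1 + sum(1 for s in scores if s > scores[target])
        total += 1.0 / rank
    return total / len(target_ids)


# Toy example with a 3-item vocabulary and a 2-token private sequence.
toy_scores = [
    [0.1, 0.7, 0.2],  # target index 1 is ranked 1st
    [0.5, 0.3, 0.2],  # target index 1 is ranked 2nd
]
mrr = token_level_mrr(toy_scores, [1, 1])  # (1/1 + 1/2) / 2
```

In practice the score vectors would come from the model's logits at each position, conditioned on the query and the preceding private tokens.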
In multi-agent LLM systems, leakage is defined relative to a vault of sensitive fields and an allowed set . A channel leak on occurs if there exists a forbidden field whose semantic similarity to channel content exceeds a calibrated threshold , with in the reported experiments. System-level exposure is then aggregated by
0
and for the observed channels 1 by
2
(Yagoubi et al., 12 Feb 2026).
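The per-channel test described above can be sketched as follows. This is an illustration under stated assumptions, not AgentLeak's implementation: the fuzzy substring score from `difflib` stands in for a semantic similarity model, the threshold value `0.9` is arbitrary, and the vault contents and channel names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Set


def similarity(field_value: str, content: str) -> float:
    """Stand-in score: share of the field covered by its longest match in content."""
    a, b = field_value.lower(), content.lower()
    if not a:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size / len(a)


def leaked_channels(vault: Dict[str, str], allowed: Set[str],
                    channels: Dict[str, str], tau: float) -> Dict[str, List[str]]:
    """Map each leaking channel to the forbidden fields it exposes."""
    forbidden = {k: v for k, v in vault.items() if k not in allowed}
    leaks: Dict[str, List[str]] = {}
    for name, content in channels.items():
        hits = [k for k, v in forbidden.items() if similarity(v, content) > tau]
        if hits:
            leaks[name] = hits
    return leaks


vault = {"ssn": "123-45-6789", "email": "ana@example.org", "city": "Lisbon"}
allowed = {"city"}
channels = {
    "output": "Your appointment in Lisbon is confirmed.",
    "inter_agent": "Forwarding record: SSN 123-45-6789, city Lisbon.",
    "memory": "user prefers morning slots",
}
leaks = leaked_channels(vault, allowed, channels, tau=0.9)
```

In this toy run the user-visible output is clean while the inter-agent message carries a forbidden field, which is the kind of case an output-only audit would not see.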
Hybrid RAG introduces metrics tied to authorization failure. Retrieval Pivot Risk is
3
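where an item counts as unauthorized for a user if its tenant differs from the user’s tenant or its sensitivity exceeds the user’s clearance. Leakage magnitude is counted by Leakage@k, and traversal structure by Pivot Depth, the minimum graph-hop distance from any vector seed to the first leaked chunk (Thornton, 9 Feb 2026).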
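A short sketch of these authorization metrics follows. It assumes that an item is unauthorized when its tenant differs from the user's or its sensitivity exceeds the user's clearance (the rule the source gives for defense D1), that Leakage@k counts unauthorized items among the top k results, and that Retrieval Pivot Risk is the fraction of queries with at least one such item. The last reading is an assumption, and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    tenant: str
    sensitivity: int


@dataclass(frozen=True)
class User:
    tenant: str
    clearance: int


def unauthorized(chunk: Chunk, user: User) -> bool:
    """True if the chunk belongs to another tenant or exceeds the user's clearance."""
    return chunk.tenant != user.tenant or chunk.sensitivity > user.clearance


def leakage_at_k(results: Sequence[Chunk], user: User, k: int) -> int:
    """Number of unauthorized chunks among the top-k retrieved results."""
    return sum(unauthorized(c, user) for c in results[:k])


def retrieval_pivot_risk(result_sets: Sequence[Sequence[Chunk]],
                         user: User, k: int) -> float:
    """Assumed reading of RPR: share of queries that leak at least one chunk."""
    if not result_sets:
        return 0.0
    leaky = sum(1 for r in result_sets if leakage_at_k(r, user, k) > 0)
    return leaky / len(result_sets)


user = User(tenant="acme", clearance=1)
a = Chunk("a", "acme", 0)
b = Chunk("b", "globex", 0)   # wrong tenant
c = Chunk("c", "acme", 2)     # sensitivity above clearance
runs = [[a], [a, b], [a, c, b], [a]]
per_query = [leakage_at_k(r, user, k=3) for r in runs]
rpr = retrieval_pivot_risk(runs, user, k=3)
```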
In domain-adapted ASR, leakage is defined at aligned word positions where the model outputs a private context word 8 instead of the true acoustic word 9:
0
This definition measures unintended disclosure of context or training words in transcription output (Züfle et al., 27 May 2026).
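The position-wise definition can be illustrated with a small sketch. It assumes the reference and hypothesis transcripts have already been aligned word by word (the alignment step is out of scope here) and reports a rate over all aligned positions, which is a simplifying assumption about the denominator. The function names and example sentences are hypothetical.

```python
from typing import Iterable, List, Sequence


def leaked_positions(reference: Sequence[str], hypothesis: Sequence[str],
                     context_words: Iterable[str]) -> List[int]:
    """Indices where a context word was output although a different word was spoken."""
    if len(reference) != len(hypothesis):
        raise ValueError("sequences must be aligned to the same length")
    context = {w.lower() for w in context_words}
    return [
        i for i, (ref, hyp) in enumerate(zip(reference, hypothesis))
        if hyp.lower() in context and hyp.lower() != ref.lower()
    ]


def leakage_rate(reference: Sequence[str], hypothesis: Sequence[str],
                 context_words: Iterable[str]) -> float:
    """Leaked positions divided by all aligned positions (assumed denominator)."""
    if not reference:
        return 0.0
    return len(leaked_positions(reference, hypothesis, context_words)) / len(reference)


spoken = "please call doctor mayer today".split()
transcribed = "please call doctor meier today".split()
positions = leaked_positions(spoken, transcribed, ["Meier"])
rate = leakage_rate(spoken, transcribed, ["Meier"])
```

A context word that was actually spoken does not count as a leak, because the hypothesis then matches the reference at that position.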
Cross-domain recommendation instead frames protection through $\epsilon$-Local Differential Privacy. A mechanism $\mathcal{M}$ that outputs a shared embedding $\tilde{e}$ preserves $\epsilon$-LDP if
$$\Pr[\mathcal{M}(D) = \tilde{e}] \le \exp(\epsilon)\,\Pr[\mathcal{M}(D') = \tilde{e}]$$
for adjacent user datasets $D$ and $D'$ differing in one interaction (Wang et al., 2024). The shift from ranking metrics to authorization metrics, substitution rates, and LDP guarantees suggests that cross-domain leakage is measured at the level where the boundary is enforced.
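One common way to obtain a guarantee of this kind is the Laplace mechanism, sketched below for an embedding vector. This is a generic textbook construction, not the cited system's code: the embedding is clipped to an L1 norm of at most `bound`, so any two inputs differ by at most `2 * bound` in L1 distance, and each coordinate receives Laplace noise with scale `2 * bound / epsilon`. Function names and parameter values are hypothetical, and floating-point samplers like this one are not suitable for production privacy guarantees.

```python
import random
from typing import List, Sequence


def clip_l1(vec: Sequence[float], bound: float) -> List[float]:
    """Rescale vec so that its L1 norm is at most bound."""
    norm = sum(abs(x) for x in vec)
    if norm <= bound or norm == 0.0:
        return list(vec)
    return [x * bound / norm for x in vec]


def laplace_perturb(vec: Sequence[float], epsilon: float, bound: float,
                    rng: random.Random) -> List[float]:
    """Clip the embedding, then add independent Laplace noise per coordinate."""
    if epsilon <= 0 or bound <= 0:
        raise ValueError("epsilon and bound must be positive")
    clipped = clip_l1(vec, bound)
    scale = 2.0 * bound / epsilon  # L1 sensitivity divided by epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return [
        x + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        for x in clipped
    ]


rng = random.Random(0)  # seeded only to make the example repeatable
embedding = [3.0, -1.0]
shared = laplace_perturb(embedding, epsilon=1.0, bound=2.0, rng=rng)
```

Smaller `epsilon` means a larger noise scale, which is the privacy-utility trade-off the recommendation setting has to manage.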
3. Leakage channels and propagation mechanisms
A recurrent finding is that leakage is often not localized to a single output surface. In cross-lingual LLMs, layer-wise Logit Lens analysis identifies three phases: an “encoding” phase with 8, a “shared conceptual” phase in which MRR rises similarly across all languages, and a “language-specific decoding” phase in which MRR diverges. Cosine similarity between 9 and 0 peaks near the transition into the shared cross-lingual representation, and the paper accordingly distinguishes Privacy-Universal Neurons from Language-Specific Privacy Neurons (Dong et al., 1 Jun 2025).
In MCP servers, leakage is protocol-induced. Whatever a handler returns, logs, or raises becomes part of the protocol-conformant response, creating an “invisible pipe” from handler output to serialized JSON to LLM context. The implicit sinks are any return expression inside a @mcp.tool-decorated function, logging calls inside a handler, and unhandled exception payloads. MCPPrivacyDetector operationalizes this with a unified cross-language program representation, semantic filtering, and taint analysis over relations such as assignFlow, paramFlow, and returnFlow (Yan et al., 19 Jun 2026).
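The three implicit sinks can be illustrated with a toy dispatcher. The code below does not use the real MCP SDK: `call_tool` is a hypothetical stand-in that serializes whatever a handler returns or raises, and the log handler simply records messages so that the logging sink is visible. The secret value and handler names are placeholders.

```python
import json
import logging
from typing import Any, Callable, List

captured_logs: List[str] = []


class ListHandler(logging.Handler):
    """Collects formatted log messages so the example can inspect them."""

    def emit(self, record: logging.LogRecord) -> None:
        captured_logs.append(record.getMessage())


log = logging.getLogger("toy_tool_server")
log.setLevel(logging.INFO)
log.addHandler(ListHandler())
log.propagate = False


def call_tool(handler: Callable[..., Any], **kwargs: Any) -> str:
    """Toy dispatcher: the return value or exception text becomes the response."""
    try:
        payload = {"result": handler(**kwargs)}
    except Exception as exc:
        payload = {"error": str(exc)}
    return json.dumps(payload)


API_KEY = "sk-demo-0000"  # placeholder secret held in local state


def get_config() -> dict:
    log.info("loaded config with key %s", API_KEY)           # sink: logging call
    return {"endpoint": "https://api.example.com",
            "api_key": API_KEY}                               # sink: return value


def fetch(path: str) -> str:
    # sink: unhandled exception payload
    raise RuntimeError(f"request to {path} failed with key {API_KEY}")


config_response = call_tool(get_config)
error_response = call_tool(fetch, path="/v1/items")
```

Neither handler makes an outbound request, yet the secret appears in both serialized responses and in the log record, which matches the source's point that the leak is carried by protocol semantics rather than by an explicit send.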
Multi-agent systems exhibit a related internal-channel problem. AgentLeak distinguishes external output 1, inter-agent messages 2, tool arguments 3, and shared memory 4, and organizes attacks into six families totaling 32 classes. The framework’s emphasis is that sensitive fields move through coordination pathways that conventional output-only audits do not inspect (Yagoubi et al., 12 Feb 2026).
Hybrid RAG reveals a boundary failure at composition time. A vector-retrieved chunk mentions entities that serve as graph pivots into unauthorized neighborhoods, and in the bipartite chunk-entity graph studied in the paper, any cross-tenant leakage path has minimal Pivot Depth exactly $2$: authorized chunk $\to$ shared entity $\to$ unauthorized chunk (Thornton, 9 Feb 2026). Domain-adapted ASR shows a different but structurally similar mechanism: contextual biasing through prompts or LoRA lowers the barrier for context words to override acoustic evidence, so a phonetically similar private word may be transcribed even when another word is spoken (Züfle et al., 27 May 2026).
Network-level leakage in local research agents moves the boundary further outward. Passive adversaries such as ISPs observe only domain names or IP addresses, packet timings, and sizes, yet WRAs visit 8–9 domains with distinguishable timing correlations, creating uniquely fingerprintable bursts that support prompt recovery and trait inference across sessions (Jeong et al., 27 Aug 2025). Deep transfer learning exposes additional channels: model-based transfer reveals weights, mapping-based transfer reveals per-sample hidden representations, and parameter-based transfer reveals gradients or shared updates (Chen et al., 2020).
4. Empirical manifestations
The empirical record spans direct disclosure, statistical inference, benchmark contamination, and hidden-channel exfiltration. Some studies measure leakage as explicit retrieval of unauthorized content, while others show that evaluation itself can be corrupted by domain spillover or duplicate contamination.
| Setting | Quantitative finding | Citation |
|---|---|---|
| Cross-lingual LLM privacy | Average cross-lingual MRR drops from 0, 1, and 2 under MPNC; peak per-language reductions reach 3; Valid-PPL increases by 4 point on average | (Dong et al., 1 Jun 2025) |
| MCP servers | Leakage rate is 5 overall and 6 among servers handling privacy-related data; Java is 7, Python 8, and JavaScript/Go/TypeScript 9–0 | (Yan et al., 19 Jun 2026) |
| Multi-agent LLM systems | In multi-agent configurations, 1, 2, and total exposure across 3–4–5 is 6; output-only audits miss 7 of violations | (Yagoubi et al., 12 Feb 2026) |
| Hybrid RAG | In the synthetic corpus, undefended hybrid retrieval has 8 with mean Leakage@k 9 for benign and 0 for adversarial queries; in Enron, 1; PD is uniformly 2 | (Thornton, 9 Feb 2026) |
| Cross-modal retrieval benchmark | On SoundDesc full test 3, CE gives 4 on the original training split and 5 after deduplication; on the duplicates-only subset, 6 drops from 7 to 8 | (Weck et al., 2023) |
| Secure multi-party ML | On Adult with Income as 9 and 4 output classes, black-box attack accuracy is 0 with 1 in training and 2 without 3; 5-way fine-grained inference reaches 4–5 | (Zhang et al., 2020) |
| Domain-adapted ASR | Prompt-only leakage rises from 6 without context to 7 for a word prompt, 8 for 1 sentence, 9 for 5 sentences, and 0 for 10 sentences; combined fine-tuning plus prompt reaches 1, 2, and 3–4 | (Züfle et al., 27 May 2026) |
| Local research agents | The prompt-recovery attack recovers over 5 of the functional and domain knowledge of prompts, and multi-session inference recovers up to 6 of 7 latent traits; mitigation reduces attack effectiveness by an average of 8 | (Jeong et al., 27 Aug 2025) |
Feature-level site leakage in cross-hospital chest X-ray transfer adds a cautionary measurement result. Multi-site SSL improves RSNA AUC from 9 with ImageNet initialization to 0, while adversarial site confusion reduces probe accuracy on frozen backbone features from 1 to 2 and on projection features from 3 to 4 (Bouaziz et al., 31 Mar 2026). This demonstrates that lowered measured leakage and improved transfer are not identical outcomes.
5. Mitigation strategies
Mitigation methods differ according to whether the leakage channel is representational, protocol-level, retrieval-level, or architectural. In cross-lingual LLMs, Multilingual Privacy Neuron Control identifies privacy-relevant neurons by integrated-gradients attribution, constructs 5 and 6 from attribution frequency thresholds 7 and 8, and zeroes those activations during the forward pass without retraining or modifying weights (Dong et al., 1 Jun 2025).
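The inference-time intervention can be pictured with a toy feed-forward layer. The sketch below covers only the suppression step: it assumes the set of privacy-relevant neuron indices has already been identified (the attribution procedure is not shown) and zeroes those hidden activations during the forward pass while leaving the weight matrices untouched. The tiny network, its weights, and the function names are hypothetical.

```python
from typing import AbstractSet, List, Sequence

Matrix = Sequence[Sequence[float]]


def matvec(weights: Matrix, x: Sequence[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def forward(x: Sequence[float], w_in: Matrix, w_out: Matrix,
            suppressed: AbstractSet[int] = frozenset()) -> List[float]:
    """Two-layer ReLU network; activations of suppressed hidden units are zeroed."""
    hidden = [max(h, 0.0) for h in matvec(w_in, x)]
    # Masking happens on activations only; w_in and w_out are never modified.
    hidden = [0.0 if j in suppressed else h for j, h in enumerate(hidden)]
    return matvec(w_out, hidden)


w_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 hidden units, 2 inputs
w_out = [[1.0, 2.0, 3.0]]                      # 1 output
x = [1.0, 2.0]

baseline = forward(x, w_in, w_out)                     # hidden = [1, 2, 3]
controlled = forward(x, w_in, w_out, suppressed={2})   # hidden = [1, 2, 0]
```

Because the weights are unchanged, removing the mask restores the original behavior, which is what makes this kind of control reversible and retraining-free.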
Protocol-aware defenses are central in MCP. MCPPrivacyDetector lifts Python, JavaScript/TypeScript, Go, Java, and other MCP server code into a unified representation with CodeQL, filters false positives such as len(...), hex(...), repr(...), and str(...), and performs taint analysis to enumerate feasible source-to-sink flows. Recommended safeguards include tool-level output sanitization, fine-grained logging policies, typed secret annotations, runtime taint tracking, and registry-level vetting or CI hooks (Yan et al., 19 Jun 2026).
In hybrid RAG, the core defense is placed at a single boundary. Defense D1 performs a per-hop authorization check at graph expansion, removing nodes whose tenant differs from the user’s tenant or whose sensitivity exceeds the user’s clearance. On both the synthetic corpus and Enron, this drives RPR to 9 and Leakage@k to 00 with latency overhead 01 ms in the synthetic setting; additional defenses D2–D5 reduce context size further but are not necessary for security once D1 is in place (Thornton, 9 Feb 2026).
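A per-hop check of this kind can be sketched as a filtered breadth-first expansion over a chunk-entity graph. The code is an illustration, not the paper's implementation: it assumes entity nodes carry no access control of their own, that seed chunks were already authorized by the vector stage, and that the graph is given as adjacency lists. All names and the toy graph are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Node:
    kind: str             # "chunk" or "entity"
    tenant: str = ""      # only meaningful for chunks
    sensitivity: int = 0  # only meaningful for chunks


def authorized(node: Node, user_tenant: str, clearance: int) -> bool:
    """Entities pass (assumption); chunks must match tenant and clearance."""
    if node.kind == "entity":
        return True
    return node.tenant == user_tenant and node.sensitivity <= clearance


def expand(seeds: Sequence[str], nodes: Dict[str, Node],
           edges: Dict[str, Sequence[str]], user_tenant: str, clearance: int,
           max_hops: int = 2, per_hop_check: bool = True) -> Dict[str, int]:
    """Return reachable chunks with their hop distance from the nearest seed."""
    depth = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        if depth[current] >= max_hops:
            continue
        for neighbor in edges.get(current, ()):
            if neighbor in depth:
                continue
            # The defense: drop unauthorized nodes at the moment of expansion.
            if per_hop_check and not authorized(nodes[neighbor], user_tenant, clearance):
                continue
            depth[neighbor] = depth[current] + 1
            queue.append(neighbor)
    return {n: d for n, d in depth.items() if nodes[n].kind == "chunk"}


nodes = {
    "c_acme": Node("chunk", "acme", 0),
    "e_vendor": Node("entity"),
    "c_globex": Node("chunk", "globex", 0),
}
edges = {
    "c_acme": ["e_vendor"],
    "e_vendor": ["c_acme", "c_globex"],
    "c_globex": ["e_vendor"],
}
undefended = expand(["c_acme"], nodes, edges, "acme", 1, per_hop_check=False)
defended = expand(["c_acme"], nodes, edges, "acme", 1, per_hop_check=True)
```

Without the check, the other tenant's chunk is reached two hops from the seed through the shared entity; with the check, it never enters the frontier.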
Agentic systems require channel-specific enforcement rather than output-only filtering. AgentLeak recommends framework-level intercept-and-sanitize hooks for 02 and 03, selective disclosure policies grounded in the allowed set, full-channel auditing across 04–05, privacy-aware coordination protocols, and Pareto calibration of defense–utility trade-offs. Its prototype interceptor reduces internal leaks from 06 to 07 at a 08 percentage-point TSR cost (Yagoubi et al., 12 Feb 2026).
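An intercept-and-sanitize hook can be sketched as a wrapper around message delivery. This is not AgentLeak's interceptor: it assumes exact string matching against vault values, which will miss paraphrased or reformatted disclosures, and the `Channel` class, field names, and messages are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def sanitize(message: str, vault: Dict[str, str],
             allowed: Set[str]) -> Tuple[str, List[str]]:
    """Replace forbidden vault values with placeholders; report what was redacted."""
    redacted: List[str] = []
    for field, value in vault.items():
        if field in allowed or not value:
            continue
        if value in message:
            message = message.replace(value, f"[REDACTED:{field}]")
            redacted.append(field)
    return message, redacted


class Channel:
    """Toy inter-agent channel that sanitizes every message before delivery."""

    def __init__(self, vault: Dict[str, str], allowed: Set[str]) -> None:
        self.vault = vault
        self.allowed = allowed
        self.delivered: List[Tuple[str, str, str]] = []

    def send(self, sender: str, recipient: str, message: str) -> List[str]:
        clean, redacted = sanitize(message, self.vault, self.allowed)
        self.delivered.append((sender, recipient, clean))
        return redacted  # could feed an audit log in a fuller design


vault = {"name": "Ana Souza", "diagnosis": "type 2 diabetes"}
channel = Channel(vault, allowed={"name"})
redacted = channel.send(
    "intake", "scheduler",
    "Book Ana Souza; note: type 2 diabetes follow-up",
)
delivered_text = channel.delivered[0][2]
```

The allowed field passes through unchanged, so the receiving agent can still complete its task, while the forbidden field never reaches the internal channel.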
Other domains emphasize minimizing shared signal before it crosses the boundary. In domain-adapted ASR, fine-tuning without context prompts yields near-zero leakage, while a prompt-level mitigation that includes both the context word and the acoustic word reduces prompt-only leakage to 09 and combined leakage to 10 (Züfle et al., 27 May 2026). In cross-domain recommendation, P2M2-CDR disentangles domain-common and domain-specific factors, perturbs all exchanged embeddings with Laplace noise to achieve $\epsilon$-LDP, and uses domain-inter and domain-intra contrastive losses; the reported experiments show up to 12 relative gain in HR@10 over the strongest CDR baselines while adding privacy protection (Wang et al., 2024). In deep transfer learning, proposed defenses include DP-SGD, SGLD, adversarial regularization, and encryption; on the industry Marketing set, SGLD changes target AUC from 13 to 14 while lowering Attack-AUC from 15 to 16 (Chen et al., 2020). For local research agents, blocking uniquely identifying domains or obfuscating traces with decoy prompts reduces attack effectiveness by about 17, although the work states that only strong network-level protections such as VPNs or anonymous routing can fully close the metadata channel (Jeong et al., 27 Aug 2025).
6. Interpretation, limitations, and open directions
Several studies argue that measuring leakage changes how system behavior should be interpreted. In cross-hospital chest X-ray transfer, multi-site SSL is the main driver of improved RSNA transfer, whereas adversarial site confusion lowers measured site leakage but does not reliably improve AUC and increases variance (Bouaziz et al., 31 Mar 2026). In deep transfer learning, parameter-based transfer leaks less than raw features, yet model weights, shared hidden representations, and gradients all remain viable channels under the paper’s threat models (Chen et al., 2020).
The secure multi-party setting illustrates a further limitation of standard privacy intuitions. Record-level differential privacy protects individuals but does not prevent leakage of global statistics, whereas group-DP that protects an entire party’s dataset would require noise proportional to 18 and therefore lead to prohibitively low utility (Zhang et al., 2020). In parallel, hybrid RAG shows that two individually secure retrieval components can compose into an insecure pipeline if authorization is not re-checked at the vector-to-graph transition (Thornton, 9 Feb 2026), and MCP servers show that leakage can be “implicit” in protocol semantics rather than explicit in code paths (Yan et al., 19 Jun 2026).
The cross-lingual privacy-neuron study further states that the two-stage process of first encoding private information in a shared latent space and then performing domain-specific decoding likely extends to other domain shifts, including topic domains and modalities (Dong et al., 1 Jun 2025). A plausible implication is that cross-domain leakage is often best understood as a boundary-placement problem: the decisive failure may occur neither at training time nor at the final user-visible output, but at an intermediate layer, a protocol handoff, a graph-expansion step, an inter-agent message, or even a network trace. The strongest common lesson across these literatures is therefore not a single universal defense, but the need to identify where domain crossing actually occurs and to measure leakage at that boundary rather than assuming that conventional output monitoring, component isolation, or cryptographic non-disclosure suffices.