
The Sum Leaks More Than Its Parts: Compositional Privacy Risks and Mitigations in Multi-Agent Collaboration (2509.14284v1)

Published 16 Sep 2025 in cs.CR, cs.AI, and cs.CL

Abstract: As LLMs become integral to multi-agent systems, new privacy risks emerge that extend beyond memorization, direct inference, or single-turn evaluations. In particular, seemingly innocuous responses, when composed across interactions, can cumulatively enable adversaries to recover sensitive information, a phenomenon we term compositional privacy leakage. We present the first systematic study of such compositional privacy leaks and possible mitigation methods in multi-agent LLM systems. First, we develop a framework that models how auxiliary knowledge and agent interactions jointly amplify privacy risks, even when each response is benign in isolation. Next, to mitigate this, we propose and evaluate two defense strategies: (1) Theory-of-Mind defense (ToM), where defender agents infer a questioner's intent by anticipating how their outputs may be exploited by adversaries, and (2) Collaborative Consensus Defense (CoDef), where responder agents collaborate with peers who vote based on a shared aggregated state to restrict sensitive information spread. Crucially, we balance our evaluation across compositions that expose sensitive information and compositions that yield benign inferences. Our experiments quantify how these defense strategies differ in balancing the privacy-utility trade-off. We find that while chain-of-thought alone offers limited protection to leakage (~39% sensitive blocking rate), our ToM defense substantially improves sensitive query blocking (up to 97%) but can reduce benign task success. CoDef achieves the best balance, yielding the highest Balanced Outcome (79.8%), highlighting the benefit of combining explicit reasoning with defender collaboration. Together, our results expose a new class of risks in collaborative LLM deployments and provide actionable insights for designing safeguards against compositional, context-driven privacy leakage.

Summary

  • The paper demonstrates the emergence of compositional privacy leakage where adversaries aggregate benign responses to infer sensitive data.
  • It evaluates defense strategies, showing that Theory-of-Mind improves sensitive blocking at the cost of benign utility while CoDef offers a superior privacy-utility trade-off.
  • Experiments with models like GPT-5 in controlled, multi-agent scenarios reveal that coordinated defenses are essential for mitigating emergent privacy risks.

Compositional Privacy Leakage in Multi-Agent LLM Systems: Risks and Defenses

Introduction and Problem Formulation

This work systematically investigates a novel privacy risk in multi-agent LLM deployments: compositional privacy leakage. Unlike traditional memorization or direct inference attacks, compositional leakage arises when an adversary aggregates individually innocuous outputs from multiple agents to infer sensitive information that is not explicitly revealed by any single agent. This risk is particularly salient in distributed, modular LLM-based systems where agents are assigned specialized roles and possess disjoint, non-sensitive data fragments (Figure 1).

Figure 1: Illustration of how individually innocuous information shared across multiple agents can be aggregated by an adversary to infer sensitive or private data not explicitly revealed by any single agent, highlighting the emergent privacy risks in collaborative multi-agent LLM settings.

The paper formalizes this threat model by considering a set of defender agents, each with a unique, non-overlapping subset of structured data. The adversary, with black-box query access, issues sequential queries to these agents, aiming to reconstruct a global sensitive attribute s* that is only inferable through the composition of multiple responses. The framework abstracts away from LLM-specific parameters, focusing on inference-time privacy threats in realistic, siloed multi-agent environments.
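
As a rough illustration of this setup (a sketch, not the paper's benchmark or code), the snippet below gives each defender a disjoint, individually non-sensitive table and lets an adversary compose their answers; all identifiers, fields, and values are hypothetical.

```python
# Minimal sketch of the siloed multi-agent threat model.
# Each defender holds one non-sensitive table; only their composition hints at s*.
# All identifiers, fields, and values below are hypothetical.

hr_agent       = {"EMP-17": {"name": "J. Doe"}}                          # defender A
travel_agent   = {"EMP-17": {"flight_destination": "Zurich"}}            # defender B
payments_agent = {"EMP-17": {"reimbursed_category": "oncology clinic"}}  # defender C

def query(agent_table: dict, employee_id: str, field: str):
    """Black-box per-agent query; each answer is benign in isolation."""
    return agent_table.get(employee_id, {}).get(field)

# An adversary with auxiliary knowledge composes the individually innocuous
# answers to infer a sensitive attribute s* that no single agent reveals.
fragments = [
    query(hr_agent, "EMP-17", "name"),
    query(travel_agent, "EMP-17", "flight_destination"),
    query(payments_agent, "EMP-17", "reimbursed_category"),
]
print("Composed fragments:", " | ".join(f for f in fragments if f))
```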

Evaluation Framework and Metrics

To quantify compositional leakage, the authors construct a controlled evaluation framework with synthetic multi-agent scenarios. Each scenario is designed such that the sensitive attribute s* is derivable only through multi-step adversarial composition, not from any single agent's data. The adversary is provided with an optimal plan P*, ensuring that leakage is not limited by planning ability but by the information flow and defense mechanisms.

Leakage is measured via two primary metrics:

  • Leakage Accuracy: Whether the adversary's final prediction matches the ground truth sensitive attribute.
  • Plan Execution Success: Whether all intermediate steps in the adversary's plan are successfully executed, distinguishing between failures in information elicitation and failures in reasoning or composition.

The interaction is modeled as a partially observable Markov decision process (POMDP), capturing the partial observability of both adversary and defenders and the sequential, interactive nature of the attack.
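
A minimal sketch of how these two metrics could be computed over evaluation episodes, assuming a simple episode record; the field names and values are illustrative and not taken from the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Episode:
    """One adversary-vs-defenders interaction (illustrative record, not the paper's schema)."""
    prediction: str                                           # adversary's final guess for s*
    ground_truth: str                                         # true sensitive attribute s*
    step_outcomes: List[bool] = field(default_factory=list)  # success of each step of the plan P*

def leakage_accuracy(episodes: List[Episode]) -> float:
    """Fraction of episodes where the adversary's final prediction matches s*."""
    return sum(e.prediction == e.ground_truth for e in episodes) / len(episodes)

def plan_execution_success(episodes: List[Episode]) -> float:
    """Fraction of episodes where every intermediate plan step was successfully executed."""
    return sum(all(e.step_outcomes) for e in episodes) / len(episodes)

# Toy usage
eps = [
    Episode("rare-disease claim", "rare-disease claim", [True, True, True]),
    Episode("unknown",            "rare-disease claim", [True, False]),
]
print(leakage_accuracy(eps), plan_execution_success(eps))  # 0.5 0.5
```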

Defense Mechanisms

The paper proposes and evaluates two principal defense strategies:

  1. Theory-of-Mind (ToM) Defense: Each defender agent simulates the adversary's knowledge state and intent, blocking queries that appear to enable sensitive inferences through adversarial aggregation. This approach requires explicit modeling of adversarial goals and the evolution of the adversary's knowledge.
  2. Collaborative Consensus Defense (CoDef): Defender agents share their local query-response histories and vote on whether a query is safe to answer. A single defender's decision to block is sufficient to deny the query, enabling collective mitigation of compositional leakage while preserving benign functionality (Figure 2).

    Figure 2: Overview of the defense mechanisms tested, highlighting differences in information flow and response strategies.

Additional baselines include Chain-of-Thought (CoT) prompting, CoT with sensitive set awareness, and self-voting (multiple samples from a single defender's policy).
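
The CoDef voting rule described above (a single block vote denies the query) could be realized roughly as follows; the keyword-based defenders are stand-ins for LLM defenders consulting the shared aggregated state, and the signatures are assumptions rather than the paper's API.

```python
from typing import Callable, List

# A defender vote: given the shared aggregated query-response state and a new
# query, return True to allow answering, False to block (assumed signature).
DefenderVote = Callable[[List[str], str], bool]

def codef_decision(shared_state: List[str], query: str, defenders: List[DefenderVote]) -> bool:
    """Collaborative Consensus Defense: a single block vote is enough to deny the query."""
    return all(vote(shared_state, query) for vote in defenders)

# Keyword-based stand-ins for LLM defenders that reason over the shared state.
def cautious_defender(state: List[str], query: str) -> bool:
    risky = any(term in query.lower() for term in ("diagnosis", "salary", "home address"))
    return not risky

def permissive_defender(state: List[str], query: str) -> bool:
    return True

shared_history: List[str] = []  # aggregated summaries of prior queries and responses
print(codef_decision(shared_history, "Which city did EMP-17 fly to?", [cautious_defender, permissive_defender]))  # True  -> answer
print(codef_decision(shared_history, "What is EMP-17's diagnosis?",   [cautious_defender, permissive_defender]))  # False -> block
```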

Experimental Results

Experiments are conducted across 119 adversarial and benign scenarios using Qwen3-32B, Gemini-2.5-pro, and GPT-5 as defender models. The results reveal several key findings:

  • CoT Baselines Are Insufficient: Simple CoT prompting blocks only 31–39% of sensitive queries for Qwen3-32B and Gemini-2.5-pro, while maintaining high benign success (64–76%). This demonstrates that step-by-step reasoning alone does not reliably prevent compositional leakage.
  • ToM Defense Substantially Improves Sensitive Blocking: ToM raises sensitive blocking rates to 89–97% across models (e.g., Qwen3-32B: 88.8%, Gemini-2.5-pro: 96.6%, GPT-5: 94.2%). However, this comes at a significant cost to benign utility, with benign success dropping to 52.9–61.6%.
  • CoDef Achieves the Best Privacy-Utility Trade-off: CoDef blocks 86–90% of sensitive queries while maintaining higher benign success (66.3–70.2%), yielding the highest balanced outcome (up to 79.8% for Gemini-2.5-pro). This demonstrates the benefit of collaborative, ensemble-style defense.
  • Model Capacity Matters: Larger models (GPT-5) achieve higher sensitive blocking even with simpler strategies, suggesting that model capacity and reasoning ability influence defense effectiveness.
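
For reference, the Balanced Outcome cited above is described later in this summary as the average of the sensitive blocking rate and the benign success rate; a short sketch with illustrative numbers in the reported ranges:

```python
def balanced_outcome(sensitive_block_rate: float, benign_success_rate: float) -> float:
    """Average of privacy (blocking sensitive queries) and utility (benign task success)."""
    return (sensitive_block_rate + benign_success_rate) / 2

# Illustrative values in the reported CoDef ranges (not exact paper numbers).
print(balanced_outcome(0.90, 0.70))  # 0.80, close to the best reported 79.8%
```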

Analysis of Reasoning Depth and Defense Efficacy

The paper analyzes the relationship between defender reasoning depth and compositional privacy mitigation. Reasoning depth is categorized from direct answers/refusals (Depth 0) to multi-step, cross-agent reasoning (Depth 3+). Advanced defenses (ToM, CoDef) elicit deeper reasoning, with substantial increases in Depth 2 and Depth 3+ responses, indicating richer adversarial intent modeling and collaborative inference (Figure 3).

Figure 3: Reasoning depth across the different settings for Gemini.

Plan execution analysis further shows that for benign queries, execution rates remain high across all methods, while for sensitive queries, execution drops sharply under ToM and CoDef, indicating effective blocking of adversarial plans without unduly hindering benign functionality (Figure 4).

Figure 4: Percentage of the times the adversary successfully elicits what it intended for benign and sensitive queries across different defense strategies for two models, highlighting the privacy-utility trade-off.

A qualitative example illustrates how an adversary can aggregate employee flight, payment, and personal data to identify an individual, and how ToM and CoDef defenses block sensitive steps, preventing full compositional leakage (Figure 5).

Figure 5: Qualitative illustration of multi-step adversarial inference and defenses, showing how ToM and CoDef block sensitive steps and prevent compositional leakage.

Implications and Future Directions

This work exposes a critical, underrecognized class of privacy risks in multi-agent LLM deployments. The results demonstrate that privacy cannot be guaranteed by isolated, single-agent safeguards or by standard privacy-preserving practices such as differential privacy or input sanitization. Instead, robust protection requires explicit modeling of adversarial strategies, collaborative reasoning, and coordination among agents.

The findings have significant implications for the design of LLM-based multi-agent systems in enterprise, healthcare, and other sensitive domains. As LLMs are increasingly deployed in distributed, modular architectures, compositional privacy leakage must be considered a first-class threat. The demonstrated efficacy of collaborative defenses suggests that ensemble-style, consensus-based mechanisms are a promising direction for future research.

Further work is needed to:

  • Develop scalable, automated methods for adversarial intent modeling and collaborative defense in large, heterogeneous agent networks.
  • Formalize compositional privacy guarantees and integrate them with existing privacy frameworks.
  • Explore adaptive, context-aware defenses that dynamically calibrate privacy-utility trade-offs based on task requirements and risk profiles.

Conclusion

The paper provides a rigorous framework for evaluating and mitigating compositional privacy leakage in multi-agent LLM systems. By formalizing the threat model, constructing controlled evaluation scenarios, and empirically validating defense strategies, it establishes that privacy risks in distributed AI systems are fundamentally emergent and require collective, context-sensitive safeguards. The results underscore the necessity of moving beyond single-agent privacy guarantees toward coordinated, multi-agent defenses that balance utility and privacy in complex, real-world deployments.


Explain it Like I'm 14

What is this paper about?

This paper looks at a new kind of privacy risk in AI systems where multiple AI “agents” work together. Even if each agent gives answers that seem harmless on their own, those small pieces can be combined—like puzzle pieces—to reveal a private secret. The authors call this problem “compositional privacy leakage.”

Example: One agent tells you a customer’s name from an ID, another agent tells you what they bought, and a third agent tells you what medical claims are linked to certain products. None of these answers alone is private. But put them together, and you might learn something sensitive about the person’s health that no single agent directly revealed.

What questions were they trying to answer?

The paper explores three main questions in simple terms:

  • Does combining harmless answers from different AI agents let attackers figure out private facts?
  • How can we measure and test this kind of “put-the-pieces-together” privacy leak in a fair, controlled way?
  • What defenses can stop harmful combinations while still letting useful (benign) tasks work?

How did they study it?

Think of a company with different AI helpers (agents), each in charge of one safe table of facts (like “ID → name” or “ID → purchases”). No single agent has the private secret. But a curious attacker can visit several agents and collect fragments to reconstruct that secret.

To test this, the authors built controlled scenarios where:

  • Each agent holds only non-sensitive pieces.
  • The sensitive fact can only be discovered by stitching together multiple agents’ answers.
  • They include both “adversarial” cases (where combining pieces would reveal something sensitive) and “benign” cases (where combining pieces is useful and not sensitive).

They also gave the attacker a “plan” (a step-by-step recipe) for which agents to ask and in what order. This makes sure the test measures the defenses—not whether the attacker is clever enough to plan.

They measured:

  • How often the attacker correctly guesses the secret (leakage).
  • How often the attacker can carry out all steps in their plan.
  • How often benign tasks succeed (the system helps when it should).
  • A “Balanced Outcome” that averages “blocked sensitive cases” and “successful benign cases” to show the privacy-vs-utility trade-off.

They tested different defense strategies:

  • Chain-of-Thought (CoT): The agent explains its steps but doesn’t try to detect tricky intent.
  • Theory-of-Mind (ToM): The agent tries to guess the questioner’s intentions—like asking, “If I answer this, could they use it with other info to get something private?”
  • Collaborative Consensus Defense (CoDef): Multiple defender agents share a summary of what’s been asked/answered and “vote” on whether to answer. If any one flags risk, the system blocks the response. This is like a team of guards who can veto risky requests.

What did they find?

Big picture:

  • Simple step-by-step reasoning (CoT) is not enough. Attackers still succeed often.
  • Trying to read the asker’s intent (ToM) blocks many sensitive leaks, but also blocks more harmless tasks.
  • Teaming up with voting and shared context (CoDef) gives the best balance between stopping leaks and staying helpful.

Key results (rounded):

  • CoT alone blocked only about 31–39% of sensitive cases in two models; in the strongest model tested it did better but still left risks. So, plain reasoning tends to overshare.
  • ToM blocked about 89–97% of sensitive cases—a big improvement—but it also made the system say “no” more often to harmless requests, lowering helpfulness.
  • CoDef blocked about 86–90% of sensitive cases while keeping helpful success around 66–70%. It achieved the best overall balance in several settings (for example, a “Balanced Outcome” up to about 79.8%).
  • Stronger models (like the largest one tested) handled the trade-off better overall: they were better at both spotting risky intent and completing benign tasks.
  • Extra analysis showed that “deeper” reasoning (thinking through risks and cross-agent effects) correlated with better protection. Also, defenses sharply reduced plan execution for sensitive queries while keeping it relatively high for benign ones—exactly what you want.

Why does this matter?

As more apps use many AI agents working together (like separate helpers for HR, finance, and compliance), privacy risks don’t just come from what one agent knows or remembers. The real danger can be in how answers combine across agents and over time. That means:

  • Old defenses aimed at a single model (like only redacting names or focusing on memorized text) are not enough.
  • Systems need to anticipate how an answer could be used with other pieces—this is where Theory-of-Mind reasoning helps.
  • Collaboration among defender agents—sharing history and letting any one of them veto risky replies—can protect privacy without shutting down useful work.
  • Designers should test multi-agent systems with scenarios that mix benign and adversarial cases, and measure both privacy and usefulness together (not just one or the other).

In short, the paper shows that “the sum leaks more than its parts” in multi-agent AI. It also offers practical ways—intent-aware reasoning and collaborative voting—to keep systems useful while stopping hidden, compositional privacy leaks.


Knowledge Gaps

Knowledge gaps, limitations, and open questions

Below is a concise, actionable list of what remains missing, uncertain, or unexplored in the paper’s current scope.

  • External validity: Results are based on a synthetic, structured-table benchmark (119 scenarios). It remains unclear how findings transfer to real organizational data (unstructured text, noisy records, heterogeneous schemas, missing values, multimodal evidence) and to domain-specific constraints (e.g., healthcare, finance).
  • Depth and breadth of compositions: The paper does not systematically vary composition depth, branching factor, or asynchronous/parallel querying; it is unknown how leakage scales with longer, more complex, or concurrent multi-agent interactions.
  • Auxiliary knowledge sensitivity: The attacker’s background knowledge is not systematically parameterized; the leakage-utility trade-off under varying and uncertain auxiliary knowledge levels remains unquantified.
  • Planning assumption: The attacker is given an optimal plan P*; the real-world impact when adversaries must discover plans (with exploration costs and errors) is not studied. How defense performance changes under realistic planning noise is unknown.
  • Partial leakage quantification: Metrics use exact-match indicators (Leakage Accuracy, PlanExec@m), offering no measure of partial inference (e.g., narrowing candidate space). A graded leakage metric (e.g., information gain, rank reduction, mutual information) is missing.
  • Cost-sensitive evaluation: The Balanced Outcome metric equally weights privacy and utility; organizations may assign asymmetric costs. No evaluation under cost-sensitive or risk-weighted objectives is provided.
  • Judge reliability and bias: A single LLM (Qwen3-32B) adjudicates outcomes; inter-rater reliability, calibration against human judgments, or ensemble-judge robustness are not assessed.
  • Model and decoding robustness: Results rely on specific models and decoding setups (e.g., temperature 0 vs. 1). Robustness to sampling stochasticity, model updates, fine-tuning, and inference-time variability is not established.
  • Adversary capability spectrum: The attacker model family and strength are limited; leakage under stronger/weaker, adaptive, or fine-tuned adversaries (including tool-using agents) remains unexplored.
  • Realistic adversary tactics: The threat model omits adversaries using multiple identities (Sybil), reset identities (to evade history-based defenses), parallel queries, cover queries (to camouflage intent), or prompt-injection/social-engineering across agents.
  • Defender trust and byzantine resilience: All defenders are assumed honest and non-colluding. The impact of compromised or colluding defenders (Byzantine participants) and resilience of CoDef under such conditions are untested.
  • Shared state privacy risk: CoDef’s shared aggregated state may itself leak sensitive fragments across defenders or create a new attack surface if compromised. No secure state-sharing mechanism (e.g., access control, MPC, DP) or leakage analysis is provided.
  • Voting rule design: CoDef uses "block if any defender blocks." There is no study of alternative thresholds, weighted votes, confidence calibration, or adaptive consensus rules to tune false positives/negatives.
  • Communication and scaling costs: The computational, latency, and communication overhead of CoDef at scale (large number of agents, long interactions) are not measured; scalability limits and optimization strategies are open.
  • Formal guarantees: There is no formal privacy guarantee (e.g., noninterference, information-flow bounds, DP-style composition theorems) quantifying leakage under multi-agent composition and defenses.
  • Policy learning vs. prompting: Defenders are prompt-designed; no training or optimization (e.g., RL, supervised safety tuning) is used to improve ToM or CoDef policies over time. How learned policies compare to prompt-only defenses is unknown.
  • Adaptive adversary–defender dynamics: No longitudinal or game-theoretic analysis where adversaries adapt to defenses and defenders update policies (e.g., via online learning) is provided.
  • Integration with access control and information-flow controls: The work does not combine its defenses with formal access control, data provenance/taint tracking, or type systems that enforce allowed compositions at query or dataflow time.
  • Utility preservation strategies: ToM significantly over-blocks benign queries, but no calibration methods (e.g., thresholds, uncertainty estimation, abstention with appeal, human-in-the-loop escalation) are proposed or evaluated.
  • Domain-specific benign utility: Utility is measured generically; there is no analysis of domain-relevant task quality degradation (e.g., workflow impacts, KPI regressions) or user experience under blocking.
  • Error tolerance and noise: The scenarios assume correct intermediate values; effects of noisy data, schema drift, entity resolution errors, or adversary-introduced noise on both leakage and defense reliability are not studied.
  • Multi-modal and tool-use settings: The paper is text-only with structured tables; effects of tools (retrieval, code execution, spreadsheets/BI tools), multimodal inputs (images, PDFs), and API integrations on compositional leakage are open.
  • Cross-lingual and cross-domain generalization: The robustness of defenses across languages, jargon-heavy domains, and culturally varying privacy norms is not evaluated.
  • Privacy budgets and rate limits: There is no concept of per-user/agent privacy budgets, query limits, or throttling; how such operational controls interact with ToM/CoDef remains to be tested.
  • Logging, audit, and compliance: The work does not address how to audit compositional leakage attempts, provide explanations for blocks, or meet regulatory requirements (e.g., HIPAA/GDPR) under multi-agent collaboration.
  • Fairness implications: Over-blocking may differentially impact certain users or tasks; fairness in blocking decisions and disparate impact are not analyzed.
  • Safety–privacy interplay: Interactions between privacy defenses and other safety layers (e.g., jailbreak defenses, toxicity filters) are unexamined; potential interference or compounding failures are unknown.
  • Defense evasion via query rewriting: Whether adversaries can reliably bypass ToM/CoDef by rephrasing, decomposing, or reordering queries (distribution shift) is not tested.
  • CoDef state design: The specific content and granularity of the shared state are fixed; the impact of summarization quality, compression, retention windows, or redaction strategies on leakage and utility is unexplored.
  • Failure analysis granularity: The paper reports aggregate metrics but provides limited taxonomy of failure modes (e.g., which step types leak most, which agent schemas are most vulnerable), hindering targeted mitigation design.
  • Real-world deployment constraints: There is no assessment of engineering trade-offs (costs, latency, monitoring), incident response procedures, or operator workflows necessary to operationalize these defenses.