- The paper introduces Bayesian metrics like βₙ(P, Q) to quantify an adversary’s success in the shuffle model, establishing asymptotic bounds based on likelihood ratios.
- It demonstrates a strong link between Bayesian advantage and total variation distance, providing tight mutual bounds that clarify privacy risks.
- The study extends the analysis to shuffle differential privacy using clone and blanket decomposition techniques, offering practical guidelines for mitigating re-identification attacks.
Bayesian Advantage of Re-Identification Attack in the Shuffle Model
Introduction
The paper "Bayesian Advantage of Re-Identification Attack in the Shuffle Model" (arXiv:2511.03213) offers a comprehensive analysis of Bayesian re-identification attacks within the shuffle model, a setting pivotal to anonymity in cryptography and differential privacy. The shuffle model anonymizes data by randomly shuffling user messages, thereby amplifying local differential privacy. This work quantifies the Bayesian advantage of an attacker who seeks to identify an individual's message among the shuffled outputs, an essential consideration for privacy-preserving systems.
At the core of this research is the formalization of re-identification attacks. Each user generates a message according to a distribution; the messages are then shuffled, and the attacker attempts to single out the target user's message from the shuffled batch. The paper introduces βₙ(P,Q), the success probability of a Bayes-optimal adversary, alongside the additive and multiplicative Bayesian advantages, Advₙ⁺ and Advₙ×, respectively. These metrics are crucial for analyzing an attacker's ability to distinguish individual messages within a shuffled collection.
Analytical Insights
A significant contribution of this paper is deriving closed-form expressions and the asymptotic behavior of βₙ(P,Q). Leveraging likelihood ratios and classical information-theoretic techniques, the authors establish that the Bayesian success probability is bounded above by M/n, where M = sup P(x)/Q(x), the supremum taken over x with Q(x) > 0, is the worst-case likelihood ratio. This formulation yields an elegant asymptotic characterization of attack effectiveness as the number of users grows.
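This characterization can be probed numerically. The sketch below is a minimal Monte Carlo estimate of βₙ(P,Q); the distributions P and Q, the value of n, and the helper name `beta_n_estimate` are illustrative choices of mine, not taken from the paper. The Bayes-optimal rule follows from the posterior over shuffled positions being proportional to the likelihood ratio P(x)/Q(x) of the message at each position.

```python
import random

def beta_n_estimate(P, Q, n, trials=50000, seed=0):
    """Monte Carlo estimate of beta_n(P, Q): the probability that a
    Bayes-optimal adversary picks out the target's message among n
    shuffled messages (target ~ P, the n-1 others i.i.d. ~ Q)."""
    rng = random.Random(seed)
    p_syms, p_wts = zip(*P.items())
    q_syms, q_wts = zip(*Q.items())
    ratio = {x: P.get(x, 0.0) / Q[x] for x in Q}  # likelihood ratios P/Q
    hits = 0
    for _ in range(trials):
        # Position 0 holds the target's message; the shuffle only permutes
        # positions, and the rule below is permutation-invariant.
        msgs = rng.choices(p_syms, weights=p_wts) + \
               rng.choices(q_syms, weights=q_wts, k=n - 1)
        # Posterior that position i is the target is proportional to
        # P(x_i)/Q(x_i); break ties uniformly at random.
        best = max(ratio[x] for x in msgs)
        pick = rng.choice([i for i, x in enumerate(msgs) if ratio[x] == best])
        hits += (pick == 0)
    return hits / trials

# Illustrative distributions (my own, not the paper's): skewed P, uniform Q.
P = {"a": 0.5, "b": 0.3, "c": 0.2}
Q = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
M = max(P[x] / Q[x] for x in Q)  # sup_x P(x)/Q(x) = 1.5
```

For n = 10, the estimate lands between the blind-guessing baseline 1/n = 0.1 and the likelihood-ratio bound M/n = 0.15, consistent with the asymptotics described above.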
Moreover, the paper demonstrates a link between the Bayesian advantage and total variation distance, providing tight mutual bounds. This relationship foregrounds total variation distance as a key determinant of re-identification risk, highlighting its significance in cryptographic and privacy applications.
Generalization to Shuffle Differential Privacy
Extending beyond the basic setup, the analysis encompasses shuffle differential privacy (shuffle DP), in which outputs of ε-differentially private local randomizers are shuffled. The work upper-bounds the attack's success probability in this paradigm: a Bayesian adversary's posterior re-identification success is at most e^ε/n, underscoring the protection afforded by differential privacy when compounded with shuffling.
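A small simulation illustrates this regime. The setup below is my own worst-case instance, not the paper's: the local randomizer is binary ε-DP randomized response, the target holds bit 1 while every decoy user holds bit 0, and the adversary points at the shuffled position most likely to be the target's.

```python
import math
import random

def shuffle_dp_attack(eps, n, trials=50000, seed=2):
    """Monte Carlo estimate of a Bayes adversary's re-identification
    success against shuffled eps-DP binary randomized response.
    Assumed instance (not from the paper): target holds bit 1, all
    n-1 decoys hold bit 0."""
    rng = random.Random(seed)
    p = math.exp(eps) / (1 + math.exp(eps))  # prob of reporting the true bit
    hits = 0
    for _ in range(trials):
        # Position 0 is the target; the shuffle only permutes positions,
        # and the rule below is permutation-invariant.
        out = [rng.random() < p] + [rng.random() < 1 - p for _ in range(n - 1)]
        # The posterior that position i is the target is larger for outputs
        # equal to 1 (likelihood ratio e^eps vs e^-eps); pick uniformly
        # among the most likely positions.
        ones = [i for i, b in enumerate(out) if b]
        pick = rng.choice(ones if ones else list(range(n)))
        hits += (pick == 0)
    return hits / trials
```

With ε = 1 and n = 20, the estimated success probability sits just below e^ε/n ≈ 0.136, matching the shuffle-DP bound discussed above.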
Decomposition Techniques
The paper innovatively applies two decomposition techniques, the clone and blanket methods, to derive bounds on re-identification success. These methods decompose the privacy-amplification effect of shuffling, which is key to understanding both the privacy assurance and the scope of adversarial risk in systems employing the shuffle model.
Figure 1: Decomposition methods for the 1-DP Laplace mechanism.
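To make the blanket idea concrete, here is the standard decomposition for k-ary randomized response (a common illustration of privacy blankets; the paper's exact decompositions for the Laplace mechanism may differ): the randomizer's output distribution splits into an input-independent "blanket" component, drawn with some probability γ, and an input-dependent component. The sketch checks that the mixture reproduces the randomizer exactly.

```python
import math

def rr_prob(y, x, eps, k):
    """eps-DP k-ary randomized response: report the true value x with
    prob e^eps/(e^eps + k - 1), any other value with prob 1/(e^eps + k - 1)."""
    denom = math.exp(eps) + k - 1
    return math.exp(eps) / denom if y == x else 1.0 / denom

def blanket_mixture(y, x, eps, k):
    """Blanket decomposition R(x) = gamma * U + (1 - gamma) * delta_x:
    with prob gamma, output a draw from the uniform 'blanket' (independent
    of the input x); otherwise report x exactly."""
    gamma = k / (math.exp(eps) + k - 1)
    return gamma / k + (1 - gamma) * (1.0 if y == x else 0.0)
```

Because the blanket component carries no information about the input, only the (1 − γ)-fraction of non-blanket outputs contributes to an adversary's re-identification advantage, which is how such decompositions isolate the amplification effect.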
Case Studies and Implications
The paper offers illustrative applications to real-world scenarios, such as honeyword systems and differentially private mechanisms, providing quantitative insight into how n−1 decoys can protect a user-specific message from adversarial identification. These evaluations can guide practitioners in designing systems resilient to re-identification attacks.
A distinctive insight is that while clone decomposition gives general bounds, the blanket decomposition yields sharper, scenario-specific constraints, which are vital for tailoring differential privacy guarantees to particular implementations or system requirements.
Figure 2: βₙ(P,Q) vs. n, where P = Zipf(0.7) and Q is uniform.
Conclusion
This paper lays the foundation for a comprehensive understanding of Bayesian re-identification attacks in the shuffle model, proposing both theoretical bounds and practical methodologies for risk mitigation. By establishing connections to classical information-theoretic metrics and providing rigorous analysis through decomposition techniques, the paper significantly advances privacy-preserving protocols under the shuffle model framework. These findings have substantial implications for the design of secure multiparty computation and other privacy-centric protocols in modern computational infrastructures.