- The paper introduces the shuffled model that leverages an anonymous shuffler to amplify privacy while reducing reliance on centralized trust.
- It provides protocols, including P_\lambda and its extension to real-valued sums, that achieve near-central accuracy with robust privacy guarantees.
- Comparative analysis highlights that the shuffled approach balances privacy-accuracy trade-offs and offers scalable advantages over purely local methods.
Analyzing the Shuffled Model for Distributed Differential Privacy
The paper "Distributed Differential Privacy via Shuffling" presents an analysis of a novel model for implementing distributed differentially private protocols, which mediates between the central and local models commonly discussed in the literature. In the central model, a trusted entity aggregates user data, allowing high accuracy but necessitating trust in the server. Conversely, the local model ensures privacy independently at the user level but at the cost of accuracy, requiring extensive data volumes to achieve meaningful statistics. The shuffled model proposed in this paper introduces an intermediate approach, incorporating features from both models and thus balancing trust and accuracy.
The crux of the shuffled model is the deployment of an anonymous channel—termed a shuffler—that permutes user-supplied messages before they reach the data-collecting entity. This permutation disrupts the link between individual input data and its randomized output, providing privacy amplification beyond classical local approaches.
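The shuffler itself is conceptually simple: it outputs a uniformly random permutation of the incoming messages, so the collector cannot tell which user sent which report. A minimal sketch (a hypothetical illustration; real deployments would use mixnets, trusted hardware, or similar infrastructure):

```python
import secrets

def shuffle(messages):
    """Anonymous shuffler: return the incoming messages in a uniformly
    random order, severing the link between each message and its sender."""
    shuffled = list(messages)
    # Fisher-Yates shuffle driven by cryptographic randomness
    for i in range(len(shuffled) - 1, 0, -1):
        j = secrets.randbelow(i + 1)
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled
```

The collector receives only the multiset of messages; any privacy analysis can then treat the reports as exchangeable, which is exactly what enables amplification beyond the per-user local guarantee.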
Contributions and Analytical Results
The paper offers a protocol in the shuffled model with compelling privacy and accuracy guarantees. Specifically, the P_\lambda protocol computes the sum of Boolean inputs with error bounds approaching those achievable in the central model, while each user's individual report satisfies only a guarantee comparable to local approaches. Critically, this result is derived from the permutation feature of the shuffler, which spreads the required noise across the participants more effectively than isolated local randomization could.
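One common way to realize such a Boolean-sum protocol (a simplified sketch, not the paper's exact construction; the flip probability `p` here plays the role the paper assigns to \lambda) is to have each user replace their true bit with a uniform random bit with some small probability, shuffle the reports, and debias the observed sum:

```python
import random

def local_randomizer(x, p):
    """With probability p, report a uniform random bit instead of the
    true bit x. The random bits form 'blanket' noise covering everyone."""
    if random.random() < p:
        return random.randint(0, 1)
    return x

def analyze(shuffled_bits, n, p):
    """Debias: E[S] = (1-p) * true_sum + n*p/2, so invert that map."""
    s = sum(shuffled_bits)
    return (s - n * p / 2) / (1 - p)

def boolean_sum_protocol(bits, p=0.1):
    """End-to-end sketch: randomize locally, shuffle, then debias."""
    n = len(bits)
    reports = [local_randomizer(x, p) for x in bits]
    random.shuffle(reports)  # the anonymous shuffler
    return analyze(reports, n, p)
```

Because the analyzer sees only the shuffled multiset, the effective noise protecting any one user is the aggregate randomness of all n participants, which is the source of the amplification.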
A significant advancement is the extension P^\R_{n,\lambda,r}, which handles real-valued sums and thus accommodates more complex input domains. Encoding real numbers as bits introduces rounding error; the protocol counters this by running r randomized encodings and aggregating the results, ultimately converging on an output with the statistical strength of central computations, without requiring a globally trusted agent.
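The repetition idea can be sketched as follows (a simplified stand-in for the paper's encoding, under the assumption that inputs lie in [0, 1]): encode each real value as r unbiased bits via randomized rounding, run one shuffled Boolean-sum round per bit position, and average the debiased estimates.

```python
import random

def encode_real(x, r):
    """Randomized rounding: each of r bits is 1 with probability x, so
    the mean of the bits is an unbiased estimate of x."""
    return [1 if random.random() < x else 0 for _ in range(r)]

def real_sum_protocol(values, r=16, p=0.1):
    """Estimate sum(values) for values in [0, 1]: one shuffled
    Boolean-sum round per encoded bit, averaged over r rounds."""
    n = len(values)
    encodings = [encode_real(x, r) for x in values]
    estimates = []
    for j in range(r):
        # each user reports bit j, flipped to a uniform bit w.p. p
        reports = [b[j] if random.random() >= p else random.randint(0, 1)
                   for b in encodings]
        random.shuffle(reports)  # the anonymous shuffler
        s = sum(reports)
        estimates.append((s - n * p / 2) / (1 - p))  # debias
    return sum(estimates) / r
```

Averaging over r independent rounds shrinks the rounding variance by a factor of r, which is how the protocol trades extra messages for accuracy closer to the central model.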
Privacy-Accuracy Trade-offs
Through the theorems presented, the shuffled model is shown to achieve near-central accuracy with no central trust assumption, stipulating only that the shuffler provides genuine anonymity. The distinction from purely cryptographic efforts (such as secure multiparty computation) is that shuffling requires less interaction and computational overhead, which is pivotal for scalability.
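As a rough orientation (standard asymptotics from the differential-privacy literature; constants and exact dependencies vary by protocol), the expected error for summing n bits under \((\varepsilon, \delta)\)-differential privacy scales as:

\[
\text{central: } \Theta\!\left(\tfrac{1}{\varepsilon}\right), \qquad
\text{local: } \Theta\!\left(\tfrac{\sqrt{n}}{\varepsilon}\right), \qquad
\text{shuffled: } O\!\left(\tfrac{\sqrt{\log(1/\delta)}}{\varepsilon}\right).
\]

The shuffled bound is independent of n, placing it far closer to the central model than to the local one.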
Further comparative insights emerge when investigating variable-selection problems and histogram queries under the shuffled model. In particular, lower bounds indicate that solutions in this paradigm may still require substantially larger samples to achieve comparable accuracy, affirming that while the shuffled approach moves closer to central-model capabilities, it does not universally escape local-model limitations.
Implications: Theoretical and Practical
The theoretical implications offer a fresh perspective on adversarial assumptions in differential privacy, emphasizing configurability over inherent model strictures: privacy is achieved not through strong trust assumptions but through techniques that leverage a distributed system's innate anonymity.
Practically, this model introduces a framework ripe for deployment in scenarios where balancing independence from user trust against algorithmic accuracy is paramount—for instance, in federated contexts where data is never consolidated under one authority yet remains fundamental to understanding global user-behavior trends.
In summary, the shuffled model marks a pivotal step toward bridging privacy regimes, where adaptability and configurability define new points on the privacy-accuracy spectrum outside rigid model boundaries. The concepts presented here can catalyze further inquiry into distributed data systems and their utility limits, reshaping our understanding of which entities must be trusted in order to extract aggregate insights without privacy invasions. Looking ahead, unifying these mechanisms under a single regulatory or practical framework presents both a challenge and an opportunity for future research.