Privacy Amplification by Subsampling: Tight Analyses via Couplings and Divergences (1807.01647v2)

Published 4 Jul 2018 in cs.LG, cs.CR, and stat.ML

Abstract: Differential privacy comes equipped with multiple analytical tools for the design of private data analyses. One important tool is the so-called "privacy amplification by subsampling" principle, which ensures that a differentially private mechanism run on a random subsample of a population provides higher privacy guarantees than when run on the entire population. Several instances of this principle have been studied for different random subsampling methods, each with an ad-hoc analysis. In this paper we present a general method that recovers and improves prior analyses, yields lower bounds and derives new instances of privacy amplification by subsampling. Our method leverages a characterization of differential privacy as a divergence which emerged in the program verification community. Furthermore, it introduces new tools, including advanced joint convexity and privacy profiles, which might be of independent interest.

Citations (354)

Summary

  • The paper introduces a unified analytical framework that redefines privacy amplification via subsampling using advanced joint convexity and privacy profiles.
  • It leverages α-divergences to derive tighter differential privacy bounds than traditional ad-hoc methods.
  • The results guide the practical selection of subsampling strategies to enhance privacy guarantees in data analysis and machine learning.

Privacy Amplification by Subsampling: Tight Analyses via Couplings and Divergences

The paper "Privacy Amplification by Subsampling: Tight Analyses via Couplings and Divergences" provides an in-depth analytical framework to explore the privacy amplification phenomenon associated with differential privacy (DP) mechanisms when data is processed on randomly subsampled datasets. Differential privacy, a cornerstone in ensuring privacy in data analysis, benefits from numerous techniques to enhance its performance, and subsampling stands out as particularly effective. While traditional analyses of subsampling have often been ad-hoc, this paper introduces a comprehensive approach that not only consolidates existing understandings but also introduces novel insights into privacy amplification.

Contributions and Methodology

This research leverages α-divergences, building on a characterization of differential privacy that emerged in the program verification community, to recast differential privacy in the context of subsampling. Central to the methodology are two new analytical tools: advanced joint convexity, and privacy profiles, which systematically describe the privacy guarantees of an algorithm. This characterization makes it possible to derive tight privacy amplification results, offering more precise privacy guarantees for subsampling than previously available.
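
Concretely, the divergence characterization at the heart of the framework can be sketched as follows (the notation is the standard hockey-stick divergence; the paper's exact conventions may differ in minor details):

```latex
% Hockey-stick divergence of order e^{varepsilon} between distributions P and Q
E_{e^{\varepsilon}}(P \,\|\, Q) = \sup_{S}\,\bigl[\,P(S) - e^{\varepsilon} Q(S)\,\bigr]

% A mechanism M is (varepsilon, delta)-DP iff, for all adjacent inputs D and D',
E_{e^{\varepsilon}}\bigl(M(D) \,\|\, M(D')\bigr) \le \delta

% The privacy profile of M records, for each varepsilon, the smallest valid delta
\delta_M(\varepsilon) = \sup_{D \simeq D'} E_{e^{\varepsilon}}\bigl(M(D) \,\|\, M(D')\bigr)
```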

The paper's contributions are notable in several respects:

  1. Unified Analysis Framework: Previously scattered results on privacy amplification are integrated into a cohesive framework that recovers and improves prior analyses and also yields lower bounds establishing the tightness of the resulting guarantees. The framework applies to a variety of subsampling strategies, extending beyond well-studied cases such as Poisson subsampling.
  2. Novel Tools for Privacy Analysis:
    • Advanced Joint Convexity: A new property of α-divergences that facilitates upper bounding the divergence between overlapping mixture distributions. This extension of joint convexity provides the foundation for tighter privacy amplification results.
    • Privacy Profiles: These describe the entire range of differential privacy parameters a mechanism satisfies, enabling a nuanced understanding of how subsampling affects privacy across different settings (see the sketch following this list).
  3. Theoretical and Practical Implications: By providing a clear understanding of how various subsampling techniques enhance privacy, this research equips practitioners to choose the most suitable subsampling method for their privacy needs. It also sets the stage for further computational interpretations and implementations in real-world systems.
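
To make privacy profiles concrete, here is a minimal sketch (not code from the paper) that evaluates the privacy profile of the Gaussian mechanism, using the closed form known from the analytic Gaussian mechanism literature; the function and parameter names (`gaussian_privacy_profile`, `sensitivity`, `sigma`) are illustrative assumptions:

```python
from math import erf, exp, sqrt

def std_normal_cdf(t: float) -> float:
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def gaussian_privacy_profile(eps: float, sensitivity: float = 1.0, sigma: float = 1.0) -> float:
    """delta_M(eps) for the Gaussian mechanism with the given L2 sensitivity
    and noise scale sigma; closed form taken from the analytic Gaussian
    mechanism analysis (an assumption here, not a result of this paper)."""
    r = sensitivity / sigma
    delta = std_normal_cdf(r / 2.0 - eps / r) - exp(eps) * std_normal_cdf(-r / 2.0 - eps / r)
    return max(delta, 0.0)

# Sweeping eps traces out the mechanism's full privacy profile.
for eps in (0.1, 0.5, 1.0, 2.0):
    print(f"eps = {eps:4.1f} -> delta = {gaussian_privacy_profile(eps):.3e}")
```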

Key Results and Implications

The paper presents several cases, such as sampling without replacement and Poisson subsampling, where the authors derive tight privacy amplification bounds. For example, they establish that Poisson subsampling with sampling probability γ reduces the δ parameter of a DP guarantee multiplicatively to γδ, refining earlier approximations.
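
As a minimal sketch of how this bound can be applied (the helper name `amplify_poisson` is illustrative; the closed form ε' = log(1 + γ(e^ε − 1)), δ' = γδ is the amplification guarantee for Poisson subsampling under the remove/add relation):

```python
from math import exp, log

def amplify_poisson(eps: float, delta: float, gamma: float):
    """Amplified (eps', delta') for an (eps, delta)-DP mechanism run on a
    Poisson subsample that includes each record independently with
    probability gamma (0 < gamma <= 1)."""
    eps_amp = log(1.0 + gamma * (exp(eps) - 1.0))
    delta_amp = gamma * delta
    return eps_amp, delta_amp

# Example: a (1.0, 1e-5)-DP mechanism on a 1% Poisson subsample.
print(amplify_poisson(1.0, 1e-5, 0.01))  # ~(0.0170, 1e-07)
```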

An important theoretical insight is that some combinations of subsampling method and neighboring relation are naturally compatible with privacy amplification, while others, such as Poisson subsampling under the substitution relation, are not. This distinction encourages more deliberate method selection in practice.

Speculation on Future Developments

The implications of this paper extend to privacy-preserving methods across machine learning and data analytics. Possible directions for future work include:

  • Broader Application in Machine Learning: Developing fine-grained differential privacy controls tailored for machine learning pipelines involving iterative updates and stochastic gradients.
  • Advancements in Privacy Notions: Extending the presented framework to other notions of privacy such as Rényi DP, which tracks the privacy cost over a continuous range of divergence orders and could benefit from advanced joint convexity properties.
  • Tool Development for Practitioners: Implementations based on this framework could significantly aid developers and researchers in dynamically adjusting privacy parameters according to specific needs within different applications.

In summary, this paper makes significant strides in understanding and enhancing the potential of differential privacy via subsampling through rigorous analysis and the introduction of innovative conceptual tools. These contributions not only improve the theoretical underpinnings of privacy amplification but also offer practical pathways to enhance the protection of individual privacy in computations involving sensitive data.