Fingerprinting Codes and the Price of Approximate Differential Privacy
(1311.3158v3)
Published 13 Nov 2013 in cs.CR
Abstract: We show new lower bounds on the sample complexity of $(\varepsilon, \delta)$-differentially private algorithms that accurately answer large sets of counting queries. A counting query on a database $D \in (\{0,1\}^d)^n$ has the form "What fraction of the individual records in the database satisfy the property $q$?" We show that in order to answer an arbitrary set $\mathcal{Q}$ of $\gg nd$ counting queries on $D$ to within error $\pm \alpha$ it is necessary that $$ n \geq \tilde{\Omega}\Bigg(\frac{\sqrt{d} \log |\mathcal{Q}|}{\alpha^2 \varepsilon} \Bigg). $$ This bound is optimal up to poly-logarithmic factors, as demonstrated by the Private Multiplicative Weights algorithm (Hardt and Rothblum, FOCS'10). In particular, our lower bound is the first to show that the sample complexity required for accuracy and $(\varepsilon, \delta)$-differential privacy is asymptotically larger than what is required merely for accuracy, which is $O(\log |\mathcal{Q}| / \alpha^2)$. In addition, we show that our lower bound holds for the specific case of $k$-way marginal queries (where $|\mathcal{Q}| = 2^k \binom{d}{k}$) when $\alpha$ is not too small compared to $d$ (e.g. when $\alpha$ is any fixed constant). Our results rely on the existence of short \emph{fingerprinting codes} (Boneh and Shaw, CRYPTO'95, Tardos, STOC'03), which we show are closely connected to the sample complexity of differentially private data release. We also give a new method for combining certain types of sample complexity lower bounds into stronger lower bounds.
The paper establishes new information-theoretic lower bounds on the sample complexity required for achieving (ε, δ)-differential privacy in counting queries.
It introduces a novel connection between fingerprinting codes and differential privacy, using the tracing properties of these codes to derive robust lower bounds.
By generalizing fingerprinting constructions to arbitrary counting queries and error-robust settings, the work broadens the practical applicability of privacy-preserving mechanisms.
An Analysis of "Fingerprinting Codes and the Price of Approximate Differential Privacy"
The paper "Fingerprinting Codes and the Price of Approximate Differential Privacy" by Mark Bun, Jonathan Ullman, and Salil Vadhan presents important advancements in understanding the sample complexity required for achieving differential privacy when answering counting queries on large databases. The authors provide information-theoretic lower bounds on the sample complexity of (ε,δ)-differentially private algorithms, contributing valuable insights into the nature of privacy-preserving mechanisms.
Key Contributions
Sample Complexity Lower Bounds: The central result of this paper is a new lower bound on the sample complexity necessary for answering counting queries while satisfying (ε,δ)-differential privacy. Specifically, the authors demonstrate that for answering an arbitrary set $\mathcal{Q}$ of counting queries to within error $\pm\alpha$, the sample size $n$ must be at least $\tilde{\Omega}(\sqrt{d} \log |\mathcal{Q}| / \alpha^2 \varepsilon)$, where $d$ is the dimensionality of the data records, $\alpha$ is the error tolerance, and $\varepsilon$ is the privacy parameter. This is crucial as it quantifies the heightened cost of privacy for high-dimensional data.
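As a quick numeric illustration (not from the paper itself), the gap between the private and non-private sample complexities can be computed directly from the two bounds stated above; the function names and parameter values below are illustrative choices, and polylogarithmic factors are ignored:

```python
import math

def private_lower_bound(d, num_queries, alpha, eps):
    """Sample size needed, up to polylog factors, for accuracy under
    (eps, delta)-DP: n = Omega~(sqrt(d) * log|Q| / (alpha^2 * eps))."""
    return math.sqrt(d) * math.log(num_queries) / (alpha ** 2 * eps)

def nonprivate_bound(num_queries, alpha):
    """Sample size sufficient for accuracy alone: n = O(log|Q| / alpha^2)."""
    return math.log(num_queries) / alpha ** 2

# Illustrative parameters: 1000-bit records, ~one million queries,
# 10% error tolerance, eps = 1.
d, Q, alpha, eps = 1000, 2 ** 20, 0.1, 1.0
ratio = private_lower_bound(d, Q, alpha, eps) / nonprivate_bound(Q, alpha)
print(ratio)  # log|Q| and alpha^2 cancel, leaving sqrt(d)/eps ~ 31.6
```

The ratio shows the "price" the paper identifies: privacy multiplies the non-private sample requirement by a factor growing with the dimension $d$.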
Connection to Fingerprinting Codes: The authors establish a novel connection between fingerprinting codes and differential privacy, positing that the existence of short fingerprinting codes implies strong lower bounds on the sample complexity of differentially private data releases. This connection is leveraged to prove the main lower bounds using the theory of fingerprinting codes, specifically exploiting their tracing capabilities against collusive attacks.
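The tracing idea can be sketched with a toy simulation. The code below is a deliberately simplified stand-in for a real fingerprinting code (actual constructions, such as Tardos codes, use biased column distributions and a more careful scoring rule); it only illustrates the mechanism the paper exploits, namely that a coalition combining its codewords can still be traced by correlation:

```python
import random

random.seed(0)
N_USERS, CODE_LEN, COALITION_SIZE = 50, 2000, 5

# Toy codebook: each user receives an i.i.d. uniform binary codeword.
codebook = [[random.randint(0, 1) for _ in range(CODE_LEN)]
            for _ in range(N_USERS)]
coalition = random.sample(range(N_USERS), COALITION_SIZE)

# The coalition forges a word column by column via majority vote. This
# respects the marking condition: in columns where all colluders agree,
# the forged bit must equal their shared bit.
forged = []
for j in range(CODE_LEN):
    bits = [codebook[u][j] for u in coalition]
    forged.append(1 if 2 * sum(bits) >= len(bits) else 0)

# Tracer: score each user by agreement with the forged word. Colluders
# agree noticeably more often than the ~50% baseline of innocent users.
scores = [sum(b == f for b, f in zip(codebook[u], forged))
          for u in range(N_USERS)]
accused = max(range(N_USERS), key=lambda u: scores[u])
print(accused in coalition)
```

In the paper's reduction, an accurate private mechanism plays the role of the coalition: its answers would let a tracer identify some individual in the database, contradicting differential privacy, unless the database is large enough.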
Generalization to Arbitrary Counting Queries: By generalizing the structure of fingerprinting codes, the paper argues that similar principles apply to arbitrary sets of counting queries, not only structured families such as k-way marginals. This significantly broadens the applicability of their results, extending their impact across a range of query types.
Error-Robust Fingerprinting Codes: The work extends existing constructions to create fingerprinting codes that are robust to a fixed fraction of adversarial errors. This modification aids in supporting lower bounds for more realistic scenarios where noise can distort the data beyond clean theoretical constructs.
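The robustness property can be demonstrated by adapting the same toy setup: after the coalition forges a word, an adversary flips a fixed fraction of its bits, and the correlation-based tracer still succeeds. Again, this is a simplified sketch rather than the paper's construction:

```python
import random

random.seed(1)
N_USERS, CODE_LEN, COALITION_SIZE, ERROR_RATE = 50, 2000, 5, 0.1

# Toy codebook of i.i.d. uniform binary codewords, one per user.
codebook = [[random.randint(0, 1) for _ in range(CODE_LEN)]
            for _ in range(N_USERS)]
coalition = random.sample(range(N_USERS), COALITION_SIZE)

# Coalition forges by majority vote, then a fixed fraction of positions
# is flipped -- the adversarial error an error-robust code must tolerate.
forged = []
for j in range(CODE_LEN):
    bits = [codebook[u][j] for u in coalition]
    bit = 1 if 2 * sum(bits) >= len(bits) else 0
    if random.random() < ERROR_RATE:
        bit = 1 - bit  # adversarial noise
    forged.append(bit)

# The agreement gap between colluders and innocents shrinks with the
# error rate but survives, so the top-scoring user is still a colluder.
scores = [sum(b == f for b, f in zip(codebook[u], forged))
          for u in range(N_USERS)]
accused = max(range(N_USERS), key=lambda u: scores[u])
print(accused in coalition)
```

This mirrors why error robustness matters for the lower bounds: a private mechanism is only required to answer queries approximately, so its answers look like a forged codeword corrupted by a bounded fraction of errors.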
Implications and Future Directions
The implications of this research are both theoretical and practical. Theoretically, it clarifies the inherent cost of differential privacy and positions fingerprinting codes as a fundamental tool in understanding these costs. Practically, these insights inform the design of privacy-preserving systems, emphasizing the additional sample requirements imposed by privacy constraints, particularly as datasets become high dimensional.
For future research, it would be pertinent to further optimize error-robust fingerprinting codes, aiming to increase their error tolerance while keeping the code length minimal. Additionally, the trade-offs between computational efficiency and privacy guarantees in practical implementations remain an open and vital area of exploration. Considering different data distributions and beyond-worst-case analysis could refine the applicability of these theories to real-world scenarios.
In conclusion, the paper contributes a critical understanding of the intersection between differential privacy and information theory, proposing a framework in which robust fingerprinting codes yield strong lower bounds on the dataset sizes necessary for ensuring privacy with acceptable accuracy. This work establishes a foundation that subsequent research and implementations of differential privacy need to consider when addressing privacy in data analysis.