The Geometry of Differential Privacy: the Sparse and Approximate Cases (1212.0297v1)

Published 3 Dec 2012 in cs.DS

Abstract: In this work, we study trade-offs between accuracy and privacy in the context of linear queries over histograms. This is a rich class of queries that includes contingency tables and range queries, and has been a focus of a long line of work. For a set of $d$ linear queries over a database $x \in \mathbb{R}^N$, we seek to find the differentially private mechanism that has the minimum mean squared error. For pure differential privacy, an $O(\log^2 d)$ approximation to the optimal mechanism is known. Our first contribution is to give an $O(\log^2 d)$ approximation guarantee for the case of $(\epsilon,\delta)$-differential privacy. Our mechanism is simple, efficient and adds correlated Gaussian noise to the answers. We prove its approximation guarantee relative to the hereditary discrepancy lower bound of Muthukrishnan and Nikolov, using tools from convex geometry. We next consider this question in the case when the number of queries exceeds the number of individuals in the database, i.e. when $d > n \triangleq \|x\|_1$. It is known that better mechanisms exist in this setting. Our second main contribution is to give an $(\epsilon,\delta)$-differentially private mechanism which is optimal up to a $\mathrm{polylog}(d,N)$ factor for any given query set $A$ and any given upper bound $n$ on $\|x\|_1$. This approximation is achieved by coupling the Gaussian noise addition approach with a linear regression step. We give an analogous result for the $\epsilon$-differential privacy setting. We also improve on the mean squared error upper bound for answering counting queries on a database of size $n$ by Blum, Ligett, and Roth, and match the lower bound implied by the work of Dinur and Nissim up to logarithmic factors. The connection between hereditary discrepancy and the privacy mechanism enables us to derive the first polylogarithmic approximation to the hereditary discrepancy of a matrix $A$.

Citations (215)

Summary

  • The paper introduces an O(log² d) approximation mechanism for (ε,δ)-differential privacy by leveraging correlated Gaussian noise and hereditary discrepancy.
  • The paper presents mechanisms for sparse query settings that achieve mean squared errors within a polylogarithmic factor of the optimum.
  • The paper applies convex geometry and discrepancy theory to derive universal error bounds, advancing privacy-accuracy trade-offs in data analysis.

The Geometry of Differential Privacy: The Sparse and Approximate Cases

The paper "The Geometry of Differential Privacy: The Sparse and Approximate Cases" by Aleksandar Nikolov, Kunal Talwar, and Li Zhang addresses key challenges in the design of differentially private mechanisms for answering linear queries over databases. The focus is on optimizing the trade-off between privacy and accuracy, particularly in settings with sparse datasets or when employing approximate differential privacy parameters.

Summary of Contributions

  1. Approximation for Approximate Differential Privacy: Extending prior work, which gives an O(log² d) approximation to the optimal mechanism under pure differential privacy, the authors prove an O(log² d) approximation guarantee for (ε,δ)-differential privacy. Their mechanism is simple and efficient, adding correlated Gaussian noise to the query answers, and its approximation guarantee is established relative to the hereditary discrepancy lower bound of Muthukrishnan and Nikolov, a fundamental measure in combinatorial discrepancy theory.
  2. Sparse Cases with Large Query Sets: Investigating scenarios where the number of queries exceeds the number of individuals (i.e., d > n), the paper presents mechanisms that achieve mean squared error within a polylog(d, N) factor of the optimum. This is achieved by coupling Gaussian noise addition with a linear regression step over the ℓ₁ ball.
  3. Hereditary Discrepancy Bounds: The methodology yields the first polylogarithmic approximation to the hereditary discrepancy of a matrix, a notable result given that even deciding the hereditary discrepancy of a matrix exactly is not known to be in NP.
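
The two-stage mechanism in the sparse regime can be sketched as follows: add Gaussian noise to the query answers, then post-process by least-squares regression over the scaled ℓ₁ ball {y : ‖y‖₁ ≤ n}. The sketch below is only an illustration of that noise-then-project shape under simplifying assumptions: it uses i.i.d. (not correlated) noise and a generic projected-gradient solver, whereas the paper's mechanism correlates the noise via an enclosing ellipsoid. All function names here are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def project_l1_ball(v, radius):
    """Euclidean projection of v onto the l1 ball of the given radius
    (the standard sort-and-threshold method of Duchi et al.)."""
    if np.abs(v).sum() <= radius:
        return v
    u = np.sort(np.abs(v))[::-1]
    cssv = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > (cssv - radius))[0][-1]
    theta = (cssv[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def noisy_answers(A, x, sigma):
    """Stage 1: answer the queries Ax with Gaussian noise.
    (i.i.d. here for simplicity; the paper correlates the noise.)"""
    return A @ x + rng.normal(0.0, sigma, size=A.shape[0])

def regression_step(A, y_noisy, n, iters=500):
    """Stage 2: least squares over the l1 ball {y : ||y||_1 <= n},
    solved by projected gradient descent; returns the regressed answers."""
    y = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / sigma_max(A)^2
    for _ in range(iters):
        grad = A.T @ (A @ y - y_noisy)
        y = project_l1_ball(y - step * grad, n)
    return A @ y
```

The projection step is what exploits sparsity: when the true database has small ℓ₁ norm, constraining the regression to the ℓ₁ ball of radius n removes most of the noise in directions inconsistent with any small database.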

Theoretical Insights and Implications

  • Convex Geometry and Differential Privacy: The authors leverage tools from convex geometry, specifically the minimum volume enclosing ellipsoid, to devise mechanisms that offer privacy-accuracy trade-offs. These geometrical insights allow for handling both dense and sparse cases efficiently.
  • Connection to Discrepancy Theory: Hereditary discrepancy serves as a bridge between the geometry of privacy mechanisms and linear algebraic properties of datasets. By bounding privacy noise in terms of hereditary discrepancy, the paper advances our understanding of the fundamental limits of privacy-preserving data analysis.
  • Universal Upper Bounds: For counting queries within the context of approximate differential privacy, the research provides universal bounds on the error, improving significantly upon previous results, particularly reducing dependency on database size and circumventing the need for large query workloads.
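
The minimum volume enclosing ellipsoid mentioned above can be computed by Khachiyan's classical iterative algorithm. The sketch below is a generic MVEE routine, not the paper's specific construction; the function name `mvee` and its tolerance parameter are illustrative. It returns a matrix M and center c such that every input point p satisfies (p − c)ᵀ M (p − c) ≤ 1.

```python
import numpy as np

def mvee(points, tol=1e-7):
    """Minimum volume enclosing ellipsoid {x : (x-c)^T M (x-c) <= 1}
    via Khachiyan's algorithm on the lifted point set."""
    P = np.asarray(points, dtype=float)
    N, d = P.shape
    Q = np.column_stack([P, np.ones(N)]).T  # (d+1) x N lifted points
    u = np.full(N, 1.0 / N)                 # weights on the points
    err = tol + 1.0
    while err > tol:
        X = Q @ np.diag(u) @ Q.T
        # leverage scores M_i = q_i^T X^{-1} q_i
        lev = np.einsum('ij,ji->i', Q.T @ np.linalg.inv(X), Q)
        j = np.argmax(lev)
        step = (lev[j] - d - 1.0) / ((d + 1.0) * (lev[j] - 1.0))
        new_u = (1.0 - step) * u
        new_u[j] += step
        err = np.linalg.norm(new_u - u)
        u = new_u
    c = P.T @ u
    M = np.linalg.inv(P.T @ np.diag(u) @ P - np.outer(c, c)) / d
    return M, c
```

In the mechanism's analysis, the ellipsoid is applied to the symmetrized convex hull of the query matrix's columns; the routine above only conveys how such an ellipsoid is obtained computationally.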

Future Directions

The findings open several avenues for further exploration in differential privacy:

  • Efficient Computation of Hereditary Discrepancy: The approximation result for hereditary discrepancy suggests potential for new algorithmic approaches in discrepancy theory, potentially impacting a broader range of applications such as theoretical computer science and optimization.
  • Balancing Noise and Utility: Understanding the impact of database sparsity and the number of queries on privacy can inform the development of more sophisticated privacy-preserving mechanisms, especially in big data contexts.
  • Fine-Tuning Differential Privacy Parameters: By exploring the trade-offs in (ε,δ)-differential privacy, the research hints at optimizing these parameters to achieve desired utility guarantees without excessive privacy loss.
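
To see why efficient computation of hereditary discrepancy is a genuine open question, note that its definition nests two exponential searches: a maximum over column submatrices and a minimum over ±1 colorings. A direct brute-force evaluation, feasible only for toy matrices, makes this structure explicit (a sketch with illustrative function names, not an algorithm from the paper):

```python
from itertools import combinations, product
import numpy as np

def disc(A):
    """Discrepancy of A: minimum over +/-1 colorings of the columns
    of the maximum row imbalance ||A s||_inf."""
    n = A.shape[1]
    return min(np.abs(A @ np.array(s)).max()
               for s in product((-1, 1), repeat=n))

def herdisc(A):
    """Hereditary discrepancy: maximum of disc over all column
    submatrices. Doubly exponential; usable only at toy sizes."""
    n = A.shape[1]
    best = 0.0
    for k in range(1, n + 1):
        for cols in combinations(range(n), k):
            best = max(best, disc(A[:, cols]))
    return best
```

The paper's polylogarithmic approximation sidesteps this double search entirely, certifying hereditary discrepancy up to polylog factors via the geometry of the associated privacy mechanism.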

In summary, this work advances the state of differential privacy in statistical databases by offering new bounds and efficient algorithms. Through the novel application of convex geometry and discrepancy theory, it provides a robust framework that enhances our understanding of privacy mechanisms in diverse data environments. As differential privacy continues to be pivotal in data sharing and analysis, these insights are critical for both theoretical advancements and practical implementations.
