Efficient Mean Estimation with Pure Differential Privacy via a Sum-of-Squares Exponential Mechanism
(2111.12981v2)
Published 25 Nov 2021 in cs.DS, cs.CR, cs.IT, math.IT, and stat.ML
Abstract: We give the first polynomial-time algorithm to estimate the mean of a $d$-variate probability distribution with bounded covariance from $\tilde{O}(d)$ independent samples subject to pure differential privacy. Prior algorithms for this problem either incur exponential running time, require $\Omega(d^{1.5})$ samples, or satisfy only the weaker concentrated or approximate differential privacy conditions. In particular, all prior polynomial-time algorithms require $d^{1+\Omega(1)}$ samples to guarantee small privacy loss with "cryptographically" high probability, $1-2^{-d^{\Omega(1)}}$, while our algorithm retains $\tilde{O}(d)$ sample complexity even in this stringent setting. Our main technique is a new approach to use the powerful Sum of Squares method (SoS) to design differentially private algorithms. SoS proofs to algorithms is a key theme in numerous recent works in high-dimensional algorithmic statistics -- estimators which apparently require exponential running time but whose analysis can be captured by low-degree Sum of Squares proofs can be automatically turned into polynomial-time algorithms with the same provable guarantees. We demonstrate a similar proofs to private algorithms phenomenon: instances of the workhorse exponential mechanism which apparently require exponential time but which can be analyzed with low-degree SoS proofs can be automatically turned into polynomial-time differentially private algorithms. We prove a meta-theorem capturing this phenomenon, which we expect to be of broad use in private algorithm design. Our techniques also draw new connections between differentially private and robust statistics in high dimensions. In particular, viewed through our proofs-to-private-algorithms lens, several well-studied SoS proofs from recent works in algorithmic robust statistics directly yield key components of our differentially private mean estimation algorithm.
The paper introduces a polynomial-time algorithm for mean estimation under pure differential privacy, built on the Sum-of-Squares method and the exponential mechanism.
The algorithm achieves sample complexity $\tilde{O}(d)$, compared to the $\tilde{\Omega}(d^{1.5})$ samples required by previous polynomial-time methods under pure privacy.
A meta-theorem demonstrates how Sum-of-Squares proofs can be converted into efficient differentially private algorithms, offering a general framework for future research.
This paper introduces a polynomial-time algorithm for estimating the mean of a $d$-variate probability distribution with bounded covariance under pure differential privacy, using $\tilde{O}(d)$ samples, where $d$ is the dimension. Prior methods either ran in exponential time, required $\Omega(d^{1.5})$ samples, or satisfied only the weaker concentrated or approximate notions of differential privacy. The key tool is the Sum-of-Squares (SoS) method, applied here to the design of efficient private estimators.
Key Contributions and Methodology
Algorithmic Approach:
The proposed algorithm is an instance of the exponential mechanism whose analysis is carried out with SoS techniques. The exponential mechanism selects an output with probability that grows exponentially in its score (utility). It is differentially private whenever the score function has bounded sensitivity, meaning that changing a single sample changes the score of any candidate by at most a fixed amount.
The paper shows that instances of the exponential mechanism which apparently require exponential time, but which can be analyzed with low-degree SoS proofs, can be turned into polynomial-time differentially private algorithms. This mirrors the established "proofs to algorithms" use of SoS in high-dimensional algorithmic statistics.
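As a point of reference, the textbook exponential mechanism over a finite candidate set can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the candidate grid, the one-dimensional data clipped to [0, 1], the score function, and all function names are assumptions made for the example.

```python
import math
import random


def selection_probabilities(candidates, score, epsilon, sensitivity):
    """Return P[c] proportional to exp(epsilon * score(c) / (2 * sensitivity))."""
    scores = [score(c) for c in candidates]
    # Subtracting the maximum score avoids overflow and leaves the
    # normalized distribution unchanged.
    top = max(scores)
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_mechanism(candidates, score, epsilon, sensitivity, rng=random):
    """Draw one candidate from the exponential-mechanism distribution."""
    probs = selection_probabilities(candidates, score, epsilon, sensitivity)
    return rng.choices(candidates, weights=probs, k=1)[0]


def mean_score(data):
    """Hypothetical score for 1-D data in [0, 1]: -|sum(data) - n * c|.

    Replacing one data point moves sum(data) by at most 1, so the score
    of every candidate changes by at most 1 (sensitivity 1).
    """
    n = len(data)
    total = sum(data)
    return lambda c: -abs(total - n * c)


grid = [i / 10 for i in range(11)]   # candidate means 0.0, 0.1, ..., 1.0
data = [0.2, 0.4, 0.9]               # toy dataset, assumed already in [0, 1]
probs = selection_probabilities(grid, mean_score(data), epsilon=1.0, sensitivity=1.0)
estimate = exponential_mechanism(grid, mean_score(data), 1.0, 1.0, random.Random(0))
```

Enumerating candidates as above takes time linear in their number, so a fine grid in $d$ dimensions is exponentially expensive. That cost is the apparent exponential running time that the paper's SoS-based approach is designed to avoid.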
Sample Complexity:
The algorithm achieves mean estimation with $\tilde{O}(d)$ samples. Existing polynomial-time methods either need $\Omega(d^{1.5})$ samples under pure differential privacy or fall back on concentrated or approximate differential privacy, which give weaker privacy guarantees.
Robustness and Heavy-Tailed Distributions:
By integrating techniques from robust statistics, the algorithm also tolerates adversarial sample corruptions. It maintains accuracy $\alpha + O(\sqrt{\eta})$ when an $\eta$-fraction of the samples is corrupted. Because the only distributional assumption is bounded covariance, the guarantees cover heavy-tailed distributions.
Convex Programming Integration:
The paper embeds convex optimization into the exponential mechanism framework. Convex programs serve as the score functions, so the distribution that the mechanism must sample from is log-concave, and log-concave distributions can be sampled in polynomial time.
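The link between a concave score and a log-concave sampling distribution can be written out as below. This is a sketch in notation chosen for illustration, not the paper's: $s(x; D)$ denotes a score that is concave in the candidate $x$ with sensitivity $\Delta$, and $K \subset \mathbb{R}^d$ is an assumed convex, compact candidate set.

```latex
% Continuous exponential mechanism with a concave score (notation assumed).
\[
  p_D(x) \;=\; \frac{1}{Z_D}\,
  \exp\!\Big(\tfrac{\varepsilon}{2\Delta}\, s(x; D)\Big),
  \qquad
  Z_D \;=\; \int_K \exp\!\Big(\tfrac{\varepsilon}{2\Delta}\, s(y; D)\Big)\, dy,
  \qquad x \in K .
\]
% Log-concavity: the log-density is a concave function of x.
\[
  \log p_D(x) \;=\; \tfrac{\varepsilon}{2\Delta}\, s(x; D) \;-\; \log Z_D .
\]
% Privacy: for neighboring datasets D, D' with |s(x;D) - s(x;D')| <= \Delta,
% the numerator and the normalizer each change by a factor of at most e^{\varepsilon/2}.
\[
  \frac{p_D(x)}{p_{D'}(x)}
  \;=\;
  \exp\!\Big(\tfrac{\varepsilon}{2\Delta}\big(s(x; D) - s(x; D')\big)\Big)
  \cdot \frac{Z_{D'}}{Z_D}
  \;\le\; e^{\varepsilon/2} \cdot e^{\varepsilon/2} \;=\; e^{\varepsilon} .
\]
```

The bound above holds for exact samples from $p_D$. A practical sampler is only approximate, and how that approximation error is controlled under pure differential privacy is handled in the paper itself, not in this sketch.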
Meta-Theorem and Broader Implications:
The authors establish a meta-theorem that provides a general framework for converting SoS proofs into efficient differentially private algorithms. This serves as a potential blueprint for future research, enabling broader applications of SoS methods in private algorithm design.
Practical and Theoretical Implications
Beyond private mean estimation, the work suggests that SoS frameworks apply more broadly to privacy-sensitive statistical tasks. It shows that a strong guarantee, pure differential privacy, is achievable without giving up polynomial running time or near-linear sample complexity. This makes the approach relevant to high-dimensional data analysis in settings where privacy is a primary concern.
Future Directions
Continued investigation into the interplay between SoS techniques and differential privacy could unveil additional applications beyond mean estimation, such as covariance estimation and regression. Additionally, exploring extensions to adaptive data analyses or real-time private data processing could further broaden the scope of impact. The promising reduction in sample complexity achieved here invites future work to refine and adapt these techniques for more generalized frameworks of robust private algorithm design across diverse statistical models.