Differential Privacy via Wavelet Transforms (0909.5530v1)

Published 30 Sep 2009 in cs.DB

Abstract: Privacy preserving data publishing has attracted considerable research interest in recent years. Among the existing solutions, $\epsilon$-differential privacy provides one of the strongest privacy guarantees. Existing data publishing methods that achieve $\epsilon$-differential privacy, however, offer little data utility. In particular, if the output dataset is used to answer count queries, the noise in the query answers can be proportional to the number of tuples in the data, which renders the results useless. In this paper, we develop a data publishing technique that ensures $\epsilon$-differential privacy while providing accurate answers for range-count queries, i.e., count queries where the predicate on each attribute is a range. The core of our solution is a framework that applies wavelet transforms on the data before adding noise to it. We present instantiations of the proposed framework for both ordinal and nominal data, and we provide a theoretical analysis on their privacy and utility guarantees. In an extensive experimental study on both real and synthetic data, we show the effectiveness and efficiency of our solution.

Citations (524)


Summary

The paper introduces Privelet, a novel framework applying wavelet transforms to enforce ε-differential privacy with improved data utility.
It details customized approaches for ordinal and nominal data, using the Haar wavelet transform and a novel nominal wavelet transform, respectively, to reduce noise.
Theoretical analysis shows that Privelet keeps the noise variance of range-count answers polylogarithmic, and experiments show that it significantly outperforms traditional methods on range-count queries.

Differential Privacy via Wavelet Transforms: An Overview

This paper addresses privacy-preserving data publishing with a new approach that uses wavelet transforms to ensure $\epsilon$-differential privacy. The authors focus on improving data utility, particularly for range-count queries, while maintaining strong privacy guarantees.

Core Contributions

The research presents a method named Privelet, which applies wavelet transforms to achieve $\epsilon$-differential privacy. Whereas existing $\epsilon$-differentially private publishing methods offer little data utility, Privelet answers range-count queries with substantially less noise.

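Wavelet Transform Framework: The framework applies a wavelet transform to the dataset's frequency matrix, adds noise to the resulting wavelet coefficients, and then inverts the transform to produce a noisy frequency matrix for publication. This preserves privacy while retaining data utility.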
Various Data Types: The paper instantiates the framework for both ordinal and nominal data, using the Haar wavelet transform for the former and a new nominal wavelet transform for the latter.
Theoretical Analysis: The paper analyzes Privelet's privacy and utility guarantees. To calibrate the noise injected into the wavelet coefficients, the authors introduce generalized sensitivity, a weighted form of sensitivity that lets each coefficient receive noise scaled to its weight.
Computational Efficiency: Privelet runs in time linear in the number of tuples and in the number of entries of the frequency matrix.
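The transform, noise, and inverse steps can be sketched for one-dimensional ordinal data. This is a minimal Python illustration under stated assumptions, not the authors' code: it assumes a frequency vector whose length m is a power of two, stores Haar coefficients in heap order, uses weights equal to the number of entries under each coefficient (m for the base coefficient), and assumes neighboring datasets differ by adding or removing one tuple, so a single entry changes by 1. Under those assumptions, the base coefficient and every ancestor coefficient of the changed entry each change by exactly 1 after weighting, so the generalized sensitivity is ρ = 1 + log2 m, and adding Laplace noise with scale λ/W, where λ = ρ/ε, yields ε-differential privacy. If neighbors instead replace one tuple, two entries change and ρ doubles. All function names here are hypothetical.

```python
import math
import random


def haar_forward(x):
    """Haar transform of a length-m vector (m a power of two), in heap order.

    coeffs[0] is the base coefficient (mean of all entries). For 1 <= k < m,
    node k has children 2k and 2k+1, and coeffs[k] is half the difference
    between the mean of its left subtree and the mean of its right subtree.
    """
    m = len(x)
    assert m > 0 and m & (m - 1) == 0, "length must be a power of two"
    avg = [0.0] * (2 * m)
    avg[m:] = [float(v) for v in x]           # leaves m..2m-1 hold the entries
    coeffs = [0.0] * m
    for k in range(m - 1, 0, -1):             # internal nodes, bottom-up
        avg[k] = (avg[2 * k] + avg[2 * k + 1]) / 2
        coeffs[k] = (avg[2 * k] - avg[2 * k + 1]) / 2
    coeffs[0] = avg[1]                        # mean of all entries
    return coeffs


def haar_inverse(coeffs):
    """Rebuild entries: each entry = base + sum of +/- its ancestors' coefficients."""
    m = len(coeffs)
    val = [0.0] * (2 * m)
    val[1] = coeffs[0]
    for k in range(1, m):                     # top-down: left child adds, right subtracts
        val[2 * k] = val[k] + coeffs[k]
        val[2 * k + 1] = val[k] - coeffs[k]
    return val[m:]


def haar_weights(m):
    """Assumed weights: m for the base coefficient; for node k, the number
    of entries (leaves) in its subtree."""
    w = [float(m)] * m
    for k in range(1, m):
        depth = k.bit_length() - 1            # root (k = 1) has depth 0
        w[k] = m / 2 ** depth
    return w


def weighted_change(x, y):
    """Weighted L1 distance between the Haar coefficients of x and y."""
    w = haar_weights(len(x))
    return sum(wi * abs(a - b)
               for wi, a, b in zip(w, haar_forward(x), haar_forward(y)))


def laplace(rng, scale):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def privelet_1d(freq, epsilon, rng):
    """Noisy frequency vector: transform, add weighted Laplace noise, invert."""
    m = len(freq)
    rho = 1 + math.log2(m)                    # generalized sensitivity (+/-1 in one entry)
    lam = rho / epsilon
    noisy = [c + laplace(rng, lam / w)
             for c, w in zip(haar_forward(freq), haar_weights(m))]
    return haar_inverse(noisy)                # real-valued; entries may be negative


freq = [3, 0, 5, 2, 7, 1, 0, 4]
neighbor = freq.copy()
neighbor[5] += 1                              # one extra tuple with value 5
print(weighted_change(freq, neighbor))        # 4.0, i.e. 1 + log2(8)
print(privelet_1d(freq, epsilon=1.0, rng=random.Random(0)))
```

The inverse transform is post-processing, so it does not weaken the guarantee. Coarse coefficients have large weights and receive little noise, while fine coefficients receive more noise but each affects only a few entries.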

Experimental Validation

Extensive experiments on both real-world and synthetic datasets demonstrate that Privelet significantly outperforms traditional methods, such as the approach of Dwork et al., which adds independent noise to every entry of the frequency matrix. Privelet provides more accurate results for range-count queries while ensuring $\epsilon$-differential privacy.

Accuracy Improvement: Privelet markedly improves accuracy. The noise variance of a range-count answer is polylogarithmic in the number of entries of the frequency matrix, whereas adding noise to each entry independently yields variance that grows linearly with the number of entries a query covers.
Scalability: The method scales well with large datasets, maintaining reasonable computational overhead.
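Because a range-count answer is a linear function of the injected noise, the accuracy gap can be computed exactly for one-dimensional ranges. The sketch below is an illustration under assumptions, not the paper's analysis: it assumes m is a power of two, neighbors differ by one entry changing by 1, and a Privelet-style Haar mechanism with weights equal to subtree sizes (m for the base coefficient) and Laplace scale λ/W with λ = (1 + log2 m)/ε. It compares that mechanism against adding independent Laplace(1/ε) noise to every entry. The function name `range_noise_variance` is hypothetical, and its outputs are analytic variances, not experimental results.

```python
import math


def range_noise_variance(m, lo, hi, epsilon):
    """Exact noise variance of sum(noisy[lo:hi]) for two mechanisms.

    Returns (basic, privelet): per-entry Laplace(1/epsilon) noise versus
    Haar coefficients with Laplace(lam / W_k) noise, lam = (1 + log2 m) / epsilon.
    """
    # Basic method: each entry has variance 2 / epsilon^2, independently.
    basic = 2 * (hi - lo) / epsilon ** 2

    # cnt[k] = how many entries of the range lie under heap node k.
    cnt = [0] * (2 * m)
    for j in range(lo, hi):
        cnt[m + j] = 1
    for k in range(m - 1, 0, -1):
        cnt[k] = cnt[2 * k] + cnt[2 * k + 1]

    lam = (1 + math.log2(m)) / epsilon
    # The range sum equals cnt[1] * c0 + sum_k (cnt[2k] - cnt[2k+1]) * c_k,
    # so its variance is sum of a_k^2 * Var(Laplace(lam / W_k)) = a_k^2 * 2 (lam / W_k)^2.
    privelet = 2 * cnt[1] ** 2 * (lam / m) ** 2
    for k in range(1, m):
        a_k = cnt[2 * k] - cnt[2 * k + 1]
        w_k = m / 2 ** (k.bit_length() - 1)   # subtree size of node k
        privelet += 2 * a_k ** 2 * (lam / w_k) ** 2
    return basic, privelet


print(range_noise_variance(1024, 0, 1024, 1.0))   # (2048.0, 242.0)
print(range_noise_variance(1024, 3, 1000, 1.0))
```

Only the base coefficient and the nodes a range partially covers contribute. The base term is at most 2λ², and at most two partially covered nodes per level contribute at most λ²/2 each, so the Privelet-style variance is bounded by λ²(2 + log2 m), which is O((log m)³/ε²), while the basic method's variance grows linearly with the range length.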

Implications and Future Work

On the theoretical side, this work shows that wavelet transforms can be adapted to differential privacy, suggesting applications in other areas of private data analysis and publishing. On the practical side, the method enables more informative statistical analysis of published datasets that contain sensitive information.

Future developments could explore optimizing the framework for scenarios with known query distributions, or expanding the scope to other utility metrics beyond noise variance, such as relative error.

In conclusion, the paper takes a significant step toward balancing data utility and privacy in data publishing, using wavelet transforms to extend what can be achieved under the differential privacy paradigm.
