DP2-Pub: Differentially Private High-Dimensional Data Publication with Invariant Post Randomization

Published 24 Aug 2022 in cs.CR and cs.CY | (2208.11693v1)

Abstract: A large amount of high-dimensional and heterogeneous data appear in practical applications, which are often published to third parties for data analysis, recommendations, targeted advertising, and reliable predictions. However, publishing these data may disclose personal sensitive information, resulting in an increasing concern on privacy violations. Privacy-preserving data publishing has received considerable attention in recent years. Unfortunately, the differentially private publication of high dimensional data remains a challenging problem. In this paper, we propose a differentially private high-dimensional data publication mechanism (DP2-Pub) that runs in two phases: a Markov-blanket-based attribute clustering phase and an invariant post randomization (PRAM) phase. Specifically, splitting attributes into several low-dimensional clusters with high intra-cluster cohesion and low inter-cluster coupling helps obtain a reasonable allocation of privacy budget, while a double-perturbation mechanism satisfying local differential privacy facilitates an invariant PRAM to ensure no loss of statistical information and thus significantly preserves data utility. We also extend our DP2-Pub mechanism to the scenario with a semi-honest server which satisfies local differential privacy. We conduct extensive experiments on four real-world datasets and the experimental results demonstrate that our mechanism can significantly improve the data utility of the published data while satisfying differential privacy.



Summary

The paper introduces a two-phase DP2-Pub mechanism that uses a differentially private Bayesian network for effective attribute clustering and adaptive privacy budgeting.
It employs invariant PRAM with a novel double-perturbation method to preserve data utility and maintain consistent statistical properties.
Experimental results demonstrate lower variation distances and misclassification rates compared to methods like PrivBayes and DPPro, confirming its practical effectiveness.


Introduction

The exponential increase in data collection and the inherent high-dimensional nature of such data pose significant challenges in maintaining privacy when publishing or sharing this information. High-dimensional and heterogeneous data, common in various domains such as healthcare, social networking, and IoT, carry detailed insights for analysis. However, unauthorized access to these data can cause privacy violations. In response to this concern, this study introduces DP2-Pub, a novel mechanism for the differentially private publication of high-dimensional data. The mechanism operates in two key phases: attribute clustering via a Markov-blanket-based approach and invariant Post Randomization (PRAM), ensuring high data utility without compromising privacy.

Existing methods cover both centralized and distributed settings. They include PrivBayes, which uses Bayesian networks to model data correlations, and DPPro, which relies on data projection for privacy preservation. Both approaches have limitations in preserving data utility, either because of the noise they inject or because a plain random projection neglects correlations among attributes. DP2-Pub aims to address these shortcomings by combining attribute clustering with invariant PRAM, preserving statistical information while satisfying differential privacy.

DP2-Pub Mechanism

DP2-Pub handles high-dimensional data in two phases. In the first phase, a differentially private Bayesian network is constructed to capture attribute dependencies, and attributes are then clustered according to the correlations depicted by that network. Clusters with high internal cohesion and low external coupling allow the privacy budget to be allocated more effectively. In the second phase, a double-perturbation method satisfying local differential privacy is used to perform invariant PRAM, which preserves data utility by keeping statistical properties consistent after perturbation.
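The invariance property of the second phase can be illustrated with a minimal sketch. It uses the generic two-stage invariant PRAM construction (perturb once, then perturb again with the Bayes-inverse matrix), which is an assumption here and not necessarily the paper's exact double-perturbation method. The function names, the k-ary randomized response matrix, and the example distribution `t` are all hypothetical. The sketch shows only that the expected category distribution is unchanged; it does not account for the privacy cost of the combined matrix.

```python
import math

def krr_matrix(k, epsilon):
    """k-ary randomized response (assumed first-stage perturbation):
    keep the true category with probability p, otherwise report one of
    the other k-1 categories uniformly at random."""
    e = math.exp(epsilon)
    p = e / (e + k - 1)
    q = 1.0 / (e + k - 1)
    return [[p if i == j else q for j in range(k)] for i in range(k)]

def invariant_matrix(P, t):
    """Compose P with its Bayes-inverse Q so that t @ (P @ Q) == t."""
    k = len(t)
    # lam[j]: probability of observing category j after the first stage
    lam = [sum(t[i] * P[i][j] for i in range(k)) for j in range(k)]
    # Q[j][i]: posterior probability that the true category is i given j
    Q = [[t[i] * P[i][j] / lam[j] for i in range(k)] for j in range(k)]
    # R = P Q, the combined two-stage transition matrix
    return [[sum(P[i][j] * Q[j][m] for j in range(k)) for m in range(k)]
            for i in range(k)]

def apply(t, M):
    """Expected category distribution after applying transition matrix M."""
    k = len(t)
    return [sum(t[i] * M[i][j] for i in range(k)) for j in range(k)]

t = [0.5, 0.3, 0.2]               # hypothetical original distribution
P = krr_matrix(3, 1.0)            # hypothetical epsilon = 1.0
R = invariant_matrix(P, t)
after_plain = apply(t, P)         # distorted by a single perturbation
after_invariant = apply(t, R)     # equals t up to rounding error
```

A single perturbation with `P` pulls the distribution toward uniform, while the composed matrix `R` leaves the expected distribution equal to `t`. This is the sense in which an invariant PRAM loses no first-order statistical information.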

Experimental Evaluation

Experiments on four real-world datasets evaluate how well DP2-Pub preserves the utility of published data. The mechanism achieves lower variation distances and lower misclassification rates in SVM classification than existing methods such as PrivBayes and DPPro. The approach also holds up across varying privacy budgets, which suggests it is applicable in real-world scenarios.
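As a rough illustration of the variation-distance metric, the sketch below computes the total variation distance (half the L1 distance) between an empirical marginal of the original data and the same marginal of the published data. The helper names and the toy records are hypothetical, and the paper's exact evaluation protocol (for example, which marginals are averaged) is not reproduced here.

```python
from collections import Counter

def marginal(records, attrs):
    """Empirical joint distribution of the given attribute indices."""
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    n = len(records)
    return {key: c / n for key, c in counts.items()}

def total_variation_distance(p, q):
    """Half the L1 distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Toy records with two binary attributes (made up for illustration).
original = [(0, 1), (0, 1), (1, 0), (1, 1)]
published = [(0, 1), (1, 1), (1, 0), (1, 1)]

# Distance between the one-way marginals of attribute 0.
d = total_variation_distance(marginal(original, [0]),
                             marginal(published, [0]))
```

A distance of 0 means the published marginal matches the original exactly, and 1 means the two distributions have disjoint support, so lower values indicate better utility.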

Implications and Future Directions

The DP2-Pub mechanism advances differentially private high-dimensional data publication. By combining attribute clustering with invariant PRAM, it preserves statistical information and offers a practical approach to privacy-preserving data publication. Its handling of attribute correlations and its approach to data perturbation provide a foundation for further work. Future work may explore integrating manifold learning techniques to further improve the mechanism's utility, and may examine its adaptability to more diverse datasets and privacy scenarios.

The findings and methods introduced in this study contribute to ongoing work on differential privacy, particularly in how attribute correlations and privacy budget allocation are handled for high-dimensional data. They also invite further research on optimizing the utility of differentially private data publication.
