- The paper introduces a two-phase DP2-Pub mechanism that uses a differentially private Bayesian network for effective attribute clustering and adaptive privacy budgeting.
- It employs invariant PRAM with a novel double-perturbation method to preserve data utility and maintain consistent statistical properties.
- Experimental results demonstrate lower variation distances and misclassification rates compared to methods like PrivBayes and DPPro, confirming its practical effectiveness.
Differentially Private High-Dimensional Data Publication with Invariant Post Randomization
Introduction
The rapid growth of data collection, together with the high dimensionality of the data collected, makes it hard to preserve privacy when this information is published or shared. High-dimensional, heterogeneous data are common in domains such as healthcare, social networking, and IoT, and they carry detailed insights for analysis. However, unauthorized access to these data can cause privacy violations. In response, this paper introduces DP2-Pub, a mechanism for the differentially private publication of high-dimensional data. The mechanism operates in two phases: attribute clustering via a Markov-blanket-based approach, followed by invariant Post Randomization (PRAM), with the goal of retaining high data utility while satisfying differential privacy.
Related Work
The literature offers several methods for both centralized and distributed settings, including PrivBayes, which uses Bayesian networks to model data correlations, and DPPro, which relies on data projection for privacy preservation. Both approaches lose data utility, either through the injection of noise or through a simplistic application of random projection that neglects data correlations. DP2-Pub aims to address these shortcomings by combining attribute clustering with invariant PRAM, preserving statistical information while satisfying differential privacy.
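To make the utility cost of noise injection concrete, the sketch below adds Laplace noise to a small histogram and renormalizes it. It is a generic illustration of the Laplace mechanism, not an implementation of PrivBayes or DPPro. The counts, the function names, and the sensitivity of 1 (neighboring datasets differ by adding or removing one record) are assumptions made for the example.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_distribution(counts, epsilon, rng):
    """Perturb histogram counts with Laplace noise, then normalize."""
    # Assumption: L1 sensitivity of the histogram is 1.
    scale = 1.0 / epsilon
    # Clamping and normalizing are post-processing steps.
    noisy = [max(0.0, c + laplace_noise(scale, rng)) for c in counts]
    total = sum(noisy)
    if total == 0.0:
        # Every cell was clamped to zero; fall back to uniform.
        return [1.0 / len(counts)] * len(counts)
    return [v / total for v in noisy]


rng = random.Random(0)
counts = [40, 30, 20, 10]  # made-up counts for one attribute
for eps in (0.1, 1.0):
    print(eps, noisy_distribution(counts, eps, rng))
```

A smaller `epsilon` gives a larger noise scale, so the released distribution tends to drift further from the true one.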
DP2-Pub Mechanism
DP2-Pub handles high-dimensional data in two phases. In the first phase, a differentially private Bayesian network is constructed to capture attribute dependencies. Attributes are then clustered according to the correlations expressed by the network, and the privacy budget is allocated across clusters according to their internal cohesion and external coupling. In the second phase, a novel double-perturbation method that follows local differential privacy principles performs invariant PRAM, which preserves data utility by keeping statistical properties consistent after perturbation.
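The invariance property behind the second phase can be illustrated with the classical two-stage PRAM construction: perturb once with a transition matrix P, then perturb again with the Bayes-inverted matrix Q, so that the combined matrix R = PQ leaves the category distribution unchanged in expectation. The sketch below is a minimal illustration under assumptions that are not taken from the paper: P is a k-ary randomized-response matrix, the category distribution `pi` is treated as known, and all function names are hypothetical. It is not the authors' double-perturbation algorithm, and it ignores the privacy cost of estimating `pi`.

```python
import math


def rr_matrix(k, epsilon):
    """k-ary randomized-response matrix; row i is Pr(output | input = i)."""
    e = math.exp(epsilon)
    keep = e / (e + k - 1)
    flip = 1.0 / (e + k - 1)
    return [[keep if i == j else flip for j in range(k)] for i in range(k)]


def invariant_matrix(P, pi):
    """Compose P with its Bayes inverse Q so that pi * (P Q) = pi."""
    k = len(pi)
    # lam[j]: probability of observing category j after the first step.
    lam = [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]
    # Q[j][m] = Pr(original = m | first-step output = j), by Bayes' rule.
    Q = [[pi[m] * P[m][j] / lam[j] for m in range(k)] for j in range(k)]
    # R = P Q is the transition matrix of the two steps combined.
    return [[sum(P[i][j] * Q[j][m] for j in range(k)) for m in range(k)]
            for i in range(k)]


pi = [0.5, 0.3, 0.2]  # made-up category distribution
P = rr_matrix(3, 1.0)
R = invariant_matrix(P, pi)

# Expected distribution after applying R to data distributed as pi.
pi_after = [sum(pi[i] * R[i][m] for i in range(3)) for m in range(3)]
print(pi_after)
```

Because `pi_after` equals `pi` up to floating-point error, frequency estimates computed on the perturbed data are unbiased for the original frequencies without any correction step.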
Experimental Evaluation
Extensive experiments on real-world datasets validate the effectiveness of DP2-Pub in improving the utility of published data. Compared with existing methods such as PrivBayes and DPPro, the mechanism achieves lower variation distances and lower misclassification rates in SVM classification. The advantage holds across varying privacy budgets, which suggests the approach is robust enough for real-world use.
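As a rough illustration of the first metric, the sketch below computes the total variation distance between the empirical marginals of an original and a published dataset. It assumes that "variation distance" refers to total variation distance over a chosen set of attributes. The toy records and the helper names are made up for the example and do not come from the paper's experiments.

```python
from collections import Counter


def marginal(records, attrs):
    """Empirical joint distribution of the chosen attributes."""
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    n = len(records)
    return {key: c / n for key, c in counts.items()}


def total_variation_distance(p, q):
    """Half the L1 distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Toy data: the published set differs from the original in one record.
original = [
    {"age": "young", "sex": "F"},
    {"age": "young", "sex": "M"},
    {"age": "old", "sex": "F"},
    {"age": "old", "sex": "F"},
]
published = [
    {"age": "young", "sex": "F"},
    {"age": "old", "sex": "M"},
    {"age": "old", "sex": "F"},
    {"age": "old", "sex": "F"},
]

attrs = ["age", "sex"]
tvd = total_variation_distance(marginal(original, attrs),
                               marginal(published, attrs))
print(tvd)  # 0.25
```

A value of 0 means the two marginals are identical and a value of 1 means they share no support, so lower values indicate better-preserved statistics.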
Implications and Future Directions
DP2-Pub advances differentially private publication of high-dimensional data. By combining attribute clustering with invariant PRAM, it preserves statistical information in the published data and so offers a practical approach to privacy-preserving data publication. Its handling of attribute correlations and its approach to data perturbation provide a foundation for further work. Future work may integrate manifold learning techniques to further improve the mechanism's utility and may explore its adaptability to a wider range of datasets and privacy scenarios.
The methods and findings of this paper contribute to the ongoing work on differential privacy by treating high-dimensional data with explicit attention to attribute correlations and privacy budget allocation. This direction supports better privacy-preserving data publishing techniques and invites further research on optimizing data utility under differential privacy.