Slicing: A New Approach to Privacy Preserving Data Publishing (0909.2290v1)

Published 12 Sep 2009 in cs.DB and cs.CR

Abstract: Several anonymization techniques, such as generalization and bucketization, have been designed for privacy preserving microdata publishing. Recent work has shown that generalization loses considerable amount of information, especially for high-dimensional data. Bucketization, on the other hand, does not prevent membership disclosure and does not apply for data that do not have a clear separation between quasi-identifying attributes and sensitive attributes. In this paper, we present a novel technique called slicing, which partitions the data both horizontally and vertically. We show that slicing preserves better data utility than generalization and can be used for membership disclosure protection. Another important advantage of slicing is that it can handle high-dimensional data. We show how slicing can be used for attribute disclosure protection and develop an efficient algorithm for computing the sliced data that obey the l-diversity requirement. Our workload experiments confirm that slicing preserves better utility than generalization and is more effective than bucketization in workloads involving the sensitive attribute. Our experiments also demonstrate that slicing can be used to prevent membership disclosure.

Citations (382)

Summary

  • The paper introduces slicing, a novel data anonymization technique that vertically partitions data by attribute correlations and horizontally into buckets, permuting values within buckets to break associations.
  • Slicing preserves significant attribute correlations for better data utility compared to generalization and offers stronger privacy protection, including resistance to membership disclosure, than bucketization.
  • Evaluations show slicing outperforms generalization in utility, performs comparably or better than bucketization with added privacy, and is efficiently computable and suitable for high-dimensional data.

Slicing: A New Approach to Privacy-Preserving Data Publishing

The paper "Slicing: A New Approach to Privacy-Preserving Data Publishing" introduces slicing, a data anonymization technique that addresses the limitations of two earlier methods for privacy-preserving microdata publishing: generalization and bucketization. Generalization loses a considerable amount of information, particularly on high-dimensional data. Bucketization does not prevent membership disclosure and does not apply to data that lack a clear separation between quasi-identifying and sensitive attributes.

Key Insights and Method

Slicing partitions data in two ways: vertically, by grouping correlated attributes into columns (each column holds a subset of the attributes), and horizontally, by grouping tuples into buckets. Within each bucket, the values of each column are randomly permuted, which breaks the linkage between different columns while preserving the association among the attributes inside each column. This combination preserves more data utility than generalization and offers protection that bucketization lacks, notably against membership disclosure.
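The two partitioning steps and the per-bucket permutation can be illustrated with a minimal Python sketch. The function name `slice_table`, the sample rows, and the fixed-size bucketing by row position are assumptions made for illustration only; the paper's own tuple partitioning is Mondrian-inspired and privacy-aware, and this sketch performs no privacy check.

```python
import random


def slice_table(rows, column_groups, bucket_size, seed=0):
    """Slice a table given as a list of dicts (hypothetical helper).

    column_groups: attribute groups forming the vertical partition.
    bucket_size:   rows per bucket (naive positional bucketing, an
                   assumption; not the paper's tuple partitioning).
    """
    rng = random.Random(seed)
    sliced = []
    for start in range(0, len(rows), bucket_size):
        bucket = rows[start:start + bucket_size]  # horizontal partition
        columns = []
        for group in column_groups:
            # Vertical partition: project each row onto one attribute group.
            values = [tuple(row[attr] for attr in group) for row in bucket]
            # Permute each column independently to break cross-column links.
            rng.shuffle(values)
            columns.append(values)
        # Re-assemble the permuted columns side by side.
        sliced.append(list(zip(*columns)))
    return sliced


# Invented sample data, for illustration only.
rows = [
    {"age": 22, "sex": "M", "zipcode": "47906", "disease": "dyspepsia"},
    {"age": 22, "sex": "F", "zipcode": "47906", "disease": "flu"},
    {"age": 33, "sex": "F", "zipcode": "47905", "disease": "flu"},
    {"age": 52, "sex": "F", "zipcode": "47905", "disease": "bronchitis"},
    {"age": 54, "sex": "M", "zipcode": "47302", "disease": "flu"},
    {"age": 60, "sex": "M", "zipcode": "47302", "disease": "dyspepsia"},
    {"age": 60, "sex": "M", "zipcode": "47304", "disease": "dyspepsia"},
    {"age": 64, "sex": "F", "zipcode": "47304", "disease": "gastritis"},
]
column_groups = [("age", "sex"), ("zipcode", "disease")]

sliced = slice_table(rows, column_groups, bucket_size=4)
for bucket in sliced:
    for sliced_tuple in bucket:
        print(sliced_tuple)
    print("---")
```

Within each printed bucket, every (age, sex) pair and every (zipcode, disease) pair comes from a real row, but the pairing between the two columns is random.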

  1. Attribute Correlations: Slicing groups highly correlated attributes into the same column, so the correlations among them survive anonymization. Sliced data therefore retains its utility, because this correlation structure, which is often the target of data mining, remains intact within columns.
  2. Privacy Features: Slicing supports ℓ-diversity for attribute disclosure protection: an adversary cannot infer an individual's sensitive value with probability greater than 1/ℓ. Additionally, permuting values within buckets yields a large number of "fake" tuples with plausible attribute values, which safeguards against membership disclosure.
  3. Algorithmic Efficiency: The slicing technique is efficiently computable and consists of three main phases: attribute partitioning, column generalization, and tuple partitioning. Attribute partitioning uses clustering algorithms to group highly correlated attributes, while tuple partitioning, based on a Mondrian-inspired methodology, divides tuples into buckets that obey the privacy requirement.
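The two privacy notions in the list above can be made concrete with a small Python sketch. It rests on simplifying assumptions: the ℓ-diversity test here is a per-bucket frequency check (the most frequent sensitive value may fill at most a 1/ℓ share of the bucket), which is simpler than the paper's formulation, and "fake" tuples are counted as cross-column combinations that are consistent with a sliced bucket but absent from the original data. The function names and sample values are hypothetical.

```python
from collections import Counter
from math import prod


def satisfies_l_diversity(buckets, l):
    """Simplified check: in every bucket, the most frequent sensitive
    value accounts for at most a 1/l share of the tuples."""
    for sensitive_values in buckets:
        most_common_count = Counter(sensitive_values).most_common(1)[0][1]
        # Integer form of most_common_count / len(bucket) <= 1 / l.
        if most_common_count * l > len(sensitive_values):
            return False
    return True


def count_fake_tuples(bucket_columns, original_tuples):
    """Count combinations consistent with a sliced bucket that do not
    occur in the original data (assumes every original tuple is built
    from the values in bucket_columns)."""
    matching = prod(len(set(column)) for column in bucket_columns)
    return matching - len(set(original_tuples))


# Invented sensitive values for two buckets.
buckets = [
    ["dyspepsia", "flu", "flu", "bronchitis"],
    ["flu", "dyspepsia", "dyspepsia", "gastritis"],
]
print(satisfies_l_diversity(buckets, 2))
print(satisfies_l_diversity(buckets, 3))

# One bucket with two columns of four distinct values each.
col_a = [(22, "M"), (22, "F"), (33, "F"), (52, "F")]
col_b = [("47906", "dyspepsia"), ("47906", "flu"),
         ("47905", "flu"), ("47905", "bronchitis")]
original = list(zip(col_a, col_b))
print(count_fake_tuples([col_a, col_b], original))
```

In the last example the bucket is consistent with 4 × 4 = 16 combinations, only 4 of which are original tuples, so an adversary who finds a matching combination cannot conclude that the individual is in the data.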

Experimental Evaluation

The authors evaluate slicing empirically on datasets from the UCI Machine Learning Repository and compare it with generalization and bucketization. The results show that slicing preserves better data utility than generalization and is more effective than bucketization in workloads involving the sensitive attribute. The experiments also indicate that slicing can prevent membership disclosure, which bucketization does not.

The paper also highlights slicing's ability to handle high-dimensional data. Because attribute partitioning splits the attributes into several lower-dimensional columns, slicing is well suited to complex data environments, such as transaction databases, where a large number of attributes must be considered simultaneously.
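As a rough illustration of how attribute partitioning reduces dimensionality, the Python sketch below scores pairwise association between categorical attributes and groups attributes whose score passes a threshold. Both choices are assumptions: the mean-square contingency coefficient is used here as one standard association measure for categorical data, and the greedy threshold-based merging stands in for the clustering step mentioned above rather than reproducing it. The names `mean_square_contingency`, `group_attributes`, and `threshold` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mean_square_contingency(xs, ys):
    """Mean-square contingency coefficient between two categorical
    attributes: 0 for independence, 1 for a perfect association."""
    n = len(xs)
    fx, fy, fxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    d = min(len(fx), len(fy))
    if d < 2:
        return 0.0  # a constant attribute carries no association
    total = 0.0
    for x, count_x in fx.items():
        for y, count_y in fy.items():
            expected = (count_x / n) * (count_y / n)
            observed = fxy.get((x, y), 0) / n
            total += (observed - expected) ** 2 / expected
    return total / (d - 1)


def group_attributes(table, threshold=0.5):
    """Greedy grouping (an assumption, not the paper's clustering):
    merge the groups of any two attributes whose association score
    reaches the threshold. table maps attribute name -> value list."""
    attrs = list(table)
    group_of = {a: {a} for a in attrs}
    for a, b in combinations(attrs, 2):
        if mean_square_contingency(table[a], table[b]) >= threshold:
            merged = group_of[a] | group_of[b]
            for member in merged:
                group_of[member] = merged
    groups, seen = [], set()
    for a in attrs:
        key = frozenset(group_of[a])
        if key not in seen:
            seen.add(key)
            groups.append(sorted(key))
    return groups


# Invented toy table: "a" and "b" move together, "c" is independent.
table = {
    "a": [0, 0, 1, 1],
    "b": ["x", "x", "y", "y"],
    "c": ["p", "q", "p", "q"],
}
print(group_attributes(table))
```

Each resulting group becomes one column of the sliced table, so a table with many attributes is published as a handful of lower-dimensional columns.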

Implications and Future Work

This work opens multiple avenues for future research. One area involves expanding slicing through overlapping partitions, which could allow an attribute to be part of multiple columns, thereby releasing more comprehensive correlation information while still maintaining privacy. The authors also propose exploring optimized tuple partitioning strategies to strengthen membership privacy.

Furthermore, the ideas behind slicing could be adapted to stronger privacy models, such as differential privacy, if designed appropriately for the non-interactive data publishing setting. As data privacy requirements evolve, slicing offers a foundation on which both theoretical work and practical applications might build.

In summary, slicing is a significant contribution to privacy-preserving data publishing. It combines effective privacy protection with good data utility, avoids key drawbacks of generalization and bucketization, handles high-dimensional data, and offers a promising framework for future work on data anonymization.