Preventing Disclosure of Sensitive Knowledge by Hiding Inference

Published 28 Aug 2013 in cs.CR, cs.DB, and cs.LG | (1308.6744v1)

Abstract: Data mining is a way of extracting data or uncovering hidden patterns of information from databases. There is therefore a need to prevent inference rules from being disclosed, so that more sensitive data sets cannot be identified from non-sensitive attributes. This can be done by removing or adding certain itemsets in the transactions, a process known as sanitization. The purpose is to hide the inference rules so that a user cannot discover valuable information from other non-sensitive data, and so that an organisation can release all samples of its data without fear of sensitive knowledge being exposed through Knowledge Discovery in Databases (KDD). This can be achieved by investigating frequently occurring itemsets and the rules that can be mined from them, with the objective of hiding those rules. Another way is to release only limited samples in the new database, so that there is no information loss and the legitimate needs of the users are still satisfied. The major problem is that uncovering hidden patterns poses a threat to database security. Sensitive data are inferred from non-sensitive data based on the semantics of the application the user has; this is commonly known as the inference problem. There are two fundamental approaches to protecting sensitive rules from disclosure: preventing the rules from being generated by hiding the frequent sets of data items, and reducing the importance of the rules by lowering their confidence below a user-specified threshold.
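The two hiding approaches named in the abstract can be illustrated with a small sketch. It assumes transactions are Python sets of item strings and uses the standard definitions supp(X) = fraction of transactions containing X and conf(X → Y) = supp(X ∪ Y) / supp(X). The functions `hide_itemset` and `hide_rule`, the greedy "delete one item from supporting transactions until below threshold" strategy, and the choice of which item to delete are all hypothetical illustrations, not the method of the paper; real sanitization algorithms pick transactions and victim items more carefully to limit side effects on non-sensitive rules.

```python
from typing import FrozenSet, List, Set

Transaction = Set[str]


def support(db: List[Transaction], itemset: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not db:
        return 0.0
    return sum(1 for t in db if itemset <= t) / len(db)


def confidence(db: List[Transaction], lhs: FrozenSet[str], rhs: FrozenSet[str]) -> float:
    """conf(lhs -> rhs) = supp(lhs | rhs) / supp(lhs)."""
    s_lhs = support(db, lhs)
    return support(db, lhs | rhs) / s_lhs if s_lhs else 0.0


def hide_itemset(db: List[Transaction], itemset: FrozenSet[str], min_sup: float) -> List[Transaction]:
    """Approach 1 (hypothetical sketch): push supp(itemset) below min_sup,
    so no rule built from this itemset is generated by the miner."""
    sanitized = [set(t) for t in db]          # work on a copy; the original is untouched
    victim = min(itemset)                     # assumption: delete the alphabetically first item
    for t in sanitized:
        if support(sanitized, itemset) < min_sup:
            break
        if itemset <= t:
            t.discard(victim)                 # this transaction no longer supports the itemset
    return sanitized


def hide_rule(db: List[Transaction], lhs: FrozenSet[str], rhs: FrozenSet[str],
              min_conf: float) -> List[Transaction]:
    """Approach 2 (hypothetical sketch): push conf(lhs -> rhs) below min_conf.
    Assumes lhs and rhs are disjoint and min_conf > 0."""
    sanitized = [set(t) for t in db]
    victim = min(rhs)                         # removing a consequent item leaves supp(lhs) unchanged
    for t in sanitized:
        if confidence(sanitized, lhs, rhs) < min_conf:
            break
        if (lhs | rhs) <= t:
            t.discard(victim)                 # lowers supp(lhs | rhs), hence the confidence
    return sanitized


db = [{"bread", "milk"}, {"bread", "milk", "eggs"}, {"bread", "milk"}, {"bread", "eggs"}]
rule_lhs, rule_rhs = frozenset({"bread"}), frozenset({"milk"})
print(confidence(db, rule_lhs, rule_rhs))                                      # 0.75
print(confidence(hide_rule(db, rule_lhs, rule_rhs, 0.6), rule_lhs, rule_rhs))  # 0.5
print(support(hide_itemset(db, frozenset({"bread", "milk"}), 0.5),
              frozenset({"bread", "milk"})))                                   # 0.25
```

Deleting an item from the rule's consequent, rather than its antecedent, is the simplest way to lower confidence in this sketch, because supp(lhs) stays fixed while supp(lhs ∪ rhs) drops. The support is recomputed after every deletion, which is quadratic in the number of transactions and is only meant for small examples.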