Machine Learning with Membership Privacy using Adversarial Regularization (1807.05852v1)

Published 16 Jul 2018 in stat.ML, cs.CR, and cs.LG

Abstract: Machine learning models leak information about the datasets on which they are trained. An adversary can build an algorithm to trace the individual members of a model's training dataset. As a fundamental inference attack, he aims to distinguish between data points that were part of the model's training set and any other data points from the same distribution. This is known as the tracing (and also membership inference) attack. In this paper, we focus on such attacks against black-box models, where the adversary can only observe the output of the model, but not its parameters. This is the current setting of machine learning as a service in the Internet. We introduce a privacy mechanism to train machine learning models that provably achieve membership privacy: the model's predictions on its training data are indistinguishable from its predictions on other data points from the same distribution. We design a strategic mechanism where the privacy mechanism anticipates the membership inference attacks. The objective is to train a model such that not only does it have the minimum prediction error (high utility), but also it is the most robust model against its corresponding strongest inference attack (high privacy). We formalize this as a min-max game optimization problem, and design an adversarial training algorithm that minimizes the classification loss of the model as well as the maximum gain of the membership inference attack against it. This strategy, which guarantees membership privacy (as prediction indistinguishability), acts also as a strong regularizer and significantly generalizes the model. We evaluate our privacy mechanism on deep neural networks using different benchmark datasets. We show that our min-max strategy can mitigate the risk of membership inference attacks (close to the random guess) with a negligible cost in terms of the classification error.

Authors (3)
  1. Milad Nasr (48 papers)
  2. Reza Shokri (46 papers)
  3. Amir Houmansadr (63 papers)
Citations (441)

Summary

Machine Learning with Membership Privacy Using Adversarial Regularization

The paper "Machine Learning with Membership Privacy using Adversarial Regularization" explores a critical aspect of data privacy in the context of modern ML systems. As ML models become increasingly integrated into applications that handle sensitive data, protecting the privacy of such data becomes paramount. One notable threat to data confidentiality in ML is the membership inference attack. This attack allows adversaries to deduce whether specific data points were part of a model's training dataset by analyzing the model's outputs. The focus of this paper is on black-box models, commonly used in Machine Learning as a Service (MLaaS), where only the model's predictions are accessible, not its internals.

Key Contributions

  1. Adversarial Privacy Mechanism: The authors introduce a privacy mechanism that anticipates membership inference attacks. The design goal is to create ML models that achieve high utility (predictive accuracy) while being robust against these attacks. The mechanism is formalized as a min-max game, a strategic optimization technique that balances model utility and privacy robustness.
  2. Adversarial Training Algorithm: A novel training algorithm is developed, inspired by adversarial learning frameworks such as generative adversarial networks (GANs). This adversarial process trains the model to minimize its classification loss while simultaneously minimizing the maximum membership inference gain an adversary could achieve against it (a minimal training-loop sketch follows this list).
  3. Empirical Evaluation: The proposed privacy mechanism is rigorously evaluated on deep neural networks across several benchmark datasets including CIFAR100 and datasets used in privacy research like Purchase100 and Texas100. The results demonstrate a significant reduction in the efficacy of membership inference attacks, nearly to the level of random guesswork, with a marginal decrease in classification accuracy.
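The sketch below illustrates one alternating min-max training round under stated assumptions: a PyTorch classifier `f`, an inference network `h` (assumed to end in a sigmoid) that sees the classifier's prediction vector together with the one-hot label, a batch of training-set members and a reference batch of non-members from the same distribution, and a weight `lam` on the privacy term. The architectures, batch interfaces, and hyperparameters are placeholders, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def min_max_step(f, h, opt_f, opt_h, member_batch, reference_batch, lam=1.0, k=1):
    """One round of the min-max game: the inference model h ascends its gain,
    then the classifier f descends classification loss plus lam times the
    attack's (log-)gain on training members."""
    (x_m, y_m), (x_r, y_r) = member_batch, reference_batch
    eps = 1e-8

    # --- maximization: train the attack h on frozen classifier outputs ---
    for _ in range(k):
        with torch.no_grad():
            p_m = F.softmax(f(x_m), dim=1)   # predictions on members
            p_r = F.softmax(f(x_r), dim=1)   # predictions on non-members
        in_m = h(p_m, F.one_hot(y_m, p_m.size(1)).float())  # membership prob., members
        in_r = h(p_r, F.one_hot(y_r, p_r.size(1)).float())  # membership prob., non-members
        gain = torch.log(in_m + eps).mean() + torch.log(1 - in_r + eps).mean()
        opt_h.zero_grad()
        (-gain).backward()
        opt_h.step()

    # --- minimization: update the classifier against the current attack ---
    logits = f(x_m)
    p_m = F.softmax(logits, dim=1)
    in_m = h(p_m, F.one_hot(y_m, p_m.size(1)).float())
    loss = F.cross_entropy(logits, y_m) + lam * torch.log(in_m + eps).mean()
    opt_f.zero_grad()
    loss.backward()
    opt_f.step()
    return loss.item(), gain.item()
```

Because the classifier is penalized whenever the attack can single out its training members, the privacy term acts like a data-dependent regularizer, which is consistent with the paper's observation that the defense also improves generalization.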

Numerical Results and Insights

  • For the CIFAR100 dataset with the DenseNet architecture, introducing the privacy mechanism reduced membership inference attack accuracy from 54.5% to 51.0%, close to the 50% baseline of random guessing, while classification accuracy decreased by only about 3%.
  • Similarly, on the Purchase100 dataset, inference attack accuracy dropped from 67.6% to 51.6% at the cost of only a 3.6% drop in classification accuracy.
  • These results highlight the mechanism's effectiveness in narrowing the gap between training set member predictions and non-member predictions, effectively mitigating information leakage.

Implications and Theoretical Considerations

The proposed adversarial regularization mechanism provides a strong defense against membership inference attacks with a limited utility trade-off. Theoretically, the min-max formulation folds the adversary's inference gain into the classifier's objective as a regularization term, so the model is explicitly trained against its strongest corresponding attack; the objective can be written as shown below. This dual focus on privacy and utility provides a scalable approach to secure ML model training, holding promise for privacy-sensitive domains such as healthcare, finance, and personal data analytics.
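In notation chosen for this summary (the symbols are assumptions, not necessarily the paper's exact ones), with classifier f, inference model h, member distribution D, a reference distribution D' of non-members, loss \ell, and privacy weight \lambda, the training objective described above takes the min-max form:

```latex
\min_{f}\; \max_{h}\;
\underbrace{\mathbb{E}_{(x,y)\sim D}\big[\ell\big(f(x),\,y\big)\big]}_{\text{classification loss}}
\;+\; \lambda\,
\underbrace{\Big(
  \mathbb{E}_{(x,y)\sim D}\big[\log h\big(x,\,y,\,f(x)\big)\big]
  \;+\; \mathbb{E}_{(x,y)\sim D'}\big[\log\big(1 - h\big(x,\,y,\,f(x)\big)\big)\big]
\Big)}_{\text{membership inference gain}}
```

The inner maximization over h yields the strongest inference attack against the current classifier, and the outer minimization trains f to predict accurately while leaving that attack as little advantage as possible.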

Future Directions

As the field progresses, combining differential privacy mechanisms with the proposed adversarial regularization could further strengthen privacy guarantees. Future work could also study stronger adversary models, more complex data distributions, or complementary defenses such as homomorphic encryption and secure multiparty computation. Combining these methods would sharpen the tools available for safeguarding data privacy while maintaining the utility and reliability of ML services.