- The paper introduces MemGuard, a defense that adds carefully crafted noise to a model's confidence scores, turning them into adversarial examples that reduce membership inference accuracy to near-random levels.
- MemGuard employs a two-phase approach that constructs adversarial examples under strict utility-loss constraints without modifying the training process.
- Comparative experiments on the Location, Texas100, and CH-MNIST datasets show that MemGuard achieves better privacy-utility tradeoffs than existing defenses such as dropout and differentially private SGD (DP-SGD).
An Evaluation of MemGuard: A Defense Against Black-Box Membership Inference Attacks
The paper proposes MemGuard, a novel defense designed to mitigate the privacy risks posed by membership inference attacks on machine learning models. These are black-box attacks in which an adversary determines whether a given data sample was part of the training dataset using only the model's outputs, typically its confidence scores. Such attacks raise significant privacy concerns, especially in domains handling sensitive data such as medical records or personal information.
Key Contributions
MemGuard provides a defense mechanism with formal utility-loss guarantees, ensuring that the defense never degrades the model's utility beyond a specified budget. It requires no changes to the training process: instead, it adds carefully crafted noise to the confidence scores returned for each query. This noise turns each confidence score vector into an adversarial example that misleads the attacker's membership inference classifier, exploiting the well-known vulnerability of ML classifiers to adversarial inputs.
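To make the utility-loss guarantee concrete, the sketch below encodes the two constraints the paper imposes on the perturbed confidence scores: the predicted label must not change, and the output must remain a valid probability distribution. This is a minimal NumPy illustration of the constraints only; the function name `satisfies_utility_constraints` and the tolerance are illustrative, not the paper's implementation.

```python
import numpy as np

def satisfies_utility_constraints(conf, perturbed, atol=1e-6):
    """Check MemGuard's utility-loss constraints on a perturbed confidence vector:
    (1) the predicted label (argmax) is unchanged, and
    (2) the perturbed vector is still a probability distribution."""
    same_label = np.argmax(perturbed) == np.argmax(conf)
    on_simplex = np.all(perturbed >= -atol) and np.isclose(perturbed.sum(), 1.0, atol=atol)
    return bool(same_label and on_simplex)

conf = np.array([0.85, 0.10, 0.05])    # undefended confidence scores
noisy = np.array([0.55, 0.30, 0.15])   # candidate defended scores
print(satisfies_utility_constraints(conf, noisy))  # True: label and simplex preserved
```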
Methodology
MemGuard's defense operates in two phases:
- Phase I constructs an adversarial example from the confidence score vector. The core contribution is a new algorithm for generating adversarial examples under utility-loss constraints specific to confidence scores: the predicted class label must stay unchanged, and the perturbed scores must remain a valid probability distribution.
- Phase II adds the adversarial perturbation with a probability chosen by solving an optimization problem: minimize the attacker's inference accuracy subject to a bound on the expected distortion of the confidence scores. The paper derives an analytical solution to this problem, balancing privacy protection against utility loss (a sketch of both phases follows this list).
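The sketch below outlines both phases under simplifying assumptions. A logistic function of the prediction entropy stands in for the defender's surrogate attack classifier (the paper trains a neural network for this role), Phase I searches for noise in logit space with finite-difference gradient steps so the output stays a probability distribution by construction, and Phase II picks the release probability so the expected L1 distortion stays within a budget. The function names and hyperparameters are illustrative, not the paper's algorithm.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def surrogate_attack_score(conf):
    """Stand-in for the defender's surrogate membership classifier (the paper
    trains a neural network); here, a logistic function of prediction entropy."""
    entropy = -np.sum(conf * np.log(conf + 1e-12))
    return 1.0 / (1.0 + np.exp(entropy - 1.0))

def phase1_adversarial_confidence(conf, steps=200, lr=0.5, eps=1e-4):
    """Phase I (sketch): search for logit-space noise that drives the surrogate
    attack score toward 0.5 (random guessing) while keeping the predicted label.
    Applying softmax keeps the result a valid probability distribution."""
    label = np.argmax(conf)
    logits = np.log(conf + 1e-12)
    delta = np.zeros_like(logits)
    best, best_gap = conf.copy(), abs(surrogate_attack_score(conf) - 0.5)
    for _ in range(steps):
        # Finite-difference gradient of |score - 0.5| with respect to the noise.
        base = abs(surrogate_attack_score(softmax(logits + delta)) - 0.5)
        grad = np.zeros_like(delta)
        for i in range(len(delta)):
            bumped = delta.copy()
            bumped[i] += eps
            grad[i] = (abs(surrogate_attack_score(softmax(logits + bumped)) - 0.5) - base) / eps
        delta -= lr * grad
        candidate = softmax(logits + delta)
        gap = abs(surrogate_attack_score(candidate) - 0.5)
        if np.argmax(candidate) == label and gap < best_gap:  # utility constraint: label unchanged
            best, best_gap = candidate, gap
    return best

def phase2_release(conf, adversarial, budget, rng=np.random.default_rng()):
    """Phase II (sketch): release the adversarial vector with probability p chosen
    so the expected L1 distortion p * ||adversarial - conf||_1 stays within budget."""
    distortion = np.abs(adversarial - conf).sum()
    p = 1.0 if distortion <= budget else budget / distortion
    return adversarial if rng.random() < p else conf

conf = np.array([0.85, 0.10, 0.05])
adv = phase1_adversarial_confidence(conf)
print(adv, surrogate_attack_score(adv))       # score pushed toward 0.5
print(phase2_release(conf, adv, budget=0.3))  # defended output for this query
```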
Experimental Results
Experiments evaluate MemGuard on three datasets: Location, Texas100, and CH-MNIST. When the allowed confidence-score distortion (measured in L1 norm) is moderate, MemGuard reduces the attacker's accuracy to levels equivalent to random guessing, effectively obscuring membership status. The evaluation covers both non-adaptive (traditional) attacks and more sophisticated adaptive attacks that anticipate the defense.
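The two quantities reported in these experiments, attack inference accuracy and confidence-score distortion, are straightforward to compute; the snippet below shows how they might be measured, using made-up numbers purely for illustration (not results from the paper).

```python
import numpy as np

def inference_accuracy(attack_scores, is_member):
    """Fraction of samples whose membership the attack classifier predicts correctly
    (threshold at 0.5); 0.5 accuracy corresponds to random guessing."""
    return float(np.mean((attack_scores >= 0.5) == is_member))

def average_l1_distortion(original_confs, defended_confs):
    """Average L1 distance between undefended and defended confidence vectors,
    the distortion measure used to bound MemGuard's utility loss."""
    return float(np.mean(np.abs(defended_confs - original_confs).sum(axis=1)))

# Hypothetical toy numbers for illustration only.
scores = np.array([0.52, 0.51, 0.49, 0.47])
members = np.array([True, False, True, False])
print(inference_accuracy(scores, members))      # 0.5: equivalent to random guessing

orig = np.array([[0.85, 0.10, 0.05], [0.60, 0.30, 0.10]])
defended = np.array([[0.60, 0.25, 0.15], [0.50, 0.35, 0.15]])
print(average_l1_distortion(orig, defended))
```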
Comparison with Existing Defenses
MemGuard is benchmarked against state-of-the-art defenses, including L2 regularization, min-max game-based adversarial regularization, dropout, and differentially private SGD (DP-SGD). MemGuard achieves better privacy-utility tradeoffs, attaining lower membership inference accuracy at equal or lower confidence-score distortion. Other methods, such as model stacking, incur substantial losses in label accuracy, underscoring MemGuard's advantage of preserving prediction utility.
Implications and Future Work
The paper concludes that adversarial perturbations can effectively defend ML models against privacy attacks without modifying the training process, its hyperparameters, or the model architecture. Using adversarial examples defensively in this way opens new avenues for countering a range of machine-learning-based inference threats.
Future work could extend MemGuard to other settings, such as white-box attacks or other side-channel threats. Improving robustness to adaptive attacks without significant computational cost remains a key direction, along with refining the probabilistic noise-addition mechanism to preserve model performance while strengthening privacy guarantees.