- The paper introduces a novel membership inference attack that relies solely on predicted labels for determining training membership.
- It leverages input perturbations and adversarial examples to bypass confidence-masking defenses deployed in models trained on sensitive data.
- Experimental results reveal that only differential privacy and strong ℓ₂ regularization significantly mitigate these privacy threats.
Overview of Label-Only Membership Inference Attacks
The paper "Label-Only Membership Inference Attacks" presents a significant exploration of privacy threats associated with machine learning models, specifically focusing on membership inference (MI) attacks. These attacks are a method by which an adversary can determine whether a specific data point was part of a model's training set. The authors propose a new form of MI attack that requires only the predicted labels from the model, rather than access to the confidence scores traditionally used in MI attacks. The paper is authored by Christopher A. Choquette-Choo, Florian Tramèr, Nicholas Carlini, and Nicolas Papernot.
Membership Inference Background
Membership inference attacks pose a critical challenge to the privacy of machine learning models trained on sensitive data such as medical records or financial information. Existing MI attacks usually depend on confidence scores, which reflect the model's probability distribution over class labels and let the adversary infer membership from the confidence gap between training and non-training points. The proposed label-only membership inference attack, by contrast, requires no access to confidence scores at all, extending the threat model to deployments that expose only predicted labels.
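For context, the confidence-based baseline that label-only attacks dispense with can be sketched in a few lines. This is a simplified illustration, not the paper's exact formulation; the threshold value and array shapes are assumptions made here for concreteness.

```python
import numpy as np

# Hypothetical confidence threshold; in practice it would be calibrated
# on shadow models or held-out data.
THRESHOLD = 0.9

def confidence_threshold_attack(confidences: np.ndarray) -> np.ndarray:
    """Predict 'member' whenever the model's confidence in its predicted
    label exceeds a threshold, exploiting the fact that models tend to be
    more confident on points they were trained on."""
    # confidences: shape (n_queries,), max softmax probability per query.
    return confidences >= THRESHOLD
```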
Label-Only Attack Methodology
The authors introduce a label-only MI attack framework that assesses a model's robustness to input perturbations. The attack queries the model on perturbed versions of an input, generated through data augmentations or adversarial-example search, and uses the stability of the predicted label as a proxy for confidence: training points tend to lie farther from the decision boundary and are therefore more robust to perturbation. Because the attack never requests confidence vectors, it undermines defenses based on "confidence masking," which alter confidence scores without modifying predicted labels, demonstrating that obfuscating confidence scores alone does not suffice to protect against MI attacks.
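To make the intuition concrete, here is a minimal Python sketch of a perturbation-based, label-only membership score. The names `predict_label`, `noise_scale`, and the 0.8 threshold are illustrative assumptions; the paper's strongest variant instead estimates the distance to the decision boundary with decision-based adversarial-example attacks, while this random-noise version only conveys the core idea.

```python
import numpy as np

def label_only_membership_score(predict_label, x, y, n_perturbations=50,
                                noise_scale=0.05, rng=None):
    """Score a single point using only hard-label queries.

    predict_label: callable mapping a batch of inputs to predicted labels
                   (the only model access the attack assumes).
    Returns the fraction of randomly perturbed copies of x that keep the
    label y -- a proxy for distance to the decision boundary, which tends
    to be larger for training points than for unseen points.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = noise_scale * rng.standard_normal((n_perturbations, *x.shape))
    labels = predict_label(x[None, ...] + noise)
    return float(np.mean(labels == y))

# Membership decision: compare the score to a threshold calibrated on
# shadow models (0.8 here is an illustrative placeholder).
# is_member = label_only_membership_score(predict_label, x, y) >= 0.8
```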
Numerical Results and Defense Evaluation
Numerical experiments confirm that label-only attacks can perform as effectively as traditional confidence-based attacks. They are particularly revealing against defenses such as MemGuard and adversarial regularization, which aim to mask confidence scores: these defenses substantially reduce the success of confidence-vector attacks, yet label-only attacks retain high accuracy against them, suggesting that such defenses offer a false sense of privacy. Notably, only differential privacy and strong ℓ2 regularization present significant obstacles to MI attacks, both in label-only scenarios and when models are trained with data augmentation.
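For concreteness, the two mitigations the paper finds effective correspond to standard training knobs. The PyTorch sketch below is illustrative only, with assumed hyperparameter values; the differentially private variant (commented out) assumes the Opacus library and is not the paper's exact setup.

```python
import torch
import torch.nn as nn

# Toy classifier standing in for the target model.
model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 10))

# Strong L2 regularization: weight_decay adds an L2 penalty on the
# parameters, shrinking the train/test gap that MI attacks exploit.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=5e-3)

# Differentially private training (sketch, assuming the Opacus library):
# from opacus import PrivacyEngine
# privacy_engine = PrivacyEngine()
# model, optimizer, train_loader = privacy_engine.make_private(
#     module=model, optimizer=optimizer, data_loader=train_loader,
#     noise_multiplier=1.1, max_grad_norm=1.0,
# )
```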
Implications and Future Directions
This research indicates that ML systems need more robust defenses against MI attacks, highlighting the insufficiency of current measures that modify confidence scores without fundamentally addressing overfitting. The exploration into the label-only domain opens pathways for further inquiry into attack strategies where adversaries have even more limited access, and the results suggest broader evaluations of defenses in real-world applications might be necessary.
For future AI developments, exploring the interplay between adversarial robustness and MI attack efficiency could yield insights for mitigating privacy risks. The work also draws attention to the delicate balance between model accuracy and privacy guarantees, advocating for defenses that resist MI through improved generalization rather than output obfuscation, without compromising performance.
In conclusion, the paper delivers a substantial advance in understanding model vulnerabilities in label-only query settings, challenging conventional defense approaches. Its insights are valuable for researchers seeking to fortify ML models against privacy-invasive MI attacks across deployment scenarios.