- The paper introduces LiRA, a novel attack that enhances membership inference by using per-example Gaussian modeling and calibration.
- It refines the evaluation of privacy risks through shadow models and repeated querying strategies to distinguish training participation accurately.
- Evaluations on multiple datasets show up to 10x higher true-positive rates at low false-positive rates, underscoring the need for robust privacy assessments.
Membership Inference Attacks From First Principles
This paper, authored by Carlini et al., re-examines how membership inference attacks are constructed and evaluated in order to better assess the privacy vulnerabilities of machine learning models. Traditional attacks are judged by average-case success metrics, which may say little about the worst-case privacy risk to individual examples. The authors propose a Likelihood Ratio Attack (LiRA) that is substantially more effective at low false-positive rates, making membership inference more relevant for real-world privacy evaluations.
Core Concepts
Membership inference attacks (MIAs) are designed to determine whether a particular data point was included in a model's training set. These attacks are crucial for evaluating if neural networks leak information about their training data, which often contains sensitive information.
Traditional MIAs are evaluated using average-case metrics such as accuracy or ROC-AUC. However, these metrics do not capture the attack's ability to confidently identify individual training examples. The paper argues for evaluating attacks based on true-positive rates at very low false-positive rates, which are more relevant in practice.
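To make the low-FPR framing concrete, here is a minimal sketch of reporting an attack's true-positive rate at a fixed, low false-positive rate. The array names and the use of scikit-learn's `roc_curve` are illustrative choices, not taken from the paper's code.

```python
# A minimal sketch of reporting TPR at a fixed low FPR. `scores` are
# per-example membership scores (higher = "more likely a member") and
# `is_member` is ground truth; both are illustrative stand-ins.
import numpy as np
from sklearn.metrics import roc_curve

def tpr_at_fpr(scores: np.ndarray, is_member: np.ndarray, target_fpr: float = 1e-3) -> float:
    """True-positive rate at the most permissive threshold whose FPR <= target_fpr."""
    fpr, tpr, _ = roc_curve(is_member, scores)
    valid = fpr <= target_fpr
    return float(tpr[valid].max()) if valid.any() else 0.0

# An uninformative attack should achieve a TPR roughly equal to the target FPR.
rng = np.random.default_rng(0)
print(tpr_at_fpr(rng.normal(size=10_000), rng.integers(0, 2, size=10_000)))
```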
The Likelihood Ratio Attack (LiRA)
The paper proposes LiRA as a more principled membership inference test. For a candidate example, the attack trains shadow models both with and without that example and uses their confidences to estimate the distribution of the target model's output under two hypotheses: the example was in the training set, or it was not. Modeling both distributions as Gaussians, LiRA scores the example by the likelihood ratio of the observed output, yielding a more accurate membership decision.
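The shadow-model step can be instantiated by training many models on random halves of an auxiliary dataset, so that each candidate example lands inside roughly half of the shadow training sets. A minimal sketch under that assumption follows; the `train_model` callable and all names are hypothetical placeholders, not the paper's code.

```python
# A sketch of the shadow-model setup: each shadow model is trained on an
# independent random half of an auxiliary dataset, so every candidate example
# ends up IN roughly half of the shadow models and OUT of the rest.
# `train_model` is a hypothetical callable that trains on the given indices.
import numpy as np

def train_shadow_models(num_examples: int, num_shadows: int, train_model, seed: int = 0):
    rng = np.random.default_rng(seed)
    shadows, in_masks = [], []
    for _ in range(num_shadows):
        in_mask = rng.random(num_examples) < 0.5        # random half-split
        shadows.append(train_model(np.flatnonzero(in_mask)))
        in_masks.append(in_mask)
    # in_masks[j, i] records whether example i was in shadow model j's training
    # set, and later decides whether its confidence feeds the IN or OUT Gaussian.
    return shadows, np.stack(in_masks)
```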
Key innovations of LiRA include:
- Per-example calibration: by estimating the IN and OUT distributions separately for each example, the attack accounts for the fact that some examples are intrinsically easier or harder to fit than others, which sharpens its precision.
- Parametric modeling: LiRA fits Gaussians to logit-scaled model confidences, so reliable estimates can be obtained from relatively few shadow models.
- Query strategies: LiRA queries the target model several times, typically on augmented versions of the example, to make the score more robust and accurate (a combined scoring sketch follows this list).
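Putting these pieces together, a simplified sketch of the per-example score is shown below. It assumes true-class confidences have already been collected from shadow models trained with and without the example, and it averages per-query Gaussian log-densities rather than using the paper's full multivariate treatment of augmented queries; all names are illustrative.

```python
# A simplified sketch of the per-example LiRA score. `in_confs` come from shadow
# models trained WITH the example, `out_confs` from shadow models trained WITHOUT
# it, and `target_confs` from several (augmented) queries to the target model.
import numpy as np
from scipy.stats import norm

def logit_scale(p, eps: float = 1e-6) -> np.ndarray:
    """Map a confidence p on the true class to log(p / (1 - p)), which is closer to Gaussian."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return np.log(p) - np.log(1 - p)

def lira_score(target_confs, in_confs, out_confs) -> float:
    """Log-likelihood ratio; larger values indicate 'more likely a training member'."""
    t = logit_scale(target_confs)
    mu_in, sd_in = logit_scale(in_confs).mean(), logit_scale(in_confs).std() + 1e-8
    mu_out, sd_out = logit_scale(out_confs).mean(), logit_scale(out_confs).std() + 1e-8
    # Average the log-densities over the augmented queries of the example.
    return float(norm.logpdf(t, mu_in, sd_in).mean() - norm.logpdf(t, mu_out, sd_out).mean())

# Toy usage: a confidently-fit example scores as a likely member.
print(lira_score(target_confs=[0.97, 0.95],
                 in_confs=[0.96, 0.98, 0.94],
                 out_confs=[0.70, 0.60, 0.75]))
```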
Evaluation and Comparison
The authors present extensive evaluations of their method against prior attacks across several datasets, including CIFAR-10, CIFAR-100, ImageNet, and WikiText-103. LiRA demonstrated significant improvements, achieving up to 10 times higher true-positive rates at extremely low false-positive rates compared to existing methods.
One standout result is that average-case evaluations can understate or misrepresent the threat: some prior attacks with high accuracy or AUC perform barely better than chance at low false-positive rates. By shifting evaluation to the low-FPR regime, reported on log-scaled ROC curves, the paper makes LiRA's practical advantage explicit.
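One way to visualize this regime, in the spirit of the paper's log-scaled ROC curves, is sketched below; the plotting helper and variable names are illustrative, not from the paper's code.

```python
# A sketch of comparing attacks on log-scaled ROC axes, which emphasize the
# low-FPR regime that average-case metrics obscure. `scores_by_attack` maps an
# attack name to per-example membership scores; `is_member` is ground truth.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

def plot_log_roc(scores_by_attack: dict, is_member) -> None:
    for name, scores in scores_by_attack.items():
        fpr, tpr, _ = roc_curve(is_member, scores)
        plt.plot(fpr, tpr, label=name)
    plt.plot([1e-5, 1], [1e-5, 1], "k--", label="chance")  # random-guessing baseline
    plt.xscale("log"); plt.yscale("log")
    plt.xlim(1e-5, 1); plt.ylim(1e-5, 1)
    plt.xlabel("False-positive rate"); plt.ylabel("True-positive rate")
    plt.legend(); plt.show()

# Example: a random-guessing attack hugs the diagonal on these axes.
rng = np.random.default_rng(0)
plot_log_roc({"random guess": rng.normal(size=5_000)}, rng.integers(0, 2, size=5_000))
```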
Implications
The implications of this research are manifold. LiRA can better inform practitioners about the potential privacy risks associated with deploying machine learning models, especially in domains where data privacy is crucial, such as healthcare and finance. Furthermore, it raises awareness of the need to reevaluate previously dismissed privacy defenses and could inspire improved defenses that stand robustly against more sophisticated attacks.
Future Directions
Future work should explore the application of LiRA to broader settings, such as federated learning or models trained with differential privacy techniques. Research could also explore further reducing the computational overhead of LiRA, thus making it more accessible for a wider range of applications. Additionally, investigating the role of architectural choices and training regimes on MIAs could provide deeper insights into safeguarding model privacy.
In summary, the paper offers a careful, methodically refined approach to membership inference, setting a higher standard for privacy evaluation in machine learning. This work not only deepens the understanding of model vulnerabilities but also makes a strong case for changing how privacy risks are measured.