Knock Knock, Who's There? Membership Inference on Aggregate Location Data (1708.06145v2)

Published 21 Aug 2017 in cs.CR

Abstract: Aggregate location data is often used to support smart services and applications, e.g., generating live traffic maps or predicting visits to businesses. In this paper, we present the first study on the feasibility of membership inference attacks on aggregate location time-series. We introduce a game-based definition of the adversarial task, and cast it as a classification problem where machine learning can be used to distinguish whether or not a target user is part of the aggregates. We empirically evaluate the power of these attacks on both raw and differentially private aggregates using two mobility datasets. We find that membership inference is a serious privacy threat, and show how its effectiveness depends on the adversary's prior knowledge, the characteristics of the underlying location data, as well as the number of users and the timeframe on which aggregation is performed. Although differentially private mechanisms can indeed reduce the extent of the attacks, they also yield a significant loss in utility. Moreover, a strategic adversary mimicking the behavior of the defense mechanism can greatly limit the protection they provide. Overall, our work presents a novel methodology geared to evaluate membership inference on aggregate location data in real-world settings and can be used by providers to assess the quality of privacy protection before data release or by regulators to detect violations.

Authors (3)
  1. Apostolos Pyrgelis (24 papers)
  2. Carmela Troncoso (54 papers)
  3. Emiliano De Cristofaro (117 papers)
Citations (253)

Summary

  • The paper introduces a game-based definition that casts membership inference as a classification problem, highlighting the resulting privacy risks.
  • It utilizes real-world datasets (TFL and SFC) to demonstrate that regular user movement patterns significantly enhance adversarial inference accuracy.
  • The study shows that while differential privacy can mitigate attack success, it often requires a trade-off between data utility and privacy protection.

Membership Inference on Aggregate Location Data

The paper "Knock Knock, Who's There? Membership Inference on Aggregate Location Data" presents a comprehensive study of the vulnerability of aggregate location time-series to membership inference attacks. Drawing on empirical analysis, it establishes how machine learning enables an adversary to determine whether a specific user's data has been included in an aggregated location dataset. The work centers on a methodology for evaluating the privacy threats these attacks pose across a range of settings, and proposes a framework that captures the associated risks.

Key Contributions and Methodology

Several aspects stand out in this paper:

  1. Game-Based Model: The authors formalize membership inference as a privacy game and cast the adversarial task as a classification problem, enabling the use of supervised machine learning models. This framework mirrors realistic scenarios in which an adversary with prior knowledge aims to determine whether a target user's data is present in an aggregate location dataset.
  2. Real-World Datasets: The evaluation uses two real-world mobility datasets—Transport for London (TFL) and San Francisco Cabs (SFC)—to substantiate the feasibility of membership inference attacks. These datasets serve to highlight how properties such as regularity and user sparsity impact adversarial success.
  3. Analysis of Adversarial Knowledge: Through various adversarial settings, the paper reveals that knowledge about a user's movements significantly influences the efficacy of membership inference. It demonstrates that adversaries with data on specific user locations, or on past memberships in groups, can mount highly accurate inference attacks, leading to considerable privacy loss, especially for small aggregation groups.
  4. Impact of Data Characteristics and Aggregation: The paper shows that data regularity (e.g., commuter behaviors in TFL) and aggregation size crucially determine the vulnerability to membership inference. Larger groups offer more robustness against these attacks, although the intrinsic regularity in TFL still poses significant risks, even in large group settings.
  5. Differential Privacy Strategies: The second part of the paper critically evaluates defense mechanisms based on differential privacy (DP). The authors assess various differential privacy techniques and conclude that while these mechanisms can reduce the likelihood of successful membership inference, they often demand a trade-off with data utility, particularly in large groups.
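
The game-based framing above can be illustrated with a small sketch. Everything here is an illustrative assumption, not the paper's actual setup: synthetic mobility profiles, a single region, a nearest-centroid classifier standing in for the supervised models the authors actually evaluate. The structure, however, follows the summary: generate labelled "games" of aggregates that do or do not include the target, then train a classifier to distinguish them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 100 users, 24 time slots, one region.
# Each user has a per-slot visit probability (their mobility profile).
n_users, n_slots = 100, 24
profiles = rng.uniform(0.0, 0.3, size=(n_users, n_slots))
target = 0
# Give the target a regular, distinctive pattern (commuter-like peaks),
# mirroring the regularity the paper finds exploitable in TFL data.
profiles[target] = 0.0
profiles[target, [8, 9, 17, 18]] = 0.95

def aggregate(members):
    """One observation period: per-slot count of users who visited."""
    visits = rng.random((len(members), n_slots)) < profiles[members]
    return visits.sum(axis=0).astype(float)

m = 20  # aggregation group size

def sample_group(with_target):
    """Draw a group of m users that does / does not include the target."""
    pool = np.arange(1, n_users)
    if with_target:
        return np.append(rng.choice(pool, size=m - 1, replace=False), target)
    return rng.choice(pool, size=m, replace=False)

# Labelled training "games": label 1 iff the target is in the aggregate.
X, y = [], []
for _ in range(400):
    label = int(rng.integers(0, 2))
    X.append(aggregate(sample_group(bool(label))))
    y.append(label)
X, y = np.array(X), np.array(y)

# Nearest-centroid classifier: a minimal stand-in for the supervised
# models considered in the paper (e.g. logistic regression, random forests).
mu_in, mu_out = X[y == 1].mean(axis=0), X[y == 0].mean(axis=0)

def infer(agg):
    """Predict membership: 1 if the aggregate is closer to the 'in' centroid."""
    return int(np.linalg.norm(agg - mu_in) < np.linalg.norm(agg - mu_out))

# Distinguishing accuracy on fresh games (0.5 = random guessing).
n_test, correct = 200, 0
for _ in range(n_test):
    label = int(rng.integers(0, 2))
    correct += infer(aggregate(sample_group(bool(label)))) == label
acc = correct / n_test
print(f"attack accuracy: {acc:.2f}")
```

Even this crude classifier exceeds the random-guessing baseline, because the target's regular peaks leave a detectable trace in the per-slot counts; the paper's point is that richer prior knowledge and stronger models amplify this effect.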

Key Findings and Implications

The findings underline the substantial privacy risks associated with aggregate location data. Notably:

  • High Efficacy of Membership Inference: The results demonstrate the alarming efficacy of membership inference attacks in specific settings and emphasize the role of adversarial prior knowledge. For example, attacks achieve high accuracy and AUC scores when adversaries have past access to the target's data or partial location information.
  • Insufficiency of Naive Aggregation as Protection: The paper refutes the notion that aggregation inherently guarantees privacy, demonstrating substantial risks even in aggregated datasets, particularly those exhibiting high regularity.
  • Challenges in Privacy-Utility Trade-off: The differential privacy mechanisms evaluated reveal a nuanced trade-off between privacy and utility. While some DP mechanisms provide robustness against inference attacks, they often compromise the utility of released datasets. An adversary modeling the noise added by DP mechanisms can undermine the privacy benefit, indicating a pressing need for more sophisticated defense mechanisms.
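
The privacy-utility trade-off for output perturbation can be sketched with the standard Laplace mechanism on aggregate counts. The numbers and sensitivity assumption below are illustrative (each user is assumed to contribute at most one visit per released count), and the mean relative error (MRE) metric mirrors the utility measure used in the paper's evaluation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical aggregate: visit counts for 24 time slots.
true_counts = rng.integers(0, 50, size=24).astype(float)

def laplace_mechanism(counts, epsilon, sensitivity=1.0):
    """Epsilon-DP output perturbation: add Laplace(sensitivity/epsilon) noise.
    Sensitivity 1 assumes each user contributes at most one visit per
    released count (an assumption of this sketch)."""
    scale = sensitivity / epsilon
    return counts + rng.laplace(0.0, scale, size=counts.shape)

def mean_relative_error(noisy, true, floor=1.0):
    """Utility metric: mean relative error of the perturbed counts."""
    return float(np.mean(np.abs(noisy - true) / np.maximum(true, floor)))

# Stronger privacy (smaller epsilon) => more noise => less utility.
mre = {}
for eps in (10.0, 1.0, 0.1):
    noisy = laplace_mechanism(true_counts, eps)
    mre[eps] = mean_relative_error(noisy, true_counts)
    print(f"epsilon={eps:>4}: MRE={mre[eps]:.3f}")
```

The monotone growth of MRE as epsilon shrinks is exactly the tension the paper highlights: the noise levels that meaningfully blunt membership inference are also the ones that degrade the released aggregates, and an adversary who models the noise distribution can claw back part of the lost advantage.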

Looking Forward

The paper raises important considerations for developers and policymakers alike, especially as smart services increasingly rely on location analytics. Future research might explore adaptive and context-aware differential privacy techniques that reconcile privacy requirements with utility demands without succumbing to adversarial exploits. Moreover, there is scope to investigate novel adversarial strategies that evolve beyond the current machine learning paradigms.

In conclusion, the authors successfully illuminate an under-researched but acutely relevant privacy challenge in data-driven ecosystems, providing groundwork for future innovations in secure location data utilization.