
Trajectory Recovery From Ash: User Privacy Is NOT Preserved in Aggregated Mobility Data (1702.06270v2)

Published 21 Feb 2017 in cs.CY and cs.CR

Abstract: Human mobility data has been ubiquitously collected through cellular networks and mobile applications, and publicly released for academic research and commercial purposes for the last decade. Since releasing individuals' mobility records usually gives rise to privacy issues, dataset owners tend to publish only aggregated mobility data, such as the number of users covered by a cellular tower at a specific timestamp, which is believed to be sufficient for preserving users' privacy. However, in this paper, we argue and prove that even publishing aggregated mobility data can lead to privacy breaches in individuals' trajectories. We develop an attack system that exploits the uniqueness and regularity of human mobility to recover individuals' trajectories from aggregated mobility data without any prior knowledge. By conducting experiments on two real-world datasets collected from a mobile application and a cellular network, we reveal that the attack system can recover users' trajectories with accuracy of about 73%-91% at the scale of tens of thousands to hundreds of thousands of users, which indicates severe privacy leakage in such datasets. Through this investigation of aggregated mobility data, our work identifies a novel privacy problem in publishing statistical data, which calls for immediate attention from both academia and industry.

Citations (170)

Summary

  • The paper demonstrates that aggregated mobility data can be exploited to reconstruct individual trajectories with recovery accuracies up to 91%.
  • The authors employ a novel attack system that leverages human mobility regularity to infer trajectories even from anonymized datasets.
  • The study highlights significant privacy risks and calls for advanced privacy-preserving techniques beyond conventional data aggregation.

Evaluation of Privacy Risks in Aggregated Mobility Data

The research paper titled "Trajectory Recovery From Ash: User Privacy Is NOT Preserved in Aggregated Mobility Data" addresses the critical issue of privacy breaches in aggregated mobility datasets, challenging the common assumption that user anonymity is maintained through data aggregation. The authors present a compelling argument, supported by experimental evidence, that even aggregated data, devoid of individual identifiers, can lead to significant privacy infringements by enabling the reconstruction of user trajectories.

The paper is grounded in the pervasive collection and public release of human mobility data by cellular networks and mobile applications, shared for academic and commercial purposes. Traditionally, data owners assumed that privacy was preserved by releasing only aggregated statistics, such as cellular tower occupancy, rather than individual trajectories. The authors demonstrate that this assumption is flawed: the aggregated data can be exploited to reconstruct individual trajectories.

A novel attack system is developed that exploits the regularity and uniqueness inherent in human mobility patterns, without requiring any prior knowledge of the dataset. Tested on two large-scale real-world datasets, the system recovers individual trajectories with accuracy rates between 73% and 91%. This reconstruction is possible because human movements are highly predictable (e.g., commuting patterns) and individual mobility sequences are highly distinctive.
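The core linking idea can be illustrated with a toy sketch. This is not the authors' actual system (which formulates the step-to-step linking as a cost-based optimal matching); the greedy version below, with a made-up set of aggregated snapshots and a hypothetical tower adjacency, only conveys how anonymous per-tower counts can be chained into trajectories by preferring "regular" (low-cost) moves:

```python
# Hypothetical toy example: aggregated counts per cellular tower at each
# timestep, as a data owner might publish them (no user identifiers at all).
snapshots = [
    {"A": 2, "B": 1},   # t = 0
    {"A": 1, "B": 2},   # t = 1
    {"B": 2, "C": 1},   # t = 2
]

# Assumed tower adjacency used as a movement-cost proxy: staying put costs 0,
# moving to a neighboring tower costs 1, any other jump costs 2.
NEIGHBORS = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}

def move_cost(src, dst):
    if src == dst:
        return 0
    return 1 if dst in NEIGHBORS.get(src, set()) else 2

def expand(snapshot):
    """Turn {tower: count} into a flat list of anonymous location 'slots'."""
    slots = []
    for tower, count in snapshot.items():
        slots.extend([tower] * count)
    return slots

def recover(snaps):
    """Greedily link anonymous slots across timesteps, preferring low-cost
    (regular) moves -- a stand-in for the paper's regularity-based linking."""
    trajectories = [[t] for t in expand(snaps[0])]
    for snap in snaps[1:]:
        slots = expand(snap)
        taken = [False] * len(slots)
        for traj in trajectories:
            # Assign this partial trajectory to the cheapest unclaimed slot.
            best, best_cost = None, None
            for i, tower in enumerate(slots):
                if taken[i]:
                    continue
                c = move_cost(traj[-1], tower)
                if best_cost is None or c < best_cost:
                    best, best_cost = i, c
            taken[best] = True
            traj.append(slots[best])
    return trajectories

for traj in recover(snapshots):
    print("->".join(traj))
```

Even in this tiny example, plausible per-user paths emerge from counts alone; at scale, the regularity of commuting makes such links far less ambiguous than intuition suggests.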

The research highlights several critical findings:

  • Privacy Vulnerability: Aggregated mobility data does not inherently preserve user privacy, as trajectory recovery is achievable with high accuracy.
  • Influence of Dataset Characteristics: Key dataset attributes, such as spatial and temporal resolution, and the scale of data influence the degree of privacy leakage. Contrary to intuition, lower spatiotemporal resolution increased recovery accuracy, while dataset scale impacted accuracy but not trajectory uniqueness.
  • Robustness of Attack Model: The attack model effectively reconstructed trajectories across varying dataset settings, indicating severe and universally applicable privacy concerns.
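The uniqueness finding can be made concrete with a small sketch: given a set of trajectories (here, invented toy data), measure what fraction of users are already singled out by a short prefix of their location sequence. This is an illustration of the general uniqueness measurement, not the paper's exact metric:

```python
from collections import Counter

# Hypothetical toy trajectories: tower ids per timestep, one tuple per user.
trajectories = [
    ("A", "B", "A", "C"),
    ("A", "B", "A", "D"),
    ("B", "B", "C", "C"),
    ("A", "B", "A", "C"),  # duplicate of the first user's trajectory
]

def unique_fraction(trajs, k):
    """Fraction of users whose first k locations already single them out."""
    prefixes = Counter(t[:k] for t in trajs)
    unique = sum(1 for t in trajs if prefixes[t[:k]] == 1)
    return unique / len(trajs)

for k in range(1, 5):
    print(k, unique_fraction(trajectories, k))
```

As `k` grows, more users become unique; in real mobility data this fraction rises quickly, which is precisely what makes recovered trajectories attributable to individuals.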

The implications of this paper are multifaceted. Practically, it challenges current data-sharing practices and encourages reconsideration of privacy-preserving techniques. Theoretically, it invites broader discussion of privacy models, emphasizing the need for mechanisms more robust than conventional anonymization and aggregation. The underlying problem, the uniqueness and regularity of individual behavior, likely extends to other datasets with similar characteristics.

Future research could explore more sophisticated privacy-preserving techniques, such as differential privacy or perturbation methods adapted to mobility datasets. Examining the privacy risks of other kinds of aggregated data may reveal similar vulnerabilities, necessitating adaptations of current privacy frameworks.
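One standard mitigation along these lines is the Laplace mechanism of differential privacy applied to the published counts. The sketch below is a minimal stdlib illustration, not a vetted deployment: it assumes per-timestep sensitivity of 1 (one user changes at most one tower count per snapshot) and samples Laplace noise as the difference of two i.i.d. exponentials:

```python
import random

def laplace_noise(scale):
    # The difference of two i.i.d. Exponential(1/scale) draws is
    # Laplace(0, scale)-distributed.
    lam = 1.0 / scale
    return random.expovariate(lam) - random.expovariate(lam)

def private_counts(counts, epsilon):
    """Add Laplace(1/epsilon) noise to each tower count, then clamp to a
    non-negative integer. Assumes sensitivity 1: adding or removing one
    user changes at most one count in the snapshot by 1."""
    scale = 1.0 / epsilon
    return {tower: max(0, round(c + laplace_noise(scale)))
            for tower, c in counts.items()}

random.seed(0)  # for a reproducible demo run
print(private_counts({"A": 120, "B": 45, "C": 7}, epsilon=0.5))
```

Whether such noise suffices against a linking attack depends on how many correlated snapshots are released; the paper's results suggest the privacy budget must account for the whole trajectory, not a single timestep.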

In summary, this paper reveals fundamental flaws in assuming privacy through aggregation, calling for immediate action to address privacy concerns in aggregated datasets. Researchers and industry leaders must heed these findings to safeguard the privacy of individuals in the age of pervasive data collection and sharing.