Browsing behavior exposes identities on the Web (2312.15489v2)

Published 24 Dec 2023 in cs.CY, cs.IR, cs.SI, physics.soc-ph, and stat.AP

Abstract: How easy is it to uniquely identify a person based solely on their web browsing behavior? Here we show that when people navigate the Web, their online traces produce fingerprints that identify them. Merely the four most visited web domains are enough to identify 95% of the individuals. These digital fingerprints are stable and render high re-identifiability. We demonstrate that we can re-identify 80% of the individuals in separate time slices of data. Such a privacy threat persists even with limited information about individuals' browsing behavior, reinforcing existing concerns around online privacy.
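The uniqueness claim in the abstract can be illustrated with a minimal sketch. It assumes a fingerprint is the unordered set of a user's k most-visited domains, which is an assumption made here for illustration and may differ from the paper's exact procedure. The function names (`top_k_fingerprint`, `unique_fraction`) and the toy log data are hypothetical, not taken from the paper or its data set.

```python
from collections import Counter


def top_k_fingerprint(visits, k):
    """Return the k most-visited domains of one user as an unordered set.

    Assumption: the fingerprint ignores visit order and visit counts.
    """
    counts = Counter(visits)
    # Sort by descending visit count, then by name so ties break deterministically.
    ranked = sorted(counts, key=lambda domain: (-counts[domain], domain))
    return frozenset(ranked[:k])


def unique_fraction(logs, k):
    """Fraction of users whose top-k fingerprint is shared with no other user."""
    fingerprints = {user: top_k_fingerprint(v, k) for user, v in logs.items()}
    occurrences = Counter(fingerprints.values())
    unique = sum(1 for fp in fingerprints.values() if occurrences[fp] == 1)
    return unique / len(fingerprints)


# Hypothetical toy logs: one list of visited domains per user.
toy_logs = {
    "u1": ["a.com"] * 5 + ["b.com"] * 3 + ["c.com"] * 2,
    "u2": ["a.com"] * 4 + ["b.com"] * 2 + ["d.com"] * 1,
    "u3": ["a.com"] * 6 + ["e.com"] * 3 + ["c.com"] * 1,
}

for k in (1, 2, 3):
    print(k, unique_fraction(toy_logs, k))
```

In this toy data every user shares the same top domain, so no one is unique at k = 1, while all three users become unique at k = 3. The sketch only shows the mechanism by which a few popular domains can single people out; it does not reproduce the paper's reported percentages.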
