
How Unique and Traceable are Usernames? (1101.5578v3)

Published 28 Jan 2011 in cs.CR

Abstract: Suppose you find the same username on different online services, what is the probability that these usernames refer to the same physical person? This work addresses what appears to be a fairly simple question, which has many implications for anonymity and privacy on the Internet. One possible way of estimating this probability would be to look at the public information associated to the two accounts and try to match them. However, for most services, these information are chosen by the users themselves and are often very heterogeneous, possibly false and difficult to collect. Furthermore, several websites do not disclose any additional public information about users apart from their usernames (e.g., discussion forums or Blog comments), nonetheless, they might contain sensitive information about users. This paper explores the possibility of linking users profiles only by looking at their usernames. The intuition is that the probability that two usernames refer to the same physical person strongly depends on the "entropy" of the username string itself. Our experiments, based on crawls of real web services, show that a significant portion of the users' profiles can be linked using their usernames. To the best of our knowledge, this is the first time that usernames are considered as a source of information when profiling users on the Internet.

Authors (4)
  1. Daniele Perito (4 papers)
  2. Claude Castelluccia (23 papers)
  3. Mohamed Ali Kaafar (67 papers)
  4. Pere Manils (4 papers)
Citations (227)

Summary

  • The paper introduces an analytical model using Markov Chains and language modeling to quantify the uniqueness of usernames.
  • It validates the model with empirical data from services like eBay and Google, demonstrating that high-entropy usernames are more distinct.
  • The study highlights that linking usernames across platforms can compromise privacy, urging better countermeasures in digital identity protection.

Assessing the Uniqueness and Traceability of Usernames: Implications for Privacy

The paper "How Unique and Traceable are Usernames?" investigates how to estimate the likelihood that identical usernames on different online services belong to the same individual. The question bears directly on online anonymity and privacy: usernames, often assumed to be innocuous, can act as identifiers that link otherwise separate online profiles.

Main Contributions

The paper makes several significant contributions:

  1. Introduction of a Novel Problem: The researchers outline the problem of linking online identities via usernames, a simpler yet underexplored vector for user profiling compared to techniques relying on social graphs or other user-specific information.
  2. Analytical Model for Username Uniqueness: The authors introduce an analytical model based on Markov Chains and language modeling techniques to estimate a username's uniqueness. They use the concept of information surprisal to infer the probability that a username is exclusive to a single individual within a population. The approach ties uniqueness to the entropy of the username string itself.
  3. Username Linkage Across Services: Extending their model, the paper explores how to quantify the probability that two different usernames on distinct services represent the same physical person. Using linguistic similarity metrics and probabilistic analysis, the paper examines how robustly users can be tracked even when they alter their usernames.
  4. Empirical Validation: The research is substantiated by extensive experimental data and analysis using real username datasets from services such as eBay and Google. These illustrate the method’s validity and demonstrate its effectiveness in real-world scenarios, thereby affirming that usernames can carry significant identifying information.
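The analytical model in item 2 can be illustrated with a minimal sketch: a first-order (bigram) character Markov chain trained on a list of known usernames, used to compute the information surprisal, -log2 P(u), of a new username. The training list, the boundary markers, the add-alpha smoothing, and the function names are all assumptions made for illustration; the paper's actual model order and smoothing may differ.

```python
import math
from collections import defaultdict

START, END = "^", "$"  # hypothetical boundary markers, assumed absent from usernames

def train_bigram_model(usernames):
    """Count character-to-character transitions, including start/end markers."""
    counts = defaultdict(lambda: defaultdict(int))
    alphabet = {END}
    for name in usernames:
        chars = [START] + list(name.lower()) + [END]
        alphabet.update(chars[1:])
        for prev, cur in zip(chars, chars[1:]):
            counts[prev][cur] += 1
    return counts, alphabet

def surprisal_bits(username, counts, alphabet, alpha=1.0):
    """Return -log2 P(username) under the bigram model with add-alpha smoothing."""
    chars = [START] + list(username.lower()) + [END]
    # One extra slot lumps together all characters never seen in training.
    vocab_size = len(alphabet) + 1
    bits = 0.0
    for prev, cur in zip(chars, chars[1:]):
        row = counts.get(prev, {})
        total = sum(row.values())
        seen = row.get(cur, 0)
        prob = (seen + alpha) / (total + alpha * vocab_size)
        bits -= math.log2(prob)
    return bits

# Toy training data (hypothetical); a real model would use a large username crawl.
training = ["john", "johnny", "jon", "joan", "jane", "jonas"]
counts, alphabet = train_bigram_model(training)

common = surprisal_bits("john", counts, alphabet)
unusual = surprisal_bits("xq7zk", counts, alphabet)
```

On this toy data, a username built from frequent transitions ("john") receives a lower surprisal than one built from unseen transitions ("xq7zk"), which is the intuition behind treating high-surprisal usernames as more identifying.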

Technical Insights and Implications

The paper presents a clear quantification of username uniqueness using probabilistic models and emphasizes the role of entropy in assessing the likelihood of multiple individuals selecting the same username. High-entropy usernames, as expected, provide stronger assurance of uniqueness across the population. The implication here is substantial: services and users may underestimate the identifying power of usernames, leading to unintended privacy leaks.
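To make the link between surprisal and uniqueness concrete, here is a rough sketch under a simplifying assumption that every person picks a username independently from the same distribution. Under that assumption a username with surprisal I bits is chosen with probability 2^(-I), so the expected number of people choosing it and the chance that nobody else does can be computed directly. The function names and the population figure are hypothetical, and this is not necessarily the paper's exact estimator.

```python
import math

def expected_sharers(surprisal_bits, population):
    """Expected number of people in the population who pick this username."""
    return population * 2.0 ** (-surprisal_bits)

def prob_no_other_user(surprisal_bits, population):
    """Probability that none of the other (population - 1) people pick it,
    assuming independent choices from the same distribution."""
    p = 2.0 ** (-surprisal_bits)
    if p >= 1.0:
        # Degenerate case: everyone picks this username with certainty.
        return 1.0 if population <= 1 else 0.0
    # (1 - p) ** (population - 1), computed in log space for numerical stability.
    return math.exp((population - 1) * math.log1p(-p))

POPULATION = 2_000_000_000  # hypothetical population size, for illustration only

high_entropy = prob_no_other_user(40, POPULATION)  # 40-bit username
low_entropy = prob_no_other_user(20, POPULATION)   # 20-bit username
```

With these illustrative numbers, a 40-bit username is very likely to belong to a single person, while a 20-bit username is almost certainly shared by many.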

The work also draws attention to adversaries who might exploit username linkage to build accurate user profiles. Malicious uses are the researchers' main concern, but they also note legitimate applications in cyber forensics, where username linkage could help trace digital footprints during an investigation.

Future Directions

Future work could develop more robust error handling for collecting heterogeneous usernames from different platforms. Raising awareness of the adversarial threat, designing better countermeasures, and adopting provider-side techniques that mitigate these privacy risks are further areas for exploration. As username policies continue to evolve, privacy-preserving systems that address the risks highlighted by this research will be needed.

Conclusion

This paper describes the privacy risks associated with usernames and presents a method for assessing username linkability. By showing that usernames can serve as a reliable profiling source, it challenges prevailing assumptions about online anonymity and prompts discussion of privacy policies, user awareness, and technological safeguards. For researchers and practitioners in the security and privacy community, this work is a reminder of the continuing challenge of protecting user identities in an increasingly interconnected world.
