Studying User Footprints in Different Online Social Networks (1301.6870v1)

Published 29 Jan 2013 in cs.SI

Abstract: With the growing popularity and usage of online social media services, people now have accounts (sometimes several) on multiple and diverse services like Facebook, LinkedIn, Twitter and YouTube. Publicly available information can be used to create a digital footprint of any user using these social media services. Generating such digital footprints can be very useful for personalization, profile management, and detecting malicious behavior of users. A very important application of analyzing users' online digital footprints is to protect users from potential privacy and security risks arising from the huge publicly available user information. We extracted information about user identities on different social networks through Social Graph API, FriendFeed, and Profilactic; we collated our own dataset to create the digital footprints of the users. We used username, display name, description, location, profile image, and number of connections to generate the digital footprints of the user. We applied context-specific techniques (e.g. Jaro-Winkler similarity, WordNet-based ontologies) to measure the similarity of the user profiles on different social networks. We specifically focused on Twitter and LinkedIn. In this paper, we present the analysis and results from applying automated classifiers for disambiguating profiles belonging to the same user from different social networks. UserID and Name were found to be the most discriminative features for disambiguating user profiles. Using the most promising set of features and similarity metrics, we achieved accuracy, precision and recall of 98%, 99%, and 96%, respectively.

Authors (5)

Anshu Malhotra (1 paper)
Luam Totti (1 paper)
Wagner Meira Jr. (27 papers)
Ponnurangam Kumaraguru (129 papers)
Virgilio Almeida (14 papers)

Citations (229)

Summary

Analyzing User Footprints Across Diverse Online Social Networks

The paper "Studying User Footprints in Different Online Social Networks" investigates the challenges and methodologies for identifying and correlating user profiles across multiple social networking platforms. As users increasingly engage with diverse platforms such as Facebook, LinkedIn, Twitter, and YouTube, the potential to unify these profiles into a singular digital footprint arises. This effort is critical for areas such as personalization, profile management, and especially in addressing privacy and security concerns inherent in publicly available data.

Methodology

The authors devised an approach to identify and link user profiles across different social networks, using a dataset compiled via the Social Graph API and two social aggregators, FriendFeed and Profilactic. The paper focuses on Twitter and LinkedIn, two metadata-rich platforms, and builds each user's digital footprint from attributes available on both: username, display name, description, location, profile image, and number of connections.

To decide whether two profiles belong to the same user, the authors measure the similarity of each profile attribute with a context-specific metric, including Jaro-Winkler similarity for string comparison and WordNet-based ontologies for semantic comparison, and apply automated classifiers to the resulting similarity scores. They evaluated the discriminative power of features such as UserID, display name, location, and profile image, and found UserID and Name to be the most discriminative.
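As an illustration of the string-comparison step, here is a minimal Python sketch of the standard Jaro-Winkler similarity, applied to two profile fields to form a small similarity vector. It is not the authors' implementation: the function names, the field names (`username`, `display_name`), the lowercasing step, and the choice to score a missing field as 0.0 are all assumptions made for this example.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this sliding window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix chars."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)


def similarity_vector(profile_a: dict, profile_b: dict,
                      fields=("username", "display_name")) -> list:
    """Hypothetical helper: one Jaro-Winkler score per shared text field."""
    scores = []
    for field in fields:
        a = profile_a.get(field, "").lower()
        b = profile_b.get(field, "").lower()
        # Assumption: a missing field carries no evidence, so score it 0.0.
        scores.append(jaro_winkler(a, b) if a and b else 0.0)
    return scores
```

A vector like this, one per candidate pair of profiles, is the kind of input a classifier could be trained on; semantic fields such as description would need a different metric than the string comparison shown here.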

Results

The classifiers achieved an accuracy of 98%, with precision and recall of 99% and 96% respectively. These figures indicate that automated classifiers combined with context-specific similarity metrics are effective for cross-network user disambiguation. The paper also reports an evaluation closer to real-world use, in which the correct user profile was among the top three results in 75% of retrievals.
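For readers less familiar with these measures, the following Python sketch shows how accuracy, precision, recall, and a top-k hit rate are conventionally computed. The function names and the toy inputs in the usage note are invented for illustration and are not data from the paper.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary same-user labels (1 = same user)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # correctly linked pairs
    tn = sum(1 for t, p in pairs if not t and not p)  # correctly rejected pairs
    fp = sum(1 for t, p in pairs if not t and p)      # wrongly linked pairs
    fn = sum(1 for t, p in pairs if t and not p)      # missed links
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def top_k_hit_rate(ranked_candidates, true_ids, k=3):
    """Fraction of queries whose true profile appears in the first k results."""
    hits = sum(1 for ranked, truth in zip(ranked_candidates, true_ids)
               if truth in ranked[:k])
    return hits / len(true_ids)
```

With made-up labels `y_true = [1, 1, 1, 0, 0, 0]` and `y_pred = [1, 1, 0, 0, 0, 1]`, `classification_metrics` counts two true positives, two true negatives, one false positive, and one false negative. `top_k_hit_rate` with `k=3` corresponds to the "top three results" style of evaluation described above.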

Implications and Future Directions

The research shows that digital footprints built from public data can feasibly support profile management and security measures across social networks. The same linking, however, enables privacy violations such as identity theft and profile cloning, so integrating and linking user data calls for caution.

Theoretically, this work contributes to the understanding of identity unification across disparate platforms, proposing a scalable model that leverages public data without requiring cross-platform standardization or user authentication. Practically, it provides a framework that could be generalized to include additional social networks, given adaptations for missing or proprietary data.

Future developments may encompass expanding the attribute set leveraged for profile connection, incorporating additional social platforms, and enhancing adaptability to discrepancies in data availability and user settings. The exploration of advanced machine learning methodologies may further refine the precision and scalability of user profile disambiguation systems, holding promise for robust applications in identity management and online security domains.

In conclusion, this research makes significant strides in tackling the complexities of digital footprints, offering a systematic approach to the nuanced challenges of profile disambiguation across online social networks. The findings present a foundational step towards more integrated, secure, and user-centric social networking ecosystems.
