Fame for sale: efficient detection of fake Twitter followers (1509.04098v2)

Published 14 Sep 2015 in cs.SI, cs.CR, and cs.LG

Abstract: $\textit{Fake followers}$ are those Twitter accounts specifically created to inflate the number of followers of a target account. Fake followers are dangerous for the social platform and beyond, since they may alter concepts like popularity and influence in the Twittersphere - hence impacting on economy, politics, and society. In this paper, we contribute along different dimensions. First, we review some of the most relevant existing features and rules (proposed by Academia and Media) for anomalous Twitter accounts detection. Second, we create a baseline dataset of verified human and fake follower accounts. Such baseline dataset is publicly available to the scientific community. Then, we exploit the baseline dataset to train a set of machine-learning classifiers built over the reviewed rules and features. Our results show that most of the rules proposed by Media provide unsatisfactory performance in revealing fake followers, while features proposed in the past by Academia for spam detection provide good results. Building on the most promising features, we revise the classifiers both in terms of reduction of overfitting and cost for gathering the data needed to compute the features. The final result is a novel $\textit{Class A}$ classifier, general enough to thwart overfitting, lightweight thanks to the usage of the less costly features, and still able to correctly classify more than 95% of the accounts of the original training set. We ultimately perform an information fusion-based sensitivity analysis, to assess the global sensitivity of each of the features employed by the classifier. The findings reported in this paper, other than being supported by a thorough experimental methodology and interesting on their own, also pave the way for further investigation on the novel issue of fake Twitter followers.


Efficient Detection of Fake Twitter Followers

This paper presents a comprehensive study of how to identify fake Twitter followers, a problem that affects both user behavior on the platform and the wider social and technological context. The authors aim to delineate efficient detection methods by applying machine learning techniques, building a reliable baseline dataset, and identifying cost-effective features for robust classification.

Key Contributions and Findings

The authors' contributions span several dimensions:

Baseline Dataset Creation: A key foundation of this research is the creation of a publicly available baseline dataset composed of verified human accounts and fake followers purchased from online markets. This dataset forms a cornerstone for training and evaluating detection algorithms.
Feature Evaluation: The paper rigorously evaluates various features that have been proposed for detecting bots and spam on Twitter. The authors review both academic proposals and those found in grey literature, highlighting their relative effectiveness when applied to the problem of fake follower detection.
Cost-efficient Feature Selection: The paper stratifies features into classes based on the data acquisition cost, including profile, timeline, and relationship features. By analyzing the crawling cost, the researchers propose cost-effective classifiers that significantly reduce data collection overhead while maintaining high detection accuracy.
Class A Classifier Proposal: The development and validation of a "Class A" classifier that uses only profile-based features epitomizes the authors' approach to balancing efficiency and effectiveness. Despite avoiding costly timeline and relationship data, this classifier maintained high accuracy in detecting fake followers.
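The cost-based stratification described above can be illustrated with a minimal sketch. It assumes that Class A corresponds to profile data, Class B to timeline data, and Class C to relationship data, following the order in which the summary lists them. The feature names other than the friends-to-followers ratio, the account field names, and the helper functions are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Account = Dict[str, float]  # hypothetical flat record of crawled account data


@dataclass(frozen=True)
class Feature:
    name: str
    cost_class: str  # "A" (profile), "B" (timeline), "C" (relationships): assumed mapping
    compute: Callable[[Account], float]


def friends_to_followers(acc: Account) -> float:
    # Uses only profile counters; guards against division by zero.
    return acc["friends_count"] / max(acc["followers_count"], 1)


# Illustrative registry; every feature except the ratio is a made-up placeholder.
FEATURES: List[Feature] = [
    Feature("friends_to_followers", "A", friends_to_followers),
    Feature("has_bio", "A", lambda a: float(a.get("has_bio", 0))),
    Feature("retweet_fraction", "B",
            lambda a: a["retweets"] / max(a["tweets_sampled"], 1)),
    Feature("reciprocal_link_fraction", "C",
            lambda a: a["reciprocal_links"] / max(a["friends_count"], 1)),
]


def select_features(max_class: str) -> List[Feature]:
    """Keep only features whose crawling cost class is at most max_class."""
    order = "ABC"
    allowed = order[: order.index(max_class) + 1]
    return [f for f in FEATURES if f.cost_class in allowed]


def feature_vector(acc: Account, max_class: str = "A") -> Dict[str, float]:
    """Compute the selected features; costlier fields are never touched."""
    return {f.name: f.compute(acc) for f in select_features(max_class)}


profile_only = {"friends_count": 2000, "followers_count": 10, "has_bio": 0}
print(feature_vector(profile_only, max_class="A"))
```

Because the Class A vector is computed from profile fields alone, an account can be scored without crawling its timeline or its follower graph, which is the trade-off the summary attributes to the Class A classifier.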

Strong Numerical Results and Methodological Rigor

The empirical validation shows that classifiers using Class B and Class C features achieve detection accuracy above 95%, with Matthews correlation coefficient (MCC) values indicating strong agreement with the ground-truth labels. Class A classifiers perform slightly worse but still correctly classify more than 95% of the accounts in the original training set, demonstrating that profile-level features alone can detect fake accounts at a much lower data-collection cost. The results are complemented by an information fusion-based sensitivity analysis that ranks the global importance of each feature and highlights the friends-to-followers ratio as a critical determinant of account authenticity.
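For readers unfamiliar with the metrics mentioned above, the following sketch computes accuracy and MCC from a binary confusion matrix using their standard definitions. The confusion-matrix counts are invented for illustration and are not results from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all accounts that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, in [-1, 1].

    1 means perfect agreement with the labels, 0 is no better than chance,
    and -1 is total disagreement. Returns 0.0 when the denominator is zero.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Made-up counts: "positive" means the account is labeled as a fake follower.
tp, tn, fp, fn = 90, 95, 5, 10
print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
print(f"MCC      = {mcc(tp, tn, fp, fn):.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when one class is much larger than the other.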

Implications and Future Directions

This research has implications for building scalable fraud detection systems. By clarifying which features are both discriminative and cheap to collect, the paper paves the way for practical tools that can process large volumes of accounts in real time, which is essential for platforms at the scale of Twitter.

The paper's framework also points to directions for future work in AI applied to social network analysis. The authors suggest continuously adapting feature sets to counter evolving strategies for generating fake followers. Integrating content-based features and behavioral analysis into existing models could further improve detection.

Conclusion

Overall, this paper provides a structured methodology for detecting fake followers with a focus on efficiency and scalability. It builds an essential bridge between existing academic knowledge and practical applications within social media ecosystems. As the issue of fake engagement continues to challenge digital platforms, such sophisticated approaches are vital in maintaining the integrity and trustworthiness of online communities.


Authors (5)

Stefano Cresci (40 papers)
Roberto Di Pietro (49 papers)
Marinella Petrocchi (34 papers)
Angelo Spognardi (20 papers)
Maurizio Tesconi (31 papers)
