
BotOrNot: A System to Evaluate Social Bots (1602.00975v1)

Published 2 Feb 2016 in cs.SI

Abstract: While most online social media accounts are controlled by humans, these platforms also host automated agents called social bots or sybil accounts. Recent literature reported on cases of social bots imitating humans to manipulate discussions, alter the popularity of users, pollute content and spread misinformation, and even perform terrorist propaganda and recruitment actions. Here we present BotOrNot, a publicly-available service that leverages more than one thousand features to evaluate the extent to which a Twitter account exhibits similarity to the known characteristics of social bots. Since its release in May 2014, BotOrNot has served over one million requests via our website and APIs.


Introduction

The proliferation of automated agents, commonly referred to as social bots, on social media platforms has raised substantial concerns about their impact on information dissemination, user interaction, and the wider social media ecosystem. Social bots, or sybil accounts, are programmed to generate content and engage with users in ways that can be hard to distinguish from human-operated accounts. While some bots offer benign or entertaining interactions, others are employed for malicious purposes such as polluting content, manipulating public opinion, spreading misinformation, and conducting covert propaganda campaigns. This paper presents "BotOrNot," a system that uses a diverse set of features and machine learning techniques to assess how closely a Twitter account resembles known social bots.

System Overview

BotOrNot is a publicly accessible service that evaluates Twitter accounts using a comprehensive set of features derived from user metadata, interaction patterns, and linguistic attributes. Since its release in May 2014, BotOrNot has served over one million requests through its website and APIs, which indicates sustained interest in the service.

Features and Classification

The classification system underpinning BotOrNot draws from over 1,000 features, which can be categorized into six primary groups:

  • Network Features: These capture various dimensions of information diffusion through analysis of retweeting and mentioning behaviors, as well as hashtag co-occurrences. Statistical measures such as degree distribution and clustering coefficients are extracted from these networks.
  • User Features: Derived from metadata directly associated with the account, such as language preferences, geographic location, and account creation details.
  • Friends Features: Focus on the descriptive statistics of an account's social connections, including follower/followee ratios and the distributions of their activity metrics.
  • Temporal Features: Examine the consistency and patterns of content generation, quantified through tweet rates and inter-tweet intervals.
  • Content Features: Use NLP techniques, such as part-of-speech tagging, to analyze the linguistic properties of tweets.
  • Sentiment Features: Incorporate sentiment analysis algorithms tailored for general and Twitter-specific contexts, measuring emotional valence and sentiment polarity.
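
To make two of these feature groups concrete, the following is a minimal sketch in Python of how temporal and network statistics of the kind described above might be computed. It is an illustration only, not the authors' implementation: the function names `temporal_features` and `network_features`, the input formats (timestamps in seconds, undirected edge pairs), and the specific statistics chosen are all assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def temporal_features(timestamps):
    """Tweet rate and inter-tweet interval statistics (hypothetical helper).

    `timestamps` are posting times in seconds; the format is an assumption.
    """
    ts = sorted(timestamps)
    if len(ts) < 2:
        return {"tweets_per_hour": 0.0, "mean_interval": 0.0, "std_interval": 0.0}
    # Gaps between consecutive tweets, in seconds.
    intervals = [later - earlier for earlier, later in zip(ts, ts[1:])]
    span_hours = (ts[-1] - ts[0]) / 3600.0
    return {
        "tweets_per_hour": len(ts) / span_hours if span_hours > 0 else 0.0,
        "mean_interval": mean(intervals),
        "std_interval": pstdev(intervals),
    }


def network_features(edges):
    """Mean degree and average local clustering coefficient (hypothetical helper).

    `edges` are undirected (user, user) pairs, e.g. from a retweet or mention
    network; treating the network as undirected is a simplifying assumption.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)
    if not neighbors:
        return {"mean_degree": 0.0, "avg_clustering": 0.0}
    coefficients = []
    for nbrs in neighbors.values():
        k = len(nbrs)
        if k < 2:
            coefficients.append(0.0)
            continue
        nbr_list = sorted(nbrs)
        # Count edges that connect two neighbors of the current node.
        links = sum(
            1
            for i, a in enumerate(nbr_list)
            for b in nbr_list[i + 1:]
            if b in neighbors[a]
        )
        coefficients.append(2.0 * links / (k * (k - 1)))
    return {
        "mean_degree": mean(len(n) for n in neighbors.values()),
        "avg_clustering": mean(coefficients),
    }


# Toy inputs, invented for illustration.
example_times = [0, 3600, 7200]
example_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(temporal_features(example_times))
print(network_features(example_edges))
```

Very regular posting intervals (a small `std_interval`) are one plausible signal of automation, but any single statistic is weak on its own, which is why the system combines many features.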

BotOrNot employs a Random Forest classifier, an ensemble learning method, to process the extensive feature set and output a bot-likelihood score. The classifier is trained on a dataset of approximately 15,000 confirmed social bots and 16,000 authentic human accounts, encompassing over 5.6 million tweets. The system achieved an AUC of 0.95 in ten-fold cross-validation, although actual performance may vary due to the evolving nature of social media and bot characteristics.
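
The evaluation protocol can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's code: AUC is computed as the probability that a randomly chosen bot receives a higher score than a randomly chosen human, the folds are stratified by class (the source does not say whether they were), and `centroid_fit` is a deliberately trivial stand-in for the Random Forest so that the example runs with the standard library alone. All function names and the toy data are hypothetical.

```python
import random


def auc(labels, scores):
    """Fraction of (bot, human) pairs where the bot scores higher; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=10, seed=0):
    """Split indices into k test folds, spreading each class evenly (an assumption)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % k].append(i)
    return folds


def cross_validated_auc(features, labels, fit, k=10, seed=0):
    """Mean AUC over k folds; `fit(train_x, train_y)` must return a scoring function."""
    fold_aucs = []
    for test_idx in stratified_folds(labels, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        score = fit([features[i] for i in train_idx], [labels[i] for i in train_idx])
        fold_aucs.append(
            auc([labels[i] for i in test_idx], [score(features[i]) for i in test_idx])
        )
    return sum(fold_aucs) / len(fold_aucs)


def centroid_fit(train_x, train_y):
    """Toy stand-in for the Random Forest: score by closeness to the bot mean."""
    bots = [x for x, y in zip(train_x, train_y) if y == 1]
    humans = [x for x, y in zip(train_x, train_y) if y == 0]
    bot_mean = sum(bots) / len(bots)
    human_mean = sum(humans) / len(humans)
    return lambda x: abs(x - human_mean) - abs(x - bot_mean)


# Invented one-feature data: label 1 = bot, 0 = human.
bot_rates = [100.0 + i for i in range(20)]
human_rates = [float(i) for i in range(20)]
features = bot_rates + human_rates
labels = [1] * 20 + [0] * 20
print(cross_validated_auc(features, labels, centroid_fit))
```

In practice the `fit` argument would wrap a real ensemble classifier trained on the full feature vector; the cross-validation loop and the AUC computation would stay the same.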

Practical Implications and Future Directions

BotOrNot significantly lowers the barrier to sophisticated bot detection for various stakeholders, including social media researchers, journalists, and the general public. Because it provides both an easy-to-use web interface and an accessible API, users can analyze accounts without building their own classifiers or investing in extensive computational resources.

Potential future developments include enhancing the classifier with newer datasets to account for the dynamic evolution of social bot behaviors. Additionally, integrating BotOrNot into various applications, such as browser plugins for on-the-fly user evaluations, could further broaden its utility and impact.

In conclusion, BotOrNot is a valuable tool for the systematic evaluation of social bots on Twitter, supporting a deeper understanding and more effective management of automated accounts. This research underscores the need for adaptive and robust methods in the ongoing effort to maintain the integrity of online social platforms.

Authors (5)
  1. Clayton A. Davis (5 papers)
  2. Onur Varol (33 papers)
  3. Emilio Ferrara (197 papers)
  4. Alessandro Flammini (67 papers)
  5. Filippo Menczer (102 papers)
Citations (846)