
BotArtist: Generic approach for bot detection in Twitter via semi-automatic machine learning pipeline (2306.00037v4)

Published 31 May 2023 in cs.SI, cs.AI, and cs.LG

Abstract: Twitter, as one of the most popular social networks, provides a platform for communication and online discourse. Unfortunately, it has also become a target for bots and fake accounts, resulting in the spread of false information and manipulation. This paper introduces a semi-automatic machine learning pipeline (SAMLP) designed to address the challenges associated with machine learning model development. Through this pipeline, we develop a comprehensive bot detection model named BotArtist, based on user profile features. SAMLP leverages nine distinct publicly available datasets to train the BotArtist model. To assess BotArtist's performance against current state-of-the-art solutions, we select 35 existing Twitter bot detection methods, each utilizing a diverse range of features. Our comparative evaluation of BotArtist and these existing methods, conducted across nine public datasets under standardized conditions, reveals that the proposed model outperforms existing solutions by almost 10% in terms of F1-score, achieving average scores of 83.19 and 68.5 over specific and general approaches, respectively. As a result of this research, we provide a dataset of the extracted features combined with BotArtist predictions for 10,929,533 Twitter user profiles, collected via the Twitter API during the 2022 Russo-Ukrainian War over a 16-month period. This dataset was created in collaboration with [Shevtsov et al., 2022a], where the original authors share anonymized tweets discussing the Russo-Ukrainian war, 127,275,386 tweets in total. The combination of the existing text dataset and the provided labeled bot and human profiles will allow for the future development of a more advanced bot detection LLM in the post-Twitter API era.
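
The comparison above is reported as an F1-score averaged over datasets. As a minimal sketch of that metric only (not the authors' evaluation code), the following computes F1 for the positive class on each dataset and then takes the unweighted mean. The function names, the choice of label `1` for "bot", the unweighted averaging, and the toy labels are all assumptions made for illustration; the abstract does not specify them.

```python
from statistics import mean


def f1_score(y_true, y_pred, positive=1):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f1(datasets):
    """Unweighted mean of per-dataset F1 (an assumed averaging scheme)."""
    return mean(f1_score(y_true, y_pred) for y_true, y_pred in datasets)


# Hypothetical toy labels (1 = bot, 0 = human); not data from the paper.
toy_datasets = [
    ([1, 0, 1, 1], [1, 0, 0, 1]),
    ([0, 0, 1, 1], [0, 1, 1, 1]),
]

if __name__ == "__main__":
    print(average_f1(toy_datasets))
```

An unweighted mean treats each dataset equally regardless of size; a size-weighted mean would be another reasonable choice, and the two can differ noticeably when dataset sizes vary.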

References (75)
  1. Twitter bot detection with reduced feature set. In 2020 IEEE International Conference on Intelligence and Security Informatics (ISI), 1–6. IEEE.
  2. Detect me if you can: Spam bot detection using inductive representation learning. In Companion proceedings of the 2019 world wide web conference, 148–153.
  3. The importance of debiasing social media data to better understand e-cigarette-related attitudes and behaviors. Journal of medical Internet research, 18(8): e219.
  4. E-cigarette surveillance with social media data: social bots, emerging topics, and trends. JMIR public health and surveillance, 3(4): e8641.
  5. Analyzing the digital traces of political manipulation: The 2016 Russian interference Twitter campaign. In 2018 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM), 258–265. IEEE.
  6. Bot-hunter: a tiered approach to detecting & characterizing automated activity on twitter. In Conference paper. SBP-BRiMS: International conference on social computing, behavioral-cultural modeling and prediction and behavior representation in modeling and simulation, volume 3.
  7. Its all in a name: detecting and labeling bots by their name. Computational and mathematical organization theory, 25: 24–35.
  8. You are known by your friends: Leveraging network metrics for bot detection in twitter. Open Source Intelligence and Cyber Crime: Social Media Analytics, 53–88.
  9. A training algorithm for optimal margin classifiers. In Proceedings of the fifth annual workshop on Computational learning theory, 144–152.
  10. Junk news and bots during the French presidential election: What are French voters sharing over Twitter?
  11. Breiman, L. 2001. Random forests. Machine learning, 45: 5–32.
  12. Weaponized health communication: Twitter bots and Russian trolls amplify the vaccine debate. American journal of public health, 108(10): 1378–1384.
  13. Behavior enhanced deep bot detection in social media. In 2017 IEEE International Conference on Intelligence and Security Informatics (ISI), 128–130. IEEE.
  14. Detection of Bots and Cyborgs in Twitter: A study on the Chilean Presidential Election in 2017. In Social Computing and Social Media. Design, Human Behavior and Analytics: 11th International Conference, SCSM 2019, Held as Part of the 21st HCI International Conference, HCII 2019, Orlando, FL, USA, July 26-31, 2019, Proceedings, Part I 21, 311–323. Springer.
  15. Debot: Twitter bot detection via warped correlation. In ICDM, volume 18, 28–65.
  16. Xgboost: extreme gradient boosting. R package version 0.4-2, 1(4): 1–4.
  17. Cresci, S. 2020. A decade of social bot detection. Communications of the ACM, 63(10): 72–83.
  18. Fame for sale: Efficient detection of fake Twitter followers. Decision Support Systems, 80: 56–71.
  19. DNA-inspired online behavioral modeling and its application to spambot detection. IEEE Intelligent Systems, 31(5): 58–64.
  20. The paradigm-shift of social spambots: Evidence, theories, and tools for the arms race. In Proceedings of the 26th international conference on world wide web companion, 963–972.
  21. Social fingerprinting: detection of spambot groups through DNA-inspired behavioral modeling. IEEE Transactions on Dependable and Secure Computing, 15(4): 561–576.
  22. FAKE: Evidence of spam and bot activity in stock microblogs on Twitter. In Proceedings of the International AAAI Conference on Web and Social Media, volume 12.
  23. Cashtag piggybacking: Uncovering spam and bot activity in stock microblogs on Twitter. ACM Transactions on the Web (TWEB), 13(2): 1–27.
  24. Detecting bots in social-networks using node and structural embeddings. Journal of Big Data, 10(1): 1–37.
  25. Are you human? Detecting bots on Twitter Using BERT. In 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), 631–636. IEEE.
  26. LOBO: Evaluation of generalization deficiencies in Twitter bot classifiers. In Proceedings of the 34th annual computer security applications conference, 137–146.
  27. Supervised machine learning bot detection techniques to identify social twitter bots. SMU Data Science Review, 1(2): 5.
  28. Heterogeneity-aware twitter bot detection with relational graph transformers. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, 3977–3985.
  29. TwiBot-22: Towards graph-based Twitter bot detection. Advances in Neural Information Processing Systems, 35: 35254–35269.
  30. Satar: A self-supervised approach to twitter account representation learning and its application in bot detection. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 3808–3817.
  31. Twibot-20: A comprehensive twitter bot detection benchmark. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 4485–4494.
  32. BotRGCN: Twitter bot detection with relational graph convolutional networks. In Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 236–239.
  33. Political bots and the Swedish general election. In 2018 IEEE International Conference on Intelligence and Security Informatics (ISI), 124–129. IEEE.
  34. Ferrara, E. 2020. What types of COVID-19 conspiracies are populated by Twitter bots? arXiv preprint arXiv:2004.09531.
  35. Of bots and humans (on twitter). In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, 349–354.
  36. Cross-platform state propaganda: Russian trolls on Twitter and YouTube during the 2016 US presidential election. The International Journal of Press/Politics, 25(3): 357–389.
  37. Social bots detection via fusing bert and graph convolutional networks. Symmetry, 14(1): 30.
  38. Dual graph enhanced embedding neural network for CTR prediction. In Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining, 496–504.
  39. DeeProBot: a hybrid deep neural network model for social bot detection based on user profile data. Social Network Analysis and Mining, 12(1): 43.
  40. Bots and Automation over Twitter during the US Election. Computational propaganda project: Working paper series, 21(8).
  41. Heterogeneous graph transformer. In Proceedings of the web conference 2020, 2704–2710.
  42. Preprocessing framework for Twitter bot detection. In 2017 International Conference on Computer Science and Engineering (UBMK), 630–634. IEEE.
  43. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
  44. Knauth, J. 2019. Language-agnostic twitter-bot detection. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), 550–558.
  45. Bot-Detective: An explainable Twitter bot detection service with crowdsourcing functionalities. In Proceedings of the 12th International Conference on Management of Digital EcoSystems, 55–63.
  46. Deep neural networks for bot detection. Information Sciences, 467: 312–322.
  47. Seven months with the devils: A long-term study of content polluters on twitter. In Proceedings of the international AAAI conference on web and social media, volume 5, 185–192.
  48. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692.
  49. A unified approach to interpreting model predictions. arXiv preprint arXiv:1705.07874.
  50. Are we really making much progress? revisiting, benchmarking and refining heterogeneous graph neural networks. In Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining, 1150–1160.
  51. Graph-hist: Graph classification from latent feature histograms with application to bot detection. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, 5134–5141.
  52. Rtbust: Exploiting temporal patterns for botnet detection on twitter. In Proceedings of the 10th ACM conference on web science, 183–192.
  53. Twitter spammer detection using data stream clustering. Information Sciences, 260: 64–73.
  54. Friendship preference: Scalable and robust category of features for social bot detection. IEEE Transactions on Dependable and Secure Computing, 20(2): 1516–1528.
  55. A new approach to bot detection: striking the balance between precision and recall. In 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 533–540. IEEE.
  56. Junk news and bots during the German parliamentary election: What are German voters sharing over Twitter?
  57. Spotting political social bots in Twitter: A use case of the 2019 Spanish general election. IEEE Transactions on Network and Service Management, 17(4): 2156–2170.
  58. Bot2Vec: A general approach of intra-community oriented representation learning for bot detection in different types of social networks. Information Systems, 103: 101771.
  59. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1): 5485–5551.
  60. A one-class classification approach for bot detection on Twitter. Computers & Security, 91: 101715.
  61. Detecting political bots on Twitter during the 2019 Finnish parliamentary election. In Annual Hawaii International Conference on System Sciences, 2430–2439. Hawaii International Conference on System Sciences.
  62. An exploratory study of COVID-19 misinformation on Twitter. Online social networks and media, 22: 100104.
  63. Shapley, L. S. 1953. A value for n-person games. Contrib. Theory Games, 2: 307–317.
  64. Russo-Ukrainian War: Prediction and explanation of Twitter suspension. arXiv preprint arXiv:2306.03502.
  65. What Tweets and YouTube comments have in common? Sentiment and graph analysis on data related to US elections 2020. Plos one, 18(1): e0270542.
  66. Identification of Twitter Bots Based on an Explainable Machine Learning Framework: The US 2020 Elections Case Study. In Proceedings of the International AAAI Conference on Web and Social Media, volume 16, 956–967.
  67. Twitter. 2023. Twitter API price list. https://developer.twitter.com/en/products/twitter-api. Accessed: 2023-09-09.
  68. Online human-bot interactions: Detection, estimation, and characterization. In Proceedings of the international AAAI conference on web and social media, volume 11, 280–289.
  69. Graph attention networks. arXiv preprint arXiv:1710.10903.
  70. Twitter bot detection using bidirectional long short-term memory neural networks and word embeddings. In 2019 First IEEE International conference on trust, privacy and security in intelligent systems and applications (TPS-ISA), 101–109. IEEE.
  71. Empirical evaluation and new design for fighting evolving twitter spammers. IEEE Transactions on Information Forensics and Security, 8(8): 1280–1293.
  72. Botometer 101: Social bot practicum for computational social scientists. Journal of Computational Social Science, 5(2): 1511–1528.
  73. The COVID-19 infodemic: twitter versus facebook. Big Data & Society, 8(1): 20539517211013861.
  74. Arming the public with artificial intelligence to counter social bots. Human Behavior and Emerging Technologies, 1(1): 48–61.
  75. Scalable and generalizable social bot detection through data selection. In Proceedings of the AAAI conference on artificial intelligence, volume 34, 1096–1103.
Authors (5)
  1. Alexander Shevtsov (5 papers)
  2. Despoina Antonakaki (8 papers)
  3. Ioannis Lamprou (8 papers)
  4. Polyvios Pratikakis (15 papers)
  5. Sotiris Ioannidis (41 papers)
