Username Squatting on Online Social Networks: A Study on X (2401.09209v2)
Abstract: Adversaries have been targeting unique identifiers to launch typo-squatting, mobile app squatting and even voice squatting attacks. Anecdotal evidence suggests that online social networks (OSNs) are also plagued with accounts that use similar usernames. This can be confusing to users but can also be exploited by adversaries. However, to date no study characterizes this problem on OSNs. In this work, we define the username squatting problem and design the first multi-faceted measurement study to characterize it on X. We develop a username generation tool (UsernameCrazy) to help us analyze hundreds of thousands of username variants derived from celebrity accounts. Our study reveals that thousands of squatted usernames have been suspended by X, while tens of thousands that still exist on the network are likely bots. Out of these, a large number share similar profile pictures and profile names to the original account, signalling impersonation attempts. We found that squatted accounts are being mentioned by mistake in tweets hundreds of thousands of times and are even being prioritized in searches by the network's search recommendation algorithm, exacerbating the negative impact squatted accounts can have on OSNs. We use our insights and take the first step to address this issue by designing a framework (SQUAD) that combines UsernameCrazy with a new classifier to efficiently detect suspicious squatted accounts. Our evaluation of SQUAD's prototype implementation shows that it can achieve a 94% F1-score when trained on a small dataset.
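To make the idea of username variants concrete, the following is a minimal sketch of how single-edit variants of a target username might be generated. It is not the UsernameCrazy implementation: the function name `username_variants`, the choice of edit operations (omission, duplication, transposition, look-alike substitution), the small `LOOKALIKES` table, and the validity filter (letters, digits and underscores, at most 15 characters) are all assumptions made for illustration.

```python
import re
from typing import Set

# Assumed validity rule: letters, digits, underscore, at most 15 characters.
VALID = re.compile(r"[A-Za-z0-9_]{1,15}")

# Hypothetical, deliberately tiny table of visually similar characters.
LOOKALIKES = {"l": "1I", "I": "l1", "1": "lI", "o": "0", "O": "0", "0": "oO"}


def username_variants(name: str) -> Set[str]:
    """Return single-edit typo variants of `name` (illustrative helper)."""
    variants: Set[str] = set()
    n = len(name)

    for i in range(n):
        # Omission: drop the character at position i.
        variants.add(name[:i] + name[i + 1:])
        # Duplication: repeat the character at position i.
        variants.add(name[:i] + name[i] + name[i:])
        # Look-alike substitution: swap in a visually similar character.
        for sub in LOOKALIKES.get(name[i], ""):
            variants.add(name[:i] + sub + name[i + 1:])

    for i in range(n - 1):
        # Transposition: swap two adjacent characters.
        variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])

    # The original name is not a variant of itself.
    variants.discard(name)
    # Keep only candidates that satisfy the assumed validity rule.
    return {v for v in variants if VALID.fullmatch(v)}


if __name__ == "__main__":
    for candidate in sorted(username_variants("elonmusk")):
        print(candidate)
```

Each candidate produced this way would still need to be checked against the network to see whether an account with that username exists, is suspended, or is unregistered.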