HOFA: Twitter Bot Detection with Homophily-Oriented Augmentation and Frequency Adaptive Attention (2306.12870v1)
Abstract: Twitter bot detection has become an increasingly important and challenging task for combating online misinformation, facilitating social content moderation, and safeguarding the integrity of social platforms. Although existing graph-based Twitter bot detection methods achieve state-of-the-art performance, they all rely on the homophily assumption, i.e., that users with the same label are more likely to be connected. This makes it easy for Twitter bots to disguise themselves by following a large number of genuine users. To address this issue, we propose HOFA, a novel graph-based Twitter bot detection framework that combats this heterophilous disguise challenge with a homophily-oriented graph augmentation module (Homo-Aug) and a frequency adaptive attention module (FaAt). Specifically, Homo-Aug uses an MLP to extract user representations, computes a k-NN graph over them, and injects the k-NN graph into the Twitter graph to improve its homophily. FaAt is an attention mechanism that adaptively acts as a low-pass filter along homophilic edges and as a high-pass filter along heterophilic edges, preventing user features from being over-smoothed by their neighborhood. We also introduce a weight guidance loss to guide the frequency adaptive attention module. Our experiments demonstrate that HOFA achieves state-of-the-art performance on three widely acknowledged Twitter bot detection benchmarks, significantly outperforming vanilla graph-based bot detection techniques and strong heterophilic baselines. Furthermore, extensive studies confirm the effectiveness of the Homo-Aug and FaAt modules and HOFA's ability to address the heterophilous disguise challenge.
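The two mechanisms in the abstract can be illustrated with a small NumPy sketch. It is not HOFA's implementation, and every function name in it (`edge_homophily`, `knn_edges`, `inject_knn`, `signed_coefficients`, `signed_propagate`) is hypothetical. Assumptions: the matrix `z` stands in for the MLP representations of Homo-Aug; k-NN neighbors are chosen by cosine similarity; and, in place of FaAt's learned attention (and without the weight guidance loss), each edge gets a fixed signed coefficient, namely the cosine similarity of mean-centered features, propagated in the style of FAGCN as h_i' = eps*h_i + sum_j coef_ij*h_j/sqrt(d_i*d_j). A positive coefficient pulls a user toward its neighbor (low-pass), while a negative one pushes it away (high-pass).

```python
import numpy as np


def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share a label."""
    src, dst = edges
    return float(np.mean(labels[src] == labels[dst]))


def knn_edges(z, k):
    """Directed edges from every node to its k most cosine-similar nodes."""
    zn = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = zn @ zn.T
    np.fill_diagonal(sim, -np.inf)            # a node is not its own neighbor
    nbrs = np.argsort(-sim, axis=1)[:, :k]    # indices of the top-k per row
    src = np.repeat(np.arange(z.shape[0]), k)
    return np.stack([src, nbrs.ravel()])


def inject_knn(edges, z, k):
    """Homo-Aug-style augmentation (sketch): union of original and k-NN edges."""
    merged = np.concatenate([edges, knn_edges(z, k)], axis=1)
    return np.unique(merged, axis=1)          # drop duplicate (src, dst) pairs


def signed_coefficients(h, edges):
    """Hypothetical stand-in for learned attention: cosine similarity of
    mean-centered features, in [-1, 1]. Assumes no row equals the mean."""
    hc = h - h.mean(axis=0)
    hc = hc / np.linalg.norm(hc, axis=1, keepdims=True)
    src, dst = edges
    return np.sum(hc[src] * hc[dst], axis=1)


def signed_propagate(h, edges, coef, eps=0.3):
    """FAGCN-style step: h_i' = eps*h_i + sum_{j->i} coef_ij*h_j/sqrt(d_i*d_j).
    coef > 0 smooths toward the neighbor (low-pass); coef < 0 pushes away
    from it (high-pass). d is the in-degree, floored at 1."""
    src, dst = edges
    deg = np.maximum(np.bincount(dst, minlength=h.shape[0]), 1).astype(float)
    w = coef / np.sqrt(deg[src] * deg[dst])
    out = eps * h
    np.add.at(out, dst, w[:, None] * h[src])  # scatter-add messages into targets
    return out


# Toy data: 0 = human, 1 = bot; z plays the role of the MLP representations.
labels = np.array([0, 0, 0, 1, 1, 1])
z = np.array([[1.0, 0.1], [1.0, 0.2], [0.9, 0.0],
              [0.1, 1.0], [0.0, 1.0], [0.2, 0.9]])
# Bots disguise themselves by following humans (row 0 = source, row 1 = target).
edges = np.array([[3, 4, 5, 3, 0],
                  [0, 1, 2, 1, 1]])

print(edge_homophily(edges, labels))          # 0.2
augmented = inject_knn(edges, z, k=2)
print(edge_homophily(augmented, labels))      # 0.75
coef = signed_coefficients(z, edges)          # negative on the bot->human edges
h_next = signed_propagate(z, edges, coef)
print(h_next.shape)                           # (6, 2)
```

On this toy graph only 1 of the 5 original edges joins same-label users; after injecting the 2-NN graph, 12 of the 16 edges do. In a real model the signed coefficients would be learned rather than fixed, which is where a loss such as the weight guidance loss would come in.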
Authors: Sen Ye, Zhaoxuan Tan, Zhenyu Lei, Ruijie He, Hongrui Wang, Qinghua Zheng, Minnan Luo