SoMeR: Multi-View User Representation Learning for Social Media (2405.05275v1)
Abstract: User representation learning aims to capture user preferences, interests, and behaviors in low-dimensional vector representations. These representations have widespread applications in recommendation systems and advertising; however, existing methods typically rely on specific features like text content, activity patterns, or platform metadata, failing to holistically model user behavior across different modalities. To address this limitation, we propose SoMeR, a Social Media user Representation learning framework that incorporates temporal activities, text content, profile information, and network interactions to learn comprehensive user portraits. SoMeR encodes user post streams as sequences of timestamped textual features, uses transformers to embed these sequences along with profile data, and jointly trains with link prediction and contrastive learning objectives to capture user similarity. We demonstrate SoMeR's versatility through two applications: 1) identifying inauthentic accounts involved in coordinated influence operations by detecting users posting similar content simultaneously, and 2) measuring increased polarization in online discussions after major events by quantifying how users with different beliefs moved farther apart in the embedding space. SoMeR's ability to holistically model users enables new solutions to important problems around disinformation, societal tensions, and online behavior understanding.
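The second application, quantifying how far apart two groups of users sit in the embedding space before and after an event, can be illustrated with a small sketch. This is not SoMeR's implementation: the abstract does not say which distance measure is used, so cosine distance is an assumption here, and the function names and toy 2-D embeddings are hypothetical.

```python
import math

def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def mean_cross_group_distance(group_a, group_b):
    """Average distance over all pairs with one user from each group."""
    total = 0.0
    count = 0
    for u in group_a:
        for v in group_b:
            total += cosine_distance(u, v)
            count += 1
    return total / count

# Toy, made-up user embeddings for two belief groups (assumption: in
# practice these would come from a trained user-representation model).
before_a = [[1.0, 0.2], [0.9, 0.3]]
before_b = [[0.8, 0.5], [0.7, 0.6]]
after_a = [[1.0, 0.0], [0.9, -0.1]]
after_b = [[0.1, 1.0], [0.0, 0.9]]

gap_before = mean_cross_group_distance(before_a, before_b)
gap_after = mean_cross_group_distance(after_a, after_b)

# A positive shift means the groups moved farther apart after the event.
polarization_shift = gap_after - gap_before
print(f"before={gap_before:.3f} after={gap_after:.3f} shift={polarization_shift:.3f}")
```

A positive `polarization_shift` would be read as increased polarization under these assumptions. Other choices, such as Euclidean distance or distance between group centroids, would fit the same description.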