Link Prediction for Social Networks using Representation Learning and Heuristic-based Features (2403.08613v1)
Abstract: The exponential growth in the scale and relevance of social networks enables them to provide expansive insights. Efficiently predicting missing links in social networks can help in modern business applications ranging from generating recommendations to influence analysis. Several categories of solutions exist for this problem. Here, we explore feature extraction techniques that generate representations of nodes and edges in a social network and allow us to predict missing links. We compare ten feature extraction techniques, categorized as Structural embeddings, Neighborhood-based embeddings, Graph Neural Networks, and Graph Heuristics, each followed by modeling with ensemble classifiers and custom Neural Networks. Further, we propose combining heuristic-based features with learned representations, which demonstrates improved performance on the link prediction task on social network datasets. Using this method to generate accurate recommendations for many applications is a promising direction for further study. The code for all the experiments has been made public.
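As a rough illustration of what combining heuristic-based features with learned representations could look like for a single candidate edge, the sketch below concatenates a few classic neighborhood heuristics with an element-wise (Hadamard) product of two node embeddings. The specific heuristics (common neighbors, Jaccard coefficient, Adamic-Adar), the Hadamard operator, the function names, and the toy embedding values are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import math

def heuristic_features(adj, u, v):
    """Neighborhood heuristics for a candidate edge (u, v) (assumed choice of heuristics)."""
    nu, nv = adj[u], adj[v]
    common = nu & nv
    union = nu | nv
    jaccard = len(common) / len(union) if union else 0.0
    # Adamic-Adar: each common neighbor is weighted by the inverse log of its degree.
    adamic_adar = sum(1.0 / math.log(len(adj[w])) for w in common if len(adj[w]) > 1)
    return [float(len(common)), jaccard, adamic_adar]

def hadamard(emb_u, emb_v):
    """Element-wise product of two node embeddings, used here as the edge embedding."""
    return [a * b for a, b in zip(emb_u, emb_v)]

def combined_edge_features(adj, emb, u, v):
    """Concatenate heuristic features with the learned edge representation."""
    return heuristic_features(adj, u, v) + hadamard(emb[u], emb[v])

# Toy undirected graph stored as an adjacency dict of sets.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d")]
adj = {}
for x, y in edges:
    adj.setdefault(x, set()).add(y)
    adj.setdefault(y, set()).add(x)

# Placeholder embeddings; in practice these would come from a trained embedding model.
emb = {"a": [0.1, 0.9], "b": [0.8, 0.2], "c": [0.7, 0.3], "d": [0.2, 0.8]}

# Feature vector for the candidate (currently missing) edge a-d.
features = combined_edge_features(adj, emb, "a", "d")
```

The resulting vector (three heuristic values followed by the two embedding dimensions) would then be passed, together with a label indicating whether the edge exists, to a downstream classifier such as an ensemble model or a neural network.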