Link Prediction Accuracy on Real-World Networks Under Non-Uniform Missing Edge Patterns (2401.15140v3)
Abstract: Real-world network datasets are typically obtained in ways that fail to capture all edges. The patterns of missing data are often non-uniform as they reflect biases and other shortcomings of different data collection methods. Nevertheless, uniform missing data is a common assumption made when no additional information is available about the underlying missing-edge pattern, and link prediction methods are frequently tested against uniformly missing edges. To investigate the impact of different missing-edge patterns on link prediction accuracy, we employ 9 link prediction algorithms from 4 different families to analyze 20 different missing-edge patterns that we categorize into 5 groups. Our comparative simulation study, spanning 250 real-world network datasets from 6 different domains, provides a detailed picture of the significant variations in the performance of different link prediction algorithms in these different settings. With this study, we aim to provide a guide for future researchers to help them select a link prediction algorithm that is well suited to their sampled network data, considering the data collection process and application domain.
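The central comparison in the abstract is between uniformly missing edges and non-uniform missing-edge patterns, evaluated by how well a link predictor recovers the hidden edges. The sketch below illustrates that setup on a toy scale. It is not the authors' pipeline. The two-block toy graph, the hypothetical "peripheral-biased" removal rule (edges between low-degree nodes are more likely to be missing), and the use of the common-neighbours index scored by AUC are all illustrative assumptions standing in for the paper's 20 missing-edge patterns and 9 algorithms.

```python
import random
from itertools import combinations


def sbm_edges(n, p_in, p_out, seed=0):
    """Toy two-block stochastic block model; edges are (u, v) tuples with u < v."""
    rng = random.Random(seed)
    half = n // 2
    edges = set()
    for u, v in combinations(range(n), 2):
        same_block = (u < half) == (v < half)
        if rng.random() < (p_in if same_block else p_out):
            edges.add((u, v))
    return edges


def degrees(n, edges):
    """Degree of every node in the full (true) graph."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def remove_uniform(edges, k, rng):
    """Uniform pattern: every edge is equally likely to be among the k missing ones."""
    missing = set(rng.sample(sorted(edges), k))
    return edges - missing, missing


def remove_peripheral_biased(n, edges, k, rng):
    """Hypothetical non-uniform pattern: edge (u, v) is hidden with weight
    w = 1 / (deg(u) * deg(v)), so edges between low-degree nodes go missing more often.
    Weighted sampling without replacement uses Efraimidis-Spirakis keys r ** (1 / w)."""
    deg = degrees(n, edges)
    # r ** (1 / w) simplifies to r ** (deg(u) * deg(v)); the k largest keys are hidden.
    keys = {e: rng.random() ** (deg[e[0]] * deg[e[1]]) for e in sorted(edges)}
    missing = set(sorted(keys, key=keys.get, reverse=True)[:k])
    return edges - missing, missing


def common_neighbors_auc(n, observed, missing):
    """AUC: probability that a hidden (missing) edge outscores a true non-edge under
    the common-neighbours index computed on the observed graph; ties count 0.5."""
    nbrs = [set() for _ in range(n)]
    for u, v in observed:
        nbrs[u].add(v)
        nbrs[v].add(u)
    true_edges = observed | missing
    non_edges = [p for p in combinations(range(n), 2) if p not in true_edges]
    pos = [len(nbrs[u] & nbrs[v]) for u, v in missing]
    neg = [len(nbrs[u] & nbrs[v]) for u, v in non_edges]
    if not pos or not neg:
        raise ValueError("need at least one missing edge and one non-edge")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    rng = random.Random(42)
    n = 60
    edges = sbm_edges(n, p_in=0.3, p_out=0.03, seed=1)
    k = len(edges) // 10  # hide 10% of the edges under each pattern
    patterns = {
        "uniform": remove_uniform(edges, k, rng),
        "peripheral-biased": remove_peripheral_biased(n, edges, k, rng),
    }
    for name, (observed, missing) in patterns.items():
        print(f"{name:>18}: AUC = {common_neighbors_auc(n, observed, missing):.3f}")
```

Running the script prints one AUC per pattern. Because the graph, the removal rules, and the predictor are toy assumptions, the numbers only show where a missing-edge pattern enters the evaluation, not which pattern is harder in general; the hidden-edge set changes while the predictor and scoring stay fixed.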