
Link Prediction Accuracy on Real-World Networks Under Non-Uniform Missing Edge Patterns (2401.15140v3)

Published 26 Jan 2024 in math.DS and cs.SI

Abstract: Real-world network datasets are typically obtained in ways that fail to capture all edges. The patterns of missing data are often non-uniform as they reflect biases and other shortcomings of different data collection methods. Nevertheless, uniform missing data is a common assumption made when no additional information is available about the underlying missing-edge pattern, and link prediction methods are frequently tested against uniformly missing edges. To investigate the impact of different missing-edge patterns on link prediction accuracy, we employ 9 link prediction algorithms from 4 different families to analyze 20 different missing-edge patterns that we categorize into 5 groups. Our comparative simulation study, spanning 250 real-world network datasets from 6 different domains, provides a detailed picture of the significant variations in the performance of different link prediction algorithms in these different settings. With this study, we aim to provide a guide for future researchers to help them select a link prediction algorithm that is well suited to their sampled network data, considering the data collection process and application domain.
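The contrast the abstract draws between uniform and non-uniform missing edges can be illustrated with a small sketch. Everything below is an assumption for illustration and is not taken from the paper: the toy random graph, the degree-biased removal rule (one hypothetical non-uniform pattern), the common-neighbors predictor, and all function names. It uses only the Python standard library.

```python
import random
from itertools import combinations


def make_graph(n, p, rng):
    """Toy random graph (assumption): each pair (u, v), u < v, is an edge with prob. p."""
    return {(u, v) for u, v in combinations(range(n), 2) if rng.random() < p}


def degrees(edges):
    """Node degrees computed from an edge set."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def remove_uniform(edges, frac, rng):
    """Uniform pattern: every edge is equally likely to go missing."""
    k = int(frac * len(edges))
    return set(rng.sample(sorted(edges), k))


def remove_degree_biased(edges, frac, rng):
    """Hypothetical non-uniform pattern: edges between high-degree nodes
    are more likely to go missing (weight = product of endpoint degrees)."""
    deg = degrees(edges)
    k = int(frac * len(edges))
    # Weighted sampling without replacement: draw key u**(1/w) per edge
    # and keep the k largest keys (Efraimidis-Spirakis scheme).
    keyed = sorted(
        sorted(edges),
        key=lambda e: rng.random() ** (1.0 / (deg[e[0]] * deg[e[1]])),
        reverse=True,
    )
    return set(keyed[:k])


def common_neighbors_auc(n, edges, removed):
    """AUC of the common-neighbors score: probability that a removed (true)
    edge outscores a true non-edge, counting ties as one half."""
    observed = edges - removed
    nbrs = {u: set() for u in range(n)}
    for u, v in observed:
        nbrs[u].add(v)
        nbrs[v].add(u)

    def score(e):
        return len(nbrs[e[0]] & nbrs[e[1]])

    pos = [score(e) for e in removed]
    neg = [score(e) for e in combinations(range(n), 2) if e not in edges]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    rng = random.Random(0)
    n = 60
    graph = make_graph(n, 0.1, rng)
    for name, remover in [("uniform", remove_uniform),
                          ("degree-biased", remove_degree_biased)]:
        removed = remover(graph, 0.2, rng)
        print(name, round(common_neighbors_auc(n, graph, removed), 3))
```

Running the same predictor against both removal rules on one graph mirrors, at toy scale, the kind of comparison the study describes. The numbers from a random toy graph say nothing about the paper's findings.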

