The maximum capability of a topological feature in link prediction (2206.15101v3)
Abstract: Networks offer a powerful approach to modeling complex systems by representing the underlying set of pairwise interactions. Link prediction is the task of inferring links of a network that are not directly observed, with profound applications in biological, social, and other complex systems. Despite the intensive use of topological features in this task, it is unclear to what extent a feature can be leveraged to infer missing links. Here, we aim to unveil the capability of a topological feature in link prediction by identifying the upper bound of its prediction performance. We introduce a theoretical framework that is compatible with different indexes to gauge the feature, different prediction approaches to utilize the feature, and different metrics to quantify the prediction performance. The maximum capability of a topological feature follows a simple yet theoretically validated expression, which depends only on the extent to which the feature is held in missing and nonexistent links. Because a family of indexes based on the same feature shares the same upper bound, the potential of all others can be estimated from a single index. Furthermore, a feature's capability is lifted in supervised prediction, and this lift can be mathematically quantified, allowing us to estimate the benefit of applying machine learning algorithms. The universality of the pattern uncovered is empirically verified on 550 structurally diverse networks. The findings have applications in feature and method selection, and shed light on network characteristics that make a topological feature effective in link prediction.
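To make the quantities named in the abstract concrete, the following is a minimal sketch, not the paper's method or its upper-bound expression (which the abstract does not state). It assumes a toy observed network, two hypothetical held-out "missing" links, and the common-neighbor count as one example index of a topological feature. It measures how often the feature is held (index > 0) in missing versus nonexistent links, and the AUC that this one index achieves in unsupervised ranking.

```python
from itertools import combinations

# Toy data (assumption, for illustration only): observed links and held-out missing links.
observed = {(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (4, 5)}
missing = {(1, 2), (3, 5)}
nodes = range(6)

# Adjacency sets built from the observed network only.
adj = {n: set() for n in nodes}
for u, v in observed:
    adj[u].add(v)
    adj[v].add(u)

def common_neighbors(u, v):
    """Example index: number of neighbors shared by u and v in the observed network."""
    return len(adj[u] & adj[v])

# Nonexistent links: node pairs that are neither observed nor missing.
nonexistent = [p for p in combinations(nodes, 2)
               if p not in observed and p not in missing]

missing_scores = [common_neighbors(u, v) for u, v in sorted(missing)]
nonexistent_scores = [common_neighbors(u, v) for u, v in nonexistent]

# Extent to which the feature is held in each class (fraction with a nonzero index).
p_missing = sum(s > 0 for s in missing_scores) / len(missing_scores)
p_nonexistent = sum(s > 0 for s in nonexistent_scores) / len(nonexistent_scores)

def auc(pos, neg):
    """Probability that a missing link outscores a nonexistent one; ties count 0.5."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

auc_value = auc(missing_scores, nonexistent_scores)
print(f"held in missing: {p_missing:.3f}, held in nonexistent: {p_nonexistent:.3f}")
print(f"AUC of the common-neighbor index: {auc_value:.3f}")
```

In this sketch the two fractions play the role of the "extent to which the feature is held in missing and nonexistent links"; how they combine into the upper bound is given in the paper itself, not here.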