Inconsistency of evaluation metrics in link prediction (2402.08893v2)
Abstract: Link prediction is a paradigmatic and challenging problem in network science that aims to predict missing links, future links, and temporal links from known topology. As the number of link prediction algorithms grows, a critical yet previously ignored risk is that the metrics used to evaluate algorithm performance are usually chosen at will. This paper reports extensive experiments on hundreds of real networks and 25 well-known algorithms, revealing significant inconsistency among evaluation metrics: different metrics can produce remarkably different rankings of algorithms. We therefore conclude that no single metric can comprehensively or credibly evaluate algorithm performance. Further analysis suggests using at least two metrics: the area under the receiver operating characteristic curve (AUC), together with one of three candidates, namely the area under the precision-recall curve (AUPR), the area under the precision curve (AUC-Precision), or the normalized discounted cumulative gain (NDCG). In addition, because we have proved the essential equivalence of threshold-dependent metrics, if specific thresholds are meaningful in a link prediction task, any one threshold-dependent metric can be used with those thresholds. This work fills a missing part of the link prediction landscape and provides a starting point toward a well-accepted criterion or standard for selecting proper evaluation metrics for link prediction.
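The kind of inconsistency described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper's experiments: the ten candidate links, the two hypothetical algorithms `scores_a` and `scores_b`, and the use of average precision as a step-wise estimate of AUPR. The helper functions are plain-Python versions of the standard definitions of AUC, average precision, and binary-relevance NDCG.

```python
import math

def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ranked_labels(scores, labels):
    """Labels reordered from the highest score to the lowest."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return [labels[i] for i in order]

def average_precision(scores, labels):
    """Mean of precision@k over the ranks k that hold a positive (an AUPR estimate)."""
    hits, total = 0, 0.0
    for k, y in enumerate(ranked_labels(scores, labels), start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / hits

def ndcg(scores, labels):
    """Binary-relevance NDCG over the full ranking."""
    ranked = ranked_labels(scores, labels)
    dcg = sum(y / math.log2(k + 1) for k, y in enumerate(ranked, start=1))
    idcg = sum(1 / math.log2(k + 1) for k in range(1, sum(labels) + 1))
    return dcg / idcg

# Hypothetical toy task: 10 candidate links, the first two are true (missing) links.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical algorithm A ranks the true links 3rd and 4th.
scores_a = [8, 7, 10, 9, 6, 5, 4, 3, 2, 1]
# Hypothetical algorithm B ranks one true link 1st and the other 8th.
scores_b = [10, 3, 9, 8, 7, 6, 5, 4, 2, 1]

for name, metric in [("AUC", auc), ("AP", average_precision), ("NDCG", ndcg)]:
    a, b = metric(scores_a, labels), metric(scores_b, labels)
    print(f"{name}: A={a:.4f}  B={b:.4f}  better={'A' if a > b else 'B'}")
```

In this toy case AUC prefers algorithm A (0.75 versus 0.625), while average precision and NDCG both prefer algorithm B, because they reward placing a true link at the very top of the ranking. This is only a hand-built example of how two metrics can disagree. It says nothing about how often such disagreement occurs on real networks.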