Role Similarity Metric Based on Spanning Rooted Forest (2110.07872v2)
Abstract: As a fundamental issue in network analysis, structural node similarity has received much attention in academia and is adopted in a wide range of applications. Among the proposed structural node similarity measures, role similarity stands out because it satisfies several axiomatic properties, including automorphism conformation. Existing role similarity metrics cannot handle top-k queries on large real-world networks because of their high time and space costs. In this paper, we propose a new role similarity metric, namely \textsf{ForestSim}. We prove that \textsf{ForestSim} is an admissible role similarity metric and devise the corresponding top-k similarity search algorithm, namely \textsf{ForestSimSearch}, which is able to process a top-k query in $O(k)$ time once the precomputation is finished. Moreover, we speed up the precomputation by using a fast approximate algorithm to compute the diagonal entries of the forest matrix, which reduces the time and space complexity of the precomputation to $O(\epsilon^{-2}m\log^{5}{n}\log{\frac{1}{\epsilon}})$ and $O(m\log^{3}{n})$, respectively. Finally, we conduct extensive experiments on 26 real-world networks. The results show that \textsf{ForestSim} works efficiently on million-scale networks and achieves performance comparable to state-of-the-art methods.
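To make the role of the forest matrix diagonal more concrete, the following is a minimal sketch, not the paper's algorithm. It assumes the standard definition of the forest matrix as $(I + L)^{-1}$, where $L$ is the graph Laplacian, which the abstract does not state, and it computes the diagonal by exact dense inversion rather than by the fast approximate algorithm. The function names `forest_matrix_diagonal` and `top_k_by_diagonal` are hypothetical, and the ranking rule (smallest gap between diagonal entries) is only an illustrative stand-in for \textsf{ForestSim}, whose definition is not given here.

```python
import numpy as np


def forest_matrix_diagonal(adj):
    """Diagonal of (I + L)^{-1} for a dense, symmetric adjacency matrix.

    Assumption: the forest matrix is (I + L)^{-1}, with L = D - A.
    Exact inversion costs O(n^3), so this is only for small graphs.
    """
    adj = np.asarray(adj, dtype=float)
    laplacian = np.diag(adj.sum(axis=1)) - adj
    omega = np.linalg.inv(np.eye(len(adj)) + laplacian)
    return np.diag(omega).copy()


def top_k_by_diagonal(diag, query, k):
    """Hypothetical ranking: the k nodes whose diagonal entry is closest
    to the query node's entry (an illustrative rule, not ForestSim)."""
    gaps = np.abs(diag - diag[query])
    gaps[query] = np.inf  # exclude the query node itself
    order = np.argsort(gaps, kind="stable")
    return [int(v) for v in order[:k]]


# Path graph 0 - 1 - 2 - 3: the two endpoints are automorphically
# equivalent, and so are the two inner nodes.
path = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
diag = forest_matrix_diagonal(path)
nearest_to_endpoint = top_k_by_diagonal(diag, query=0, k=1)
```

On this path graph the endpoints receive equal diagonal entries, as do the two inner nodes, which illustrates why a score built on these entries can respect automorphic equivalence. If the diagonal entries were sorted once during precomputation, a query could scan outward from the query node's position instead of sorting per query; this sketch omits that step for brevity.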