Efficient Enumeration of Large Maximal k-Plexes (2402.13008v3)
Abstract: Finding cohesive subgraphs in a large graph has many important applications, such as community detection and biological network analysis. A clique is often too strict a cohesive structure, since communities or biological modules rarely form cliques for various reasons such as data noise. The $k$-plex was therefore introduced as a popular clique relaxation: a graph in which every vertex is adjacent to all but at most $k$ vertices. In this paper, we propose a fast branch-and-bound algorithm, together with a task-based parallel version, to enumerate all maximal $k$-plexes with at least $q$ vertices. Our algorithm adopts an effective search-space partitioning approach that provides a lower time complexity, a new pivot vertex selection method that reduces the size of the candidate vertex set, an effective upper-bounding technique to prune useless branches, and three novel pruning techniques based on vertex pairs. Our parallel algorithm uses a timeout mechanism to eliminate straggler tasks, and maximizes cache locality while ensuring load balancing. Extensive experiments show that, compared with state-of-the-art algorithms, our sequential and parallel algorithms enumerate large maximal $k$-plexes with up to $5\times$ and $18.9\times$ speedup, respectively. Ablation results also demonstrate that our pruning techniques bring up to $7\times$ speedup over our basic algorithm.
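To make the problem statement concrete, the following minimal Python sketch checks the $k$-plex condition and enumerates maximal $k$-plexes with at least $q$ vertices by exhaustive search. It is only an illustration of the definition, not the branch-and-bound algorithm proposed in the paper: it uses none of the partitioning, pivoting, upper-bounding, or pair-based pruning techniques, and it runs in exponential time. The function names, the adjacency-dictionary representation, and the toy graph are assumptions made for this example. It also assumes the usual convention that the at most $k$ non-adjacent vertices include the vertex itself, so a 1-plex is a clique.

```python
from itertools import combinations


def is_k_plex(adj, vertices, k):
    """Return True if `vertices` induces a k-plex in the graph `adj`.

    `adj` maps each vertex to the set of its neighbours (no self-loops).
    Every vertex must have at least |S| - k neighbours inside S, i.e. it
    is non-adjacent to at most k vertices of S, counting itself.
    """
    s = set(vertices)
    return all(len(adj[v] & s) >= len(s) - k for v in s)


def maximal_k_plexes_bruteforce(adj, k, q):
    """Enumerate all maximal k-plexes with at least q vertices.

    Exhaustive search over all vertex subsets: for tiny graphs only.
    """
    nodes = sorted(adj)
    plexes = {
        frozenset(c)
        for r in range(1, len(nodes) + 1)
        for c in combinations(nodes, r)
        if is_k_plex(adj, c, k)
    }
    result = []
    for p in sorted(plexes, key=sorted):
        if len(p) < q:
            continue
        # Every subset of a k-plex is a k-plex, so p is maximal exactly
        # when no single extra vertex can be added to it.
        if not any((p | {v}) in plexes for v in nodes if v not in p):
            result.append(p)
    return result


# Hypothetical toy graph: a 4-cycle 0-1-2-3-0 plus a pendant vertex 4 on 0.
adj = {
    0: {1, 3, 4},
    1: {0, 2},
    2: {1, 3},
    3: {0, 2},
    4: {0},
}

# The 4-cycle is a 2-plex (each vertex misses itself and one other vertex)
# but not a clique, and vertex 4 cannot be added to it.
print([sorted(p) for p in maximal_k_plexes_bruteforce(adj, k=2, q=4)])
# prints [[0, 1, 2, 3]]
```

With `k=1` the same function lists maximal cliques, which on this toy graph are just its five edges. The size threshold `q` is what lets a practical algorithm discard small, uninteresting results early, whereas this sketch applies it only as a final filter.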