BBK: a simpler, faster algorithm for enumerating maximal bicliques in large sparse bipartite graphs (2405.04428v2)
Abstract: Bipartite graphs are a prevalent modeling tool for real-world networks, capturing interactions between vertices of two different types. Within this framework, bicliques emerge as crucial structures when studying dense subgraphs: they are sets of vertices such that all vertices of the first type interact with all vertices of the second type. They therefore make it possible to identify groups of closely related vertices in the network, such as individuals with similar interests or webpages with similar content. This article introduces a new algorithm for the exhaustive enumeration of maximal bicliques in a bipartite graph. This algorithm, called BBK for Bipartite Bron-Kerbosch, is a new extension to the bipartite case of the Bron-Kerbosch algorithm, which enumerates the maximal cliques in standard (non-bipartite) graphs. It is faster than the state-of-the-art algorithms and allows enumeration on massive bipartite graphs that are not manageable with existing implementations. We analyze it theoretically to establish two complexity formulas: one as a function of the input and one as a function of the output characteristics of the algorithm. We also provide an open-access implementation of BBK in C++, which we use to validate its efficiency experimentally on massive real-world datasets and to show that its execution time is shorter in practice than that of state-of-the-art algorithms. These experiments also show that both the order in which the vertices are processed and the choice of which of the two vertex types to start the enumeration from have an impact on the computation time.
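To make the notion of a maximal biclique concrete, here is a minimal brute-force sketch in Python. It is not the BBK algorithm and does not reflect the authors' C++ implementation: it simply tries every non-empty subset of one side of the graph, so it is exponential and only suitable for toy inputs. The function names (`common_neighbors`, `maximal_bicliques`), the adjacency-dictionary representation, and the example graph are all assumptions made for illustration, as is the convention that both sides of a biclique must be non-empty.

```python
from itertools import combinations


def common_neighbors(adj, vertices):
    """Return the vertices adjacent to every vertex in `vertices` (must be non-empty)."""
    vertices = list(vertices)
    result = set(adj[vertices[0]])
    for v in vertices[1:]:
        result &= adj[v]
    return result


def maximal_bicliques(adj_u, adj_v):
    """Naive enumeration of maximal bicliques with both sides non-empty.

    adj_u maps each vertex of the first type to its set of neighbors
    (vertices of the second type); adj_v is the reverse mapping.
    For each non-empty subset S of the first side, B is the set of common
    neighbors of S and A is the set of common neighbors of B. The pair
    (A, B) cannot be extended on either side, so it is a maximal biclique.
    """
    found = set()
    left = sorted(adj_u)
    for size in range(1, len(left) + 1):
        for subset in combinations(left, size):
            b = common_neighbors(adj_u, subset)
            if not b:
                continue  # S has no common neighbor: no biclique here
            a = common_neighbors(adj_v, b)
            found.add((frozenset(a), frozenset(b)))
    return found


# Hypothetical toy graph: u1 and u2 both interact with v1 and v2;
# u2 and u3 both interact with v3.
adj_u = {
    "u1": {"v1", "v2"},
    "u2": {"v1", "v2", "v3"},
    "u3": {"v3"},
}
adj_v = {
    "v1": {"u1", "u2"},
    "v2": {"u1", "u2"},
    "v3": {"u2", "u3"},
}

result = maximal_bicliques(adj_u, adj_v)
for a, b in sorted(result, key=lambda p: (sorted(p[0]), sorted(p[1]))):
    print(sorted(a), sorted(b))
```

On this toy graph the sketch finds three maximal bicliques: ({u1, u2}, {v1, v2}), ({u2}, {v1, v2, v3}) and ({u2, u3}, {v3}). An algorithm in the Bron-Kerbosch family avoids this subset enumeration by growing candidates recursively and pruning branches that cannot yield a new maximal structure.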