Bayan Algorithm: Detecting Communities in Networks Through Exact and Approximate Optimization of Modularity (2209.04562v5)
Abstract: Community detection is a classic network problem with extensive applications in various fields. The most common approach is to use modularity maximization heuristics, which rarely return an optimal partition or anything similar. Partitions with globally optimal modularity are difficult to compute and have therefore been underexplored. Using structurally diverse networks, we compare 30 community detection methods, including our proposed algorithm, Bayan, which offers optimality and approximation guarantees. Unlike existing methods, Bayan either globally maximizes modularity or approximates the maximum within a guaranteed factor. Our results show the distinctive accuracy and stability of maximum-modularity partitions, which retrieve planted partitions at rates higher than most alternatives for a wide range of parameter settings in two standard benchmarks. Compared to the partitions from 29 other algorithms, maximum-modularity partitions have the best medians for description length, coverage, performance, average conductance, and well-clusteredness. These advantages come at the cost of additional computation, which Bayan makes feasible for small networks (up to 3000 edges in the largest connected component). Bayan is several times faster than open-source and commercial solvers for modularity maximization, making it capable of finding optimal partitions for instances that cannot be optimized by any other existing method. Our results point to a few well-performing algorithms, among which Bayan stands out as the most reliable method for small networks. A Python implementation of the Bayan algorithm (bayanpy) is publicly available through the package installer for Python (pip).
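To make "globally optimal modularity" concrete, the sketch below scores partitions of a toy graph with the standard modularity formula, Q = Σ_c [L_c/m − (d_c/2m)²], and finds the maximum by exhaustive enumeration. This is not the Bayan algorithm and does not use bayanpy: the toy graph, the function names, and the brute-force search are assumptions chosen for illustration. Enumerating every partition is only possible for a handful of nodes, which is the difficulty that an exact method has to overcome.

```python
# Illustrative only: exhaustive modularity maximization on a toy graph.
# Hypothetical example graph: two triangles joined by a single edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
NODES = [0, 1, 2, 3, 4, 5]


def modularity(edges, partition):
    """Modularity Q of a partition (list of node lists) of an unweighted,
    undirected graph: sum over communities of L_c/m - (d_c/(2m))^2."""
    m = len(edges)
    community_of = {node: idx for idx, block in enumerate(partition) for node in block}
    internal = [0] * len(partition)  # L_c: edges with both ends in community c
    degree = [0] * len(partition)    # d_c: total degree of nodes in community c
    for u, v in edges:
        degree[community_of[u]] += 1
        degree[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2
               for c in range(len(partition)))


def set_partitions(items):
    """Yield every partition of a list into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Place `first` into each existing block in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or give it a block of its own.
        yield [[first]] + partition


# Score all 203 partitions of the 6 nodes and keep the best one.
best_q, best_partition = max(
    ((modularity(EDGES, p), p) for p in set_partitions(NODES)),
    key=lambda pair: pair[0],
)
print(round(best_q, 4), sorted(sorted(block) for block in best_partition))
```

On this graph the search recovers the two triangles as the maximum-modularity partition. The number of partitions grows as the Bell numbers, so this approach stops being usable well before graphs reach even a few dozen nodes.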