Finding overlapping communities in networks by label propagation (0910.5516v3)

Published 28 Oct 2009 in physics.soc-ph and cs.SI

Abstract: We propose an algorithm for finding overlapping community structure in very large networks. The algorithm is based on the label propagation technique of Raghavan, Albert, and Kumara, but is able to detect communities that overlap. Like the original algorithm, vertices have labels that propagate between neighbouring vertices so that members of a community reach a consensus on their community membership. Our main contribution is to extend the label and propagation step to include information about more than one community: each vertex can now belong to up to v communities, where v is the parameter of the algorithm. Our algorithm can also handle weighted and bipartite networks. Tests on an independently designed set of benchmarks, and on real networks, show the algorithm to be highly effective in recovering overlapping communities. It is also very fast and can process very large and dense networks in a short time.

Summary

The paper introduces COPRA, a novel algorithm that extends label propagation to detect overlapping community structures.
The method employs a parameter v to manage vertices’ membership in multiple communities, validated on synthetic and real-world networks.
COPRA scales well and gives stable results, matching or outperforming comparable algorithms in modularity while running faster.

Evaluating Overlapping Community Detection through Label Propagation: Analysis of COPRA

The paper by Steve Gregory introduces a novel approach to detecting overlapping community structures in networks using the Community Overlap Propagation Algorithm (COPRA), an extension of the Label Propagation Algorithm (LPA) by Raghavan, Albert, and Kumara (RAK). This essay will summarize and analyze the key contributions, results, and implications of the COPRA algorithm.

Core Concept and Methodology

COPRA identifies communities by label propagation but, unlike the RAK algorithm, allows each vertex to belong to several communities at once. The parameter $v$ sets the maximum number of communities a vertex can belong to simultaneously, so higher values of $v$ permit more overlap. The key extension over the RAK algorithm is that the label propagation step incorporates belonging coefficients: a vertex's label records how strongly it belongs to each of its communities, so the vertex can represent membership in multiple communities rather than a single one.
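The propagation step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's implementation, and several details are assumptions that this summary does not state: labels start as one unique community per vertex, updates are synchronous, a label is dropped when its coefficient falls below $1/v$, the strongest label is kept (ties broken at random) if none survives, and iteration stops when no vertex's set of community identifiers changes. The names `copra_step` and `copra_sketch` are hypothetical.

```python
import random
from collections import defaultdict


def copra_step(adj, labels, v, rng):
    """One synchronous propagation step (illustrative sketch).

    adj:    dict mapping each vertex to a list of its neighbours
    labels: dict mapping each vertex to {community_id: belonging coefficient},
            where the coefficients of a vertex sum to 1
    v:      maximum number of communities per vertex
    """
    threshold = 1.0 / v - 1e-12  # small tolerance for floating-point sums
    new_labels = {}
    for x, neighbours in adj.items():
        if not neighbours:
            # An isolated vertex has nothing to receive; keep its labels.
            new_labels[x] = dict(labels[x])
            continue
        # Average the neighbours' belonging coefficients per community.
        acc = defaultdict(float)
        for y in neighbours:
            for c, b in labels[y].items():
                acc[c] += b / len(neighbours)
        # Assumed filtering rule: drop labels weaker than 1/v.
        kept = {c: b for c, b in acc.items() if b >= threshold}
        if not kept:
            # Nothing survived: keep one strongest label, ties broken at random.
            best = max(acc.values())
            candidates = sorted(c for c, b in acc.items() if b == best)
            kept = {rng.choice(candidates): best}
        # Renormalise so the coefficients of x sum to 1 again.
        total = sum(kept.values())
        new_labels[x] = {c: b / total for c, b in kept.items()}
    return new_labels


def copra_sketch(adj, v, max_iters=50, seed=0):
    """Repeat copra_step until the label sets stop changing or max_iters is hit."""
    rng = random.Random(seed)
    # Assumed initialisation: every vertex is its own community.
    labels = {x: {x: 1.0} for x in adj}
    for _ in range(max_iters):
        updated = copra_step(adj, labels, v, rng)
        unchanged = all(set(updated[x]) == set(labels[x]) for x in adj)
        labels = updated
        if unchanged:
            break
    return labels


if __name__ == "__main__":
    # Two triangles joined by the edge 2-3.
    graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
             3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
    for vertex, membership in copra_sketch(graph, v=2).items():
        print(vertex, membership)
```

With `v=1` every vertex keeps exactly one label, which corresponds to the disjoint setting of the original label propagation approach; larger `v` lets a vertex retain several labels with fractional coefficients. The simple stopping rule used here is only a placeholder, and synchronous updates can oscillate on some graphs, which is why `max_iters` caps the loop.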

Experimental Evaluation

COPRA was evaluated on both synthetic benchmark networks and real-world networks. The primary quantitative metrics were modularity and Normalized Mutual Information (NMI).

Synthetic Networks:
- Disjoint Community Benchmarks: The results illustrated COPRA’s ability to match the performance of the original RAK algorithm with $v = 1$. For values of $v \leq 9$, COPRA maintained stability in solutions, demonstrating that the algorithm can effectively manage disjoint structures even when capable of detecting overlap.
- Overlapping Community Benchmarks: For networks with overlapping communities, COPRA showed improved detection capabilities with the appropriate choice of $v$. For instance, an optimal value of $v \approx 7$ resulted in the highest NMI, closely aligning the detected communities with the ground truth.

Real Networks:
- Performance Metrics: Across various networks such as “netscience”, “jazz”, “protein-protein”, and “blogs”, COPRA consistently outperformed or matched the modularity achieved by other algorithms like CFinder and LFM. Notably, COPRA demonstrated a significant advantage in terms of execution speed and scalability, processing networks with tens of thousands to millions of vertices efficiently.
- Stability and Overlap: Modularity varied little between runs (low standard deviation), indicating that the results are stable despite the non-determinism inherent in label propagation. Furthermore, COPRA’s parameter $v$ was pivotal in optimizing overlap detection, with higher values leading to the discovery of more intricate overlapping community structures.

Implications and Future Research

COPRA has several practical and theoretical implications:

- Scalability: COPRA's near-linear time complexity on sparse networks is a substantial advantage for analyzing large-scale networks. This efficiency makes it a candidate for real-time community detection without compromising the quality of the detected structures.
- Parameter Sensitivity: The parameter $v$, central to COPRA's operation, gives the method flexibility. Users can tune $v$ to the degree of overlap expected in their specific networks. However, $v$ should not be set excessively high, since very high values can lead to non-convergent states.
- Broad Applicability: The algorithm also handles weighted and bipartite networks, which underscores its versatility. Its effectiveness across different network structures enables its application in diverse fields, from social network analysis to bioinformatics.
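
To make the role of $v$ in the points above concrete, here is a short worked bound. It rests on an assumption not stated in this summary: that a label is retained only when its belonging coefficient is at least $1/v$, and that the coefficients $b_1, \dots, b_k$ of the $k$ retained labels come from a set of non-negative coefficients summing to 1. Under that assumption:

$$
\frac{k}{v} \;\le\; \sum_{i=1}^{k} b_i \;\le\; 1 \quad\Longrightarrow\quad k \le v .
$$

So such a threshold would by itself cap membership at $v$ communities per vertex, and raising $v$ lowers the bar a label must clear, which is consistent with higher values admitting more overlap.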

Conclusion

Overall, the paper provides a thorough investigation of the COPRA algorithm, demonstrating its capability to effectively identify overlapping communities in large networks through label propagation. By providing a robust alternative to existing algorithms, COPRA paves the way for further advancements in community detection algorithms, particularly those requiring scalability and precision in overlapping structures. Future research could explore the integration of advanced optimization techniques and parallel computing strategies to further improve COPRA's computational performance and applicability in real-time network analysis.
