
Consistency of community detection in networks under degree-corrected stochastic block models (1110.3854v5)

Published 18 Oct 2011 in math.ST, cs.SI, physics.soc-ph, and stat.TH

Abstract: Community detection is a fundamental problem in network analysis, with applications in many diverse areas. The stochastic block model is a common tool for model-based community detection, and asymptotic tools for checking consistency of community detection under the block model have been recently developed. However, the block model is limited by its assumption that all nodes within a community are stochastically equivalent, and provides a poor fit to networks with hubs or highly varying node degrees within communities, which are common in practice. The degree-corrected stochastic block model was proposed to address this shortcoming and allows variation in node degrees within a community while preserving the overall block community structure. In this paper we establish general theory for checking consistency of community detection under the degree-corrected stochastic block model and compare several community detection criteria under both the standard and the degree-corrected models. We show which criteria are consistent under which models and constraints, as well as compare their relative performance in practice. We find that methods based on the degree-corrected block model, which includes the standard block model as a special case, are consistent under a wider class of models and that modularity-type methods require parameter constraints for consistency, whereas likelihood-based methods do not. On the other hand, in practice, the degree correction involves estimating many more parameters, and empirically we find it is only worth doing if the node degrees within communities are indeed highly variable. We illustrate the methods on simulated networks and on a network of political blogs.

Citations (416)

Summary

  • The paper establishes a general theory for checking the consistency of community detection criteria under the DC-SBM, which allows node degrees to vary within a community.
  • Modularity and likelihood-based methods are compared, revealing that degree-corrected approaches perform robustly in heterogeneous networks.
  • Empirical evaluations using simulations and political blog data confirm DC-SBM’s superiority in scenarios with high intra-community degree variability.

Consistency of Community Detection in Networks under Degree-Corrected Stochastic Block Models

This paper, by Yunpeng Zhao, Elizaveta Levina, and Ji Zhu, develops a general theory for checking the consistency of community detection under the degree-corrected stochastic block model (DC-SBM). Community detection, a central task in network analysis, aims to identify groups of nodes that are more densely connected to each other than to the rest of the network. The standard stochastic block model (SBM), a common tool for this purpose, assumes that all nodes within a community are stochastically equivalent. This gives a poor fit to real-world networks with hubs or highly varying node degrees within communities.

Key Contributions and Numerical Results

The authors address this limitation by introducing a rigorous consistency framework for the DC-SBM, allowing for variable node degrees within communities, while preserving the core block structure. The DC-SBM generalizes the classical SBM by incorporating node-specific degree parameters, thus accommodating hubs and varying connectivity patterns observed in real-life networks.
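As a rough illustration of the model described above, the sketch below samples a network in which the edge between nodes i and j is present with probability theta[i] * theta[j] * P[c_i, c_j]. The function name `sample_dcsbm`, the symbols `theta` and `P`, and all numeric values are assumptions chosen for illustration and are not taken from the paper. Setting every `theta[i]` to 1 recovers the standard SBM.

```python
import numpy as np

def sample_dcsbm(labels, theta, P, rng):
    """Sample a symmetric 0/1 adjacency matrix without self-loops.

    Edge (i, j) is drawn independently with probability
    theta[i] * theta[j] * P[labels[i], labels[j]], clipped to [0, 1].
    """
    labels = np.asarray(labels)
    theta = np.asarray(theta, dtype=float)
    P = np.asarray(P, dtype=float)
    # n x n matrix of edge probabilities
    prob = np.outer(theta, theta) * P[np.ix_(labels, labels)]
    prob = np.clip(prob, 0.0, 1.0)
    n = len(labels)
    # Draw the strict upper triangle only, then mirror it
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    return (upper | upper.T).astype(int)

rng = np.random.default_rng(0)
n = 200
labels = np.repeat([0, 1], n // 2)            # two equal-sized communities
P = np.array([[0.10, 0.02],
              [0.02, 0.10]])                  # hypothetical block matrix
theta = rng.choice([0.5, 1.5], size=n)        # hypothetical degree parameters
A = sample_dcsbm(labels, theta, P, rng)
```

Nodes with `theta = 1.5` act as hubs within their community: their expected degree is three times that of nodes with `theta = 0.5`, even though both belong to the same block.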

The paper asserts several pivotal findings and theoretical implications:

  1. Modularity Methods: The consistency of community detection criteria is examined under both the SBM and the DC-SBM. Criteria that account for degree, such as Newman-Girvan modularity, are consistent under a wider class of models than criteria based on the standard SBM. Modularity-type criteria nevertheless require parameter constraints for consistency, essentially that ties within communities are stronger than ties between them.
  2. Likelihood-Based Methods: Likelihood-based criteria derived from DC-SBM are shown to achieve consistency without demanding parameter constraints, unlike modularity-based methods. These methods offer robustness across varying network topologies, albeit at the expense of higher parameter estimation complexity.
  3. Empirical Evaluation: Through simulations and real network data (political blogs), the authors show that the DC-SBM criterion and Newman-Girvan modularity perform best when node degrees vary widely within communities. Conversely, when degrees vary little, simpler SBM-based methods can give better results in practice because they involve far fewer parameters.
  4. Parameter Constraint Analysis: The theoretical analysis explains why modularity-based methods need parameter constraints for consistency. Specifically, ties must be more likely within communities than between them.
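The two degree-aware criteria discussed in the list above can be sketched for a fixed candidate labelling as follows. The formulas are the commonly used forms of Newman-Girvan modularity and of the degree-corrected profile likelihood, written in terms of block edge counts O[k, l]. They are assumed here to match the paper's criteria up to constants, and the function names are illustrative. A real method would also need a search over labellings, which is omitted.

```python
import numpy as np

def block_counts(A, labels, K):
    """O[k, l] = sum over (i, j) of A[i, j] with label i = k and label j = l."""
    Z = np.eye(K)[np.asarray(labels)]   # n x K one-hot membership matrix
    return Z.T @ A @ Z

def newman_girvan_modularity(A, labels, K):
    """sum_k ( O_kk / L - (O_k / L)^2 ), where L = sum of all entries of A."""
    O = block_counts(A, labels, K)
    L = A.sum()
    O_k = O.sum(axis=1)                 # total degree of each community
    return float(np.sum(np.diag(O) / L - (O_k / L) ** 2))

def dcsbm_criterion(A, labels, K):
    """sum_{k,l} O_kl * log( O_kl / (O_k * O_l) ), taking 0 * log 0 = 0."""
    O = block_counts(A, labels, K)
    O_k = O.sum(axis=1)
    denom = np.outer(O_k, O_k)
    mask = O > 0                        # skip empty blocks
    return float(np.sum(O[mask] * np.log(O[mask] / denom[mask])))

# Toy graph: two triangles (nodes 0-2 and 3-5) joined by the edge 2-3
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A_toy = np.zeros((6, 6))
for i, j in edges:
    A_toy[i, j] = A_toy[j, i] = 1

good = [0, 0, 0, 1, 1, 1]   # labelling that matches the triangles
bad = [0, 1, 0, 1, 0, 1]    # labelling that cuts across them

q_good = newman_girvan_modularity(A_toy, good, 2)
q_bad = newman_girvan_modularity(A_toy, bad, 2)
d_good = dcsbm_criterion(A_toy, good, 2)
d_bad = dcsbm_criterion(A_toy, bad, 2)
```

On this toy graph both criteria score the labelling that follows the two triangles higher than the one that cuts across them, which is the behaviour a consistent criterion should show as the network grows.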

The numerical experiments agree with the theoretical predictions. The DC-SBM gives a better fit for networks with marked heterogeneity in node degrees, but this advantage is offset by the need to estimate many more parameters, so the degree correction is only worth doing when degrees within communities are indeed highly variable.

Theoretical Implications and Future Directions

This work significantly advances theoretical understanding of community detection models, particularly reinforcing the notion that clear alignment between the assumed model and true underlying network structure is paramount for methodological consistency. It offers a statistical foundation that could be adapted for broader classes of models, evidenced by its applicability to both dense and sparse network structures.

Looking forward, the paper opens avenues for extending these consistency results to cases with dynamically growing nodes or communities, and exploring approximations for computational efficiency in practice. Despite the superior performance of DC-SBM in varied settings, practical deployment necessitates balancing model fidelity, computational burden, and real-world applicability. Future research could further probe the interplay of these factors, optimizing DC-SBM usability in large-scale networks.

Overall, this work addresses a fundamental gap in network science by juxtaposing established methods against innovative model formulations, thus enriching both the theoretical and practical landscapes of community detection.