Community detection in networks: Modularity optimization and maximum likelihood are equivalent (1606.02319v1)

Published 7 Jun 2016 in cs.SI and physics.soc-ph

Abstract: We demonstrate an exact equivalence between two widely used methods of community detection in networks, the method of modularity maximization in its generalized form which incorporates a resolution parameter controlling the size of the communities discovered, and the method of maximum likelihood applied to the special case of the stochastic block model known as the planted partition model, in which all communities in a network are assumed to have statistically similar properties. Among other things, this equivalence provides a mathematically principled derivation of the modularity function, clarifies the conditions and assumptions of its use, and gives an explicit formula for the optimal value of the resolution parameter.

Citations (246)


Summary

The paper shows that maximizing the modularity function is mathematically equivalent to performing maximum likelihood estimation under a planted partition model.
It derives an explicit formula for the optimal value of the resolution parameter γ, which controls the size of the communities the method detects.
The study highlights limitations of modularity maximization and encourages further work on methods that can detect communities of heterogeneous sizes.

Analysis of Equivalence Between Modularity Optimization and Maximum Likelihood in Community Detection

The paper "Community detection in networks: Modularity optimization and maximum likelihood are equivalent" by M. E. J. Newman presents a rigorous analysis of the equivalence between two prominent methods for community detection in networks: modularity optimization, and maximum likelihood estimation applied to the planted partition model, a special case of the stochastic block model. This equivalence provides a clearer theoretical foundation for modularity maximization and also exposes its assumptions and potential limitations, offering valuable insights for further research in network science.

Modularity Optimization

The modularity optimization method, widely used to identify community structure in networks, scores a division of a network into communities with a modularity function: the number of edges that fall within communities minus the number expected if edges were placed at random while preserving node degrees. Maximizing this function yields the division whose within-group edge count most exceeds that random expectation. The generalized modularity function adds a resolution parameter, $\gamma$, that scales the expected-edge term: larger values of $\gamma$ favor more and smaller communities, smaller values fewer and larger ones.
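As a concrete sketch, the generalized modularity of a division can be written in the standard form $Q(\gamma) = \frac{1}{2m}\sum_{ij}\left[A_{ij} - \gamma\frac{k_i k_j}{2m}\right]\delta_{g_i g_j}$, where $A$ is the adjacency matrix, $k_i$ the degree of node $i$, $m$ the number of edges, and $\delta_{g_i g_j}$ is 1 when nodes $i$ and $j$ share a group and 0 otherwise. The Python below evaluates it for a simple undirected graph given as an edge list; the function name and input format are illustrative choices, not taken from the paper.

```python
from collections import defaultdict


def generalized_modularity(edges, groups, gamma=1.0):
    """Q(gamma) for a simple undirected graph (no self-loops or multi-edges).

    edges:  list of (u, v) pairs
    groups: dict mapping each node to its community label
    """
    two_m = 2 * len(edges)
    degree = defaultdict(int)
    m_in = 0  # edges whose two endpoints share a community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if groups[u] == groups[v]:
            m_in += 1
    # kappa_r: total degree of the nodes in community r
    kappa = defaultdict(int)
    for node, k in degree.items():
        kappa[groups[node]] += k
    # sum_ij A_ij delta(g_i, g_j) = 2 * m_in
    # sum_ij k_i k_j delta(g_i, g_j) = sum_r kappa_r ** 2
    expected = gamma * sum(x * x for x in kappa.values()) / two_m
    return (2 * m_in - expected) / two_m


# Two triangles joined by a single edge, split into the obvious two groups.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(generalized_modularity(edges, groups))             # 5/14, about 0.357
print(generalized_modularity(edges, groups, gamma=2.0))  # -1/7: penalty term doubled
```

Raising $\gamma$ penalizes large groups more heavily, which is why the same two-group split scores lower at $\gamma = 2$ than at $\gamma = 1$.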

Maximum Likelihood and Stochastic Block Model

In parallel, the stochastic block model (SBM) offers a probabilistic framework for community detection: it posits that edges are placed independently, with probabilities that depend only on the group memberships of their endpoints, and community assignments are inferred by fitting the model to an observed network. The standard SBM generates networks with Poisson (or binomial) degree distributions, which fit the heavy-tailed degree distributions of many empirical networks poorly. The degree-corrected block model addresses this by building node degrees into the edge probabilities, so that it can accommodate arbitrary degree distributions.
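To make the degree limitation concrete, here is a small Python sketch that samples from the simplest SBM, the planted partition model with one within-group and one between-group edge probability. The function names and parameter values are illustrative assumptions. Because every node in a group has the same connection probabilities, each degree is a sum of independent coin flips (binomial, close to Poisson for sparse graphs), so the model cannot reproduce heavy-tailed degree distributions.

```python
import random


def sample_planted_partition(group_sizes, p_in, p_out, seed=None):
    """Sample a simple undirected graph from the (non-degree-corrected)
    planted partition model: each pair of nodes is joined independently,
    with probability p_in if they share a group and p_out otherwise."""
    rng = random.Random(seed)
    groups = [r for r, size in enumerate(group_sizes) for _ in range(size)]
    n = len(groups)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if groups[i] == groups[j] else p_out
            if rng.random() < p:
                edges.append((i, j))
    return edges, groups


def degrees(edges, n):
    """Degree of every node 0..n-1."""
    k = [0] * n
    for u, v in edges:
        k[u] += 1
        k[v] += 1
    return k


edges, groups = sample_planted_partition([500, 500], p_in=0.01, p_out=0.002, seed=1)
k = degrees(edges, len(groups))
# Every node has the same expected degree, 499*0.01 + 500*0.002 = 5.99,
# and observed degrees cluster around it instead of spreading into a heavy tail.
print(sum(k) / len(k), max(k))
```

The degree-corrected model replaces these fixed probabilities with expected edge counts proportional to the product of the two endpoints' degrees.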

Equivalence and Implications

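Newman demonstrates that maximizing the generalized modularity function is mathematically equivalent to finding a maximum likelihood estimate of the community assignments in the planted partition model, a special case of the SBM in which all communities share the same within-group and between-group connection parameters. For the standard degree-based modularity, the relevant model is the degree-corrected version of the planted partition model, and the equivalence holds for fixed values of the two parameters when the within-group parameter exceeds the between-group one. This equivalence provides a principled derivation of the modularity function, grounding modularity optimization in a well-defined statistical framework. It also gives an explicit formula for the appropriate value of the resolution parameter $\gamma$: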

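$\gamma = \frac{\omega_{\text{in}} - \omega_{\text{out}}}{\ln\omega_{\text{in}} - \ln\omega_{\text{out}}}$

where $\omega_{\text{in}}$ and $\omega_{\text{out}}$ are the planted partition model's parameters governing the density of edges within groups and between groups, respectively.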
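The equivalence can be sketched in a few lines. Assume, following the usual formulation of the degree-corrected planted partition model, that the number of edges between nodes $i$ and $j$ is Poisson-distributed with mean $k_i k_j \omega_{g_i g_j}/2m$, where $\omega_{rs} = \omega_{\text{in}}$ if $r = s$ and $\omega_{\text{out}}$ otherwise (notation chosen here for illustration). Dropping terms that depend on neither the division $g$ nor the parameters, the log-likelihood is

$$\ln P(A \mid \omega, g) = \frac{1}{2}\sum_{ij}\left[A_{ij}\ln\omega_{g_i g_j} - \frac{k_i k_j}{2m}\,\omega_{g_i g_j}\right].$$

Writing $\omega_{g_i g_j} = (\omega_{\text{in}} - \omega_{\text{out}})\,\delta_{g_i g_j} + \omega_{\text{out}}$ and $\ln\omega_{g_i g_j} = (\ln\omega_{\text{in}} - \ln\omega_{\text{out}})\,\delta_{g_i g_j} + \ln\omega_{\text{out}}$, and using $\sum_{ij} A_{ij} = \sum_{ij} k_i k_j/2m = 2m$, this becomes

$$\ln P = m\,(\ln\omega_{\text{in}} - \ln\omega_{\text{out}})\,Q(\gamma) + m\,(\ln\omega_{\text{out}} - \omega_{\text{out}}), \qquad Q(\gamma) = \frac{1}{2m}\sum_{ij}\left[A_{ij} - \gamma\frac{k_i k_j}{2m}\right]\delta_{g_i g_j},$$

with $\gamma = (\omega_{\text{in}} - \omega_{\text{out}})/(\ln\omega_{\text{in}} - \ln\omega_{\text{out}})$. For fixed $\omega_{\text{in}} > \omega_{\text{out}}$ the prefactor of $Q(\gamma)$ is positive and the last term does not depend on $g$, so the division that maximizes the likelihood is exactly the one that maximizes $Q(\gamma)$.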

Further, the paper highlights specific limitations of modularity maximization. Notably, the method assumes that all communities are statistically similar, because the underlying planted partition model gives every community the same within-group connection parameter; this can limit its effectiveness in networks whose communities differ in size or connectivity. Relatedly, because a single value of $\gamma$ applies to the whole network, modularity optimization tends to favor divisions into communities of similar size, which may not suit every network.

Methodological Contributions

The paper proposes an iterative scheme for estimating the resolution parameter $\gamma$ from data, allowing modularity optimization to be adapted to real-world networks. Starting from an initial value such as $\gamma = 1$, the scheme maximizes modularity, estimates the model parameters from the resulting division, recomputes $\gamma$ from those estimates, and repeats until $\gamma$ stops changing. This offers a practical tool, although convergence is not formally guaranteed for networks that were not generated by the planted partition model.
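The loop can be sketched in Python as follows, assuming the degree-corrected planted partition model. For a fixed division, maximizing that model's log-likelihood gives $\omega_{\text{in}} = 2m_{\text{in}} / (\sum_r \kappa_r^2/2m)$ and $\omega_{\text{out}} = 2m_{\text{out}} / (2m - \sum_r \kappa_r^2/2m)$, where $m_{\text{in}}$ and $m_{\text{out}}$ count edges within and between groups and $\kappa_r$ is the total degree of group $r$. `maximize_modularity` is a hypothetical placeholder for whatever modularity optimizer is available (a Louvain-style heuristic, for example), and all function names are illustrative.

```python
import math
from collections import defaultdict


def estimate_omegas(edges, groups):
    """ML estimates of omega_in, omega_out in the degree-corrected planted
    partition model for a fixed division `groups` (node -> label).
    Assumes at least two groups and at least one edge between groups."""
    two_m = 2 * len(edges)
    degree = defaultdict(int)
    m_in = 0
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if groups[u] == groups[v]:
            m_in += 1
    kappa = defaultdict(int)
    for node, k in degree.items():
        kappa[groups[node]] += k
    null_in = sum(x * x for x in kappa.values()) / two_m  # sum_r kappa_r^2 / 2m
    omega_in = 2 * m_in / null_in
    omega_out = 2 * (len(edges) - m_in) / (two_m - null_in)
    return omega_in, omega_out


def gamma_from_omegas(omega_in, omega_out):
    """gamma = (omega_in - omega_out) / (ln omega_in - ln omega_out)."""
    if math.isclose(omega_in, omega_out):
        return omega_in  # limiting value of the expression as omega_out -> omega_in
    return (omega_in - omega_out) / (math.log(omega_in) - math.log(omega_out))


def iterate_gamma(edges, maximize_modularity, gamma=1.0, tol=1e-6, max_iter=50):
    """Alternate modularity maximization at the current gamma with
    re-estimation of gamma. `maximize_modularity(edges, gamma)` is a
    hypothetical user-supplied optimizer returning node -> label."""
    groups = maximize_modularity(edges, gamma)
    for _ in range(max_iter):
        new_gamma = gamma_from_omegas(*estimate_omegas(edges, groups))
        if abs(new_gamma - gamma) < tol:
            return new_gamma, groups
        gamma = new_gamma
        groups = maximize_modularity(edges, gamma)
    return gamma, groups  # not converged; there is no general guarantee it will


# Two triangles joined by one edge; a stand-in "optimizer" that always
# returns the same two-group split, just to exercise the loop.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
fixed = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
omega_in, omega_out = estimate_omegas(edges, fixed)  # 12/7 and 2/7
gamma, groups = iterate_gamma(edges, lambda e, g: fixed)
print(gamma)  # (10/7) / ln 6, about 0.797
```

With a real optimizer the division can change between iterations as $\gamma$ moves; the fixed stand-in above simply converges after one update.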

Conclusion

Newman's work extends the theoretical understanding of community detection by anchoring modularity maximization within a statistical inference framework. This alignment shows that modularity maximization is a principled inference method when the planted partition model's assumptions hold, and invites further exploration of its theoretical underpinnings and practical applications. Future research could explore more general SBM frameworks, extending these insights to more diverse network types and scales.
