Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters (0810.1355v1)

Published 8 Oct 2008 in cs.DS, physics.data-an, and physics.soc-ph

Abstract: A large body of work has been devoted to defining and identifying clusters or communities in social and information networks. We explore from a novel perspective several questions related to identifying meaningful communities in large social and information networks, and we come to several striking conclusions. We employ approximation algorithms for the graph partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities. In particular, we define the network community profile plot, which characterizes the "best" possible community--according to the conductance measure--over a wide range of size scales. We study over 100 large real-world social and information networks. Our results suggest a significantly more refined picture of community structure in large networks than has been appreciated previously. In particular, we observe tight communities that are barely connected to the rest of the network at very small size scales; and communities of larger size scales gradually "blend into" the expander-like core of the network and thus become less "community-like." This behavior is not explained, even at a qualitative level, by any of the commonly-used network generation models. Moreover, it is exactly the opposite of what one would expect based on intuition from expander graphs, low-dimensional or manifold-like graphs, and from small social networks that have served as testbeds of community detection algorithms. We have found that a generative graph model, in which new edges are added via an iterative "forest fire" burning process, is able to produce graphs exhibiting a network community profile plot similar to what we observe in our network datasets.

Citations (1,923)


Summary

The paper finds that well-defined communities are typically small (up to roughly 100 nodes) and that community quality, as measured by conductance, degrades as clusters grow larger.
The analysis employs approximation algorithms and the Network Community Profile plot to uncover a core-periphery structure in over 100 real-world networks.
The findings challenge standard network models, prompting the development of refined community detection methods and deeper exploration of core dynamics.


The paper "Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters" by Leskovec, Lang, Dasgupta, and Mahoney provides a comprehensive analysis of community structures within large social and information networks. The authors challenge the conventional wisdom around clustering in these networks, proposing that meaningful communities are typically small and that larger clusters blend into the network in ways that defy traditional community detection methods.

Overview of the Methodology

Most community detection work starts from the premise that communities or clusters are sets of nodes with better intra-connectivity than inter-connectivity. The authors keep this intuition, formalized through conductance (roughly, the number of edges leaving a set relative to the total degree of the nodes in it, so lower values indicate a better community), but they do not set out to produce a single best partition. Instead, they use approximation algorithms for the graph partitioning problem to characterize, as a function of size, the statistical and structural properties of graph partitions that could be construed as communities. They introduce the Network Community Profile (NCP) plot, which records, for each size, the conductance score of the best community of that size.
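To make the NCP idea concrete, here is a minimal sketch that computes conductance and an exhaustive profile on a toy graph. It assumes the standard definition of conductance (cut edges divided by the smaller of the two volumes). The brute-force search is a stand-in used only for illustration; the paper relies on approximation algorithms because exhaustive search is infeasible on large graphs. The function names `conductance`, `ncp_brute_force`, and `build_adj` are hypothetical.

```python
from itertools import combinations


def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def conductance(adj, subset):
    """Cut edges divided by min(volume of subset, volume of the rest)."""
    s = set(subset)
    cut = sum(1 for u in s for v in adj[u] if v not in s)
    vol_s = sum(len(adj[u]) for u in s)
    vol_rest = sum(len(nbrs) for nbrs in adj.values()) - vol_s
    denom = min(vol_s, vol_rest)
    return cut / denom if denom > 0 else float("inf")


def ncp_brute_force(adj, max_size):
    """Lowest conductance over all node sets of each size k (exhaustive)."""
    nodes = sorted(adj)
    return {
        k: min(conductance(adj, c) for c in combinations(nodes, k))
        for k in range(1, max_size + 1)
    }


# Toy graph (an assumption for illustration): two triangles joined by one edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = build_adj(edges)
profile = ncp_brute_force(adj, 3)
for size, score in profile.items():
    print(size, round(score, 3))
```

On this toy graph the profile dips at size 3, where one triangle forms the best community. The paper's observation is that on large real networks the profile reaches its minimum at a small size and then rises.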

The authors study over 100 large real-world networks, including social, technological, and web networks, ranging from thousands up to tens of millions of nodes.

Key Empirical Findings

Natural Community Size: Tight-knit communities exist only up to a size of about 100 nodes. Above this size, good communities become less distinct.
Inverse Size-Quality Relationship: As communities grow beyond approximately 100 nodes, the best achievable conductance score rises, meaning community quality generally worsens (lower conductance indicates a better community). This observation suggests that larger communities tend to blend into an expander-like core of the network, making them harder to delineate.
Core-Periphery Structure: Large networks exhibit a core-periphery structure wherein the core is significantly expander-like, and the peripheral nodes form smaller, well-defined communities or "whiskers."
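The "whiskers" mentioned above can be illustrated with a small sketch. It assumes a whisker is a maximal set of nodes that detaches from the rest of a connected graph when a single edge is cut; this definition is an assumption made for the example, not something stated in the summary. The code tests each edge by removing it and checking reachability, which is simple but slow, so it is only suitable for toy graphs. The names `component` and `single_edge_whiskers` are hypothetical.

```python
from collections import deque


def component(adj, start, banned_edge):
    """Nodes reachable from `start` without crossing `banned_edge`."""
    a, b = banned_edge
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if (u, v) in ((a, b), (b, a)) or v in seen:
                continue
            seen.add(v)
            queue.append(v)
    return seen


def single_edge_whiskers(adj):
    """Maximal node sets cut off from a connected graph by removing one edge."""
    n = len(adj)
    candidates = []
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                side = component(adj, u, (u, v))
                if v not in side:  # removing (u, v) disconnects the graph
                    if len(side) <= n - len(side):
                        small = side
                    else:
                        small = set(adj) - side
                    candidates.append(frozenset(small))
    # Keep only candidates not strictly contained in another candidate.
    return [c for c in candidates if not any(c < d for d in candidates)]


# Toy graph (an assumption): a 4-clique "core" with a small tree hanging off node 3.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

whiskers = single_edge_whiskers(adj)
print(whiskers)
```

Here the tree on nodes 4, 5, and 6 is the only maximal piece attached to the clique by a single edge, which mirrors the picture of small peripheral communities hanging off a well-connected core.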

Theoretical and Practical Implications

The authors argue that the observed behavior contradicts common network generation models. For example:

Preferential Attachment Models: These models produce networks without the small well-defined community structures observed, resulting in expander-like properties that do not match real-world networks.
Copying Models: These similarly fail to reproduce the nuanced community structures seen in actual data.
Hierarchical Models and Geometric Models: These do not account for the gradual blending of communities into the network core.

Instead, the authors found that a Forest Fire Model, a generative model in which new edges are added via an iterative "forest fire" burning process, can produce graphs whose network community profile plot resembles what is observed in the real-world datasets.
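The burning process can be sketched as follows. This is a simplified, undirected variant written for illustration, not the authors' exact model: each new node picks a random existing "ambassador", the fire spreads from every burned node to a random number of its unburned neighbors, and the new node then links to everything that burned. The function name `forest_fire_graph`, the parameter `burn_prob`, and the specific spreading rule are assumptions.

```python
import random


def forest_fire_graph(n, burn_prob, seed=None):
    """Grow an undirected graph on n nodes with a simplified burning process.

    burn_prob must be below 1 so that each spreading step terminates.
    """
    rng = random.Random(seed)
    adj = {0: set()}
    for new in range(1, n):
        ambassador = rng.randrange(new)  # uniformly chosen existing node
        burned = {ambassador}
        frontier = [ambassador]
        while frontier:
            w = frontier.pop()
            # Number of neighbors to burn: count successes before first failure.
            k = 0
            while rng.random() < burn_prob:
                k += 1
            unburned = sorted(x for x in adj[w] if x not in burned)
            rng.shuffle(unburned)
            for x in unburned[:k]:
                burned.add(x)
                frontier.append(x)
        # The new node links to every node the fire reached.
        adj[new] = set()
        for w in burned:
            adj[new].add(w)
            adj[w].add(new)
    return adj


g = forest_fire_graph(50, 0.35, seed=1)
print(len(g), sum(len(nbrs) for nbrs in g.values()) // 2)
```

With a small `burn_prob` most fires die out quickly and new nodes attach with few edges, while larger values let fires spread and create more densely linked regions. Whether this simplified variant reproduces the NCP shape reported in the paper is not something the sketch establishes.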

Future Directions and Speculations

The findings suggest several areas for future research and methodology adjustments:

Improving Community Detection Algorithms: There is a need to develop algorithms that can better identify the small, well-defined communities of up to roughly 100 nodes and to understand how larger communities transition into the core.
Understanding Core-Periphery Dynamics: More research is needed to explore the underlying causes of the core-periphery structure and its implications for network functionality.
Alternative Measures of Community Quality: Given the limitations of conductance, alternative measures that account for both connection density and internal coherence might provide more nuanced insights into community structure.

Conclusion

The paper demonstrates that large-scale networks have a fundamentally different community structure than is commonly assumed. Well-defined communities are small, typically containing no more than about 100 nodes. Larger clusters become less distinct and blend into the network's core. These findings challenge existing network generation models and call for revised approaches to community detection. The Forest Fire Model offers a promising direction, since it produces graphs with a network community profile similar to that of real-world networks. Future research in these areas could improve our understanding and analysis of community structure in complex networks.
