
A Review of Stochastic Block Models and Extensions for Graph Clustering (1903.00114v2)

Published 1 Mar 2019 in stat.ML and cs.LG

Abstract: There have been rapid developments in model-based clustering of graphs, also known as block modelling, over the last ten years or so. We review different approaches and extensions proposed for different aspects in this area, such as the type of the graph, the clustering approach, the inference approach, and whether the number of groups is selected or estimated. We also review models that combine block modelling with topic modelling and/or longitudinal modelling, regarding how these models deal with multiple types of data. How different approaches cope with various issues will be summarised and compared, to facilitate the demand of practitioners for a concise overview of the current status of these areas of literature.

Citations (182)

Summary

Overview of Stochastic Block Models and Extensions for Graph Clustering

Clement Lee and Darren J. Wilkinson review stochastic block models (SBMs), a widely used family of models for graph clustering, with emphasis on the extensions developed to handle features of real-world networks. An SBM infers latent group structure from relational data: each node belongs to a group, and the probability of an edge between two nodes depends on the groups they belong to. Block modelling began with deterministic block structures and was later generalised to probabilistic ones, which are now central to detecting community structure in graphs.
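To make the probabilistic block structure concrete, the following is a minimal sketch of sampling an undirected, binary graph from an SBM. The function name `sample_sbm`, its parameters, and the example probabilities are illustrative assumptions and do not come from the review; the sketch only assumes that an edge between two nodes is drawn independently with a probability determined by the pair of groups.

```python
import random


def sample_sbm(sizes, probs, seed=0):
    """Sample an undirected, binary graph from a basic SBM (illustrative).

    sizes: list of group sizes, e.g. [5, 5]
    probs: symmetric K x K nested list; probs[a][b] is the probability of
           an edge between a node in group a and a node in group b
    Returns (labels, adj), where adj is a symmetric 0/1 matrix without
    self-loops.
    """
    rng = random.Random(seed)
    # Node i gets the label of the group it was allocated to.
    labels = [g for g, size in enumerate(sizes) for _ in range(size)]
    n = len(labels)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # The edge probability depends only on the two group labels.
            if rng.random() < probs[labels[i]][labels[j]]:
                adj[i][j] = adj[j][i] = 1
    return labels, adj


# Two groups of five nodes, dense within groups and sparse between them.
labels, adj = sample_sbm([5, 5], [[0.9, 0.1], [0.1, 0.9]], seed=42)
```

Clustering with an SBM runs this process in reverse: given only `adj`, the aim is to recover `labels` and the block probabilities.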

The review details adaptations and extensions of SBMs for different types of graphs, including binary, directed, and valued graphs, with examples ranging from symmetric adjacency matrices in undirected graphs to complex probabilistic scenarios in directed hypergraphs. A significant adaptation discussed is the degree-corrected SBM (DC-SBM), which gives each node its own degree parameter and so accommodates the degree heterogeneity often present in real-world networks. With this correction, nodes are clustered by their connectivity patterns rather than simply by how many connections they have.
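As a rough illustration of degree correction, the sketch below computes expected edge counts under the Poisson formulation commonly associated with the DC-SBM, in which the expected number of edges between nodes i and j is theta_i * theta_j * omega[g_i][g_j]. The function names, the parameter values, and the choice to ignore self-loops are assumptions made for this example, not details taken from the review.

```python
def dcsbm_expected_edges(labels, theta, omega):
    """Expected edge counts under a degree-corrected SBM (illustrative).

    labels: group label of each node
    theta:  per-node degree parameter (larger means more connections)
    omega:  symmetric K x K matrix of block-level edge rates
    Self-loops are ignored here for simplicity (an assumption).
    """
    n = len(labels)
    return [
        [
            0.0 if i == j
            else theta[i] * theta[j] * omega[labels[i]][labels[j]]
            for j in range(n)
        ]
        for i in range(n)
    ]


def expected_degrees(expected):
    """Row sums of the expected edge-count matrix."""
    return [sum(row) for row in expected]


# Nodes 0 and 1 share a group, but node 0 has twice the degree parameter.
labels = [0, 0, 1, 1]
theta = [2.0, 1.0, 1.0, 1.0]
omega = [[0.5, 0.25], [0.25, 0.5]]
expected = dcsbm_expected_edges(labels, theta, omega)
degrees = expected_degrees(expected)
```

Setting every `theta[i]` to 1 makes all nodes in a group interchangeable again, as in the basic SBM; unequal values let nodes in the same group have different expected degrees.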

The paper further explores models that combine SBMs with longitudinal and topic modelling, emphasizing approaches that integrate several sources of information about a network. For instance, dynamic SBMs capture temporal changes in network structure, allowing evolving relationships and community memberships to be tracked over time. The value of combining data types is illustrated by analyses of datasets such as the Enron email corpus, where longitudinal information and message text together give insight into communication patterns and organizational structure.
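One simple way to picture a dynamic SBM is to let group memberships evolve as a Markov chain and draw a fresh SBM graph at each time step. The sketch below follows that construction purely as an illustration; the function names, the transition matrix, and the assumption that block probabilities stay fixed over time are hypothetical choices, and the models covered by the review differ in their details.

```python
import random


def evolve_labels(labels, transition, rng):
    """One Markov step: a node in group a moves to group b with
    probability transition[a][b]."""
    k = len(transition)
    return [rng.choices(range(k), weights=transition[a])[0] for a in labels]


def sample_dynamic_sbm(init_labels, transition, probs, steps, seed=0):
    """Return a list of (labels, adj) snapshots, one per time step.

    Block probabilities `probs` are held fixed over time (an assumption).
    """
    rng = random.Random(seed)
    labels = list(init_labels)
    snapshots = []
    for _ in range(steps):
        n = len(labels)
        adj = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                if rng.random() < probs[labels[i]][labels[j]]:
                    adj[i][j] = adj[j][i] = 1
        snapshots.append((list(labels), adj))
        # Memberships change between snapshots, not within one.
        labels = evolve_labels(labels, transition, rng)
    return snapshots


# Nodes mostly stay in their group, with a 10% chance of switching.
snapshots = sample_dynamic_sbm(
    init_labels=[0, 0, 0, 1, 1, 1],
    transition=[[0.9, 0.1], [0.1, 0.9]],
    probs=[[0.8, 0.1], [0.1, 0.8]],
    steps=4,
    seed=7,
)
```

Inference for such a model has to recover a label sequence per node rather than a single label, which is part of what makes the longitudinal setting harder.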

Lee and Wilkinson also compare computational strategies for SBM inference, contrasting Monte Carlo methods with variational approaches. The former are noted for their simplicity, while the latter scale better to large datasets. A central concern in the review is how to determine the number of groups, K. The authors outline strategies ranging from model selection criteria such as BIC and ICL, which compare fits obtained at different fixed values of K, to hierarchical nonparametric Bayesian processes designed to infer K directly from the data.
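The criterion-based route to choosing K can be sketched as follows: for each candidate partition, compute the Bernoulli SBM log-likelihood with block probabilities set to their observed edge fractions, then subtract a penalty that grows with the number of block parameters. This is a simplified BIC-style score written for illustration only; it is not the exact BIC or ICL expression used in any paper covered by the review, and the function names are hypothetical. In practice the candidate labels for each K would come from an inference algorithm rather than being supplied by hand.

```python
import math


def _xlogy(x, y):
    """x * log(y), with the convention that 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def sbm_profile_loglik(adj, labels, k):
    """Bernoulli SBM log-likelihood for fixed labels, with each block
    probability replaced by its observed edge fraction."""
    n = len(labels)
    edges = [[0] * k for _ in range(k)]
    pairs = [[0] * k for _ in range(k)]
    for i in range(n):
        for j in range(i + 1, n):
            a, b = sorted((labels[i], labels[j]))
            pairs[a][b] += 1
            edges[a][b] += adj[i][j]
    loglik = 0.0
    for a in range(k):
        for b in range(a, k):
            m, t = edges[a][b], pairs[a][b]
            if t == 0:
                continue  # no node pairs fall in this block
            p = m / t
            loglik += _xlogy(m, p) + _xlogy(t - m, 1 - p)
    return loglik


def bic_style_score(adj, labels, k):
    """Log-likelihood minus a BIC-style penalty (higher is better).

    Counts K(K+1)/2 block probabilities as parameters and n(n-1)/2 node
    pairs as observations; both choices are simplifying assumptions.
    """
    n = len(labels)
    n_pairs = n * (n - 1) / 2
    n_params = k * (k + 1) / 2
    return sbm_profile_loglik(adj, labels, k) - 0.5 * n_params * math.log(n_pairs)


# A planted graph: two cliques of three nodes with no edges between them.
true_labels = [0, 0, 0, 1, 1, 1]
adj = [[1 if i != j and true_labels[i] == true_labels[j] else 0
        for j in range(6)] for i in range(6)]
one_group = [0] * 6

score_k1 = bic_style_score(adj, one_group, 1)
score_k2 = bic_style_score(adj, true_labels, 2)
```

On this toy graph the two-group partition scores higher than the single-group one, since the gain in likelihood outweighs the extra penalty.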

The review has implications both for theoretical work on latent structure in networks and for practical applications in fields such as social network analysis, bioinformatics, and information science. As graph-structured data become more abundant, models and inference methods like those reviewed, which can account for the complex interdependencies and dynamics inherent in such data, become increasingly relevant.

Moreover, while the integration of SBM with topic modeling is still an emerging field, efforts such as the stochastic topic block model exemplify promising pathways for exploiting textual data alongside relational data in network analysis, suggesting significant potential for interdisciplinary applications in areas like digital communication and collaborative platforms.

Future developments in SBMs are likely to focus on hybrid models that integrate multiple data types while remaining computationally efficient and scalable. The paper maps the current capabilities of the field and provides a reference point for further work in graph clustering and network analysis.