Consistency of spectral clustering in stochastic block models (1312.2050v3)

Published 7 Dec 2013 in math.ST, stat.ML, and stat.TH

Abstract: We analyze the performance of spectral clustering for community extraction in stochastic block models. We show that, under mild conditions, spectral clustering applied to the adjacency matrix of the network can consistently recover hidden communities even when the order of the maximum expected degree is as small as $\log n$, with $n$ the number of nodes. This result applies to some popular polynomial time spectral clustering algorithms and is further extended to degree corrected stochastic block models using a spherical $k$-median spectral clustering method. A key component of our analysis is a combinatorial bound on the spectrum of binary random matrices, which is sharper than the conventional matrix Bernstein inequality and may be of independent interest.

Summary

The paper establishes error bounds showing that spectral clustering can consistently recover communities in SBM when the maximum expected node degree scales logarithmically with the number of nodes.
The paper extends these results to degree-corrected models using spherical k-median spectral clustering to accommodate realistic degree heterogeneity.
The paper leverages sharper combinatorial spectral bounds over traditional concentration inequalities, offering robust insights for community recovery in sparse networks.

Consistency of Spectral Clustering in Stochastic Block Models

The paper "Consistency of Spectral Clustering in Stochastic Block Models" by Jing Lei and Alessandro Rinaldo provides an in-depth analysis of spectral clustering methods applied to stochastic block models (SBM). The authors focus particularly on establishing consistency results under varying sparsity conditions.

Summary of Contributions

Theoretical Guarantees for Spectral Clustering: The authors derive error bounds for spectral clustering within the SBM framework. They establish that spectral clustering can consistently recover latent communities even when the maximum expected node degree scales logarithmically with the number of nodes. This sparsity regime compares favorably with many existing analyses, which require denser networks for reliable recovery.
Extension to Degree-Corrected Models: A significant contribution is the extension of these results to degree-corrected stochastic block models (DCBM) using spherical $k$-median spectral clustering. This allows for the accommodation of degree heterogeneity within communities, which is a realistic aspect of many network datasets.
Spectral Perturbation and Bounding Techniques: The analysis utilizes a combinatorial bound on the spectrum of binary random matrices. This bound is sharper than traditional concentration inequalities, such as the matrix Bernstein inequality, offering potential utility beyond this paper.
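The procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`sample_sbm`, `kmeans`, `spectral_cluster`, `misclustering_rate`) and the toy connectivity matrix are hypothetical, "leading" eigenvectors are taken to mean largest in absolute eigenvalue, and plain Lloyd iterations stand in for the approximate $k$-means and spherical $k$-median steps that the paper analyzes.

```python
import itertools
import numpy as np

def sample_sbm(labels, B, rng):
    """Sample a symmetric adjacency matrix (no self-loops) with P[i, j] = B[labels[i], labels[j]]."""
    n = len(labels)
    P = B[np.ix_(labels, labels)]
    upper = np.triu(rng.random((n, n)) < P, k=1)  # independent edges for i < j
    return (upper | upper.T).astype(float)

def kmeans(X, K, n_iter=50):
    """Plain Lloyd iterations with farthest-point initialization (illustrative stand-in)."""
    centers = [X[0]]
    for _ in range(1, K):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for k in range(K):
            if np.any(assign == k):
                centers[k] = X[assign == k].mean(axis=0)
    return assign

def spectral_cluster(A, K, normalize_rows=False):
    """Cluster the rows of the K eigenvectors of A with largest absolute eigenvalues."""
    vals, vecs = np.linalg.eigh(A)
    X = vecs[:, np.argsort(np.abs(vals))[-K:]]
    if normalize_rows:
        # Spherical variant: project rows onto the unit sphere to discount degree effects.
        # Assumption: k-means is used afterwards here, whereas the paper uses k-median.
        norms = np.linalg.norm(X, axis=1, keepdims=True)
        X = X / np.where(norms > 0, norms, 1.0)
    return kmeans(X, K)

def misclustering_rate(truth, est, K):
    """Fraction of mislabeled nodes, minimized over relabelings of the K clusters."""
    return min(np.mean(np.array(perm)[est] != truth)
               for perm in itertools.permutations(range(K)))

rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 150)                 # two balanced communities, n = 300
B = np.array([[0.30, 0.05], [0.05, 0.30]])      # toy connectivity matrix (assumed values)
A = sample_sbm(labels, B, rng)

rate = misclustering_rate(labels, spectral_cluster(A, 2), 2)
rate_spherical = misclustering_rate(labels, spectral_cluster(A, 2, normalize_rows=True), 2)
print(f"misclustering rate: {rate:.3f} (plain), {rate_spherical:.3f} (row-normalized)")
```

The toy graph here is fairly dense and well separated, so both variants should recover the communities almost exactly. Scaling `B` down toward the $\log n / n$ regime is the interesting case the paper addresses, and a single simulated draw there says little about the theoretical guarantee.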

Numerical Results and Methodology

The paper rigorously addresses the conditions under which spectral clustering achieves successful community recovery. Key parameters include:

Number of communities ($K$)
Network sparsity (characterized by the scaling of $\alpha_n$ with respect to node count $n$)
Eigenvalue gap of the block connectivity matrix ($\gamma_n$)
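For orientation, the model behind these parameters can be written out as follows. This is the standard SBM formulation; the symbols $A$, $P$, $B$, $B_0$, and $g_i$ are notation assumed here, not taken from the summary above.

$$
A_{ij} \sim \mathrm{Bernoulli}(P_{ij}) \ \text{independently for } i < j, \qquad P_{ij} = B_{g_i g_j}, \qquad B = \alpha_n B_0,
$$

where $g_i \in \{1, \dots, K\}$ is the community of node $i$ and $B_0$ is a symmetric $K \times K$ matrix whose largest entry is 1. Under this scaling every expected degree is at most $n \alpha_n$, so the sparse regime discussed here corresponds to $n \alpha_n$ of order $\log n$.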

The authors provide a range of conditions for these parameters under which community recovery is not only possible but optimal:

When the relative difference in connectivity within and between communities is bounded away from zero
When the maximum expected node degree grows at least on the order of $\log n$
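To make these quantities concrete, the sketch below computes them for a toy connectivity matrix. It assumes the scaling $B = \alpha_n B_0$ with the largest entry of $B_0$ equal to 1, and it uses the smallest absolute eigenvalue of $B_0$ as the eigenvalue-gap quantity; the function name `sbm_parameters`, the dictionary keys, and the example values are hypothetical and chosen only for illustration.

```python
import numpy as np

def sbm_parameters(B, n):
    """Summarize sparsity and separation of a K x K connectivity matrix B for n nodes."""
    B = np.asarray(B, dtype=float)
    alpha_n = B.max()                    # overall sparsity scale (assumed convention)
    B0 = B / alpha_n                     # rescaled matrix with largest entry 1
    min_abs_eig = np.min(np.abs(np.linalg.eigvalsh(B0)))
    return {
        "K": B.shape[0],
        "alpha_n": alpha_n,
        "min_abs_eig_B0": min_abs_eig,   # zero would mean communities are not separable spectrally
        "degree_bound": n * alpha_n,     # upper bound on any expected degree
        "log_n": np.log(n),              # reference scale for the sparse regime
    }

params = sbm_parameters([[0.30, 0.05], [0.05, 0.30]], n=300)
for name, value in params.items():
    print(f"{name}: {value:.4g}")
```

Comparing `degree_bound` with `log_n` gives a rough sense of how far a given network is from the sparse boundary, while a `min_abs_eig_B0` close to zero indicates within- and between-community connectivity that is hard to tell apart.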

Implications and Future Directions

The findings have important implications for the practical application of spectral clustering in network analysis. Particularly:

Applicability to Sparse Networks: Demonstrating consistent results under sparsity broadens the applicability to realistic, large-scale networks where dense connections are improbable.
Potential for Broader Techniques: The combinatorial bounds introduced may inspire new approaches in related domains where matrix concentration inequalities are applied.

For future research directions, the authors suggest exploring extensions of their analysis to spectral clustering using the graph Laplacian. Regularization and normalizations within the Laplacian framework have shown empirical promise, suggesting a fruitful area for theoretical exploration.

The paper, while methodologically complex, provides a valuable contribution to our understanding of spectral clustering's capabilities and limitations under diverse network conditions. The combination of theoretical robustness and practical applicability makes it a noteworthy paper in network data analysis.