Determining the Number of Communities in Sparse and Imbalanced Settings
Abstract: Community structures represent a crucial aspect of network analysis, and various methods have been developed to identify these communities. However, a common hurdle lies in determining the number of communities K, a parameter that often requires estimation in practice. Existing approaches for estimating K face two notable challenges: the weak community signal present in sparse networks, and the imbalance in community sizes or edge densities that results in unequal per-community expected degrees. We propose a spectral method based on a novel network operator whose spectral properties effectively overcome both challenges. This operator is a refined version of the non-backtracking operator, adapted from a "centered" adjacency matrix. Its leading eigenvalues are more concentrated than those of the adjacency matrix for sparse networks, while they also demonstrate enhanced signal under imbalance scenarios, a benefit attributed to the centering step. These properties are justified, either theoretically or numerically, under the null model K = 1, in both dense and ultra-sparse settings. A goodness-of-fit test based on the leading eigenvalue can be applied to determine the number of communities K.
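To make the two ingredients named in the abstract concrete, the following is a minimal sketch of the standard (Hashimoto) non-backtracking matrix and of one simple way to center an adjacency matrix under the null model K = 1. It does not reproduce the refined operator proposed in the paper, whose exact definition is not given in the abstract. The function names, the dense NumPy representation, and the choice of centering by the estimated edge density of a homogeneous random graph are all assumptions made for illustration.

```python
import numpy as np


def non_backtracking_matrix(A):
    """Standard (Hashimoto) non-backtracking matrix of an undirected graph.

    Rows and columns are indexed by directed edges (u, v). The entry for
    (u, v) -> (v, w) is 1 when w != u, i.e. the walk does not turn back.
    Illustrative only: this is not the paper's refined operator.
    """
    n = A.shape[0]
    # Two directed edges for every undirected edge; self-loops are ignored.
    edges = [(u, v) for u in range(n) for v in range(n) if u != v and A[u, v]]
    index = {e: i for i, e in enumerate(edges)}
    B = np.zeros((len(edges), len(edges)))
    for (u, v), i in index.items():
        for w in range(n):
            if w != u and w != v and A[v, w]:
                B[i, index[(v, w)]] = 1.0
    return B


def centered_adjacency(A):
    """Subtract the estimated null-model mean from A (assumed centering).

    Under K = 1 with a single edge probability p, the off-diagonal mean of
    A is p. Here p is estimated by the observed edge density.
    """
    n = A.shape[0]
    p_hat = A.sum() / (n * (n - 1))  # A.sum() counts each edge twice
    return A - p_hat * (np.ones((n, n)) - np.eye(n))


def leading_eigenvalue_modulus(M):
    """Largest eigenvalue modulus of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))


if __name__ == "__main__":
    # Complete graph on 4 nodes: every node has degree 3.
    A = np.ones((4, 4)) - np.eye(4)
    B = non_backtracking_matrix(A)
    print(B.shape, leading_eigenvalue_modulus(B))
    print(centered_adjacency(A))
```

For a d-regular graph the leading eigenvalue of the non-backtracking matrix is d - 1, so the complete graph on four nodes gives a 12 x 12 matrix with leading eigenvalue 2, and its centered adjacency matrix is identically zero because the estimated density equals 1. For sparse graphs of realistic size, a sparse matrix format and an iterative eigensolver would be a more practical choice than the dense construction used here.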