Abstract: Graph clustering is the task of dividing nodes into clusters so that edge density is higher within clusters than across them. A natural, classic, and popular statistical setting for evaluating solutions to this problem is the stochastic block model, also referred to as the planted partition model. In this paper we present a new algorithm, a convexified version of maximum likelihood, for graph clustering. We show that, in the classic stochastic block model setting, it outperforms existing methods by polynomial factors when the cluster size is allowed to have general scalings. In fact, it is within logarithmic factors of known lower bounds for spectral methods, and there is evidence suggesting that no polynomial-time algorithm would do significantly better. We then show that this guarantee carries over to a more general extension of the stochastic block model. Our method can handle semi-random graphs, heterogeneous degree distributions, unequal cluster sizes, unaffiliated nodes, partially observed graphs, and planted clique/coloring models. In particular, our results provide the best exact recovery guarantees to date for the planted partition, planted k-disjoint-cliques, and planted noisy coloring models with general cluster sizes; in other settings, we match the best existing results up to logarithmic factors.
The paper introduces a convex optimization algorithm that refines maximum likelihood estimation for clustering nodes in sparse and heterogeneous graphs.
It employs nuclear norm regularization to relax the combinatorial constraints, enabling effective clustering under both the SBM and its generalized variants.
The theoretical results show polynomial-factor improvements over prior guarantees, supporting robust community detection in applications such as social network analysis.
Improved Graph Clustering: A Convex Optimization Approach
This paper tackles the complex problem of graph clustering within the framework of the stochastic block model (SBM) and its extensions. The authors introduce a convex optimization algorithm for graph clustering, which enhances maximum likelihood estimation via a relaxation to address challenges posed by sparse graphs and heterogeneous degree distributions.
The proposed method recasts the SBM problem as a convex relaxation of the maximum likelihood estimator. Specifically, the optimization task aims to partition nodes into clusters so that intra-cluster edge density exceeds inter-cluster edge density. The solution uses nuclear norm regularization to promote low-rank solutions, replacing the combinatorial constraint that the estimate be an exact cluster matrix, which would otherwise make the problem intractable. The algorithm's novelty lies in its capacity to handle various realistic graph characteristics: sparsity, non-uniform degree distributions, variable cluster sizes, and the presence of unaffiliated nodes.
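The relaxation described above can be sketched numerically. The snippet below is a minimal illustration, not the paper's exact weighted program: it minimizes a nuclear-norm surrogate plus a linear disagreement penalty over a relaxed cluster matrix Y (Y_ij ≈ 1 when nodes i and j share a cluster), using proximal gradient steps with singular value thresholding. The function names, step size, and penalty weight `lam` are illustrative choices.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the proximal operator of tau * (nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def convex_cluster(A, lam=2.0, step=0.2, iters=100):
    """Relaxed cluster-matrix MLE sketch: min ||Y||_* + lam * <1 - 2A, Y>, 0 <= Y <= 1."""
    G = lam * (1.0 - 2.0 * A)          # gradient of the linear disagreement term
    Y = A.astype(float).copy()
    for _ in range(iters):
        Y = svt(Y - step * G, step)    # gradient step, then nuclear-norm prox
        Y = np.clip(Y, 0.0, 1.0)       # project back onto the box constraints
        np.fill_diagonal(Y, 1.0)       # each node belongs to its own cluster
    return Y

# Two planted 3-node clusters: Y should recover the block structure of A.
A = np.zeros((6, 6)); A[:3, :3] = 1; A[3:, 3:] = 1
Y = convex_cluster(A)
```

In this noiseless toy case the relaxation is tight and Y converges to the exact cluster matrix; in noisy regimes one would round Y (for example, by thresholding its entries) to extract the clusters.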
The theoretical results demonstrate that, in the traditional SBM setting, this approach achieves polynomial improvements over existing methods. It comes within logarithmic factors of known lower bounds for spectral techniques, and there is evidence that no polynomial-time algorithm can do substantially better. The convex program guarantees optimality under weaker conditions than many precursors, covering high-dimensional regimes where cluster sizes grow as a function of the number of nodes, n.
In more general graph settings—the generalized stochastic block model (GSBM)—the method proves robust. It accommodates semi-random perturbations, common in real datasets where purely random models fall short. Here, the algorithm's guarantees are rivaled by few others, allowing significant heterogeneity in node degree, cluster size, and edge probabilities. It extends naturally to special cases including planted partition models, disjoint cliques, and even planted coloring problems, reflecting its versatility.
The strong theoretical results illustrate its effectiveness on sparse and noisy graphs, where the edge probabilities p and q (within and across clusters, respectively) may differ by polynomial factors. The analysis comprehensively defines performance regimes, relating p, q, the minimum cluster size K, and the number of unaffiliated nodes to success conditions, yielding more lenient requirements than those in prior literature. In particular, the algorithm is effective for cluster sizes as small as K = Ω(√n) up to logarithmic factors, aligning with conjectured polynomial-time barriers for recovering smaller planted cliques.
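To make the quantities p, q, and K concrete, the following sketch samples a planted partition graph and measures the within/across density gap that the success conditions depend on; all parameter values here are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, p, q = 60, 20, 0.7, 0.1                  # three clusters of size K = 20
labels = np.repeat(np.arange(n // K), K)       # ground-truth cluster labels
same = labels[:, None] == labels[None, :]      # True where i and j share a cluster
upper = np.triu(rng.random((n, n)) < np.where(same, p, q), 1)
A = (upper | upper.T).astype(int)              # symmetric adjacency, no self-loops
off_diag = ~np.eye(n, dtype=bool)
within = A[same & off_diag].mean()             # empirical intra-cluster density, near p
across = A[~same].mean()                       # empirical inter-cluster density, near q
```

Exact recovery of the planted labels is possible precisely when this density gap is large enough relative to K and n, which is what the paper's conditions quantify.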
The significance spans diverse applications: detecting community structure in sparse social networks, segmenting users or nodes when traditional clustering is unreliable, and accommodating dynamic graph structures. Moreover, the approach inherently handles partially observed graphs. Despite the increased computational demands of solving a semidefinite program (SDP), the results highlight a trade-off between computational efficiency and clustering fidelity, suggesting a balance suited to challenging graph environments.
Future developments in this domain could explore faster implementations, advanced rounding methods for continuous solutions, and adaptations to evolving data scenarios. Extending the theoretical guarantees to more complex or non-standard graph models will likely reveal further applications in network science and beyond. This paper marks a milestone in graph clustering, combining robust theoretical guarantees with empirical success across varied scenarios.