Bad Communities with High Modularity (1209.2678v3)
Abstract: In this paper we discuss some problematic aspects of Newman's modularity function QN. Given a graph G, the modularity of G can be written as QN = Qf -Q0, where Qf is the intracluster edge fraction of G and Q0 is the expected intracluster edge fraction of the null model, i.e., a randomly connected graph with same expected degree distribution as G. It follows that the maximization of QN must accomodate two factors pulling in opposite directions: Qf favors a small number of clusters and Q0 favors many balanced (i.e., with approximately equal degrees) clusters. In certain cases the Q0 term can cause overestimation of the true cluster number; this is the opposite of the well-known under estimation effect caused by the "resolution limit" of modularity. We illustrate the overestimation effect by constructing families of graphs with a "natural" community structure which, however, does not maximize modularity. In fact, we prove that we can always find a graph G with a "natural clustering" V of G and another, balanced clustering U of G such that (i) the pair (G; U) has higher modularity than (G; V) and (ii) V and U are arbitrarily different.