Finding Endogenously Formed Communities (1201.4899v2)
Abstract: A central problem in e-commerce is determining overlapping communities among individuals or objects in the absence of external identification or tagging. We address this problem by introducing a framework that captures the notion of communities or clusters determined by the relative affinities among their members. To this end, we define what we call an affinity system, which is a set of elements, each with a vector characterizing its preference for all other elements in the set. We define a natural notion of (potentially overlapping) communities in an affinity system, in which the members of a given community collectively prefer each other to anyone else outside the community. Thus these communities are endogenously formed in the affinity system and are "self-determined" or "self-certified" by their members. We provide a tight polynomial bound on the number of self-determined communities as a function of the robustness of the community. We present a polynomial-time algorithm for enumerating these communities. Moreover, we obtain a local algorithm with a strong stochastic performance guarantee that can find a community in time nearly linear in the size of the community. Social networks fit particularly naturally within the affinity system framework -- if we can appropriately extract the affinities from the relatively sparse yet rich information in social networks, our analysis then yields a set of efficient algorithms for enumerating self-determined communities in social networks. In the context of social networks, we also connect our analysis with results about $(\alpha,\beta)$-clusters introduced by Mishra, Schreiber, Stanton, and Tarjan \cite{msst}. In contrast with the polynomial bound we prove on the number of communities in the affinity system model, we show that there exists a family of networks with a superpolynomial number of $(\alpha,\beta)$-clusters.
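
Illustrative sketch (not from the paper): the abstract does not spell out the formal test for "collectively prefer each other", so the Python below assumes one plausible reading. Each member of a candidate set S votes for its |S| most preferred elements, ranking itself first, and S is accepted when every member receives strictly more votes than every non-member. The voting rule, the function names `top_k`, `is_self_determined`, and `brute_force_communities`, and the toy affinity values are all assumptions made for illustration. The brute-force enumeration is exponential and is not the polynomial-time algorithm the abstract refers to.

```python
from itertools import combinations

def top_k(affinity, voter, k):
    """Return the voter's k most preferred elements.

    Assumption: the voter ranks itself first, then the others by
    decreasing affinity (ties broken by insertion order).
    """
    others = sorted(
        (x for x in affinity if x != voter),
        key=lambda x: affinity[voter].get(x, 0.0),
        reverse=True,
    )
    return [voter] + others[: k - 1]

def is_self_determined(affinity, community):
    """Check the assumed voting rule for a candidate community."""
    k = len(community)
    votes = {x: 0 for x in affinity}
    for member in community:
        for choice in top_k(affinity, member, k):
            votes[choice] += 1
    inside = min(votes[x] for x in community)
    outside = max(
        (votes[x] for x in affinity if x not in community), default=-1
    )
    # Every member must strictly out-poll every outsider.
    return inside > outside

def brute_force_communities(affinity, size):
    """Exponential-time enumeration, for tiny examples only."""
    return [
        set(c)
        for c in combinations(sorted(affinity), size)
        if is_self_determined(affinity, set(c))
    ]

# Hypothetical affinity system: each element maps to its preference
# vector over the other elements.
affinity = {
    "a": {"b": 0.9, "c": 0.8, "d": 0.1, "e": 0.0},
    "b": {"a": 0.9, "c": 0.7, "d": 0.2, "e": 0.1},
    "c": {"a": 0.8, "b": 0.7, "d": 0.3, "e": 0.1},
    "d": {"e": 0.9, "a": 0.1, "b": 0.1, "c": 0.2},
    "e": {"d": 0.9, "a": 0.0, "b": 0.1, "c": 0.1},
}

print(is_self_determined(affinity, {"a", "b", "c"}))
print(is_self_determined(affinity, {"a", "d"}))
print(brute_force_communities(affinity, 3))
```

In this toy system the sets {a, b, c} and {d, e} pass the assumed test, while a mixed set such as {a, d} does not, because its members split their votes between the two groups. Communities found this way may overlap: {a, b} also passes.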