Approximate Closest Community Search in Networks (1505.05956v2)

Published 22 May 2015 in cs.SI and cs.DB

Abstract: Recently, there has been significant interest in the study of the community search problem in social and information networks: given one or more query nodes, find densely connected communities containing the query nodes. However, most existing studies do not address the "free rider" issue, that is, nodes far away from query nodes and irrelevant to them are included in the detected community. Some state-of-the-art models have attempted to address this issue, but not only are their formulated problems NP-hard, they do not admit any approximations without restrictive assumptions, which may not always hold in practice. In this paper, given an undirected graph G and a set of query nodes Q, we study community search using the k-truss based community model. We formulate our problem of finding a closest truss community (CTC), as finding a connected k-truss subgraph with the largest k that contains Q, and has the minimum diameter among such subgraphs. We prove this problem is NP-hard. Furthermore, it is NP-hard to approximate the problem within a factor $(2-\varepsilon)$, for any $\varepsilon >0 $. However, we develop a greedy algorithmic framework, which first finds a CTC containing Q, and then iteratively removes the furthest nodes from Q, from the graph. The method achieves 2-approximation to the optimal solution. To further improve the efficiency, we make use of a compact truss index and develop efficient algorithms for k-truss identification and maintenance as nodes get eliminated. In addition, using bulk deletion optimization and local exploration strategies, we propose two more efficient algorithms. One of them trades some approximation quality for efficiency while the other is a very efficient heuristic. Extensive experiments on 6 real-world networks show the effectiveness and efficiency of our community model and search algorithms.

Summary

The paper introduces the Closest Truss Community model using a k-truss framework to efficiently identify communities around query nodes while mitigating free riders.
It proves that the CTC problem is NP-hard, and NP-hard to approximate within a factor of $(2-\varepsilon)$, and gives a greedy 2-approximation algorithm that is sped up with a truss index, bulk deletion, and local exploration.
Experiments on 6 real-world networks demonstrate the effectiveness and efficiency of the community model and the search algorithms.

Insights into Approximate Closest Community Search in Networks

The paper "Approximate Closest Community Search in Networks" addresses a central problem in the analysis of social and information networks: efficiently finding densely connected communities that contain specified query nodes while avoiding the "free rider" problem, in which nodes far from the query nodes and irrelevant to them end up in the detected community. The authors introduce an approach based on $k$-truss subgraphs that characterizes a community not only by its density but also by its proximity to the query nodes.

Methodology and Problem Definition

The authors propose the Closest Truss Community (CTC) model, built on the $k$-truss. A $k$-truss is a subgraph in which every edge participates in at least $(k-2)$ triangles within that subgraph. Given an undirected graph $G$ and a set of query nodes $Q$, the CTC problem seeks a connected $k$-truss that contains $Q$, has the largest possible $k$, and has the minimum diameter among all such subgraphs. Maximizing $k$ enforces density, while minimizing the diameter keeps the community close to the query nodes and excludes distant, irrelevant nodes, the so-called "free riders."
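The $k$-truss condition can be checked directly by counting, for each edge, the common neighbors of its endpoints. The following is a minimal Python sketch, assuming a simple undirected graph given as an edge list with comparable node labels. The helper names (`build_adjacency`, `edge_support`, `is_k_truss`) are illustrative and do not come from the paper.

```python
from itertools import combinations

def build_adjacency(edges):
    """Adjacency sets for a simple undirected graph (assumed: no self-loops)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def edge_support(adj):
    """Map each edge (u, v) with u < v to the number of triangles containing it."""
    support = {}
    for u in adj:
        for v in adj[u]:
            if u < v:
                # Every common neighbor w closes one triangle u-v-w.
                support[(u, v)] = len(adj[u] & adj[v])
    return support

def is_k_truss(edges, k):
    """True if every edge lies in at least k - 2 triangles of the given graph."""
    adj = build_adjacency(edges)
    return all(s >= k - 2 for s in edge_support(adj).values())

# Illustrative data: the complete graph K4, then K4 plus a pendant edge.
clique4 = list(combinations(range(4), 2))
with_tail = clique4 + [(3, 4)]

print(is_k_truss(clique4, 4))    # each K4 edge lies in two triangles
print(is_k_truss(with_tail, 3))  # the pendant edge lies in no triangle
```

In this example the 4-clique qualifies as a 4-truss, while adding a single pendant edge leaves a graph that is only a 2-truss, because that edge is in no triangle.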

The authors prove that the CTC problem is NP-hard and that it is NP-hard to approximate within a factor of $(2-\varepsilon)$ for any $\varepsilon > 0$. They then present a greedy algorithm that achieves a 2-approximation of the optimal solution, which matches this hardness bound. The algorithm first finds a maximal connected $k$-truss containing the query nodes, and then iteratively removes the nodes furthest from the query nodes, restoring the $k$-truss property after each removal.
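The greedy framework can be sketched in a few dozen lines of Python. This is an illustrative reading of the description above, not the authors' implementation: it assumes the input is already a connected $k$-truss containing the query nodes, measures a node's distance to $Q$ as its largest hop distance to any query node, breaks ties arbitrarily, and restores the truss property with a naive fixpoint loop rather than an efficient maintenance routine. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def bfs_dist(adj, src):
    """Hop distances from src to every node reachable from it."""
    dist = {src: 0}
    frontier = deque([src])
    while frontier:
        x = frontier.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                frontier.append(y)
    return dist

def query_component(adj, query):
    """Copy of the component holding all query nodes, or None if Q is split."""
    if any(q not in adj for q in query):
        return None
    reach = bfs_dist(adj, query[0])
    if any(q not in reach for q in query):
        return None
    return {u: set(adj[u]) for u in reach}

def maintain_truss(adj, k):
    """Naive fixpoint: drop edges in fewer than k - 2 triangles, then isolated nodes."""
    changed = True
    while changed:
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                if len(adj[u] & adj[v]) < k - 2:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True
    for u in [u for u in adj if not adj[u]]:
        del adj[u]

def greedy_ctc(adj, query, k):
    """Return the intermediate k-truss with the smallest query distance."""
    g = query_component(adj, query)
    best, best_d = None, float("inf")
    while g is not None:
        per_query = [bfs_dist(g, q) for q in query]
        qdist = {v: max(d[v] for d in per_query) for v in g}
        far = max(qdist, key=qdist.get)          # a furthest node from Q
        if qdist[far] < best_d:                  # remember the best graph so far
            best_d = qdist[far]
            best = {u: set(nbrs) for u, nbrs in g.items()}
        for n in g.pop(far):                     # delete the node and its edges
            g[n].discard(far)
        maintain_truss(g, k)
        g = query_component(g, query)            # stop once Q is disconnected
    return best, best_d

# Illustrative data: two 4-cliques sharing node 3, queried at node 0.
edges = list(combinations([0, 1, 2, 3], 2)) + list(combinations([3, 4, 5, 6], 2))
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

community, d = greedy_ctc(adj, [0], 4)
print(sorted(community), d)
```

On this toy graph the whole input is a connected 4-truss, but the second clique is a free rider with respect to node 0. The sketch returns the clique on nodes 0 to 3 with query distance 1.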

Algorithmic Solutions and Implementation

To identify the initial $k$-truss, the authors develop an efficient algorithm that exploits a compact truss index to guide the expansion of a connected $k$-truss from the query nodes. The approach finds the maximal connected truss with the largest $k$ using existing truss decomposition techniques. During node deletions, an efficient maintenance algorithm integrated into the greedy framework removes edges that no longer satisfy the truss condition, so that the truss property is preserved after each update.
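As a rough illustration of this step, the sketch below computes the trussness of every edge by standard peeling and then searches from the highest level downward for a connected $k$-truss that contains all query nodes. It deliberately does not reproduce the paper's compact truss index or its index-guided expansion. It assumes a simple undirected graph, and the function names are hypothetical.

```python
from itertools import combinations

def truss_decomposition(edges):
    """Trussness of each edge: the largest k such that the edge is in a k-truss."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    sup = {frozenset((u, v)): len(adj[u] & adj[v]) for u in adj for v in adj[u]}
    truss, k = {}, 2
    while sup:
        queue = [e for e, s in sup.items() if s <= k - 2]
        if not queue:
            k += 1                       # nothing left to peel at this level
            continue
        while queue:
            e = queue.pop()
            if e not in sup:             # already peeled via a duplicate entry
                continue
            u, v = tuple(e)
            for w in adj[u] & adj[v]:    # each triangle u-v-w loses edge e
                for f in (frozenset((u, w)), frozenset((v, w))):
                    sup[f] -= 1
                    if sup[f] <= k - 2:
                        queue.append(f)
            adj[u].discard(v)
            adj[v].discard(u)
            del sup[e]
            truss[e] = k
    return truss

def max_connected_truss(edges, query):
    """Largest k, with its edges, for a connected k-truss containing the query."""
    truss = truss_decomposition(edges)
    for k in sorted(set(truss.values()), reverse=True):
        adj = {}
        for e, t in truss.items():
            if t >= k:
                u, v = tuple(e)
                adj.setdefault(u, set()).add(v)
                adj.setdefault(v, set()).add(u)
        if query[0] not in adj:
            continue
        seen, stack = {query[0]}, [query[0]]
        while stack:                     # depth-first search from one query node
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        if all(q in seen for q in query):
            return k, {e for e, t in truss.items() if t >= k and e <= seen}
    return None

# Illustrative data: two 4-cliques joined by the bridge edge (3, 4).
edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2)) + [(3, 4)]
k_near, edges_near = max_connected_truss(edges, [0, 2])
k_far, edges_far = max_connected_truss(edges, [0, 5])
print(k_near, len(edges_near), k_far, len(edges_far))
```

When both query nodes sit in one clique, the search stops at $k = 4$ with that clique's six edges. When they sit in different cliques, only the 2-truss (the whole graph) connects them, because the bridge edge lies in no triangle.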

Further, the authors speed up the process with a bulk deletion strategy: nodes are removed in batches rather than one at a time, which reduces the number of iterations and truss maintenance passes at the cost of some approximation quality. They also propose a second, faster algorithm based on local exploration, which is a heuristic. Both techniques target the scalability needed to analyze large real-world networks.
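The effect of batching can be seen in a small Python sketch. The batching rule used here, removing every node whose distance to the query nodes equals the current maximum, is an assumption chosen for illustration, and the paper's exact threshold may differ. Truss maintenance after each batch is omitted, and `furthest_batch` and `delete_nodes` are hypothetical names.

```python
from collections import deque
from itertools import combinations

def hop_distances(adj, src):
    """Breadth-first hop distances from src."""
    dist = {src: 0}
    frontier = deque([src])
    while frontier:
        x = frontier.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                frontier.append(y)
    return dist

def furthest_batch(adj, query):
    """Return (d, batch): the current query distance d and all nodes attaining it."""
    per_query = [hop_distances(adj, q) for q in query]
    inf = float("inf")
    qdist = {v: max(d.get(v, inf) for d in per_query) for v in adj}
    d = max(qdist.values())
    return d, {v for v, x in qdist.items() if x == d}

def delete_nodes(adj, batch):
    """Remove a whole batch of nodes and their incident edges in one pass."""
    for u in batch:
        for n in adj.pop(u, set()):
            if n in adj:                 # the neighbor may already be deleted
                adj[n].discard(u)

# Illustrative data: two 4-cliques sharing node 3, queried at node 0.
edges = list(combinations([0, 1, 2, 3], 2)) + list(combinations([3, 4, 5, 6], 2))
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

d0, batch0 = furthest_batch(adj, [0])
delete_nodes(adj, batch0)
d1, batch1 = furthest_batch(adj, [0])
print(d0, sorted(batch0), d1, sorted(batch1))
```

Here one batch removes all three nodes at distance 2, where node-at-a-time deletion would need up to three separate rounds of distance computation.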

Evaluation and Experimentation

The paper validates the proposed algorithms empirically on six real-world networks, demonstrating both effectiveness and efficiency. The experiments show that the CTC model avoids free riders and improves on existing approaches in community cohesiveness, as measured by density and diameter.
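For reference, both metrics are straightforward to compute on a candidate community. The sketch below assumes that density means edge density, $2m / (n(n-1))$ for $n$ nodes and $m$ edges, and that the community is connected. The paper's exact definitions may differ, and the function names are illustrative.

```python
from collections import deque
from itertools import combinations

def edge_density(adj):
    """Fraction of possible node pairs that are joined by an edge."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2   # each edge is counted twice
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0

def diameter(adj):
    """Longest shortest-path length; assumes the graph is connected."""
    longest = 0
    for src in adj:
        dist = {src: 0}
        frontier = deque([src])
        while frontier:
            x = frontier.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    frontier.append(y)
        longest = max(longest, max(dist.values()))
    return longest

def to_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

# Illustrative data: one 4-clique versus two 4-cliques sharing node 3.
tight = to_adjacency(combinations([0, 1, 2, 3], 2))
loose = to_adjacency(list(combinations([0, 1, 2, 3], 2)) + list(combinations([3, 4, 5, 6], 2)))

print(edge_density(tight), diameter(tight))
print(round(edge_density(loose), 3), diameter(loose))
```

Both graphs are 4-trusses, yet the single clique is denser and has a smaller diameter, which is the kind of difference these metrics are meant to expose.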

Implications and Future Directions

The main contribution is to combine dense subgraph mining with a proximity criterion. Requiring the community to be both a high-order truss and close to the query nodes makes the detected structure more relevant to those nodes than density alone would.

The model has potential benefits for a wide range of applications, including personalized recommendation systems, community discovery in protein interaction networks, and beyond. Future work could extend these concepts to directed and weighted networks. Another promising direction would be exploring the intersection of probabilistic networks with truss-based models, which would involve addressing additional layers of computational complexity.

In summary, the paper offers a community model that balances density and proximity in community search, together with approximation and heuristic algorithms that make the model practical on large real-world networks.
