A Survey of Community Search Over Big Graphs (1904.12539v2)

Published 29 Apr 2019 in cs.DB

Abstract: With the rapid development of information technologies, various big graphs are prevalent in many real applications (e.g., social media and knowledge bases). An important component of these graphs is the network community. Essentially, a community is a group of vertices which are densely connected internally. Community retrieval can be used in many real applications, such as event organization, friend recommendation, and so on. Consequently, how to efficiently find high-quality communities from big graphs is an important research topic in the era of big data. Recently a large group of research works, called community search, have been proposed. They aim to provide efficient solutions for searching high-quality communities from large networks in real-time. Nevertheless, these works focus on different types of graphs and formulate communities in different manners, and thus it is desirable to have a comprehensive review of these works. In this survey, we conduct a thorough review of existing community search works. Moreover, we analyze and compare the quality of communities under their models, and the performance of different solutions. Furthermore, we point out new research directions. This survey does not only help researchers to have a better understanding of existing community search solutions, but also provides practitioners a better judgment on choosing the proper solutions.

Summary

The paper surveys methodologies and cohesiveness metrics ($k$-core, $k$-truss, $k$-clique, $k$-ECC) for efficiently finding dense communities within large-scale graphs.
It compares these metrics based on computational efficiency (e.g., $k$-core's $O(m+n)$ vs. $k$-truss's $O(m^{1.5})$), cohesiveness levels, and suitability for different graph types.
The survey highlights applications in social networks and bioinformatics, discusses challenges like scalability, and suggests future work on complex graphs and multi-cohesiveness integration.

Understanding Community Search Over Big Graphs: A Survey Review

The paper "A Survey of Community Search Over Big Graphs" gives a broad overview of community search in large-scale graphs. It systematically classifies and evaluates existing methods for efficiently retrieving communities, that is, densely connected subgroups of vertices, from big graphs. Communities are a key structural component of many real-world networks, such as social media and biological interaction networks. As big data applications demand ever more accurate and efficient community search, the paper serves as a useful reference for researchers and practitioners in the field.

The paper explores community search methods for both simple and attributed graphs, organized around four cohesiveness metrics: $k$-core, $k$-truss, $k$-clique, and $k$-edge connected component ($k$-ECC). Each metric defines communities under different structural constraints and with different performance trade-offs, and together they frame the survey's comparative analysis.
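To make the four definitions concrete, here is a minimal Python sketch, assuming a small undirected graph stored as a dict that maps each vertex to a set of neighbours. The helper names (`induced`, `satisfies_k_core`, `is_k_edge_connected`, and so on) are hypothetical and not taken from the survey; they only test whether a given subgraph meets each condition, not whether it is maximal. The $k$-ECC check is brute force, so it is only practical for tiny graphs.

```python
from itertools import combinations


def induced(graph, nodes):
    """Subgraph of `graph` induced by `nodes` (adjacency dict of sets)."""
    nodes = set(nodes)
    return {v: graph[v] & nodes for v in nodes}


def edges(graph):
    """Each undirected edge once, as a (smaller, larger) pair."""
    return sorted((u, v) for u in graph for v in graph[u] if u < v)


def is_connected(graph):
    """True if every vertex is reachable from an arbitrary start vertex."""
    if not graph:
        return False
    start = next(iter(graph))
    seen, stack = {start}, [start]
    while stack:
        for w in graph[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(graph)


def satisfies_k_core(sub, k):
    # Every vertex has at least k neighbours inside the subgraph.
    return all(len(nbrs) >= k for nbrs in sub.values())


def satisfies_k_truss(sub, k):
    # Every edge lies in at least k - 2 triangles inside the subgraph.
    return all(len(sub[u] & sub[v]) >= k - 2 for u, v in edges(sub))


def is_k_clique(sub, k):
    # Exactly k vertices, each adjacent to the other k - 1 (no self-loops assumed).
    return len(sub) == k and all(len(nbrs) == k - 1 for nbrs in sub.values())


def is_k_edge_connected(sub, k):
    # Brute force: the subgraph must stay connected after deleting any k - 1 edges.
    # Exponential in k, so only suitable for tiny illustrative graphs.
    es = edges(sub)
    for removed in combinations(es, min(k - 1, len(es))):
        trimmed = {v: set(nbrs) for v, nbrs in sub.items()}
        for u, v in removed:
            trimmed[u].discard(v)
            trimmed[v].discard(u)
        if not is_connected(trimmed):
            return False
    return True


# Toy graph: a 4-clique {0, 1, 2, 3} plus a triangle {3, 4, 5} sharing vertex 3.
G = {
    0: {1, 2, 3},
    1: {0, 2, 3},
    2: {0, 1, 3},
    3: {0, 1, 2, 4, 5},
    4: {3, 5},
    5: {3, 4},
}
K4 = induced(G, {0, 1, 2, 3})
print(satisfies_k_core(K4, 3), satisfies_k_truss(K4, 4),
      is_k_clique(K4, 4), is_k_edge_connected(K4, 3))  # True True True True
print(satisfies_k_core(G, 3), satisfies_k_truss(G, 4),
      is_k_edge_connected(G, 3))  # False False False
```

On this toy graph, the 4-clique meets all four conditions, while the whole graph meets the 2-core, 3-truss, and 2-edge-connectivity conditions but fails each of them at the next level, which shows how the metrics grade the same structure differently.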

Metrics and Community Search Approaches

$k$-Core-Based Approaches: A $k$-core is a maximal subgraph in which every vertex has at least $k$ neighbors within the subgraph. The metric is favored because the $k$-core can be computed in linear time ($O(m+n)$ for a graph with $n$ vertices and $m$ edges), but it offers weaker structural cohesiveness than the other metrics. The survey discusses advances in $k$-core algorithms, from global and local search methods to index-based solutions such as ShellStruct, which address scalability and personalized query processing.
$k$-Truss-Based Approaches: A $k$-truss strengthens the $k$-core idea using triangles: it is a subgraph in which every edge is contained in at least $k-2$ triangles within the subgraph, so every $k$-truss is also a $(k-1)$-core. Despite its higher computational cost ($O(m^{1.5})$ time), the $k$-truss yields more cohesive communities, making it suitable for applications that prioritize dense connectivity. Indices such as EquiTruss further speed up queries by precomputing a compact summary of triangle connectivity.
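$k$-Clique and Variants: These models build communities from complete subgraphs. In $k$-clique percolation, for example, a community is a union of adjacent $k$-cliques (cliques sharing $k-1$ vertices), so a vertex can naturally belong to several overlapping communities. However, clique problems such as finding a maximum clique are NP-hard, so these methods often rely on approximation strategies and sophisticated indexing to reduce computational cost.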
$k$-ECC-Based Methods: These methods are based on edge connectivity: a $k$-edge connected component is a maximal subgraph that remains connected after the removal of any $k-1$ edges. Such models capture how robustly a community holds together, and the surveyed algorithms handle dynamic graph updates efficiently. However, they still need better support for attributed graphs and finer-grained handling of graph dynamics.
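The peeling idea behind the global search approaches can be sketched as follows, assuming the same dict-of-sets graph representation with comparable vertex labels. `k_core_community` repeatedly removes vertices whose remaining degree is below $k$ and returns the connected component containing the query vertex $q$; each vertex is removed once and each edge is processed a constant number of times, giving $O(m+n)$ time. `k_truss_community` peels edges that lie in fewer than $k-2$ triangles. As a simplifying assumption, it returns the ordinary connected component containing $q$ and does not enforce the triangle connectivity used by truss community models such as EquiTruss. All function names here are hypothetical.

```python
from collections import deque


def _component(adj, q):
    """Vertices reachable from q by BFS in the adjacency dict `adj`."""
    comm, frontier = {q}, deque([q])
    while frontier:
        v = frontier.popleft()
        for w in adj[v]:
            if w not in comm:
                comm.add(w)
                frontier.append(w)
    return comm


def k_core_community(graph, q, k):
    """Connected k-core containing q, found by peeling vertices of degree < k."""
    deg = {v: len(nbrs) for v, nbrs in graph.items()}
    removed = set()
    queue = deque(v for v, d in deg.items() if d < k)
    while queue:
        v = queue.popleft()
        if v in removed:
            continue  # a vertex may be queued more than once
        removed.add(v)
        for w in graph[v]:
            if w not in removed:
                deg[w] -= 1
                if deg[w] < k:
                    queue.append(w)
    if q in removed:
        return set()
    alive = {v: graph[v] - removed for v in graph if v not in removed}
    return _component(alive, q)


def _edge(u, v):
    """Canonical key for the undirected edge {u, v}."""
    return (u, v) if u < v else (v, u)


def k_truss_community(graph, q, k):
    """Connected part of the k-truss containing q (triangle connectivity not enforced)."""
    adj = {v: set(nbrs) for v, nbrs in graph.items()}
    # Support of an edge = number of triangles that contain it.
    support = {(u, v): len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v}
    queue = deque(e for e, s in support.items() if s < k - 2)
    while queue:
        u, v = queue.popleft()
        if v not in adj[u]:
            continue  # edge already peeled
        for w in adj[u] & adj[v]:
            # Removing (u, v) destroys triangle (u, v, w): the other two edges lose support.
            for e in (_edge(u, w), _edge(v, w)):
                support[e] -= 1
                if support[e] < k - 2:
                    queue.append(e)
        adj[u].discard(v)
        adj[v].discard(u)
    if not adj[q]:
        return set()
    return _component(adj, q)


# Toy graph: a 4-clique {0, 1, 2, 3} plus a triangle {3, 4, 5} sharing vertex 3.
G = {
    0: {1, 2, 3},
    1: {0, 2, 3},
    2: {0, 1, 3},
    3: {0, 1, 2, 4, 5},
    4: {3, 5},
    5: {3, 4},
}
print(sorted(k_core_community(G, 0, 3)))   # [0, 1, 2, 3]
print(sorted(k_truss_community(G, 0, 4)))  # [0, 1, 2, 3]
```

With $k=3$ for the core (or $k=4$ for the truss), vertices 4 and 5 are peeled away, so only the 4-clique remains as the community of vertex 0, and querying vertex 4 at those levels returns an empty set.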

Comparative Analysis and Future Directions

The paper compares community search methods in terms of scalability, cohesiveness, algorithmic complexity, and adaptability to dynamic graphs, across several graph types, including keyword-based, location-based, and temporal graphs. It shows how different metrics suit different application needs and identifies common challenges such as scalability, parameter suggestion (for example, choosing $k$), and multi-cohesiveness.

For future work, the authors propose extending community search (CS) algorithms to more complex network structures, including uncertain, signed, and multidimensional graphs. They also suggest integrating several cohesiveness metrics in multi-attributed settings to capture distinct community semantics.

Implications and Practical Applications

Beyond strengthening the theoretical understanding of community search algorithms, the survey has practical implications in domains such as online social networking, bioinformatics (for example, identifying protein complexes), and e-commerce targeting. Given its breadth, the paper is likely to serve as a foundational guide for subsequent research on community identification methods.

Overall, the survey is valuable for researchers working on community detection and graph processing: it provides a detailed picture of the state of the art and a basis for tackling complex, big graph datasets in network science.
