Neural Attributed Community Search at Billion Scale (2403.18874v1)
Abstract: Community search has been studied extensively over the past decades. In recent years, interest has grown in attributed community search, which aims to identify a community based on both the query nodes and the query attributes. A range of techniques has been investigated. Although recent methods based on learning models such as graph neural networks (GNNs) achieve state-of-the-art accuracy, we observe that 1) they suffer from severe efficiency issues, and 2) they model community search directly as a node classification problem and therefore cannot make good use of the interdependence among different entities in the graph. Motivated by these limitations, we propose a new neurAL attrIbuted Community sEarch model for large-scale graphs, termed ALICE. ALICE first extracts a candidate subgraph to reduce the search scope and then predicts the community with the Consistency-aware Net, termed ConNet. Specifically, in the extraction phase, we introduce the density sketch modularity, which uses a unified form to combine the strengths of two existing modularities, namely classical modularity and density modularity. Based on this new metric, we first adaptively obtain the candidate subgraph, formed by the k-hop neighbors of the query nodes, that has the maximum modularity. We then construct a node-attribute bipartite graph to take attributes into account. After that, ConNet adopts a cross-attention encoder to encode the interaction between the query and the graph. Training is guided by structure-attribute consistency and local consistency to achieve better performance. Extensive experiments on 11 real-world datasets, including one billion-scale graph, demonstrate the superiority of ALICE in terms of accuracy, efficiency, and scalability.
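The extraction step described in the abstract (choose the k-hop neighborhood of the query nodes that maximizes a modularity score) can be illustrated with a minimal sketch. The abstract does not give the formula for the density sketch modularity, so the code below falls back on the two existing metrics it names, using their commonly stated single-community forms: classical modularity (m_C - d_C^2 / 4m) / m and density modularity (m_C - d_C^2 / 4m) / |C|, where m_C is the number of internal edges, d_C the degree sum of the community, and m the total edge count. All function names and the toy graph are hypothetical and are not taken from ALICE.

```python
from collections import deque

def k_hop_nodes(adj, queries, k):
    """All nodes within k hops of any query node (multi-source BFS)."""
    dist = {q: 0 for q in queries}
    frontier = deque(queries)
    while frontier:
        u = frontier.popleft()
        if dist[u] == k:
            continue  # do not expand beyond k hops
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                frontier.append(v)
    return set(dist)

def _gain(adj, community):
    """Shared term m_C - d_C^2 / (4m) used by both modularity variants."""
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    m_c = sum(1 for u in community for v in adj[u] if v in community) // 2
    d_c = sum(len(adj[u]) for u in community)
    return m_c - d_c ** 2 / (4 * m), m

def classical_modularity(adj, community):
    gain, m = _gain(adj, community)
    return gain / m

def density_modularity(adj, community):
    # Assumption: stand-in for the paper's density sketch modularity,
    # whose exact form is not stated in the abstract.
    gain, _ = _gain(adj, community)
    return gain / len(community)

def best_k_hop_candidate(adj, queries, max_k, score=density_modularity):
    """Return (score, k, nodes) for the best-scoring k-hop neighborhood."""
    best = None
    for k in range(1, max_k + 1):
        nodes = k_hop_nodes(adj, queries, k)
        s = score(adj, nodes)
        if best is None or s > best[0]:
            best = (s, k, nodes)
    return best

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
best_score, best_k, candidate = best_k_hop_candidate(adj, [0], max_k=3)
# Picks k = 1, i.e. the triangle {0, 1, 2} around the query node.
```

Passing `score=classical_modularity` swaps the metric without touching the search loop, which is one way to compare how each modularity trades community size against cohesion on the same query.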