
VEC-SBM: Optimal Community Detection with Vectorial Edges Covariates (2402.18805v1)

Published 29 Feb 2024 in cs.SI and stat.ML

Abstract: Social networks are often associated with rich side information, such as texts and images. While numerous methods have been developed to identify communities from pairwise interactions, they usually ignore such side information. In this work, we study an extension of the Stochastic Block Model (SBM), a widely used statistical framework for community detection, that integrates vectorial edges covariates: the Vectorial Edges Covariates Stochastic Block Model (VEC-SBM). We propose a novel algorithm based on iterative refinement techniques and show that it optimally recovers the latent communities under the VEC-SBM. Furthermore, we rigorously assess the added value of leveraging edge's side information in the community detection process. We complement our theoretical results with numerical experiments on synthetic and semi-synthetic data.
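To make the setting described in the abstract concrete, here is a minimal sketch of a toy generator for a two-community network whose edges carry vector covariates. It is not the paper's exact model. The abstract does not specify the covariate distribution, so the Gaussian covariates, the "in"/"out" means, and all names (`sample_vec_sbm`, `p_in`, `p_out`, `means`, `sigma`) are assumptions made for illustration only.

```python
import random

def sample_vec_sbm(n, p_in, p_out, means, sigma=1.0, seed=0):
    """Sample a toy two-community SBM with a vector covariate on each edge.

    Assumption (not taken from the paper): covariates are Gaussian, with a
    mean that depends only on whether the endpoints share a community.
    """
    rng = random.Random(seed)
    # Latent community label (0 or 1) for each node.
    labels = [rng.randrange(2) for _ in range(n)]
    edges = {}
    for i in range(n):
        for j in range(i + 1, n):
            same = labels[i] == labels[j]
            # Standard SBM step: edge probability depends on the community pair.
            if rng.random() < (p_in if same else p_out):
                mu = means["in" if same else "out"]
                # Side information: one Gaussian vector attached to the edge.
                edges[(i, j)] = [rng.gauss(m, sigma) for m in mu]
    return labels, edges

labels, edges = sample_vec_sbm(
    n=30, p_in=0.5, p_out=0.1,
    means={"in": [1.0, 0.0], "out": [0.0, 1.0]},
)
```

In this toy setup both the presence of an edge and its covariate vector depend on the latent labels, which is the sense in which side information can help recover communities beyond what pairwise interactions alone provide.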

