Revisiting Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model (2306.12968v2)

Published 18 Jun 2023 in cs.SI, cs.LG, and stat.ML

Abstract: In this paper, we investigate the problem of recovering hidden communities in the Labeled Stochastic Block Model (LSBM) with a finite number of clusters whose sizes grow linearly with the total number of nodes. We derive the necessary and sufficient conditions under which the expected number of misclassified nodes is less than $ s $, for any number $ s = o(n) $. To achieve this, we propose IAC (Instance-Adaptive Clustering), the first algorithm whose performance matches the instance-specific lower bounds both in expectation and with high probability. IAC is a novel two-phase algorithm that consists of a one-shot spectral clustering step followed by iterative likelihood-based cluster assignment improvements. This approach is based on the instance-specific lower bound and notably does not require any knowledge of the model parameters, including the number of clusters. By performing the spectral clustering only once, IAC maintains an overall computational complexity of $ \mathcal{O}(n\, \text{polylog}(n)) $, making it scalable and practical for large-scale problems.
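The two-phase structure described in the abstract (one spectral clustering pass, then iterative likelihood-based reassignment) can be illustrated with a minimal sketch. This is not the paper's IAC algorithm. It assumes the simplest special case of the LSBM, where each pair of nodes carries a binary edge/no-edge label, and it assumes the number of clusters `k` is given, whereas the abstract states that IAC does not need it. The function names (`sample_sbm`, `spectral_init`, `likelihood_refine`, `misclassified`) and all parameter values are hypothetical. The dense eigendecomposition used here costs $\mathcal{O}(n^3)$, so the sketch does not reflect the $\mathcal{O}(n\, \text{polylog}(n))$ complexity claimed for IAC.

```python
from itertools import permutations

import numpy as np


def sample_sbm(sizes, p_in, p_out, rng):
    """Sample a symmetric 0/1 adjacency matrix from a plain SBM (illustrative only)."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    n = labels.size
    probs = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # strict upper triangle
    adj = (upper | upper.T).astype(float)             # symmetrize, no self-loops
    return adj, labels


def spectral_init(adj, k, rng, n_iter=20):
    """Phase 1: a single spectral pass, then k-means on the embedded nodes."""
    vals, vecs = np.linalg.eigh(adj)
    top = np.argsort(np.abs(vals))[-k:]               # k largest |eigenvalues|
    emb = vecs[:, top] * vals[top]
    # Farthest-point initialization of the k-means centers.
    centers = [emb[rng.integers(len(emb))]]
    for _ in range(1, k):
        dist = np.min([((emb - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(emb[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((emb[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        z = dist.argmin(1)
        for j in range(k):
            if np.any(z == j):
                centers[j] = emb[z == j].mean(0)
    return z


def likelihood_refine(adj, z, k, n_rounds=5, eps=1e-6):
    """Phase 2: repeatedly reassign each node to its most likely cluster."""
    for _ in range(n_rounds):
        onehot = np.eye(k)[z]                         # n x k membership matrix
        sizes = onehot.sum(0)
        edges = onehot.T @ adj @ onehot               # edge counts between clusters
        pairs = np.outer(sizes, sizes) - np.diag(sizes)
        p_hat = np.clip(edges / np.maximum(pairs, 1), eps, 1 - eps)
        deg = adj @ onehot                            # per-node edge counts to each cluster
        # Bernoulli log-likelihood of node v under candidate cluster a.
        # Approximation: a node is not excluded from its own cluster's size.
        ll = deg @ np.log(p_hat).T + (sizes[None, :] - deg) @ np.log1p(-p_hat).T
        z_new = ll.argmax(1)
        if np.array_equal(z_new, z):
            break
        z = z_new
    return z


def misclassified(z, labels, k):
    """Number of misclassified nodes, minimized over cluster relabelings."""
    return min(int(np.sum(np.array(perm)[z] != labels))
               for perm in permutations(range(k)))


rng = np.random.default_rng(0)
adj, labels = sample_sbm([100, 100, 100], p_in=0.5, p_out=0.05, rng=rng)
z0 = spectral_init(adj, k=3, rng=rng)
z1 = likelihood_refine(adj, z0, k=3)
print("misclassified after phase 1:", misclassified(z0, labels, 3))
print("misclassified after phase 2:", misclassified(z1, labels, 3))
```

The refinement step estimates the connection probabilities from the current assignment and then moves each node to the cluster that maximizes its log-likelihood, which is the general idea behind likelihood-based improvement; the paper's actual update rule and its handling of general label alphabets may differ.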

