Community detection in the hypergraph stochastic block model and reconstruction on hypertrees (2402.06856v2)
Abstract: We study the weak recovery problem on the $r$-uniform hypergraph stochastic block model ($r$-HSBM) with two balanced communities. In this model, $n$ vertices are randomly divided into two communities, and size-$r$ hyperedges are added randomly depending on whether all vertices in the hyperedge are in the same community. The goal of weak recovery is to recover a non-trivial fraction of the communities given the hypergraph. Pal and Zhu (2021) and Stephan and Zhu (2022) established that weak recovery is always possible above a natural threshold called the Kesten-Stigum (KS) threshold. For assortative models (i.e., monochromatic hyperedges are preferred), Gu and Polyanskiy (2023) proved that the KS threshold is tight if $r\le 4$ or the expected degree $d$ is small. For other cases, the tightness of the KS threshold remained open. In this paper we determine the tightness of the KS threshold for a wide range of parameters. We prove that for $r\le 6$ and $d$ large enough, the KS threshold is tight. This shows that there is no information-computation gap in this regime and partially confirms a conjecture of Angelini et al. (2015). On the other hand, we show that for $r\ge 5$, there exist parameters for which the KS threshold is not tight. In particular, for $r\ge 7$, the KS threshold is not tight if the model is disassortative (i.e., polychromatic hyperedges are preferred) or $d$ is large enough. This provides more evidence supporting the existence of an information-computation gap in these cases. Furthermore, we establish asymptotic bounds on the weak recovery threshold for fixed $r$ and large $d$. We also obtain a number of results regarding the broadcasting on hypertrees (BOHT) model, including the asymptotics of the reconstruction threshold for $r\ge 7$ and the impossibility of robust reconstruction at criticality.
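As a rough illustration of the model described in the abstract, the following Python sketch samples a small two-community $r$-uniform hypergraph. It is not the paper's construction: the function name `sample_hsbm` and the parameters `p_in` and `p_out` (inclusion probabilities for monochromatic and polychromatic hyperedges) are hypothetical, labels are assumed to be independent and uniform, and the sketch enumerates every size-$r$ subset, so it is only practical for small $n$ rather than the sparse regime the paper studies.

```python
import itertools
import random


def sample_hsbm(n, r, p_in, p_out, seed=None):
    """Sample a toy two-community r-uniform hypergraph.

    Assumptions (not taken from the paper): each vertex gets an
    independent uniform label in {+1, -1}; each size-r subset becomes
    a hyperedge with probability p_in if all its vertices share a
    label, and with probability p_out otherwise.
    """
    rng = random.Random(seed)
    labels = [rng.choice((+1, -1)) for _ in range(n)]
    hyperedges = []
    for subset in itertools.combinations(range(n), r):
        # A hyperedge is monochromatic when all r labels coincide.
        monochromatic = len({labels[v] for v in subset}) == 1
        p = p_in if monochromatic else p_out
        if rng.random() < p:
            hyperedges.append(subset)
    return labels, hyperedges


if __name__ == "__main__":
    # Assortative toy instance: monochromatic hyperedges are preferred.
    labels, hyperedges = sample_hsbm(n=30, r=3, p_in=0.05, p_out=0.01, seed=1)
    print(len(hyperedges), "hyperedges sampled")
```

Choosing `p_in > p_out` gives an assortative instance and `p_in < p_out` a disassortative one, matching the two cases distinguished in the abstract.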
- Emmanuel Abbe. Community detection and stochastic block models: recent developments. The Journal of Machine Learning Research, 18(1):6446–6531, 2017.
- Spectral detection on sparse hypergraphs. In 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 66–73. IEEE, 2015.
- Detection in the stochastic block model with multiple clusters: proof of the achievability conjectures, acyclic BP, and the information-computation gap. arXiv preprint arXiv:1512.09080, 2015.
- The Kesten-Stigum reconstruction bound is tight for roughly symmetric binary channels. In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06), pages 518–530. IEEE, 2006.
- Information-theoretic thresholds for community detection in sparse networks. In Conference on Learning Theory, pages 383–416. PMLR, 2016.
- On the purity of the limiting Gibbs state for the Ising model on the Bethe lattice. Journal of Statistical Physics, 79:473–482, 1995.
- Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review E, 84(6):066106, 2011.
- Broadcasting on trees and the Ising model. Annals of Applied Probability, pages 410–433, 2000.
- Non-linear log-Sobolev inequalities for the Potts semigroup and applications to reconstruction problems. Communications in Mathematical Physics, 404(2):769–831, 2023.
- Weak recovery threshold for the hypergraph stochastic block model. In Gergely Neu and Lorenzo Rosasco, editors, Proceedings of Thirty Sixth Conference on Learning Theory, volume 195 of Proceedings of Machine Learning Research, pages 885–920. PMLR, 12–15 Jul 2023.
- Yuzhou Gu. Channel Comparison Methods and Statistical Problems on Graphs. PhD thesis, Massachusetts Institute of Technology, 2023.
- Robust reconstruction on trees is determined by the second eigenvalue. The Annals of Probability, 32(3B):2630–2649, 2004.
- A symmetric entropy bound on the non-reconstruction regime of Markov chains on Galton-Watson trees. Electronic Communications in Probability, 14:587–596, 2009.
- Additional limit theorems for indecomposable multidimensional Galton-Watson processes. The Annals of Mathematical Statistics, 37(6):1463–1481, 1966.
- Large degree asymptotics and the reconstruction threshold of the asymmetric binary channels. Journal of Statistical Physics, 174:1161–1188, 2019.
- Laurent Massoulié. Community detection thresholds and the weak Ramanujan property. In Proceedings of the forty-sixth annual ACM Symposium on Theory of Computing, pages 694–703, 2014.
- Reconstruction and estimation in the planted partition model. Probability Theory and Related Fields, 162:431–461, 2015.
- A proof of the block model threshold conjecture. Combinatorica, 38(3):665–708, 2018.
- Elchanan Mossel. Survey: Information flow on trees. arXiv preprint math/0406446, 2004.
- Exact phase transitions for stochastic block models and reconstruction on trees. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, pages 96–102, 2023.
- Community detection in the sparse hypergraph stochastic block model. Random Structures & Algorithms, 59(3):407–463, 2021.
- Allan Sly. Reconstruction of random colourings. Communications in Mathematical Physics, 288(3):943–961, 2009.
- Allan Sly. Reconstruction for the Potts model. The Annals of Probability, 39(4):1365–1406, 2011.
- Reconstruction of colourings without freezing. arXiv preprint arXiv:1610.02770, 2016.
- Sparse random hypergraphs: Non-backtracking spectra and community detection. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 567–575. IEEE, 2022.
- Yumeng Zhang. Phase Transitions of Random Constraints Satisfaction Problem. University of California, Berkeley, 2017.
- Sparse hypergraph community detection thresholds in stochastic block model. In Thirty-Sixth Conference on Neural Information Processing Systems (NeurIPS), 2022.