Robust recovery for stochastic block models, simplified and generalized (2402.13921v1)
Abstract: We study the problem of $\textit{robust community recovery}$: efficiently recovering communities in sparse stochastic block models in the presence of adversarial corruptions. In the absence of adversarial corruptions, there are efficient algorithms when the $\textit{signal-to-noise ratio}$ exceeds the $\textit{Kesten--Stigum (KS) threshold}$, widely believed to be the computational threshold for this problem. The question we study is: does the computational threshold for robust community recovery also lie at the KS threshold? We answer this question affirmatively, providing an algorithm for robust community recovery for arbitrary stochastic block models on any constant number of communities, generalizing the work of Ding, d'Orsi, Nasser & Steurer on an efficient algorithm above the KS threshold in the case of $2$-community block models. There are three main ingredients to our work: (i) The Bethe Hessian of the graph is defined as $H_G(t) \triangleq (D_G - I)t^2 - A_G t + I$, where $D_G$ is the diagonal matrix of degrees and $A_G$ is the adjacency matrix. Empirical work suggested that the Bethe Hessian for the stochastic block model has outlier eigenvectors corresponding to the communities right above the Kesten--Stigum threshold. We formally confirm the existence of outlier eigenvalues for the Bethe Hessian by explicitly constructing outlier eigenvectors from the community vectors. (ii) We develop an algorithm for a variant of robust PCA on sparse matrices: specifically, an algorithm that partially recovers top eigenspaces from adversarially corrupted sparse matrices under mild delocalization constraints. (iii) We give a rounding algorithm that turns vector assignments of vertices into a community assignment, inspired by the algorithm of Charikar & Wirth (2004) for $2$XOR.
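To make the definition in ingredient (i) concrete, the following is a minimal NumPy sketch that builds $H_G(t) = (D_G - I)t^2 - A_G t + I$ from an adjacency matrix and counts its negative eigenvalues. It is an illustration only, not the paper's algorithm: the helper names (`bethe_hessian`, `sample_two_block_sbm`, `count_negative_eigenvalues`), the two-community sampler with parameters `a` and `b`, and the choice $t = 1/\sqrt{\text{average degree}}$ are all assumptions made for this example. Dense matrices are used for simplicity, so it is only practical for small graphs.

```python
import numpy as np


def bethe_hessian(adj, t):
    """Return H_G(t) = (D_G - I) t^2 - A_G t + I for a symmetric 0/1 adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    degrees = np.diag(adj.sum(axis=1))  # D_G: diagonal matrix of vertex degrees
    identity = np.eye(n)
    return (degrees - identity) * t**2 - adj * t + identity


def sample_two_block_sbm(n, a, b, rng):
    """Hypothetical helper: 2-community SBM, edge probability a/n inside, b/n across."""
    labels = (np.arange(n) >= n // 2).astype(int)
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, a / n, b / n)
    # Sample the strict upper triangle, then mirror it to get a simple undirected graph.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adj = (upper | upper.T).astype(float)
    return adj, labels


def count_negative_eigenvalues(matrix, tol=1e-9):
    """Count eigenvalues of a symmetric matrix that lie below -tol."""
    return int(np.sum(np.linalg.eigvalsh(matrix) < -tol))


# Illustrative run on a small, uncorrupted graph (parameters are arbitrary assumptions).
rng = np.random.default_rng(0)
adj, labels = sample_two_block_sbm(200, a=9.0, b=1.0, rng=rng)
avg_degree = adj.sum() / adj.shape[0]
t = 1.0 / np.sqrt(avg_degree)  # assumed choice of t, not taken from the abstract
print(count_negative_eigenvalues(bethe_hessian(adj, t)))
```

The sketch only evaluates the matrix on a clean sample; it does not model adversarial corruptions, the robust PCA step, or the rounding step described in ingredients (ii) and (iii).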
- Emmanuel Abbe. Community detection and stochastic block models: recent developments. The Journal of Machine Learning Research, 18(1):6446–6531, 2017.
- Community detection in general stochastic block models: Fundamental limits and efficient algorithms for recovery. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, pages 670–688. IEEE, 2015.
- Hyman Bass. The Ihara-Selberg zeta function of a tree lattice. International Journal of Mathematics, 3(06):717–797, 1992.
- Algorithms approaching the threshold for semi-random planted clique. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, pages 1918–1926, 2023.
- Non-backtracking spectrum of random graphs: community detection and non-regular Ramanujan graphs. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, pages 1347–1357. IEEE, 2015.
- Local Statistics, Semidefinite Programming, and Community Detection. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1298–1316. SIAM, 2021.
- Coloring random and semi-random k-colorable graphs. Journal of Algorithms, 19(2):204–234, 1995.
- Robust principal component analysis? Journal of the ACM (JACM), 58(3):1–37, 2011.
- Learning from untrusted data. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 47–60, 2017.
- Maximizing quadratic programs: Extending Grothendieck’s inequality. In 45th Annual IEEE Symposium on Foundations of Computer Science, pages 54–60. IEEE, 2004.
- Reaching Kesten-Stigum threshold in the stochastic block model under node corruptions. In The Thirty Sixth Annual Conference on Learning Theory, pages 4044–4071. PMLR, 2023.
- Robust recovery for stochastic block models. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 387–394. IEEE, 2022.
- The relationship between precision-recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning, pages 233–240, 2006.
- Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review E, 84(6):066106, 2011.
- Extremal cuts of sparse random graphs. The Annals of Probability, 45(2):1190–1217, 2017.
- Heuristics for semirandom graph problems. Journal of Computer and System Sciences, 63(4):639–671, 2001.
- How well do local algorithms solve semidefinite programs? In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 604–614, 2017.
- Efficient Algorithms for Semirandom Planted CSPs at the Refutation Threshold. arXiv preprint arXiv:2309.16897, 2023.
- Uniqueness of BP fixed point for the Potts model and applications to community detection. arXiv preprint arXiv:2303.14688, 2023.
- Community detection in sparse networks via Grothendieck’s inequality. Probability Theory and Related Fields, 165(3-4):1025–1049, 2016.
- Efficient Bayesian estimation from few samples: community detection and related problems. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 379–390. IEEE, 2017.
- Yasutaka Ihara. On discrete subgroups of the two by two projective linear group over p-adic fields. Journal of the Mathematical Society of Japan, 18(3):219–235, 1966.
- Spectral redemption in clustering sparse networks. Proceedings of the National Academy of Sciences, 110(52):20935–20940, 2013.
- Additional limit theorems for indecomposable multidimensional Galton-Watson processes. The Annals of Mathematical Statistics, 37(6):1463–1481, 1966.
- Limit theorems for decomposable multi-dimensional Galton-Watson processes. Journal of Mathematical Analysis and Applications, 17(2):309–338, 1967.
- Minimax rates for robust community detection. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 823–831. IEEE, 2022.
- Laurent Massoulié. Community detection thresholds and the weak Ramanujan property. In Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing, pages 694–703, 2014.
- A new algorithm for the robust semi-random independent set problem. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 738–746. SIAM, 2020.
- Belief propagation, robust reconstruction and optimal recovery of block models. In Conference on Learning Theory, pages 356–370. PMLR, 2014.
- A proof of the block model threshold conjecture. Combinatorica, 38(3):665–708, 2018.
- How robust are reconstruction thresholds for community detection? In Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, pages 828–841, 2016.
- Semidefinite programs on sparse random graphs and their application to community detection. In Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, pages 814–827, 2016.
- Spectral clustering of graphs with the Bethe Hessian. Advances in Neural Information Processing Systems, 27, 2014.
- Robustness of spectral methods for community detection. In Conference on Learning Theory, pages 2831–2860. PMLR, 2019.
- Ising model on locally tree-like graphs: Uniqueness of solutions to cavity equations. IEEE Transactions on Information Theory, 2023.