Planted Bipartite Graph Detection (2302.03658v2)
Abstract: We consider the task of detecting a hidden bipartite subgraph in a given random graph. We formulate this as a hypothesis testing problem: under the null hypothesis, the graph is a realization of an Erdős-Rényi random graph over $n$ vertices with edge density $q$; under the alternative, there exists a planted $k_{\mathsf{R}} \times k_{\mathsf{L}}$ bipartite subgraph with edge density $p>q$. We characterize the statistical and computational barriers for this problem. Specifically, we derive information-theoretic lower bounds, and design and analyze optimal algorithms matching those bounds, in both the dense regime, where $p,q = \Theta(1)$, and the sparse regime, where $p,q = \Theta(n^{-\alpha})$ with $\alpha \in (0,2]$. We also consider the problem of testing in polynomial time. As is customary in similar structured high-dimensional problems, our model undergoes an "easy-hard-impossible" phase transition, and computational constraints penalize statistical performance. To provide evidence for this statistical-computational gap, we prove computational lower bounds based on the low-degree conjecture and show that the class of low-degree polynomial algorithms fails in the conjecturally hard region.
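To make the two hypotheses concrete, here is a minimal Python sketch that samples a graph under each hypothesis and applies a global edge-count test. The construction of the planted subgraph (two disjoint, uniformly random vertex sets R and L, with only the R-L pairs raised to density $p$ and all other pairs kept at $q$) and the helper names `sample_graph` and `count_test` are assumptions made for illustration. The count test is a simple baseline, not the optimal test analyzed in the paper.

```python
import random
from itertools import combinations


def sample_graph(n, q, p=None, k_r=0, k_l=0, seed=None):
    """Sample an undirected graph on vertices 0..n-1.

    Under H0 (p is None): every pair is an edge independently with prob. q.
    Under H1 (assumed construction): pick disjoint random sets R, L of sizes
    k_r, k_l; each R-L pair is an edge with prob. p, every other pair with q.
    Returns (edges, R, L), where edges is a set of pairs (u, v) with u < v.
    """
    rng = random.Random(seed)
    R, L = set(), set()
    if p is not None:
        chosen = rng.sample(range(n), k_r + k_l)
        R, L = set(chosen[:k_r]), set(chosen[k_r:])
    edges = set()
    for u, v in combinations(range(n), 2):
        # A pair is "planted" only if it crosses between R and L.
        cross = (u in R and v in L) or (u in L and v in R)
        if rng.random() < (p if cross else q):
            edges.add((u, v))
    return edges, R, L


def count_test(edges, n, q, threshold_sd=3.0):
    """Reject H0 if the edge count exceeds its null mean by threshold_sd
    null standard deviations (the count is Binomial(n choose 2, q) under H0)."""
    m = n * (n - 1) // 2
    mean = m * q
    sd = (m * q * (1 - q)) ** 0.5
    return len(edges) > mean + threshold_sd * sd


# Illustrative dense-regime parameters (hypothetical values).
n, q, p = 200, 0.1, 0.9
null_edges, _, _ = sample_graph(n, q, seed=0)
alt_edges, R, L = sample_graph(n, q, p=p, k_r=60, k_l=60, seed=1)
print("H0 graph rejected:", count_test(null_edges, n, q))
print("H1 graph rejected:", count_test(alt_edges, n, q))
```

Under these assumptions the planted structure adds about $k_{\mathsf{R}} k_{\mathsf{L}} (p-q)$ edges in expectation, so the count test succeeds when this excess dominates the null standard deviation $\sqrt{\binom{n}{2} q(1-q)}$. It ignores where the extra edges sit, so it is only a coarse baseline.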