A Sublinear-Time Spectral Clustering Oracle with Improved Preprocessing Time (2310.17878v2)
Abstract: We address the problem of designing a sublinear-time spectral clustering oracle for graphs that exhibit strong clusterability. Such graphs contain $k$ latent clusters, each characterized by a large inner conductance (at least $\varphi$) and a small outer conductance (at most $\varepsilon$). Our aim is to preprocess the graph to enable clustering membership queries, with the key requirement that both preprocessing and query answering should be performed in sublinear time, and the resulting partition should be consistent with a $k$-partition that is close to the ground-truth clustering. Previous oracles have relied on either a $\textrm{poly}(k)\log n$ gap between inner and outer conductances or exponential (in $k/\varepsilon$) preprocessing time. Our algorithm relaxes these assumptions, albeit at the cost of a slightly higher misclassification ratio. We also show that our clustering oracle is robust against a few random edge deletions. To validate our theoretical bounds, we conducted experiments on synthetic networks.
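To make the clusterability assumption concrete, the following minimal sketch computes the outer and inner conductance of a candidate cluster in a toy graph. It assumes the common volume-based definition (cut edges divided by the sum of degrees); the paper's own normalization may differ, for example by using a degree bound. The function names, the brute-force subset search, and the example graph are all hypothetical and only feasible for very small clusters.

```python
from fractions import Fraction
from itertools import combinations


def outer_conductance(edges, cluster):
    """Fraction of the cluster's edge endpoints that belong to edges leaving it."""
    cluster = set(cluster)
    cut = sum(1 for u, v in edges if (u in cluster) != (v in cluster))
    volume = sum((u in cluster) + (v in cluster) for u, v in edges)
    return Fraction(cut, volume)


def inner_conductance(edges, cluster):
    """Minimum conductance of any cut inside the induced subgraph on the cluster.

    Brute force over all subsets holding at most half of the induced volume,
    so this is exponential in the cluster size (illustration only).
    """
    cluster = set(cluster)
    induced = [(u, v) for u, v in edges if u in cluster and v in cluster]
    total_volume = 2 * len(induced)
    best = None
    for size in range(1, len(cluster)):
        for subset in combinations(sorted(cluster), size):
            s = set(subset)
            volume = sum((u in s) + (v in s) for u, v in induced)
            if volume == 0 or 2 * volume > total_volume:
                continue
            cut = sum(1 for u, v in induced if (u in s) != (v in s))
            value = Fraction(cut, volume)
            best = value if best is None else min(best, value)
    return best


# Toy graph (assumed for illustration): two triangles joined by one edge.
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
LEFT = {0, 1, 2}

phi_out = outer_conductance(EDGES, LEFT)
phi_in = inner_conductance(EDGES, LEFT)
```

On this toy graph the left triangle has outer conductance 1/7 and inner conductance 1, so it is well connected internally and only weakly connected to the rest of the graph, which is the kind of cluster the abstract describes.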
Authors: Ranran Shen, Pan Peng