Densest Subhypergraph: Negative Supermodular Functions and Strongly Localized Methods (2310.13792v2)
Abstract: Dense subgraph discovery is a fundamental primitive in graph and hypergraph analysis that, among other applications, has been used for real-time story detection on social media and for improving access to data stores of social networking systems. We present several contributions for localized densest subgraph discovery, which seeks dense subgraphs located near given seed sets of nodes. We first introduce a generalization of a recent $\textit{anchored densest subgraph}$ problem, extending this previous objective to hypergraphs and adding a tunable locality parameter that controls the extent to which the output set overlaps with the seed nodes. Our primary technical contribution is to prove when it is possible to obtain a strongly-local algorithm for solving this problem, meaning that the runtime depends only on the size of the input set. We provide a strongly-local algorithm that applies whenever the locality parameter is not too small, and show via counterexample why strongly-local algorithms are impossible below a certain threshold. Along the way to proving our results for localized densest subgraph discovery, we also provide several advances in solving global dense subgraph discovery objectives. These include the first strongly polynomial time algorithm for the densest supermodular set problem and a flow-based exact algorithm for a heavy and dense subgraph discovery problem in graphs with arbitrary node weights. We demonstrate our algorithms on several web-based data analysis tasks.