Pruned Pivot: Correlation Clustering Algorithm for Dynamic, Parallel, and Local Computation Models (2402.15668v2)
Abstract: Given a graph with positive and negative edge labels, the correlation clustering problem aims to cluster the nodes so as to minimize the total number of between-cluster positive edges and within-cluster negative edges. This problem has many applications in data mining, particularly in unsupervised learning. Motivated by the prevalence of large graphs and constantly changing data in modern applications, we study correlation clustering in the dynamic, parallel (MPC), and local computation (LCA) settings. We design an approach that improves the state-of-the-art runtime complexity in all of these settings. In particular, we provide the first fully dynamic algorithm that runs in expected amortized constant time, with no dependence on the graph size. Moreover, our algorithm essentially matches the approximation guarantee of the celebrated Pivot algorithm.
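To make the objective and the baseline concrete, here is a minimal sketch of the disagreement cost and of the classic Pivot algorithm (Ailon, Charikar and Newman), which is a 3-approximation in expectation on complete graphs. It is not the Pruned Pivot method of this paper. Assumptions: the graph is complete and every pair not listed as positive is negative; the function names `disagreements` and `pivot` are hypothetical and chosen for illustration.

```python
import random
from typing import Dict, Hashable, Iterable, Optional, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def disagreements(nodes: Iterable[Hashable], positive_edges: Iterable[Edge],
                  cluster_of: Dict[Hashable, Hashable]) -> int:
    """Correlation clustering cost on a complete signed graph.

    Assumption: every pair of nodes not listed in positive_edges is negative.
    Cost = (# positive pairs split across clusters)
         + (# negative pairs placed in the same cluster).
    """
    nodes = list(nodes)
    pos = {frozenset(e) for e in positive_edges}
    cost = 0
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            same_cluster = cluster_of[u] == cluster_of[v]
            is_positive = frozenset((u, v)) in pos
            # A pair is a disagreement exactly when its label and placement differ.
            if is_positive != same_cluster:
                cost += 1
    return cost


def pivot(nodes: Iterable[Hashable], positive_edges: Iterable[Edge],
          seed: Optional[int] = None) -> Dict[Hashable, Hashable]:
    """Classic (unpruned) Pivot: maps each node to the pivot of its cluster."""
    nodes = list(nodes)
    adj: Dict[Hashable, Set[Hashable]] = {u: set() for u in nodes}
    for u, v in positive_edges:
        adj[u].add(v)
        adj[v].add(u)
    order = nodes[:]
    random.Random(seed).shuffle(order)  # uniformly random pivot order
    cluster_of: Dict[Hashable, Hashable] = {}
    for p in order:
        if p in cluster_of:  # already absorbed by an earlier pivot
            continue
        cluster_of[p] = p  # p becomes a pivot
        for v in adj[p]:
            if v not in cluster_of:  # take every still-unclustered positive neighbour
                cluster_of[v] = p
    return cluster_of


triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
clusters = pivot(range(6), triangles, seed=1)
print(disagreements(range(6), triangles, clusters))  # 0
```

On two disjoint positive triangles, whichever node is drawn first becomes a pivot and absorbs its whole triangle, so the cost is 0 for every random order. When the positive graph is not a disjoint union of cliques, the outcome depends on the order, which is why the guarantee holds only in expectation.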