
Combinatorial Correlation Clustering (2404.05433v2)

Published 8 Apr 2024 in cs.DS

Abstract: Correlation Clustering is a classic clustering objective arising in numerous machine learning and data mining applications. Given a graph $G=(V,E)$, the goal is to partition the vertex set into clusters so as to minimize the number of edges between clusters plus the number of edges missing within clusters. The problem is APX-hard and the best known polynomial time approximation factor is 1.73 by Cohen-Addad, Lee, Li, and Newman [FOCS'23]. They use an LP with $|V|^{1/\epsilon^{\Theta(1)}}$ variables for some small $\epsilon$. However, due to the practical relevance of correlation clustering, there has also been great interest in getting more efficient sequential and parallel algorithms. The classic combinatorial \emph{pivot} algorithm of Ailon, Charikar and Newman [JACM'08] provides a 3-approximation in linear time. Like most other algorithms discussed here, this uses randomization. Recently, Behnezhad, Charikar, Ma and Tan [FOCS'22] presented a $(3+\epsilon)$-approximation algorithm that solves the problem in a constant number of rounds in the Massively Parallel Computation (MPC) setting. Very recently, Cao, Huang, Su [SODA'24] provided a 2.4-approximation in a polylogarithmic number of rounds in the MPC model and in $\tilde{O}(|E|^{1.5})$ time in the classic sequential setting. They asked whether it is possible to get a better than 3-approximation in near-linear time. We resolve this problem with an efficient combinatorial algorithm providing a drastically better approximation factor. It achieves a $\sim 2-2/13 < 1.847$-approximation in sub-linear ($\tilde O(|V|)$) sequential time or in sub-linear ($\tilde O(|V|)$) space in the streaming setting. In the MPC model, we give an algorithm using only a constant number of rounds that achieves a $\sim 2-1/8 < 1.876$-approximation.
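
To make the objective and the classic pivot baseline from the abstract concrete, here is a minimal Python sketch. It is not the algorithm of this paper. It only illustrates the disagreement count (edges between clusters plus missing edges within clusters) and the textbook form of the randomized pivot procedure attributed to Ailon, Charikar and Newman. The function names `disagreements` and `pivot`, the vertex labelling `0..n-1`, and the `seed` parameter are assumptions made for illustration.

```python
import random
from itertools import combinations


def disagreements(n, edges, clusters):
    """Count edges between clusters plus non-edges inside clusters.

    Assumes vertices are 0..n-1, the graph is simple and undirected,
    and `clusters` is a partition of the vertex set.
    """
    label = {}
    for cid, cluster in enumerate(clusters):
        for v in cluster:
            label[v] = cid
    edge_set = {frozenset(e) for e in edges}
    cost = 0
    # Quadratic scan over all pairs: fine for illustration, not for scale.
    for u, v in combinations(range(n), 2):
        same_cluster = label[u] == label[v]
        has_edge = frozenset((u, v)) in edge_set
        if same_cluster != has_edge:
            cost += 1
    return cost


def pivot(n, edges, seed=None):
    """Randomized pivot clustering (textbook form, illustrative only).

    Visit vertices in random order; each still-unclustered vertex becomes
    a pivot and forms a cluster with its still-unclustered neighbours.
    """
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    order = list(range(n))
    rng.shuffle(order)
    clustered = set()
    clusters = []
    for p in order:
        if p in clustered:
            continue
        cluster = {p} | (adj[p] - clustered)
        clustered |= cluster
        clusters.append(cluster)
    return clusters


# Two disjoint triangles: any pivot order recovers both triangles exactly.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
clusters = pivot(6, edges, seed=0)
print(clusters, disagreements(6, edges, clusters))
```

On this input the pivot clustering has zero disagreements, while placing every vertex in its own cluster costs 6 (all edges cut) and placing all vertices in one cluster costs 9 (all missing edges). The `disagreements` helper checks every vertex pair and is therefore quadratic; it is meant only for checking small examples.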

References (41)
  1. Correlation clustering in data streams. In Proceedings of the 32nd International Conference on Machine Learning (ICML), pages 2237–2246, 2015.
  2. Aggregating inconsistent information: Ranking and clustering. Journal of the ACM, 55(5):1–27, 2008.
  3. Generating labels from clicks. In Proceedings of the Second ACM International Conference on Web Search and Data Mining, pages 172–181, 2009.
  4. Large-scale deduplication with constraints using dedupalog. In Proceedings of the 25th IEEE International Conference on Data Engineering (ICDE), pages 952–963, 2009.
  5. Sublinear time and space algorithms for correlation clustering via sparse-dense decompositions. In Proceedings of the 13th Conference on Innovations in Theoretical Computer Science (ITCS), volume 215 of LIPIcs, pages 10:1–10:20, 2022.
  6. Correlation clustering. Machine learning, 56(1):89–113, 2004.
  7. Almost 3-approximate correlation clustering in constant rounds. In Proceedings of 63rd Annual IEEE Symposium on Foundations of Computer Science, (FOCS), pages 720–731, 2022.
  8. Single-pass streaming algorithms for correlation clustering. In Proceedings of the 2023 ACM-SIAM Symposium on Discrete Algorithms, SODA 2023, Florence, Italy, January 22-25, 2023, pages 819–849, 2023.
  9. Single-pass streaming algorithms for correlation clustering. In Proceedings of the 2023 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 819–849, 2023.
  10. Fully dynamic maximal independent set with polylogarithmic update time. In 60th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2019, Baltimore, Maryland, USA, November 9-12, 2019, pages 382–405, 2019.
  11. Differentially private correlation clustering. In International Conference on Machine Learning (ICML), pages 1136–1146, 2021.
  12. Overlapping correlation clustering. Knowledge and Information Systems, 35(1):1–32, 2013.
  13. Massively parallel correlation clustering in bounded arboricity graphs. In 35th International Symposium on Distributed Computing (DISC), volume 209 of LIPIcs, pages 15:1–15:18, 2021.
  14. Correlation clustering in Mapreduce. In Proceedings of the 20th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), pages 641–650, 2014.
  15. Fitting distances by tree metrics minimizing the total error within a constant factor. In Proceedings of 62nd Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 468–479, 2021.
  16. Near-optimal correlation clustering with privacy. In Advances in Neural Information Processing Systems (Neurips), 2022.
  17. Fitting metrics and ultrametrics with minimum disagreements. In Proceedings of 63rd Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 301–311, 2022.
  18. Clustering with qualitative information. Journal of Computer and System Sciences, 71(3):360–383, 2005.
  19. Breaking 3-factor approximation for correlation clustering in polylogarithmic rounds. In Proceedings of the 2024 ACM-SIAM Symposium on Discrete Algorithms, SODA 2024, 2024. preprint at arXiv:2307.06723.
  20. On the hardness of approximating multicut and sparsest-cut. Computational Complexity, 15(2):94–114, 2006.
  21. A $(3+\varepsilon)$-Approximate Correlation Clustering Algorithm in Dynamic Streams. In Proceedings of the 2024 ACM-SIAM Symposium on Discrete Algorithms, SODA 2024, 2024.
  22. A graph-theoretic approach to webpage segmentation. In Proceedings of the 17th International conference on World Wide Web (WWW), pages 377–386, 2008.
  23. Handling correlated rounding error via preclustering: A 1.73-approximation for correlation clustering. In Proceedings of 64th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2023+.
  24. Correlation clustering in constant many parallel rounds. In Proceedings of the 38th International Conference on Machine Learning (ICML), pages 2069–2078, 2021.
  25. Online and consistent correlation clustering. In Proceedings of International Conference on Machine Learning (ICML), pages 4157–4179, 2022.
  26. Correlation clustering with Sherali-Adams. In Proceedings of 63rd Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 651–661, 2022.
  27. Single-pass pivot algorithm for correlation clustering. keep it simple! In Advances in Neural Information Processing Systems (NeurIPS), 2023.
  28. Single-pass pivot algorithm for correlation clustering. keep it simple! arXiv preprint arXiv:2305.13560, 2023.
  29. Near optimal LP rounding algorithm for correlation clustering on complete and complete $k$-partite graphs. In Proceedings of the 47th annual ACM Symposium on Theory of Computing (STOC), pages 219–228, 2015.
  30. Clustering sparse graphs. In Advances in Neural Information Processing Systems (Neurips), pages 2204–2212, 2012.
  31. Correlation clustering in general weighted graphs. Theoretical Computer Science, 361(2-3):172–187, 2006.
  32. Correlation clustering with a fixed number of clusters. Theory of Computing, 2:249–266, 2006.
  33. Web people search via connection analysis. IEEE Transactions on Knowledge and Data Engineering, 20(11):1550–1565, 2008.
  34. Linear time approximation schemes for the Gale-Berlekamp game and related minimization problems. In Proceedings of the forty-first annual ACM symposium on Theory of computing (STOC), pages 313–322, 2009.
  35. Daogao Liu. Better private algorithms for correlation clustering. CoRR, arXiv abs/2202.10747, 2022.
  36. Robust online correlation clustering. In Advances in Neural Information Processing Systems (Neurips), pages 4688–4698, 2021.
  37. Online correlation clustering. In Proceedings of 27th International Symposium on Theoretical Aspects of Computer Science (STACS), pages 573–584, 2010.
  38. Parallel correlation clustering on big graphs. In Advances in Neural Information Processing Systems (Neurips), pages 82–90, 2015.
  39. Chaitanya Swamy. Correlation clustering: Maximizing agreements via semidefinite programming. In Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 526–527, 2004.
  40. Nate Veldt. Correlation clustering via strong triadic closure labeling: Fast approximation algorithms and practical lower bounds. In International Conference on Machine Learning (ICML), pages 22060–22083, 2022.
  41. A correlation clustering framework for community detection. In Proceedings of the 2018 ACM World Wide Web Conference (WWW), pages 439–448, 2018.